Imagine you're Spotify and you have a stream of user-song-timestamp triplets, one per listen. You'll likely want to transform it into features such as each user's top genre over the last 30 days. As a data scientist, you'll write the transformations to do so, run them yourself on something like Spark, and store the results in Redis for inference and S3 for training. You have to keep track of your versioning, jobs, and transformations, and you can't easily share them across data scientists.
Featureform's library allows you to define your transformations, features, and training sets. It will interface with Spark, Redis, etc. on your behalf to achieve your desired state. It also keeps track of all the metadata for you, making everything shareable and reusable.
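To illustrate the declarative pattern, here's a hypothetical sketch in plain Python — this is not Featureform's actual API, just the general "register definitions, let a runner execute them" idea:

```python
from collections import Counter, defaultdict

# Hypothetical sketch of the declarative pattern described above --
# NOT Featureform's actual API. Definitions are registered by name so
# a runner/framework can execute, version, and share them.
registry = {}

def transformation(name):
    def decorator(fn):
        registry[name] = fn      # record the definition for the runner
        return fn
    return decorator

@transformation("top_genre_per_user_30d")
def top_genre(listens):
    """listens: iterable of (user, genre) pairs from the last 30 days."""
    per_user = defaultdict(Counter)
    for user, genre in listens:
        per_user[user][genre] += 1
    return {u: c.most_common(1)[0][0] for u, c in per_user.items()}

# The framework, not the data scientist, decides where this runs
# (e.g. Spark) and where the results land (Redis/S3):
features = registry["top_genre_per_user_30d"](
    [("u1", "rock"), ("u1", "rock"), ("u1", "jazz"), ("u2", "pop")])
print(features)  # {'u1': 'rock', 'u2': 'pop'}
```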
I’m a huge climate person. I come from ancestors who laid down the early climate change work. But this methodology seems kind of wacky. I think this kind of thing hurts our credibility in the long run.
When is the humidity a problem? If I have my thermostat set to 73 and the system is maintaining that through the day and night, will humidity be an issue? A variable speed system in a typically insulated home will likely be running at a very low level most of the day.
I can see it being a problem if I come back from vacation and change the thermostat from 85 to 73 and ask the A/C to get there ASAP, but even then the controller could know how much moisture is in the air and make sure the temperature drop is slow enough to deal with condensation.
This sounds really bad, even if you’re their boss. You’re dealing with professionals, and part of that is considering their input.
I’d start with the assumption that they know what they’re doing and might have had a good reason and phrase it from there. Even if not true it sets the right tone of respect.
It can use fewer resources and can be faster than passing data between processes. There's really no reason to use multiple processes for IO-bound workloads; for some of them, a thread pool is actually faster than a process pool.
The second you aren't just passing raw bytes around, you have to take into consideration what can and can't be sent between processes in Python, as some objects can't be pickled and thus can't be passed between processes.
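For example, anything without an importable name — a lambda, a closure, a locally defined function — can't be pickled, so it can't be sent to a worker process:

```python
import pickle

def double(x):
    return x * 2          # module-level: pickled by reference to its name

pickle.dumps(double)      # fine

try:
    pickle.dumps(lambda x: x * 2)   # no importable name -> unpicklable
except (pickle.PicklingError, AttributeError) as e:
    print("cannot pickle:", e)
```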
You can concurrently run a lot more coroutines than processes or threads, as well.
If your task is IO-bound — i.e., lots of network stuff — multiprocessing is overkill. Also, asyncio can handle a _lot_ more tasks: hundreds of thousands, as opposed to tens. So it really shines for IO-heavy work.
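A quick sketch of that scale: launching ten thousand concurrent coroutines is trivial for asyncio, whereas ten thousand OS processes (or even threads) would exhaust most machines. (`asyncio.sleep` here stands in for a real network call.)

```python
import asyncio

async def fetch(i):
    # stand-in for a network request; the task just waits on "IO"
    await asyncio.sleep(0.01)
    return i

async def main():
    # 10,000 concurrent tasks on one thread, no pools required
    results = await asyncio.gather(*(fetch(i) for i in range(10_000)))
    return sum(results)

total = asyncio.run(main())
print(total)  # 49995000
```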
Also, processes don't share memory by default, and that can be a pretty big disadvantage depending on the task.
One reason is that you often don't need to use locks. Between lines containing await (or async for or async with), you can be sure that this task won't be pre-empted to run another async task.
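A small demonstration of that guarantee: the read-modify-write below needs no lock, because a task can only be switched out at an `await`, and there is none inside the loop.

```python
import asyncio

counter = 0

async def add_many(n):
    global counter
    for _ in range(n):
        counter += 1        # no lock needed: no await inside the loop,
                            # so no other task can run mid-update
    await asyncio.sleep(0)  # switches can only happen at awaits

async def main():
    await asyncio.gather(*(add_many(1000) for _ in range(10)))

asyncio.run(main())
print(counter)  # 10000, with no Lock anywhere
```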
Another reason, if you're using the Trio async library, is that managing and cancelling multiple tasks is really easy, and you can be sure that none get lost. This update to Python brings some of that to core asyncio (but I'll stick with Trio for now, thanks).
It is, and it's a subinterpreters thing. Node has separate interpreters that run in worker threads but also share memory, which is the route Python is planning on taking.
One reason is if you need to launch a subprocess with a timeout but don't want to use up CPU in the python script while that subprocess runs. The regular subprocess module will busy-loop in such cases, consuming CPU, while asyncio's does not.
The docs even warn about this for subprocess and suggest using asyncio to avoid it, although the docs are misleading: it only busy-loops if the timeout is not None, and only on macOS/Linux, not Windows.
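The asyncio version waits on the child through the event loop instead of polling, so the Python process sleeps while the subprocess runs. A sketch (assumes a Unix-like system with `echo` on the PATH):

```python
import asyncio

async def run_with_timeout(cmd, timeout):
    # the event loop is notified when the child exits --
    # no busy-looping in the Python process while we wait
    proc = await asyncio.create_subprocess_exec(
        *cmd, stdout=asyncio.subprocess.PIPE)
    try:
        out, _ = await asyncio.wait_for(proc.communicate(), timeout)
        return out
    except asyncio.TimeoutError:
        proc.kill()   # enforce the deadline ourselves
        raise

out = asyncio.run(run_with_timeout(["echo", "hello"], timeout=5))
print(out)  # b'hello\n'
```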
Multiprocessing provides parallelism up to what the machine supports, but no additional degree of concurrency; asyncio provides a fairly high degree of concurrency, but no parallelism.
Asyncio actually plays really nicely with multiprocessing, too. The concurrent.futures.ProcessPoolExecutor can handle running tasks in child processes and handles the communication seamlessly for you. I've used it quite a bit. I can easily use all 32 cores on my server this way without the GIL getting in the way.
Single-threaded execution of IO-bound work can be faster than breaking it up between threads or processes, and it can use far fewer resources. Then there are the preemptive vs. cooperative multitasking concerns and the pros and cons of processes/threads vs. lightweight threads/coroutines/etc.
Some IO-bound workloads are suited really well by the asyncio model, while other workloads might be better suited for processes and threads. They're three separate tools whose use cases might be similar, but they're not necessarily replacing one another. Multiple processes still have their place even while asyncio exists and vice versa.
Explicit (await) vs. implicit (anything that uses patched I/O deep down) switching: implicit switching essentially makes reasoning about the code almost as hard as with preemptive threading.