Hacker News | mrfusion's comments

I’m not quite getting it. Would you be willing to explain it using a toy example?


Imagine you're Spotify and you have a stream of user-song-timestamp triplets per listen. You'll likely want to transform it into features such as: top genre per user in last 30 days. As a data scientist, you'll write your transformations to do so and run it yourself on something like Spark and store it on Redis for inference and S3 for training. You have to keep track of your versioning, jobs, and transformations. You also can't easily share them across data scientists.

Featureform's library allows you to define your transformations, feature, and training sets. It will interface with Spark, Redis, etc. on your behalf to achieve your desired state. It'll also keep track of all the metadata for you and easily make it share-able and re-usable.
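The "top genre per user in the last 30 days" transformation above can be sketched in plain Python (the data shapes and names here are made up for illustration; this is not Featureform's API):

```python
from collections import Counter, defaultdict
from datetime import datetime, timedelta

# Hypothetical raw stream: (user_id, song_id, timestamp) triplets per listen,
# plus a song -> genre lookup (both invented for this example).
song_genre = {"take_five": "jazz", "so_what": "jazz", "paranoid": "rock", "toxic": "pop"}
listens = [
    ("alice", "take_five", datetime(2024, 1, 10)),
    ("alice", "so_what", datetime(2024, 1, 12)),
    ("alice", "paranoid", datetime(2024, 1, 15)),
    ("bob", "toxic", datetime(2023, 11, 1)),   # outside the 30-day window
    ("bob", "paranoid", datetime(2024, 1, 20)),
]

def top_genre_per_user(listens, now, window_days=30):
    """Feature: each user's most-listened genre over the trailing window."""
    cutoff = now - timedelta(days=window_days)
    counts = defaultdict(Counter)
    for user, song, ts in listens:
        if ts >= cutoff:
            counts[user][song_genre[song]] += 1
    return {user: c.most_common(1)[0][0] for user, c in counts.items()}

print(top_genre_per_user(listens, now=datetime(2024, 1, 25)))
# {'alice': 'jazz', 'bob': 'rock'}
```

A feature store's job is to take a definition like this, run it on your compute (Spark etc.), and keep the result versioned and queryable for both serving and training.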


I believe people have found ways to reduce drag with lasers and microwaves so you might be on to something.


It could be a new type of scramjet, right? A normal jet engine has to slow supersonic air down to subsonic speeds before combusting it, which is inefficient.


We should consider using this on Venus or Mercury.



I’m a huge climate person. I come from ancestors who laid down the early climate change work. But this methodology seems kind of wacky. I think this kind of thing hurts our credibility in the long run.


I remember an interview with a startup where I couldn’t figure out what their product was and they got more and more annoyed with my questions.

All I came away with was that it had something to do with the cloud.


Besides that, you actually want your unit to struggle a little so it gets enough run time to control humidity.

It’s actually a problem if you have an overpowered system.


When is the humidity a problem? If I have my thermostat set to 73 and the system is maintaining that through the day and night, will humidity be an issue? A variable speed system in a typically insulated home will likely be running at a very low level most of the day.

I can see it being a problem if I come back from vacation and change the thermostat from 85 to 73 and ask the A/C to get there ASAP, but even then the controller could know how much moisture is in the air and make sure the temperature drop is slow enough to deal with condensation.


I've found tower fans to be a game changer. They're fairly quiet and move a lot of air.

I prefer them to ceiling fans: no installation required, and you can keep them when you move.


> "Extract this to a separate function please"

This sounds really bad, even if you're their boss. You're dealing with professionals, and part of that is considering their input.

I’d start with the assumption that they know what they’re doing and might have had a good reason and phrase it from there. Even if not true it sets the right tone of respect.


Why not just use multi processing?


It can use fewer resources and can be faster than passing data between processes. There's really no reason to use multiple processes for IO-bound workloads; even when you do want OS-level concurrency for IO-bound work, a thread pool can be faster than processes.

The second you aren't just passing raw bytes around, you have to take into consideration what can and can't be sent between processes in Python, as some objects can't be pickled and thus can't be passed between processes.

You can concurrently load a lot more coroutines than processes and threads, as well.
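A quick illustration of the pickling constraint (the callback here is made up for the example): a lambda can't cross a process boundary, but a coroutine running in-process can call it freely.

```python
import asyncio
import pickle

# Lambdas (and closures, open sockets, etc.) can't be pickled, so they
# can't be shipped to a worker process.
callback = lambda x: x * 2

try:
    pickle.dumps(callback)
except Exception as e:
    print("can't cross a process boundary:", type(e).__name__)

# In-process, an asyncio task can use the same object with no ceremony.
async def main():
    return callback(21)

print(asyncio.run(main()))  # 42
```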


If your task is IO-bound (aka lots of network stuff), multiprocessing is overkill. Also, asyncio can handle a _lot_ more tasks (100,000s as opposed to tens), so it really shines for heavy IO. Also, multiprocessing can't easily share memory, and that can be a pretty big disadvantage depending on the task.


One reason is that you often don't need to use locks. Between lines containing await (or async for or async with), you can be sure that this task won't be pre-empted to run another async task.

Another reason, if you're using the Trio async library, is that managing and cancelling multiple tasks is really easy, and you can be sure that none get lost. This update to Python brings some of that to core asyncio (but I'll stick with Trio for now, thanks).
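A small sketch of the first point: in asyncio, a read-modify-write on shared state is safe without a lock as long as no await sits between the read and the write.

```python
import asyncio

counter = 0

async def bump(n):
    global counter
    for _ in range(n):
        current = counter       # read ...
        counter = current + 1   # ... then write: no await in between, so
                                # no other task can interleave here
        await asyncio.sleep(0)  # the only (explicit) switch point

async def main():
    # Two tasks hammering the same variable, no lock needed.
    await asyncio.gather(bump(1000), bump(1000))

asyncio.run(main())
print(counter)  # 2000 -- no lost updates
```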


Asyncio is really a list of tasks in the main interpreter thread.

This is far easier to work with than multiprocessing.

When doing, e.g. AWS work, multiprocessing is additional pain for little gain.

Maybe 3.11 will make threads less painful.


From what I've been following, it's going to take another release (or several) after 3.11 for Node-like worker threads to land in Python.


I'm not deep on Node, but this sounds like a GIL thing.


It is, and it's a subinterpreters thing. Node has separate interpreters that run in worker threads but also share memory, which is the route Python is planning on taking.


One reason is if you need to launch a subprocess with a timeout but don't want to use up CPU in the python script while that subprocess runs. The regular subprocess module will busy-loop in such cases, consuming CPU, while asyncio's does not.

The docs even warn about this for subprocess and suggest using asyncio to avoid it, although the docs are misleading - it only busy-loops if the timeout is not None, and only when running on Mac/Linux not Windows.
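A minimal sketch of the asyncio approach (the command and timeout here are arbitrary): the event loop waits on the child rather than polling it.

```python
import asyncio

async def run_with_timeout(cmd, timeout):
    # The event loop is notified when the child exits; there is no
    # busy-loop burning CPU while we wait.
    proc = await asyncio.create_subprocess_exec(
        *cmd, stdout=asyncio.subprocess.PIPE
    )
    try:
        stdout, _ = await asyncio.wait_for(proc.communicate(), timeout)
        return proc.returncode, stdout
    except asyncio.TimeoutError:
        proc.kill()
        await proc.wait()
        raise

rc, out = asyncio.run(run_with_timeout(["echo", "hello"], timeout=5))
print(rc, out.decode().strip())  # 0 hello
```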


> Why not just use multi processing?

Multiprocessing provides parallelism up to what the machine supports, but no additional degree of concurrency; asyncio provides a fairly high degree of concurrency, but no parallelism.

Of course, you can use them together to get both.

https://github.com/omnilib/aiomultiprocess


Asyncio actually plays really nicely with multiprocessing, too. The concurrent.futures.ProcessPoolExecutor can handle running tasks in child processes and handles the communication seamlessly for you. I've used it quite a bit. Can easily use all 32 cores on my server this way without the GIL getting in the way.
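A minimal sketch of that pattern (the worker function and inputs are made up): run_in_executor hands CPU-bound calls to a process pool while the event loop stays free.

```python
import asyncio
from concurrent.futures import ProcessPoolExecutor

def cpu_heavy(n):
    # Runs in a child process, so the parent's GIL doesn't serialize it.
    return sum(i * i for i in range(n))

async def main():
    loop = asyncio.get_running_loop()
    with ProcessPoolExecutor() as pool:
        # Arguments and results are pickled across the process boundary;
        # the awaits keep the event loop responsive in the meantime.
        return await asyncio.gather(
            *(loop.run_in_executor(pool, cpu_heavy, n) for n in (10, 100, 1000))
        )

if __name__ == "__main__":
    print(asyncio.run(main()))  # [285, 328350, 332833500]
```

The `__main__` guard matters here: with the spawn start method (the default on macOS/Windows), worker processes re-import the module, and unguarded top-level code would run again in each child.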


Probably because asyncio generally scales better.


Yes, curious why this package is needed at all?


Single-threaded execution of IO-bound work can be faster than breaking it up between threads or processes, and it can use a lot less resources. Then there are the preemptive vs cooperative multitasking concerns and the pros and cons of processes/threads vs light-weight threads/coroutines/etc.

Some IO-bound workloads are suited really well by the asyncio model, while other workloads might be better suited for processes and threads. They're three separate tools whose use cases might be similar, but they're not necessarily replacing one another. Multiple processes still have their place even while asyncio exists and vice versa.


Why not just use Gevent?


Explicit (await) vs. implicit (anything that uses patched I/O deep down) switching. Essentially, the implicit approach makes reasoning about the code almost as hard as with preemptive threading.


because pickle is an absolute shitshow.

