Regarding cold starts: I strongly suspect V8 snapshots are not the best way to achieve fast cold starts with Python (they may be if you are tied to using V8, though!), and they will have broad side effects if you go outside the standard packages included in the Pyodide bundle.
To put this in perspective: a V8 snapshot stores the whole state of an application (including its compiled modules). This means that for a Python application using Python (one Wasm module) + pydantic-core (another Wasm module) + FastAPI... all of those will be included in one snapshot (along with the application state). This makes sense for browsers, where you want to be able to inspect/recover everything at once.
The issue with this design is that the compiled artifacts and the application state are bundled into a single artifact (this is not great for AOT-designed runtimes, though it might be the optimal design for JITs).
Ideally, you would separate each of the compiled modules from the state of the application. Doing this gives you some advantages: you can deserialize the compiled modules in parallel, and you decouple deserialization from recovering the state of the application. This design doesn't map that well onto the V8 architecture (and how it compiles things) when JavaScript is the main driver of execution, but it's ideal when you just use WebAssembly.
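To make the contrast concrete, here is a minimal C sketch of that separated design. Everything in it is illustrative: the cache paths, the stubbed deserializer, and the state-restore step are placeholders, not Wasmer's actual API.

    #include <pthread.h>
    #include <stdio.h>

    typedef struct { const char *path; void *module; } cached_module_t;

    // Stub standing in for deserializing one cached compiled-module artifact.
    static void *load_cached_module(const char *path) {
        printf("deserializing %s\n", path);
        return (void *)path;  // pretend the compiled module is ready
    }

    static void *worker(void *arg) {
        cached_module_t *m = arg;
        m->module = load_cached_module(m->path);  // no cross-artifact dependency
        return NULL;
    }

    int main(void) {
        cached_module_t mods[] = {
            { "cache/python.bin", NULL },         // CPython core (hypothetical path)
            { "cache/pydantic_core.bin", NULL },  // one artifact per native dep
        };
        enum { N = 2 };
        pthread_t tids[N];

        for (int i = 0; i < N; i++)  // step 1: deserialize modules in parallel
            pthread_create(&tids[i], NULL, worker, &mods[i]);
        for (int i = 0; i < N; i++)
            pthread_join(tids[i], NULL);

        // step 2: restoring the application state is a separate, later step,
        // decoupled from module deserialization (stubbed here).
        printf("restoring application state for %d modules\n", N);
        return 0;
    }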
This is what we have done at Wasmer, and it allows for cold starts much faster than 1 second. Because we cache each of the compiled modules separately and recover the state of the application later, we can achieve cold starts that are an order of magnitude faster than Cloudflare's state of the art (when using pydantic, fastapi and httpx).
If anyone is curious, here is a blogpost where we presented fast cold starts for the application state (note that the deserialization technique for Wasm modules is applied automatically in Wasmer; we don't showcase it in the blogpost): https://wasmer.io/posts/announcing-instaboot-instant-cold-st...
Side note: congrats to the Cloudflare team on their work on Python on Workers, it's inspiring to all providers in the space... keep it up and let's keep challenging the status quo!
I would love to see Wasmer compared there, as we should have much better timings than all the options compared (for each of the use cases). Is there any way you could open-source the benchmark on GitHub, or start comparing Wasmer as well? Thanks!
Does not appear to work under Firefox; I'm getting a bunch of CORS-related errors (header ‘user-agent’ is not allowed according to header ‘Access-Control-Allow-Headers’ from CORS preflight response) on the /graphql endpoint.
This is awesome! I'm Syrus, from Wasmer. Would love to help you with this!
We are releasing a new version of wasmer-js soon, so it should be very easy to use it with webassembly.sh (note: webassembly.sh and wasmer.sh share the same code).
Everything went smoothly (I just added a new comment at the top of this thread for visibility!); the only nit is that `convertEol` didn't work, so I had to manually convert `\n` to `\r\n`.
I'd go a bit further. If you want full POSIX support, perhaps WASIX is the best alternative.
It's WASI preview 1 plus many of its missing features, such as threads, fork, exec, dlopen, dlsym, longjmp, setjmp, ...
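To give a feel for what that surface unlocks, here is a plain POSIX C toy; nothing in the source is WASIX-specific, it just assumes a WASIX-enabled toolchain to build it to Wasm:

    // fork() + waitpid(): stock WASI preview 1 cannot express this,
    // but WASIX supports it.
    #include <stdio.h>
    #include <sys/wait.h>
    #include <unistd.h>

    int main(void) {
        pid_t pid = fork();             // POSIX process creation
        if (pid == 0) {
            printf("child: pid %d\n", getpid());
            _exit(0);                   // child exits immediately
        }
        int status;
        waitpid(pid, &status, 0);       // parent waits for the child
        printf("parent: child exited with %d\n", WEXITSTATUS(status));
        return 0;
    }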
I don't think that's accurate, although it's true that it needs extra work to run properly in JS-based environments.
You can already create threads in Wasm environments (we even got fork working in WASIX!). That said, there is an upcoming Wasm proposal that adds thread support natively to the spec: https://github.com/WebAssembly/shared-everything-threads
Right now you should be good to start using WASIX.
If you want to compile threaded code, things should already work (without waiting for any proposal in the Wasm space).
If you want to run it, there are a few options: use wasmer-js for the browser (Wasmer on top of the browser's Wasm engine + WASIX), or use regular Wasmer to run it server-side.
No need to wait for the "proper" Wasm implementation. Things should already be runnable with no major issues.
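For example, a stock pthreads program like this sketch should compile and run unchanged (the .wasm file name in the trailing comment is illustrative):

    // Stock pthreads; WASIX maps these onto Wasm threads, so the
    // source needs no changes.
    #include <pthread.h>
    #include <stdio.h>

    static void *worker(void *arg) {
        printf("hello from thread %ld\n", (long)arg);
        return NULL;
    }

    int main(void) {
        pthread_t t[4];
        for (long i = 0; i < 4; i++)
            pthread_create(&t[i], NULL, worker, (void *)i);
        for (int i = 0; i < 4; i++)
            pthread_join(t[i], NULL);   // wait for all workers
        return 0;
    }
    // Server-side:  wasmer run ./threads.wasm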
Really impressive work. Would love to see it progress.
Some ways I can see it improving:
1. setjmp/longjmp could be implemented via Wasm exceptions (this is how we do it in WASIX; see the sketch after this list) - no need to wait on the stack-switching proposal
2. fork could work easily with Asyncify (start/resume), applied per compiled binary
3. JIT could work via dlopen/dlsym (compiling the Wasm and linking it), even with runtime patching (using slots in function tables and updating them to point at newly compiled code as you go).
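To make point 1 concrete, this is the kind of standard C that setjmp/longjmp support unlocks; under WASIX the unwinding is handled via Wasm exceptions rather than native stack manipulation:

    // Standard setjmp/longjmp error-recovery pattern; plain ISO C,
    // nothing WASIX-specific in the source.
    #include <setjmp.h>
    #include <stdio.h>

    static jmp_buf recover;

    static void fail(void) {
        longjmp(recover, 1);            // unwind back to the setjmp site
    }

    int main(void) {
        if (setjmp(recover) == 0) {     // first pass: returns 0
            puts("trying...");
            fail();                     // never returns normally
        } else {
            puts("recovered via longjmp");  // reached after the unwind
        }
        return 0;
    }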
In general, I recommend taking inspiration from WASIX [1] for those things, as we have spent quite a bit of time making as much as possible work!
This is very interesting! Would love to see it in play in Wasmer at some point.
I was aware of TinyGo, which compiles Go programs via LLVM (targeting Wasm, for example). It produces a very tiny footprint (programs can even run in the browser): https://tinygo.org/