So basically wireguard, but you have to pay for it, and you have to create an account through Google/Apple/Microsoft/whatever.
Wireguard is not that hard to set up manually. If you've added SSH keys to your GitHub account, it's pretty much the same thing. Find a youtube video or something, and you're good. You might not even need to install a wireguard server yourself, as some routers have that built in (like my Ubiquiti EdgeRouter).
It's not really "basically wireguard" and you don't have to pay for it for personal use. Wireguard is indeed pretty easy to set up, but basic Wireguard doesn't get you the two most significant features of Tailscale, mesh connections and access controls.
Tailscale does use Wireguard, but it establishes connections between each of your devices; in many cases these will be direct connections even if the devices in question are behind NAT or firewalls. Not every use case benefits from this over a more traditional hub-and-spoke VPN model, but for those that do, it would be much more complicated to roll your own version of this. The built-in access controls are also something you could build on top of Wireguard yourself, but certainly not as easily as Tailscale makes it.
There's also a third major "feature" that is really just an amalgamation of everything Tailscale builds in and how it's intended to be used: if you fully set up your environment to be Tailscale-based, your network works and looks the same even as devices move around. Again, not everyone needs this, but it can be useful for those who do, and it's not something you get from vanilla Wireguard without additional effort.
I guess I'm still not following. Is there an example thing that you can do with Tailscale that you can't do with Wireguard? "Establishes connections between each of your devices" is pretty vague. The Internet can already do that.
I install tailscale on my laptop. I then install tailscale on a desktop PC I have stashed in a closet at my parents'. If they are both logged in to the same tailnet, I can access that desktop PC from my home without any additional network config (no port forwarding on my parents' router, UPnP, etc.).
I like to think of it as a software defined LAN.
Wireguard is just the transport protocol; all the device management and clever firewall/NAT traversal stuff is the real special sauce.
You can run two nodes both behind restrictive NATs and have them establish an encrypted connection between each other. You can configure your devices to act as exit nodes, allowing other devices on your "tailnet" to use them to reach the internet. You can set up ACLs and share access to specific devices and ports with other users. If you pay a bit more, you can also use any Mullvad VPN node as an exit point.
Tailscale is "just" managed Wireguard, with some very smart network people doing everything they can to make it go point-to-point even with bad NATs, and offering a free fallback trustless relay layer (called DERP) that will act as a transit provider of last resort.
Tailscale is free for pretty much everything you'd want to do as a home user.
It also doesn't constantly try and ram any paid offerings down your throat.
I was originally put off by how much Tailscale is evangelised here, but after trying it, I can see why it's so popular.
I have my Ubuntu server acting as a Tailscale exit node.
I can route any of my devices through it when I'm away from home (e.g. phone, tablet, laptop).
It works like a VPN in that regard.
Last year, I was on a plane and happened to sit next to an employee of Tailscale.
I told him that I thought his product was cool (and had used it throughout the flight to route my in-flight Wi-fi traffic back to the UK) but that I had no need to pay for it!
I would have expected that going from one node (which can't hold the weights in RAM) to two nodes would have increased inference speed by more than the measured 32% (21.1t/s -> 27.8t/s).
With no constraint on RAM (4 nodes) the inference speed is less than 50% faster than with only 512GB.
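A quick check of that 32% figure, using only the numbers quoted above:

    one_node = 21.1   # tokens/sec, single 512GB node (weights don't fit)
    two_nodes = 27.8  # tokens/sec, two nodes

    print(f"speedup: {two_nodes / one_node - 1:.0%}")  # ~32%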
I think the OP meant pipeline parallelism, where during inference you only transfer the activations at the layer boundary where you cut the model in two, and those shouldn't be too large.
Weights are read-only data so they can just be memory mapped and reside on SSD (only a small fraction will be needed in VRAM at any given time), the real constraint is activations. MoE architecture should help quite a bit here.
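To make "just be memory mapped" concrete, here's a minimal sketch with numpy (the file name and tensor shape are made up for illustration); the OS pages weights in from SSD only when they're touched:

    import numpy as np

    # Map a raw fp16 weight file read-only; nothing is read from disk
    # until a page is actually touched.
    weights = np.memmap("model.bin", dtype=np.float16, mode="r",
                        shape=(61, 7168, 18432))  # made-up layer/dim sizes

    layer_5 = np.asarray(weights[5])  # faults in only that layer's pages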
You need all the weights every token, so even with optimal splitting the fraction of the weights you can farm out to an SSD is proportional to how fast your SSD is compared to your RAM.
You'd need to be in a weirdly compute-limited situation before you can replace significant amounts of RAM with SSD, unless I'm missing something big.
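Rough numbers to illustrate the point (both bandwidth figures are assumptions, not measurements):

    # Time per token ~ bytes_from_ram / ram_bw + bytes_from_ssd / ssd_bw.
    # The SSD term matches the RAM term once the offloaded fraction is
    # about ssd_bw / ram_bw, so beyond that the SSD dominates.
    ram_bw = 800e9  # bytes/s, assumed unified-memory bandwidth
    ssd_bw = 6e9    # bytes/s, assumed fast NVMe sequential reads

    print(f"break-even offload fraction: ~{ssd_bw / ram_bw:.1%}")  # ~0.8%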
> MoE architecture should help quite a bit here.
In that you're actually using a smaller model and swapping between them less frequently, sure.
Even with MoE you still need enough memory to load all experts. For each token, only 8 experts (out of 256) are activated, but which experts are chosen changes dynamically based on the input. This means you'll be constantly loading and unloading experts from disk.
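Back-of-the-envelope, using the 8-of-256 routing above (the per-expert size and SSD speed are assumptions):

    experts_per_token = 8
    expert_size_gb = 2.5  # assumed size of one expert's weights
    ssd_gbps = 6.0        # assumed NVMe read speed, GB/s

    # Worst case: none of this token's experts are already in RAM.
    per_token_gb = experts_per_token * expert_size_gb
    print(f"up to {per_token_gb:.0f} GB/token "
          f"-> {per_token_gb / ssd_gbps:.1f} s/token on I/O alone")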
MoEs are great for distributed deployments, because you can maintain a distribution of experts that matches your workload, and you can try to saturate each expert and thereby saturate each node.
With a cluster of two 512GB nodes, you have to send half the weights (350GB) over a TB5 connection. But you have to do this exactly once on startup.
With a single 512GB node, you'll be loading weights from disk each time you need a different expert, potentially for each token. Depending on how many experts you're loading, you might be loading 2GB to 20GB from disk each time.
Unless you're going to shut down your computer after generating a couple of hundred tokens, the cluster wins.
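The break-even point is easy to estimate (all figures below are assumptions for illustration):

    startup_transfer_gb = 350.0  # half the weights, sent once over TB5
    tb5_gbps = 10.0              # assumed effective TB5 throughput, GB/s
    per_token_disk_gb = 10.0     # mid-range of the 2-20 GB estimate above
    ssd_gbps = 6.0               # assumed NVMe read speed, GB/s

    startup_s = startup_transfer_gb / tb5_gbps   # one-time cost
    per_token_s = per_token_disk_gb / ssd_gbps   # paid on every token

    print(f"cluster breaks even after ~{startup_s / per_token_s:.0f} tokens")

With numbers in that ballpark, the one-time transfer is amortized within a few dozen tokens.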
> only a small fraction will be needed in VRAM at any given time
I don't think that's true. At least not without heavy performance loss, in which case "just be memory mapped" is doing a lot of work here.
By that logic GPUs could run models much larger than their VRAM would otherwise allow, which doesn't seem to be the case unless heavy quantization is involved.
Existing GPU APIs are sadly not conducive to this kind of memory mapping with automated swap-in. The closest thing you get, AIUI, is "sparse" allocations in VRAM, such that only a small fraction of your "virtual address space" equivalent is mapped to real data, and the mapping can be dynamic.
As someone who has been waiting for the same thing as OP tyre posted, I went to investigate this claim, and it seems it might be true, but only when running apps within Google AI Studio itself. That is, if you were to ship an app built with Google AI Studio on something like the App Store, you'd be back to an API key whose costs the developer bears.
The problem with the current model is that there's a high barrier to justifying that a user pay what is essentially a second or third subscription for ultimately the same AI intelligence layer. So you currently cannot make an economically successful small-use-case app based on AI without somehow restricting users' use of the AI. I don't think AI companies are incentivized to fix this.
After he described the rules, my immediate reaction was 'this is like mastermind'. Sure enough, further down the page:
Other than that, in my research I came across a board game called Mastermind, which has been around since the 70s. It has a very similar premise - think of it as "Guess Who?" on hard mode.
A couple of weeks ago, I bought a 'sensor kit' from Amazon for my son to use with his Raspberry Pi. It includes some input devices (e.g. button, moisture sensor) and output devices (e.g. LED) that can be plugged into a breadboard.
You can do this with bidicalc already! You just have to model the problem correctly. If you expect the ratio to remain constant, what you actually want is a problem with a single free variable: the scale.
    A1 = 1.0      // the scale, your variable
    A2 = 6 * A1   // intermediate values
    A3 = 8 * A1
    A4 = A2 + A3  // the sum
Now update A4 (or any other cell!) and the scale (A1, the only variable) will update as you expect.
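For intuition, here's the same model in plain Python (a stand-in for what a bidirectional solver does, not bidicalc's actual implementation): with one free variable, updating the sum just means solving for the scale:

    # The fixed 6:8 ratio, with a single free variable: the scale (A1).
    COEFFS = (6.0, 8.0)

    def from_scale(scale):
        a2, a3 = (c * scale for c in COEFFS)
        return a2, a3, a2 + a3        # A2, A3, A4

    def from_sum(a4):
        scale = a4 / sum(COEFFS)      # invert A4 = (6 + 8) * A1
        return (scale, *from_scale(scale))

    print(from_scale(1.0))   # (6.0, 8.0, 14.0)
    print(from_sum(28.0))    # (2.0, 12.0, 16.0, 28.0)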
Regarding the last point, you can do better than sleep (lower power state). You can have the microcontroller cut its own power once it's done its work:
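Here's a sketch of the idea in MicroPython, assuming a soft-latch power circuit where a GPIO holds the regulator's enable pin high (the pin number is hypothetical):

    from machine import Pin

    # Assumed soft-latch circuit: this GPIO holds the regulator's EN pin
    # high; releasing it cuts power to the whole board.
    power_hold = Pin(15, Pin.OUT, value=1)  # hypothetical pin number

    def do_work():
        # take the reading, transmit it, etc.
        pass

    do_work()
    power_hold.value(0)  # cut our own power; draw drops to latch leakage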
Install the tailscale client on each of your devices.
Each device will get an IP address from Tailscale. Think of it as a new LAN address.
When you're away from home, you can access your home devices using the Tailscale IP addresses.
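Once both machines are on the tailnet, that IP behaves just like a LAN address. For example (100.101.102.103 is a hypothetical tailnet address, and this assumes an SSH server is listening on the home machine):

    import socket

    # Hypothetical Tailscale address of a home server; reachable from
    # anywhere, as long as both devices are on the same tailnet.
    HOME_SERVER = "100.101.102.103"

    with socket.create_connection((HOME_SERVER, 22), timeout=5) as s:
        print(s.recv(64))  # e.g. the SSH server's banner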