Sooner or later I believe, there will be models which can be deployed locally on...

KK7NIL · 2025-11-25T18:52:01 1764096721

If you read the article you'd notice that running an LLM locally would not fix this vulnerability.

pennomi · 2025-11-25T19:02:45 1764097365

Right, you’d have to deny the LLM access to online resources AND all web-capable tools… which severely limits an agent’s capabilities.

yodon · 2025-11-25T19:00:29 1764097229

From the HN guidelines[0]:

>Please don't comment on whether someone read an article. "Did you even read the article? It mentions that" can be shortened to "The article mentions that".

[0]: https://news.ycombinator.com/newsguidelines.html

KK7NIL · 2025-11-25T19:09:18 1764097758

That's fair, thanks for the heads up.

kami23 · 2025-11-25T18:50:49 1764096649

I've been repeating something like 'keep thinking about how we would run this in the DC' at work. The cycles of pushing your compute outside the company and then bringing it back in once the next VP/Director/CTO starts because they need to be seen as doing something, and the thing that was supposed to make our lives easier is now very expensive...

I've worked on multiple large migrations between DCs and cloud providers for this company and the best thing we've ever done is abstract our compute and service use to the lowest common denominator across the cloud providers we use...

api · 2025-11-25T18:56:16 1764096976

Can't find 4.5, but 3.5 Sonnet is apparently about 175 billion parameters. At 8-bit quantization that would fit on a box with 192 gigs of unified RAM.

The most RAM you can currently get in a MacBook is 128 gigs, I think, and that's a pricey machine, but it could run such a model at 4-bit or 5-bit quantization.

As time goes on it only gets cheaper, so yes this is possible.

The question is whether bigger and bigger models will keep getting better. What I'm seeing suggests we will see a plateau, so probably not forever. Eventually affordable endpoint hardware will catch up.

pmontra · 2025-11-25T19:14:34 1764098074

That's not easy to accomplish. Even a "read the docs at URL" is going to download a ton of stuff. You can bury anything into those GETs and POSTs. I don't think that most developers are going to do what I do with my Firefox and uMatrix, that is whitelisting calls. And anyway, how can we trust the whitelisted endpoint of a POST?

zahlman · 2025-11-26T02:52:46 1764125566

> Edit: "completely local" meant not doing any network calls unless specifically approved. When llm calls are completely local you just need to monitor a few explicit network calls to be sure.

The problem is that people want the agent to be able to do "research" on the fly.

tcoff91 · 2025-11-25T19:05:36 1764097536

At the time that there's something as good as sonnet 4.5 available locally, the frontier models in datacenters may be far better.

People are always going to want the best models.

fragmede · 2025-11-25T18:54:14 1764096854

it's already here with qwen3 on a top end Mac and lm-studio.

dizzy3gg · 2025-11-25T18:53:28 1764096808

Why is the being downvoted?

jermaustin1 · 2025-11-25T18:56:30 1764096990

Because the article shows it isn't Gemini that is the issue, it is the tool calling. When Gemini can't get to a file (because it is blocked by .gitignore), it then uses cat to read the contents.

I've watched this with GPT-OSS as well. If the tool blocks something, it will try other ways until it gets it.

The LLM "hacks" you.

lazide · 2025-11-25T19:33:16 1764099196

And… that isn’t the LLM’s fault/responsibility?

ceejayoz · 2025-11-25T19:35:21 1764099321

As the apocryphal IBM quote goes:

"A computer can never be held accountable; therefore, a computer must never make a management decision."

jermaustin1 · 2025-11-25T23:15:01 1764112501

How can an LLM be at fault for something? It is a text prediction engine. WE are giving them access to tools.

Do we blame the saw for cutting off our finger? Do we blame the gun for shooting ourselves in the foot? Do we blame the tiger for attacking the magician?

The answer to all of those things is: no. We don't blame the thing doing what it is meant to be doing no matter what we put in front of it.

lazide · 2025-11-25T23:31:54 1764113514

It was not meant to give access like this. That is the point.

If a gun randomly goes off and shoots someone without someone pulling the trigger, or a saw starts up when it’s not supposed to, or a car’s brakes fail because they were made wrong - companies do get sued all the time.

Because those things are defective.

jermaustin1 · 2025-11-26T12:45:01 1764161101

But the LLM can't execute code. It just predicts the next token.

The LLM is not doing anything. We are placing a program in front of it that interprets the output and executes it. It isn't the LLM, but the IDE/tool/etc.

So again, replace Gemini with any Tool-calling LLM, and they will all do the same.

lazide · 2025-11-26T12:51:38 1764161498

When people say ‘agentic’ they mean piping that token to various degrees of directly into an execution engine. Which is what is going on here.

And people are selling that as a product.

If what you are describing was true, sure - but it isn’t. The tokens the LLM is outputting is doing things - just like the ML models driving Waymo’s are moving servos and controls, and doing things.

It’s a distinction without a difference if it’s called through an IDE or not - especially when the IDE is from the same company.

That causes effects which cause liability if those things cause damage.

NitpickLawyer · 2025-11-25T18:58:57 1764097137

Because it misses the point. The problem is not the model being in a cloud. The problem is that as soon as "untrusted inputs" (i.e. web content) touch your LLM context, you are vulnerable to data exfil. Running the model locally has nothing to do with avoiding this. Nor does "running code in a sandbox", as long as that sandbox can hit http / dns / whatever.

The main problem is that LLMs share both "control" and "data" channels, and you can't (so far) disambiguate between the two. There are mitigations, but nothing is 100% safe.

mkagenius · 2025-11-25T19:05:07 1764097507

Sorry, I didn't elaborate. But "completely local" meant not doing any network calls unless specifically approved. When llm calls are completely local you just need to monitor a few explicit network calls to be sure.

pmontra · 2025-11-25T20:35:11 1764102911

In a realistic and useful scenario, how would you approve or deny network calls made by a LLM?

zahlman · 2025-11-26T02:59:47 1764125987

The LLM cannot actually make the network call. It outputs text that another system interprets as a network call request, which then makes the request and sends that text back to the LLM, possibly with multiple iterations of feedback.

You would have to design the other system to require approval when it sees a request. But this of course still relies on the human to understand those requests. And will presumably become tedious and susceptible to consent fatigue.

pmontra · 2025-11-26T07:37:31 1764142651

Exactly.