> Large language models are something else entirely*. They are black boxes. You cannot audit them. You cannot truly understand what they do with your data. You cannot verify their behaviour. And Mozilla wants to put them at the heart of the browser and that doesn't sit well.
Am I being overly critical here, or is this kind of a silly position to take right after talking about how neural machine translation is okay? Many of Firefox's LLM features, like summarization, are afaik powered by local models (hell, even Chrome has local model options). It's weird to say neural translation is not a black box but LLMs somehow are black boxes where we cannot hope to understand what they do with our data, especially since, viewed a bit fuzzily, LLMs are scaled-up versions of an architecture that was originally used for neural translation. Neural translation's behavior is unverifiable in the same sense.
I could read some of the data talk as being about non-local models, but this very much seems like a more general criticism of LLMs as a whole in the context of Firefox features. Moreover, some of the critiques, like verifiability of outputs and unlimited scope, still don't make sense in this context. Browser LLM features, outside of explicitly AI browsers like Comet, have so far been scoped to fairly narrow behaviors like translation or summarization. The broadest scope I can think of is the side panel that lets you ask about a web page with context. Even then, I do not see what is inherently problematic about such scoping, since the output is confined to the side panel.
To be more charitable to TFA, machine translation is a field where there aren't great alternatives and the downside is pretty limited: if something is in another language, the alternative is not reading it at all. You can translate a bunch of documents, benchmark the result, and demonstrate that the model doesn't completely change simple sentences. Another related area is OCR: there are sometimes mistakes, but it's tractable to create a model and verify it's mostly correct.
LLMs being applied to everything under the sun feels like we're solving problems that have other solutions, and the answers aren't necessarily correct or accurate. I don't need a dubiously accurate summary of an article in English, I can read and comprehend it just fine. The downside is real and the utility is limited.
There's an older tradition of rule-based machine translation. In these methods, someone really does understand exactly what the program does, in a detailed way; it's designed like other programs, according to someone's explicit understanding. There's still active research in this field; I have a friend who's very deep into it.
The trouble is that statistical MT (the thing that became neural net MT) started achieving better quality metrics than rule-based MT sometime around 2008 or 2010 (if I remember correctly), and the gap between them has widened since then. Rule-based systems have gotten a little better each year, while statistical systems have gotten a lot better each year, and are now also receiving correspondingly much more investment.
The statistical systems are especially good at using context to disambiguate linguistic ambiguities. When a word has multiple meanings, human beings guess which one is relevant from overall context (merging evidence upwards and downwards from multiple layers within the language understanding process!). Statistical MT systems seem to do something somewhat similar. Much as human beings don't even perceive how we knew which meaning was relevant (but we usually guessed the right one without even thinking about it), these systems usually also guess the right one using highly contextual evidence.
Linguistic example sentences like "time flies like an arrow" (my linguistics professor suggested "I can't wait for her to take me here") are formally susceptible of many different interpretations, each of which can be considered correct, but when we see or hear such sentences within a larger context, we somehow tend to know which interpretation is most relevant and so most plausible. We might never be able to replicate some of that with consciously-engineered rulesets!
It's bitter for me because I like looking at how things work under the hood, and that's much less satisfying when it's "a bunch of stats and linear algebra that just happens to work".
If you're building on a computer language, you can say you understand the computer's abstract machine, even though you don't know how we ever managed to make a physical device to instantiate it!
My friend is working on Grammatical Framework, which has a Resource Grammar library of pre-written natural language grammars, at least for portions of them. The GF research community continues to add new ones over time, based on implementing portions of written reference grammars, or sometimes by native speakers based on their own native speaker intuitions. I'm not sure if there are larger grammar libraries elsewhere.
There could be companies that made much better rule-based MT but kept the details as trade secrets. For example, I think Google Translate was rule-based for "a long time" (I don't remember until what year, although it was pretty apparent to users and researchers when it switched, and indeed I think some Google researchers even spoke publicly about it). They had made a lot of investment (very far beyond something like a GF resource grammar) but I don't think they ever published any of that underlying work even when they discontinued that version of the product.
So basically there may be this gap where the academic work advances slowly and yet now represents the majority of examples in the field, because companies are so unlikely to keep ongoing rule-based projects as part of their products. The available state of the art you can actually interact with may have gone backwards in recent years as a result!
It's completely possible to write a parser that outputs every possible parse of "time flies like an arrow", and then try interpreting each one and discard ones that don't make sense according to some downstream rules (unknown noun phrase: "time fly").
I did this for a text adventure parser, but it didn't work well because there are exponentially many ways to group the words in a sentence like "put the ball on the bucket on the chair on the table on the floor"
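For anyone curious what enumerating the parses looks like, here's a minimal sketch using NLTK's chart parser with a toy grammar I made up for this comment (not the grammar from my text-adventure parser); a downstream pass would then discard the readings that don't make sense:

```python
# Assumes NLTK is installed (pip install nltk); the grammar is a deliberately
# tiny, illustrative one in which the sentence has three valid parses.
import nltk

grammar = nltk.CFG.fromstring("""
S -> NP VP | VP
NP -> N | N N | Det N
VP -> V | V NP | V PP | V NP PP
PP -> P NP
N -> 'time' | 'flies' | 'arrow'
V -> 'time' | 'flies' | 'like'
P -> 'like'
Det -> 'an'
""")

parser = nltk.ChartParser(grammar)
tokens = "time flies like an arrow".split()

# Prints three readings: "time passes quickly", "time-flies enjoy arrows",
# and the imperative "time the flies the way an arrow would".
for tree in parser.parse(tokens):
    print(tree)

# A filtering step could now reject any parse containing an implausible
# constituent, e.g. an NP built from the unknown compound noun "time flies".
```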
I would argue that particular sentence only exists to convey the bamboozled feeling you get when you reach the end of it, so only sentient parsers can parse it properly.
> There's an older tradition of rule-based machine translation. In these methods, someone really does understand exactly what the program does, in a detailed way
I would softly disagree with this. Technically, we also understand exactly what an LLM does; we can analyze every instruction that is executed. Nothing is hidden from us. We don't always know what the outcome will be, but we also don't always know what the outcome will be in rule-based models if we make the chain of logic too deep to reliably predict. There is a difference, but it is on a spectrum. In other words, explicit code may help, but it does not guarantee understanding, because nothing does and nothing can.
The grammars in rule-based MT are normally fully conceptually understood by the people who wrote them. That's a good start for human understanding.
You could say they don't understand why a human language evolved some feature but they fully understand the details of that feature in human conceptual terms.
I agree in principle the statistical parts of statistical MT are not secret and that computer code in high-level languages isn't guaranteed to be comprehensible to a human reader. Or in general, binary code isn't guaranteed to be incomprehensible and source code isn't guaranteed to be comprehensible.
But for MT, the hand-written grammars and rules are at least comprehended by their authors at the time they're initially constructed.
Sure, I agree with that, but that's a property of hand-writing more than rule-based systems. For instance, you could probably translate a 6B LLM into an extremely big rule system, but doing so would not help you understand how the LLM worked.
LLMs are great because of exactly that: they solve things that have no other solutions.
(And also things that have other solutions, but where "find and apply that other solution" has way more overhead than "just ask an LLM".)
There is no deterministic way to "summarize this research paper, then evaluate whether the findings are relevant and significant for this thing I'm doing right now", or "crawl this poorly documented codebase, tell me what this module does". And the alternative is sinking your own time in it - while you could be doing something more important or more fun.
> and demonstrate that the model doesn't completely change simple sentences
A nefarious model would work that way though. The owner wouldn't want it to be obvious. It'd only change the meaning of some sentences some of the time, but enough to nudge the user's understanding of the translated text to something that the model owner wants.
For example, imagine a model that detects the sentiment of text about Russian military action and automatically translates it into something more positive if it's especially negative, but only 20% of the time (maybe ramping up as the model ages). A user wouldn't know, and someone testing the model for accuracy might assume it's just a poor translation. If such a model became popular it could easily shift public perception a few percent in the owner's preferred direction. That'd be plenty to change world politics.
Likewise for a model translating contracts, or laws, or anything else where the language is complex and requires knowledge of both the language and the domain. Imagine a Chinese model that detects someone trying to translate a contract from Chinese to English, and deliberately modifies any clause about data privacy to change it to be more acceptable. That might be paranoia on my part, but it's entirely possible on a technical level.
That's not a technical problem though is it? I don't see legal scenarios where unverified machine translation is acceptable - you need to get a certified translator to sign off on any translations and I also don't see how changing that would be a good thing.
I was briefly considering trying to become a professional translator, and I partly didn't pursue it because of the huge use of MT. I predict demand for human translators will continue to fall quickly unless there are some very high-profile incidents related to MT errors (and humans' liability for relying on them?). Correspondingly the supply of human translators may also fall as it appears like a less credible career option.
I think the point here is that, while such a translation wouldn't be admissible in court, many of us already used machine translation to read some legal agreement in a language we don't know.
Aside: Does anyone actually use summarization features? I've never once been tempted to "summarize" because when I read something I either want to read the entire thing, or look for something specific. Things I want summarized, like academic papers, already have an abstract or a synopsis.
In-browser ones? No. With external LLMs? Often. It depends on the purpose of the text.
If the purpose is to read someone's _writing_, then I'm going to read it, for the sheer joy of consuming the language. Nothing will take that from me.
If the purpose is to get some critical piece of information I need quickly, then no, I'd rather ask an AI questions about a long document than read the entire thing. Documentation, long email threads, etc. all lend themselves nicely to the size of a context window.
> If the purpose is to get some critical piece of information I need quickly, then no, I'd rather ask an AI questions about a long document than read the entire thing. Documentation, long email threads, etc. all lend themselves nicely to the size of a context window.
And what do you do if the LLM hallucinates? For me, skim-reading still comes out on top because my own mistakes are my own.
Yeah, basically every 15 minute YouTube video, because the amount of actual content I care about is usually 1-2 sentences, and usually ends up being the first sentence of an LLM summary of the transcript.
If something has actual substance I'll watch the whole thing, but that's maybe 10% of the videos I find, in my experience.
I'd wager there's 95% of the benefit for 0.1% of the CPU cycles just by having a "search transcript for term" feature, since in most of those cases I've already got a clear agenda for what kind of information I'm seeking.
Many years ago I made a little proof-of-concept for displaying the transcript (closed captions) of a YouTube video as text, where highlighting a word would navigate to that timestamp and vice versa. Such a thing might be valuable as a browser extension, now that I think of it.
YouTube already supports that natively these days, although it's kind of hidden (and knowing Google, it might very well randomly disappear one day). Open the description of the video, scroll down and click "show transcript".
Searching the transcript has the problem of missing synonyms. This can be solved by the one undeniably useful type of AI: embedding vector search. Embeddings for each line of the transcript can be calculated in advance and compared with the embeddings of the user's search. These models need only a few hundred million parameters for good results.
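A minimal sketch of what I mean, assuming the sentence-transformers package and its small all-MiniLM-L6-v2 model (just an example; any small embedding model would do):

```python
# Semantic search over precomputed transcript-line embeddings.
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")

# In practice these lines would come from the video's closed captions.
transcript_lines = [
    "welcome back to the channel",
    "today we benchmark the new graphics card",
    "thermals stayed under 70 degrees during the stress test",
    "don't forget to like and subscribe",
]

# Embeddings for the transcript are computed once, in advance.
line_embeddings = model.encode(transcript_lines, convert_to_tensor=True)

# Only the user's query is embedded at search time; synonyms and
# paraphrases ("how hot does it get" vs "thermals") still match.
query_embedding = model.encode("how hot does it get", convert_to_tensor=True)
scores = util.cos_sim(query_embedding, line_embeddings)[0]

best = int(scores.argmax())
print(transcript_lines[best], float(scores[best]))
```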
One of the best features of SponsorBlock is crowd sourced timestamps for the meat of the video. Skip right over 20 minutes of rambling to see the cool thing in the thumbnail.
> AI False Information Rate Nearly Doubles in One Year
> NewsGuard’s audit of the 10 leading generative AI tools and their propensity to repeat false claims on topics in the news reveals the rate of publishing false information nearly doubled — now providing false claims to news prompts more than one third of the time.
Wonderful article showing the uselessness of this technology, IMO.
> I just realised the situation is even worse. If I have 35 sentences of circumstance leading up to a single sentence of conclusion, the LLM mechanism will — simply because of how the attention mechanism works with the volume of those 35 — find the ’35’ less relevant sentences more important than the single key one. So, in a case like that it will actively suppress the key sentence.
> I first tried to let ChatGPT summarize one of my key posts (the one about the role convictions play in humans, with an addendum about human ‘wetware’). ChatGPT made a total mess of it. What it said had little to do with the original post, and where it did, it said the opposite of what the post said.
> For fun, I asked Gemini as well. Gemini didn’t make a mistake and actually produced something that is a very short summary of the post, but it is extremely short so it leaves most out. So, I asked Gemini to expand a little, but as soon as I did that, it fabricated something that is not in the original article (quite the opposite), i.e.: “It discusses the importance of advisors having strong convictions and being able to communicate them clearly.” Nope. Not there.
Why, after reading something like this, should I think of this technology as useful for this task? It seems like the exact opposite. And this is what I see with most LLM reviews. The author will mention spending hours trying to get the LLM to do a thing, or "it made xyz, but it was so buggy that I found it difficult to edit it after, and contained lots of redundant parts", or "it incorrectly did xyz". And every time I read stuff like that I think — wow, if a junior dev did that the number of times the AI did, they'd be fired on the spot.
See also something like https://boston.conman.org/2025/12/02.1 where (IIRC) the author comes away with a semi-positive conclusion, but if you look at the list near the end, most of those things are things any person would get fired for, and they are not positives for industrial software engineering and design. LLMs appear to do a "lot", but they still confabulate and repeat themselves incessantly, making them worthless to depend on for practical purposes unless you want to spend hours chasing your own tail over something they hallucinated. I don't see why this isn't the case. I thought we were trying to reduce the error rate in professional software development, not increase it.
You mean you don't summarize those terrible articles you happen to come across, where you're a little intrigued, hoping there's some substance, and then you read, and it just repeats the same thing over and over again with different wording? Anyway, I sometimes still give them the benefit of the doubt and end up doing a summary. Often they get summarized into 1 or 2 sentences.
Not him, but no. I read a ton already. Using LLMs to summarize a document is a good way to find out if I should bother reading it myself, or if I should read something else.
I occasionally use the "summarize" button in the iPhone Mobile Safari reader view if I land on a blog entry that's quite long and I want a quick idea of whether it's worth reading the whole thing or not.
Yes. I use it sometimes in Firefox with my local LLM server. Sometimes I come across an article I'm curious about but don't have the time or energy to read. Then I get a TL;DR from it. I know it's not perfect, but the alternative is not reading it at all.
If it does interest me then I can explore it. I guess I do this once a week or so, not a lot.
I just use it for personal information, I'm not involved in any wars :) I don't base any decisions on it; for example, if I buy something I don't go by just AI stuff to make a decision. I use the AI to screen reviews, things like that (generally I prefer really deep reviews, not glossy consumer-focused ones). Then I read the reviews that are suitable for me.
And even reading an article about those myself doesn't make me insusceptible to misinformation of course. Most of the misinformation about these wars is spread on purpose by the parties involved themselves. AI hallucination doesn't really cause that, it might exacerbate it a little bit. Information warfare is a huge thing and it has been before AI came on the scene.
Ok, as a more specific example, recently I was thinking of buying the new Xreal Air 2. I have the older one but I have 3 specific issues with it. I used AI to find references about these issues being solved. This was the case and AI confirmed that directly with references, but in further digging myself I did find that there was also a new issue introduced with that model involving blurry edges. So in the end I decided not to buy the thing. The AI didn't identify that issue (though to be fair I didn't ask it to look for any).
So yeah, it's not an all-knowing oracle and it makes mistakes, but it can help me shave some time off such investigations. Especially now that search engines like Google are so full of clickbait crap and sifting through that shit is tedious.
In that case I used OpenWebUI with a local LLM model that speaks to my SearXNG server which in turn uses different search engines as a backend. It tends to work pretty well I have to say, though perplexity does it a little better. But I prefer self-hosting as much as I can (of course the search engine part is out of scope there).
Even if you know about and act against mis- and disinformation, it affects you, and you voluntarily increase your exposure to it. And the situation is already terrible.
I gave the example of wars because it's obvious, even to you, and you won't relativize it away the way you just did with AI misinformation, which affects you in the exact same way.
Most recently, a new ISP contract: it's low-stakes enough that I don't care much about inaccuracies (it's a bog-standard contract from a run-of-the-mill ISP), there's basically no information in there that the cloud vendor doesn't already have (if they have my billing details), and I was curious whether anything might jump out, all while not really wanting to read the 5 pages of the thing.
Just went back to check that: it got all of the main items (pricing, contract terms, my details) right, and also the annoying fine print (which I cross-referenced, just in case). It also works pretty well across languages, though that depends a fair bit on the model in question.
I feel like if browsers or whatever get the UX of this down, people will upload all sorts of data into those vendors that they normally shouldn't. I also think that with nuanced enough data, we'll eventually have the LLM equivalent of Excel messing up data due to some formatting BS.
Looking back with fresh eyes, I definitely think I could’ve presented what I’m trying to say better.
On a purely technical level, you’re right that I’m drawing a distinction that may not hold up purely on technical grounds. Maybe the better framing is: I trust constrained, single-purpose models with somewhat verifiable outputs (seeing text go in and translated text come out, and comparing its consistency) more than I trust general-purpose models with broad access to my browsing context, regardless of whether they’re both neural networks under the hood.
WRT the “scope”, maybe I have picked up the wrong end of the stick with what Mozilla are planning to do, but they’ve already picked all the low-hanging fruit of AI integration with the features you’ve mentioned, and the fact that they seem to want to push further, at least to me, signals that they want deeper integration? Although who knows, the post from the new CEO may also be a litmus test to see what response it elicits, and then they’ll go from there.
I still don’t understand what you mean by “what they do with your data” - because it sounds like exfiltration fear mongering, whereas LLMs are a static series of weights. If you don’t explicitly call your “send_data_to_bad_actor” function with the user’s I/O, nothing can happen.
I disagree that it’s fear mongering. Have we not had numerous articles on HN about data exfiltration in recent memory? Why would an LLM that is in the driver’s seat of a browser (not talking about current feature status in Firefox wrt sanitised data being interacted with) not have the same pitfalls?
Seems as if we’d be 3 for 3 in the “agents rule of 2” in the context of the web and a browser?
> [A] An agent can process untrustworthy inputs
> [B] An agent can have access to sensitive systems or private data
> [C] An agent can change state or communicate externally
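A rough sketch of how I read that rule in code form (the capability names below are my own shorthand, not from any particular framework): an agent should keep at most two of [A], [B], [C].

```python
# Hypothetical illustration of the "agents rule of 2" check described above.
from dataclasses import dataclass

@dataclass
class AgentCapabilities:
    processes_untrusted_input: bool   # [A] e.g. arbitrary web pages
    accesses_private_data: bool       # [B] e.g. history, passwords, open tabs
    acts_externally: bool             # [C] e.g. network requests, state changes

def violates_rule_of_two(caps: AgentCapabilities) -> bool:
    enabled = [caps.processes_untrusted_input,
               caps.accesses_private_data,
               caps.acts_externally]
    return sum(enabled) >= 3  # all three together is the dangerous combination

# An LLM "in the driver's seat" of a browser plausibly trips all three.
browser_agent = AgentCapabilities(True, True, True)
print(violates_rule_of_two(browser_agent))  # True
```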
Even if we weren’t talking about such malicious hypotheticals, hallucinations are a common occurrence, as are CLI agents doing what they think best, sometimes to the detriment of the data they interact with. I personally wouldn’t want my history being modified or deleted; the same goes for passwords and the like.
It is a bit doomerist; I doubt it’ll have such broad permissions, but it just doesn’t sit well with me, which I suppose is the spirit of the article and the stance Waterfox takes.
> Have we not had numerous articles on HN about data exfiltration in recent memory?
there’s also an article on the front page of HN right now claiming LLMs are black boxes and we don’t know how they work, which is plainly false. this point is hardly evidence of anything and equivalent to “people are saying”
This is true though. While we know what they do on a mechanistic level, we cannot reliably analyze why the model outputs any particular answer in functional terms without a heroic effort at the "arxiv paper" level.
In the digital world, we should be able to go back from output to input unless the intention of the function is to "not do that". Like hashing.
LLMs not being able to go from output back to input deterministically, and our not understanding why they produce what they do, is very important; most of our issues with LLMs stem from this. It's why mechanistic interpretability research is so hot right now.
The car analogy is not good because models are digital components and a car is a real world thing. They are not comparable.
I mean, fluid dynamics is an unsolved issue. But even so we know *considerably* less about how LLMs work in functional terms than about how combustion engines work.
We know how neural nets work. We don't know how a specific combination of weights in the net is capable of coherently answering questions asked in a natural language, though. If we did, we could replicate what it does without training it.
> We know how neural nets work. We don't know how a specific combination of weights in the net is capable of coherently answering questions asked in a natural language, though.
these are the same thing. the neural network is trained to predict the most likely next word (rather, token, etc.) — that’s how it works. that’s it. you train a neural network on data, it learns the function you trained it to, it “acts” like the data. have you actually studied neural networks? do you know how they work? I’m confused why you and so many others are seemingly so confused by this. what fundamentally are you asking for to meet the criteria of knowing how LLMs work? some algorithm that can look at weights and predict if the net will output “coherent” text?
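for concreteness, a toy sketch (PyTorch, no attention, purely illustrative) of the objective I'm describing: predict the next token, minimize cross-entropy. that's the whole recipe at this level of description.

```python
# Toy next-token predictor: the training signal is just "make the true next
# token more likely". Real LLMs swap in a transformer, but the objective is this.
import torch
import torch.nn as nn

vocab_size, d_model = 1000, 64
model = nn.Sequential(nn.Embedding(vocab_size, d_model),
                      nn.Linear(d_model, vocab_size))

tokens = torch.randint(0, vocab_size, (1, 32))   # a batch of token ids
inputs, targets = tokens[:, :-1], tokens[:, 1:]  # predict token t+1 from token t

logits = model(inputs)                           # shape (1, 31, vocab_size)
loss = nn.functional.cross_entropy(logits.reshape(-1, vocab_size),
                                   targets.reshape(-1))
loss.backward()                                  # gradients nudge weights toward the data
```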
> If we did, we could replicate what it does without training it.
It's like you're describing a compression program as "it takes a big file and returns a smaller file by exploiting regularities in the data." Like, you have accurately described what it does, but you have in no way answered the question of how it does that.
If you then explain the function of a CPU and how ELF binaries work (which is the equivalent of trying to answer the question by explaining how neural networks work), you then have still not answered the actually important question! Which is "what are the algorithms that LLMs have learnt that allow them to (apparently) converse and somewhat reason like humans?"
Yes, in the analogy it's equivalent to saying you know "what" every instruction in the compression program is doing: push decrements rsp; xor rax, rax zeroes out the register. You know every step. But you don't know the algorithm that those instructions are implementing, and that's the same situation we're in with LLMs. We can describe their actions numerically, but we cannot describe them behaviorally, and they're doing things that we don't know how to otherwise do with numerical methods. They've clearly learnt algorithms, but we cannot yet formalize what they are. The universal approximation theorem actually works against your argument here, because it's too powerful: they could be implementing anything.
edit: We know the data that their function outputs, it's a "blurry jpeg of the internet" because that's what they're trained on. But we do not know what the function is, and being able to blurrily compress the internet into a tb or whatever is utterly beyond any other compression algorithm known to man.
I believe you are conflating multiple concepts to prove a flaky point.
Again, unless your agent has access to a function that exfiltrates data, it is impossible for it to do so. Literally!
You do not need to provide any tools to an LLM that summarizes or translates websites, manages your open tabs, etc. This can be done fully locally in a sandbox.
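As a concrete illustration, a minimal sketch with llama-cpp-python and a locally downloaded GGUF model (the path and prompt are made up): text goes in, text comes out, and no tools or network access are wired up, so there is no mechanism for exfiltration.

```python
# Tool-free, fully local summarization; nothing here can reach the network.
from llama_cpp import Llama

llm = Llama(model_path="models/local-summarizer.gguf", n_ctx=8192)  # hypothetical model path

page_text = "...extracted article text from the current tab..."

out = llm.create_chat_completion(
    messages=[
        {"role": "system", "content": "Summarize the user's text in three bullet points."},
        {"role": "user", "content": page_text},
    ],
    max_tokens=256,
)
print(out["choices"][0]["message"]["content"])
```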
Linking to simonw does not make your argument valid. He makes some great points, but he does not assert what you are claiming at any point.
Please stop with this unnecessary fear mongering and make a better argument.
Thinking aloud, but couldn't someone create a website with some malicious text that, when quoted in a prompt, convinces the LLM to expose certain private data to the web page, and couldn't the webpage send that data to a third party, without the need for the LLM to do so?
This is probably possible to mitigate, but I fear what people more creative, motivated and technically adept could come up with.
Firefox should look like Librewolf first of all, Librewolf shouldn’t have to exist. Mozilla’s privacy stuff is marketing bullshit just like Apple. It shouldn’t be doing ANYTHING that isn’t local only unless it’s explicitly opt in or user UI action oriented. The LLM part is absurd bc the entire overton window is in the wrong place.
That's not really accurate: Firefox peaked somewhere around 30% market share back when IE was dominant, and then Chrome took over the top spot within a few years of launching.
FWIW, I think there's just no good move for Mozilla. They're competing against 3 of the biggest companies in the world who can cross-subsidise browser development as a loss-leader, and can push their own browsers as the defaults on their respective platforms. The most obvious way to make money from a browser - harvesting user data - is largely unavailable to them.
I would rather Firefox release a paid browser with no AI, or at least with everything opt-in and more user control, than see them stuff unwanted features onto users.
I used firefox faithfully for a long time, but it's time for someone to take it out back and put it down.
Also, I switched to Waterfox about a year ago and I have no complaints. The very worst thing about it is that when it updates it's very in-your-face about it, and that is such a small annoyance that it's easily negligible.
Throw on an extension like Chrome Mask for those few websites that "require chrome" (as if that is an actual thing), a few privacy extensions, ecosia search, uBlacklist (to permablock certain sites from search results), and Content Farm Terminator to get rid of those mass produced slop sites that weasel their way into search results and you're going to have a much better experience than almost any other setup.
The thing about translation: even a human translator will sometimes make silly mistakes unless they know the domain really well, so LLMs are not any worse. Translation is a problem with no deterministic solution (rule-based translation was always a bad joke). Properly implemented deterministic search/information retrieval, on the other hand, works extremely well. So well it doesn't really need any replacement, except when you also have some extra dynamics on top like "filtering SEO slop", and that's not something LLMs can improve at all.
No, it is disqualifyingly clueless. The author defends one neural network, one bag of effectively-opaque floats that get blended together with WASM to produce non-deterministic outputs which are injected into the DOM (translation), then righteously crusades against other bags of floats (LLMs).
From this point of view, uBlock Origin is also effectively un-auditable.
Or your point about them maybe imagining AI as non-local proprietary models might be the only thing that makes this make sense. I think even technical people are being suckered by the marketing that "AI" === ChatGPT/Claude/Gemini style cloud-hosted proprietary models connected to chat UIs.
I'm ok with Translation because it's best solved with AI. I'm not ok with it when Firefox "uses AI to read your open tabs" to do things that don't even need an AI based solution.
local, open model
local, proprietary model
remote, open model (are there these?)
remote, proprietary model
There is almost no harm in a local, open model. Conversely, a remote, proprietary model should always require opting in with clear disclaimers. It needs to be proportional.
The harm to me is that the implementation is terrible, local or not (assuming no AI-based telemetry). If their answer is AI, then it pretty much means they won't make a non-AI solution. Today I got my first stupid AI tab grouping in Firefox and it makes zero intuitive sense. I just want grouping, not an AI reading my tabs; it should just be based on where my tabs were opened from. I also tried Waterfox today because of this post, and while I'd prefer horizontal grouping, at least their implementation isn't stupid. Language translation is an opaque, complex process. Grouping tabs based on other tabs is not good when it's opaque and unpredictable, and it does not need AI.
That is a good point, and I think the takeaway is that there are lots of degrees of freedom here. Open training data would be better, of course, but open weights is still better than completely hidden.
I don't see the difference between "local, open weights" and "local, proprietary weights". Is that just the handful of lines of code that call the inference?
The model itself is just a binary blob, like a compiled program. Either you get its source code (the complete training data) or you don't.
You've lost the plot: The [local|remote]-[open|closed] comment is making a broad claim about LLM usage in general, not limited to the hyper-narrow case of tab-grouping. I'm saying the majority of LLM-dangers are not fixed by that 4-way choice.
Even if it were solely about tab-grouping, my point still stands:
1. You're browsing some funny video site or whatever, and you're naturally expecting "stuff I'm doing now" to be all the tabs on the right.
2. A new tab opens which does not appear there, because the browser chose to move it over into your "Banking" or "Online purchases" groups, which for many users might even be scrolled off-screen.
3. An hour later you switch tasks and return to your "Banking" or "Online Purchases" group. These are obviously the same tabs as before, the ones you opened from a trusted URL/bookmark, right?
4. Logged out due to inactivity? OK, you enter your username and password into... the fake phishing tab! Oops, game over.
Was the fuzzy LLM instrumental in the failure? Yes. Would having a local model with open weights protect you? No.