I had a similar opinion, that we were somewhere near the top of the sigmoid curve of model improvement that we could achieve in the near term. But given continued advancements, I’m less sure that prediction holds.
My model is a bit simpler: model quality is something like the logarithm of effort you put into making the model. (Assuming you know what you are doing with your effort.)
So I don't think we are on any sigmoid curve or anything like that. Though if you plot the performance of the best model available at any point in time against time on the x-axis, you might see a sigmoid curve, but that's a combination of the logarithm and the amount of effort people are willing to spend on making new models.
(I'm not sure about it specifically being the logarithm. Just any curve that has rapidly diminishing marginal returns that nevertheless never go to zero, i.e. the curve never saturates.)
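The two properties claimed here (diminishing marginal returns, but no saturation) can be checked numerically with a toy sketch. This is purely illustrative; the `quality` function is a hypothetical stand-in for the claim, not a real measurement:

```python
import math

# Toy version of the claim: quality ~ log(effort).
def quality(effort: float) -> float:
    return math.log(effort)

# Diminishing marginal returns: the gain from one extra unit of
# effort keeps shrinking as effort grows...
gains = [quality(e + 1) - quality(e) for e in (1, 10, 100, 1000)]
assert all(g1 > g2 for g1, g2 in zip(gains, gains[1:]))

# ...but no saturation: any quality target is reachable with
# enough (exponentially more) effort.
assert quality(math.e ** 50) > 49.9
```

This also shows why the curve-over-time picture differs from the curve-over-effort picture: if effort grows exponentially for a while and then levels off, log(effort) plotted against time can look sigmoid even though log(effort) itself never flattens.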
Yeah, I have a similar opinion. You can go back almost a year to when Claude 3.5 launched, and I said on Hacker News that it's good enough.
And now I am saying the same for Gemini 3 Flash.
I still feel the same way, though. Sure, there is an increase, but I somewhat believe that Gemini 3 is good enough, and the returns on training from now on might not be worth that much, IMO. But I'm not sure either, and I can be wrong; I usually am.
It seems like a good use case for a caching layer. You could probably build a setup for agentic systems more simply and cheaply on Hetzner than by trying to cobble together a bunch of fragmented APIs.
Power draw? An entire Mac Pro running flat out uses less power than one 5090.
If you have a workload that needs a huge memory footprint, then the TCO of the Macs, even with their markup, may be lower.
HN is not an entity with a single perspective, and there are plenty of people on here who have a financial stake in you believing their perspective on the matter.
I'm beginning to pick up a few more consulting opportunities based on my writing and my revenue from GitHub sponsors is healthy, but I'm not particularly financially invested in the success of AI as a product category.
Thanks for the link. I see that you get credits and access to embargoed releases. I understand that's not a financial stake, but it seems like enough of an incentive to say positive things about those services, doesn't it? Not that it matters to me, and I might be wrong, but to an outsider it might seem so.
The counter-incentive here is that my reputation and credibility are more valuable to me than early access to models.
This very post is an example of me taking a risk of annoying a company that I cover. I'm exposing the existence of the ChatGPT skills mechanism here (which I found out about from a tip on Twitter - it's not something I got given early access to via an NDA).
It's very possible OpenAI didn't want that story out there yet and aren't happy that it's sat at the top of Hacker News right now.
Are you factoring in the above comment about the as-yet-unimplemented parallel speedup? For on-prem inference without any kind of ASIC, this seems like quite a bargain, relatively speaking.
Forgive me if I'm just reading this incorrectly, but that doesn't sound exactly like property testing as I've done it. The libraries implement an algorithm for narrowing down to the simplest reproducer for a given failure mode, so all of the randomized inputs to a test are provided by the library.
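The two pieces described above (library-generated random inputs, plus automatic narrowing to a minimal reproducer, usually called "shrinking") can be sketched in plain Python. Everything here is hypothetical for illustration: `f` is a deliberately buggy function, and the shrinker is a simplified greedy version of what libraries like Hypothesis or QuickCheck do:

```python
import random

# Hypothetical property under test: f(x) >= 0 for all x >= 0.
# This deliberately buggy f violates it for x >= 100.
def f(x: int) -> int:
    return x if x < 100 else -x

def holds(x: int) -> bool:
    return f(x) >= 0

def find_failure(trials: int = 1000, seed: int = 0):
    # The "library generates the inputs" half: random search
    # over the input space for a counterexample.
    rng = random.Random(seed)
    for _ in range(trials):
        x = rng.randrange(0, 10_000)
        if not holds(x):
            return x
    return None

def shrink(x: int) -> int:
    # The "narrow down to the simplest reproducer" half: greedily
    # try smaller candidates as long as the failure still reproduces.
    while True:
        for candidate in (0, x // 2, x - 1):
            if candidate < x and not holds(candidate):
                x = candidate
                break
        else:
            return x

failing = find_failure()
if failing is not None:
    print("failing input:", failing, "shrunk to:", shrink(failing))
```

Whatever random counterexample the search finds, the shrinker reduces it to 100, the smallest input that still fails; reporting that minimal reproducer instead of the raw random value is the part the libraries automate.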
UBI will not save you from economic irrelevance. The only difference between you and someone starving in a 3rd world slum is economic opportunity and the means to exchange what you have for what someone else needs. UBI is inflation in a wig and dark glasses.