ceroxylon's comments | Hacker News

I had my suspicions about the GPT-5 routing as well. When I first looked at it, its clock was by far the best of the group; after the minute went by and everything refreshed, the next three were some of the worst. It made me wonder whether it had just hit a lucky routing path that first time.


They have it available on the site under the (?) button:

"Create HTML/CSS of an analog clock showing ${time}. Include numbers (or numerals) if you wish, and have a CSS animated second hand. Make it responsive and use a white background. Return ONLY the HTML/CSS code with no markdown formatting."


Surely a workflow could be created to "fight fire with fire": an AI that reads reports, trained on the codebase, with explicit instructions to check for all of the telltale signs of slop...? If AI services can handle the nightmare of parsing emails and understanding the psychology of phishing, I am optimistic the same can be done for OSS reports.

It doesn't have to make the final judgment; it could just be some sort of filter that automatically flags things like function calls that don't exist in the codebase, as in the sketch below.
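
A minimal sketch of that kind of pre-filter, assuming a plain-text report and a local checkout of the project (the file names and extension list are illustrative, not from any real tool):

    import re
    from pathlib import Path

    # Hypothetical pre-filter: flag function names that a bug report uses
    # like calls but that never appear anywhere in the project's source tree.
    CALL_RE = re.compile(r"\b([A-Za-z_][A-Za-z0-9_]*)\s*\(")
    IDENT_RE = re.compile(r"\b[A-Za-z_][A-Za-z0-9_]*\b")

    def functions_mentioned(report_text: str) -> set:
        """Identifiers the report treats as function calls, e.g. frob_buffer(...)."""
        return set(CALL_RE.findall(report_text))

    def identifiers_in_repo(repo: Path, exts=(".py", ".c", ".h")) -> set:
        """Every identifier that actually occurs in the source tree."""
        found = set()
        for path in repo.rglob("*"):
            if path.is_file() and path.suffix in exts:
                found.update(IDENT_RE.findall(path.read_text(errors="ignore")))
        return found

    def flag_phantom_calls(report_text: str, repo: Path) -> set:
        """Function names the report cites that the codebase never mentions."""
        return functions_mentioned(report_text) - identifiers_in_repo(repo)

    if __name__ == "__main__":
        report = Path("report.txt").read_text()  # hypothetical report file
        for name in sorted(flag_phantom_calls(report, Path("./project"))):
            print(f"possible slop: report calls {name}(), not found in codebase")

In practice you would also want to drop language keywords and common English words before comparing, but even this crude set difference would catch a report that invents an API out of thin air.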


It is good to be skeptical, but there is a large amount of detail in the paper itself that would have taken quite a bit of effort to fabricate (for no good reason): https://www.sciencedirect.com/science/article/pii/S235198942...

There is enough detail there to book a trip to Germany and set up infrared cameras if we are so inclined; repeatability is a large part of science.


I can't prove it, but from interacting with 'support' teams who are clearly middle-people working with a clunky AI and accepting its output as absolute truth, that would be my first guess.


The demo looks like holding a robot's hand while it does something that would normally take me 15 seconds anyway. I have mostly found AI useful for search and research, not for creating a middle-man between my friends and me whose "feature" is knowing what the star ratings on Google Maps imply.


Over the last hour I have watched the number of impacted services climb from 90 to 92, and it is currently sitting at 97.


This makes sense, but what happens when they stop burning cash on training runs and any of their competitors releases a better model that raises the ceiling?

They will have to train one that is comparable (or better), or word will spread and users will move to the better model.


It has always annoyed me that a huge online mega-mart (that starts with the letter A) will advertise items in the same category as ones I have recently bought, even though they are clearly not frequent purchases.

It feels like the algorithm is saying "oh, they bought a mattress... they must really love mattresses and want more!", when much better ads could be surfaced with the wealth of data they have on shopping habits. Even a dead-simple repurchase-window filter, sketched below, would be an improvement.
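
A crude sketch of such a filter, assuming per-category repurchase intervals are known (every name and number here is made up for illustration, not from any real ad system):

    from datetime import date

    # Hypothetical repurchase-window filter: suppress ads in a category the
    # customer just bought from when that category is rarely re-purchased.
    # The intervals below are made-up illustrative values, not real data.
    TYPICAL_REPURCHASE_DAYS = {
        "mattress": 365 * 8,   # durable good: years between purchases
        "coffee": 14,          # consumable: bought again soon
    }

    def worth_advertising(category: str, last_purchase: date, today: date) -> bool:
        """Re-show a category only once a meaningful fraction of its typical
        repurchase interval has elapsed since the last purchase."""
        interval = TYPICAL_REPURCHASE_DAYS.get(category)
        if interval is None:
            return True  # no data for this category: fall back to showing the ad
        elapsed = (today - last_purchase).days
        return elapsed > interval * 0.5  # halfway through the cycle

    # A mattress bought a week ago gets filtered out...
    print(worth_advertising("mattress", date(2024, 6, 1), date(2024, 6, 8)))  # False
    # ...while coffee bought five weeks ago is fair game again.
    print(worth_advertising("coffee", date(2024, 5, 1), date(2024, 6, 8)))    # True

The point is not the exact threshold but that purchase recency and category durability are already in the data; using them at all would beat "show more of what they just bought."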


Is this not cognitive offloading on steroids?

