More

hyperpape · 2025-12-22T14:35:17 1766414117

What worries me is that the item I most know about seems like the problem statement is not that useful. The title is "early delay detection for shipments" and the text seems to mostly be about inventory (so the description is odd).

The frame is that enterprise solutions for these problems do not scale to smaller retail chains, and I find that at least believable.

The thing about this problem is that it's not hidden. It's extremely obvious. I work in SaaS targeting logistics (transportation for me, transportation, order management and warehouse management for our company), so I know a bit about this space, though I'm not in it directly. Plenty of people are solving this problem for bigger companies.

I put roughly 0% credence in the idea that many many people haven't noticed this is a problem for smaller companies. I put low credence on the idea that no one has tried to solve it for them via software. What I suspect is that this is a case where the basic idea is super-simple and obvious, but the reality of producing something that works in the market is hard.

None of that is to say that there couldn't be a real improvement here. It's just that I suspect it takes a real insight into why this problem hasn't been solved yet, and a new angle to make your solution work.

hyperpape · 2025-12-18T15:28:52 1766071732

CI systems operate according to rules that humans feel they understand and can apply mechanically. Moreover, they (primarily) fail closed.

hyperpape · 2025-12-17T19:34:36 1766000076

Salesmen making bad deals that boost their numbers and then don't make money in the long-term is one of the first things you learn when you work in an org that sells in the enterprise market.

wagwang · 2025-12-17T19:37:15 1766000235

Ur in a software bubble, there are millions of sales jobs where you sell a simple product and the only thing that matters is sale volume and maybe "dont be a dick". The really strategic sales process we employ in tech is the exception.

hyperpape · 2025-12-15T12:37:23 1765802243

Your comment is sufficiently generic that it’s impossible to tell what specific part of the article you’re agreeing with, disagreeing with, or expanding upon.

vintermann · 2025-12-15T12:48:02 1765802882

I disagree that performance should be a reason to choose running numbers over guids until you absolutely have to.

I think IDs should not carry information. Yes, that also means I think UUIDv7 was wrong to squeeze a creation date into their ID.

Isn't that clear enough?

mcny · 2025-12-15T13:22:38 1765804958

That's the creation date of that guid though. It doesn't say anything about the entity in question. For example, you might be born in 1987 and yet only get a social security number in 2007 for whatever reason.

So, the fact that there is a date in the uuidv7 does not extend any meaning or significance to the record outside of the database. To infer such a relationship where none exists is the error.

vintermann · 2025-12-15T13:54:01 1765806841

You can argue that, but then what is its purpose? Why should anyone care about the creation date of a by-design completely arbitrary thing?

I bet people will extract that date and use it, and it's hard to imagine use which wouldn't be abuse. To take the example of a PN/SSN and the usual gender bit: do you really want anyone to be able to tell that you got a new ID at that time? What could you suspect if a person born in 1987 got a new PN/SSN around 2022?

Leaks like that, bypassing whatever access control you have in your database, is just one reason to use real random IDs. But it's even a pretty good one in itself.

mcny · 2025-12-15T14:02:18 1765807338

> What could you suspect if a person born in 1987 got a new PN/SSN around 2022?

Thank you for spelling it for me. For the readers, It leaks information that the person is likely not a natural born citizen. The assumption doesn't have to be a hundred percent accurate, There is a way to make that assumption And possibly hold it against you.

And there are probably a million ways that a record created date could be held against you If they don't put it in writing, how will you prove They discriminated against you.

Thinking... I don't have a good answer to this. If data exists, people will extract meaning from it whether rightly or not.

infogulch · 2025-12-15T14:42:04 1765809724

To quote the great Mr Sparrow:

> The only rules that really matter are these: what a man can do and what a man can't do.

When evaluating security matters, it's better to strip off the moral valence entirely ("rightly") and only consider what is possible given the data available.

Another potential concerning implication besides citizenship status: a person changed their id when put in a witness protection program.

majorchord · 2025-12-15T15:01:15 1765810875

> You can argue that, but then what is its purpose? Why should anyone care about the creation date of a by-design completely arbitrary thing?

Pretty sure sorting and filtering them by date/time range in a database is the purpose.

miroljub · 2025-12-15T15:24:40 1765812280

If you need sorting and filtering by date, just add a timestamp to your table instead of misusing an Id column for that.

mixmastamyk · 2025-12-15T16:36:12 1765816572

That happens, in general. The benefit comes when it’s time to look up by uuid only; the prefix is an index to its disk block location.

dpark · 2025-12-15T18:44:34 1765824274

> the prefix is an index to its disk block location

What? This is definitely not the case and can’t be because B-tree nodes change while UUIDs do not.

mixmastamyk · 2025-12-15T18:45:48 1765824348

I didn’t mean that literally, but no longer editable. Was supposed to have “like” etc in there.

dpark · 2025-12-15T19:00:10 1765825210

But UUIDv7 doesn’t change that at all. It doesn’t matter what flavor of UUID you choose. The ID is always “like” an index to a block in that you traverse the tree to find the node. What UUIDv7 does is improve some performance characteristics when creating new entries and potentially for caching.

majorchord · 2025-12-15T16:38:14 1765816694

> just

It is easy to have strong opinions about things you are sheltered from the consequences of.

naasking · 2025-12-15T16:26:39 1765815999

Exactly, be explicit, don't shoehorn multiple purposes into a single column that's supposed to be a largely meaningless unique identifier.

dpark · 2025-12-15T18:42:22 1765824142

That is absolutely not the purpose. The specific purpose of uuidv7 is to optimize for B-Tree characteristics, not so you can craft queries based on the IDs being sequential.

This assumption that you can query across IDs is exactly what is being cautioned against. As soon as you do that, you are talking a dependency on an implementation detail. The contract is that you get a UUID, not that you get 48 bits of timestamp. There are 8 different UUID types and even v7 has more than one variant.

kentm · 2025-12-16T06:31:00 1765866660

B-trees too but also bucketing for formats like delta lake or iceberg, where having ids that cluster will reduce the number of files you need to update.

anamexis · 2025-12-15T14:08:28 1765807708

I would argue that is one of very few situations where leaking the timestamp that the ID was created when you already have the ID is a possible concern at all.

And when working with very large datasets, there are very significant downsides to large, completely random IDs (which is of course what the OP is about).

kentm · 2025-12-16T06:29:24 1765866564

> You can argue that, but then what is its purpose?

The purpose is to reduce randomness while still preserving probability of uniqueness. UUIDv4 come with performance issues when used to bucket data for updates, such as when there used as primary keys in a database.

A database like MySQL or PostgreSQL has sequential ids and you’d use those instead, but if you’re writing something like iceberg tables using Trino/Spark/etc then being able to generate unique ids (without using a data store) that tend to be clustered together is useful.

kube-system · 2025-12-15T15:57:13 1765814233

The time component either has meaning and it should be in its own column, or it doesn't have meaning and it is unnecessary and shouldn't be there at all.

I'm not a normalization fanatic, but we're only talking about 1NF here.

hyperpape · 2025-12-15T13:59:47 1765807187

Those are two unrelated points and the connection between them was unclear in the original post.

hxtk · 2025-12-15T18:37:47 1765823867

When I think "premature optimization," I think of things like making a tradeoff in favor of performance without justification. It could be a sacrifice of readability by writing uglier but more optimized code that's difficult to understand, or spending time researching the optimal write pattern for a database that I could spend developing other things.

I don't think I should ignore what I already know and intentionally pessimize the first draft in the name of avoiding premature optimization.

barrkel · 2025-12-15T13:27:34 1765805254

UUID v7 doesn't squeeze creation date in. If you treat it as anything other than a random sequence in your applications, you're just wrong.

zamadatix · 2025-12-15T14:03:20 1765807400

"What it does" and "what I think you should do with it" should not be treated as equivalent statements.

anamexis · 2025-12-15T13:47:30 1765806450

For what it’s worth, it was also completely unclear to me how you were responding to the article itself. It does not discuss natural keys at all.

hyperpape · 2025-12-15T11:02:04 1765796524

The thing is, none of us are mice, but many of us use Postgres.

It would be the equivalent of "if you're a middle-aged man" or "you're an American".

P.S. I think some of the considerations may be true for any system that uses B-Tree indexes, but several will be Postgres specific.

hyperpape · 2025-12-15T00:49:53 1765759793

Note that the author does not mention a single specific SaaS subscription he’s cancelled or seen a team cancel.

The only named product was Retool.

linsomniac · 2025-12-15T01:53:59 1765763639

We just had a $240/year renewal for teamretro.com come due, and while TeamRetro has a lot of components, we are only using the retro and ice breaker components. So I gave Claude Code a couple of prompts and I now have a couple static HTML pages that do the ice breaker (using local storage) and the retro (using a Google sheet as the storage backend, largely because it mimics our pre-teamretro process).

It took me no more than 2 hours to put those together. We didn't renew our TeamRetro

lelanthran · 2025-12-15T08:44:37 1765788277

> It took me no more than 2 hours to put those together. We didn't renew our TeamRetro

Okay, so two hours with an LLM vs maybe 2.5 days without an LLM in the best-case scenario (i.e. LLMs gave you a 10x boost. I would expect it to be less than that though, like maybe a 2x boost) - it sounds like it was always pretty cheap to replace the SaaS, but the business didn't do it.

TBH, the arguments were never "It would take too long to do ourselves", it was always "but then we'd have to maintain it ourselves".

The place I am consulting at now just moved (i.e. a month ago) from their in-house built ticketing system ($0/m as it had not needed maintenance for over a year) to Jira (~$2k/m).

In this specific case, it was literally 0 hours to avoid paying the SaaS, and they still moved, because they wanted some modern features (billing for time on support calls, etc) and figured that rather than update their in-house system to add support hours costing (a day, at most) they may as well move to a system that already had it.

(Joke's on them though - the Jira solution for support hours costing is not near the level of granularity they need, even with multiple paid plugins).

Once again, companies aren't using SaaS because it's cheaper or quicker; they could already quickly replace their SaaS with in-house.

linsomniac · 2025-12-15T14:52:51 1765810371

>.e. LLMs gave you a 10x boost. I would expect it to be less than that though, like maybe a 2x boost

I'm not a frontend guy, I'm an operations guy that sometimes does some backends. So it's likely a solid 2.5 days for me to build the pair of these, probably more I haven't touched Javascript in over a decade.

lelanthran · 2025-12-15T17:58:11 1765821491

> I'm not a frontend guy, I'm an operations guy that sometimes does some backends. So it's likely a solid 2.5 days for me to build the pair of these, probably more I haven't touched Javascript in over a decade.

Right, understood and agreed, but this was not about you and your specific skills or lack thereof; your anecdote was in support of an argument that companies would stop their SaaS because LLMs enable them to build in house.

That was your argument, right?

So in the absence of LLMs, if the company wanted to stop paying for the SaaS, would they have chosen you to do the replacement, or someone who had recent experience in the tech?

Look, we are interested in comparing the time taken to replace the SaaS with an LLM, and the time taken to replace the SaaS without LLM assistance.

That's really the only two scenarios under discussion, so lets explore those exhaustively:

1. Without LLMs: In the worst case scenario, the company had to pay for 2.5 days of employee time with the best case being 1 day of employee time. Lets go with something in-between like 1.5 days of dev time.

2. With LLMs: The company pays for 0.5 days of employee time (includes the overhead of token cost/subscription).

The difference between the only two scenarios that we have is literally a single day of employee costs!

I am skeptical that the company failed to leave the SaaS earlier because they didn't want to eat the cost of a 1.5 paid days for an employee, but a difference of a single day of cost was enough to tip the scales.

linsomniac · 2025-12-16T03:32:46 1765855966

>That was your argument, right?

I wasn't intending to make an argument, I was specifically replying to:

>does not mention a single specific SaaS subscription he’s cancelled

I was imagining it could start a thread of examples where it's happened.

>would they have chosen you to do the replacement, or someone who had recent experience in the tech?

I get what you're saying, but those aren't the only two options; they very likely would have chosen neither of those options. The resources we had available was an ops guy who is pretty handy with the LLMs.

I get the point you're making, I really do. My counterpoint is that there are some SaaSes out there that people can build replacements for by using the LLMs at no incremental cost.

>I am skeptical that the company failed to leave the SaaS earlier because they didn't want to eat the cost of a 1.5 paid days for an employee

Sure, I'd be skeptical about it when put that way as well. That's not how it played out however: We were having a retro and the guy running it said that our subscription was expiring the end of the month and wanted discussion about whether we wanted to purchase it for another year. 2 weeks later, before our next retro, I threw a prompt at Claude Code and asked a couple people to try out the result, incorporated their feedback and we ran the retro on it. We aren't planning to renew.

This was not something "the company" had a big discussion about; my boss made an offhand comment about it, and I did it as a side project while I was doing something else.

matwood · 2025-12-15T07:45:08 1765784708

How is teamretro a product to begin with? I feel like these SaaS’s exploded like js libraries.

linsomniac · 2025-12-15T14:47:04 1765810024

It is fine. We already had a retro process we were using, and teamretro didn't really enable us to change or improve our process so much as just continue doing our existing process. It is a solid product, but honestly we just used a google sheet prior to it and that worked fine as well.

nop_slide · 2025-12-15T02:33:48 1765766028

Or just don’t do retro and save even more time and money!

mikeocool · 2025-12-15T14:23:32 1765808612

If you're a typical software engineer, that time probably cost your company more than $240.

osn9363739 · 2025-12-15T22:50:42 1765839042

I didn't look at the product. But $240/year is nothing for an org. Plus not only just the time to make it. What about the time fixing bugs? hosting costs? backups? I'm sure there will be products that can be replaced (possibly this one), but I'm not convinced the death of SaaS is here yet.

jonathanharel · 2025-12-15T07:40:46 1765784446

Good point! For me the only change that happened so far (because the agentic product was better) was switching from JetBrains to Cursor. I'm sure this will happen with more products I use in the future

hyperpape · 2025-12-14T19:16:41 1765739801

The reality is that the HTML+CSS+JS is the canonical form, because it is the form that humans consume, and at least for the time being, we're the most important consumer.

The API may be equivalent, but it is still conceptually secondary. If it went stale, readers would still see the site, and it makes sense for a scraper to follow what readers can see (or alternately to consume both, and mine both).

The author might be right to be annoyed with the scrapers for many other reasons, but I don't think this is one of them.

pwg · 2025-12-14T20:18:51 1765743531

The reality is that the ratio of "total websites" to "websites with an API" is likely on the order of 1M:1 (a guess). From the scraper's perspective, the chances of even finding a website with an API is so low that they don't bother. Retrieving the HTML gets them 99% of what they want, and works with 100% of the websites they scrape.

Investing the effort to 1) recognize, without programmer intervention, that some random website has an API and then 2) automatically, without further programmer intervention, retrieve the website data from that API and make intelligent use of it, is just not worth it to them when retrieving the HTML just works every time.

edit: corrected inverted ratio

JimDabell · 2025-12-14T23:09:12 1765753752

I’ve implemented a search crawler before, and detecting and switching to the WordPress API was one of the first things I implemented because it’s such an easy win. Practically every WordPress website had it open and there are a vast number of WordPress sites. The content that you can pull from the API is far easier to deal with because you can just pull all the articles and have the raw content plus metadata like tags, without having to try to separate the page content from all the junk that whatever theme they are using adds.

> The reality is that the ratio of "total websites" to "websites with an API" is likely on the order of 1M:1 (a guess).

This is entirely wrong. Aside from the vast number of WordPress sites, the other APIs the article mentions are things like ActivityPub, oEmbed, and sitemaps. Add on things like Atom, RSS, JSON Feed, etc. and the majority of sites have some kind of alternative to HTML that is easier for crawlers to deal with. It’s nothing like 1M:1.

> Investing the effort to 1) recognize, without programmer intervention, that some random website has an API and then 2) automatically, without further programmer intervention, retrieve the website data from that API and make intelligent use of it, is just not worth it to them when retrieving the HTML just works every time.

You are treating this like it’s some kind of open-ended exercise where you have to write code to figure out APIs on the fly. This is not the case. This is just “Hey, is there a <link rel=https://api.w.org/> in the page? Pull from the WordPress API instead”. That gets you better quality content, more efficiently, for >40% of all sites just by implementing one API.

danielheath · 2025-12-14T21:17:06 1765747026

Right - the scraper operators already have an implementation which can use the HTML; why would they waste programmers time writing an API client when the existing system already does what they need?

alsetmusic · 2025-12-15T12:23:51 1765801431

> Investing the effort to 1) recognize, without programmer intervention, that some random website has an API

Hrm…

>> Like most WordPress blogs, my site has an API.

I think WordPress is big enough to warrant the effort. The fact that AI companies are destroying the web isn't news. But they could certainly do it a with a little less jackass. I support this take.

sdenton4 · 2025-12-14T20:32:24 1765744344

If only there were some convenient technology that could help us sort out these many small cases automatically...

Gud · 2025-12-14T20:39:12 1765744752

Then again, why bother?

junon · 2025-12-14T20:32:21 1765744341

1M:1 by the way, but I agree.

dlcarrier · 2025-12-14T19:59:43 1765742383

Not only is abandonment of the API possible, but hosts may restrict it on purpose, requiring paid access to use acessability/usability tools.

For example, Reddit encouraged those tools to use the API, then once it gained traction, they began charging exorbitant fees effectively blocking every blocking such tools.

culi · 2025-12-14T20:09:08 1765742948

That's a good point. Anyone who used the API properly were left with egg on their face and anyone who misused the site and just scraped HTML ended up unharmed

ryandrake · 2025-12-14T21:18:43 1765747123

Web developers in general have a horrible track record with many notable "rug pulls" and "lol the old API is deprecated, use the new one" behaviors. I'm not surprised that people don't trust APIs.

dolmen · 2025-12-14T21:34:01 1765748041

This isn't about people.

KK7NIL · 2025-12-14T23:08:21 1765753701

APIs are always about people, they're an implicit contract. This is also why API design is largely the only difficult part of software design (there are tough technical challenges too sometimes, but they are much easier to plan for and contain).

modeless · 2025-12-14T20:47:40 1765745260

I want AI to use the same interfaces humans use. If AIs use APIs designed specifically for them, then eventually in the future the human interface will become an afterthought. I don't want to live in a world where I have to use AI because there's no reasonable human interface to do anything anymore.

You know how you sometimes have to call a big company's customer support and try to convince some rep in India to press the right buttons on their screen to fix your issue, because they have a special UI you don't get to use? Imagine that, but it's an AI, and everything works that way.

sowbug · 2025-12-14T20:29:58 1765744198

I'm reminded of Larry Wall's advice that programs should be "strict in what they emit, and liberal in what they accept." Which, to the extent the world follows this philosophy, has caused no end of misery. Scrapers are just recognizing reality and being liberal in what they accept.

A1kmm · 2025-12-14T20:37:51 1765744671

I think it's Jon Postel who was the original source of the principle (it's often called Postel's Law). https://www.rfc-editor.org/rfc/rfc761#section-2.10 is an example dating back to 1980.

athenot · 2025-12-14T20:35:24 1765744524

This is Postel's Law, aka the Principle of Robustness:

    "be conservative in what you send, be liberal in what you accept"

https://en.wikipedia.org/wiki/Robustness_principle

llbbdd · 2025-12-14T19:23:12 1765740192

Yeah APIs exist because computers used to require very explicitly structured data, with LLMs a lot of the ambiguity of HTML disappears as far as a scraper is concerned.

swatcoder · 2025-12-14T20:07:24 1765742844

> LLMs a lot of the ambiguity of HTML disappears as far as a scraper is concerned

The more effective way to think about it is that "the ambiguity" silently gets blended into the data. It might disappear from superficial inspection, but it's not gone.

The LLM is essentially just doing educated guesswork without leaving a consistent or thorough audit trail. This is a fairly novel capability and there are times where this can be sufficient, so I don't mean to understate it.

But it's a different thing than making ambiguity "disappear" when it comes to systems that actually need true accuracy, specificity, and non-ambiguity.

Where it matters, there's no substitute for "very explicit structured data" and never really can be.

llbbdd · 2025-12-14T21:53:09 1765749189

Disappear might be an extremely strong word here, but yeah as you said as the delta closes between what a human user and an AI user are able to interpret from the same text, it becomes good enough for some nines of cases. Even if on paper it became mathematically "good enough" for high-risk cases like medical or government data structured data will still have a lot of value. I just think more and more structured data is going to be cleaned up from unstructured data except for those higher precision cases.

dmitrygr · 2025-12-14T19:43:49 1765741429

"computers used to require"

please do not write code. ever. Thinking like this is why people now think that 16GB RAM is to little and 4 cores is the minimum.

API -> ~200,000 cycles to get data, RAM O(size of data), precise result

HTML -> LLM -> ~30,000,000,000 cycles to get data, RAM O(size of LLM weights), results partially random and unpredictable

hartator · 2025-12-14T19:46:32 1765741592

If API doesn’t have the data you want, this point is moot.

dotancohen · 2025-12-14T19:52:13 1765741933

Not GP, but I disagree. I've written successful, robust web scrapers without LLMs for decades.

What do you think the E in perl stands for?

llbbdd · 2025-12-14T22:16:27 1765750587

This is probably just a parallel discussion. I written plenty of successful web scrapers without LLM's, but in the last couple years, I've written a lot more where I didn't need to look at the web markup for more than a few seconds first, if at all. Often you can just copy-paste an example page into the LLM and have it generate accurate, consistent selectors. It's not much different when integrating with a formal API, except that the API usually has more explicit usage rules, and APIs will also often restrict data that can very obviously be used competitively.

llbbdd · 2025-12-15T04:11:26 1765771886

Double-posting so I'm sorry but the more I read this the less it makes sense. The parent reply was talking about data that was straight-up not available via the API, how does perl help with that?

hartator · 2025-12-15T04:16:35 1765772195

Yeah, I don’t get it either. Someone trying AI to mass reply?

llbbdd · 2025-12-15T04:30:30 1765773030

Maybe Perl is more powerful than I've ever given it credit for.

shadowgovt · 2025-12-14T20:19:55 1765743595

On the other hand, I already have an HTML parser, and your bespoke API would require a custom tool to access.

Multiply that by every site, and that approach does not scale. Parsing HTML scales.

swiftcoder · 2025-12-14T20:42:35 1765744955

You already have a JSON and XML parser too, and the website offers standardised APIs in both of those

shadowgovt · 2025-12-14T22:43:26 1765752206

Not standardized enough; I can't guarantee the format of an API is RESTful, I can't know apriori what the response format is (arbitrary servers on the internet can't be trusted to be setting content type headers properly) or How to crawl it given the response data, etc. we ultimately never solved the problem of universal self- describing APIs, so a general crawling service can't trust they work.

In contrast, I can always trust that whatever is returned to be consumed by the browser is in the format that is consumable by a browser, because if it isn't the site isn't a website. Html is pretty much the only format guaranteed to be working.

dmitrygr · 2025-12-14T20:25:54 1765743954

parsing html -> lazy but ok

using an llm to parse html -> please do not

llbbdd · 2025-12-14T22:07:31 1765750051

> Lazy but ok

You're absolutely welcome on your own free time to waste it on whatever feels right

> using an llm to parse html -> please do not

have you used any of these tools with a beginner's mindset in like, five years?

llbbdd · 2025-12-14T21:50:29 1765749029

A lot of software engineering is recognizing the limitations of the domain that you're trying to work in, and adapting your tools to that environment, but thank you for your contribution to the discussion.

EDIT: I hemmed and hawed about responding to your attitude directly, but do you talk to people anywhere but here? Is this the attitude you would bring to normal people in your life?

Dick Van Dyke is 100 years old today. Do you think the embittered and embarrassing way you talk to strangers on the internet is positioning your health to enable you to live that long, or do you think the positive energy he brings to life has an effect? Will you readily die to support your animosity?

venturecruelty · 2025-12-14T19:50:57 1765741857

Weeping and gnashing of teeth because RAM is expensive, and then you learn that people buy 128 GB for their desktops so they can ask a chatbot how to scrape HTML. Amazing.

llbbdd · 2025-12-15T04:14:35 1765772075

The more I've thought about it the RAM part is hardly the craziest bit. Where the fuck do you even buy a computer with less than 4 cores in 2025? Pawn shop?

llbbdd · 2025-12-14T21:57:06 1765749426

isn't it ridiculous? This is hacker news. Nobody with the spare time to post here is living on the street. Buy some RAM or rent it. I can't believe honestly how many people on here I see bemoaning the fact that they haven't upgraded their laptops in 20 years and it's somehow anyone else's problem.

shadowgovt · 2025-12-14T23:00:41 1765753241

I may be out of the loop; is system RAM key for LLMs? I thought they were mostly graphics RAM constrained.

lechatonnoir · 2025-12-14T20:14:45 1765743285

it's kind of hard to tell what your position is here. should people not ask chatbots how to scrape html? should people not purchase RAM to run chatbots locally?

cr125rider · 2025-12-14T19:43:18 1765741398

Exactly. This parallels “the most accurate docs are the passing test cases”

btown · 2025-12-14T20:03:08 1765742588

I like to go a level beyond this and say: "Passing tests are fine and all, but the moment your tests mock or record-replay even the smallest bit of external data, the only accurate docs are your production error logs, or lack thereof."

handfuloflight · 2025-12-15T00:15:05 1765757705

Absolutely.

1718627440 · 2025-12-15T10:26:15 1765794375

This is something the XML ecosystem (which is now getting killed) actually got right and is the primary reason people don't want to have it killed.

echelon · 2025-12-14T19:36:28 1765740988

[flagged]

edent · 2025-12-14T19:45:04 1765741504

As I wrote:

> Like most WordPress blogs, my site has an API.

WordPress, for all its faults, powers a fair number of websites. The schema is identical across all of them.

gldrk · 2025-12-14T19:55:45 1765742145

If you decide to move your blog to another platform, are you going to maintain API compatibility?

sdenton4 · 2025-12-14T20:35:25 1765744525

Shouldn't the llm that all this scraping is powering be able to help figure out which websites have an API and how to use it?

_puk · 2025-12-14T20:48:22 1765745302

Is there a meta tag to point to the API / MCP?

hyperpape · 2025-12-14T14:15:55 1765721755

Being a little bit clever can lead you to make some pretty bad mistakes.

Yes, the distance according to roads can be different from the distance as the crow flies. No, it cannot realistically be 10x the distance when the crow's distance is 2500 miles.

meindnoch · 2025-12-14T15:28:21 1765726101

But what if you count the distance of going in and out of each tiny crack on the road surface?

scrame · 2025-12-14T16:04:32 1765728272

Look up "fractal dimensions".

meindnoch · 2025-12-14T17:23:41 1765733021

I know :)

hyperpape · 2025-12-11T18:58:22 1765479502

> PostgreSQL isn't "Generic SQL Database 47" it's the successor to Ingres (Post-Ingres-SQL).

Indeed. This helps me know that I'm using a database more modern than Ingres. I chose not to use Oracle or SQL Server because they might have predated Ingres.

Just one question: what's Ingres, and why do I care about it? Of course, I don't, which makes Postgres no more useful of a name than "fluffnutz" or "hooxup". That said, over time, I've come to like the name Postgres.

indymike · 2025-12-11T19:18:33 1765480713

Sometimes names have great value at the beginning of the project. In this case it explains exactly what the project is and will be... That said, marketing decisions like naming a product often don't age well.

lr0 · 2025-12-11T19:16:28 1765480588

You don't need to know what Ingres is. "PostgreSQL" still tells you it's SQL-related, which is infinitely more than "fluffnutz" tells you. And once you learn it's a database, the name reinforces that knowledge forever. Good luck remembering what "fluffnutz" does in 6 months.

hyperpape · 2025-12-11T20:25:32 1765484732

That's a really nice mnemonic. I wish I lived in an alternate universe where Postgres was called PostgreSQL so that it was easier to remember. Perhaps if we start using that, it will take over, like how everyone calls the Go project Golang.

necovek · 2025-12-12T03:02:02 1765508522

When Google introduced the Go language, it was impossible to google for any content related to it. So community quickly pivoted to always saying golang ;)

(At least that's how I remember it as I was "why name a language like that when you know it won't be searchable")

lr0 · 2025-12-11T21:49:51 1765489791

https://www.postgresql.org

https://en.wikipedia.org/wiki/PostgreSQL

hyperpape · 2025-12-11T23:44:27 1765496667

I know.

My point is that almost everyone refers to it as Postgres, because they do not actually value the descriptiveness of "PostgreSQL".

quesera · 2025-12-12T17:47:02 1765561622

> almost everyone refers to it as Postgres, because they do not actually value the descriptiveness of "PostgreSQL".

Also because the original name was, just, "Postgres". Stylized as POSTGRES.

PostgreSQL is an awful neologism (OK it's been around for a while now), and I honestly thought that they had decided to switch back to the original, and clearly superior, name. :) I recall it being under discussion several years back, and I am surprised it did not happen.

halper · 2025-12-12T04:49:10 1765514950

I always thought it was because it is more obvious how to pronounce "postgres" than "PostgreSQL".

hyperpape · 2025-12-07T15:21:21 1765120881

It's awful that there are these hallucinated citations, and the researchers who submitted them ought to be ashamed. I also put some of the blame on the boneheaded culture of academic citations.

"Compression has been widely used in columnar databases and has had an increasing importance over time.[1][2][3][4][5][6]"

Ok, literally everyone in the field already knows this. Are citations 1-6 useful? Well, hopefully one of them is an actually useful survey paper, but odds are that 4-5 of them are arbitrarily chosen papers by you or your friends. Good for a little bit of h-index bumping!

So many citations are not an integral part of the paper, but instead randomly sprinkled on to give an air of authority and completeness that isn't deserved.

I actually have a lot of respect for the academic world, probably more than most HN posters, but this particular practice has always struck me as silly. Outside of survey papers (which are extremely under-provided), most papers need many fewer citations than they have, for the specific claims where the paper is relying on prior work or showing an advance over it.

mccoyb · 2025-12-07T20:48:20 1765140500

That's only part of the reason that this type of content is used in academic papers. The other part is that you never know what PhD student / postdoc / researcher will be reviewing your paper, which means you are incentivized to be liberal with citations (however tangential) just in case someone is reading your paper, and has the reaction "why didn't they cite this work, of which I had some role in?"

Papers with a fake air of authority of easily dispatched with. What is not so easily dispatched with is the politics of the submission process.

This type of content is fundamentally about emotions (in the reviewer of your paper), and emotions is undeniably a large factor in acceptance / rejection.

zipy124 · 2025-12-07T21:03:01 1765141381

Indeed. One can even game review systems by leaving errors in for the reviewers to find so that they feel good about themselves and that they've done their job. The meta-science game is toxic and full of politics and ego-pleasing.