The article sums up some valid arguments against UUIDv4 as PKs but the solution ...

mort96 · 2025-12-15T11:09:07 1765796947

I tend to avoid UUIDv7 and use UUIDv4 because I don't want to leak the creation times of everything.

Now this doesn't work if you actually have enough data that the randomness of the UUIDv4 keys is a practical database performance issue, but I think you really have to think long and hard about every single use of identifiers in your application before concluding that v7 is the solution. Maybe v7 works well for some things (e.g identifiers for resources where creation times are visible to all with access to the resource) but not others (such as users or orgs which are publicly visible but without publicly visible creation times).

nbadg · 2025-12-15T11:31:28 1765798288

I'm also not a huge fan of leaking server-side information; I suspect UUIDv7 could still be used in statistical analysis of the keyspace (in a similar fashion to the german tank problem for integer IDs). Also, leaking data about user activity times (from your other comment) is a *really* good point that I hadn't considered.

I've read people suggest using a UUIDv7 as the primary key and a UUIDv4 as a user-visible one as a remedy.

My first thought when reading the suggestion was, "well but you'll still need an index on the v4 IDs, so what does this actually get you?" But the answer is that it makes joins less expensive; you only require the index once, when constructing the query from the user-supplied data, and everything else operates with the better-for-performance v7 IDs.

To be clear, in a practical sense, this is a bit of a micro-optimization; as far as I understand it, this really only helps you by improving the data locality of temporally-related items. So, for example, if you had an "order items" table, containing rows of a bunch of items in an order, it would speed up retrieval times because you wouldn't need to do as many index traversals to access all of the items in a particular order. But on, say, a users table (where you're unlikely to be querying for two different users who happen to have been created at approximately the same time), it's not going to help you much. Of course the exact same critique is applicable to integer IDs in those situations.

Although, come to think of it, another advantage of a user-visible v4 with v7 Pk is that you could use a different index type on the v4 ID. Specifically, I would think that a hash index for the user-visible v4 might be a halfway-decent way to go.

I'm still not sure either way if I like the idea, but it's certainly not the craziest thing I've ever heard.

thayne · 2025-12-16T03:02:31 1765854151

I think a bigger benefit from doing that would be that inserts would be cheaper. Instead of an expensive insert into the middle of an index for every table that needs an index on that key, you can do a cheaper insert at the end of the index for all of them except for the one that uses uuid4.

But if you are doing that, why not just use an incrementing integer instead of a uuidv7?

nbadg · 2025-12-16T08:25:05 1765873505

Certainly for many applications, the autoint approach would be fine.

The benefit of uuid in this case is that it allows horizontally scalable app servers to construct PKs on their own without risk of collisions. In addition to just reducing database load by doing the ID generation on the app server (admittedly usually a minor benefit), this can be useful either to simplify insert queries that span multiple tables with FK relationships (potentially saving some round trips in the process) or in very niche situations where you have circular dependencies in non-nullable FKs (with the constraint deferred until the end of the transaction).

throw0101a · 2025-12-15T11:53:09 1765799589

> I tend to avoid UUIDv7 and use UUIDv4 because I don't want to leak the creation times of everything.

See perhaps "UUIDv47 — UUIDv7-in / UUIDv4-out (SipHash‑masked timestamp)":

* https://github.com/stateless-me/uuidv47

* Sept 2025: https://news.ycombinator.com/item?id=45275973

wongarsu · 2025-12-15T13:38:07 1765805887

If that kind of stuff is on the able you can also use boring 64bit integer keys and encrypt those (e.g. [1]). Which in the end is just a better thought out version of what the article author did.

UUIDv47 might have a space if you need keys generated on multiple backend servers without synchronization. But it feels very niche to me.

1: https://wiki.postgresql.org/wiki/XTEA_(crypt_64_bits)

cdmckay · 2025-12-15T11:15:20 1765797320

Out of curiosity, why is it an issue if you leak creation time?

bruce511 · 2025-12-15T11:48:47 1765799327

The issue will be very context specific. In other words to (reasonably) answer the question, we'd have to judge each application individually.

For one example, say you were making voting-booth software. You really don't want a (hidden) timestamp attached to each vote (much less an incrementing id) because that would break voter confidentiality.

More generally, it's more a underlying principle of data management. Not leaking ancillary data is easier to justify than "sure we leak the date and time of the record creation, but we can't think of a reason why that matters."

Personally I think the biggest issue are "clever" programmers who treat the uuid as data and start displaying the date and time. This leads to complications ("that which is displayed, the customer wants to change"). It's only a matter of time before someone declares the date "wrong" and it must be "fixed". Not to mention time zone or daylight savings conversions.

mort96 · 2025-12-15T11:28:10 1765798090

Well you're leaking user data. I'm sure you can imagine situations where "the defendant created an account on this site on this date" could come up. And the user could have created that account not knowing that the creation date is public, because it's not listed anywhere in the publicly viewable part of the profile other than the UUID in the URL.

koakuma-chan · 2025-12-15T15:18:15 1765811895

Discord is doing fine.

mort96 · 2025-12-15T15:37:02 1765813022

Hacker news is also doing fine, even though I can just click your profile and see you joined in october 2024. It doesn't matter for every use case.

But there are cases where it matters. Using UUIDv7 for identifiers means you need to carefully consider the security and privacy implications every time you create a new table identified by a UUID, and you'll possibly end up with some tables where you use v4 and some where you use v7. Worst case, you'll end up with painful migrations from v7 to v4 as security review identifies timestamped identifiers as a security concern.

nish__ · 2025-12-15T12:17:17 1765801037

Pretty much every social media app has a "Member since X" visible on public profiles. I don't think it's an issue.

mort96 · 2025-12-15T12:22:42 1765801362

Who said I was talking about social media?

nish__ · 2025-12-15T12:25:20 1765801520

Well where else do users have public profiles?

0x3f · 2025-12-15T12:35:24 1765802124

The whole point though is that the ID itself leaks info, even if the profile is not public. There are many cases where you reference an object as a foreign key, even if you can't see the entire record of that foreign key.

nish__ · 2025-12-15T17:49:47 1765820987

I can't think of any.

strbean · 2025-12-15T21:00:08 1765832408

Sending a friend request is an obvious example.

nish__ · 2025-12-18T04:49:32 1766033372

Ok but in that case, why not use the user ID as the primary key?

natch · 2025-12-15T13:38:01 1765805881

If your system (pseudo-) random number generator (RNG) is compromised to derive a portion of its entropy from things that are knowable by knowing the time when the function ran, then the search space for cracking keys created around the same time can be shrunken considerably.

This doesn’t even rely on your system’s built-in RNG being low quality. It could be audited and known to avoid such issues but you could have a compromised compiler or OS that injects a doctored RNG.

kreetx · 2025-12-15T11:25:56 1765797956

E.g, if your service users have timestamp as part of the key and this data is visible to other users, you would know when that account was created. This could be an issue.

saaspirant · 2025-12-15T12:33:16 1765801996

There was a HN comment about competitors tracking how many new signups are happening and increasing the discounts/sales push based on that. Something like this.

JetSetIlly · 2025-12-15T12:51:17 1765803077

In a business I once worked for, one of the users of the online ordering system represented over 50% of the business' income, something you wouldn't necessarily want them to know.

However, because the online ordering system assigned order numbers sequentially, it would have been trivial for that company to determine how important their business was.

For example, over the course of a month, they could order something at the start of the month and something at the end of the month. That would give them the total number of orders in that period. They already know how many orders they have placed during the month, so company_orders / total_orders = percentage_of_business

It doesn't even have to be accurate, just an approximation. I don't know if they figured out that they could do that but it wouldn't surprise me if they had.

pezezin · 2025-12-15T22:32:04 1765837924

This is also something that depends heavily on regulations. In my home country, invoice numbers have to be sequential by law, although you can restart the numbering every year.

JetSetIlly · 2025-12-16T08:23:35 1765873415

Yes, even if it's not a legal requirement it's definitely best practice to have sequential invoice numbers. I thought about this at the time but these numbers aren't invoice numbers, only order numbers.

hexbin010 · 2025-12-16T02:16:12 1765851372

A global sequence, or sequence-per-account? I suspect the latter?

pezezin · 2025-12-16T11:31:06 1765884666

A sequence per "series", where a series can be a fiscal year, a department or category, etc. But I am not sure if you can have one series per customer, I only find conflicting information.

You can have more details here, in the section "Complete invoice":

https://sede.agenciatributaria.gob.es/Sede/en_gb/iva/factura...

https://www.boe.es/buscar/act.php?id=BOE-A-2012-14696#a6 (Spanish only)

0x3f · 2025-12-15T12:51:32 1765803092

That's happening everywhere. You can order industrial parts from a Fortune 500 and check some of the numbers on it too, if they're not careful about it.

dboreham · 2025-12-15T13:40:56 1765806056

Apart from all the other answers here: an external entity knowing the relative creation time for two different accounts, or just that the two accounts were created close in time to each other can represent a meaningful information leak.

robertlagrant · 2025-12-15T11:18:05 1765797485

Depends on the data. If you use a primary key in data about a person that shouldn't include their age (e.g. to remove age-based discrimination) then you are leaking an imperfect proxy to their age.

lwhi · 2025-12-15T11:26:37 1765797997

So the UUID could be used as an imperfect indicator of a records created time?

benterix · 2025-12-15T11:29:43 1765798183

UUIDv7 but not UUIDv4.

lwhi · 2025-12-15T11:34:16 1765798456

I suppose timing attacks become an issue too.

wongarsu · 2025-12-15T13:46:40 1765806400

UUIDv7 still have a lot of random bits. Most attacks around creating lots of ids are foiled by that

Bombthecat · 2025-12-15T11:48:47 1765799327

Admins, early users, founders, CEOs etc etc would have althe lowest creation time...

barrkel · 2025-12-15T13:31:34 1765805494

You shouldn't generally use PKs as public identifiers, least of all UUIDs, which are pretty user hostile.

mort96 · 2025-12-15T13:40:32 1765806032

I really don't see the issue with having a UUID in a URL.

formerly_proven · 2025-12-15T11:56:38 1765799798

If all you want is to obfuscate the fact that your social media site only has 200 users and 80 posts, simply use a permutation over the autoincrement primary key. E.g. IDEA or CAST-128, then encode in base64. If someone steps on your toes because somewhere in your codebase you're using a forbidden legacy cipher, just use AES-128. (This is sort of the degenerate/tautological base case of format-preserving encryption)

(What do you think Youtube video IDs are?)

enz · 2025-12-15T13:11:25 1765804285

The problem with this approach is that you now have to manage a secret key/secret for a (maybe) a very long time.

I shared this article a few weeks ago, discussing the problems with this kind of approach: https://notnotp.com/notes/do-not-encrypt-ids/

I believe it can make sense in some situations, but do you really want to implement such crypto-related complexity?

formerly_proven · 2025-12-15T16:05:29 1765814729

The article is self-contradictory in that it acts like that key is super-important ("Operations becomes a nightmare. You now have a cryptographic secret to manage. Where does this key live? Protected by a wrapping key living in a KMS or HSM? Do you use the same key across prod, staging, and dev? If dev needs to test with prod data, does it need access to prod encryption keys? What about CI pipelines? Local developer machines?") but then also acknowledges that we're talking about an obfuscation layer of stuff which is not actually sensitive ("to hide timestamps that aren't sensitive"). Don't get me wrong, it's a definitive drawback for scaling the approach, but most applications have to manage various secrets, most of which are actually important. E.g. session signing keys, API keys etc. It's still common for applications to use signed session with RCE data formats. The language from that article, while not wrong, is much more apt for those keys.

That being said, while fine for obfuscation, it should not be used for security for this purpose, e.g. hidden/unlisted links, confirmation links and so on. Those should use actual, long-ish random keys for access, because the inability to enumerate them is a security feature.

benterix · 2025-12-15T12:47:07 1765802827

I always thought they are used and stored as they are because the kind of transformation you mention seems terribly expensive given the YT's scale, and I don't see a clear benefit of adding any kind of obfuscation here.

pdimitar · 2025-12-15T12:21:29 1765801289

> What do you think Youtube video IDs are?

I actually haven no idea. What are they?

(Also what is the format of their `si=...` thing?)

intalentive · 2025-12-15T16:56:38 1765817798

Can’t recall where I heard this, but I’m pretty sure the si=… is tracking information that associates the link with the user who shared it.

pdimitar · 2025-12-15T17:17:37 1765819057

Oh absolutely, I am just wondering _what_ does it contain.

wpollock · 2025-12-19T00:17:48 1766103468

YouTube video ids are just integers in a base-64 encoding, modified to be URL safe.

pdimitar · 2025-12-19T00:37:13 1766104633

Interesting. Any examples? I mean, I can probably reverse-engineer something myself but just curious.

I am much more interested in the `si` parameter.. but I am fairly sure nobody outside of Google knows what it is exactly.

Retr0id · 2025-12-15T13:02:31 1765803751

Why not use AES-128 by default? Your CPU has instructions to accelerate AES-128.

conradfr · 2025-12-15T13:45:06 1765806306

Can't you just change the starting value of your sequence?

stickfigure · 2025-12-15T17:28:33 1765819713

In Postgres I often like to use a single sequence for everything. It leaks some information yes but in a busy system it tends to be "obscure enough".

x0x0 · 2025-12-15T19:06:21 1765825581

It's not leaking that's the concern. It's that not having the names of objects be easily enumerable is a strongly security-enhancing feature of a system.

Yes of course everyone should check and unit test that every object is owned by the user or account loading it, but demanding more sophistication from an attacker than taking "/my_things/23" and loading "/my_things/24" is a big win.

stickfigure · 2025-12-15T21:57:52 1765835872

With a single sequence and a busy system, the ids for most high-level tables/collection are extremely sparse. This doesn't mean they can't be enumerated, but you will probably notice if you suddenly start getting hammered with 404s or 410s or whatever your system generates on "not found".

Also, if most of your endpoints require auth, this is not typically a problem.

It really depends on your application. But yes, that's something to be aware of. If you need some ids to be unguessable, make sure they are not predictable :-)

thayne · 2025-12-16T03:34:59 1765856099

If you have a busy system, a single sequence is going to be a pretty big performance bottleneck, since every resource creation will need to acquire a lock on that sequence.

x0x0 · 2025-12-15T23:24:53 1765841093

> Also, if most of your endpoints require auth, this is not typically a problem.

Many systems are not sparse, and separately, that's simply wrong. Unguessable names is not a primary security measure, but a passive remediation for bugs or bad code. Broken access control remains an owasp top 10, and idor is a piece of that. Companies still get popped for this.

See, eg, google having a bug in 2019, made significantly less impactful by unguessable names https://infosecwriteups.com/google-did-an-oopsie-a-simple-id...