> Did you read the article? He doesn’t recommend natural keys, he recommends integer-based surrogates.
I am not a cryptographer, but I would want his recommendation reviewed by one. And then I would have to implement it. UUIDs have been extensively reviewed by cryptographers, there are a variety of excellent implementations I can use, and I know they solve the problem well. I know they can cause performance issues, but they're a security feature that is easy to implement, and I can deal with the performance issues if and when they crop up. (In my experience, that's unusual: even at a large company, most databases I encounter don't hold enough data for it to matter. I will err on the side of security until performance becomes a problem, which is a good problem to have.)
Why are they a security feature? They are not; the article even says so. Even if UUIDv4s are random, nothing guarantees that they are generated with a cryptographically secure random number generator, and in fact most implementations don't use one!
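Whether a given UUIDv4 comes from a CSPRNG depends on the library. In CPython, for example, `uuid.uuid4()` is built from `os.urandom`, which reads the operating system's CSPRNG; other languages and libraries may differ, so it is worth checking yours. If you want the source of randomness to be explicit, a minimal sketch using the standard `secrets` module (the helper name `secure_uuid4` is hypothetical):

```python
import secrets
import uuid


def secure_uuid4() -> uuid.UUID:
    """Build a version-4 UUID from 16 bytes of CSPRNG output.

    Hypothetical helper: passing version=4 makes the UUID constructor
    overwrite the 4 version bits and 2 variant bits, leaving 122 random bits.
    """
    return uuid.UUID(bytes=secrets.token_bytes(16), version=4)


if __name__ == "__main__":
    u = secure_uuid4()
    print(u, u.version, u.variant)
```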
In many contexts, the reason to use UUIDs is a distributed system where you want the client to choose the ID, which is then stored in multiple systems that do not communicate with each other. That is certainly a valid scenario for random UUIDs.
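A minimal sketch of that scenario, assuming two independent stores modeled as in-memory dicts (the `Store` class and `submit` function are hypothetical stand-ins for real services). The client picks the ID up front, so neither store needs a shared sequence, and a retried write with the same ID does not create a duplicate:

```python
import uuid
from dataclasses import dataclass, field


@dataclass
class Store:
    """Hypothetical stand-in for an independent system (a DB, a queue consumer...)."""
    name: str
    rows: dict = field(default_factory=dict)

    def put(self, record_id: uuid.UUID, payload: dict) -> None:
        # Upsert keyed by the client-chosen ID: retrying the same write
        # overwrites the row instead of creating a second one.
        self.rows[record_id] = payload


def submit(payload: dict, stores: list) -> uuid.UUID:
    # The client chooses the ID once, before talking to any system,
    # so the stores never have to coordinate to agree on it.
    record_id = uuid.uuid4()
    for store in stores:
        store.put(record_id, payload)
    return record_id


if __name__ == "__main__":
    billing, shipping = Store("billing"), Store("shipping")
    order_id = submit({"item": "book"}, [billing, shipping])
    print(order_id in billing.rows, order_id in shipping.rows)
```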
To me the rule is: use a UUID as the customer-facing ID for things that need an identity (e.g. a user, an order) and expose it publicly through APIs; use an integer ID as the internal identifier for relations between entities, and always keep internal IDs private. That way the more efficient numeric IDs stay inside the database and are used for joining data, while the UUID is used only to access the object from outside (through an API, for example). Internally, when joining, where you deal with a lot of rows, you use the more efficient numeric ID.
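A minimal sketch of that split using the standard-library `sqlite3` module. The table and column names (`users`, `orders`, `public_id`) and the helper functions are hypothetical, and the UUID is stored as TEXT only for simplicity. The API-facing lookup goes through `public_id`, while the foreign key and the join use the integer `id`:

```python
import sqlite3
import uuid

SCHEMA = """
CREATE TABLE users (
    id        INTEGER PRIMARY KEY,   -- internal, never leaves the DB layer
    public_id TEXT NOT NULL UNIQUE,  -- exposed through the API
    email     TEXT NOT NULL
);
CREATE TABLE orders (
    id        INTEGER PRIMARY KEY,
    public_id TEXT NOT NULL UNIQUE,
    user_id   INTEGER NOT NULL REFERENCES users(id),  -- relation on the integer
    total     INTEGER NOT NULL                        -- e.g. cents
);
"""


def create_user(conn: sqlite3.Connection, email: str) -> str:
    public_id = str(uuid.uuid4())
    conn.execute("INSERT INTO users (public_id, email) VALUES (?, ?)", (public_id, email))
    return public_id


def create_order(conn: sqlite3.Connection, user_public_id: str, total: int) -> str:
    # Resolve the public UUID to the internal integer once, at the edge.
    row = conn.execute("SELECT id FROM users WHERE public_id = ?", (user_public_id,)).fetchone()
    if row is None:
        raise KeyError(user_public_id)
    public_id = str(uuid.uuid4())
    conn.execute(
        "INSERT INTO orders (public_id, user_id, total) VALUES (?, ?, ?)",
        (public_id, row[0], total),
    )
    return public_id


def orders_for_user(conn: sqlite3.Connection, user_public_id: str) -> list:
    # The lookup filters on the UUID; the join itself runs on integer keys.
    return conn.execute(
        """
        SELECT o.public_id, o.total
        FROM users u JOIN orders o ON o.user_id = u.id
        WHERE u.public_id = ?
        ORDER BY o.id
        """,
        (user_public_id,),
    ).fetchall()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    alice = create_user(conn, "alice@example.com")
    create_order(conn, alice, 1999)
    print(orders_for_user(conn, alice))
```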
By the way, I think the habit of "using UUIDs" came from NoSQL databases, where you certainly use a UUID, but you also don't have to join data. People then transposed a best practice from one scenario to SQL, where it's not really a best practice...
If a sequential ID is exposed to the client, the client can trivially use it to determine the number of records and the relative age of any record. UUIDs solve this, and a cryptographically secure random number generator isn't really necessary for them to solve it. The author's scheme might be similarly effective, but I trust UUIDs to work well. There are obviously other ways to hide this information, but UUIDs are simple and I don't have to think about it; I just get the security benefits. I don't have to worry about keeping IDs away from clients; I can expose them freely.
I have never seen anyone post an actual example of the German tank problem creating an issue for them, only that it’s possible.
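For reference, the German tank problem is what turns a handful of sequential IDs into a record count. A minimal sketch of the classic minimum-variance unbiased estimator, N ≈ m + m/k − 1, where m is the largest observed ID and k the number of distinct IDs seen. It assumes IDs are assigned densely starting at 1 and that the observed IDs are a uniform random sample; gaps from rollbacks or deleted rows weaken that assumption. The function name `estimate_total` is hypothetical:

```python
from fractions import Fraction


def estimate_total(observed_ids: list[int]) -> Fraction:
    """Estimate how many IDs 1..N exist from k distinct observed IDs.

    Minimum-variance unbiased estimator for sampling without replacement:
    m + m/k - 1, with m = largest observed ID and k = number of distinct IDs.
    """
    ids = set(observed_ids)
    m, k = max(ids), len(ids)
    return m + Fraction(m, k) - 1


if __name__ == "__main__":
    # e.g. order numbers a customer happened to see on four receipts
    print(estimate_total([19, 40, 42, 60]))  # 74
```

Since IDs grow over time, the same numbers also give relative age: a larger ID was created later.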
> I don’t have to think about it
And here we have the main problem behind most DB issues I deal with on a daily basis: someone didn’t want to think about the implications of what they were doing, and it’s suddenly my emergency because they have no idea how to address it.
Predictable user IDs are extremely useful when you're trying to come up with an exploit that might create a privileged user, or to create an object you have access to that will be owned by users created in the near future.
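To put rough numbers on that difference, a sketch under two labeled assumptions: sequential IDs are dense and the attacker has seen the current maximum, and version-4 UUIDs come from a CSPRNG and so carry 122 random bits (128 minus 4 version bits and 2 variant bits). Both helper names are hypothetical:

```python
UUID4_RANDOM_BITS = 122  # 128 bits minus 4 version bits and 2 variant bits


def next_sequential_ids(last_seen: int, count: int) -> list[int]:
    # With a dense sequence, the next IDs to be assigned are simply
    # the integers that follow the largest one observed.
    return list(range(last_seen + 1, last_seen + 1 + count))


def uuid4_guess_probability(guesses: int) -> float:
    # Chance that `guesses` distinct guesses hit one specific UUIDv4.
    return guesses / 2**UUID4_RANDOM_BITS


if __name__ == "__main__":
    print(next_sequential_ids(1041, 3))  # [1042, 1043, 1044]
    print(uuid4_guess_probability(10**9))
```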
When I say "I don't have to think about it" I mean I don't have to think about the ways an attacker might be able to predict information about my user IDs which they could use to gain access to accounts, because I know they cannot predict information about user IDs.
You are dismissing the implications of using something less secure than UUIDs, and you haven't convinced me that I'm the one failing to think through the implications. I know there are performance problems, I know they might require some creative solutions. I am not worried about unpredictable performance issues, I am worried about unpredictable security problems.
Perhaps this is my bias coming through. I work with DBs day in and day out, and the main problem I face is performance from poorly designed schemas and queries; the next largest is referential integrity violations causing undefined behavior. The security issues I’ve found were all people doing absurdly basic stuff, like exposing an endpoint that dumped passwords.
To me, if you’re relying on having a matching PK as security, something has already gone wrong. There are ways to provide AuthN and AuthZ other than that. And yes, “defense in depth,” but if your base layer is “we have unguessable user IDs,” IME people become complacent and break it somewhere else in the stack.
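A minimal sketch of an object-level authorization check that does not depend on the ID being unguessable, assuming a hypothetical in-memory `ORDERS` store and an already authenticated user ID (the `Order` type, `OrderNotAccessible` error and `get_order` function are all illustrative). A leaked or guessed ID for someone else's order still fails:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Order:
    public_id: str
    owner_id: int
    total: int


class OrderNotAccessible(Exception):
    """Raised both when the order is missing and when it isn't yours."""


# Hypothetical in-memory store keyed by public ID.
ORDERS: dict[str, Order] = {}


def get_order(requesting_user_id: int, public_id: str) -> Order:
    order = ORDERS.get(public_id)
    # Ownership is checked on every access, whatever the ID scheme.
    # One error for "missing" and "not yours" avoids revealing which IDs exist.
    if order is None or order.owner_id != requesting_user_id:
        raise OrderNotAccessible(public_id)
    return order
```

The same check works whether `public_id` is a UUID or a sequential integer; unguessable IDs then become an extra layer rather than the base one.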
> We generate every valid 7-digit North American phone number, then for every area code, send every number in batches of 40000
> Time to go do something else for a while. Just over 27 hours and one ill-fated attempt at early season ski touring later, the script has finished happily, the logfile is full of entries, and no request has failed or taken longer than 3 seconds. So much for rate limiting. We’ve leaked every Freedom Chat user’s phone number
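The quoted write-up is about phone numbers rather than database IDs, but it makes the same point about small keyspaces. Rough arithmetic, assuming the simplified North American Numbering Plan rule that a 7-digit local number is NXX-XXXX with N in 2-9 and X in 0-9 (real plans reserve a few more codes, such as N11, so this is an upper bound); the batch size of 40000 is the one quoted above:

```python
BATCH_SIZE = 40_000  # batch size quoted in the write-up

# N (8 choices: 2-9) followed by six unrestricted digits.
local_numbers_per_area_code = 8 * 10**6
batches_per_area_code = -(-local_numbers_per_area_code // BATCH_SIZE)  # ceiling division

uuid4_keyspace = 2**122  # random bits in a version-4 UUID

if __name__ == "__main__":
    print(local_numbers_per_area_code)  # 8000000
    print(batches_per_area_code)  # 200
    print(uuid4_keyspace // local_numbers_per_area_code)
```

A few hundred requests per area code is well within reach of a script running for about a day with no rate limiting; the same enumeration against random UUIDv4s is not feasible.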