> Because it is difficult to assume what the best encoding will be for any given workload, database systems should dynamically choose encodings based on storage and workload characteristics.
It would be better to just take the storage requirement on the chin and not add a gratuitous variation in encoding, which will somehow bite you (or someone else) on the ass.
As much as possible, pick one way of doing one thing. Your stuff already has thousands of things to do. Each time you do something in two or more ways, you add combinations between that and surrounding things being done in two or more ways.
The combinatorial explosion problem is nicely solved by defining good interfaces. C++ gives you iterators and algorithms that work on iterators. Clojure has sequence interfaces and functions that work on all sequence types.
That just improves the organization of the program; it doesn't get rid of the increased risks of doing the same thing in N ways when it could be pinned down to one.
Do this thing in 3 ways, do that one in 4, do another one in 2, and you have 3 x 4 x 2 = 24 combinations, which are entirely gratuitous compared to the 1 combination that exists if all three things are done one way each.
Oh, you don't have to test the combinations because the code is bug free, is that the argument? Which is because of some good organization?
Those things are nicely isolated so 3 + 4 + 2 unit tests, and we are done?
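To put numbers on the 24-versus-9 disagreement, here is a small sketch. The variant names are made up for illustration; only the counts (3, 4 and 2) come from the comments above.

```python
from itertools import product

# Hypothetical variant names; only the counts (3, 4, 2) come from the thread.
variants = {
    "this": ["a1", "a2", "a3"],
    "that": ["b1", "b2", "b3", "b4"],
    "another": ["c1", "c2"],
}

# Testing each variant in isolation: one case per variant.
isolated_cases = sum(len(v) for v in variants.values())

# Testing every way the variants can meet: one case per combination.
combined_cases = len(list(product(*variants.values())))

print(isolated_cases, combined_cases)  # 9 24
```

Whether 9 isolated tests are enough depends on whether the variants really never interact, which is the point under dispute.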
> Because each element requires at least a 16 byte representation, both tiny and repeated short strings use more memory than they otherwise would.
In a wider view, that depends. If one is using a general-purpose heap for string storage and a 64-bit instruction set architecture, the heap is often aligning and padding out allocations to such multiples already.
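A rough model of that padding, as a sketch only: the 16-byte alignment, 8-byte header and 32-byte minimum below are assumed figures in the ballpark of common 64-bit allocators, not measurements of any particular one, and `padded_size` is a hypothetical helper.

```python
def padded_size(request: int, align: int = 16, header: int = 8) -> int:
    """Bytes a hypothetical allocator reserves for a `request`-byte allocation.

    Assumptions (not taken from any specific allocator): each chunk carries
    a `header`-byte bookkeeping word, chunk sizes are rounded up to a
    multiple of `align`, and no chunk is smaller than 2 * align bytes.
    """
    total = request + header
    rounded = (total + align - 1) // align * align
    return max(2 * align, rounded)


for n in (1, 5, 16, 24, 25, 40):
    print(n, padded_size(n))
```

Under these assumptions a 5-byte string and a 16-byte string both cost 32 bytes of heap, so a fixed 16-byte element is not obviously worse than per-string heap allocation.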
> The concept of inlined strings with prefixes (called “German Strings” by Andy Pavlo, in homage to TUM, where the Umbra paper that describes them originated) has been used in many recent database systems (Velox, Polars, DuckDB, CedarDB, etc.) and was introduced to Arrow as a new StringViewArray[^3] type. Arrow’s original StringArray is very memory efficient but less effective for certain operations. StringViewArray accelerates string-intensive operations via prefix inlining and a more flexible and compact string representation.
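For anyone who has not seen the layout, here is a minimal sketch of a 16-byte view in the style Arrow uses: a 4-byte length followed either by up to 12 inlined bytes or by a 4-byte prefix, a buffer index and an offset. The helper names `make_view` and `prefix_differs` are hypothetical, and the Umbra original stores a pointer where this sketch stores a buffer index and offset.

```python
import struct

INLINE_MAX = 12  # bytes that fit after the 4-byte length in a 16-byte view


def make_view(data: bytes, buffer_index: int = 0, offset: int = 0) -> bytes:
    """Pack `data` into a 16-byte view (illustrative, not a library API)."""
    if len(data) <= INLINE_MAX:
        # Short string: stored entirely inline; "12s" pads with NUL bytes.
        return struct.pack("<i12s", len(data), data)
    # Long string: keep only a 4-byte prefix plus where the full bytes live.
    return struct.pack("<i4sii", len(data), data[:4], buffer_index, offset)


def prefix_differs(a: bytes, b: bytes) -> bool:
    """True if two views differ in length or first 4 bytes.

    When this returns True the strings are unequal, and the comparison
    never had to follow the buffer index and offset.
    """
    return a[:8] != b[:8]


short = make_view(b"hi")
long_a = make_view(b"apple pie recipe", buffer_index=0, offset=0)
long_b = make_view(b"grape pie recipe", buffer_index=0, offset=16)
print(len(short), len(long_a), prefix_differs(long_a, long_b))
```

The fixed 16-byte size is what the quoted complaint about tiny and repeated strings refers to; the early-out on length and prefix is where the claimed speedup for string-heavy operations comes from.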
The name seems to mean nothing more than that they were invented at a German university. I spent quite some time thinking it had something to do with German's sometimes-SOV word order.
It also applies to infinitives and participles, and to the verb in nominalized noun-verb compounds. So the rule is closer to "the verb is at the end of its grammatical unit, except for the finite verb in a main clause, which appears in second position." https://en.wikipedia.org/wiki/V2_word_order