I still don’t understand how a single language can have multiple (what is it now, half a dozen?) different type checkers, all with different behaviour.
Do library authors have to test against every type checker to ensure maximum compatibility? Do application developers need to limit their use of libraries to ones that support their particular choice of type checker?
Users of a library will generally instruct their type-checker not to check the library.
So only the outer API surface of the library matters. That’s mostly explicitly typed functions and classes so the room for different interpretations is lower (but not gone).
This is obviously out the window for libraries like Pydantic, Django etc with type semantics that aren’t really covered by the spec.
You’re talking about a duck-typed language with optional type annotations. I love Python, but that combination goes some way toward explaining why there are so many different implementations.
The annotations themselves have fairly well-defined semantics. The behavior of type checkers in the absence of annotations, where types are ambiguous, is less well defined; a common case is when the type is a generic collection type but the defining position is an assignment to an empty collection, so the correct specialization of the generic type is ambiguous.
TypeScript widens the type of x to allow `number | string`; there are no type errors below:
const x = []
x.push(1)
type t = typeof x
// ^? type t = number[]
x[0] = "new"
type t2 = typeof x
// ^? type t2 = (number | string)[]
const y = x[0] + "oops"
// ^? const y: string
> Either way, you didn't annotate the code so it's kind of pointless to discuss.
There are several literals in that code snippet; I could annotate them with their types, and this code would still be exactly as it is. You asked why there are competing type checkers, and the fact that the language is only optionally typed means ambiguity like that example exists; whether it should be a warning, an error, or simply allowed is a judgment call, so choose the type checker that most closely matches the semantics you want to impose.
> There are several literals in that code snippet; I could annotate them with their types, and this code would still be exactly as it is.
Well, no, there is one literal that has an ambiguous type, and if you annotated its type, it would resolve entirely the question of what a typechecker should say; literally the entire reason it is an open question is because that one literal is not annotated.
True, you could annotate 3 of the 4 literals in this without annotating the list, which is the ambiguous one. In the absence of an explicit annotation (because annotations are optional), type checkers are left to guess intent: whether you wanted a List[Any] or a List[number | string], or whether you wanted a List[number] or a List[string].
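As a minimal Python sketch of that ambiguity (the variable names are just illustrative): with no annotation a checker has to guess the element type from later usage, while an explicit annotation removes the guesswork.

from typing import Any

# Unannotated: the element type of the empty list is ambiguous at the
# point of assignment, so a checker must infer it from later usage
# (or report that an annotation is needed).
xs = []
xs.append(1)
xs.append("new")

# Annotated: the intent is stated up front, so there is nothing to guess.
ys: list[int | str] = []
ys.append(1)
ys.append("new")

zs: list[Any] = []   # "I don't care about the element type"
ns: list[int] = []   # appending "new" to this one should be flagged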
> And the fact that python doesn't specify the semantics of its type annotations is a super interesting experiment.
That hasn't been a fact for quite a while. Now, it does specify the semantics of its type annotations. It didn't when it first introduced annotations in Python 3.0 (PEP 3107), but it has done so progressively since, starting with Python 3.5 (PEP 484) and continuing through several subsequent PEPs, including the creation of the Python Typing Council (PEP 729).
> I could annotate them with their types, and this code would still be exactly as it is.
Well, no, you didn't. Because it's not clear whether the list is a list of values of a single type or a list of values of distinct types. And there are many other ways you could quibble with this statement.
> Do library authors have to test against every type checker to ensure maximum compatibility?
Yes, but in practice, the ecosystem mostly tests against mypy. pyright has been making some inroads, mostly because it backs the diagnostics of the default VS Code Python extension.
> Do application developers need to limit their use of libraries to ones that support their particular choice of type checker?
You can provide your own type stubs instead of using the library's built-in types or existing stubs.
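For instance, here is a minimal sketch of a local stub file (the module and function names are hypothetical) that a project could point its checker at, e.g. via something like mypy's mypy_path or pyright's stubPath, instead of the library's bundled annotations:

# stubs/somelib.pyi: your own view of a hypothetical "somelib" public API,
# used in place of the library's bundled types or existing stubs.
from typing import Any

def fetch(url: str, *, timeout: float = ...) -> dict[str, Any]: ...

class Client:
    def __init__(self, token: str) -> None: ...
    def get(self, path: str) -> bytes: ...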
I am not that surprised, to be honest. Basically every C/C++ static analyzer out there does (among other things) some amount of additional "custom" type checking to catch operations that are legal up to the standard, but may cause issues at runtime. Of course in Python you have gradual typing which adds to the complexity, but truly well-formalised type systems are not that common in the industry.
> do the build process on alpine linux and […] statically link musl libc
IIRC it used to be common to do builds on an old version of RHEL or CentOS and dynamically link an old version of glibc. Binaries would then work on newer systems because glibc is backwards compatible.
If you need glibc for any kind of reason, that approach is still used. But that won’t save you if no glibc is available. And since the folks here want to produce a musl build anyways for alpine, the easier approach is to just go for musl all the way.
> The values in tuples cannot change. The values that keys point to in a frozen dict can?
The entries of a tuple cannot be assigned to, but the values can be mutated. The same is true for a `frozendict` (according to the PEP they don't support `__setitem__`, but "values can be mutable").
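The same shallow-immutability behaviour is easy to demonstrate with a tuple today, and the PEP describes frozendict as behaving analogously:

t = ([1, 2], [3, 4])

t[0].append(5)      # fine: the contained list is still mutable
print(t)            # ([1, 2, 5], [3, 4])

try:
    t[0] = []       # rebinding an entry is what's forbidden
except TypeError as e:
    print(e)        # 'tuple' object does not support item assignment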
I wonder whether Raymond Hettinger has an opinion on this PEP. A long time ago, he wrote: "freezing dicts is a can of worms and not especially useful".
This is why I love how Rust approached this; almost by accident to make borrow checking work. Every reference is either mutable or not, and (with safe code), you can't use an immutable reference to get a mutable reference anywhere down the chain. So you can slowly construct a map through a mutable reference, but then return it out of a function as immutable, and that's the end of it. It's no longer ever mutable, and no key or value is either. There's no need to make a whole new object called FrozenHashMap, and then FrozenList, and FrozenSet, etc. You don't need a StringBuilder because String is mutable, unless you don't want it to be. It's all just part of the language.
Kotlin _kinda_ does this as well, but if you have a reference to an immutable map in Kotlin, you are still free to mutate the values (and even keys!) as much as you like.
Only if you're returning a reference or wrapping it in something that will only ever return a reference. If you return an object by value ('owned'), then you can do what you like with it, and 'mut' is just a light guardrail on that particular name for it.
You cannot return an immutable version. You can return it owned (in which case you can assign/reassign it to a mut variable at any point) or you can take a mut reference and return an immutable reference - but whoever is the owner can almost always access it mutably.
Arg, you’re right. Not sure what I was thinking there. I still think my point stands, because you get the benefits of immutability, but yeah, I didn’t explain it well.
It explains a lot about the design of Python container classes, and the boundaries of polymorphism / duck typing with them, and mutation between them.
I don't always agree with the choices made in Python's container APIs...but I always want to understand them as well as possible.
Also worth noting that understanding changes over time. Remember when GvR and the rest of the core developers argued adamantly against ordered dictionaries? Haha! Good times! Thank goodness their first wave of understanding wasn't their last. Concurrency and parallelism in Python was a TINY issue in 2006, but at the forefront of Python evolution these days. And immutability has come a long way as a design theme, even for languages that fully embrace stateful change.
> Also worth noting that understanding changes over time. Remember when GvR and the rest of the core developers argued adamantly against ordered dictionaries? Haha! Good times!
The new implementation has saved space, but there are opportunities to save more space (specifically after deleting keys) that they've now denied themselves by offering the ordering guarantee.
Ordering, like stability in sorting, is an incredibly useful property. If it costs a little, then so be it.
This is optimizing for the common case, where memory is generally plentiful and dicts grow more than they shrink. Python has so many memory inefficiencies that occasional tombstones in the dict internal structure is unlikely to be a major effect. If you're really concerned, do `d = dict(d)` after aggressive deletion.
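A minimal sketch of that rebuild (CPython behaviour): deleting keys leaves the dict's allocation untouched, while copying into a fresh dict sizes it for the surviving entries.

import sys

d = {i: i for i in range(1_000)}
for i in range(990):                  # aggressive deletion leaves internal slack
    del d[i]

print(len(d), sys.getsizeof(d))       # 10 keys, but still sized for ~1000
d = dict(d)                           # rebuild compactly from the survivors
print(len(d), sys.getsizeof(d))       # 10 keys, much smaller allocation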
> Ordering, like stability in sorting, is an incredibly useful property.
I can't say I've noticed any good reasons to rely on it. Didn't reach for `OrderedDict` often back in the day either. I've had more use for actual sorting than for preserving the insertion order.
This morning, for example, I was testing an object serialized through a JSON API. My test data never seemed to match from one run to the next.
After a while, I realized one of the objects was using a set of objects, which the API turned into a JSON array, but the order of that array would change depending on the initial Python VM state.
3 days ago, I used itertools.groupby to group a bunch of things. But itertools.groupby only works on iterables that are sorted by the grouping key.
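A minimal sketch of that gotcha: without sorting by the grouping key first, groupby starts a new group every time the key changes.

from itertools import groupby
from operator import itemgetter

rows = [("b", 1), ("a", 2), ("b", 3)]

# Unsorted input: "b" shows up as two separate groups.
print([(k, len(list(g))) for k, g in groupby(rows, key=itemgetter(0))])
# [('b', 1), ('a', 1), ('b', 1)]

# Sort by the same key first, then group.
rows.sort(key=itemgetter(0))
print([(k, len(list(g))) for k, g in groupby(rows, key=itemgetter(0))])
# [('a', 1), ('b', 2)]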
Now granted, none of those recent examples are related to dicts, but dict is not a special case. And it's iterated over regularly.
Personally, I find lots of reasons to prefer an ordered dict to an unordered one. Even small effects like "the debugging output will appear in a consistent order, making it easier to compare" can be motivation enough in many use cases.
Thinking about this up front, I am actually wondering why this is useful outside of equality comparisons.
Granted, I live and work in TypeScript, where I can't `===` two objects but I could see this deterministic behavior making it easier for a language to compare two objects, especially if equality comparison is dependent on a generated hash.
The other is guaranteed iteration order, if you rely on the index-contents relationship of an iterable. We're talking about dicts, which are keyed, but extending this idea to lists, I can see the usefulness in some scenarios.
Beyond that, I'm not sure it matters, but I also realize I could simply not have enough imagination at the moment to think of other benefits
Same. Recently I saw interview feedback where someone complained that the candidate used OrderedDict instead of the built-in dict that is now ordered, but they'll let it slide... As if writing code that will silently do different things depending on minor Python version is a good idea.
Well it's been guaranteed since 3.7 which came out in 2018, and 3.6 reached end-of-life in 2021, so it's been a while. I could see the advantage if you're writing code for the public (libraries, applications), but for example I know at my job my code is never going to be run with Python 3.6 or older.
Honestly, if I were writing some code that depended on dicts being ordered, I think I'd still use OrderedDict in modern Python. It gives the reader more information that I'm doing something slightly unusual.
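For what it's worth, the two aren't fully interchangeable even on a dict-is-ordered Python: besides signalling intent, OrderedDict compares order-sensitively and has move_to_end, while plain dict equality ignores order.

from collections import OrderedDict

a = {"x": 1, "y": 2}
b = {"y": 2, "x": 1}
print(a == b)            # True: dict equality ignores order
print(list(a), list(b))  # insertion order is preserved in both

oa, ob = OrderedDict(a), OrderedDict(b)
print(oa == ob)          # False: OrderedDict equality is order-sensitive

oa.move_to_end("x")      # explicit reordering API that plain dict lacks
print(list(oa))          # ['y', 'x']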
It seems like opinions really differ on this item, then. I love insertion ordering in mappings, and Python adopting it was a big revelation. The main reason is that keys need some order, and insertion order -> iteration order is a lot better than a pseudorandom (hash-based) order.
For me, it creates more reproducible programs and scripts, even simple ones.
This was 19 (almost 20) years ago.
As stated in the lwn.net article, a lot of concurrency has been added to python, and it might now be time for something like a frozendict.
Things that were not useful in 2006 might be totally useful in 2026 ;P
Still, like you, I'm curious whether he has anything to say about it.
I think Raymond Hettinger is called out specially here because he did a well known talk called [Modern Dictionaries](https://youtu.be/p33CVV29OG8) where around 32:00 to 35:00 in he makes the quip about how younger developers think they need new data structures to handle new problems, but eventually just end up recreating / rediscovering solutions from the 1960s.
“What has been is what will be, and what has been done is what will be done; there is nothing new under the sun.”
I think he was always reluctant to add features, and his version of Python would be slimmer, beautiful, and maybe 'finished'. His voice is definitely not guiding contemporary Python development, which is more expansionist in terms of features.
It's interesting that he concludes that freezing dicts is "not especially useful" after addressing only a single motivation: the use of a dictionary as a key.
He doesn't address the reason that most of us in 2025 immediately think of, which is that it's easier to reason about code if you know that certain values can't change after they're created.
You can't really tell though. Maybe the dict is frozen but the values inside aren't. C++ tried to handle this with constness, but that has its own caveats that make some people argue against using it.
Indeed. So I don't really understand what this proposal tries to achieve. It even explicitly says that dict → frozendict will be an O(n) shallow copy, and the contention is only about the O(n) part. So… yeah, I'm sure they are useful for some cases, but as Raymond said, it doesn't seem to be especially useful, and I don't understand what people ITT are getting excited about.
Ha, Raymond Hettinger has a lot of opinions. Great guy and I admire his dedication to Python, but in my own experience (and the experience of some other contributors), he has a chilling effect on contributions to certain parts of the CPython code base.
> Another PEP 351 world view is that tuples can serve as frozenlists; however, that view represents a Liskov violation (tuples don't support the same methods). This idea resurfaces and has to be shot down again every few months.
... Well, yes; it doesn't support the methods for mutation. Thinking of ImmutableFoo as a subclass of Foo is never going to work. And, indeed, `set` and `frozenset` don't have an inheritance relationship.
I normally find Hettinger very insightful so this one is disappointing. But nobody's perfect, and we change over time (and so do the underlying conditions). I've felt like frozendict was missing for a long time, though. And really I think the language would have been better with a more formal concept of immutability (e.g. linking it more explicitly to hashability; having explicit recognition of "cache" attributes, ...), even if it didn't go the immutable-by-default route.
Apple (or perhaps NeXT) has solved this problem already in Objective-C. Look at NSArray and NSMutableArray, or NSData and NSMutableData. It’s intuitive and Liskov-correct to make the mutable version a subclass of the immutable version. And it’s clearly wrong to have the subclass relationship the other way around.
Given how dynamic Python is, such a subclass relationship need not be evident at the C level. You can totally make one class whose implementation is independent of another class a subclass of the other, using PEP 3119. This gives implementations complete flexibility in how to implement the class while retaining the ontological subclass relationship.
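A minimal sketch of that PEP 3119 mechanism, using a made-up FrozenMapping ABC purely for illustration: register() declares the subclass relationship without any shared implementation.

from abc import ABC

class FrozenMapping(ABC):    # hypothetical read-only-mapping ABC
    pass

# dict's implementation is untouched, but it is now treated as a
# (virtual) subclass of FrozenMapping by isinstance/issubclass.
FrozenMapping.register(dict)

print(issubclass(dict, FrozenMapping))   # True
print(isinstance({}, FrozenMapping))     # True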
> ImmutableFoo as a subclass of Foo is never going to work. And, indeed, `set` and `frozenset` don't have an inheritance relationship.
Theoretically, could `set` be a subclass of `frozenset` (and `dict` of `frozendict`)? Do other languages take that approach?
> linking [immutability] more explicitly to hashability
AFAIK immutability and hashability are equivalent for the language's "core" types. Would it be possible to enforce that equivalence for user-defined types, given that mutability and the implementation of `__hash__` are entirely controlled by the programmer?
> Theoretically, could `set` be a subclass of `frozenset` (and `dict` of `frozendict`)?
At one extreme: sure, anything can be made a subclass of anything else, if we wanted to.
At the other extreme: no, since Liskov substitution is an impossibly-high bar to reach; especially in a language that's as dynamic/loose as Python. For example, consider an expression like '"pop" in dir(mySet)'
> Barbara Liskov and Jeannette Wing described the principle succinctly in a 1994 paper as follows:[1]
> > Subtype Requirement: Let ϕ(x) be a property provable about objects x of type T. Then ϕ(y) should be true for objects y of type S where S is a subtype of T.
My expression `"pop" in dir(mySet)` gives an explicit example of how `set` and `frozenset` are not subtypes of each other (regardless of how they're encoded in the language, with "subclasses" or whatever). In this case ϕ(x) would be the property `("pop" in dir(x)) == False`, which holds for objects x of type frozenset, yet does not hold for objects y of type set.
Your example of `hasattr(mySet, 'pop')` gives another property that would be violated.
My point is that avoiding "Liskov violations" is ("theoretically") impossible, especially in Python (which allows programs to introspect/reflect on values, using facilities like 'dir', 'hasattr', etc.).
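Concretely, both properties are easy to check in a REPL:

print("pop" in dir(frozenset()))    # False: frozenset has no mutator methods
print("pop" in dir(set()))          # True: set does

print(hasattr(frozenset(), "pop"))  # False
print(hasattr(set(), "pop"))        # True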
The root of the issue here is that the Liskov substitution principle simply takes ϕ(x) to be some property satisfied by objects of a class. It does not distinguish between properties the author of the class designed to be satisfied and properties that merely happen to be satisfied by this particular implementation. But Hyrum's Law states that properties which are accidentally true can become relied upon and, as time passes, become intrinsic. This suggests to me that the crux of the problem is that people don't communicate sufficiently about the invariants and non-invariants of their code.
> > Subtype Requirement: Let ϕ(x) be a property provable about objects x of type T. Then ϕ(y) should be true for objects y of type S where S is a subtype of T.
This says "if hasattr(parent, 'pop') == True then hasattr(child, 'pop') must be True". This is not violated in this case, since hasattr(parent, 'pop') is False. If you want to extend the above definition so that negative proofs concerning the parent should also hold true for the child, then subtyping becomes impossible since all parent and child types must be identical, by definition.
The property in question is `hasattr(x, "pop") is False`.
> If you want to extend the above definition so that negative proofs concerning the parent should also hold true for the child, then subtyping becomes impossible since all parent and child types must be identical, by definition.
The distinction isn’t “negative proofs”, but yes, that’s their point. In Python, you have to draw a line as to which observable properties are eligible.
ImmutableFoo can't be a subclass of Foo, since it loses the mutator methods. But nor can Foo be a subclass of ImmutableFoo, since it loses the axiom of immutability (e.g. thread-safety) that ImmutableFoo has.
When you interpret Liskov substitution properly, it's very rare that anything Liskov-substitutes anything, making the entire property meaningless. So just do things based on what works best in the real world and aim for as much Liskov-substitution as is reasonable. Python is duck-typed anyway.
It's a decent guiding principle - Set and ImmutableSet are more substitutable than Set and Map, so Set deriving from ImmutableSet makes more sense than Set deriving from Map. It's just not something you can ever actually achieve.
This is because the ABC system is defined such that MutableMapping is a subtype of Mapping. Which mostly makes sense, except that if we suppose there exist Mappings that aren't MutableMappings (such that it makes sense to recognize two separate concepts in the first place), then Mapping should be hashable, because immutable things generally should be hashable. Conceptually, making something mutable adds a bunch of mutation methods, but it also ought to take away hashing. So Liskov frowns regardless.
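You can see the current arrangement in collections.abc: the subtype relationship is there, but being a Mapping buys you nothing about hashability.

from collections.abc import Mapping, MutableMapping

print(issubclass(MutableMapping, Mapping))  # True: mutable is a subtype of read-only
print(isinstance({}, Mapping))              # True: dict counts as a Mapping

try:
    hash({})                                # dict instances are unhashable
except TypeError as e:
    print(e)                                # unhashable type: 'dict'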
It really doesn't make sense for there to be an inheritance relationship between Mapping and MutableMapping if Mapping is immutable (it isn't, of course), but the weirder part is still just that the typing machinery is cool with unhashable key types like:
I agree, same with frozenset. If you really want to use one of those as a key, convert to a tuple. There might be niche use cases for all this, but it's not something that the language or even the standard lib need to support.
Problem being that sets aren't consistently ordered and conversion to a tuple can result in an exponential (specifically, factorial) explosion in the number of possible keys associated with a single set. Nor can you sort all objects. Safe conversion of sets to tuples for use as keys is possible but the only technique I know requires an auxiliary store of objects (mapping objects to the order in which they were first observed), which doesn't parallelize well.
tuple(sorted(s)) and if you can't even sort the values, they're probably not hashable. I get that this involves a copy, but so does frozenset, and you can cross that bridge in various ways if it's ever a problem.
I'm not saying the problem with tuple doesn't exist, but that there doesn't need to be a built-in way to deal with it. If for some unfortunate reason you've got a mixed-type set that you also want to use as a dict key, you can write a helper.
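A minimal sketch of both options: frozenset gives an order-insensitive hashable key directly, while tuple(sorted(...)) works whenever the elements are sortable.

s = {3, 1, 2}

d = {}
d[frozenset(s)] = "via frozenset"         # hashable and order-insensitive
d[tuple(sorted(s))] = "via sorted tuple"  # always (1, 2, 3), whatever the set order

print(d[frozenset({2, 3, 1})])            # same key, same entry
print(d[(1, 2, 3)])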
> Jack Crenshaw's tutorial takes the syntax-directed translation approach, where code is emitted while parsing, without having to divide the compiler into explicit phases with IRs.
Is "syntax-directed translation" just another term for a single-pass compiler, e.g. as used by Lua (albeit to generate bytecode instead of assembly / machine code)? Or is it something more specific?
> in the latter parts of the tutorial it starts showing its limitations. Especially once we get to types [...] it's easy to generate working code; it's just not easy to generate optimal code
So, using a single-pass compiler for a statically-typed language makes it difficult to apply type-based optimizations. (Of course, Lua sidesteps this problem because the language is dynamically typed.)
Are there any other downsides? Does single-pass compilation also restrict the level of type checking that can be performed?
As long as the language you're compiling has a strict define-before-use rule and no advanced inference is required, you will know the types of expressions and can perform type-based optimizations. You can also do constant folding and (very rudimentary) inlining. But the best optimizations are done on IRs, which you don't have access to in an old-school single-pass design. LICM, CSE, GVN, DCE, and all the countless loop opts are not available to you. You'll also spill to memory a lot, because you can't run a decent regalloc in a single pass.
I'm actually a big fan of function-by-function dual-pass compilation. You generate IR from the parser in one pass, and do codegen right after. Most intermediate state is thrown out (including the AST, for non-polymorphic functions) and you move on to the next function. This gives you an extremely fast, data-oriented baseline compiler with reasonable codegen (much better than something like tcc).
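To make the "syntax-directed translation" question above concrete, here is a deliberately tiny sketch (my own code, in Python rather than Crenshaw's Pascal) of a recursive-descent expression parser that emits stack-machine instructions as it parses, with no AST or IR in between:

import re

def compile_expr(src):
    """Single-pass, syntax-directed translation of +,-,*,/ expressions
    over integers into instructions for a toy stack machine."""
    tokens = re.findall(r"\d+|[-+*/()]", src)
    pos = 0
    code = []

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def take():
        nonlocal pos
        tok = tokens[pos]
        pos += 1
        return tok

    def factor():                      # factor := NUMBER | '(' expr ')'
        if peek() == "(":
            take()
            expr()
            take()                     # consume ')'
        else:
            code.append(("PUSH", int(take())))

    def term():                        # term := factor (('*'|'/') factor)*
        factor()
        while peek() in ("*", "/"):
            op = take()
            factor()
            code.append(("MUL",) if op == "*" else ("DIV",))

    def expr():                        # expr := term (('+'|'-') term)*
        term()
        while peek() in ("+", "-"):
            op = take()
            term()
            code.append(("ADD",) if op == "+" else ("SUB",))

    expr()
    return code

print(compile_expr("1 + 2 * (3 - 4)"))
# [('PUSH', 1), ('PUSH', 2), ('PUSH', 3), ('PUSH', 4), ('SUB',), ('MUL',), ('ADD',)]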
There is a TV movie, In Pursuit of Honor (1995), claiming to be based on true events. A short online search suggests that such events were never really documented, but it's plausible that similar things happened.
> In Pursuit of Honor is a 1995 American made-for-cable Western film directed by Ken Olin. Don Johnson stars as a member of a United States Cavalry detachment refusing to slaughter its horses after being ordered to do so by General Douglas MacArthur. The movie follows the plight of the officers as they attempt to save the animals that the Army no longer needs as it modernizes toward a mechanized military.