The project where I evaluated Mimir had 500+ million timeseries, with the desire to support scaling to the ten-figure level (billions of timeseries). This was at a BigCo supporting hundreds of product development teams.
With all of these systems that store metrics in object storage, you have to remember that object storage is not file storage. Generally speaking (S3 Express One Zone being a relatively recent exception), you cannot append to objects. Metrics queries are resolved by combining historical metrics in object storage with a stateful service that holds the latest 2 hours of data until it can be compressed and uploaded to object storage as a single block. At a certain scale, that stateful service has to choose which is more important: answering queries or inserting more timeseries. If you don't prioritize insertion, the backlog just gets bigger and bigger, and in the eventual case (Murphy's Law guarantees it) of a sudden flood of metrics to ingest, you get ingestion delays of several hours during which you are blind. If you do prioritize insertion, the component simply won't respond to queries, which makes you blind anyway. Lose-lose.
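To make the "historical blocks plus latest 2 hours" split concrete, here is a minimal Python sketch of how a query's time range could be divided between the two sources. The function name `split_query_range` and the hard cut at exactly 2 hours are my own simplifications for illustration, not how Mimir or Thanos actually implement it (real systems overlap the boundary and deduplicate).

```python
BLOCK_SECONDS = 2 * 60 * 60  # the "latest 2 hours" held by the stateful service


def split_query_range(start, end, now, head_window=BLOCK_SECONDS):
    """Split the half-open range [start, end) into two sub-ranges.

    Returns (historical, recent), where each is a (start, end) tuple or None:
      - historical: served from immutable blocks in object storage
      - recent:     served from the stateful in-memory component
    Hypothetical helper; assumes a hard boundary at now - head_window.
    """
    head_start = now - head_window
    historical = (start, min(end, head_start)) if start < head_start else None
    recent = (max(start, head_start), end) if end > head_start else None
    return historical, recent


if __name__ == "__main__":
    now = 10_000
    # A query spanning both sides of the boundary has to hit both sources.
    print(split_query_range(0, 10_000, now))
    # A query for recent data only touches the stateful component.
    print(split_query_range(5_000, 6_000, now))
```

The point of the sketch is that any dashboard looking at "the last N hours" always touches the stateful component, which is why that component becomes the bottleneck.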
Mimir built in Kafka because it's quite literally necessary at scale. You need the stateful query component (the one holding the latest 2 hours) to prioritize queries and pull from the Kafka topic on a lower-priority thread, when there's spare time to do so. Kafka soaks up sudden ingestion floods so that they don't DoS the stateful query component.
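A toy simulation of that idea, as a sketch only: a `deque` stands in for the Kafka topic, every query and every sample is assumed to cost one unit of work, and the class and method names (`Ingester`, `produce`, `submit_query`, `tick`) are made up for illustration. None of this is Mimir's actual code.

```python
from collections import deque


class Ingester:
    """Stateful component that answers queries first, then drains the log."""

    def __init__(self, capacity_per_tick):
        self.capacity_per_tick = capacity_per_tick  # work units per tick (assumed)
        self.log = deque()      # stand-in for a Kafka topic: a durable buffer
        self.queries = deque()  # pending queries
        self.ingested = 0
        self.answered = 0

    def produce(self, n_samples):
        # Writers only append to the log; they never block on the ingester.
        self.log.extend(range(n_samples))

    def submit_query(self, query):
        self.queries.append(query)

    def tick(self):
        budget = self.capacity_per_tick
        # 1. Queries get first claim on the budget.
        while budget > 0 and self.queries:
            self.queries.popleft()
            self.answered += 1
            budget -= 1
        # 2. Whatever budget is left over drains the backlog.
        while budget > 0 and self.log:
            self.log.popleft()
            self.ingested += 1
            budget -= 1


if __name__ == "__main__":
    ing = Ingester(capacity_per_tick=5)
    ing.produce(12)  # sudden flood of samples
    for q in ("q1", "q2", "q3"):
        ing.submit_query(q)
    ing.tick()
    print(ing.answered, ing.ingested, len(ing.log))
```

In this model a flood only makes the backlog in the log longer; queries keep being answered every tick, and the backlog drains as soon as there is spare capacity.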
I took a quick look at VictoriaMetrics - no Kafka or Kafka-like component to soak up ingestion floods? DOA.
Again, most companies are not BigCos. If you're a startup/scaleup with one VP supervising several development teams, you likely don't need that scale, and VictoriaMetrics is probably just fine. You're not the first person I've heard recommend it. But I would say 80% of companies are small enough to be served by a simple Prometheus or a Thanos Query over HA Prometheus setup, 17% of companies will get a lot of value out of VictoriaMetrics, and the last 3% really need Mimir's scalability.
I'm not sure where you saw that VictoriaMetrics uses object storage. It doesn't - it uses block storage, and it runs completely fine on HDD; you don't even need SSD/NVMe.
There are multiple ways to deal with ingestion floods. Kafka/distributed log is one of them, but it's not the only one. In cluster mode VM is a distributed set of services that scale out independently and buffer at different levels.
Resource usage for ingestion/storage is much lower than with other solutions, and you get more for your money. At $PREVIOUS_JOB, we migrated from a very expensive Thanos setup to a VM cluster backed by HDDs, and saved a lot. Performance was much better as well. It was a while ago, and I don't remember the exact number of time series, but it was meant to handle 10k+ virtual machines (and a lot of other resources, multiple k8s clusters) and did it with ease (for everybody involved, too).
I don't think you have really looked into VM - you might be pleasantly surprised by what you find :) Check out this benchmark against Mimir[1] (it is a few years old, though), and some case studies [2]. Some of the companies in the case studies run at significantly higher volume than your requirements.
There were other problems with VictoriaMetrics: a failed migration attempt by previous engineers that made it politically difficult to raise as a possibility, lack of a promise of full PromQL compatibility (too many PromQL dashboards built by too many teams), and seeing features locked behind the Enterprise version (Mimir Enterprise had features added on top, not features locked away).
> HDD
You're right, I'm misremembering here. That particular complaint about a lack of Kafka was a Thanos issue, not VM.
That said, HDD is a hard sell to management. Seen as "not cloud native". People with old trauma from 100% full disks not expanded in time. Organizational perception that object storage does not need to be backed up (because redundancy is built into the object storage system) but HDD does (and automated backups are a VM Enterprise feature, and even more important if storing long-term metrics in VM).
> In cluster mode VM is a distributed set of services that scale out independently and buffer at different levels
So are Thanos and Mimir, which suffer from ingest floods causing DoS, at least until Kafka was added. vminsert is billed as stateless, same as Thanos Receiver, same as Mimir Distributor. Not convinced.
> lack of a promise of full PromQL compatibility (too many PromQL dashboards built by too many teams)
This is classic FUD. VictoriaMetrics is used as a drop-in replacement for Prometheus, Thanos and Mimir. It works perfectly across all the existing dashboards in Grafana, and across all the existing recording and alerting rules. I'm unaware of any VictoriaMetrics users who hit PromQL compatibility issues during a migration from Prometheus, Thanos or Mimir to VictoriaMetrics. There are a few deliberate incompatibilities aimed at improving user experience. See https://medium.com/@romanhavronenko/victoriametrics-promql-c...
> seeing features locked behind the Enterprise version (Mimir Enterprise had features added on top, not features locked away)
All the VictoriaMetrics features that are useful across the majority of practical use cases are included in the open-source version. The main Enterprise feature is high-quality technical support by VictoriaMetrics engineers. Other Enterprise features are needed only by large enterprise companies. See https://docs.victoriametrics.com/victoriametrics/enterprise/