
Nice work! If you’re looking for more questions, my nonprofit specializes in authentic communication in groups. We have a list of prompts for our group moderators, but you’re welcome to use them as well: https://www.totem.org/repos/prompts/


Maybe it's not for you, but the "everything is a string" thing is just the default. SQLite has had a STRICT table option since 2021 that people really should be using if possible: https://www.sqlite.org/stricttables.html

This brings the strict typing people expect from other server-based databases.
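
For anyone who hasn't tried it, here's a minimal sketch of what STRICT buys you (via Python's sqlite3; any binding linked against SQLite 3.37+ behaves the same way):

    import sqlite3

    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, price REAL) STRICT")

    con.execute("INSERT INTO t (id, price) VALUES (1, 9.99)")  # fine
    try:
        # Without STRICT, SQLite would happily store the string 'cheap' here.
        con.execute("INSERT INTO t (id, price) VALUES (2, 'cheap')")
    except sqlite3.IntegrityError as e:
        print(e)  # complains it cannot store a TEXT value in REAL column t.price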


TFA did mention using animated AVIF, but not WebP for some reason. The issues still stand though: no playback controls, no programmatic playback, no audio. For my use case, I was not able to get an animated WebP to just play once and stop.

Edit: also no desktop Safari support for transparent animated WebP.


Can be done but requires a server.

>> Edit: also no desktop Safari support for transparent animated WebP.

Do you mean the link above that I posted? Works fine in my desktop Safari.

https://static.crowdwave.link/transparentanimatedwebp.html


While I don't see the advantage of writing UI in Rust vs Dart, I'm a huge fan of flutter_rust_bridge.

The work fzyzcjy and the community have put into calling Rust code from Dart seamlessly is such an asset to Flutter apps. I remade the popular image compression app ImageOptim in Flutter over a weekend because WebP wasn't supported. After a little pain in the initial setup, I was able to call mature Rust image libraries using flutter_rust_bridge and a small wrapper[0]. It even handled all the parallelism for me.

The app ended up being more capable and faster than ImageOptim, largely just because of the Rust integration. Thank you fzyzcjy!

[0]: https://github.com/blopker/alic/blob/main/rust/src/api/compr...


You are welcome, and I am happy to see it helps! When I initially developed flutter_rust_bridge, my personal usage scenario was quite similar to yours: using Rust for some high-performance algorithms.


The link to the data is in a GitHub repo at the bottom: https://github.com/Activision/caldera

Reading the article, it doesn't seem like they know what people should do with it. It feels more like a recruiting tool than anything, especially given the non-commercial license.


It's useful to have AAA tier sample game level assets available for engine development or for apps like Blender.


As an avid CoD player, I literally have no idea why this would be useful. Map data isn’t really interesting.

The player data seems far too low-resolution to be meaningful.


These sorts of data sets can be useful for graphics research, particularly as a data set to test ray tracing algorithms on.

See, for example, the Moana Island data set [1].

I definitely foresee papers on BVH construction using this scene.

For graphics research in academia, there's a dearth of real-world data sets like this, so the ones that do get released are gold. And for graphics research in industry, one may have access to good internal data sets for development and testing, but getting permission to publish anything with them tends to be a giant hassle. It's often easier to just use publicly available data sets. Plus, that makes it easier to compare results across papers.

[1] https://www.disneyanimation.com/resources/moana-island-scene...


The Moana Island set has complete material data, though. This release seems to be geometry only: no materials or textures at all.


Yep. That's still fine for building BVHs and shooting some rays around.


Thank you for explaining that. Very helpful.


Since they provide player movement data, you can train a transformer to predict which player will win the BR given movement patterns. Or maybe create "player embeddings" to see if player behaviors can be clustered. That could be a fun project...but definitely not useful.

Extracting and converting the player data from the .usd files would not be fun, though.
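
A rough sketch of the embedding idea (hand-rolled features plus k-means rather than a transformer, and it assumes the trajectories have already been pulled out of the .usd files into plain per-player arrays; requires numpy and scikit-learn):

    import numpy as np
    from sklearn.cluster import KMeans

    def embed(traj):
        """traj: (N, 3) array of (t, x, y) samples for one player."""
        dt = np.maximum(np.diff(traj[:, 0]), 1e-6)
        step = np.linalg.norm(np.diff(traj[:, 1:], axis=0), axis=1)
        speeds = step / dt
        # Feature vector: how fast, how erratic, how far they roamed.
        return np.array([speeds.mean(), speeds.std(),
                         np.linalg.norm(traj[-1, 1:] - traj[0, 1:])])

    def cluster_players(trajectories, k=5):
        """trajectories: dict of player_id -> (N, 3) trajectory array."""
        ids = list(trajectories)
        X = np.stack([embed(trajectories[p]) for p in ids])
        labels = KMeans(n_clusters=k, n_init=10).fit_predict(X)
        return dict(zip(ids, labels))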


> Since they provide player movement data, you can train a transformer to predict which player will win the BR given movement patterns.

You didn't consider the main factor for CoD - cheating. Which clearly seems to be an inside thing.

Not sure if anything meaningful can be obtained by analyzing anything that has player data on it considering every video game out there is prone to this.


Why would having player movement data help cheating?

Why is the cheating clearly an insider thing?

Why aren't you sure if anything meaningful can be derived from the movement data?

What do you mean by "prone to this"?

Are you sure they didn't consider "cheating" as a possible use of the movement data?

Could they have considered it but thrown it away as off-topic and implausible?


They are implying player teleporting, which is a common hack in BRs.

Player movement data that is too fast for normal players could be seen as cheating. An AI isn't strictly needed for that; just check displacement over time.
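
Something like this would do (the speed cap and sample format are made up; real positions would come from the released data):

    import math

    MAX_SPEED = 12.0  # m/s, hypothetical ceiling for legitimate movement

    def looks_like_teleport(samples):
        """samples: list of (t, x, y, z) tuples, sorted by time."""
        for (t0, x0, y0, z0), (t1, x1, y1, z1) in zip(samples, samples[1:]):
            dt = t1 - t0
            if dt <= 0:
                continue
            if math.dist((x0, y0, z0), (x1, y1, z1)) / dt > MAX_SPEED:
                return True
        return False

    # 495 units covered in one second -> flagged
    print(looks_like_teleport([(0, 0, 0, 0), (1, 5, 0, 0), (2, 500, 0, 0)]))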


Is it really a common hack? I would have guessed teleportation is the easiest to detect server-side, or impossible from the start as the server is authoritative (clients send inputs, the server computes the positions and any important changes and sends them back to clients, so clients cannot hack their movement).


I’ve never seen it in CoD. The last time I saw this was in like 2010, when MWII was hacked to death.


Given all the other variables that introduce a bunch of noise to the player movement data, I doubt you could ever determine any useful predictive pattern.

If anything though, I could see how the player behavior of match winners could be used to identify both varying levels of cheaters and players who use various methods for gaining an advantage (e.g., keyboard and mouse, joystick extensions, etc.), and to automatically sequester or even handicap their accounts.

It appears to me that so much effort is placed on trying to identify and hamper cheaters in real time, when that seems both extremely resource-intensive and unnecessary, considering you have all the digital evidence of cheating you need after the fact; you just have to understand what you are looking at.


> so much effort is placed on trying to identify and hamper cheaters in real time, when that seems both extremely resource-intensive and unnecessary, considering you have all the digital evidence of cheating you need after the fact; you just have to understand what you are looking at

It's not resource-intensive at all compared to the alternative of having humans do post-match reviews. It's all "AI" and automated reviews because it's cheaper. Half of the "anti-cheat" tactic is using your computer's resources to run some anti-cheat tool anyway.

These games are optimized for revenue so every action is dictated by that. Including catching/banning cheaters. If it costs too much to do it properly, or (and this is actually plausible) cheaters are a significant enough portion of the already small chunk of players who create recurring revenue, then there's no incentive to take real action.

This data is probably useful for actual academic rather than practical purposes today. They're building the knowledge they might want to use in a few years.


> considering you have all the digital evidence proof of cheating you need after the fact,

It's actually getting increasingly hard to tell. Old cheating used to be snap-to-the-head type cheating.

The newer cheats work really hard to resemble natural players. Soft aim, intentionally missed shots, non-perfect recoil control.


>Given all the other variables that introduce a bunch of noise to the player movement data, I doubt you could ever determine any useful predictive pattern.

Predicting a winner will be difficult but I would not be surprised if you could loosely predict rank (does Warzone track player rank?) off of movement alone. You may be able to predict more accurately by looking at the associations between two players and their movement. From my prior experience in FPS games, positioning, awareness, and aim are the core pillars of success. Unfortunately as far as I can tell from the data set, only player position is tracked.


It sounds like this is simply not for you, then, and that's fine.


From an information theory perspective, it should be possible to define strategically important locations in terms of Empowerment [1]. As a map designer there are likely some rough rules you want to abide by, such as distributing highly empowered locations roughly evenly throughout the map to reduce location bias.

I remember an old game where they defined rules for FPS CTF maps: there should be more than one path to each flag (usually three), and flag areas should be partially visible from one base to another. There were lots of rules like these, some more flexible than others.

[1] https://arxiv.org/abs/1310.1863


Not really; they released it primarily for artist training and tutorials. Getting hold of an XXL, high-quality map from a super popular game is definitely something that most game design training courses will use, 100%.


Is there anything particularly novel about this vs the game map of a different FPS?


Other FPS game maps don't have licenses that let you use them to stress-test your renderer or game engine. Existing freely available scenes are all too small and poorly made to be proper stress tests on modern hardware (e.g. the old Sponza is way too light, the Intel Sponza just spammed the subdivision modifier to get stupidly high poly counts, Bistro is small and really weirdly made, etc.).


Not that it makes it novel, but this appears to be a “battle royale” map based on the picture shown in the post. So it is fairly large, for whatever that’s worth.

The assumption with this type of game is that players will play the same big map over and over for the season (or something like that; changing the map is very rare and might not happen at all over the lifespan of the game), but they can pick which part of the map they start in and explore from there. So it is, I guess, more similar to having data from all of the maps of a classic first-person shooter.


Specifically, during each play session the map has a "storm" which converges on a random location over time: staying in the storm is lethal, so you are eventually forced toward that location and points of interest along the way, which adds play variance.


Make killer robots on the island of Caldera.


If you're still looking, Gilbert Strang wrote the best introductory book I know of: https://math.mit.edu/~gs/linearalgebra/ila6/indexila6.html


I like that he leaves determinants to a later chapter and doesn't _start_ with them; I never understood why they were useful or made sense. His view, represented on the cover, is great for learning.


I don't understand the anti-determinant brigade. Many linear algebra books don't start with determinants.


They're fine where they are useful, I guess, but my undergrad put way too much emphasis on them when they're not intuitive, don't help (me) much with comprehension, and aren't useful in that many cases compared to the other techniques.


There's a similar method to get into an Eight Sleep Pod 3 [0]. This requires less extra hardware though since some models come with a MicroSD card that you can modify. The method used in TFA might be a good way to get root on Pods without the card. That being said, I just learned that while Eight Sleep does sign their firmware updates, they also send you the private key used to sign the update in the same package.

[0]: https://github.com/bobobo1618/ninesleep


Ironically this makes me more likely to buy one. If I can make the smart thing local and/or Home Assistant controlled, and kill their internet connectivity... I'm thinking that isn't so bad.

Don't get me wrong, $2-4k is steep, but if it's a one-time cost for a decade or so, that's reasonable. But $4k plus you want $25/mo? Just fluff right off.


This would be cool to compile to wasm and ship to the browser. Seems like it would give a static site super fast search powers.


I’m using https://stork-search.net for my static website search, but it’s no longer maintained. So yeah, Tantivy would be a great candidate to replace it! :)


> He said he didn’t respond to friends’ messages or confide in anyone, feeling like nobody would understand anyway.

I feel this. I think people would call me an introvert, but I'm probably just an over-thinker. It's casual conversation that seems to be exhausting (or uninteresting?) to me. Once I'm in a space where I can talk openly about more abstract topics I start to enjoy it. Getting there just often seems like too much work though.

I tried therapy, meditation, 'wellness' apps. It all either felt too 'me' focused, or too detached. I like this site because people here seem to share what they are actually thinking, and are eloquent enough to capture interesting nuance. I don't always agree with it, but there's a level of authenticity to where I always learn something about the human condition. I wanted more of that.

[This is kind of a plug, but whatever]

I've spent the last few years in a deep-dive around why we seem to be collectively getting lonelier over time. I started a non-profit[0] to house this research. It's evolved into a platform where we host these support groups. It's free, and as long as you stick to the community guidelines [1], anyone is welcome to join.

For me, it's a place to get out of my head. To hear from real people who don't generally feel like their voice matters. I know from years in tech management that these are in fact the most interesting people to talk to.

I've never really talked about Totem here because I think it might be too 'woo-woo' for this crowd, but if any of that landed for you, come check us out. If you don't like it, I'd love to know why. My personal email is in my profile.

We are a non-profit, grant-funded, and open-source[2] organization. Feedback of any kind is welcome. My hope is to become something like a public utility for these spaces. We're also looking for engineers to help make an app out of this.

[0]: https://www.totem.org
[1]: https://www.totem.org/guidelines/
[2]: https://github.com/totem-technologies/totem-server


Can’t stand small talk either. I get it, it’s required at the beginning of meeting people, but don’t stay too long in that mode otherwise it gets boring.

Your site looks beautiful, kudos to the design team.


I don't know exactly how this works, but I wanted to share my experience trying to anonymize data. Don't.

While you may be able to change or delete obvious PII, like names, every bit of real data in aggregate leads to revealing someone's identity. They are male? That's half the population. They also live in Seattle, are Hispanic, age 18-25? Down to a few hundred thousand. They use Firefox? That might be like 10 people.

This is why browser fingerprinting is so effective. It's how Ad targeting works.

Just stick with fuzzing random data during development. Many web frameworks already have libraries for doing this. Django for example has factory_boy[0]. You just tell it what model to use, and the factory class will generate data based on your schema. You'll catch more issues this way anyway because computers are better at making nonsensical data.
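
A rough sketch of what that looks like (using a plain class so the snippet stands alone; in a Django project you'd subclass factory.django.DjangoModelFactory and point Meta.model at your model instead):

    from dataclasses import dataclass
    import factory

    @dataclass
    class User:  # hypothetical stand-in for a real model
        name: str
        email: str
        city: str

    class UserFactory(factory.Factory):
        class Meta:
            model = User

        name = factory.Faker("name")
        email = factory.Faker("email")
        city = factory.Faker("city")

    # A thousand fake users, no production data involved.
    users = UserFactory.build_batch(1000)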

Keep production data in production.

[0]: https://factoryboy.readthedocs.io/en/stable/orms.html


Thanks for the comment, and I hear you on the anonymization. What we see is that customers will go through and categorize what is PII and what is not, and anonymize as needed. If not, they'll backfill with synthetic data. You can change the gender from male to something else, same with the city, etc.

It's really down to the use-case. If you're strictly doing development, then you'll probably want to use more synthetic data than anonymization. If you care about preserving the statistical characteristics of the data then you can use ML models like CTGAN to create net new data.

It's definitely a balance between when to anonymize and when to create synthetic data.


Thanks for the reply, I don't mean to be discouraging! I totally believe people do this, I'm saying they shouldn't. There are other issues as well. Once production data is floating around different environments, it will be easy to lose track of. Then the first GDPR delete request comes in. Was this data synthetic? Was it real? I think Joe has a copy on his laptop, he's on vacation?

It gets messy. It also doesn't solve the main 'unsolvable' issue with production data: scale. It is difficult to test some changes locally because developers often don't have access to databases large enough to show issues before getting to production. At a certain size, this is the #1 killer of deployments.


Combining this tool with downsampling would allow you to run isomorphic workloads on smaller nodes and thereby reveal the yield curve.


Yup - I worked on a data warehouse project that was subject to GDPR. The way we did it is we didn't do any synthetic data generation; we just blanked out any PII fields with "DELETED". Then it's still possible to action a delete request, because the PKs and emails are the same as they are in production.

It's definitely possible to practice this while adhering to GDPR, but you do need to plan carefully, and synthetic data should only be used for local dev/testing, not data warehousing.
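
In spirit it was little more than a scrubbing pass like this (table and column names here are made up; ours were driven by a config listing which columns counted as PII):

    import sqlite3

    # Columns treated as PII, per table (illustrative names only).
    PII_COLUMNS = {"customers": ["full_name", "phone", "street_address"]}

    def scrub(conn):
        for table, cols in PII_COLUMNS.items():
            assignments = ", ".join(f"{c} = 'DELETED'" for c in cols)
            conn.execute(f"UPDATE {table} SET {assignments}")
        conn.commit()

    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE customers (id INTEGER, full_name TEXT, phone TEXT, street_address TEXT, email TEXT)")
    conn.execute("INSERT INTO customers VALUES (1, 'Jane Doe', '555-0100', '1 Main St', 'jane@example.com')")
    scrub(conn)  # PII columns -> 'DELETED'; id and email preserved for delete requests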


> categorize what is PII and what is not and anonymize as needed

That sounds like just de-identification / pseudonymization, if you're only targeting PII.


Fuzzing random data is fine for development environments, but it won't give you the same scale or statistical significance as production data. Without that you can't really ensure that a change will work reliably in production, without actually deploying to production. Canary deployments can only give you this assurance to a certain degree, and by definition would only cover a subset of the data, so having a traditional staging environment is still valuable.

Not only that, but a staging environment with prod-like data is also useful for running load, performance and E2E tests without actually impacting production servers or touching production data. In all of these cases, anonymizing production data is important as you want to minimize the risk of data leaks, while also avoiding issues that can happen when testing against real production data (e.g. sending emails to actual customers).


I don't totally understand this comment. Random data can get you more scale than production data, in that it can just be made up. All the load and E2E testing can be done with test data, no problem.

This idea of data being statistically significant has come up, but that's also easy to replicate with random data once you know the distributions of the data. In practice, those distributions rarely change, especially around demographic data. However, I don't think I've seen a case where this has been a problem. I'd be interested to learn about one.


The ideal scenario is that you're able to augment your existing data with more data that looks just like it. The matter of statistical significance really depends on the use case. For load testing, it's probably not as important as it is for something like feature testing/debugging/analytical queries.

Even if you know the distribution of the data (which imo can be fairly difficult), replicating it can also be tricky. If you know that a gender column is 30/70 male-female, how do you create 30% male names? How about the female names? Are they the same name or do you repeat names? Does it matter? In some cases it does and in others it doesn't.
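
The naive version is just weighted sampling; everything after that (repeats, uniqueness, correlations with other columns) is where it gets use-case specific. A toy sketch with made-up name pools:

    import random

    MALE = ["James", "Luis", "Wei", "Omar"]
    FEMALE = ["Maria", "Aisha", "Chen", "Sofia"]

    def sample_names(n, male_ratio=0.3):
        genders = random.choices(["m", "f"],
                                 weights=[male_ratio, 1 - male_ratio], k=n)
        # random.choice repeats names; dedupe here if repeats matter.
        return [(g, random.choice(MALE if g == "m" else FEMALE))
                for g in genders]

    print(sample_names(10))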

What we've seen is that it's really use-case specific and there are some tools that can help but there isn't a complete tool set. That's what we're trying to build over time.


There are various reasons why you might want synthetic data. Anonymisation is one of them - but the issue is around which statistical relationships are preserved in the anonymizing process, while ensuring that fusion with other data sources is not going to unmask the real data hidden beneath.


So because you were a) too lazy to understand the concept of differential privacy, b) too lazy to see the value of using anonymized data, and c) only able to come up with a trivial strawman, that makes the whole concept of anonymization unnecessary and something that should be replaced by oversimplified factories?

How ignorant and wrong-headed. I'd recommend learning more about the concepts of k-anonymity and differential privacy before you prematurely presume impossible a concept that others (including Google) have been able to use successfully.


Are you ok?


Your professional laziness clearly has you in the wrong, and all you can muster is a snide, defensive "are you ok?"

I'd really advise you to work on that. There's a world of difference between "This is impossible" and "I can't be bothered to figure out how this useful thing others successfully do while managing billions of dollars of risk is possible." Otherwise, you risk professional stagnation.


It depends on the use case though. It's not just about developers testing locally.

One use case that many companies have is data warehousing. Here, you want to have real customer and order data, but anonymized to a degree where only the necessary data is exposed to business analysts and so on. I once worked on a project to do exactly that: clone production to the data warehouse, stripping out only things like contact details, but preserving things like emails, what the customer ordered, and other data like that.


Belated reply, sorry: This is the Correct Answer™.

Mid 2000s, I worked with electronic medical records. I eventually determined anon isn't worthwhile.

For starters, deanon will always beat anon. This statement is unambiguously true, per the research. Including the differential privacy stuff.

My efforts pivoted to adopting Translucent Database techniques. My current hill to die on is demanding that all PII be encrypted at rest, at the field level.

(It's basically applying paper password storing techniques to protecting PII. The book shows a handful of illuminating use cases. Super clever. No weird or new tech required.)
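
A minimal sketch of what I mean by field-level encryption at rest (using the cryptography package's Fernet; key management is the real problem and is hand-waved here):

    from cryptography.fernet import Fernet

    key = Fernet.generate_key()  # in practice, pulled from a KMS/secret store
    f = Fernet(key)

    row = {"id": 42, "email": f.encrypt(b"person@example.com")}
    # Queries that don't need the PII never touch the key;
    # decrypt only where the plaintext is actually required.
    print(f.decrypt(row["email"]).decode())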


So, how does one create synthetic relational data? Do you just crank out a list of synthetic customers, assign IDs, create between 0 and 3 synthetic orders per person, and between 0 and 3 order line entries per order?


This is somewhat framework dependent, but factory_boy supports connecting factories together via SubFactory. There's a real-world example I'm building [0]. See where "author = SubFactory(UserFactory)". I'd imagine there are similar ways to do this for Rails and others too.

[0]: https://github.com/totem-technologies/totem-server/blob/main...
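
A stripped-down version of the pattern (Customer/Order are hypothetical stand-ins; the linked repo does the same thing with Django models):

    import random
    from dataclasses import dataclass
    import factory

    @dataclass
    class Customer:
        id: int
        name: str

    @dataclass
    class Order:
        id: int
        customer: Customer
        total: float

    class CustomerFactory(factory.Factory):
        class Meta:
            model = Customer
        id = factory.Sequence(lambda n: n)
        name = factory.Faker("name")

    class OrderFactory(factory.Factory):
        class Meta:
            model = Order
        id = factory.Sequence(lambda n: n)
        customer = factory.SubFactory(CustomerFactory)  # builds the related row
        total = factory.LazyFunction(lambda: round(random.uniform(5, 500), 2))

    # 0-3 orders, all pointing at the same customer.
    customer = CustomerFactory()
    orders = OrderFactory.build_batch(random.randint(0, 3), customer=customer)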


We're actually working on this right now; you can see the PR here -> https://github.com/nucleuscloud/neosync/pull/1832/files

It's a combination of creating a random number of records for foreign keys, e.g. 1 customer - create between 2 and 5 transactions. We're working on giving you control over that, and handling referential integrity with table constraints (foreign keys, unique constraints, etc.).

ML-based approaches typically are not very good at this and struggle with handling things like referential integrity, so a more "procedural" or imperative approach is slightly better. The ideal is a combination of both.
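
A rough sketch of the procedural idea (not Neosync's actual code): generate parents first, then a random number of children per parent, so every foreign key resolves by construction.

    import random
    import uuid

    def generate(n_customers=100):
        customers, transactions = [], []
        for _ in range(n_customers):
            cid = str(uuid.uuid4())
            customers.append({"id": cid, "name": f"customer-{cid[:8]}"})
            for _ in range(random.randint(2, 5)):
                transactions.append({
                    "id": str(uuid.uuid4()),
                    "customer_id": cid,  # FK always points at a real customer
                    "amount": round(random.uniform(1, 250), 2),
                })
        return customers, transactions

    customers, transactions = generate()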


Pretty much, yeah. Use a normal distribution so you get some outliers.


Knowing everything about an unspecified someone is different from knowing who that someone is.


Well… if you know the data is real, then knowing everything about that someone can seriously limit the number of people it could actually be, which makes that someone identifiable.

Like if you only know the full address, and you see that only one person lives at that address. Or you know the exact birthdate and the school that someone went to. Or the year of birth and the small shop that the person works for. And so on…

