
The minimum puzzle length for spelling bee is 20 words iirc. The dictionary is also a highly curated list of “common” words. What constitutes a valid word is up to Sam, the NYT editor. It’s designed to make the puzzles doable by the average solver. You’ll notice that a lot of the words in the OP are very esoteric.

Source: helped build SB at NYT.



Thanks for the insight!

There's no actual answer to the question, given that the word list can be set inconsistently, so I had to choose _something_ to go off.

The best word list I was able to find is the one hosted by https://www.sbsolver.com/ , but unfortunately they don't distribute it.

I got a somewhat better word list for part 2: https://notes.billmill.org/blog/2024/03/mitzVah_-_the__worst...

But it's still imperfect. However, a lot of the words I expected to be invalid have actually been in puzzles before, so it's not easy to guess which are going to be good and which aren't.

edit: there have been puzzles with as few as 16 words before: https://www.sbsolver.com/stats/count/low

edit 2: I modified the program to print puzzles with at least 16 words, and the "worst" puzzles it found with that constraint are:

unbEknown, jawbonE, monadnocK, woRkgRoup, daGlock, moonwalK, confLux, buLLhorn, yOkOzuna, Fraught, hogliKe
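For anyone who wants to try the same constraint, here is a minimal sketch of filtering candidate puzzles by answer count. It is not the OP's actual program: `answers_for`, `puzzles_with_min_answers`, and the toy word list are hypothetical names and data chosen for illustration, and the 4-letter minimum is assumed from the standard Spelling Bee rules.

```python
def answers_for(words, letters, center, min_len=4):
    """Return the words playable in a puzzle with these letters and center letter."""
    allowed = set(letters)
    return [
        w for w in words
        if len(w) >= min_len and center in w and set(w) <= allowed
    ]


def puzzles_with_min_answers(words, candidates, min_answers=16):
    """Keep only the (letters, center) candidates with at least min_answers answers."""
    kept = []
    for letters, center in candidates:
        found = answers_for(words, letters, center)
        if len(found) >= min_answers:
            kept.append((letters, center, len(found)))
    return kept


# Toy word list (illustration only): the answers of one real 16-word puzzle
# plus four words that lack the letter "f".
WORDS = (
    "mortify fortify fifty forty motif firm foot form fort from "
    "iffy miff riff rift roof tiff toot trot tort moot"
).split()

# Same seven letters, two different center letters.
CANDIDATES = [("fimorty", "f"), ("fimorty", "y")]

kept_16 = puzzles_with_min_answers(WORDS, CANDIDATES, min_answers=16)
kept_20 = puzzles_with_min_answers(WORDS, CANDIDATES, min_answers=20)
print(kept_16)
print(kept_20)
```

With a real word list, `CANDIDATES` would be every 7-distinct-letter word paired with each of its letters as the center, which is where the choice of word list starts to dominate the result.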


I think your word list is still considerably too large. Zero chance in my mind that jouk or qajaq, for example, would make it to the NYT wordlist. (I don't think they'd even be accepted in the crossword, which has looser standards, unless there was a very specific theme that called for them). Apart from being obscure, their only use seems to be as non-standard spellings, for juke and kayak respectively. The Spelling Bee doesn't even accept UK spellings.

At least 5 of the proposed pangrams wouldn't make the cut, either.


I agree with you, do you have a better one? I can only work with what I have


Perhaps you could scrape https://nytbee.com/, mentioned in the thread, for the historical answers, or contact the owner. Also @banana_giraffe, a commenter on this thread, seems to have scraped 6 years of data.

Then you could take a very permissive wordlist and filter it using the historical data. For all words of six distinct letters or fewer, you could determine whether they were allowed, not allowed, or indeterminate (no puzzle ever appeared that would have allowed them). My gut feeling is that you'd be left with very few indeterminate words, though jouk and qajaq might well be among them - review those manually.
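A rough sketch of that three-way classification, assuming the historical data can be loaded as (center letter, letter set, accepted answers) records. The `Puzzle` type, `playable`, and `classify` are hypothetical names, the single puzzle below is toy data, and the 4-letter minimum is assumed from the standard rules.

```python
from typing import NamedTuple


class Puzzle(NamedTuple):
    center: str
    letters: frozenset   # all seven letters, center included
    answers: frozenset   # words the puzzle accepted


def playable(word, puzzle, min_len=4):
    """True if the word could have been entered in this puzzle at all."""
    return (
        len(word) >= min_len
        and puzzle.center in word
        and set(word) <= puzzle.letters
    )


def classify(word, history):
    """Label a word 'allowed', 'not allowed', or 'indeterminate'."""
    eligible = [p for p in history if playable(word, p)]
    if not eligible:
        # No past puzzle could have contained this word, so there is no evidence.
        return "indeterminate"
    if any(word in p.answers for p in eligible):
        # Assumption: one acceptance is enough, even if other puzzles omitted it.
        return "allowed"
    return "not allowed"


# Toy history: a single puzzle with center "f" and outer letters i, m, o, r, t, y.
history = [
    Puzzle(
        center="f",
        letters=frozenset("fimorty"),
        answers=frozenset(
            "mortify fortify fifty forty motif firm foot form fort from "
            "iffy miff riff rift roof tiff".split()
        ),
    )
]

for candidate in ["motif", "toff", "qajaq"]:
    print(candidate, "->", classify(candidate, history))
```

Because the curation has changed over time, a word can be accepted in one puzzle and omitted from another. Treating any acceptance as "allowed" is one simple choice, and weighting recent puzzles more heavily would be another.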


[1] has a couple of references to lists of common words used as inspiration.

[1] https://gitlab.com/engmark/xkcd-passphrase-generator


They've loosened up the rules since I left many moons ago, likely to expand the number of puzzles without repeats. IIRC we had about 5 years' worth at the beginning.


Very cool, thanks! I play every morning. There are times when Sam's curation is very frustrating. It would be nice to submit other valid words and have the game verify them as a way to score "bonus" points. Oftentimes I find that there are some baffling omissions and, after the fact, some truly bizarre inclusions. It would be nice to be able to score points based on your own vocabulary while still having the game's score based on whatever common denominator Sam comes up with.


Agreed, I get frustrated a few days a week with curation inclusions and exclusions. DOORYARD, MICROMINI, NONCOM, ROMCOM are in the list and IMO shouldn't be. UNTENDED, MONOTONIC, BOLE are not in the list but IMO deserve to be.


As a nonnative speaker, this is the first time I've ever heard the word "bole" being used.


Agreed. When I type a good word that isn’t accepted, I usually just stop playing that day’s puzzle. My guess is that Sam is not very scientifically literate. Simple weather words like cyclonic, adiabatic, or advection: no dice. And then you get some pretty obscure literary words.

Makes me want to make a free clone that includes science words, and isn’t afraid of the letter S.


I think your definition of "simple" doesn't agree with the average person's. I guarantee you that 98% of people don't know the word "adiabatic".


It’s a very common word in many technical domains. Not like it’s a guy’s name or something.


"It's a very common word in extremely niche domains" doesn't make something common, unfortunately.


It is also very culturally biased; some loanwords are more present than others


The omissions that kill me are common nautical terms.


And so many "non-American" words are rejected too, like whinge, colour, metre, etc.


"Whinge" is just a plain old misspelling of "Whine," nothing more or less. We don't need "Whinge" to become a word; we already have "Whine."


"Whinge" definitively is a word[1] and has existed since old English.

[1] https://www.oed.com/search/dictionary/?scope=Entries&q=Whing...


Well, it shouldn't be. It's entirely redundant with 'whine'.


Where do you stand on "guarantee" vs "warranty"?

That's not to mention common phrases with a built-in redundancy such as "Cease and desist"? "Face mask"? "Free gift"?

Idiomatic English has lots of redundancies of one kind or another.


Those aren't really redundant, though. I can say, "I guarantee that this product will work," but nobody would say, "I warranty that this product will work." (You could argue that guarantee is redundant with warrant, but the police don't go to the judge to request a search guarantee. Both words are needed.)

I can't think of a single use case for whinge that wouldn't be equally satisfied by whine. Can you?


I can't think of a single use case for nought that wouldn't be equally satisfied by "zero". Or courgette that wouldn't be satisfied by zucchini, or both of those by "baby marrow". Or why use "oregano" when you could use "wild marjoram"? A perfectly good English name that has been largely displaced by an Italian word.

English (as all modern languages) has tons and tons of exact synonyms and other types of redundant words. It's just a normal part of usage.

In this specific case I personally prefer whinge for emphasizing the complaint and whine for emphasising the noise, so I don't really think they are redundant - I think they are slightly different.


No it isn't. What gave you that daft idea?


It's a stupid word, one that we don't need. 'Whine' works just fine. Why was it necessary to add a 'g'?

You can see 'whinge' gaining ground very recently at the expense of 'whine' here:

https://books.google.com/ngrams/graph?content=whine%2Cwhinge...

This is an outrage, and must be stopped. :-P


no homophones. whine shares a phonemic address with wine, while whinge staked out a plot of its very own, even if it's just a slapdash pop-up tent next door to the local drunkard, binge. hinge shares a smartly bricked-up border with both.


It feels like they could just use something like the Google Ngram viewer to filter the words.


I’m still pissed about advection!


What does “isn’t afraid of the letter S” mean?


spelling bee puzzles never contain S. Originally they didn't include puzzles with e+d or i+n+g either.


I still remember my disappointment when I entered HEMOPHAGE and it was deemed "not a valid word".


naphtha and caracal.


And naphthalene, which would have been a pangram that day.


I seem to remember both those words were put forward in the hints page, too!


> Source: helped build SB at NYT.

Wow. Another example of HN at its finest.

Great work on the game btw. My gf introduced me to the game and we love it. Though, we play a variation of it against one another in which we open the game on a single screen and whoever finds a pangram first wins.


To share remotely, one player gets halfway to genius, pangram forbidden, and the second player gets over the line. After that you use SB Buddy (no peeking at hints) to get to Queen Bee.

The most annoying missing wordlist words are naphtha and caracal. An objective measure of word-use frequency should determine the words included. Probably super-obscure articles of clerical costume should not be.


That may be the target, but there have been a handful of Spelling Bees with fewer than 20 words in the answer list. For instance, March 27, 2023 had 16 answers:

> MORTIFY, FORTIFY, FIFTY, FORTY, MOTIF, FIRM, FOOT, FORM, FORT, FROM, IFFY, MIFF, RIFF, RIFT, ROOF, TIFF


Very possible. They've loosened the rules a bit since I originally generated the puzzles, e.g. they now allow more than one vowel, and i+n+g and e+d are allowed in puzzles, possibly more.


> They've loosened the rules a bit since I originally generated the puzzles, e.g. they now allow more than one vowel, ….

The original Bees allowed only one vowel? That must have made it really tough to get long Bees!


Maybe it was 2. It's been a while..


the "ed" and "ing" puzzles are just annoying.


Toot, trot, tort, and moot aren't legal?


From the list of words, I think F was the required (central) letter.


As others have pointed out, all words must have the center letter, which was "F" on this day; the outer letters were "I M O R T Y".
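To make the center-letter rule concrete, here is a small sketch that checks the March 27, 2023 answers and the four words asked about above. `is_valid` and `is_pangram` are hypothetical helper names, and the 4-letter minimum is assumed from the standard rules rather than stated in this thread.

```python
CENTER = "f"
OUTER = set("imorty")
ALLOWED = OUTER | {CENTER}


def is_valid(word, center=CENTER, allowed=ALLOWED, min_len=4):
    """A word must be long enough, contain the center letter, and use only puzzle letters."""
    w = word.lower()
    return len(w) >= min_len and center in w and set(w) <= allowed


def is_pangram(word, allowed=ALLOWED):
    """A pangram uses every one of the seven puzzle letters at least once."""
    return set(word.lower()) == allowed


answers = (
    "MORTIFY, FORTIFY, FIFTY, FORTY, MOTIF, FIRM, FOOT, FORM, "
    "FORT, FROM, IFFY, MIFF, RIFF, RIFT, ROOF, TIFF"
).split(", ")

all_answers_valid = all(is_valid(w) for w in answers)
pangrams = [w for w in answers if is_pangram(w)]
rejected = [w for w in ["toot", "trot", "tort", "moot"] if not is_valid(w)]

print(all_answers_valid)  # True
print(pangrams)           # ['MORTIFY']
print(rejected)           # ['toot', 'trot', 'tort', 'moot']
```

All four rejected words use only letters from the puzzle; they fail solely because none of them contains the center letter F.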


You have to use the center letter, whatever it was.


So you need a 10-letter pangram that generates like 19 4-letter words. :)


> What constitutes a valid word is up to Sam, the NYT editor

I think the only listed words I'd think would get approved are jukebox, quixotic, and gimmickry.


Is it the same dictionary Letter Boxed uses?


Definitely not! I play both and Letter Boxed accepts many more words.



