Other words too, e.g. "from".<p>My first thought was that the creator used a search library that filters common words by default, but the search code is all in the page and doesn't do that.<p>My second thought was that the 10k word corpus doesn't include those most common words. But it does.<p>Then I realized that the creator filtered them out. The page does say "7931 words", and the title here on HN says "10k* most common". The original corpus has exactly 10,000 words.<p><a href="https://github.com/first20hours/google-10000-english/blob/d0736d492489198e4f9d650c7ab4143bc14c1e9e/google-10000-english.txt" rel="nofollow">https://github.com/first20hours/google-10000-english/blob/d0...</a><p>The first 21 include all four we've mentioned:<p>the, of, and, to, a, in, for, is, on, that, by, this, with, i, you, it, not, or, be, are, from
The reason for this (in hindsight, I should have added a note to the site) is that WordNet doesn't include definitions for these words in its corpus. This is why the count is less than 10,000: anything that WordNet doesn't have a definition for isn't included. I left a nod to this in the asterisk, but I realise now I didn't explain it anywhere.<p>From the old Princeton WordNet FAQ page (<a href="https://wordnet.princeton.edu/frequently-asked-questions" rel="nofollow">https://wordnet.princeton.edu/frequently-asked-questions</a>):<p>> WordNet only contains "open-class words": nouns, verbs, adjectives, and adverbs. Thus, excluded words include determiners, prepositions, pronouns, conjunctions, and particles.<p>I suppose I could have included them as source nodes (only outgoing edges), but I think they would have ended up connecting to a whole bunch of definitions without adding much of interest.
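For anyone curious, the filtering described above could be sketched roughly like this. This is not the site's actual code: `DEFINITIONS` is a tiny hypothetical stand-in for a WordNet lookup, the glosses are placeholders, and `filter_defined` is a name made up for illustration.

```python
# Hypothetical stand-in for a WordNet lookup: only words with an entry appear.
# The glosses are placeholders, not real WordNet text.
DEFINITIONS = {
    "new": "placeholder gloss for 'new'",
    "home": "placeholder gloss for 'home'",
    "time": "placeholder gloss for 'time'",
}


def filter_defined(words, definitions):
    """Keep only words that have a definition, preserving the original order."""
    return [w for w in words if w in definitions]


# A small sample in frequency order, mixing function words and open-class words.
corpus = ["the", "of", "and", "to", "from", "new", "home", "time"]

kept = filter_defined(corpus, DEFINITIONS)
dropped = [w for w in corpus if w not in DEFINITIONS]

print(kept)     # words that would become nodes
print(dropped)  # words silently missing from the search
```

Applied to the full 10,000-word list with a real WordNet lookup, the same kind of filter would account for the count shrinking to 7931.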