Training a neural network to predict word embeddings from spelling, and using nearest neighbor search to decode meaning.
This is a simple experiment in predicting a word’s meaning purely from its spelling. A recurrent neural network is first trained to predict word embeddings from spelling, using a subset of 2000 word-embedding pairs whose embeddings were trained with GloVe on a Wikipedia crawl. This subset is filtered for common words that are at least five letters long, so a lot of words won’t appear in the search results. Next, a new word is fed to the network to get its word embedding. The nearest neighbors to that embedding are shown; these represent the words that the network thinks are closest in meaning to the presented word. This happens entirely in your browser using Keras JS. Nearest neighbor decoding is done using NumJS.
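To make the decoding step concrete, here is a minimal sketch of nearest neighbor search over a small embedding table. It is written in plain Python rather than NumJS, so it is not the code running on this page. The cosine similarity metric, the function names, and the three-dimensional toy vectors are all assumptions made for illustration; real GloVe vectors have many more dimensions.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat a zero vector as similar to nothing
    return dot / (norm_a * norm_b)


def nearest_neighbors(query, vocabulary, k=3):
    """Return the k (word, similarity) pairs most similar to `query`."""
    scored = [(word, cosine_similarity(query, vec)) for word, vec in vocabulary.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Made-up vectors, for illustration only.
toy_vocabulary = {
    "river": [0.9, 0.1, 0.0],
    "stream": [0.8, 0.2, 0.1],
    "planet": [0.0, 0.1, 0.9],
}

# Stand-in for the embedding the network would predict from a spelling.
predicted = [0.95, 0.05, 0.0]

for word, score in nearest_neighbors(predicted, toy_vocabulary, k=2):
    print(word, round(score, 3))
```

With only 2000 words in the table, a brute-force scan like this is cheap enough that no approximate search index is needed.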
Because the model is deployed using Keras JS, you can run a full neural network in your browser, despite my site being hosted statically on GitHub Pages (although the network files are pretty large, so it can be kind of buggy, especially on slow connections). In the future, I hope to make more interactive examples like this, for language modeling and other things.
Word embeddings are a ubiquitous idea in language modeling: we represent a word as a vector, where the location of the vector in vector space represents the meaning of that word. There are some cool examples of what this looks like in practice. The most common example is that if you take the vector for “king”, subtract the vector for “man” and add the vector for “woman”, you get (approximately) the vector for “queen”. This means that our neural network is essentially trying to predict the word’s meaning purely from its spelling. We can decode its meaning by finding words with similar vectors. As with a lot of neural applications, interpreting the results is a bit like finding images in clouds.
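The “king” minus “man” plus “woman” arithmetic can be sketched with hand-made vectors. The two-dimensional embeddings below are invented so that the first coordinate loosely stands for royalty and the second for gender; they are not real GloVe values. The `analogy` helper is a hypothetical name, and Euclidean distance is an assumed choice of metric. With real embeddings the result only lands near “queen”, not exactly on it.

```python
import math

# Invented 2-D embeddings: [royalty, gender]. Illustration only.
toy_embeddings = {
    "king": [1.0, 1.0],
    "queen": [1.0, -1.0],
    "man": [0.0, 1.0],
    "woman": [0.0, -1.0],
    "apple": [-1.0, 0.0],
}


def analogy(a, b, c, embeddings):
    """Return the word whose vector is closest to vec(a) - vec(b) + vec(c)."""
    target = [x - y + z for x, y, z in zip(embeddings[a], embeddings[b], embeddings[c])]
    # Exclude the three input words, as is usual when evaluating analogies.
    candidates = {w: v for w, v in embeddings.items() if w not in (a, b, c)}
    return min(candidates, key=lambda w: math.dist(target, candidates[w]))


print(analogy("king", "man", "woman", toy_embeddings))
```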