Case Study

Wordnik Harnesses the Internet to Define Every Word in the English Language

Wordnik’s mission: build a resource that houses the entire English language, including the 52% of words that aren’t in any dictionary. mLab provides the easy database scalability and trusted support Wordnik needs to accomplish this.

Highlights
  • Going beyond the traditional definition of support and troubleshooting, mLab provides Wordnik with actionable advice and guidance on how to structure and optimize its database queries.
  • mLab’s fully managed MongoDB hosting platform enables Wordnik to achieve the rapid scalability needed to pursue its goal of remaining the world’s largest online English dictionary by word volume.
  • Wordnik relies on mLab’s database tools for essential capabilities in running queries, accessing advanced analytics, and conveniently inputting new data that enable additional website features.

Customer Profile

Wordnik is the largest online English dictionary by word count, with more than eight million unique words and nearly one billion real-life example usage sentences available to users. The organization is a nonprofit, operating with the mission of discovering and sharing “as many words of English as possible with as many people as possible.” Rather than relying on editors or accepting user-contributed definitions, Wordnik applies advanced text mining and machine learning to automate the process of gathering new words, and provides example sentences as “free-range definitions.”

The Challenge

To build a database prepared to contain “all the words” in the English language, Wordnik needed an experienced Database-as-a-Service provider able to deliver knowledgeable support

It’s estimated that dictionaries, despite their size, are missing more than half of the total words used in the English language*. Many of these are valuable and uniquely expressive words, which simply haven’t been collected and compiled due to limitations in the ways traditional dictionaries are produced. Culturally, dictionaries are often viewed as gatekeepers to the language, and inclusion in a traditional dictionary is seen as making a word “official.” However, every language is the collective creation of those who use it, and even rare or newly minted words are valid and serve a purpose. Recognizing this, Wordnik was founded in 2008 with the goal of producing a truly complete English dictionary. Wordnik escapes the limitations of traditional dictionaries and human curators by relying on text mining and machine learning to recognize unique words and define them through examples of their usage.

To achieve its ambitious goal, Wordnik needed a database solution able to accept and handle massive amounts of data from disparate sources. Wordnik selected MongoDB as its database of choice because of its ability to store unstructured data. As Wordnik has continued gathering information from an expanding array of sources over the years (including traditional definitions and thesaurus content; user comments, tags, and lists; and example sentences), MongoDB’s flexibility in handling different data types has continued to show that this was a wise choice.

Wordnik’s undertaking has also meant handling data on a vast scale. For perspective, the Oxford English Dictionary has between 800,000 and one million words; after four months in operation, Wordnik built an alpha version with data for four million words. And Wordnik grew from there, launching a public beta in 2009 and an open source public API for dictionary data in 2010. At this point, Wordnik’s small staff was maintaining seven app servers and running its MongoDB databases – while also supporting 20,000 developers using the Wordnik API.

In October 2014, Wordnik incorporated as a non-profit, and sought the support of a MongoDB hosting platform to handle the transition to hosted database servers by the end of that year. Knowing that the organization’s staff would have a lot of questions, Wordnik looked for a provider that could offer dependable and timely database support. Tools and assistance for overcoming technical and scalability issues, as well as affordability, were also key considerations.

Solution

mLab delivers the trusted and supportive service Wordnik needed to proceed in growing its product with full confidence

The platform’s database tools help Wordnik to introduce and improve upon important features. Wordnik selected mLab based on its reputation as a highly supportive partner with many years of experience. When Wordnik wants to create a new application feature or gather insight about its data, it can rely on mLab to provide database guidance. mLab also helps Wordnik ensure that maintenance events go smoothly; mLab provides estimates for database operations – such as upgrades – and keeps the Wordnik team updated on each step of the process. The mLab support team’s MongoDB expertise has proved invaluable.

mLab’s analytics tools and data browser play a big role in making MongoDB easy to optimize and use. Wordnik relies on mLab’s Telemetry service to keep an eye on overall database health and uses the data browser extensively when working with the database. The data browser allows Wordnik to easily query against the database in order to review user comments, input new dictionary data, and implement a popular Word of the Day page and API. Wordnik also uses the data browser to perform light analytics, such as viewing recently favorited words.

We couldn’t have made Wordnik what it is without the help of managed services like mLab. I touch mLab’s site every day, managing features that the platform’s tools and expert support make possible. We’ve had an occasion when there was a bug in the version of MongoDB we were using, and mLab’s support person was instantly on top of it (and on a Sunday morning). Things happen; we’ve had replication errors, had AWS kill a database instance, etc. But every time mLab has pinged us to say ‘don’t worry, we’re on it.’ mLab absolutely provides us with the comfort and certainty we need.

Erin McKean, founder, Wordnik

Benefits

Supported by mLab, Wordnik continues to expand the world’s largest dictionary while setting its sights on new features – and even new languages

Wordnik now houses over eight million unique words with real-world example uses, and many also have traditional definitions, pronunciations, synonyms, hyponyms, wordmaps, images, and more. Writers and students, in particular, use the service extensively as a powerful resource for gaining a feel for words beyond just their meanings. It is also popular with prospective students studying for tests like the GRE. The Wordnik API is widely used by services providing Twitterbots, word games, test study services, and more. Looking forward, Wordnik is working to enable more granular API calls, and has an eye towards making a template to allow for new Wordnik-style dictionary projects for other languages.

* From the research paper: “Quantitative Analysis of Culture Using Millions of Digitized Books,” by Jean-Baptiste Michel, Yuan Kui Shen, Aviva Presser Aiden, Adrian Veres, Matthew K. Gray, The Google Books Team, Joseph P. Pickett, Dale Hoiberg, Dan Clancy, Peter Norvig, Jon Orwant, Steven Pinker, Martin A. Nowak, and Erez Lieberman Aiden. Science, 14 Jan 2011: 176-182.