Earlyworm 2022
In 2022 I decided to take another shot at the idea behind my Earlyworm project from 2015.
Earlyworm Demo - Early 2023
A demo video of the Earlyworm app taken in early 2023.
In 2022 I decided to try to build Earlyworm again in my free time. At first I took a similar approach to my previous attempt, but shortly after I got started the generative AI craze began to heat up.
This was super exciting because it fit my goals for Earlyworm perfectly: I wanted foreign language learners to have access to a broader set of content than was currently available in graded readers. With generative AI I found that I could create summaries of clusters of related articles, which was a great way to discover new trending topics in the media. I also discovered that, with some work, I could generate those topic summaries and individual article summaries to match the language level of the user. This has been one of the most exciting developments of the project, because it allows anyone to read about anything in their target language while maintaining an immersive feeling.
My team of three and I built a recommendation engine that used OpenAI embeddings, the Milvus vector database, and a mix of content-based and collaborative filtering to recommend a stack of articles and trending topics to each user. During onboarding, users picked a few topics and articles they liked, which warmed up our system and ensured the first few recommendations would be valid. We used a stacked Tinder-style UI to get clear, explicit feedback on whether a user would bookmark or pass on an article or topic. We also used scrolling activity and the length of time a user spent viewing an article to gauge interest.
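As a rough illustration of how content-based and collaborative signals can be blended into one ranking, here is a minimal Python sketch. It is not the Earlyworm implementation: the function names, the `alpha` weight, and the plain lists standing in for OpenAI embeddings and Milvus lookups are all assumptions made for this example, and the collaborative score is assumed to arrive precomputed in the range 0 to 1.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def user_profile(liked_vectors):
    """Average the embeddings of items the user bookmarked (e.g. onboarding picks)."""
    if not liked_vectors:
        raise ValueError("need at least one liked item to build a profile")
    n = len(liked_vectors)
    return [sum(column) / n for column in zip(*liked_vectors)]


def hybrid_score(profile, article_vec, collab_score, alpha=0.7):
    """Blend content similarity with a precomputed collaborative score.

    alpha is a hypothetical weight: 1.0 is purely content-based,
    0.0 is purely collaborative.
    """
    return alpha * cosine(profile, article_vec) + (1 - alpha) * collab_score


def rank(profile, candidates, alpha=0.7):
    """Order candidate (article_id, embedding, collab_score) tuples, best first."""
    scored = [
        (hybrid_score(profile, vec, collab, alpha), article_id)
        for article_id, vec, collab in candidates
    ]
    return [article_id for _, article_id in sorted(scored, reverse=True)]
```

In a sketch like this, the onboarding picks would seed `user_profile`, and each later bookmark could be appended to the liked set so the profile drifts toward what the user actually reads.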
All content came from public RSS feeds, and we did a lot of processing on each article: extracting the text, categorizing the article with an AI model, and extracting named entities with yet another AI model. We fine-tuned several of these models on top of text-davinci-003, and also used the fine-tuned models to speed up data labelling for further fine-tuning. Summaries were built using a proprietary text extraction process (not chunking), whose output was then passed to GPT-3.5. Trending topics were derived from several rounds of hierarchical clustering, with a number of filtering mechanisms and rules layered on top.
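To make the trending-topic idea concrete, here is a minimal Python sketch of a single round of average-linkage agglomerative clustering over article embeddings, followed by one simple filter. It only illustrates the general technique; the actual pipeline used several rounds and additional rules that are not described here. The `threshold` and `min_size` values and all function names are assumptions.

```python
import math
from itertools import combinations


def cosine_distance(a, b):
    """1 - cosine similarity; treats an all-zero vector as maximally distant."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if not norm_a or not norm_b:
        return 1.0
    return 1.0 - dot / (norm_a * norm_b)


def agglomerative(vectors, threshold):
    """Merge the closest pair of clusters until no pair is within `threshold`.

    Clusters are lists of indices into `vectors`; linkage is the average
    pairwise cosine distance between members of the two clusters.
    """
    clusters = [[i] for i in range(len(vectors))]

    def linkage(c1, c2):
        total = sum(cosine_distance(vectors[i], vectors[j]) for i in c1 for j in c2)
        return total / (len(c1) * len(c2))

    while len(clusters) > 1:
        (i, j), dist = min(
            (((i, j), linkage(clusters[i], clusters[j]))
             for i, j in combinations(range(len(clusters)), 2)),
            key=lambda pair: pair[1],
        )
        if dist > threshold:
            break  # the closest remaining clusters are too far apart to merge
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]  # j > i, so index i is unaffected
    return clusters


def trending_topics(vectors, threshold=0.3, min_size=2):
    """Keep only clusters with enough articles to plausibly count as a topic."""
    return [sorted(c) for c in agglomerative(vectors, threshold) if len(c) >= min_size]
```

This brute-force version recomputes every linkage on each pass, which is fine for a small batch of articles but would need a library implementation or an approximate method at larger scale.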
Other features we supported included text-to-speech using Microsoft’s text-to-speech SDK, search powered by OpenAI embeddings and Milvus, word lookup powered by the MDBG and Baidu dictionaries, and named entity underlining. We built the backend using Terraform and Kubernetes and deployed it on Azure Kubernetes Service (AKS). The frontend was built in Swift using UIKit (not SwiftUI).
For growth hacking, we also built an automated tweet thread producer that generated a few different styles of tweet threads based on the content available in our database. We built a URL-forwarding mechanism that let us record traffic to each link shared on Twitter, and we fed that traffic data back into our recommendation engine.
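The URL-forwarding idea can be sketched in a few lines of Python. This is a simplified, in-memory illustration rather than the service we ran: the `LinkTracker` class, its method names, and the short-code scheme are hypothetical, and a real deployment would sit behind an HTTP endpoint that issues a redirect and would persist the counts.

```python
from collections import Counter


class LinkTracker:
    """Maps short codes to article URLs and counts how often each is followed."""

    def __init__(self):
        self._targets = {}        # short code -> (article_id, destination URL)
        self._clicks = Counter()  # article_id -> number of recorded clicks

    def register(self, code, article_id, url):
        """Associate a short code (the link posted in a tweet) with an article."""
        self._targets[code] = (article_id, url)

    def resolve(self, code):
        """Record a click and return the URL to redirect to, or None if unknown."""
        entry = self._targets.get(code)
        if entry is None:
            return None
        article_id, url = entry
        self._clicks[article_id] += 1
        return url

    def click_counts(self):
        """Per-article click totals, usable as a popularity signal downstream."""
        return dict(self._clicks)
```

The per-article totals from `click_counts` are the kind of signal that could be passed to a recommender as an extra popularity feature.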
I actively worked on the project for one year, from winter 2022 to winter 2023. At its peak the project served more than 500 monthly active users. The app and website are still live at www.earlyworm.io, although the app content has stopped updating since OpenAI's recent termination of text-davinci-003.