A language learning app that uses generative AI to match real-world content to your proficiency level.
Earlyworm gave foreign language learners access to real-world content at customized difficulty levels. Instead of textbook exercises, you read AI-generated summaries of clusters of related news articles, matched to your proficiency, or took any individual article and had the AI rewrite it to your level. It was a way to discover trending topics in your target language while learning from material people were actually reading.
The recommendation engine used OpenAI embeddings and a Milvus vector database, combining content-based and collaborative filtering. A Tinder-style stacked UI let you bookmark or pass on articles, and the system tracked engagement via view time and scrolling to refine recommendations. Search was powered by the same embedding pipeline, and dictionary lookup integrated MDBG and Baidu.
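As a rough illustration of how content-based and collaborative signals can be blended, here is a minimal sketch in plain Python. It is not the production code: the function names, the Jaccard-weighted neighbour scheme, and the `alpha` blend weight are all assumptions made for the example, and the vectors stand in for the embeddings that lived in Milvus.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def collaborative_score(user, article, bookmarks):
    """Fraction of neighbour weight held by users who bookmarked `article`.

    `bookmarks` maps user id -> set of bookmarked article ids (hypothetical shape).
    Each other user is weighted by Jaccard overlap with `user`'s bookmarks.
    """
    mine = bookmarks.get(user, set())
    total = hits = 0.0
    for other, theirs in bookmarks.items():
        if other == user:
            continue
        union = mine | theirs
        weight = len(mine & theirs) / len(union) if union else 0.0
        total += weight
        if article in theirs:
            hits += weight
    return hits / total if total else 0.0

def hybrid_rank(user, user_vec, article_vecs, bookmarks, alpha=0.7):
    """Rank unseen articles by alpha * content score + (1 - alpha) * collaborative score."""
    seen = bookmarks.get(user, set())
    scored = []
    for article_id, vec in article_vecs.items():
        if article_id in seen:
            continue  # never re-recommend something already bookmarked
        content = cosine(user_vec, vec)
        collab = collaborative_score(user, article_id, bookmarks)
        scored.append((article_id, alpha * content + (1 - alpha) * collab))
    return sorted(scored, key=lambda pair: pair[1], reverse=True)
```

Calling `hybrid_rank("u1", user_vec, article_vecs, bookmarks)` returns `(article_id, score)` pairs, best first. A higher `alpha` leans on embedding similarity, which helps new users with few bookmarks; a lower one leans on what similar readers saved.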
The content pipeline aggregated RSS feeds, ran custom text extraction, and used hierarchical clustering to group related articles into topics. GPT-3.5 generated summaries at multiple difficulty levels, and fine-tuned text-davinci-003 models handled the content matching. The app reached over 500 downloads before I wound it down when I left Meta to focus on other projects.
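The topic-grouping step can be sketched as bottom-up (agglomerative) clustering over TF-IDF vectors. The version below is a toy under stated assumptions: whitespace tokenisation, average linkage, and a similarity `threshold` of 0.2 are illustrative choices rather than the settings the pipeline used, and a real Chinese-language pipeline would need a proper word segmenter.

```python
import math
from collections import Counter

def tfidf_vectors(docs):
    """Return one sparse {term: weight} TF-IDF vector per whitespace-tokenised document."""
    tokenised = [doc.lower().split() for doc in docs]
    df = Counter(term for tokens in tokenised for term in set(tokens))
    n = len(docs)
    vectors = []
    for tokens in tokenised:
        tf = Counter(tokens)
        vectors.append({
            term: (count / len(tokens)) * (math.log((1 + n) / (1 + df[term])) + 1)
            for term, count in tf.items()
        })
    return vectors

def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(term, 0.0) for term, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def cluster(docs, threshold=0.2):
    """Average-linkage agglomerative clustering; returns lists of document indices."""
    vecs = tfidf_vectors(docs)
    n = len(docs)
    sim = [[cosine(vecs[i], vecs[j]) for j in range(n)] for i in range(n)]
    clusters = [[i] for i in range(n)]
    while len(clusters) > 1:
        best, pair = threshold, None
        for x in range(len(clusters)):
            for y in range(x + 1, len(clusters)):
                # Average pairwise similarity between the two candidate clusters.
                link = sum(sim[i][j] for i in clusters[x] for j in clusters[y])
                link /= len(clusters[x]) * len(clusters[y])
                if link > best:
                    best, pair = link, (x, y)
        if pair is None:
            break  # no remaining pair is similar enough to merge
        x, y = pair
        clusters[x] = clusters[x] + clusters[y]
        del clusters[y]
    return clusters
```

Each resulting cluster is a candidate topic whose articles can then be handed to the summariser together. Stopping at a similarity threshold rather than a fixed cluster count suits news, where the number of stories per day is not known in advance.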
The backend ran on Azure Kubernetes Service (AKS) with 12+ microservices. The ingestion pipeline started with an RSS Fetcher (blue/green deployed) that pulled from hundreds of feeds, followed by an Article Processor that fanned out to three Python services: a Classifier (17 categories), a Named Entity Extractor (people, organizations, locations), and an Embedder that generated OpenAI vectors stored in Milvus. A Trends Service ran every 6 hours as a CronJob, using TF-IDF hierarchical clustering to group related articles and GPT-3.5 to generate summaries. The recommendation pipeline had two stages: a Recommendation Input CronJob that built user preference vectors from 150 days of interaction data, and a Feed Recommender Flask service that combined Milvus vector similarity with collaborative filtering. Nine separate Redis instances handled caching, sessions, rate limiting, job queues, and read tracking. The main API server was a Node.js/Koa app with blue/green deployments behind an Azure Application Gateway.
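One plausible shape for the Recommendation Input step is a weighted mean of article embeddings over the 150-day window. The sketch below assumes a simple `(article_id, event_type, timestamp)` event tuple and made-up per-event weights in `EVENT_WEIGHTS`; neither is taken from the real service, which also used view time and scrolling signals.

```python
from datetime import datetime, timedelta, timezone

WINDOW_DAYS = 150  # look-back window described in the text

# Hypothetical weights for illustration; the real signal mix is not documented here.
EVENT_WEIGHTS = {"bookmark": 1.0, "view": 0.3, "pass": -0.5}

def preference_vector(events, embeddings, now=None, window_days=WINDOW_DAYS):
    """Weighted mean of article embeddings for events inside the look-back window.

    events: iterable of (article_id, event_type, timestamp) tuples.
    embeddings: dict mapping article_id -> list of floats.
    Returns None when no usable event falls inside the window.
    """
    now = now or datetime.now(timezone.utc)
    cutoff = now - timedelta(days=window_days)
    total = None
    weight_sum = 0.0
    for article_id, event_type, timestamp in events:
        weight = EVENT_WEIGHTS.get(event_type, 0.0)
        vec = embeddings.get(article_id)
        if timestamp < cutoff or vec is None or weight == 0.0:
            continue  # too old, unknown article, or unweighted event type
        if total is None:
            total = [0.0] * len(vec)
        for i, value in enumerate(vec):
            total[i] += weight * value
        weight_sum += abs(weight)  # normalise by magnitude so negatives still count
    if total is None:
        return None
    return [value / weight_sum for value in total]
```

The resulting vector can be used as the query for a nearest-neighbour search in the vector store. Passed articles carry a negative weight here so the vector drifts away from content the reader rejected, which is one design option among several.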

Native Article w/ Word Lookup

Article Feed w/ Word Lookup

Article Feed v2 w/ Recommendation System
Browsing trending topics, reading AI-rewritten articles, and discovering content matched to your proficiency level.

Reddit post about Earlyworm.

Favorable review tweet about Earlyworm.

User growth and engagement metrics.