Earlyworm

2023 · Founder · Retired

A language learning app that uses generative AI to match real-world content to your proficiency level.

Earlyworm gave foreign language learners access to real-world content at customized difficulty levels. Instead of working through textbook exercises, you read AI-generated summaries of clusters of related news articles matched to your proficiency, or took any individual article and had the AI rewrite it to your level. It was a great way to discover trending topics in your target language while actually learning.

The recommendation engine used OpenAI embeddings and a Milvus vector database, combining content-based and collaborative filtering. A Tinder-style stacked UI let you bookmark or pass on articles, and the system tracked engagement via view time and scrolling to refine recommendations. Search was powered by the same embedding pipeline, and dictionary lookup integrated MDBG and Baidu.
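The blend of content-based and collaborative signals described above could look roughly like the sketch below. It is an illustration only, not the Earlyworm code: the function names, the `alpha` weight, the 60-second expected reading time, and the engagement formula are all assumptions, and plain Python lists stand in for OpenAI embeddings fetched from Milvus.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def engagement(view_seconds, scroll_fraction, expected_seconds=60.0):
    """Map view time and scroll depth to a 0..1 score (hypothetical formula)."""
    time_part = min(view_seconds / expected_seconds, 1.0)
    scroll_part = max(0.0, min(scroll_fraction, 1.0))
    return 0.5 * time_part + 0.5 * scroll_part


def hybrid_score(user_vec, article_vec, collab_score, alpha=0.7):
    """Weighted blend of content similarity and a collaborative-filtering score."""
    return alpha * cosine(user_vec, article_vec) + (1 - alpha) * collab_score


def rank(user_vec, candidates, alpha=0.7):
    """candidates: list of (article_id, embedding, collab_score); best first."""
    scored = [(hybrid_score(user_vec, emb, cs, alpha), aid) for aid, emb, cs in candidates]
    return [aid for _, aid in sorted(scored, reverse=True)]
```

In a setup like this, the engagement score would feed back into the user vector or the collaborative score, so that a long, fully scrolled read counts for more than a quick pass.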

The content pipeline aggregated RSS feeds, ran custom text extraction, and used hierarchical clustering to group related articles into topics. GPT-3.5 generated summaries at multiple difficulty levels, and fine-tuned text-davinci-003 models handled the content matching. The app reached over 500 downloads before I wound it down when I left Meta to focus on other projects.
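To make the topic-grouping step concrete, here is a small self-contained sketch of hierarchical (agglomerative) clustering over TF-IDF vectors. The average linkage, cosine distance, whitespace tokenizer, and the 0.8 distance threshold are assumptions chosen for illustration; the production service may well have used a library and different settings.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Return one sparse TF-IDF dict per document (naive whitespace tokenization)."""
    tokenized = [doc.lower().split() for doc in docs]
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    n = len(docs)
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        vectors.append({
            term: (count / len(tokens)) * (math.log((1 + n) / (1 + doc_freq[term])) + 1)
            for term, count in counts.items()
        })
    return vectors


def cosine_distance(a, b):
    """1 - cosine similarity for two sparse vectors stored as dicts."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return 1.0 - (dot / (norm_a * norm_b) if norm_a and norm_b else 0.0)


def cluster(docs, threshold=0.8):
    """Merge the closest pair of clusters until the closest pair exceeds the threshold."""
    vecs = tfidf_vectors(docs)
    clusters = [[i] for i in range(len(docs))]

    def linkage(c1, c2):
        # Average pairwise distance between members of the two clusters.
        total = sum(cosine_distance(vecs[i], vecs[j]) for i in c1 for j in c2)
        return total / (len(c1) * len(c2))

    while len(clusters) > 1:
        dist, i, j = min(
            (linkage(clusters[i], clusters[j]), i, j)
            for i in range(len(clusters))
            for j in range(i + 1, len(clusters))
        )
        if dist > threshold:
            break
        merged = clusters.pop(j)  # j > i, so index i is unaffected
        clusters[i].extend(merged)
    return [sorted(c) for c in clusters]
```

Each resulting cluster is a list of article indices; in a pipeline like the one described, the articles in a cluster would then be passed to the summarizer as one topic.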

Backend Architecture

The backend ran on Azure AKS with 12+ microservices orchestrated by Kubernetes. The ingestion pipeline started with an RSS Fetcher (blue/green deployed) that pulled from hundreds of feeds, followed by an Article Processor that fanned out to three Python services: a Classifier (17 categories), a Named Entity Extractor (people, organizations, locations), and an Embedder that generated OpenAI vectors stored in Milvus.

A Trends Service ran every 6 hours as a CronJob, using TF-IDF-based hierarchical clustering to group related articles and GPT-3.5 to generate summaries.

The recommendation pipeline had two stages: a Recommendation Input CronJob that built user preference vectors from 150 days of interaction data, and a Feed Recommender Flask service that combined Milvus vector similarity with collaborative filtering.

Nine separate Redis instances handled caching, sessions, rate limiting, job queues, and read tracking. The main API server was a Node.js/Koa app with blue/green deployments behind an Azure Application Gateway.
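The first recommendation stage, building a user preference vector from 150 days of interaction data, might be sketched as below. Only the 150-day window comes from the description; the recency half-life, the signed `signal` values (for example +1 for a bookmark, -1 for a pass), and the function name are hypothetical.

```python
from datetime import datetime, timedelta

WINDOW_DAYS = 150  # interaction window from the description above


def preference_vector(interactions, now, half_life_days=30.0):
    """Recency-weighted average of article embeddings.

    interactions: iterable of (timestamp, embedding, signal) tuples.
    Returns None when no interaction falls inside the window.
    """
    cutoff = now - timedelta(days=WINDOW_DAYS)
    total, weight_sum = None, 0.0
    for timestamp, embedding, signal in interactions:
        if timestamp < cutoff:
            continue  # older than the window, ignored
        age_days = (now - timestamp).total_seconds() / 86400.0
        # Assumed exponential decay: weight halves every half_life_days.
        weight = signal * 0.5 ** (age_days / half_life_days)
        if total is None:
            total = [0.0] * len(embedding)
        for k, value in enumerate(embedding):
            total[k] += weight * value
        weight_sum += abs(weight)
    if total is None or weight_sum == 0.0:
        return None
    return [value / weight_sum for value in total]
```

A vector like this could then be used as the query for a similarity search over the stored article embeddings, with the collaborative-filtering score applied on top by the feed service.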

Screenshots
Native Article w/ Word Lookup

Article Feed w/ Word Lookup

Article Feed v2 w/ Recommendation System

Gallery

Browsing trending topics, reading AI-rewritten articles, and discovering content matched to your proficiency level.

Reddit

Reddit post about Earlyworm.

Review

Favorable review tweet about Earlyworm.

Traction

User growth and engagement metrics.

Stack
  • Swift
  • UIKit
  • Node.js
  • Python
  • Terraform
  • Kubernetes
  • Azure AKS
  • OpenAI API
  • Milvus
Features
  • AI-rewritten articles matched to your proficiency level
  • Recommendation engine with OpenAI embeddings + Milvus
  • Tinder-style content discovery UI
  • Text-to-speech via Microsoft SDK
  • Dictionary lookup with MDBG and Baidu
  • Automated tweet thread generator for growth