15.ai creator reveals journey from MIT project to internet phenomenon
When 15.ai first appeared in 2020, it quickly captured the attention of creators, fans, and anyone fascinated by AI’s newfound ability to generate emotive, pitch-perfect character voices. Its developer—known simply as “15”—recently broke a long silence to share how a project initially conceived during undergraduate years at the Massachusetts Institute of Technology (MIT) evolved into a viral sensation, even after a detour into the startup world.
In 2016, “15” was a first-year undergraduate at MIT captivated by the possibilities of synthetic speech. The AI world was changing fast: DeepMind’s WaveNet had provided a glimpse of what was possible in advanced voice modelling, while other researchers were just beginning to explore text-to-speech systems that needed only a fraction of the usual data. At the time, reducing the amount of required training audio without sacrificing quality was a critical hurdle for the field. As “15” refined new ways to drastically cut the audio needed to generate believable speech, he began to consider pursuing the research further in a future PhD dissertation.
By 2019, the developer could replicate the results of DeepMind’s WaveNet and Google’s Tacotron using only a quarter of the training data. However, “15” soon co-founded a startup that was later accepted into the Y Combinator accelerator, an opportunity that pulled him away from academia for a while. During that period, the text-to-speech field advanced at a breakneck pace, with major breakthroughs including Microsoft Research’s FastSpeech, HiFi-GAN and Glow-TTS, along with contributions from Chinese tech giants Baidu and ByteDance.
By the time “15” wrapped up work at the startup and made a successful exit, the text-to-speech landscape had changed dramatically. Yet the core vision persisted: to prove that high-fidelity speech clones could be crafted from just a snippet of audio, potentially as little as 15 seconds. This audacious claim eventually lent its name to the project—15.ai—hinting at the small-data approach it championed.
A significant part of 15.ai’s uniqueness stemmed from what “15” had developed behind the scenes. One element was the “emotional contextualizer,” which leveraged a neural network known as DeepMoji. Built at the MIT Media Lab, DeepMoji was trained on millions of tweets to recognize nuances like mood, sentiment, and sarcasm. By running user-input text through DeepMoji, 15.ai could interpret whether a line sounded playful, annoyed, or sorrowful, integrating those insights into the synthesized voice. Characters might adopt a sarcastic bite or a gentle earnestness, guided by emotional cues extracted from DeepMoji’s embeddings. This gave the system a dynamism that surpassed typical text-to-speech offerings.
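None of 15.ai’s code is public, so the following is a minimal, purely illustrative sketch of how emotional conditioning of this kind can work: a stand-in linear layer plays the role of DeepMoji (whose released implementation emits 2304-dimensional feature vectors), and its output is broadcast across the text sequence as a conditioning signal. The module names and dimensions are hypothetical, not 15.ai’s actual architecture.

```python
import torch
import torch.nn as nn

class EmotionConditionedTTS(nn.Module):
    """Toy illustration: condition a TTS decoder on an emotion embedding.

    In the design the article describes, the emotion embedding would come
    from DeepMoji; here a random linear projection stands in for it.
    """

    def __init__(self, vocab_size=256, text_dim=128, emotion_dim=64, mel_bins=80):
        super().__init__()
        self.text_encoder = nn.Embedding(vocab_size, text_dim)
        # Stand-in for DeepMoji: maps a sentence-level feature vector
        # (2304-d in the released DeepMoji implementation) to a small
        # emotional embedding.
        self.emotion_encoder = nn.Linear(2304, emotion_dim)
        self.decoder = nn.GRU(text_dim + emotion_dim, mel_bins, batch_first=True)

    def forward(self, char_ids, emotion_features):
        text = self.text_encoder(char_ids)                # (B, T, text_dim)
        emotion = self.emotion_encoder(emotion_features)  # (B, emotion_dim)
        # Broadcast the emotion embedding across every timestep so the
        # decoder sees the same affective context for the whole line.
        emotion = emotion.unsqueeze(1).expand(-1, text.size(1), -1)
        mel, _ = self.decoder(torch.cat([text, emotion], dim=-1))
        return mel                                        # (B, T, mel_bins)

# One batch, a 12-character line, dummy DeepMoji-style features:
model = EmotionConditionedTTS()
chars = torch.randint(0, 256, (1, 12))
feats = torch.randn(1, 2304)
print(model(chars, feats).shape)  # torch.Size([1, 12, 80])
```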
Another standout feature was 15.ai’s support for ARPAbet. Standard TTS solutions often stumble over unusual names, regional terms, or words that share a spelling but differ in pronunciation. With 15.ai, users could directly provide ARPAbet transcriptions in curly braces, enabling fine-grained control over how certain words were spoken—such as distinguishing between “read” (present tense) and “read” (past tense). Combined with DeepMoji’s emotional overlays, ARPAbet support gave 15.ai both the precise phonetics and expressive delivery that made its voices sound surprisingly human.
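To make the convention concrete, here is a hypothetical parser for that input style. The curly-brace syntax matches the article’s description, but the code itself is an illustration rather than 15.ai’s implementation; in ARPAbet, past-tense “read” is R EH1 D while present-tense “read” is R IY1 D.

```python
import re

# Split a line into plain-text spans and {ARPAbet} spans, mirroring the
# curly-brace input convention 15.ai used for explicit pronunciations.
TOKEN = re.compile(r"\{([^}]*)\}|([^{]+)")

def parse_line(line):
    spans = []
    for arpabet, text in TOKEN.findall(line):
        if arpabet:
            spans.append(("arpabet", arpabet.split()))
        else:
            spans.append(("text", text))
    return spans

# Force the past-tense pronunciation of "read" via ARPAbet phonemes:
print(parse_line("I {R EH1 D} the book yesterday."))
# [('text', 'I '), ('arpabet', ['R', 'EH1', 'D']), ('text', ' the book yesterday.')]
```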
Another major turning point emerged from an unexpected source: /mlp/, the My Little Pony board on the anonymous forum 4chan. There, the so-called “Pony Preservation Project” meticulously gathered and annotated thousands of lines from the show. The collaborative effort produced a top-tier dataset that “15” promptly used to train a single unified model. Before long, the system could embody dozens of distinct voices, spanning iconic Team Fortress 2 characters, the Portal series, and even Nickelodeon’s SpongeBob SquarePants, from nothing more than typed text.
In 2020, “15” finally launched a free website showcasing this research. Visitors typed lines into a box, chose from a roster of familiar characters, and were rewarded with stunningly accurate audio clips that mimicked the original voice actors. Unlike many other TTS platforms at the time, 15.ai utilized a single multi-speaker model—rather than separate models for each character—letting it generalize emotional and stylistic cues across a wide range of voices.
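The architectural difference is easy to show in miniature. In the hypothetical sketch below (again, not 15.ai’s actual code), every character is simply a row in a shared embedding table, so switching voices means changing an index rather than loading a separate model, and patterns learned from one voice’s data can transfer to the others through the shared encoder and decoder.

```python
import torch
import torch.nn as nn

class MultiSpeakerTTS(nn.Module):
    """One shared model; each character is a row in an embedding table."""

    def __init__(self, num_speakers=50, vocab_size=256, dim=128, mel_bins=80):
        super().__init__()
        self.text_encoder = nn.Embedding(vocab_size, dim)
        self.speaker_table = nn.Embedding(num_speakers, dim)  # one vector per voice
        self.decoder = nn.GRU(2 * dim, mel_bins, batch_first=True)

    def forward(self, char_ids, speaker_id):
        text = self.text_encoder(char_ids)                       # (B, T, dim)
        voice = self.speaker_table(speaker_id)                   # (B, dim)
        voice = voice.unsqueeze(1).expand(-1, text.size(1), -1)  # broadcast over T
        mel, _ = self.decoder(torch.cat([text, voice], dim=-1))
        return mel                                               # (B, T, mel_bins)

# Switching characters means changing an index, not swapping models:
model = MultiSpeakerTTS()
line = torch.randint(0, 256, (1, 20))
for speaker in (0, 7, 42):
    print(speaker, model(line, torch.tensor([speaker])).shape)
```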
The site’s popularity skyrocketed in 2021, eventually drawing in millions of users daily. Its monthly cloud bills reportedly topped $12,000, covered solely out of pocket by “15” using proceeds from the earlier startup. While well-funded companies offered to buy or license the technology, the creator turned them down, explaining, “Instead, I decided to take matters into my own hands. The best way to get my work noticed was to show it off. No gatekeeping, no barriers – just a free, accessible tool for anyone to use. I wanted to democratize AI research. I wanted to give people something that didn’t require coding skills or expensive hardware, something they could just use and be amazed by.” In “15”’s view, giving everyone the chance to experiment with text-to-speech was more valuable than any corporate partnership.
During this period, 15.ai was widely credited with popularizing AI voice cloning, often described as “audio deepfakes”, in memes, viral content, and fan-driven media. By bridging believable voices with easy online access, the project significantly broadened the reach of AI speech. Even seasoned sceptics found themselves intrigued by how expressive and adaptable its outputs sounded, which often fooled listeners into believing the lines had been recorded by real actors. Others, including professional voice actors, were more unsettled by the technology’s implications, wary of how such sophisticated cloning might disrupt their industry and raise new ethical dilemmas.
Everything changed in 2022. First, “15” discovered a company called Voiceverse NFT, which had partnered with anime and video game voice actor Troy Baker. Voiceverse had apparently lifted 15.ai-generated lines without authorization and presented them as its own creation. The incident became public when “15” took to Twitter, sharing log file evidence that specific voice samples, which later appeared in Voiceverse promotional materials, had been generated using 15.ai. Voiceverse blamed its marketing team for the oversight, but the exchange culminated in a blunt final tweet from “15” that went viral, drawing hundreds of thousands of likes and retweets in support of the developer. Voiceverse’s partnership with Baker ended shortly afterwards.
Soon after, the bigger blow landed: a cease-and-desist order forced 15.ai offline. Although “15” believed AI training fell under fair use—especially for non-commercial projects—particular legal complications made it impossible to keep the website up. Overnight, the tool vanished, leaving a wide community of users scrambling to find similar services.
Looking back, “15” acknowledges that the challenges of running the project alone—shouldering server bills, deflecting unauthorized use, and navigating legal hazards—made it untenable. Yet there’s little regret about declining corporate buyers or refraining from monetizing his website. In “15”’s mind, the sudden popularity of 15.ai had already served its key purpose: it showed that cutting-edge, humanlike speech synthesis need not be confined to deep-pocketed companies or secretive research labs.
As of now, 15.ai remains offline. Fans who once relied on it for everything from goofy memes to full-fledged fan episodes hope the service will re-emerge, albeit with measures to avoid the pitfalls that closed it down. Indeed, “15” has hinted that a future version might sidestep copyright vulnerabilities or at least better address them from the outset. Others have tried to fill the void, but few have replicated the playful, wide-ranging voices or the generosity of a fully free platform.
In retrospect, the bold claim that only 15 seconds of audio might suffice to produce a convincing voice clone proved surprisingly prescient. In 2024, OpenAI introduced a voice synthesis model called “Voice Engine” that generates natural-sounding speech from text input and a single 15-second audio sample, a development that not only validates “15”’s early insight but also underscores both the rapid evolution and the ethical complexities of AI-powered voice cloning.
Ultimately, 15.ai is remembered for the scale of its leaps. From a prospective PhD topic at MIT to a worldwide viral tool, the project proved that voice cloning doesn’t need vast corporate budgets to excel. Millions who generated lines for memes, fan productions, or just to hear their favorite characters speak were part of a grand experiment, a preview of an AI future that promises extraordinary creativity alongside serious ethical quandaries.