15.ai creator reveals journey from MIT project to internet phenomenon
When 15.ai first appeared in 2020, it quickly captured the attention of creators, fans, and anyone fascinated by AI’s newfound ability to generate emotive, pitch-perfect character voices. Its developer—known simply as “15”—recently broke a long silence to share how a project initially conceived during undergraduate years at the Massachusetts Institute of Technology (MIT) evolved into a viral sensation, even after a detour into the startup world.
In 2016, “15” was a first-year undergraduate at MIT captivated by the possibilities of synthetic speech. The AI world was changing fast: DeepMind’s WaveNet had provided a glimpse of what was possible in advanced voice modelling, while other researchers were just beginning to explore text-to-speech systems that needed only a fraction of the usual data. At the time, reducing the amount of required training audio without sacrificing quality was a critical hurdle for the field. As “15” refined new ways to drastically cut the audio needed to generate believable speech, he began to consider pursuing the research further in a future PhD dissertation.
By 2019, the developer could replicate the results of DeepMind’s WaveNet and Google’s Tacotron using only a quarter of the training data. However, “15” soon co-founded a startup that was later accepted into the Y Combinator accelerator, an opportunity that pulled him away from academia for a while. During that period, the text-to-speech field advanced at a breakneck pace, with major breakthroughs including Microsoft Research’s FastSpeech, HiFi-GAN and Glow-TTS, along with contributions from Chinese tech giants Baidu and ByteDance.
By the time “15” wrapped up work at the startup and made a successful exit, the text-to-speech landscape had changed dramatically. Yet the core vision persisted: to prove that high-fidelity speech clones could be crafted from just a snippet of audio, potentially as little as 15 seconds. This audacious claim eventually lent its name to the project—15.ai—hinting at the small-data approach it championed.
A significant part of 15.ai’s uniqueness stemmed from what “15” had developed behind the scenes. One element was the “emotional contextualizer,” which leveraged a neural network known as DeepMoji. Built at the MIT Media Lab, DeepMoji was trained on millions of tweets to recognize nuances like mood, sentiment, and sarcasm. By running user-input text through DeepMoji, 15.ai could interpret whether a line sounded playful, annoyed, or sorrowful, integrating those insights into the synthesized voice. Characters might adopt a sarcastic bite or a gentle earnestness, guided by emotional cues extracted from DeepMoji’s embeddings. This gave the system a dynamism that surpassed typical text-to-speech offerings.
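None of 15.ai’s code is public, so the following is a minimal, purely illustrative sketch of how emotional conditioning of this kind can work: a stand-in linear layer plays the role of DeepMoji (whose released implementation emits 2304-dimensional feature vectors), and its output is broadcast across the text sequence as a conditioning signal. The module names and dimensions are hypothetical, not 15.ai’s actual architecture.

```python
import torch
import torch.nn as nn

class EmotionConditionedTTS(nn.Module):
    """Toy illustration: condition a TTS decoder on an emotion embedding.

    In the design the article describes, the emotion embedding would come
    from DeepMoji; here a random linear projection stands in for it.
    """

    def __init__(self, vocab_size=256, text_dim=128, emotion_dim=64, mel_bins=80):
        super().__init__()
        self.text_encoder = nn.Embedding(vocab_size, text_dim)
        # Stand-in for DeepMoji: maps a sentence-level feature vector
        # (2304-d in the released DeepMoji implementation) to a small
        # emotional embedding.
        self.emotion_encoder = nn.Linear(2304, emotion_dim)
        self.decoder = nn.GRU(text_dim + emotion_dim, mel_bins, batch_first=True)

    def forward(self, char_ids, emotion_features):
        text = self.text_encoder(char_ids)                # (B, T, text_dim)
        emotion = self.emotion_encoder(emotion_features)  # (B, emotion_dim)
        # Broadcast the emotion embedding across every timestep so the
        # decoder sees the same affective context for the whole line.
        emotion = emotion.unsqueeze(1).expand(-1, text.size(1), -1)
        mel, _ = self.decoder(torch.cat([text, emotion], dim=-1))
        return mel                                        # (B, T, mel_bins)

# One batch, a 12-character line, dummy DeepMoji-style features:
model = EmotionConditionedTTS()
chars = torch.randint(0, 256, (1, 12))
feats = torch.randn(1, 2304)
print(model(chars, feats).shape)  # torch.Size([1, 12, 80])
```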
Another standout feature was 15.ai’s support for ARPAbet. Standard TTS solutions often stumble over unusual names, regional terms, or words that share a spelling but differ in pronunciation. With 15.ai, users could directly provide ARPAbet transcriptions in curly braces, enabling fine-grained control over how certain words were spoken—such as distinguishing between “read” (present tense) and “read” (past tense). Combined with DeepMoji’s emotional overlays, ARPAbet support gave 15.ai both the precise phonetics and expressive delivery that made its voices sound surprisingly human.
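To make the convention concrete, here is a hypothetical parser for that input style. The curly-brace syntax matches the article’s description, but the code itself is an illustration rather than 15.ai’s implementation; in ARPAbet, past-tense “read” is R EH1 D while present-tense “read” is R IY1 D.

```python
import re

# Split a line into plain-text spans and {ARPAbet} spans, mirroring the
# curly-brace input convention 15.ai used for explicit pronunciations.
TOKEN = re.compile(r"\{([^}]*)\}|([^{]+)")

def parse_line(line):
    spans = []
    for arpabet, text in TOKEN.findall(line):
        if arpabet:
            spans.append(("arpabet", arpabet.split()))
        else:
            spans.append(("text", text))
    return spans

# Force the past-tense pronunciation of "read" via ARPAbet phonemes:
print(parse_line("I {R EH1 D} the book yesterday."))
# [('text', 'I '), ('arpabet', ['R', 'EH1', 'D']), ('text', ' the book yesterday.')]
```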
Another major turning point emerged from an unexpected source: /mlp/, the My Little Pony board on the anonymous forum 4chan. There, the so-called “Pony Preservation Project” meticulously gathered and annotated thousands of lines from the show. The collaborative effort produced a top-tier dataset that “15” promptly used to train a single unified model. Before long, the system could embody dozens of distinct voices, spanning iconic Team Fortress 2 characters, the Portal series, and even Nickelodeon’s SpongeBob SquarePants, from nothing more than typed text.
In 2020, “15” finally launched a free website showcasing this research. Visitors typed lines into a box, chose from a roster of familiar characters, and were rewarded with stunningly accurate audio clips that mimicked the original voice actors. Unlike many other TTS platforms at the time, 15.ai utilized a single multi-speaker model—rather than separate models for each character—letting it generalize emotional and stylistic cues across a wide range of voices.
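The architectural difference is easy to show in miniature. In the hypothetical sketch below (again, not 15.ai’s actual code), every character is simply a row in a shared embedding table, so switching voices means changing an index rather than loading a separate model, and patterns learned from one voice’s data can transfer to the others through the shared encoder and decoder.

```python
import torch
import torch.nn as nn

class MultiSpeakerTTS(nn.Module):
    """One shared model; each character is a row in an embedding table."""

    def __init__(self, num_speakers=50, vocab_size=256, dim=128, mel_bins=80):
        super().__init__()
        self.text_encoder = nn.Embedding(vocab_size, dim)
        self.speaker_table = nn.Embedding(num_speakers, dim)  # one vector per voice
        self.decoder = nn.GRU(2 * dim, mel_bins, batch_first=True)

    def forward(self, char_ids, speaker_id):
        text = self.text_encoder(char_ids)                       # (B, T, dim)
        voice = self.speaker_table(speaker_id)                   # (B, dim)
        voice = voice.unsqueeze(1).expand(-1, text.size(1), -1)  # broadcast over T
        mel, _ = self.decoder(torch.cat([text, voice], dim=-1))
        return mel                                               # (B, T, mel_bins)

# Switching characters means changing an index, not swapping models:
model = MultiSpeakerTTS()
line = torch.randint(0, 256, (1, 20))
for speaker in (0, 7, 42):
    print(speaker, model(line, torch.tensor([speaker])).shape)
```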
The site’s popularity skyrocketed in 2021, eventually drawing in millions of users daily. Its monthly cloud bills reportedly topped $12,000, covered solely out of pocket by “15” using proceeds from the earlier startup. While well-funded companies offered to buy or license the technology, the creator turned them down, explaining, “Instead, I decided to take matters into my own hands. The best way to get my work noticed was to show it off. No gatekeeping, no barriers – just a free, accessible tool for anyone to use. I wanted to democratize AI research. I wanted to give people something that didn’t require coding skills or expensive hardware, something they could just use and be amazed by.” In “15”’s view, giving everyone the chance to experiment with text-to-speech was more valuable than any corporate partnership.
During this period, 15.ai was widely credited with popularizing AI voice cloning, often described as “audio deepfakes”, in memes, viral content, and fan-driven media. By bridging believable voices with easy online access, the project significantly broadened the reach of AI speech. Even seasoned sceptics found themselves intrigued by how expressive and adaptable its outputs sounded, which often fooled listeners into believing the lines had been recorded by real actors. Others, including professional voice actors, were more unsettled by the technology’s implications, wary of how such sophisticated cloning might disrupt their industry and raise new ethical dilemmas.
Everything changed in 2022. First, “15” discovered a company called Voiceverse NFT, which had partnered with anime and video game voice actor Troy Baker. Voiceverse had apparently lifted 15.ai-generated lines without authorization and presented them as its own creation. The incident became public when “15” took to Twitter, sharing log file evidence that specific voice samples, which later appeared in Voiceverse promotional materials, had been generated using 15.ai. Voiceverse blamed its marketing team for the oversight, but the exchange culminated in a blunt final tweet from “15” that went viral, drawing hundreds of thousands of likes and retweets in support of the developer. Voiceverse’s partnership with Baker ended shortly afterwards.
Soon after, the bigger blow landed: a cease-and-desist order forced 15.ai offline. Although “15” believed AI training fell under fair use—especially for non-commercial projects—particular legal complications made it impossible to keep the website up. Overnight, the tool vanished, leaving a wide community of users scrambling to find similar services.
Looking back, “15” acknowledges that the challenges of running the project alone—shouldering server bills, deflecting unauthorized use, and navigating legal hazards—made it untenable. Yet there’s little regret about declining corporate buyers or refraining from monetizing his website. In “15”’s mind, the sudden popularity of 15.ai had already served its key purpose: it showed that cutting-edge, humanlike speech synthesis need not be confined to deep-pocketed companies or secretive research labs.
As of now, 15.ai remains offline. Fans who once relied on it for everything from goofy memes to full-fledged fan episodes hope the service will re-emerge, albeit with measures to avoid the pitfalls that closed it down. Indeed, “15” has hinted that a future version might sidestep copyright vulnerabilities or at least better address them from the outset. Others have tried to fill the void, but few have replicated the playful, wide-ranging voices or the generosity of a fully free platform.
In retrospect, the bold claim that only 15 seconds of audio might suffice to produce a convincing voice clone proved surprisingly prescient. In 2024, OpenAI introduced a voice synthesis model called “Voice Engine” that generates natural-sounding speech from text input and a single 15-second audio sample, a development that not only validates “15”’s early insight but also underscores both the rapid evolution and the ethical complexities of AI-powered voice cloning.
Ultimately, 15.ai is remembered for the scale of its leaps. From a prospective PhD topic at MIT to a worldwide viral tool, the project proved that voice cloning doesn’t need vast corporate budgets to excel. Millions who generated lines for memes, fan productions, or just to hear their favorite characters speak were part of a grand experiment, a preview of an AI future that promises extraordinary creativity alongside serious ethical quandaries.