ElevenLabs, an AI startup, has announced the closing of a $19 million Series A round and major updates to its platform, including an AI Speech Classifier to combat misuse of its voice-generating technology. The platform has supported text-to-speech generation and voice cloning since its beta launch in January and has accumulated over one million registered users. The voice-cloning tool, which takes snippets of a person’s voice to generate new audio, has been used for nefarious purposes, making public figures appear to say hateful, discriminatory statements. To combat this, ElevenLabs has released an AI Speech Classifier that can detect whether uploaded audio was generated by ElevenLabs’ technology.
The company has suggested potential ways to combat the issue, such as additional account verification, verifying copyright to the voice, moving voice cloning to a paid tier, and even manually verifying each request. The release of the AI Speech Classifier is the latest step in the company’s push for transparency and a cornerstone of its commitment to creating a safe generative media landscape. According to a previous post announcing the tool, it maintains over 99% accuracy in identifying ElevenLabs-generated audio when the audio is unmodified. If the audio has undergone codec or reverb transformations, accuracy drops to above 90%, and the more the content has been processed, the further accuracy falls, according to the release.
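To make the accuracy tiers concrete, here is a minimal sketch of how an application consuming such a classifier might turn a raw score into a verdict. The function name, thresholds, and labels are illustrative assumptions for this sketch, not ElevenLabs’ actual API; only the idea that re-processed audio deserves lower confidence comes from the release.

```python
# Hypothetical sketch: interpreting an AI-speech-classifier score.
# Thresholds and labels are assumptions, not ElevenLabs' real interface.

def interpret_result(probability: float, was_transformed: bool) -> str:
    """Map a classifier's AI-likelihood estimate (0.0-1.0) to a verdict.

    `was_transformed` flags codec or reverb processing, which, per the
    release, reduces the classifier's accuracy.
    """
    if probability >= 0.98:
        verdict = "very likely AI-generated"
    elif probability >= 0.80:
        verdict = "likely AI-generated"
    elif probability <= 0.02:
        verdict = "very likely not AI-generated"
    else:
        verdict = "inconclusive"
    if was_transformed and verdict != "inconclusive":
        verdict += " (lower confidence: audio was re-processed)"
    return verdict

print(interpret_result(0.99, False))  # very likely AI-generated
print(interpret_result(0.85, True))   # likely AI-generated, flagged lower confidence
```

The point of the transformation flag is that a downstream consumer should not treat a score on compressed or reverb-processed audio with the same confidence as one on a clean upload.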
This tool won’t prevent misuse and may simply help clear up confusion after the initial harm is done. Its effectiveness in solving the issue is questionable, but it’s a small step. This isn’t the first time AI-generation technology has been misused to target public figures. For example, an AI music generator produced a fake Drake and The Weeknd collaboration that sounded real even though neither artist was actually on the track. AI art and image generators have also been used to create realistic fake images of public figures. Some of these images have been used negatively as political propaganda, while others have served purely as entertainment, such as the meme of Pope Francis in a puffer coat.
In addition to the AI Speech Classifier, ElevenLabs announced the arrival of “Projects,” a workflow for editing and creating long-form spoken content, now available in early access. It is meant to serve as a one-stop shop for audio-editing needs and bring a “Google Docs level of simplicity” to audio creation, according to the release. The “Projects” feature is similar to offerings we have seen from other creativity platforms, such as Vimeo, TikTok, and Adobe Express, all of which aim to implement AI in ways that streamline user workflows and make content creation easier.
Generative AI can produce all types of content, including text, art, images, and even speech. ElevenLabs’ voice-generating technology has had both positive and negative implications. Some of the positive uses, as delineated by ElevenLabs, include “independent authors creating audiobooks, developers voicing characters in video games, supporting the visually impaired to access online written content, and powering the world’s first AI radio channel.” Although these use cases are beneficial and advance business processes across many industries, there have been equally detrimental applications.
AI technology can be misused to cause harm, but it can also revolutionize how we create and consume content. It is up to companies like ElevenLabs to take responsibility and build safeguards against misuse. The AI Speech Classifier is a step in the right direction, but it is not a complete solution; it will take a collective effort from all stakeholders to ensure that AI technology is used for good rather than harm.