Study Reveals Humans Struggle to Detect Deepfake Speech, Even with Training

A recent study published in PLOS ONE found that humans can correctly identify deepfake speech samples only 73% of the time, highlighting how difficult artificially generated speech is to distinguish from the real thing. The research is the first to assess humans' ability to detect deepfake speech in a language other than English.

Deepfakes, a product of generative artificial intelligence, imitate real individuals’ voices and appearances using machine learning algorithms. These algorithms, now accessible and easily trainable, can replicate a person’s voice with a mere three-second audio clip.

Apple has also ventured into voice cloning, introducing a feature capable of mimicking a user’s voice through just 15 minutes of audio.

The University College London study generated deepfake speech samples in English and Mandarin using a text-to-speech algorithm. Even after participants were trained to recognize deepfake speech, they could spot the fake samples only 73% of the time.

Lead author Kimberly Mai of UCL Computer Science said the findings underscore that humans cannot reliably detect deepfake speech, even with training. The study also raises concerns that more advanced speech-synthesis technology could produce even more convincing deepfakes in the future.

The researchers are now focusing on improving automated speech detectors to counter the threats posed by synthetic audio and imagery. While generative AI audio technology has positive applications, growing concern over its potential misuse is prompting the need for better detection strategies.
