New AI technique sounding out audio deepfakes

Researchers from Australia’s national science agency CSIRO, Federation University Australia and RMIT University have developed a method to improve the detection of audio deepfakes.

The new technology, Rehearsal with Auxiliary-Informed Sampling (RAIS), is designed to detect audio deepfakes – a growing cybercrime threat used to bypass voice-based biometric authentication systems, impersonate others and spread disinformation. It determines whether an audio clip is real or artificially generated (“deepfake”) and maintains performance over time as attack types evolve.

In Italy earlier this year, an AI-clone voice of the Italian Defense Minister demanded a “ransom” of €1 million from business leaders, convincing some to pay. This is just one of many examples that highlight the need for audio deepfake detectors.

Fake audio technology is advancing rapidly, and newer deepfakes often sound nothing like older ones – so detectors trained only on earlier examples can miss them.

“We want these detection systems to learn new deepfakes without having to train the model again from scratch,” said co-author Dr Kristen Moore from CSIRO’s Data61. “If you just fine-tune on the new samples, the model forgets the older deepfakes it knew before.”

“RAIS solves this problem by automatically selecting and storing a small, diverse set of past examples, including hidden audio features that humans might not even notice, to help the AI learn new deepfake patterns without forgetting old ones,” Dr Moore explained.

RAIS uses an intelligent selection process supported by a network that generates “auxiliary labels” for each audio sample. These labels help identify a diverse and representative set of audio samples to retain and rehearse. By incorporating extra labels beyond the simple “fake” or “real” classification, RAIS ensures a richer mix of training data, improving its ability to remember and adapt over time.
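The paper's exact sampling procedure is more involved, but the core idea – keeping a small rehearsal buffer balanced across both the real/fake classes and the auxiliary labels, so many deepfake “styles” stay represented – can be sketched in a few lines of Python. This is an illustrative sketch only, not the authors' code; the sample fields and the round-robin grouping rule are assumptions for the example, and the auxiliary labels are taken as given here rather than produced by the label network described above.

```python
from collections import defaultdict

def update_rehearsal_buffer(buffer, new_samples, capacity):
    """Illustrative sketch of auxiliary-informed rehearsal sampling.

    Each sample is a dict with:
      - "is_fake":   the main real/fake label
      - "aux_label": a label from the auxiliary network (assumed precomputed)
    Samples are grouped by (class, auxiliary label) and picked round-robin,
    so the small buffer keeps a spread of audio styles, not just one.
    """
    groups = defaultdict(list)
    for sample in buffer + new_samples:
        groups[(sample["is_fake"], sample["aux_label"])].append(sample)

    selected = []
    # Take one sample from each group in turn until the buffer is full.
    while len(selected) < capacity and any(groups.values()):
        for key in list(groups):
            if groups[key]:
                selected.append(groups[key].pop(0))
                if len(selected) == capacity:
                    break
    return selected
```

Each time new deepfake examples arrive, the detector would then be fine-tuned on a mix of the fresh batch and this buffer, so older attack types are continually rehearsed rather than overwritten.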

RAIS outperforms rival methods, achieving the lowest average error rate of 1.95 per cent across a series of five experiments. The code is available on GitHub; the method remains efficient with a small memory buffer and is designed to maintain accuracy as attacks become more complex.

“Audio deepfake technology is evolving rapidly, and traditional detection methods cannot keep up,” said Falih Gozi Febrinanto, a recent PhD graduate of Federation University Australia.

“RAIS helps the model retain what it has learned and adapt to new attacks. Overall, it reduces the risk of forgetting and enhances its ability to detect deepfakes.”

“Our approach not only enhances detection performance but also makes continuous learning practical for real-world applications. By capturing the full diversity of acoustic signals, RAIS sets a new standard for efficiency and reliability,” said Dr Moore.

Read and download the full paper: Rehearsal with Auxiliary-Informed Sampling for Audio Deepfake Detection.


