Preventing Deep Fake Attacks on Voice Biometric Systems

Increasingly powerful AI technology can now be used to create digital voices that sound like a real human. The resulting clone of a person's voice is often referred to as a "deep fake".

Organizations and individuals are right to be concerned that these artificial voice generators might be used to attack services protected by voice biometric security systems. Recently, a journalist demonstrated how he was able to break into his own bank account using an AI-generated version of his own voice. In this blog, we will discuss the current state of deep fake attacks and some of the strategies for defending against them.

The first thing to acknowledge is that, while voice cloning technology is improving, there have been no reports yet of bad actors using it to carry out malicious activity. It is far easier for criminals to target organizations that still rely on knowledge-based questions such as name, address, date of birth and the last four digits of a social security number. Perpetrators of fraud know how these legacy systems can be thwarted, and most bad actors do not want to spend hours or even days sourcing enough audio from an intended target to create a deep fake voice with a sophisticated voice generator. However, rather than wait for a sophisticated attacker to breach a voice biometric system and gain access to personal data, what steps should be taken now to defeat the bad actors before they start?

Auraya, a global leader in voice biometric technology, advocates a layered approach to all security systems. Its EVA solution provides a multifactor process in which the first security layer is the device being used to attempt access, whether that is a smartphone, PC, laptop or an old-fashioned phone calling a contact center. Is the device trusted? Is the device associated with the identity whose account is being accessed? If so, the device can be treated as the first factor in a multifactor verification process. The next layer is a spoken knowledge test: can the person using the device say their own phone number or account number? This protects against brute-force account access attempts while also capturing an utterance that can be used for biometric verification, so a correctly spoken answer counts as another layer of security. The third layer is the biometric check itself: did the voice that spoke the account number or phone number match the trusted voice print stored securely in the voice print vault within the authorizing organization? This is a particularly powerful factor, because until this biometric check most of the other factors relied on the smartphone or other device not being compromised or controlled by a bad actor.
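To make the idea concrete, here is a minimal, purely illustrative Python sketch of such a layered decision flow. It is not Auraya's EVA API; the helpers `is_trusted_device`, `matches_expected_answer` and `voiceprint_score`, along with the threshold and sample data, are hypothetical placeholders for an organization's real device, knowledge and voice biometric services.

```python
# Hypothetical sketch of a layered, multifactor verification flow.
# None of these helpers are real EVA APIs; they stand in for an
# organization's own device, knowledge and voice biometric services.

from dataclasses import dataclass

KNOWN_DEVICES = {"acct-123": {"device-abc"}}       # sample data for the sketch
EXPECTED_ANSWERS = {"acct-123": "0400123456"}
BIOMETRIC_THRESHOLD = 0.8                           # illustrative threshold

@dataclass
class AccessAttempt:
    account_id: str
    device_id: str
    spoken_answer: str       # transcription of what the caller said
    spoken_audio: bytes      # raw audio of the same utterance

def is_trusted_device(device_id: str, account_id: str) -> bool:
    """Layer 1: is this device already associated with the account?"""
    return device_id in KNOWN_DEVICES.get(account_id, set())

def matches_expected_answer(spoken_answer: str, account_id: str) -> bool:
    """Layer 2: did the caller correctly say their phone or account number?"""
    return spoken_answer == EXPECTED_ANSWERS.get(account_id)

def voiceprint_score(spoken_audio: bytes, account_id: str) -> float:
    """Layer 3: placeholder for a call to the real voice biometric engine."""
    return 0.0

def verify(attempt: AccessAttempt) -> str:
    layers_passed = 0
    if is_trusted_device(attempt.device_id, attempt.account_id):
        layers_passed += 1
    if matches_expected_answer(attempt.spoken_answer, attempt.account_id):
        layers_passed += 1
    if voiceprint_score(attempt.spoken_audio, attempt.account_id) >= BIOMETRIC_THRESHOLD:
        layers_passed += 1

    if layers_passed == 3:
        return "verified"
    if layers_passed == 2:
        return "step-up"      # escalate to extra challenges rather than reject outright
    return "rejected"
```

The point of the structure is that a weaker result does not have to mean an immediate rejection; it can simply push the attempt towards a stronger challenge.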

These three layers of security are sufficient for all but the riskiest access requests. But what happens if the organization wants to prevent a deep fake recording from passing the voice biometric test, or if the user cannot use their trusted device? How do they prove their bona fides? If the authorized user has created a voice print so their identity can be confirmed by their service provider, they can simply be asked additional questions. The answers must not only be correct, they must be spoken by the authorized voice within time frames that a human can comfortably meet but that are much harder for even the most sophisticated user of deep fake technology. This process can escalate past a voice bot to a skilled contact center agent, with the whole conversation passively and biometrically scored to confirm that the speaker continues to match the authorized person's voice print. That is a particularly difficult bar for a deep fake voice to clear.
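As a rough illustration only, the step-up challenge could be modelled as a loop of timed, spoken questions. The helpers `ask_question` and `verify_spoken_answer`, the response deadline and the number of required correct answers are assumptions for this sketch rather than features of any particular product.

```python
# Hypothetical step-up verification: several questions, each of which must be
# answered correctly, in the authorized voice, within a short time window.

RESPONSE_DEADLINE_SECONDS = 5.0   # comfortable for a human, awkward for a cloning pipeline
REQUIRED_CORRECT = 3

def ask_question(question: str) -> tuple[str, bytes, float]:
    """Placeholder: ask the question, return (transcript, audio, seconds_taken)."""
    raise NotImplementedError

def verify_spoken_answer(transcript: str, audio: bytes, expected: str, account_id: str) -> bool:
    """Placeholder: correct content AND a biometric match against the stored voice print."""
    raise NotImplementedError

def step_up_challenge(account_id: str, questions: dict[str, str]) -> bool:
    correct = 0
    for question, expected in questions.items():
        transcript, audio, seconds_taken = ask_question(question)
        if seconds_taken > RESPONSE_DEADLINE_SECONDS:
            return False   # too slow: a live human should answer well inside the window
        if not verify_spoken_answer(transcript, audio, expected, account_id):
            return False   # wrong answer, or the voice does not match the voice print
        correct += 1
        if correct >= REQUIRED_CORRECT:
            return True
    return False
```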

Sophisticated, highly secure systems can also send a one-time passcode (OTP) to a secondary trusted device and require that the OTP be spoken, ensuring that the correct number is spoken by a person whose voice matches the trusted voice print.
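A hedged sketch of how that spoken OTP could be combined with a voice print check is below; `voiceprint_score`, the threshold and the commented-out `send_to_secondary_device` helper are hypothetical placeholders, not a real API.

```python
import secrets

# Hypothetical spoken-OTP check: the passcode must be correct AND the
# utterance must biometrically match the account's trusted voice print.

BIOMETRIC_THRESHOLD = 0.8  # illustrative threshold, as in the earlier sketch

def voiceprint_score(audio: bytes, account_id: str) -> float:
    """Placeholder for a call to the real voice biometric engine."""
    return 0.0

def generate_otp(digits: int = 6) -> str:
    """Generate a random numeric one-time passcode."""
    return "".join(secrets.choice("0123456789") for _ in range(digits))

def verify_spoken_otp(transcript: str, audio: bytes, issued_otp: str, account_id: str) -> bool:
    if transcript.strip() != issued_otp:
        return False                                                    # wrong passcode
    return voiceprint_score(audio, account_id) >= BIOMETRIC_THRESHOLD   # right voice too?

# Example flow, assuming a hypothetical send_to_secondary_device helper:
# otp = generate_otp()
# send_to_secondary_device(account_id, otp)
# allowed = verify_spoken_otp(transcript, audio, otp, account_id)
```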

Yet another defense against deep fake attacks is detecting known voice synthesizers: identifying the software used to create deep fake voices and blocking access to accounts or services when these voice generators are used. In this process, the voice biometric AI recognizes that the voice sounds like the authorized person, but the audio also matches an underlying model that identifies the voice as artificially generated.
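Conceptually, this means combining two scores for the same audio: how well it matches the enrolled speaker and how strongly it resembles machine-generated speech. The thresholds and labels in the sketch below are illustrative assumptions, not values from any real detector.

```python
# Hypothetical combination of a speaker-match score with a synthetic-speech
# (spoof) detection score; both thresholds are placeholders for this sketch.

MATCH_THRESHOLD = 0.8   # how closely the voice must match the enrolled voice print
SPOOF_THRESHOLD = 0.5   # above this, the audio is flagged as machine-generated

def decide(match_score: float, spoof_score: float) -> str:
    if spoof_score >= SPOOF_THRESHOLD:
        return "reject-synthetic"   # sounds like the customer, but flagged as generated audio
    if match_score >= MATCH_THRESHOLD:
        return "accept"
    return "reject-no-match"

# e.g. decide(0.93, 0.72) -> "reject-synthetic": a convincing clone is still blocked
```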

Selecting the most appropriate security process for the risk threshold of each type of service or access request ensures that the trade-off between convenience and security can be managed for every request.
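One simple way to express that trade-off, purely as an illustration, is a policy table mapping request types to the layers they require; the tiers and factor names below are assumptions for this sketch, not EVA configuration.

```python
# Illustrative risk-tier policy: which layers to require for which requests.
# Request types and factor names are invented for this sketch.

POLICY = {
    "balance-enquiry": ["trusted-device", "voice-match"],
    "address-change":  ["trusted-device", "spoken-knowledge", "voice-match"],
    "large-transfer":  ["trusted-device", "spoken-knowledge", "voice-match",
                        "spoken-otp", "synthetic-speech-check"],
}

def required_factors(request_type: str) -> list[str]:
    # Unknown request types default to the strictest tier.
    return POLICY.get(request_type, POLICY["large-transfer"])
```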

It is worth noting that adding complexity to the enrollment or verification process comes at a cost: overly complex security measures can result in a lower percentage of successful enrollments and lower verification success rates.

The final point is that no system is ever 100% secure: complex passwords are hacked, stolen, lost and compromised every day, and answers to knowledge-based questions and passwords are available on plenty of websites that criminals can access. The goal is to understand the vulnerabilities and layer protection to achieve the desired outcomes. Voice biometrics provides a sophisticated, secure yet convenient additional weapon in the battle to safeguard personal information and access to systems.

As long as criminals can easily access vulnerable but valuable accounts at organizations that still rely on traditional, susceptible processes such as "tell me your name, address and date of birth", almost all attackers will focus on those easier targets.

Find out more here - https://aurayasystems.com/resources/protecting-against-deepfake-spoof-attempts-and-recorded-playback-attacks

 
