The silent revolution

David Coode, ON Semiconductor

In order to save cost and time, face-to-face meetings and conversations in both our business and personal lives are less frequent than they used to be. Mobile phones and voice over internet technology (VoIP) have been the enablers for this significant change in how we interface with one another. The quality of sound and the suppression of noise are crucial to ensure a good user experience with voice communications.

We seldom experience true quiet these days, and most of us have become so accustomed to noise that we no longer notice it. The human brain does an outstanding job of filtering the noise we are exposed to: it hears everything but listens only to what interests us. However, as the world becomes an ever noisier place and voice communication over mobile phones, laptops, and webcams increases, filtering out all that noise becomes more difficult.

Due to rapidly advancing electronics technology, several approaches and potential solutions now exist to manage noise and improve voice clarity. The novel ideas from research over recent years have been translated into market-ready products to solve real problems. In fact, we are now at a proliferation stage where many new solutions are available. The effectiveness of various solutions can vary dramatically, and in most cases the impression of what constitutes good clean communications is contextual and subjective. It can be difficult to get a clear picture of how one solution compares to another, and which is better for a given application.

The value of a technology solution to improve the communication capability of a laptop, for instance, is highly related to the context in which that laptop is expected to be used. The user of a notebook for a Skype call will want the notebook to pick up only his voice and suppress background noise, whereas a student using the same notebook to record a lecture will want to effectively pull speech from any location out of the ambient noise of the lecture hall. A given solution may be judged effective in one scenario and a failure in the other. A compromise solution may perform suboptimally in both, but provide value to both users.

Mapping the available technical solutions to the use context is hard enough, but it can be even more challenging to explain audio differentiation to a consumer at the retail level, when every product on the shelf can claim “great audio performance” in its marketing material. With very few opportunities to hear an audio demo before a retail sale, consumers are often left to chance on first purchases.

Comparing noise reduction technologies

The technology that provides noise reduction solutions breaks into three broad classes: electroacoustic, analog, and digital.

Electroacoustic solutions

Electroacoustic solutions involve microphone element design, the selection and placement of these microphones in products and the acoustic coupling design for the microphone mounts. Noise-cancelling or gradient microphones are a simple example of an inexpensive solution that can give a moderate benefit in some situations. Good electroacoustic design is important to get good performance out of any voice communications device, but that base performance can be significantly improved further through the additional use of modern digital and analog circuits.

Analog solutions

Analog solutions involve some direct manipulation of the electrical signals that are produced by a microphone or an array of microphones. Simple solutions such as compression or directional “time-of-arrival” type processing may be more efficient in an analog form since they avoid the digital conversion stages. However, manufacturing variances inherent in semiconductor processes directly affect the performance of an analog solution in a way that digital processes are designed to avoid.

As analog solutions become more complex in a bid to deliver more value, the performance variance at each processing step will compound with each subsequent step. This has effectively kept any successful analog audio products relatively simple. Analog solutions also lack the functional flexibility that is possible with a digital solution, since analog systems implement processing within the silicon design itself rather than as a software layer over a flexible foundation.
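The compounding effect can be illustrated with simple arithmetic. The sketch below assumes, purely for illustration, that every stage in an analog chain has a gain that may be off by the same fractional tolerance, and that all stages happen to err in the same direction (the worst case). The 5% figure and the function name are hypothetical and do not describe any particular product.

```python
def worst_case_gain_error(stage_tolerance: float, num_stages: int) -> float:
    """Fractional gain error of a cascade when every stage is high by stage_tolerance."""
    # Gains multiply through a cascade, so the errors compound geometrically.
    return (1.0 + stage_tolerance) ** num_stages - 1.0


if __name__ == "__main__":
    # Hypothetical 5% per-stage tolerance, for illustration only.
    for stages in (1, 2, 4, 8):
        error = worst_case_gain_error(0.05, stages)
        print(f"{stages} stage(s): worst-case gain error {error:.1%}")
```

Under these assumptions the error of a long chain grows faster than the per-stage tolerance multiplied by the number of stages, which is one way to see why complex analog processing chains are hard to keep within specification.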

Digital solutions

Digital solutions involve sampling and quantizing the electrical signal from the microphone so that computer processors can apply repeatable algorithms to it. The processed signal is then either transmitted in digital form or reconstructed as an improved analog representation of the captured speech. Since digital solutions seem to have many inherent benefits with today’s silicon technology, it is not surprising that most of the available solutions fall into this class.
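As a minimal sketch of what sampling and quantization mean in practice, the following example samples a test tone at discrete instants and rounds each sample to a signed integer code, then maps the codes back to amplitudes. The tone frequency, sample rate, and word length are arbitrary assumptions chosen for illustration, and the function names are hypothetical.

```python
import math


def sample_and_quantize(freq_hz, sample_rate_hz, num_samples, bits):
    """Sample a unit-amplitude sine wave and round each sample to a signed integer code."""
    max_code = 2 ** (bits - 1) - 1  # e.g. 127 for 8-bit signed codes
    codes = []
    for n in range(num_samples):
        t = n / sample_rate_hz                       # sampling: discrete time instants
        x = math.sin(2 * math.pi * freq_hz * t)      # stand-in for the microphone signal
        codes.append(round(x * max_code))            # quantization: discrete amplitude levels
    return codes


def reconstruct(codes, bits):
    """Map integer codes back to amplitudes in the range -1.0 to 1.0."""
    max_code = 2 ** (bits - 1) - 1
    return [c / max_code for c in codes]


if __name__ == "__main__":
    codes = sample_and_quantize(freq_hz=1000, sample_rate_hz=8000, num_samples=8, bits=8)
    print(codes)
    print(reconstruct(codes, bits=8))
```

Once the signal exists as a list of integers, every subsequent processing step gives the same result on every device, which is the repeatability the digital approach relies on.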

Digital solutions can realize any algorithm in order to reduce noise or improve the quality of speech as picked up by a microphone. Typically, these algorithms consist of spatial selectivity (where is the speech coming from?), temporal selectivity (when is there speech or not speech?) and frequency selectivity (is the speech at higher or lower pitch than the noise?).

Some solutions only focus on one of these aspects, but the best will use a combination of them. Additional refinements can be added in the form of gain control, advanced environmental modeling, or other concepts.

Spatial selectivity

A solution that relies heavily on spatial selectivity — also known as beamforming or directional processing — will be well suited to an application or use where the speaker is in a known location relative to the microphones. Such approaches are used in notebook computers and in mobile phones, but carry inherent disadvantages with their benefits.

In notebook computers, this approach is well suited to video calling, where the sound pickup is confined to the direction of the camera, but it doesn’t allow the same computer to be used as a conference phone with several people around a table. In mobile phones, the location of the speech is usually very tightly constrained to get a dramatic reduction in ambient noise, but this means that the voice is dropped too if the phone isn’t held in just the right position.
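The trade-off described above can be sketched with a two-microphone delay-and-sum beamformer, one of the simplest forms of directional processing. This is an illustrative toy, not the algorithm used in any particular product: it assumes whole-sample delays, a single test tone, and hypothetical function names. The tone period and delays are chosen so that the off-axis source cancels almost completely, which a real wideband voice signal would not do.

```python
import math


def delay(signal, num_samples):
    """Delay a signal by a whole number of samples, padding the start with zeros."""
    return [0.0] * num_samples + list(signal[:len(signal) - num_samples])


def two_mic_capture(source, arrival_delay):
    """Simulate two microphones: mic_b hears the source arrival_delay samples after mic_a."""
    return list(source), delay(source, arrival_delay)


def delay_and_sum(mic_a, mic_b, steer_delay):
    """Delay mic_a by steer_delay so sound from the steered direction lines up, then average."""
    aligned_a = delay(mic_a, steer_delay)
    return [0.5 * (a + b) for a, b in zip(aligned_a, mic_b)]


def rms(signal):
    """Root-mean-square level of a signal."""
    return math.sqrt(sum(x * x for x in signal) / len(signal))


# Test tone with a period of 8 samples (an assumption chosen for a clear result).
tone = [math.sin(2 * math.pi * n / 8) for n in range(800)]

# Talker in the steered direction: inter-microphone delay matches steer_delay.
on_axis = delay_and_sum(*two_mic_capture(tone, 2), steer_delay=2)
# Same talker after moving: the delay no longer matches, so the copies add out of phase.
off_axis = delay_and_sum(*two_mic_capture(tone, 6), steer_delay=2)

if __name__ == "__main__":
    print(f"on-axis level:  {rms(on_axis):.3f}")
    print(f"off-axis level: {rms(off_axis):.3f}")
```

The same mechanism that suppresses sound from unwanted directions also suppresses the talker once he or she moves out of the steered direction, which is the behavior described for the mobile phone held in the wrong position.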

In contrast, a solution that relies on the statistics of human speech to make ongoing instantaneous decisions about what should be kept as speech and what should be filtered out as noise will be able to handle a broader scope of uses effectively. Unfortunately, these solutions are never completely sure about their decisions to classify the speech against the noise, and the more aggressively they are tuned, the more distortions a user will hear as parts of the speech get filtered out as a result of these misclassifications.
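The tuning trade-off can be illustrated with a deliberately crude stand-in for a statistical classifier: a gate that keeps a frame only if its energy exceeds a threshold. Real solutions use far richer models of speech; the frame amplitudes, thresholds, and function names below are assumptions made up for this sketch.

```python
def make_frame(amplitude, length=4):
    """Build a toy frame that alternates between +amplitude and -amplitude."""
    return [amplitude if n % 2 == 0 else -amplitude for n in range(length)]


def frame_energy(frame):
    """Mean squared amplitude of one frame."""
    return sum(x * x for x in frame) / len(frame)


def energy_gate(frames, threshold):
    """Keep frames whose energy exceeds threshold; replace the others with silence."""
    return [frame if frame_energy(frame) > threshold else [0.0] * len(frame)
            for frame in frames]


def count_lost_speech(frames, is_speech, threshold):
    """Count frames labeled as speech that the gate silenced (misclassifications)."""
    gated = energy_gate(frames, threshold)
    return sum(1 for out, speech in zip(gated, is_speech)
               if speech and not any(out))


# Toy input: background noise, a quiet syllable, a loud syllable, more noise.
frames = [make_frame(0.1), make_frame(0.2), make_frame(0.8), make_frame(0.1)]
is_speech = [False, True, True, False]

if __name__ == "__main__":
    for threshold in (0.02, 0.05):
        lost = count_lost_speech(frames, is_speech, threshold)
        print(f"threshold {threshold}: {lost} speech frame(s) removed")
```

With the gentler threshold both speech frames survive; with the more aggressive one the quiet syllable is treated as noise and removed, which is the kind of distortion a listener would hear.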

Typically, intelligibility is maintained while naturalness can suffer. On a mobile phone, this may not matter much since the naturalness is already degraded by the wireless network, but in other applications such as a voice recorder it may be of critical importance.

Blended algorithms

The best digital solutions tend to be blended algorithms that take pieces of all the approaches and combine them in an intelligent way. These solutions can often adapt to different circumstances, but also often add a heavier burden of tuning or customizing a more complex technology for each product design.

For example, ON Semiconductor’s BelaSigna R261 single-chip digital noise reduction processor is representative of the latest digital technology available for clean voice capture. The device’s ultraminiature SoC format and low power consumption fit well with the needs of the latest small form factor portable voice products.

BelaSigna R261 has an advanced two-microphone noise reduction algorithm that improves perceived speech quality while preserving naturalness. Offered with a range of prototyping tools, it is an example of the easier-to-design-in solutions that are becoming readily available to today’s consumer electronics product designers.

Digital solutions such as the BelaSigna R261 can realize any algorithm in order to reduce noise or improve the quality of speech as picked up by a microphone. Typically, these algorithms consist of spatial selectivity (where is the speech coming from?), temporal selectivity (when is there speech or not speech?) and frequency selectivity (is the speech at higher or lower pitch than the noise?).

An engineer selecting a technical solution to improve voice quality in a product needs to consider the impact on product design beyond the audio performance merits of a given solution. Some solutions demand specific microphone types, or specific microphone placement and acoustic design, which can compromise the overall industrial or mechanical design of the product. Some solutions may draw an unacceptably high level of power from the battery of a portable device or not fit into the available space on a PCB. In almost any design, cost will also be a deciding factor.

Currently there is no universal standard by which the solutions can be compared. Product designers have a challenging task to interpret the audio performance needs of their specific market and translate that into the best technology choice for their product. However, the latest digital noise reduction solutions offer small chips, low power consumption and advanced algorithms. These solutions give some great options to choose from when designing products for clear high-quality voice communications.

www2.electronicproducts.com