Originally posted on April 18, 2016 and April 21, 2016 on Let’s Do Video.

Recently, it feels like more and more companies are coming out with microphone array solutions for the conference room. How good are they? What kind of performance can we expect from them? Which buzzwords should impress us, and which should we ignore? This article is an attempt to shed some light on the subject. If, after reading it, you get the feeling that this topic is complicated, then I have achieved one of my goals, which is to demonstrate just that. Beamforming is a very complex topic. These types of solutions have been researched for many years and show a lot of promise, but they don’t always deliver in real-life situations. When you add the acoustics of the room, the imperfections of the microphone elements, and psychoacoustics to the equation, you end up with a very complicated puzzle that takes many years of trial and error to solve.

Front of Condor Microphone Array For Conference Rooms

The Condor is a beamforming microphone array with 15 built-in microphones that provide a pickup range of up to 30 ft. It is designed to adapt to changes in your environment and adjust accordingly to perform optimally in every meeting. The Condor outperforms traditional microphone arrays because of our proprietary beamforming algorithms.

Before we move forward, I’d like to touch briefly on my personal involvement with the subject. Decades ago, three years into my career as an electronics engineer, I was assigned to develop a DSP-based, 96-element beamformer. It was an array of hydrophones (the underwater equivalent of microphones) designed to listen for, detect, and track noise sources at long distances. My entire professional career since that assignment has revolved around designing and developing beamformers: time-domain and frequency-domain, adaptive, delay-and-sum, super-directional, constant-beamwidth, broadside, and end-fire. You name it, we’ve done it.

Looking back at my design approach toward my lifelong companion, the beamformer, it’s pretty evident that I’ve come full circle. I started by incorporating super-aggressive, high-performance methods. Then I toned them down, and even abandoned them for a while, only to bring them back with a more aggressive approach. This time, I was armed with loads of experience.

The reason behind these transitions was that these algorithms were traditionally used in underwater applications, where sound waves are much more “well-behaved” and predictable. Advanced, aggressive algorithms worked well in that environment. Their implementation received a boost in the early eighties with the appearance of the DSP, a processor optimized for the kinds of mathematical operations these algorithms use repeatedly.

In the late eighties, several companies (our team included) took the concept out of the water and into the air to develop beamformers for military applications. These were used to detect firearms, rockets, and mortars and find their direction, as well as for surveillance. A few of my colleagues and I took a personal leap of faith and started a company that adapted this work for civilian applications, including hearing aids, dictation, conferencing, and many more. Several years into this era, we came to the realization that the performance we had managed to obtain was excellent for some applications, such as surveillance and, to a lesser degree, automotive. For conferencing, however, the results were disappointing and inadequate. At that point, we took a step back and started to focus on other audio enhancement techniques that were more likely to be accepted. We’ve now come full circle and are back to developing large array solutions. I believe our approach is more mature and more experienced, and the results are more encouraging.

So, what are beamformers? What are they good for? What are they not good for? How do they differ? Which of the buzzwords you run into should impress you, and which are less meaningful?

Wikipedia defines beamforming as “…a signal processing technique used in sensor arrays for directional signal transmission or reception. This is achieved by combining elements in a phased array in such a way that signals at particular angles experience constructive interference while others experience destructive interference.”

To put it more plainly, a microphone picks up sound waves without any distinction: both wanted ones (we’ll call these signals) and unwanted ones (we’ll refer to these as noises). A more sensitive microphone picks up the same sound waves as a less sensitive one. It produces a stronger electrical signal, but it still picks up the same mixture of good and bad sound waves. Our goal is to eliminate as much noise as possible while keeping as much signal as possible. One way to do this is to limit our listening to a single direction, just like cupping a hand behind an ear to hear better.

Myth #1: A more sensitive microphone will improve the pickup range

Not true. Well, there is a little truth to it, but very little. A more sensitive microphone produces a stronger electrical signal, so it’s less susceptible to poorly designed amplification. However, you can obtain the same range with a less sensitive microphone. A more important parameter for a microphone is its noise floor, or dynamic range, but these are critical mainly in professional studio applications, where the surrounding noise is minimal and can fall below the microphone’s self-noise.
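To see why sensitivity alone does not buy range, here is a minimal Python sketch. It assumes an idealized microphone whose sensitivity is a single gain applied equally to everything it hears, with a sine tone standing in for speech and Gaussian noise standing in for the room. Amplifier noise and self-noise are deliberately left out, which is exactly where the “little truth” above would come in.

```python
import math
import random

def snr_db(signal, noise):
    """Signal-to-noise ratio in dB, computed from two equal-length sample lists."""
    p_sig = sum(s * s for s in signal) / len(signal)
    p_noise = sum(n * n for n in noise) / len(noise)
    return 10 * math.log10(p_sig / p_noise)

random.seed(1)
fs = 8000
speech = [math.sin(2 * math.pi * 500 * t / fs) for t in range(fs)]  # stand-in for a talker
room_noise = [random.gauss(0.0, 0.3) for _ in range(fs)]           # stand-in for room noise

for sensitivity in (1.0, 4.0):  # the second (hypothetical) mic is 12 dB more sensitive
    # Sensitivity scales everything the diaphragm receives, wanted or not.
    sig = [sensitivity * s for s in speech]
    noise = [sensitivity * n for n in room_noise]
    print(f"sensitivity x{sensitivity}: SNR = {snr_db(sig, noise):.2f} dB")
```

Both lines print the same SNR: scaling the signal and the noise by the same factor cancels out in the ratio.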

Before getting to the “how”, let’s talk about the “what for”. In other words, let’s assume for a minute that we know how to create a device that listens in one direction and eliminates all the sound waves that arrive from any other direction. In theory, this technology would pick up the wanted voice and eliminate everything coming from all other directions. That includes noises, but also reflections and reverberation arriving from other directions. Sounds great! However, there’s a catch: some interference and noise originating in unwanted directions will be reflected and will arrive at the array from the very direction we’re listening to.

If this sounds a little confusing, let’s try another way to illustrate it. In an open space, free of reflections, sound waves propagate from the source to the array in a straight line. If we listen in that direction, that’s what we’ll hear. In a more confined environment, full of reflections, the waves propagate from the source to the array along many paths and lose their directionality. In this case, the direction we’re listening to loses its correlation with the direction of the source: by listening in one direction, we’ll hear sources from many directions. The ratio of the direct-path signal to the indirect-path noise leaking into our beam gets worse as the distance to the wanted source increases. In other words, the efficiency of beamforming decreases with distance. It’s still better than no beamforming, but it loses some of its advantage with range.

Myth #2: Beamforming increases the pickup range

True, but the efficiency of beamforming goes down as the range increases. The farther away the talker, the lower the ratio of direct signal to reverberation, so the beamformer becomes less effective.
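The distance effect can be put into numbers with the textbook diffuse-field model of a room, in which the direct sound drops 6 dB per doubling of distance while the reverberant level stays roughly constant. The sketch below assumes a hypothetical room whose critical distance (where direct and reverberant levels are equal) is 1.5 m, and treats the beamformer as a fixed 6 dB directivity gain. Both numbers are illustrative, not measurements.

```python
import math

def direct_to_reverb_db(distance_m, critical_distance_m, directivity_db=0.0):
    """Direct-to-reverberant ratio (dB) in an idealized diffuse-field room.

    The direct sound falls 6 dB per doubling of distance (inverse-square law),
    while the diffuse reverberant level is treated as the same everywhere.
    A directional pickup rejects part of that diffuse field, modeled here as a
    fixed gain equal to its directivity index.
    """
    return 20 * math.log10(critical_distance_m / distance_m) + directivity_db

CRITICAL_DISTANCE = 1.5  # m, hypothetical room
BEAM_DIRECTIVITY = 6.0   # dB, hypothetical beamformer

for d in (0.75, 1.5, 3.0, 6.0, 9.0):
    omni = direct_to_reverb_db(d, CRITICAL_DISTANCE)
    beam = direct_to_reverb_db(d, CRITICAL_DISTANCE, BEAM_DIRECTIVITY)
    print(f"{d:4.2f} m: omni {omni:+5.1f} dB, beam {beam:+5.1f} dB")
```

In this idealized model the beam adds the same 6 dB at every distance, which roughly doubles the critical distance, so range does increase. The ratio still falls 6 dB per doubling of distance, though, and real rooms tend to be less forgiving than the model, because it assumes perfectly diffuse reverberation, whereas some reflections arrive from inside the beam, as described above.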

So, how is beamforming achieved? Sound propagates through air as waves. If we line up an array of microphones, a sound wave will hit the microphone closest to the source first, then the other microphones in order of their distance from the source. If the array faces a distant source squarely, so that it lies parallel to the incoming wavefront, all the microphones will be at the same distance from the source, and the wave will hit them at the same time. When we sum the signals received by the microphones, sounds that reach all the microphones at the same time add up in phase and are emphasized. Sounds originating from other directions hit the microphones at different times, so they partially, or even completely, cancel each other out. To sum it up, the array creates a listening beam in the direction perpendicular to the array, which we call broadside.
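The arrival-time picture above can be checked with a short sketch. It assumes a hypothetical 8-microphone line array with 5 cm spacing, a distant (plane-wave) source, a single 1 kHz tone, and the 340 m/s speed of sound used later in this article. It simply sums the microphones with no delays, so the beam points at broadside.

```python
import cmath
import math

SPEED_OF_SOUND = 340.0  # m/s

def arrival_delays(num_mics, spacing_m, angle_deg):
    """Arrival time (s) of a far-field plane wave at each mic, relative to mic 0.

    angle_deg is measured from broadside (0 = straight in front of the array);
    for positive angles the wave reaches mic n later by n*d*sin(angle)/c.
    """
    s = math.sin(math.radians(angle_deg))
    return [n * spacing_m * s / SPEED_OF_SOUND for n in range(num_mics)]

def summed_amplitude(num_mics, spacing_m, angle_deg, freq_hz):
    """Amplitude of the plain (undelayed) sum of all mics; 1.0 means fully in phase."""
    delays = arrival_delays(num_mics, spacing_m, angle_deg)
    total = sum(cmath.exp(-2j * math.pi * freq_hz * t) for t in delays)
    return abs(total) / num_mics

# Hypothetical array: 8 mics, 5 cm apart, hearing a 1 kHz tone.
for angle in (0, 15, 30, 60, 90):
    print(f"{angle:2d} deg: {summed_amplitude(8, 0.05, angle, 1000.0):.3f}")
```

Broadside (0 degrees) sums fully in phase and prints 1.000, while the output shrinks as the source moves off-axis.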

What parameters determine the sharpness, or width, of this listening beam? The most important are the total aperture of the array (the distance between its two ends) and the frequency of the signal. The wider the array, the narrower the beam for a given frequency. Likewise, for a given width, the higher the frequency, the narrower the beam.
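To see how aperture and frequency trade off, the sketch below numerically finds the -3 dB (half-power) beamwidth of a broadside delay-and-sum line array. It assumes evenly spaced omnidirectional microphones 5 cm apart, a far-field source, and 340 m/s; the apertures and frequencies are arbitrary examples.

```python
import cmath
import math

SPEED_OF_SOUND = 340.0  # m/s

def array_power(theta_deg, freq_hz, num_mics, aperture_m):
    """Normalized power response of a broadside delay-and-sum line array (num_mics >= 2)."""
    spacing = aperture_m / (num_mics - 1)
    k = 2 * math.pi * freq_hz / SPEED_OF_SOUND   # wavenumber, rad/m
    s = math.sin(math.radians(theta_deg))
    total = 0j
    for n in range(num_mics):
        x = n * spacing - aperture_m / 2           # mic position, array centered at 0
        total += cmath.exp(1j * k * x * s)         # plane-wave phase at that mic
    return abs(total / num_mics) ** 2

def beamwidth_3db(freq_hz, num_mics, aperture_m, step_deg=0.05):
    """Full half-power beamwidth in degrees, or None if the beam never drops 3 dB."""
    prev, theta = 0.0, step_deg
    while theta <= 90.0:
        if array_power(theta, freq_hz, num_mics, aperture_m) < 0.5:
            lo, hi = prev, theta
            for _ in range(40):                    # refine the crossing by bisection
                mid = 0.5 * (lo + hi)
                if array_power(mid, freq_hz, num_mics, aperture_m) < 0.5:
                    hi = mid
                else:
                    lo = mid
            return 2.0 * hi                        # beam is symmetric about broadside
        prev, theta = theta, theta + step_deg
    return None

# Hypothetical arrays with 5 cm spacing: 6, 11 and 21 mics span 0.25, 0.5 and 1 m.
for aperture, mics in ((0.25, 6), (0.5, 11), (1.0, 21)):
    print(f"{aperture:.2f} m at 1 kHz: {beamwidth_3db(1000.0, mics, aperture):5.1f} deg")
for freq in (500.0, 1000.0, 2000.0):
    print(f"1.00 m at {freq:4.0f} Hz: {beamwidth_3db(freq, 21, 1.0):5.1f} deg")
```

Doubling the aperture roughly halves the beamwidth, and so does doubling the frequency, which is the relationship described above.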

Does it matter how many microphones are in the array? Absolutely! First, the distance between adjacent microphones has to be smaller than a certain limit (half the wavelength of the highest frequency of interest) to prevent ambiguity. We call this aliasing. Second, the more microphones we have, the better we average out and reduce non-directional noise, which is an added bonus of the array. Let’s take a look at some of the common myths surrounding these parameters.
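The two roles of microphone count can be sketched numerically. The helpers below assume evenly spaced omnidirectional microphones, 340 m/s, and noise that is independent from one microphone to the next (such as the microphones’ own self-noise). Diffuse room noise is partly correlated between nearby microphones, so it averages out less well than this.

```python
import math

SPEED_OF_SOUND = 340.0  # m/s

def max_spacing(max_freq_hz):
    """Largest mic spacing (m) that avoids spatial aliasing at any steering angle: d <= lambda/2."""
    return SPEED_OF_SOUND / (2.0 * max_freq_hz)

def mics_needed(aperture_m, max_freq_hz):
    """Minimum number of evenly spaced mics that span an aperture without aliasing."""
    return math.ceil(aperture_m / max_spacing(max_freq_hz)) + 1

def grating_lobe_deg(freq_hz, spacing_m):
    """Angle from broadside of the first grating lobe of a broadside beam, or None if none is visible."""
    s = SPEED_OF_SOUND / (freq_hz * spacing_m)  # sin(theta) = lambda / d
    return math.degrees(math.asin(s)) if s <= 1.0 else None

def uncorrelated_noise_gain_db(num_mics):
    """SNR improvement from averaging noise that is independent at each mic."""
    return 10.0 * math.log10(num_mics)

print(max_spacing(8000.0))              # spacing limit for content up to 8 kHz
print(mics_needed(1.0, 8000.0))         # mics needed to fill a 1 m aperture at that spacing
print(grating_lobe_deg(8000.0, 0.085))  # two wavelengths apart at 8 kHz: a ghost beam near 30 deg
print(uncorrelated_noise_gain_db(16))   # averaging 16 mics
```

A grating lobe is the “ambiguity” mentioned above: the array responds as strongly to a second direction as to the one it is pointed at.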

Myth #3: The Wider the Array, the Better the Beamforming

Very true. The width (or aperture) of the array and the number of microphones in it are extremely important indicators of its potential performance.

What if we want to listen in a direction other than broadside? We can either rotate the array or electronically delay the signals coming out of the microphones. Delaying them lines up the signals originating from the wanted direction, which effectively rotates, or steers, the array.
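Electronic steering can be sketched by computing, for each microphone, the delay that lines up a wave from the chosen look direction. This assumes a hypothetical line array (8 microphones, 5 cm apart), a far-field source, 340 m/s, and ideal fractional delays, which a real implementation would approximate with interpolation filters or with phase shifts in the frequency domain.

```python
import cmath
import math

SPEED_OF_SOUND = 340.0  # m/s

def steering_delays(num_mics, spacing_m, look_deg):
    """Delay (s) to add to each mic so a wave from look_deg lines up across the array.

    A wave from look_deg reaches mic n at n*d*sin(look)/c relative to mic 0;
    delaying each mic by the complementary amount aligns all copies.
    Delays are offset so the smallest one is zero, keeping them causal.
    """
    arrivals = [n * spacing_m * math.sin(math.radians(look_deg)) / SPEED_OF_SOUND
                for n in range(num_mics)]
    latest = max(arrivals)
    return [latest - t for t in arrivals]

def steered_response(num_mics, spacing_m, look_deg, source_deg, freq_hz):
    """Normalized delay-and-sum output amplitude for a tone arriving from source_deg."""
    delays = steering_delays(num_mics, spacing_m, look_deg)
    s = math.sin(math.radians(source_deg))
    total = 0j
    for n, tau in enumerate(delays):
        arrival = n * spacing_m * s / SPEED_OF_SOUND
        # Phase of the tone after reaching mic n and passing through that mic's delay.
        total += cmath.exp(-2j * math.pi * freq_hz * (arrival + tau))
    return abs(total) / num_mics

print([round(t * 1e6, 1) for t in steering_delays(8, 0.05, 30.0)])  # microseconds
print(steered_response(8, 0.05, 30.0, 30.0, 1000.0))  # source on the steered beam
print(steered_response(8, 0.05, 30.0, 0.0, 1000.0))   # source at broadside
```

With the beam steered to 30 degrees, a tone from 30 degrees comes through at full amplitude (1.0), while one from broadside is attenuated. The delays are offset so the smallest is zero, since a real-time system can only delay signals, never advance them.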

These fundamentals describe the physics behind simple beamforming, often referred to as “delay-and-sum”. Let’s discuss some of the pros and cons of this simple beamformer and what alternatives are available.

Pros: Simple to implement. The process has no negative effect on the quality of the audio signal: no artifacts and no processing noise.

Cons: You need a large array to achieve decent directionality. The directionality depends on frequency, so any signal originating from an off-axis direction is attenuated by a different amount at each frequency. This distorts the signal, an effect also known as coloring.
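Coloring is easy to see if we ask how much one off-axis source is attenuated at several frequencies. The sketch assumes a hypothetical 8-microphone broadside delay-and-sum array with 5 cm spacing (a 35 cm aperture), 340 m/s, and a talker 30 degrees off-axis.

```python
import cmath
import math

SPEED_OF_SOUND = 340.0  # m/s

def off_axis_gain_db(freq_hz, angle_deg, num_mics=8, spacing_m=0.05):
    """Response (dB) of a broadside delay-and-sum array to a tone from angle_deg.

    0 dB means no attenuation; more negative means more attenuation.
    """
    k = 2 * math.pi * freq_hz / SPEED_OF_SOUND
    s = math.sin(math.radians(angle_deg))
    total = sum(cmath.exp(-1j * k * n * spacing_m * s) for n in range(num_mics))
    amplitude = abs(total) / num_mics
    return 20 * math.log10(max(amplitude, 1e-12))  # guard against exact nulls

for f in (250.0, 500.0, 1000.0, 2000.0):
    print(f"{f:6.0f} Hz: {off_axis_gain_db(f, 30.0):6.1f} dB")
```

The low frequencies of the off-axis talker pass almost untouched while the highs are cut much harder, so that talker sounds muffled rather than just quieter. Above the first null (here, somewhere between 1 and 2 kHz) the response ripples through sidelobes, so the attenuation does not keep growing smoothly with frequency.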

Reality check: Sound propagates through air at about 340 m/s, so the wavelength of a 500 Hz signal (roughly the middle of the speech spectrum) is about 0.68 m. If we use a 1 m wide array (which is pretty wide), the delay-and-sum beamformer will produce a beam roughly 35 degrees wide at this frequency, measured between its -3 dB points, where the power has dropped by half.
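The reality-check numbers can be reproduced with the textbook result for a uniformly weighted continuous line aperture, whose half-power points fall where sin(theta) = 0.443 lambda / L. Treating the 1 m array as a continuous aperture is an approximation; an array of discrete microphones spanning 1 m gives a similar or somewhat narrower beam.

```python
import math

SPEED_OF_SOUND = 340.0  # m/s, as in the text
freq = 500.0            # Hz, roughly mid-band for speech
aperture = 1.0          # m

wavelength = SPEED_OF_SOUND / freq
# The array factor sin(u)/u of a uniform line aperture falls to 1/sqrt(2) at u = 1.3916,
# with u = pi * L * sin(theta) / lambda, i.e. sin(theta) = 0.443 * lambda / L.
half_angle = math.degrees(math.asin(0.443 * wavelength / aperture))
beamwidth = 2 * half_angle

print(f"wavelength = {wavelength:.2f} m, -3 dB beamwidth = {beamwidth:.1f} deg")
```

This prints a wavelength of 0.68 m and a beamwidth of about 35 degrees.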

How can we improve the performance of the simple delay-and-sum beamformer? For starters, we can reduce the beam’s dependence on frequency, which is what causes the coloring of interferences. We do this with a method called constant-beamwidth design. Instead of just summing the outputs of the microphones, we apply digital filters and then sum the filtered outputs (a filter-and-sum beamformer) to alter the shape of the beam. Carefully designed filters can reshape the beam so that its dependence on frequency is reduced or eliminated. This is not a simple task, but the tools are available. It’s fair to mention that by reducing the dependence on frequency, we may widen the beam even further, which brings us to the next challenge.
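One of the simplest constant-beamwidth strategies is to shrink the active aperture as frequency rises, so that aperture divided by wavelength stays roughly fixed. The sketch below illustrates that idea, not the author’s own design. It assumes a hypothetical 33-microphone, 1 m array, a 1 kHz reference frequency, 340 m/s, and simple on/off weights per frequency, which stand in for the per-microphone filters of a filter-and-sum beamformer; a practical design would use smooth tapers and real FIR filters.

```python
import cmath
import math

SPEED_OF_SOUND = 340.0                            # m/s
POSITIONS = [(n - 16) / 32 for n in range(33)]    # hypothetical: 33 mics over 1 m, centered

def power(theta_deg, freq_hz, weights):
    """Normalized power response at one frequency of a broadside weighted-sum array."""
    k = 2 * math.pi * freq_hz / SPEED_OF_SOUND
    s = math.sin(math.radians(theta_deg))
    total = sum(w * cmath.exp(1j * k * x * s) for w, x in zip(weights, POSITIONS))
    return abs(total / sum(weights)) ** 2

def beamwidth(freq_hz, weights, step=0.05):
    """Full -3 dB beamwidth in degrees: scan outward from broadside, then bisect."""
    theta = step
    while power(theta, freq_hz, weights) >= 0.5:
        theta += step
        if theta > 90.0:
            raise ValueError("beam never drops by 3 dB")
    lo, hi = theta - step, theta
    for _ in range(40):
        mid = 0.5 * (lo + hi)
        if power(mid, freq_hz, weights) < 0.5:
            hi = mid
        else:
            lo = mid
    return 2 * hi

def constant_width_weights(freq_hz, ref_freq_hz=1000.0):
    """On/off weight per mic: the active aperture shrinks as 1/f above ref_freq_hz.

    At each frequency these weights play the role of the per-mic filters of a
    filter-and-sum beamformer (a real design would use smooth FIR filters).
    """
    half_aperture = 0.5 * min(1.0, ref_freq_hz / freq_hz)
    return [1.0 if abs(x) <= half_aperture + 1e-9 else 0.0 for x in POSITIONS]

all_on = [1.0] * len(POSITIONS)
for f in (1000.0, 2000.0, 4000.0):
    plain = beamwidth(f, all_on)
    shaped = beamwidth(f, constant_width_weights(f))
    print(f"{f:5.0f} Hz: delay-and-sum {plain:5.1f} deg, constant-beamwidth {shaped:5.1f} deg")
```

Plain delay-and-sum narrows by roughly a factor of four from 1 kHz to 4 kHz, while the shaped beam stays near its 1 kHz width. That is the trade-off mentioned above: the high-frequency beam is made as wide as the low-frequency one, not the other way around.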

We’re still left with the need for very large arrays to get a high level of directionality. Available solutions include super-directional arrays based on sophisticated filter designs, as mentioned before. With the proper filters, we can shape the beam and push it to be narrower. In addition, there are techniques called differential microphone arrays. These subtract the signals of the microphones instead of summing them, creating a listening beam along the array axis (we call it end-fire). Super-directional arrays and differential arrays have their own downsides. They tend to be more susceptible to mismatch between the microphone elements and may produce a less flat frequency response.
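A first-order differential pair is the simplest case: two closely spaced microphones on the array axis, with the rear one delayed and subtracted from the front one. The sketch below assumes a hypothetical 1 cm spacing, ideal omnidirectional capsules, 340 m/s, and an internal delay equal to the acoustic travel time between the capsules, which produces a cardioid pattern pointing end-fire. The rear_gain parameter is an illustrative way to model a mismatched capsule.

```python
import cmath
import math

SPEED_OF_SOUND = 340.0  # m/s

def differential_response(freq_hz, angle_deg, spacing_m=0.01, rear_gain=1.0):
    """Magnitude response of a two-mic first-order differential (cardioid) pair.

    The mics sit on the array axis, spacing_m apart; 0 degrees is the end-fire
    look direction. The rear mic hears a wave from angle_deg spacing*cos(angle)/c
    later than the front mic; it is delayed by a further spacing/c and subtracted,
    which cancels sound from directly behind (180 degrees).
    rear_gain < 1 models a rear capsule that is less sensitive than the front one.
    """
    tau = spacing_m / SPEED_OF_SOUND
    omega = 2 * math.pi * freq_hz
    total_delay = tau * (1 + math.cos(math.radians(angle_deg)))
    return abs(1 - rear_gain * cmath.exp(-1j * omega * total_delay))

for angle in (0.0, 90.0, 180.0):
    print(f"{angle:5.0f} deg at 1 kHz: {differential_response(1000.0, angle):.3f}")
print("on-axis 2 kHz / 1 kHz:", differential_response(2000.0, 0.0) / differential_response(1000.0, 0.0))
print("rear, 5% mismatch:", differential_response(1000.0, 180.0, rear_gain=0.95))
```

Three of the downsides mentioned above show up directly. The on-axis output rises about 6 dB per octave (it nearly doubles from 1 kHz to 2 kHz), so it needs equalization. At 1 kHz it is only about 0.37 times the output of a single capsule, so that equalization also boosts noise. And a 5% gain mismatch fills in the rear null, leaving only about 17 dB of rejection.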

There are other, non-linear methods that have been developed to obtain high directionality and a greater level of noise reduction. The basic idea is to determine whether the signal we’re observing originated from the wanted direction and, if not, to eliminate it vigorously. There are many ways to do this. One of them is called adaptive beamforming, a technique (or rather a group of techniques) that was discussed broadly in the late sixties and has been improved over the years. These more aggressive beamformers eliminate significantly more noise and can produce very narrow beams. They are, however, sensitive to the matching of the microphones and to the integrity of the acoustic and mechanical design. Any deficiency translates into distortion of the wanted signal. Even if we know how to overcome these potential problems, these solutions leave residual noise tainted with artifacts and “processing noise”. Unfortunately, our brain is sensitive to interference that doesn’t sound natural. We can deal with, and ignore, noises we’re used to hearing. We’re far less capable of dealing with processing noises of a digital nature. As a result, the advantage of higher directionality is sometimes masked by the disadvantage of a distorted signal and leftover noises that we don’t know how to ignore.
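The best-known adaptive beamformer is the minimum variance distortionless response (MVDR, or Capon) beamformer: keep unit gain toward the look direction and minimize everything else. The sketch below computes it for a single frequency from a known spatial covariance matrix, whereas a real system estimates that matrix from the microphone data and keeps updating it. The scene is hypothetical: a 4-microphone line with 10 cm spacing, a talker at broadside 30 dB above the sensor noise, and an interferer at 50 degrees 20 dB above it. The last part repeats the calculation with one microphone about 2 dB less sensitive than the model assumes.

```python
import cmath
import math

SPEED_OF_SOUND = 340.0  # m/s

def steering(angle_deg, freq_hz, positions, gains=None):
    """Narrowband steering vector: complex gain of a far-field tone at each mic."""
    k = 2 * math.pi * freq_hz / SPEED_OF_SOUND
    s = math.sin(math.radians(angle_deg))
    gains = gains or [1.0] * len(positions)
    return [g * cmath.exp(-1j * k * x * s) for g, x in zip(gains, positions)]

def covariance(sources, num_mics, noise_power):
    """Spatial covariance R = sum(p * v v^H) + noise_power * I for (power, vector) pairs."""
    r = [[complex(noise_power if i == j else 0.0) for j in range(num_mics)]
         for i in range(num_mics)]
    for p, v in sources:
        for i in range(num_mics):
            for j in range(num_mics):
                r[i][j] += p * v[i] * v[j].conjugate()
    return r

def solve(a, b):
    """Solve the complex system a x = b (Gaussian elimination, partial pivoting)."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda row: abs(m[row][col]))
        m[col], m[piv] = m[piv], m[col]
        for row in range(col + 1, n):
            f = m[row][col] / m[col][col]
            for j in range(col, n + 1):
                m[row][j] -= f * m[col][j]
    x = [0j] * n
    for row in range(n - 1, -1, -1):
        acc = sum(m[row][j] * x[j] for j in range(row + 1, n))
        x[row] = (m[row][n] - acc) / m[row][row]
    return x

def mvdr_weights(r, look):
    """MVDR (Capon) weights w = R^-1 a / (a^H R^-1 a): unit gain toward `look`."""
    ria = solve(r, look)
    denom = sum(a.conjugate() * y for a, y in zip(look, ria))
    return [y / denom for y in ria]

def gain(w, v):
    """Amplitude response |w^H v| of weights w to a source with steering vector v."""
    return abs(sum(wi.conjugate() * vi for wi, vi in zip(w, v)))

mics = [0.0, 0.1, 0.2, 0.3]         # hypothetical 4-mic line, 10 cm spacing
f = 1000.0                           # one frequency bin, Hz
talker = steering(0.0, f, mics)      # talker at broadside
noise = steering(50.0, f, mics)      # interferer at 50 degrees

r = covariance([(1000.0, talker), (100.0, noise)], 4, 1.0)
w = mvdr_weights(r, talker)
das = [v / 4 for v in talker]        # delay-and-sum weights, for comparison

# Same scene, but the fourth mic is about 2 dB less sensitive than the model assumes.
talker_real = steering(0.0, f, mics, gains=[1.0, 1.0, 1.0, 0.8])
r_real = covariance([(1000.0, talker_real), (100.0, noise)], 4, 1.0)
w_real = mvdr_weights(r_real, talker)

print(f"talker gain:     MVDR {gain(w, talker):.3f}, delay-and-sum {gain(das, talker):.3f}")
print(f"interferer gain: MVDR {gain(w, noise):.4f}, delay-and-sum {gain(das, noise):.3f}")
print(f"mismatched mic:  MVDR gain on the real talker {gain(w_real, talker_real):.3f}")
```

With a perfect model, MVDR passes the talker at unit gain and places a far deeper null on the interferer than delay-and-sum does. With the mismatched microphone it still keeps unit gain toward the talker it expects, but the real talker no longer matches that model and is treated as interference, so it is strongly suppressed. This signal self-cancellation is one reason aggressive adaptive methods are so sensitive to microphone matching.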

Myth #4: Highly Directional Arrays Will Improve Our Pickup Range and Quality

Possibly, but they come at a price. Linear super-directional methods color the signal and are more sensitive to mismatch between the microphone elements. Non-linear adaptive beamformers improve the ratio between signal and noise, but the result may be less intelligible and less suitable for conferencing. In many cases, they also “attack” the wanted signal itself and cause distortion.

It seems like there’s no silver bullet: wherever we go, we run into downsides and negative effects. To a large extent, this is true. The key is to use elements of all the above methods, in moderation. In our team’s quest for the most effective solution, we started many years ago with very aggressive beamformers. Over the years, we learned how to tame them so that the audio quality would be acceptable for conferencing. We reverted to more traditional linear approaches and are now back to a mixture of all these methods. It’s a very complex stew.

What about two-dimensional arrays? What are their advantages and disadvantages? A linear array, in which all the microphones are placed on a line, only senses the angle between the source and that line, so it responds identically to sources at the same angle to its axis. A horizontal array therefore has horizontal directionality and no preference for the vertical position of the source. To get directionality in other planes, you need to place microphones in those planes as well. If you place the array at the end of the conference room, the vertical direction is not important and doesn’t mean much. However, if you place an array on the ceiling, it has to be two-dimensional for obvious reasons. Therefore, in some installations, a planar (two-dimensional) array is a necessity. In others, there’s no advantage. Note that planar arrays can require up to four times the processing of a linear array, which is a heavy price to pay if it’s not necessary.
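The limitation of a linear array can be shown by comparing arrival delays. The sketch assumes far-field sources, 340 m/s, and two hypothetical geometries: 8 microphones on a horizontal line and a 4 x 4 horizontal grid, both with 5 cm spacing. The two sources both sit 60 degrees from the line’s axis: one at 60 degrees azimuth, level with the array, and one in the vertical plane through the axis, raised 60 degrees above the array plane.

```python
import math

SPEED_OF_SOUND = 340.0  # m/s

def direction(azimuth_deg, elevation_deg):
    """Unit vector toward a far-field source: x along the line array, z pointing up."""
    az, el = math.radians(azimuth_deg), math.radians(elevation_deg)
    return (math.cos(el) * math.cos(az), math.cos(el) * math.sin(az), math.sin(el))

def delays(mic_positions, azimuth_deg, elevation_deg):
    """Relative arrival time (s) of a plane wave at each mic; mics nearer the source hear it first."""
    ux, uy, uz = direction(azimuth_deg, elevation_deg)
    return [-(x * ux + y * uy + z * uz) / SPEED_OF_SOUND for x, y, z in mic_positions]

def same_delays(mics, src_a, src_b):
    """True if two source directions produce identical delays on this array."""
    return all(math.isclose(p, q, abs_tol=1e-12)
               for p, q in zip(delays(mics, *src_a), delays(mics, *src_b)))

line = [(0.05 * n, 0.0, 0.0) for n in range(8)]                          # 8 mics along x
grid = [(0.05 * i, 0.05 * j, 0.0) for i in range(4) for j in range(4)]  # 4 x 4 horizontal grid

level = (60.0, 0.0)     # azimuth 60 deg from the axis, level with the array
raised = (0.0, 60.0)    # in the vertical plane through the axis, 60 deg up
same_linear = same_delays(line, level, raised)
same_planar = same_delays(grid, level, raised)
print("line array can tell them apart:", not same_linear)
print("planar grid can tell them apart:", not same_planar)
```

The line array sees identical delays for both sources and cannot tell them apart, while the grid can. A horizontal grid still cannot separate a source above its plane from its mirror image below, which is harmless on a ceiling, where every talker is below the array.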

What you should know when considering a beamforming solution:

  • Beamforming is good! It eliminates some of the interfering noises. It has the capacity to eliminate non-diffuse (directional) noises. These are the noises that non-beamforming noise-canceling techniques will take care of. It will also reduce the level of reverb you pick up.
  • There are many types of techniques that can be called beamforming, and that variety is reflected in a wide range of performance levels. In other words, beamforming is good, but it’s not all the same. Your vendor isn’t going to make it easy for you to assess every aspect of their beamformer.
  • A wide aperture and a large number of microphones are important. You can get impressive performance out of a smaller array, but it’s always better to work with a large array with many microphones.
  • Evaluating performance isn’t simple. In most cases, the beamformer output is fed into a mixer and goes through AGC. If you want to analyze it scientifically, you have to freeze the beams, put the array in an anechoic chamber, and measure a number of beam patterns. This isn’t something you can typically do unless you’re the developer. The best approach is to “taste the pudding”: listen to the output and see whether you like it. Listen for artifacts, switching noises, and all kinds of unnatural effects. Try the array from a distance, but also in close proximity. Introduce noises and see whether it gets thrown off. Trust companies that have the experience and knowledge, and do your research on companies that come out with an array solution out of nowhere.

I’d like to wrap up by addressing one final topic. People often ask about “multiple talkers”. The answer would be the same if you placed many microphones around the table and fed them to a mixer. An unintelligent mixer would sum all the microphones, picking up all the talkers simultaneously along with a lot of noise, which isn’t good. A poorly designed mixer would switch between the microphones, selecting the active one in a way that can be noticed. A well-designed mixer selects the best microphone, or a combination of microphones, at any given moment and makes the switch sound smooth and seamless. So when multiple people talk at the same time, a good mixer should select a combination of microphones and update its selection fast enough to react to the current situation. If the selection is done quickly and the switching is done smoothly, the output will include all the necessary signals: all the talkers will be picked up with the minimum level of noise. Now, instead of using many microphones around the table, say you use many beams, each looking in a different direction. Your mixer should be able to determine the right beam to use and how to combine it with other beams so that the switching is smooth and seamless. The concern about “multiple talkers” is very legitimate, but it relates to the mixer, not the beamformer itself.
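A smooth mixer of the kind described can be sketched with energy-based gain sharing: each beam’s gain follows its share of the total energy in the current frame, smoothed over time. This is a hypothetical illustration, loosely in the spirit of gain-sharing automatic mixers, not the mixer used in the products discussed here. The frame length, smoothing constant, and test signals are arbitrary, and a real system would add noise-adaptive thresholds and gating.

```python
import math

def automix(beams, frame_len=160, smoothing=0.9):
    """Hypothetical gain-sharing mixer for several beam outputs (equal-length lists).

    Each frame, every beam's target gain is its share of the total frame energy,
    so the active talker's beam dominates and simultaneous talkers share the mix.
    Gains are smoothed frame to frame, so hand-overs are gradual, not abrupt.
    Returns the mixed signal and the per-frame gain history.
    """
    gains = [1.0 / len(beams)] * len(beams)
    history, out = [], []
    for start in range(0, len(beams[0]), frame_len):
        frames = [b[start:start + frame_len] for b in beams]
        energy = [sum(x * x for x in f) + 1e-12 for f in frames]  # tiny floor avoids 0/0
        total = sum(energy)
        gains = [smoothing * g + (1 - smoothing) * e / total for g, e in zip(gains, energy)]
        history.append(gains)
        for i in range(len(frames[0])):
            out.append(sum(g * f[i] for g, f in zip(gains, frames)))
    return out, history

# Demo: one talker, first heard on beam 0 and then on beam 1 (8 kHz, 1 s).
fs = 8000
half = fs // 2
tone = [math.sin(2 * math.pi * 200 * n / fs) for n in range(fs)]
beam0 = [t if n < half else 0.01 * t for n, t in enumerate(tone)]
beam1 = [0.01 * t if n < half else t for n, t in enumerate(tone)]
mixed, hist = automix([beam0, beam1])
for frame in (0, 12, 24, 25, 30, 49):
    print(f"frame {frame:2d}: beam gains {hist[frame][0]:.2f} / {hist[frame][1]:.2f}")
```

When the talker moves from beam 0 to beam 1, the gains hand over gradually across several frames instead of jumping, and because the gains always sum to one, two simultaneous talkers would simply share the mix.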

I’m sure some of you are expert beamformer designers and may have found this article a little oversimplified. For those of you who are less proficient, I hope it explained some of the terms and concepts you run into when you come across yet another microphone array solution. What I hope you take away is that this is not a simple subject. It’s even more complicated because of the external parameters that have to be taken into consideration when implementing these solutions in real-life situations: the room acoustics, microphone mismatch, the mechanical design of the array, the position of the array, and its proximity to other objects, to name a few. A good microphone array is one that behaves well in most cases, in most locations, most of the time. This is the art of fine-tuning performance, and it can only be the result of many years of experience.

If you have any questions or wish to discuss this further, I’ll be more than happy to help. Find me on our company’s blog, or through the contact link on our website.