Voice Tracking Technology: Why Seeing is Believing

By Lumens

June 06, 2025

 

“We are visual creatures. Visual things stay put, whereas sounds fade,” said Harvard psychologist Steven Pinker.

While this may be controversial (especially with musicians!), researchers have suggested that in a face-to-face conversation at least 50% of communication is non-verbal. That means in meetings, the video part of audio-visual (AV) technology is essential.

Which gives us a problem.

▶ What is Meeting Equity?

Modern video conferencing cameras produce astonishingly clear and bright pictures. Often fitted with clever auto-framing technology, they can automatically zoom in to capture the people in a room, and not the space around them. The result is good, with the focus squarely on the attendees. But it doesn't fully solve a key problem.

For remote participants, it can still be difficult to follow conversations. It's hard to tell who is speaking or responding to whom. This undermines meeting equity: in-person attendees end up with a better experience than virtual participants.

▶ The Televisual Solution

Producers have been making compelling TV discussion shows for decades, creating the ideal model for focusing the attention of viewers and effectively telling the story. There are standard elements that broadcasters have replicated across the world, and across the decades. These include:
 
-    Multiple camera angles
-    Wide establishing shots to give context to a discussion
-    Close-up shots of the active speaker
-    Framing multiple people during back-and-forth discussions
-    Smart camera switching to get the best angle

 

▶ Can AV match TV?

All of these techniques are available to meeting professionals. In high-profile public meetings such as a G20 summit, the AV equipment recording and broadcasting the event will be on a par with the technology used by a TV station.

For certain executive meetings and public sessions, a single operator managing multiple PTZ cameras is the answer. Exceptional results can be achieved by a skilled producer at the controller, using experience and intuition to capture the discussion faithfully and in a compelling manner for viewers and remote participants. 

However, the importance of meeting confidentiality (in financial, health or social care discussions, for example), the practicalities of installing and operating complex equipment, and the necessarily high cost mean that this is the exception rather than the rule. Moreover, with the dramatic increase in virtual meetings over the last few years, having an operator available for each meeting and every conference space is clearly impractical.
 

The AV Solution: Voice Tracking Technology
▶ What is Voice Tracking?

A new breed of meeting room microphone (think Sennheiser TCC2, Yamaha RM-CG, Shure MXA920, Nureva HDL410 and the like) has arrived. These products feature DOA (direction of arrival) technology that detects the location of a sound source. Why is this important?

1.    Voice tracking microphones help eliminate common meeting frustrations such as muffled voices, distant sound pickup, and overlapping speech. They ensure that every participant, regardless of their location, can be heard clearly and effortlessly.  

2.    The location data they produce can help video cameras automatically focus on the person speaking.

This is a game changer for video conferencing meetings. Voice tracking is now changing the way cameras integrate and interact with live discussions.
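To give a feel for how direction of arrival can be estimated, here is a minimal sketch of the simplest textbook case: two microphones a known distance apart and a talker far enough away that the sound arrives as a flat wavefront. It is an illustration of the geometry only, not a description of how any of the products named above work. The function name, the spacing value and the speed-of-sound constant are assumptions made for the example.

```python
import math

SPEED_OF_SOUND_M_S = 343.0  # approximate speed of sound in air at about 20 °C


def doa_angle_deg(delay_s: float, mic_spacing_m: float) -> float:
    """Estimate the angle of a far-field source from a two-microphone delay.

    delay_s is the extra time the sound takes to reach the second microphone.
    0 degrees means the source is directly in front of (broadside to) the pair.
    """
    # Extra distance the sound travelled to reach the second microphone.
    path_difference_m = SPEED_OF_SOUND_M_S * delay_s
    # Clamp to [-1, 1] so measurement noise cannot push asin out of range.
    ratio = max(-1.0, min(1.0, path_difference_m / mic_spacing_m))
    return math.degrees(math.asin(ratio))


# Hypothetical reading: microphones 10 cm apart, sound arrives 0.1 ms later at one.
angle = doa_angle_deg(0.0001, 0.10)
print(f"Estimated direction: {angle:.1f} degrees off centre")
```

Real ceiling arrays use many more capsules and far more sophisticated processing, but the principle is the same: tiny differences in arrival time reveal where a voice is coming from.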
 

 

▶ Integrating Speaker Tracking with Cameras

In a speaker-tracking system, the microphone array shares its location data in real time with an external processor, which in turn controls multiple PTZ cameras. Each camera is steered according to this sound-tracking data, so it can focus immediately on the active voices in a meeting space. With the camera angle driven by the microphone data, a meeting can be produced automatically, and the video output can be used by Teams, Zoom or most other conferencing platforms.
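One way to picture the processor's job is as a lookup: the microphone reports a direction, and the processor decides which camera and preset cover that part of the room. The sketch below assumes the microphone reports a single azimuth angle in degrees. The `Zone` structure, the room layout and the camera and preset numbers are all hypothetical, and real systems may report positions in other forms.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Zone:
    """A hypothetical seating zone covered by one camera preset."""
    name: str
    start_deg: float  # inclusive
    end_deg: float    # exclusive
    camera: int
    preset: int


def find_zone(azimuth_deg: float, zones: list[Zone]) -> Optional[Zone]:
    """Return the zone containing the reported azimuth, or None if unmapped."""
    angle = azimuth_deg % 360.0  # normalise, e.g. -10 becomes 350
    for zone in zones:
        if zone.start_deg <= angle < zone.end_deg:
            return zone
    return None


# Illustrative room map: three zones shared between two cameras.
ROOM = [
    Zone("left side of table", 0.0, 120.0, camera=1, preset=1),
    Zone("head of table", 120.0, 240.0, camera=2, preset=1),
    Zone("right side of table", 240.0, 360.0, camera=1, preset=2),
]

zone = find_zone(130.0, ROOM)
if zone is not None:
    print(f"Voice in '{zone.name}': camera {zone.camera}, preset {zone.preset}")
```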

▶ What is Speaker Tracking?

With products like CamConnect Pro, Lumens marries voice-tracking microphones with PTZ camera systems to deliver intelligent speaker-tracking. 

Here's how it works:

•    A speaker starts talking → Camera 1 zooms in on them.
•    The video feed switches to Camera 1.
•    Another person starts speaking → Camera 2 zooms in on them.
•    The system automatically switches to Camera 2.

The AV system is now creating a TV-like production, with no user input required. And it can manage large meetings: each camera is simply assigned to multiple delegates to give coverage of every attendee in the room. 
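The switching sequence described above can be modelled in a few lines. This is a toy sketch, not CamConnect's actual logic: the class name, the seat map and the log messages are invented for illustration, and a real controller would send commands to cameras and a video switcher rather than append strings to a list.

```python
from typing import Optional


class SpeakerSwitcher:
    """Toy model of preset-based speaker tracking (illustrative only)."""

    def __init__(self, seat_map: dict[str, tuple[int, int]]):
        # seat_map: seat name -> (camera number, preset number)
        self.seat_map = seat_map
        self.live_camera: Optional[int] = None
        self.log: list[str] = []

    def on_active_speaker(self, seat: str) -> None:
        camera, preset = self.seat_map[seat]
        # Frame the speaker first, then cut to that camera. If the camera is
        # already live, it simply reframes while on air.
        self.log.append(f"camera {camera}: recall preset {preset}")
        if camera != self.live_camera:
            self.log.append(f"switch output to camera {camera}")
            self.live_camera = camera


# Each camera is assigned several delegates, as described above.
switcher = SpeakerSwitcher({
    "Alice": (1, 1), "Bob": (1, 2),
    "Chen": (2, 1), "Dana": (2, 2),
})
switcher.on_active_speaker("Alice")
switcher.on_active_speaker("Chen")
print("\n".join(switcher.log))
```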

▶ The Evolution of Speaker Tracking

Connecting microphones with cameras isn't new. Developers have been able to programme AV controllers to respond to live voice tracking data for some years. Because of the complexity and uniqueness of each installation, the process can be expensive. What manufacturers such as Lumens have achieved is the game-changer: CamConnect can be installed on the network, configured and made ready for use with multiple microphone arrays and up to four PTZ cameras in a matter of minutes. No programming is required.

▶ From Meeting Equity to Mass Adoption

With the arrival of the VC-TR60A camera, Lumens has built speaker tracking into the PTZ camera itself. Rather than relying on an installed ceiling or wall-mounted microphone to detect the position of a voice, this camera includes an array of sound detectors in its base. Using its AI-enabled image analysis tool, the VC-TR60A can identify whether the sound located by its sensors comes from an individual in the room and not from a door closing or a car starting outside. The VC-TR60A will then automatically frame the active voice and follow the discussion.
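The idea of cross-checking sound against what the camera sees can be sketched as a simple matching step: accept a sound only if it comes from roughly the same direction as a person detected in the image. The code below is a generic illustration under that assumption and says nothing about how the VC-TR60A is actually implemented. The function name, the angles and the 10-degree tolerance are hypothetical.

```python
from typing import Optional


def match_sound_to_person(
    sound_deg: float,
    person_degs: list[float],
    tolerance_deg: float = 10.0,
) -> Optional[int]:
    """Return the index of the detected person nearest the sound, if close enough."""
    best_index = None
    best_gap = tolerance_deg
    for index, person_deg in enumerate(person_degs):
        # Smallest angular difference, allowing for wrap-around at 360 degrees.
        gap = abs((sound_deg - person_deg + 180.0) % 360.0 - 180.0)
        if gap <= best_gap:
            best_index, best_gap = index, gap
    return best_index


people = [-40.0, 30.0, 75.0]  # directions of people found by image analysis
print(match_sound_to_person(32.0, people))   # a voice close to the second person
print(match_sound_to_person(150.0, people))  # a noise with nobody in that direction
```

A sound with no matching person, such as a door closing at the side of the room, returns `None` and can be ignored rather than framed.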
 

 

▶ How effective is Speaker Tracking?

In an ideal environment, speaker tracking can be incredibly accurate, picking out an individual sitting shoulder-to-shoulder with colleagues. However, there are factors that can reduce its precision.

-    Room Size: The accuracy of the location data is fundamentally dependent on the precision of the microphone. The further the distance from the microphone, the less exact the data. The great news is that speaker tracking systems such as CamConnect can support multiple microphones that can be installed across a ceiling space or along the walls of a large venue. By correctly mapping a meeting area and setting camera preset positions, results can be truly exceptional.

-    Acoustics: Care also needs to be taken to minimise echoes and reflections: a well sound-insulated room will perform better than a cavernous wooden-floored hall. A DSP (digital signal processor) can minimise many of these issues.
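The room-size point comes down to simple geometry: a fixed uncertainty in the measured angle translates into a larger sideways uncertainty the further the talker is from the microphone. The sketch below assumes an angular uncertainty of 5 degrees purely for illustration; it is not a specification of any microphone.

```python
import math


def lateral_error_m(distance_m: float, angle_error_deg: float) -> float:
    """Sideways position uncertainty for a given angular uncertainty."""
    return distance_m * math.tan(math.radians(angle_error_deg))


# Assumed 5-degree angular uncertainty at increasing distances.
for distance in (2.0, 4.0, 8.0):
    error = lateral_error_m(distance, 5.0)
    print(f"{distance:.0f} m away: about {error:.2f} m either side")
```

Under that assumption the uncertainty grows from roughly 0.17 m at 2 m to roughly 0.70 m at 8 m, which is the difference between picking out one person and not knowing which of two neighbours is talking. This is why adding microphones closer to the delegates helps in larger venues.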

▶ The Human Element

Humans fidget. They move their chairs. They rarely sit still. Where speaker tracking depends on camera preset positions, this caused difficulties for older systems, which were plagued by mis-framed shots. With new AI-enabled systems such as CamConnect Pro, presets can be automatically re-framed to ensure the perfect shot.

▶ Discussion Tracking: Avoiding the Ping Pong Effect

No one wants to watch a video call where the camera constantly jumps back and forth between speakers like a tennis match. To avoid this, some systems are able to engage a multi-voice framing mode which zooms to a wider shot that captures all the active voices.
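One simple way to implement this kind of behaviour is to count how often the active speaker has changed recently and fall back to a wider shot when the count gets too high. The sketch below is an assumption about how such a mode could work, not a description of any particular product. The class name, the 10-second window and the threshold of three changes are hypothetical.

```python
from collections import deque


class ShotSelector:
    """Pick a close-up or a wide shot from recent speaker changes (a sketch)."""

    def __init__(self, window_s: float = 10.0, max_changes: int = 3):
        self.window_s = window_s        # how far back to look, in seconds
        self.max_changes = max_changes  # changes tolerated before going wide
        self.changes: deque = deque()   # timestamps of recent speaker changes
        self.last_speaker = None

    def choose_shot(self, speaker: str, now_s: float) -> str:
        if speaker != self.last_speaker:
            if self.last_speaker is not None:
                self.changes.append(now_s)
            self.last_speaker = speaker
        # Forget changes that have fallen out of the look-back window.
        while self.changes and now_s - self.changes[0] > self.window_s:
            self.changes.popleft()
        if len(self.changes) >= self.max_changes:
            return "wide"
        return f"close-up:{speaker}"


selector = ShotSelector()
for time_s, who in [(0, "Ana"), (2, "Ben"), (4, "Ana"), (6, "Ben"), (30, "Ben")]:
    print(time_s, selector.choose_shot(who, time_s))
```

In this example the rapid exchange between the two speakers triggers the wide shot, and the selector returns to a close-up once one person has held the floor for longer than the window.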

▶ The Future of Speaker Tracking

Multi-voice framing is just the beginning. As automation improves, speaker tracking could evolve into a fully automatic AV production system, rivalling a professional TV broadcast. Only time will tell how far this technology will go—but for now, it's already transforming the way we experience virtual meetings.
 





 