By Lumens
June 06, 2025
“We are visual creatures. Visual things stay put, whereas sounds fade,” said Harvard psychologist Steven Pinker.
While this may be controversial (especially with musicians!), researchers have suggested that in a face-to-face conversation at least 50% of communication is non-verbal. That means in meetings, the video part of audio-visual (AV) technology is essential.
Modern video conferencing cameras produce astonishingly clear and bright pictures. Often fitted with clever auto-framing technology, they can automatically zoom in to capture the people in a room, and not the space around them. The result is good, with the focus squarely on the attendees. But it doesn't fully solve a key problem.
For remote participants, it can still be difficult to follow conversations. It's hard to tell who is speaking or responding to whom. This undermines meeting equity: in-person attendees end up with a better experience than virtual participants.
All of these techniques are available to meetings professionals. In high-profile public meetings such as a G20 summit, the AV equipment recording and broadcasting the event will be on a par with the technology used by a TV station.
For certain executive meetings and public sessions, a single operator managing multiple PTZ cameras is the answer. Exceptional results can be achieved by a skilled producer at the controller, using experience and intuition to capture the discussion faithfully and in a compelling manner for viewers and remote participants.
The importance of meeting confidentiality (in financial, health or social care discussions, for example), the practicalities of installing and operating complex equipment, and the necessarily high cost mean, however, that this is the exception rather than the rule. Moreover, with the dramatic increase in virtual meetings over the last few years, having an operator available for each meeting and every conference space is clearly impractical.
A new breed of meeting room microphone (think Sennheiser TCC2, Yamaha RM-CG, Shure MXA920, Nureva HDL410 and the like) has arrived. These products feature DOA (direction of arrival) technology that detects the location of a sound source. Why is this important?
1. Voice tracking microphones help eliminate common meeting frustrations such as muffled voices, distant sound pickup, and overlapping speech. They ensure that every participant, regardless of their location, can be heard clearly and effortlessly.
2. The location data can help video cameras automatically focus on the person speaking.
This is a game changer for video conferencing meetings. Voice tracking is now changing the way cameras integrate and interact with live discussions.
With voice tracking, the microphone array shares its location data in real time with an external processor, which in turn links to multiple PTZ cameras. The processor directs the cameras according to this sound-tracking data, so they focus immediately on the active voices in a meeting space. With the camera angle based on the data from the microphone, a meeting can be produced automatically, and the video output can be used by Teams, Zoom or most other platforms for the conference session.
Here's how it works:
• A speaker starts talking → Camera 1 zooms in on them.
• The video feed switches to Camera 1.
• Another person starts speaking → Camera 2 zooms in on them.
• The system automatically switches to Camera 2.
The AV system is now creating a TV-like production, with no user input required. And it can manage large meetings: each camera is simply assigned to multiple delegates to give coverage of every attendee in the room.
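The switching sequence above can be sketched in a few lines of Python. This is only an illustration of the idea, not how CamConnect or any other product is implemented: the `Camera` and `Switcher` classes, the seat labels and the log messages are all hypothetical, and a real system would send PTZ and video-switcher commands rather than append to a list.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Camera:
    name: str
    target_seat: Optional[str] = None  # seat this camera is currently framing


@dataclass
class Switcher:
    """Hypothetical controller: alternates cameras as the active voice changes."""
    cameras: List[Camera]
    live: Optional[int] = None          # index of the camera on the output
    log: List[str] = field(default_factory=list)

    def on_speaker(self, seat: str) -> Camera:
        # If the live camera already frames this seat, nothing changes.
        if self.live is not None and self.cameras[self.live].target_seat == seat:
            return self.cameras[self.live]
        # Reposition the next camera in rotation. With two or more cameras
        # this is always an off-air camera, so viewers never see it move.
        idx = 0 if self.live is None else (self.live + 1) % len(self.cameras)
        cam = self.cameras[idx]
        cam.target_seat = seat
        self.log.append(f"{cam.name} frames {seat}")
        # Only then cut the output to the freshly framed camera.
        self.live = idx
        self.log.append(f"output -> {cam.name}")
        return cam


switcher = Switcher([Camera("Camera 1"), Camera("Camera 2")])
switcher.on_speaker("seat-3")   # Camera 1 frames the first speaker
switcher.on_speaker("seat-7")   # Camera 2 frames the next speaker
print("\n".join(switcher.log))
```

The key design choice is that the camera is framed before the output is switched, which is what gives the result its TV-like cuts rather than visible pans.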
Connecting microphones with cameras isn't new. Developers have been able to programme AV controllers to respond to live voice tracking data for some years. Because of the complexity and uniqueness of each installation, the process can be expensive. What manufacturers such as Lumens have achieved is the game-changer: CamConnect can be installed on the network, configured and made ready for use with multiple microphone arrays and up to four PTZ cameras in a matter of minutes. No programming is required.
With the arrival of the VC-TR60A camera, Lumens has built speaker tracking into the PTZ camera itself. Rather than relying on an installed ceiling or wall-mounted microphone to detect the position of a voice, this camera includes an array of sound detectors in its base. Using its AI-enabled image analysis tool, the VC-TR60A can identify whether the sound located by its sensors comes from an individual in the room and not a door closing or a car starting outside. The VC-TR60A will then automatically frame the active voice and follow the discussion.
In an ideal environment, speaker tracking can be incredibly accurate, picking out an individual sitting shoulder-to-shoulder with colleagues. However, there are factors that can reduce its precision.
- Room Size: The accuracy of the location data is fundamentally dependent on the precision of the microphone. The further the distance from the microphone, the less exact the data. The great news is that speaker tracking systems such as CamConnect can support multiple microphones that can be installed across a ceiling space or along the walls of a large venue. By correctly mapping a meeting area and setting camera preset positions, results can be truly exceptional.
- Acoustics: Care also needs to be taken to minimise echoes and reflections: a well sound-insulated room will perform better than a cavernous wooden-floored hall. A digital signal processor (DSP) can minimise many of these issues.
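To make the idea of mapping a meeting area to camera presets more concrete, here is a minimal Python sketch. It assumes, purely for illustration, that the microphone reports the talker's direction as an azimuth in degrees and that the room has been divided into angular zones, each tied to a camera preset. The `Zone` type, the `find_zone` helper and the example angles are hypothetical; real microphones and controllers may report position in other ways.

```python
from typing import List, NamedTuple, Optional


class Zone(NamedTuple):
    name: str
    start_deg: float  # inclusive
    end_deg: float    # exclusive
    camera: str
    preset: int


def find_zone(zones: List[Zone], azimuth_deg: float) -> Optional[Zone]:
    """Return the first zone containing the reported angle, or None."""
    angle = azimuth_deg % 360  # normalise, e.g. -10 becomes 350
    for zone in zones:
        if zone.start_deg <= zone.end_deg:
            hit = zone.start_deg <= angle < zone.end_deg
        else:
            # Zone wraps through 0 degrees, e.g. 330 to 30.
            hit = angle >= zone.start_deg or angle < zone.end_deg
        if hit:
            return zone
    return None


room = [
    Zone("head of table", 330, 30, "Camera 1", 1),
    Zone("left side", 30, 150, "Camera 2", 1),
    Zone("right side", 210, 330, "Camera 2", 2),
]

zone = find_zone(room, 95.0)
if zone is not None:
    print(f"Recall {zone.camera} preset {zone.preset} ({zone.name})")
```

An angle that falls outside every zone returns `None`, which a controller could treat as a cue to hold the current shot rather than chase a reflection or a noise from outside the seating area.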
Humans fidget. They move their chairs. They rarely sit still. Where speaker tracking depends on camera preset positions, this caused difficulties with older systems, which were plagued by mis-framed shots. With new AI-enabled systems such as CamConnect Pro, presets can be automatically re-framed to ensure the perfect shot.
No one wants to watch a video call where the camera constantly jumps back and forth between speakers like a tennis match. To avoid this, some systems are able to engage a multi-voice framing mode which zooms to a wider shot that captures all the active voices.
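One way to picture a multi-voice framing rule is as a short memory of who has spoken recently. The Python sketch below is an assumption about how such a rule could work, not a description of any particular product: the `FramingPolicy` class, the eight-second window and the shot names are all hypothetical.

```python
from collections import deque
from typing import Deque, List, Tuple


class FramingPolicy:
    """Hypothetical rule: go wide when several voices are active at once."""

    def __init__(self, window_s: float = 8.0) -> None:
        self.window_s = window_s
        self.recent: Deque[Tuple[float, str]] = deque()  # (timestamp, seat)

    def decide(self, now: float, seat: str) -> Tuple[str, List[str]]:
        # Record the new voice event, then forget anything older than the window.
        self.recent.append((now, seat))
        while self.recent and now - self.recent[0][0] > self.window_s:
            self.recent.popleft()
        voices = sorted({s for _, s in self.recent})
        # One recent voice gets a close-up; several share a wider shot.
        shot = "close-up" if len(voices) == 1 else "wide"
        return shot, voices


policy = FramingPolicy(window_s=8.0)
print(policy.decide(0.0, "A"))   # a single voice so far
print(policy.decide(3.0, "B"))   # a second voice joins within the window
print(policy.decide(20.0, "B"))  # the exchange has ended; only B is recent
```

A longer window makes the system slower to return to close-ups but less likely to bounce between speakers during a rapid exchange.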
Multi-voice framing is just the beginning. As automation improves, speaker tracking could evolve into a fully automatic AV production system, rivalling a professional TV broadcast. Only time will tell how far this technology will go—but for now, it's already transforming the way we experience virtual meetings.