By Lumens
April 02, 2025
It's over three years since we wrote our popular Beginner's Guide to PTZ Cameras. At the time, we reflected on how these cameras were a breakthrough in remote production, in discreet positioning, and in reducing operational costs, with little or no compromise in image quality. It was these features that made them a mainstay in documentary television programming, lecture capture, meeting spaces, music venues and houses of worship.
Since then, however, PTZ cameras have experienced a second revolution, but not necessarily in ways that we would have predicted.
Back in 2021, PTZ cameras were being developed increasingly to meet the needs of live events and broadcasters. Sensors were becoming bigger, giving beautiful cinematic picture quality. There was a push towards higher bitrates (full NDI and SMPTE 2110), broadcast-friendly 12G-SDI and XLR audio inputs, and even the option of interchangeable lenses. The industry was clearly moving towards PTZ cameras as a replacement for TV studio cameras.
But in the intervening years, things have changed. Why has the industry not pursued cinematic quality as its ultimate goal? With a few exceptions, the PTZ industry has not been obsessed with achieving the ultimate picture quality above all else. There are three principal reasons for this.
1. The Rise of the Smaller Sensor
A full frame or new generation 1-inch sensor will certainly outperform a smaller type, but the real benefits are mainly seen in extreme low light conditions, scenes with extreme contrast or shots that require a very shallow depth of field. The trade-off in selecting a large sensor is the greatly increased cost of electronics, image filters, sophisticated focusing engines and lenses, particularly where the organization wants a 20x or 30x range that retains critical sharpness at both the wide and long ends of the zoom. There are users, particularly in major studios or internationally important concert halls, theatres, opera houses and convention centres, who demand the nuanced picture quality that this technology can deliver, but for the growing majority, other factors take priority.
The industry has overwhelmingly taken a very different, but arguably far more exciting route. Standard (1/3" & 1/2") sensors have dramatically improved in image quality, efficiency and cost-effectiveness in recent years. The performance we saw from an early generation Micro Four Thirds or 1-inch sensor is now matched by today's smaller components in terms of low light capabilities, dynamic range and overall image quality. The conclusion is that new smaller sensors are easily 'good enough' for many (or even most) applications.
2. The Democratizing Effect of the PTZ
The promise of video everywhere – in meeting spaces, training suites, classrooms, houses of worship, YouTubers' bedrooms – has changed the PTZ industry. The user experience has become as important as good image quality: many customers have neither the skill nor the inclination to tweak luminance, gamma or pedestal levels to achieve a broadcast-ready shot. They simply want a great image in full auto mode. In short, most customers want the same straightforward user experience their iPhone offers them.
3. There are Bigger Fish to Fry!
The PTZ industry has far outstripped the studio camera and camcorder market in terms of innovation (and growth) in the past 5 years. So, if not in the use of larger sensors, where do we find this rapid development? In one word, automation.
In a sector that was invented to enable single-person, remote, multi-camera production, it is hardly surprising that automation has continued to be its biggest single driving force. The PTZ camera was devised for fast and discreet installation in any location. It became popular because a single operator could control anywhere between one and twenty cameras.
So, if those cameras can be installed and operated without the need for any human intervention at all, it can be no surprise that automating every possible process has been at the forefront of manufacturers' minds.
The arrival of artificial intelligence has turned what is theoretically possible into reality, almost overnight. Let’s look at auto-tracking technology as an example.
Motion tracking cameras are not new. Early (non-intelligent) models were popular, especially in lecture capture and live presentations. They used algorithms that could identify typical human shapes, movement and skin tones and direct the camera to move to keep an individual center-stage. This worked effectively in well-lit spaces with a clean backdrop and with few individuals in view. Positioned in less-than-ideal environments, however, their reliability faltered: they occasionally lost the tracked subject or randomly selected the wrong individual.
With AI, new generation tracking cameras have an uncanny ability to recognize an individual (even in a crowd of people), fix onto that person, and track them reliably even if they turn their back occasionally or walk momentarily behind an object. New AI algorithms have transformed motion tracking to such an extent that they have become true click and forget units.
Auto-tracking cameras have become so popular that they are now widely used in multi-presenter environments. A problem that many models face arises when switching from one subject to the next: the camera has to zoom out before finding and locking onto the next target. The solution has been to implement a two-camera design, with new motion tracking cameras featuring a panoramic/analytical camera as well as a principal PTZ head. With this approach, the camera can move immediately from presenter to presenter without any unnecessary and distracting hunting.
Replacing an HD tracking sensor with a 4K one naturally gives the processor four times as much image data. This is a breakthrough for scene analysis: the camera can now examine four times more detail, which has huge benefits. Instead of tracking individuals at a maximum of 8 metres, as an HD camera does, a 4K unit can more than double the tracking distance to 18 metres or beyond.
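The arithmetic behind the HD-to-4K comparison can be sketched as follows. This is a minimal illustration, not a description of any particular camera: the 60 degree horizontal field of view and the 0.45 metre subject width are assumptions chosen for the example, and the helper names are hypothetical.

```python
import math

HD = (1920, 1080)       # width, height in pixels
UHD_4K = (3840, 2160)   # width, height in pixels

def pixel_count(resolution):
    """Total number of pixels in a frame."""
    width, height = resolution
    return width * height

def pixels_on_subject(sensor_width_px, distance_m,
                      subject_width_m=0.45, hfov_deg=60.0):
    """Horizontal pixels covering a subject at a given distance.

    subject_width_m and hfov_deg are assumed values for illustration only.
    """
    # Width of the scene visible at this distance for the assumed lens angle.
    scene_width_m = 2.0 * distance_m * math.tan(math.radians(hfov_deg) / 2.0)
    return sensor_width_px * subject_width_m / scene_width_m

ratio = pixel_count(UHD_4K) / pixel_count(HD)
hd_at_8m = pixels_on_subject(HD[0], 8.0)
uhd_at_16m = pixels_on_subject(UHD_4K[0], 16.0)
```

Under these assumptions a 4K frame holds four times the pixels of an HD frame, and a subject 16 metres from the 4K sensor is covered by the same number of horizontal pixels as a subject 8 metres from the HD sensor. The sketch only covers this geometric doubling; it does not attempt to account for the 18 metre figure quoted above.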
Now motion tracking cameras can be installed in many more locations – at the back of conference and lecture rooms, cathedrals and theaters for example.
So, the past five years have seen the maturing of motion tracking technology, but a potentially even more compelling advance has been in voice tracking. Why so? Because in the overwhelming majority of environments, humans are simply not in motion! Speaker tracking simply allows the camera to follow a conversation rather than the movement of an individual.
There are two ways in which voice tracking has been implemented. Firstly, cameras have been linked to direction-of-arrival (DOA) microphone arrays (think Sennheiser TCCM or TCC2, Yamaha’s RMCG and many models from Nureva, Shure and Audio-Technica). These combine multiple microphones to analyse the direction from which a sound is coming. Using a processing unit, such as the Lumens CamConnect AI-Box1, a camera (or several cameras) can focus on the active voice in a room and switch angle to capture a lively discussion.
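One simple way to picture how a direction-of-arrival reading can drive camera selection is a lookup from angle to a stored camera preset. The sketch below is an assumption made for illustration, not the logic of any named product: the zone table, the camera and preset numbers, and the function names are all hypothetical.

```python
# Hypothetical zone table: (start_deg, end_deg, camera_id, preset_number).
# Angles are azimuths reported by the microphone array, in degrees.
ZONES = [
    (315.0, 45.0, 1, 1),    # sector that wraps through 0 degrees
    (45.0, 135.0, 1, 2),
    (135.0, 225.0, 2, 1),
    (225.0, 315.0, 2, 2),
]

def in_sector(azimuth_deg, start_deg, end_deg):
    """True if the azimuth lies in [start, end), allowing wrap-around at 360."""
    azimuth_deg %= 360.0
    if start_deg <= end_deg:
        return start_deg <= azimuth_deg < end_deg
    return azimuth_deg >= start_deg or azimuth_deg < end_deg

def select_shot(azimuth_deg, zones=ZONES):
    """Return (camera_id, preset_number) for the active voice, or None."""
    for start_deg, end_deg, camera_id, preset in zones:
        if in_sector(azimuth_deg, start_deg, end_deg):
            return camera_id, preset
    return None
```

With this table, a voice detected at 350 degrees selects camera 1, preset 1, while a voice at 180 degrees selects camera 2, preset 1. A real installation would also need to smooth out brief sounds so that the shot does not change on every cough.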
Instantly (and automatically), PTZ cameras have been given the ability to produce a multi-camera event with no human intervention required. Gone are manual controllers, and gone is the inevitable wide shot that makes it difficult to work out who is speaking.
Multi-speaker tracking has delivered TV style production values to meeting spaces at a fraction of the cost of a broadcast studio. Supporting multiple microphone arrays and four cameras, this kind of installation is well suited to boardrooms and lecture halls where switching between camera angles can transform the remote viewing experience.
The second approach to voice tracking is a very recent innovation. This approach embeds the sound detection technology into the camera itself. In smaller meeting spaces, podcast studios and vlogging suites, a unit such as the Lumens VC-TR60A can direct its camera head automatically to alternate between two speakers, or go into multi-voice framing mode to film a discussion with a precisely framed shot that captures all the active participants.
AI-enabled, the camera is able to detect the location of a sound and then distinguish between noise made by a human and, for example, a slamming door, a squawking bird or a yapping dog! When the camera is linked to a reference audio line, the unit can further eliminate in-room speakers from its sound detection, making voice-tracking unerringly accurate.
The inclusion of the secondary panoramic camera again pays dividends, enabling the system to switch to a wide shot whenever the PTZ head needs to move. This eliminates all visible camera movements which can be unsettling for remote viewers.
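The switching behaviour described here can be illustrated with a short sketch. It assumes a simplified model in which the wide shot is put on air every time the PTZ head has to move to a new speaker; the shot labels and the function name are hypothetical and do not reflect any camera's actual firmware.

```python
def shot_sequence(active_speakers):
    """Return the on-air shots for a sequence of active speakers.

    Assumption for illustration: the panoramic (wide) shot covers every
    move of the PTZ head, so viewers never see the head panning.
    """
    shots = []
    current = None
    for speaker in active_speakers:
        if speaker == current:
            continue  # same speaker, the PTZ head stays where it is
        if current is not None:
            shots.append("wide")  # cover the PTZ move with the wide shot
        shots.append(f"ptz:{speaker}")
        current = speaker
    return shots

sequence = shot_sequence(["A", "A", "B", "A"])
```

For the speaker order A, A, B, A the sketch cuts from the close-up of A to the wide shot, then to the close-up of B, back to the wide shot, and finally to A again.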
As we have seen, the panoramic camera is multi-functional, acting as an AI-analysis tool for human detection and tracking, and as a wide shot for intelligent shot switching. The secondary camera has a third role in the latest models, allowing picture-in-picture (PIP) output directly from the HDMI, USB and (where available) SDI outputs.
Generating a PIP straight from the camera is a breakthrough for many applications where a two-shot view is required, and where users want to simplify their workflow. This development has been a result of demands from customers in interrogation suites and training rooms, where a synchronized and simultaneous close-up and wide view are vital.
The rise of IP production has been inevitable for many years and the technologies available 3-4 years ago are either still current or have evolved. This is particularly the case with the ever-popular NDI format.
Although high bandwidth video streaming (see the VC-A71P-HN) still has its place in broadcast TV, high end digital signage and the capture of tier one live events, there has been a surge of interest in low latency formats that balance compression with image quality. With the arrival of NDI HX3, video streaming and collaboration entered a new era. Supporting up to 4K transmission over a 1GbE network, HX3 is the perfect balance of video quality, latency and bandwidth.
With wide multi-vendor support, HX3 is very well suited to live production, with a creative ecosystem that no other IP format can rival.
The NDI HX3 format is now embedded in the latest PTZ cameras, and available via mini encoders in older models that cannot be upgraded to the latest version.
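When planning how many camera streams a 1GbE link can carry, a rough budget like the one below can help. It is only a sketch: the per-stream bitrates used here are placeholder values, not published NDI figures, and the 75 percent usable fraction is an assumed safety margin. Real bitrates should be taken from the camera or encoder documentation.

```python
def max_streams(link_mbps, stream_mbps, usable_fraction=0.75):
    """How many streams of a given bitrate fit on a link.

    usable_fraction is an assumed margin that leaves headroom for control
    traffic, audio and bursts; adjust it to suit the network.
    """
    if stream_mbps <= 0:
        raise ValueError("stream_mbps must be positive")
    return int((link_mbps * usable_fraction) // stream_mbps)

GIGABIT_MBPS = 1000

# Placeholder bitrates in Mbit/s, for illustration only.
compressed_stream = max_streams(GIGABIT_MBPS, 80)
high_bandwidth_stream = max_streams(GIGABIT_MBPS, 250)
```

With these placeholder numbers, the link has room for nine of the lower-bitrate streams but only three of the higher-bitrate ones, which is the kind of difference that makes compressed formats attractive on existing networks.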
The new Dante AV-H is now being implemented in PTZ cameras. It shares many of the characteristics of NDI HX3, being a low latency H.26X codec, designed to run on existing local area networks. Where NDI is celebrated for its creative production workflows, Dante AV-H is unrivalled when it comes to its compatibility with IP audio (Dante audio) and its control ecosystem (Dante Manager and Dante Controller).
With PTZ cameras supporting Dante AV-H, administrators can route, manage and secure video and audio signals using familiar Dante applications. They can also integrate PTZ cameras with third party products such as microphones, speakers and DSPs, making the format hugely appealing to AV managers working in meeting spaces, training rooms and event spaces where Dante audio is already installed.
Few could have predicted the shift in emphasis from an obsession with image quality and sensor size to a focus on productivity gains above all else. There are certainly more gains to be enjoyed in terms of automation as the adoption of AI accelerates and the ability of artificial intelligence continues to outstrip expectations. The future is certainly not mapped out, but it is sure to be exciting.
PS. This article was written by a human!