The ability to radically change encoder parameters based on the scene dynamics results in higher average compression ratios.
December 21, 2007
From: Video/Imaging DesignLine
By: Mark Oliver and John Monti
One of the fundamental challenges facing anyone attempting to work with
video surveillance has always been the size of the video itself.
Capturing video produces large amounts of data posing problems both in
transportation of the data from one place to another and in subsequent
storage. Analog video engineers have for years struggled to hone
techniques designed to reduce the size of the captured data without
introducing any appreciable loss in quality. Analog systems, however,
generally lack the intelligence to effectively tailor these techniques
in real time and, as such, only scratch the surface of what is possible.
Fortunately, with advances in digital video technology, a new set of
real-time tools can be brought to bear on the video stream.
Software-configurable processors targeted at video applications are allowing
closer coupling between the intelligence of the video system and the
compression stages. The result will be a quantum leap in the quality
and features of digital video surveillance systems.
When most of us think of digital video compression techniques, we think
of the CODECs standardized by the Moving Picture Experts Group (MPEG).
These include MPEG-2, of DVD and digital TV fame, as well as MPEG-4 Part 10
(H.264), generally considered to be the natural successor to the aging
MPEG-2 standard. These CODECs, rather than being rigid in their
application, are best thought of as a tool bag of possible techniques
that can be applied to a video stream to perform the desired
compression. Which tools are used and how they are applied can
significantly affect the quality and size of the compressed video
stream.
In video surveillance applications, the type of video "footage"
captured is generally very different from that captured for television
or for movies. As a result, the tool selection made by a surveillance
encoder can be very different from that made by a broadcast encoder. A surveillance
camera might, for example, be monitoring a hallway in an office
building. In this case, the hallway might be deserted from six in the
evening until eight the following morning, and be similarly quiet
during the weekend. The encoder, therefore, can use different criteria
to select appropriate tools for the compression. Tools and techniques
that would be infeasible for other video applications might yield
perfectly acceptable results for a scene that is quiet 80 percent of
the time. Enter the Intelligent Encoder.
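A quick tally of the hours named in the hallway example shows where most of that quiet time comes from. The short Python sketch below counts only weekday evenings (6 p.m. to 8 a.m.) and weekends; quiet spells during the working day are not counted and would push the share higher. The constant names are illustrative.

```python
# Quiet hours per week in the hallway example: weekday evenings from
# 6 p.m. to 8 a.m. (14 hours) plus the whole weekend.
WEEKDAY_QUIET_HOURS = 14
WEEKDAYS = 5
WEEKEND_HOURS = 2 * 24
HOURS_PER_WEEK = 7 * 24


def off_hours_fraction() -> float:
    """Fraction of the week covered by after-hours and weekend periods only."""
    quiet_hours = WEEKDAY_QUIET_HOURS * WEEKDAYS + WEEKEND_HOURS  # 118 hours
    return quiet_hours / HOURS_PER_WEEK


if __name__ == "__main__":
    print(f"Off-hours share of the week: {off_hours_fraction():.0%}")
```

The off-hours alone come to 118 of 168 hours, or about 70 percent of the week.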
The Intelligent Encoder
The Intelligent Encoder consists of tightly coupled analytics and
compression engines. The analytics engine is used to examine the scene
and determine if any pre-selected criteria are met. Criteria might
include the presence of motion, the absence of motion, sudden changes
in light level or rapid scene changes. The results of the analysis are
used to configure the encoder engine for optimum quality and
compression levels based on the dynamics of the scene. The ability to
radically change encoder parameters based on the scene dynamics results
in higher average compression ratios. This lowers bit rates and makes
more efficient use of storage or transmission bandwidth. Figure 1 shows
a block diagram of a typical intelligent encoder.

Figure 1: Block Diagram of an Intelligent Encoder
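As a rough illustration of the coupling in Figure 1, the Python sketch below stands in for the analytics engine with simple frame differencing and a mean-brightness check, then maps the result to an encoder configuration. Everything here is an assumption for illustration: the `EncoderConfig` fields, the two profiles, and the thresholds are hypothetical, and a production analytics engine would need to be far more robust to sensor noise and lighting.

```python
from dataclasses import dataclass
from typing import List

Frame = List[List[int]]  # 8-bit luma samples, indexed [row][column]


@dataclass(frozen=True)
class EncoderConfig:
    width: int
    height: int
    fps: int
    qp: int  # quantization parameter: lower means finer, higher quality


# Hypothetical profiles; the numbers are illustrative only.
QUIESCENT = EncoderConfig(width=352, height=288, fps=2, qp=34)  # CIF
ACTIVE = EncoderConfig(width=704, height=576, fps=30, qp=26)    # 4CIF, close to D1


def changed_fraction(prev: Frame, cur: Frame, pixel_threshold: int = 25) -> float:
    """Fraction of co-located pixels whose luma differs by more than the threshold."""
    total = changed = 0
    for row_prev, row_cur in zip(prev, cur):
        for a, b in zip(row_prev, row_cur):
            total += 1
            if abs(a - b) > pixel_threshold:
                changed += 1
    return changed / total if total else 0.0


def mean_luma(frame: Frame) -> float:
    """Average brightness of a frame."""
    count = sum(len(row) for row in frame)
    return sum(sum(row) for row in frame) / count if count else 0.0


def analyze(prev: Frame, cur: Frame,
            motion_fraction: float = 0.01, light_jump: float = 40.0) -> str:
    """Check the pre-selected criteria: sudden light change, motion, or neither."""
    if abs(mean_luma(cur) - mean_luma(prev)) > light_jump:
        return "light_change"
    if changed_fraction(prev, cur) > motion_fraction:
        return "motion"
    return "no_motion"


def configure(event: str) -> EncoderConfig:
    """Drop to the quiescent profile only when nothing of interest is happening."""
    return QUIESCENT if event == "no_motion" else ACTIVE
```

Keeping the analysis result as a small set of named events keeps the encoder side simple: it only needs to know which configuration to load, not how the scene was analyzed.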
The maximum efficiency of an Intelligent Encoder is obtained by
bounding encode parameters in terms of quality, bit rate, resolution,
or frame rate, and defining a time period over which the defined bounds
are to be applied. In this way, the encoder itself is able to optimize
the consumption of the "bit budget" based on the scene dynamics and the
level of interest that the observer is likely to have in the encoded
stream.
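One way to picture a bounded bit budget is a controller that caps the average rate over a fixed window while enforcing a floor and a ceiling on each interval. The `BitBudget` class below is a hypothetical sketch of that idea, not a description of any particular product; it assumes one-second intervals, so kb/s and kilobits per interval are interchangeable. Before granting a burst, it reserves enough of the remaining budget to run every later interval at the floor rate.

```python
from dataclasses import dataclass, field


@dataclass
class BitBudget:
    """Hypothetical bit budget: an average-rate cap enforced over a window of
    one-second intervals, with a floor and a ceiling on each interval's rate."""
    window_intervals: int   # e.g. 60 one-second intervals
    average_kbps: float     # bound on the mean rate over the window
    min_kbps: float         # floor: never starve the stream entirely
    max_kbps: float         # ceiling: bound on any single burst
    remaining_kbits: float = field(init=False)
    remaining_intervals: int = field(init=False)

    def __post_init__(self) -> None:
        if self.window_intervals <= 0:
            raise ValueError("window_intervals must be positive")
        if not (0 < self.min_kbps <= self.average_kbps and self.min_kbps <= self.max_kbps):
            raise ValueError("need 0 < min_kbps <= average_kbps and min_kbps <= max_kbps")
        self._start_window()

    def _start_window(self) -> None:
        self.remaining_kbits = self.window_intervals * self.average_kbps
        self.remaining_intervals = self.window_intervals

    def grant(self, requested_kbps: float) -> float:
        """Return the rate the encoder may use for the next interval."""
        if self.remaining_intervals == 0:
            self._start_window()  # previous window used up; begin a new one
        # Keep enough in reserve to run every later interval at the floor.
        reserve = (self.remaining_intervals - 1) * self.min_kbps
        ceiling = min(self.max_kbps, self.remaining_kbits - reserve)
        allowed = max(self.min_kbps, min(requested_kbps, ceiling))
        self.remaining_kbits -= allowed
        self.remaining_intervals -= 1
        return allowed
```

With a 10-interval window averaging 100 kb/s, a 25 kb/s floor and a 400 kb/s ceiling, sustained requests for 1,000 kb/s are granted 400, 400 and then 25 kb/s: the burst is honored until the reserve is reached, after which the stream is held at the floor for the rest of the window.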
When setting bounds on the encode parameters of an Intelligent Encoder,
advantage can be taken of the volatility in bit rates. Conventional
encoders would create bit streams of reasonably constant rate and would
require provisions to be made to ensure that the network did not become
saturated. This would normally involve setting low quality levels and
dedicating network bandwidth to video traffic. The constrained bit
rates of this approach work well for low motion scenes, but when high
motion is encountered (typically the scenes of interest for a
surveillance application) quality suffers as the encoder struggles to
represent the rapidly changing scene while staying within its bit
budget.
With an Intelligent Encoder, the normal operating bit rate naturally
tends to fall to lower levels. When the analytics engine detects a
triggering event, the encoder uses the bits it needs (up to its
prescribed bound) to accurately represent the motion with the highest
possible quality. After a short period of typically just a few frames,
the bit rate can be returned to the normal operating level.
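The return to the normal rate after a trigger can be sketched as a simple hold counter. In this hypothetical Python function, each triggered frame restarts a countdown of `hold_frames`; while it runs, the frame target is the burst bound, otherwise it is the quiescent rate. For simplicity the burst is pinned at the bound, whereas the encoder described above would use only the bits the scene needs, up to that bound.

```python
from typing import Iterable, List


def per_frame_targets(triggers: Iterable[bool], quiescent_kbps: float,
                      bound_kbps: float, hold_frames: int) -> List[float]:
    """Target rate per frame: the burst bound while triggered and for a few
    frames afterward, the quiescent rate otherwise. Each triggered frame
    restarts the hold, so a burst lasts hold_frames frames counted from the
    most recent trigger (including that frame)."""
    targets: List[float] = []
    hold = 0
    for triggered in triggers:
        if triggered:
            hold = hold_frames
        if hold > 0:
            targets.append(bound_kbps)
            hold -= 1
        else:
            targets.append(quiescent_kbps)
    return targets
```

Restarting the hold on every trigger keeps the rate up for as long as activity continues, then lets it fall back a fixed number of frames after the last event.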
The result is an encoded stream that frugally uses bandwidth in
quiescent operation, and is still able to capture trigger events with
maximum quality. This ability brings up the concept of constant
quality. In a constant quality encoder, it is the desired quality that
is specified rather than the bit rate. The intelligence within the
encoder can be used to adjust encode parameters to deliver the required
quality level.
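The difference between specifying bit rate and specifying quality can be shown with a deliberately simplified first-order rate model, in which the bits for a frame are roughly its complexity divided by the quantizer step size. That model, and the function names below, are assumptions for illustration rather than a description of how any particular CODEC allocates bits.

```python
from typing import Tuple

# Simplified first-order rate model (an assumption): bits ~ complexity / qstep,
# where qstep is the quantizer step size and a smaller step means higher quality.


def cbr_frame(complexity: float, target_bits: float) -> Tuple[float, float]:
    """Constant bit rate: bits are fixed, so the step size absorbs complexity.
    Returns (bits, qstep)."""
    qstep = complexity / target_bits
    return target_bits, qstep


def cq_frame(complexity: float, qstep: float, max_bits: float) -> Tuple[float, float]:
    """Constant quality: the step size is fixed, so bits follow complexity,
    until the bit ceiling forces a coarser step. Returns (bits, qstep)."""
    bits = complexity / qstep
    if bits > max_bits:
        return max_bits, complexity / max_bits
    return bits, qstep
```

Under this model, a frame ten times as complex forces a CBR encoder to use a step ten times coarser, while a constant-quality encoder keeps the step and spends ten times the bits, until its ceiling is reached.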
Constant Quality Bit Rates
In surveillance applications, quality is maintained regardless of the
scene dynamics, resulting in good motion performance with low quiescent
period bit rates.

Figure 2: Bit Rates Produced by Different Encoding Schemes
Figure 2 shows the bit rate over time of a video clip encoded with
three different encoding techniques. The red line represents a constant
bit rate encoder typically used when a known amount of bandwidth has
been allocated to the video. The encoder strives to fill the available
bandwidth by changing the encoded quality.
The blue line represents a typical variable bit rate encoder where a
target bit rate is defined, but the actual encoded bit rate is allowed
to vary on either side of the target as a result of the encoding
process.
The orange line represents the output of an intelligent encoder set for
constant quality. Here, the encode parameters are changed in real time
to account for scene dynamics. Lower quiescent period bit rates are
achieved during periods of low activity and high quality is maintained
when the scene rapidly changes. The result is a dramatic reduction in
the average bit rate generated by the encoder.
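The averaging effect behind Figure 2 can be reproduced with made-up numbers. The sketch below assumes a hypothetical one-minute trace in which the scene needs 200 kb/s to hold the target quality for 55 seconds and 3,000 kb/s during a 5-second event, and models the three encoders crudely: CBR emits the target, VBR follows the need within plus or minus 20 percent of the target, and constant quality follows the need up to a ceiling. None of these figures comes from the measurements behind Figure 2.

```python
from statistics import mean
from typing import Dict, List


def simulate(needed_kbps: List[float], target_kbps: float,
             vbr_swing: float = 0.2, cq_ceiling_kbps: float = 4000.0) -> Dict[str, List[float]]:
    """Crude per-interval models of the three encoder types in Figure 2."""
    low = target_kbps * (1 - vbr_swing)
    high = target_kbps * (1 + vbr_swing)
    return {
        # CBR: emits the target rate regardless of what the scene needs.
        "cbr": [target_kbps for _ in needed_kbps],
        # VBR: follows the need, but only within a band around the target.
        "vbr": [min(max(n, low), high) for n in needed_kbps],
        # Constant quality: follows the need, up to a ceiling.
        "cq": [min(n, cq_ceiling_kbps) for n in needed_kbps],
    }


if __name__ == "__main__":
    # Hypothetical one-minute trace: 55 quiet seconds, then a 5-second event.
    trace = [200.0] * 55 + [3000.0] * 5
    for name, rates in simulate(trace, target_kbps=1500.0).items():
        print(f"{name}: {mean(rates):.0f} kb/s average")
```

On this trace the constant-quality stream averages about 433 kb/s, against 1,500 kb/s for CBR and 1,250 kb/s for VBR, while CBR and VBR both fall short of what the event needs.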
Figures 3 and 4 are single frames taken from a video sequence encoded
first with a constant bit rate and then with constant quality. In the video clip, a
large white object moves in front of the camera, temporarily blinding
it. Both images are of the same frame of video, taken just as the object
moves away and the background is revealed. This simulates a vehicle or
person moving in front of the camera. In both frames, the instantaneous
bit rate can be seen in the bottom left of the image after the "BR"
label.
The constant bit rate encoder is set to 1.5 Mb/s and will vary little
around that figure. As a result, as large amounts of the image change,
the quality of the encoded stream must be reduced. The result is a loss
of video quality that occurs during the event of interest. The
intelligent encoder, on the other hand, is given the flexibility to
temporarily increase the bit rate to a level just high enough to encode
the scene with a constant level of quality. Here we see an
instantaneous bit rate of about twice that of the constant bit rate
stream. This instantaneous rate is sufficient to maintain excellent
video quality. The duration of the increased rate is limited to just
the period of highest motion. The result is an overall reduction in the
number of bits required to encode the scene.
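Whether a brief burst still lowers the total can be checked with a break-even condition. Assume, for illustration, a constant bit rate R over a window of length T, and a constant-quality stream that runs at about 2R (the ratio seen above) for a burst of duration d and at a quiescent rate q for the rest of the window. The constant-quality stream uses fewer bits whenever:

```latex
2R\,d + q\,(T - d) \le R\,T
\;\Longrightarrow\;
q \le R\,\frac{T - 2d}{T - d}, \qquad 0 \le d < T
```

With assumed values of R = 1.5 Mb/s, T = 60 s and d = 3 s, any quiescent rate below about 1.42 Mb/s (1.5 × 54/57) already gives the constant-quality stream the smaller total.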

Figure 3: Frame of High Motion Video Encoded With Constant Bit Rate

Figure 4: Frame of High Motion Video Encoded with Constant Quality
Use Models
When considering the specific application of video surveillance,
further advantage can be taken of the use model of the captured video.
In our original hallway example, normal office traffic might be
captured and compressed to 1.5 Mb/s at 30 fps. During evening or weekend
periods, the absence of motion within the scene might be used to
trigger the encoder to reduce the resolution to CIF (one-quarter of the
D1 resolution) and to drop the frame rate to two or three frames per
second. This would be sufficient resolution and frame rate to assure a
night watchman that all was well in the building. The corresponding
data rate, however, would drop from 1.5 Mb/s to 25 kb/s, increasing the
effective storage capacity of the system by a factor of 60. Further
configuring the CODEC to increase quantization parameters or to select
different tools from its MPEG tool bag might further reduce the bit
rate by 30 percent to 50 percent.
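The 25 kb/s figure is consistent with a naive assumption that bit rate scales linearly with the number of pixels encoded per second: one quarter of the pixels at 2 of 30 frames per second. The Python sketch below applies that assumption; real encoders rarely scale exactly linearly, so treat it as a back-of-the-envelope check rather than a prediction. The 30 to 50 percent range is the one quoted above.

```python
def scaled_rate_kbps(base_kbps: float, base_fps: float, new_fps: float,
                     pixel_ratio: float) -> float:
    """Naive estimate (an assumption): bit rate proportional to pixels per second."""
    return base_kbps * pixel_ratio * new_fps / base_fps


BASE_KBPS = 1500.0  # normal office traffic at 30 fps
CIF_OF_D1 = 0.25    # CIF is roughly one-quarter of D1

if __name__ == "__main__":
    for fps in (2, 3):
        rate = scaled_rate_kbps(BASE_KBPS, 30, fps, CIF_OF_D1)
        print(f"{fps} fps: {rate:.1f} kb/s, storage gain x{BASE_KBPS / rate:.0f}")
    # Further 30 to 50 percent saving from coarser quantization or tool choice.
    night = scaled_rate_kbps(BASE_KBPS, 30, 2, CIF_OF_D1)
    for saving in (0.3, 0.5):
        reduced = night * (1 - saving)
        print(f"{saving:.0%} extra saving: {reduced:.1f} kb/s, x{BASE_KBPS / reduced:.0f}")
```

At 2 fps this gives 25 kb/s and the factor of 60; at 3 fps it gives 37.5 kb/s (a factor of 40), and the further 30 to 50 percent saving would lift the 2 fps factor to roughly 86 to 120.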
Trigger criteria might also be defined to be a function of time as well
as scene dynamics. In our example, motion occurring during weekend
hours might be considered more suspicious and result in the encoder
selecting the full resolution of the image sensor and perhaps capturing
high-definition video streams at 30 fps to preserve the best possible
video evidence.
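A trigger policy that combines time and scene dynamics can be written as a small lookup. The function and profile names below are hypothetical, and the treatment of weekday-evening motion (kept at the normal profile) is an assumption, since the example above does not specify it.

```python
from datetime import datetime

# Hypothetical profile labels for the behaviour described in the example.
PROFILE_QUIET = "cif_2fps"
PROFILE_NORMAL = "d1_30fps"
PROFILE_EVIDENCE = "full_sensor_hd_30fps"


def choose_profile(when: datetime, motion: bool) -> str:
    """Pick an encoder profile from the time of day and the analytics result."""
    weekend = when.weekday() >= 5                # Saturday = 5, Sunday = 6
    evening = when.hour >= 18 or when.hour < 8   # 6 p.m. to 8 a.m.
    if motion and weekend:
        return PROFILE_EVIDENCE   # weekend motion treated as most suspicious
    if not motion and (weekend or evening):
        return PROFILE_QUIET      # deserted building: low resolution and frame rate
    return PROFILE_NORMAL         # office hours, or weekday-evening motion (assumed)
```

For example, motion at 2 p.m. on Saturday, December 22, 2007 selects the full-sensor profile, while the same empty hallway at 11 p.m. on a Friday drops to CIF at 2 fps.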
In the preceding examples we see how providing an IP camera with
sufficient processing power allows analytic algorithms and the video
CODEC to be tightly coupled. The resulting intelligent encoder is able
to set its own bit rate within a widely defined window producing a
highly efficient camera implementation. In surveillance applications,
long periods of very low bit rates are produced. These more than
compensate for very short periods of higher bit rates when required by
scene dynamics. The reduced bandwidth consumption of the resulting
stream and lower storage requirements yield economies in installation
costs and operating expenses.
In future generations of the Intelligent Encoder, enhanced analytic
functions might be used to identify regions of added interest such as
faces or license plates. The encoder might be configured to increase
the level of constant quality in just these regions as they are tracked
through the scene. The result would be the optimum tradeoff of quality
and bit rate set not by the dynamics of the scene but by the specifics
of its content.
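One plausible way to realize region-dependent quality, sketched here as an assumption about a future design rather than an existing feature, is a per-macroblock quantization parameter (QP) map: macroblocks overlapping a tracked face or license plate get a lower QP, meaning finer quantization and higher quality. The sketch assumes H.264-style 16x16 macroblocks and the 0 to 51 QP range of 8-bit H.264; the function name and the default offset are hypothetical, and how a given encoder accepts such a map is vendor-specific and not shown.

```python
from typing import List, Sequence, Tuple

Box = Tuple[int, int, int, int]  # x, y, width, height in pixels

MB = 16                 # H.264 macroblock size in luma samples
QP_MIN, QP_MAX = 0, 51  # QP range for 8-bit H.264


def roi_qp_map(width: int, height: int, base_qp: int,
               roi_boxes: Sequence[Box], roi_delta: int = -6) -> List[List[int]]:
    """Per-macroblock QP map: base_qp everywhere, base_qp + roi_delta (clamped
    to the legal range) in any macroblock overlapping a region of interest."""
    cols = (width + MB - 1) // MB
    rows = (height + MB - 1) // MB
    roi_qp = max(QP_MIN, min(QP_MAX, base_qp + roi_delta))
    qp = [[base_qp] * cols for _ in range(rows)]
    for x, y, w, h in roi_boxes:
        if w <= 0 or h <= 0:
            continue  # ignore empty boxes
        c0, c1 = max(0, x // MB), min(cols - 1, (x + w - 1) // MB)
        r0, r1 = max(0, y // MB), min(rows - 1, (y + h - 1) // MB)
        for r in range(r0, r1 + 1):
            for c in range(c0, c1 + 1):
                qp[r][c] = roi_qp
    return qp
```

Tracking a face or plate through the scene would then amount to rebuilding the map each frame from the latest bounding boxes supplied by the analytics engine.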
Part Two: The importance of image quality when applying video analytics
Video analytics and intelligent encoders promise to propel IP video
cameras to new levels of functionality, optimization of bit rate,
efficiency and ultimate effectiveness for a wide range of surveillance
applications. Advanced software-configurable processors are crucial for
bridging the analytics systems with the signal processing and
compression functions of the video surveillance process.
No matter how "smart" the encoders become, however, they are always
dependent upon a high-quality video stream from the camera sensor. The
quality of the video feed is critical to the accuracy of the analysis
by the analytics algorithms and ultimately the efficiency of the
encoding process.
In video surveillance applications, especially when intelligent
analytics will be applied, "high quality" means more than simply
in-focus footage. It means the ability to capture detailed, actionable
images no matter what kind of lighting or environmental conditions are
present, so the analytics algorithms can operate optimally.
This enables surveillance cameras to fulfill their ultimate purpose,
whether it's to alert security personnel so they can avert potential
problems, monitor faces of shoppers or vehicle license plates for
marketing or transportation planning purposes, or document details of
the people, objects and events of a crime scene with sufficient
accuracy and clarity to enable the apprehension and/or prosecution of
those responsible.
In more technical terms, video surveillance cameras must be equipped with image processing chipsets with: