Video Compression Techniques and Bandwidth Utilization


Network users want access to broadcast-quality video for their multimedia applications, but internetwork bandwidth and local data storage offer only limited throughput and capacity. The amount of data that must be transmitted or stored to deliver even a minute of uncompressed video is staggering.

A single 640 × 480-pel video image (a pel is a minimum-size picture element) with 8-bit color requires, without compression, 300K bytes of storage. One minute of video at 30 frames per second requires 550M bytes of storage. If color resolution is increased from 8 bits per pel to 8 bits per color plane (i.e., RGB color), storage requirements increase to 1.7G bytes per minute.
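
These figures follow directly from the frame geometry and sample depth. The short Python sketch below simply reproduces the arithmetic.

```python
# A quick check of the figures quoted above: one byte per pel for 8-bit
# color, three bytes per pel for 8-bit-per-plane RGB.

WIDTH, HEIGHT = 640, 480          # pels per frame
FPS = 30                          # frames per second
SECONDS_PER_MINUTE = 60

frame_8bit = WIDTH * HEIGHT                          # 307,200 bytes (~300K)
minute_8bit = frame_8bit * FPS * SECONDS_PER_MINUTE  # 552,960,000 bytes (~550M)

frame_rgb = WIDTH * HEIGHT * 3                       # 921,600 bytes
minute_rgb = frame_rgb * FPS * SECONDS_PER_MINUTE    # 1,658,880,000 bytes (~1.7G)

print(f"8-bit color: {frame_8bit:,} bytes/frame, {minute_8bit:,} bytes/minute")
print(f"24-bit RGB:  {frame_rgb:,} bytes/frame, {minute_rgb:,} bytes/minute")
```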

DIGITAL VIDEO AND IMAGE COMPRESSION TECHNIQUES

The economical distribution and storage of both digital photographs and digital video mandate the use of specialized compression techniques. To achieve practical transmission rates and storage requirements, compression ratios in excess of 50:1 are required, and such ratios are not attainable using known lossless compression techniques.

Although existing lossless image compression algorithms allow 100% accurate reconstruction of the original image, they offer relatively low compression performance, typically on the order of 3:1. Lossy compression algorithms offer much higher compression ratios, but at the expense of some reconstruction accuracy. Lossless compression is therefore typically reserved for medical imaging applications and for transmission of images from satellites and planetary probes.
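
To make these ratios concrete, the rough sketch below takes the uncompressed 1.7G-byte-per-minute RGB stream computed earlier and derives the channel rate implied by several compression ratios.

```python
# Channel rate implied by a given compression ratio, starting from the
# uncompressed RGB figure of ~1.7G bytes per minute computed earlier.

UNCOMPRESSED_BYTES_PER_MINUTE = 1_658_880_000
uncompressed_bps = UNCOMPRESSED_BYTES_PER_MINUTE * 8 / 60  # ~221 Mbit/s

for ratio in (3, 50, 100):   # 3:1 is typical lossless performance
    print(f"{ratio:>3}:1 -> {uncompressed_bps / ratio / 1e6:7.2f} Mbit/s")
# 3:1 still needs ~74 Mbit/s; 50:1 brings the stream down to ~4.4 Mbit/s,
# a rate practical for distribution.
```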

Through extensive research, users of video compression have settled on the Joint Photographic Experts Group (JPEG) and Moving Picture Experts Group (MPEG) compression standards for multimedia video applications. Proprietary compression systems also exist, typically as a means to reduce the CPU requirements of video decompression or to achieve higher compression ratios. Real-time MPEG and JPEG require specialized hardware; however, because of the projected volumes of use of these systems, that hardware is expected to become inexpensive.

This chapter explains the operation of JPEG, MPEG, and DigiCipher systems and provides insight into how and why the algorithms were selected.

JPEG, MPEG, and DigiCipher

JPEG was initially developed to compress still images. Because it codes each frame independently, the loss of one frame does not corrupt the frames that follow, so it is often used in video teleconferencing systems where packet losses due to network congestion are high.

MPEG I and MPEG II were developed as extensions to JPEG and form the basis for most next-generation video services and storage. DigiCipher is a proprietary subset of MPEG II and will be used in most cable television set-top decoders.

Even in supposedly lossless compression systems, image reconstruction is not perfect. Each pel is a digital data point obtained by sampling an analog picture signal, and the quality of a reconstructed image is a function of the resolution and rate at which sampling occurs. Resolution is measured in several dimensions: the number of pels in the horizontal and vertical directions, for example, and the number of bits used to represent each color.

Another factor is how much motion occurred during the sampling interval. When referring to sampled data systems, the term lossless means that after the sampling process there are no additional losses due to the coding, storage, and retrieval processes.

TECHNICAL ISSUES IN MAKING IMAGES ACCEPTABLE TO THE HUMAN EYE

Substantial research has been conducted on the limitations of human vision. The resulting rules can be applied to reduce the information content of images without unacceptably degrading perceived image quality. The following paragraphs discuss some important properties of human vision that compression systems take advantage of.

Visual Sensitivity

Visual sensitivity is known to be inversely proportional to average light intensity. This means that the human eye is very sensitive to changes in low-intensity areas of a picture and not very sensitive to comparable changes in high-intensity areas.
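
This inverse relationship is often stated as Weber's law: the smallest noticeable change in intensity is roughly proportional to the background intensity. The sketch below illustrates the idea; the 2% threshold is a hypothetical value chosen only for illustration, as the true fraction varies with viewing conditions.

```python
# Illustration of Weber's law. The 2% just-noticeable-difference
# fraction is an assumed, illustrative value.
WEBER_FRACTION = 0.02

for intensity in (10, 50, 100, 200):   # arbitrary intensity units
    jnd = WEBER_FRACTION * intensity
    print(f"background {intensity:>3}: changes below ~{jnd:.1f} go unnoticed")
# The same absolute change that is obvious against a dark background can
# fall below threshold against a bright one, so bright regions can be
# coded more coarsely without visible loss.
```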

Visual sensitivity to changes in intensity and color is higher along both the horizontal axis and vertical axis than along the diagonals. Sensitivity to high-frequency (rapid) changes in either intensity or color is much lower than to low-frequency (less rapid) changes. Sensitivity to changes in overall (i.e., monochrome) light intensity is higher than to comparable changes in color.

Luminance and Chrominance

For image compression purposes, substantially less information about color needs to be retained than for monochrome intensity. To take advantage of this property, image data must be converted into separate metrics for brightness and color.

The primary colors are red, green, and blue. In many computer graphics applications, video cards use individual red, green, and blue intensity values for each pel on the screen. In other cases, a card may use a single value that is indexed to a color lookup table. The color lookup table holds the individual red, green, and blue (RGB) intensity values.

For compression purposes, RGB data is converted to separate values for luminance and chrominance properties of an image. Luminance is a measurement of monochrome intensity. Chrominance consists of two measurements of pel color that are based on a reference white pel at the specified luminance level. Only one luminance and two chrominance values are required because the third color can always be directly derived from them.
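
As a concrete sketch of this conversion, the function below uses the JFIF (JPEG) variant of the ITU-R BT.601 luminance/chrominance coefficients; other video systems use slightly different constants.

```python
# RGB -> Y (luminance), Cb/Cr (chrominance) using the JFIF/JPEG form of
# the ITU-R BT.601 coefficients. Inputs are 8-bit values in 0..255.

def rgb_to_ycbcr(r: float, g: float, b: float) -> tuple[float, float, float]:
    y  =  0.299    * r + 0.587    * g + 0.114    * b
    cb = -0.168736 * r - 0.331264 * g + 0.5      * b + 128  # blue-difference
    cr =  0.5      * r - 0.418688 * g - 0.081312 * b + 128  # red-difference
    return y, cb, cr    # real coders clamp the results to 0..255

# Because the 3x3 conversion matrix is invertible, R, G, and B can always
# be recovered from Y, Cb, and Cr, so only three values need to be stored.
print(rgb_to_ycbcr(255, 0, 0))   # pure red: low Y, Cr well above 128
```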

Two-Dimensional Spatial Frequency

To take advantage of visual limitations as they relate to spatial changes, it is convenient to convert pictures from their two-dimensional “time domain” representation, which is what a person normally sees, into a two-dimensional spatial frequency domain representation. One dimension represents horizontal spatial frequencies, and the other dimension represents vertical spatial frequencies. As an example of this concept, when an image is scanned, slow variations in intensity or color have low spatial frequency components, and rapid variations or abrupt transitions have high spatial frequency components.
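
JPEG and MPEG perform this conversion with the discrete cosine transform (DCT), applied to 8 × 8 blocks of pels. The sketch below implements the 8 × 8 type-II DCT directly from its defining formula; production codecs use fast factored forms, but the output is the same.

```python
# Direct (unoptimized) 8x8 two-dimensional DCT. Each output coefficient
# F[u, v] measures how much of one horizontal/vertical spatial-frequency
# pair is present in the block.

import numpy as np

def dct_2d(block: np.ndarray) -> np.ndarray:
    """Type-II 2D DCT of an 8x8 block, computed from the defining formula."""
    n = 8
    out = np.zeros((n, n))
    for u in range(n):
        for v in range(n):
            cu = 1 / np.sqrt(2) if u == 0 else 1.0
            cv = 1 / np.sqrt(2) if v == 0 else 1.0
            s = 0.0
            for x in range(n):
                for y in range(n):
                    s += (block[x, y]
                          * np.cos((2 * x + 1) * u * np.pi / 16)
                          * np.cos((2 * y + 1) * v * np.pi / 16))
            out[u, v] = 0.25 * cu * cv * s
    return out

# A flat block has all its energy in the lowest-frequency (0, 0)
# coefficient; an abrupt edge spreads energy into higher frequencies.
flat = np.full((8, 8), 100.0)
print(dct_2d(flat)[0, 0])   # ~800: all energy at the lowest frequency
```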

Conversion to the frequency domain is particularly useful because the eye's sensitivity to high spatial frequency components is much lower than to low spatial frequency components. High-frequency components below the threshold of normal human perception can therefore be discarded. In addition, measurements of spatial frequency can be quantized, or reduced in accuracy, as a function of frequency; low-frequency components are generally retained at higher resolution than high-frequency components.
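
The sketch below shows this frequency-dependent quantization. The step-size matrix is hypothetical, chosen only for illustration (it is not the standard JPEG luminance table), but the mechanism, dividing each coefficient by a step size that grows with frequency and rounding, is exactly what JPEG does.

```python
import numpy as np

# Hypothetical step sizes: fine steps for low frequencies (upper left),
# coarse steps for high frequencies (lower right).
step = np.fromfunction(lambda u, v: 8.0 + 6.0 * (u + v), (8, 8))

def quantize(coeffs: np.ndarray) -> np.ndarray:
    """Divide each DCT coefficient by its step size and round."""
    return np.round(coeffs / step).astype(int)

def dequantize(q: np.ndarray) -> np.ndarray:
    """Reconstruct approximate coefficients from the quantized values."""
    return q * step

coeffs = np.full((8, 8), 20.0)   # pretend every coefficient is 20
q = quantize(coeffs)
print(q[0, 0], q[7, 7])          # 2 at the lowest frequency, 0 at the highest
# High-frequency coefficients that round to zero are discarded entirely,
# which is where much of the compression comes from.
```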
