Video is a very broad topic, and the amount of information available extends well beyond the scope of this article. This article is intended to give Serato Video users a better understanding of what's involved in video, so that you have a good grounding when working with Serato Video.
There are a few terms which you will hear again and again in the video world, so we'll start by defining those.
Containers
A container is a computer file format that can hold various types of data, most commonly audio and video. The audio and video inside the container are compressed using a codec (see Codec). Some of the simpler container formats can hold audio compressed with a range of codecs, while more advanced container formats can support multiple audio and video streams, subtitles, chapter information, and metadata (tags) - along with the synchronization information needed to play back the various streams together.
Codec
CODEC is short for 'Compressor-Decompressor', 'Coder-Decoder', or 'Compression/Decompression algorithm'. A codec is a technology used for compressing and decompressing data. Codecs can be implemented in software, hardware, or a combination of both. Some popular codecs for computer video include MPEG, H.264, Indeo, and Cinepak.
To play back an encoded file you will need to have the appropriate codec installed. A missing codec is usually the reason why a particular file will not play back.
Compression quality
Most codecs are lossy. This means that when the audio/video is compressed, some information is “thrown away”, resulting in a smaller file size - but one which hopefully looks and sounds very close to the original. This is useful when you have large files which don't fit on traditional consumer storage media, or which need to be optimized for different media or transmission methods, for example DVD, Video CD, or web video.
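To get a feel for how much data a lossy codec discards, the short Python sketch below compares the bitrate of uncompressed CD audio with a typical 128 kbit/s MP3; the MP3 bitrate is just a common example, not a fixed standard.

    # Uncompressed CD audio: 44,100 samples/s x 16 bits x 2 channels.
    cd_bitrate = 44_100 * 16 * 2          # bits per second (~1,411 kbit/s)
    mp3_bitrate = 128_000                 # a common lossy bitrate, for comparison
    print(f"CD audio: {cd_bitrate / 1000:.0f} kbit/s")
    print(f"A 128 kbit/s MP3 keeps roughly 1/{cd_bitrate // mp3_bitrate} of that data")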
There are lossless codecs, but for most purposes the slight increase in quality is not worth the increase in data size, which is often considerable. The main exception is when the data is to undergo further processing (for example editing), in which case the repeated application of lossy codecs (repeated encoding and subsequent decoding) will degrade the quality of the edited file. It is recommended to apply a lossy codec only once, at the final encoding stage when creating the finished product; otherwise the quality can degrade quite significantly. Unfortunately there are many situations where this is unavoidable. With the ever-decreasing cost of storage and bandwidth, lossless codecs may become the norm.
Codec Design
Many codecs are designed to emphasize certain aspects of the media to be encoded. For example, a digital video (using a DV codec) of a sports event, such as baseball or soccer, needs to encode motion well but not necessarily exact colours, while a video of an art exhibit needs to perform well encoding colour and surface texture. There are hundreds or even thousands of codecs, ranging from those downloadable for free to ones costing hundreds of dollars or more. This can create compatibility and obsolescence issues. By contrast, lossless PCM audio (44.1 kHz, 16 bit stereo, as represented on an audio CD or in a .wav or .aiff file) offers more of a persistent standard across multiple platforms and over time.
Many multimedia data streams need to contain both audio and video data, and often some form of metadata that permits synchronisation of the audio and video. Each of these three streams may be handled by different programs, processes, or hardware; but for the multimedia data stream to be useful in stored or transmitted form, they must be encapsulated together in a container format.
While many people describe AVI as a codec, this is incorrect. AVI (nowadays) is a container format, which many codecs might use (although it is not an ISO standard). There are other well-known containers such as Ogg, ASF, QuickTime, RealMedia, Matroska, DivX, and MP4.
Audio codecs
Here is a list of some codecs (source: Wikipedia).
Uncompressed formats
- Audio Interchange File Format (AIFF, container format)
- Resource Interchange File Format (RIFF, container format)
- WAV – Microsoft “WAVE” format (format supports compression, but it is rarely used)
- Linear Pulse Code Modulation (LPCM, generally only described as PCM, note that AIFF, WAV and MLP are all derivative forms of LPCM)
- Pulse-amplitude modulation (PAM)
Lossless data compression
- Apple Lossless Audio Codec (ALAC)
- Direct Stream Transfer (DST)
- Dolby TrueHD – Optional lossless surround sound format used by HD DVD and Blu-ray, it uses MLP but adds higher sample rates, bit rates, and more channels
- DTS-HD Master Audio – Optional lossless surround sound format used by HD DVD and Blu-ray, it was previously known as DTS++ and DTS-HD
- Free Lossless Audio Codec (FLAC)
- Lossless Audio (LA)
- Lossless Predictive Audio Compression (LPAC)
- Lossless Transform Audio Compression (LTAC)
- MPEG-4 Audio Lossless Coding (MPEG-4 ALS)
- Meridian Lossless Packing (MLP), also known as Packed PCM (PPCM), it is the standard lossless compression method for DVD-Audio content
- Monkey's Audio (APE)
- OptimFROG (OFR)
- RealAudio Lossless
- RK Audio (RKAU)
- Shorten (SHN)
- True Audio (TTA)
- WavPack (WV)
- Windows Media Audio 9 Lossless
Lossy data compression
General
(medium to high bit rate)
- Adaptive Differential (or Delta) pulse-code modulation (ADPCM, see Pulse-code modulation)
- ADX
- Adaptive Rate-Distortion Optimised sound codeR (ARDOR)
- Adaptive Transform Acoustic Coding (ATRAC, used in MiniDisc devices)
- Dolby Digital (A/52, AC3)
- DTS Coherent Acoustics (DTS, Digital Theatre System Coherent Acoustics)
- Impala FORscene audio codec
- MPEG audio
  - layer-1 (MP1)
  - layer-2 (MP2) (MPEG-1, MPEG-2 and non-ISO MPEG-2.5)
  - layer-3 (MP3) (MPEG-1, MPEG-2 and non-ISO MPEG-2.5)
- Advanced Audio Coding (AAC, MPEG-2 and MPEG-4)
  - HE-AAC
- Harmonic and Individual Lines and Noise (HILN, MPEG-4 Parametric Audio Coding)
- Musepack
- Perceptual Audio Coding
- QDesign
- TwinVQ
- Vorbis
- Windows Media Audio (WMA)
Voice
(low bit rate, optimized for speech)
- Advanced Multi-Band Excitation (AMBE)
- Algebraic Code Excited Linear Prediction (ACELP)
- Code Excited Linear Prediction (CELP)
- Continuously variable slope delta modulation (CVSD)
- Digital Speech Standard (DSS)
- Enhanced Variable Rate Codec (EVRC)
- FS-1015 (LPC-10)
- FS-1016 (CELP)
- ITU standards:
  - G.711 (a-law and μ-law)
  - G.721 (superseded by G.726)
  - G.722
  - G.722.1
  - G.722.2 (AMR-WB)
  - G.723 (24 and 40 kbit/s DPCM, extension to G.721, superseded by G.726)
  - G.723.1 (MPC-MLQ or ACELP)
  - G.726 (ADPCM)
  - G.728 (LD-CELP)
  - G.729 (CS-ACELP)
  - G.729a
  - G.729.1
- GSM codecs:
  - Full Rate
  - Half Rate
  - Enhanced Full Rate
  - Adaptive Multi-Rate (AMR)
  - AMR-WB
  - AMR-WB+
- Harmonic Vector Excitation Coding (HVXC)
- Internet Low Bit Rate Codec (iLBC)
- Improved Multi-Band Excitation (IMBE)
- internet Speech Audio Codec (iSAC)
- Mixed Excitation Linear Prediction (MELP)
- QCELP
- Relaxed Code Excited Linear Prediction (RCELP)
- RTAudio - used by Microsoft Live Communication Server
- Selectable Mode Vocoder (SMV)
- Speex, patent free
- Triple Rate CODER (TRC) - used in some pocket recorders.
- Vector Sum Excited Linear Prediction (VSELP)
Frame Rate
Frame rate is a measurement of the frequency (rate) at which an imaging device produces unique consecutive images called frames. Frame rate is most often expressed in frames per second (fps), or simply hertz (Hz).
Frame rates in film and television
There are several frame rate standards in the TV and movie-making business; the most common are listed below.
- 60i (59.94 to be exact; 59.94 interlaced fields = 29.97 frames) is the standard video field rate per second used for NTSC television (see the short calculation after this list).
- 50i (50 interlaced fields = 25 frames) is the standard video field rate per second for PAL and SECAM television.
- 30p, or 30-frame progressive, is a non-interlaced format and produces video at 30 frames per second. Progressive (non-interlaced) scanning emulates a film camera's frame-by-frame image capture and gives clarity for high speed subjects and a film-like appearance.
- The 24p frame rate is also a non-interlaced format, and is now widely adopted by those who plan to transfer video to film. 24p is often used by filmmakers to give their productions a film “look”, which comes from the frame rate.
- 35 mm movie cameras use a standard exposure rate of 24 frames per second.
- 25p is a video format which runs twenty-five progressive (hence the “P”) frames per second. This frame rate is derived from the PAL television standard of 50i (or 25 interlaced frames per second). While 25p captures only half the motion that normal 50i PAL registers, it yields a higher vertical resolution on moving subjects. It is also better suited to progressive-scan output (e.g. on LCD displays, computer monitors and projectors) because the interlacing is absent. Like 24p, 25p is often used to achieve a “cine” look.
- 60p is a progressive format used in high-end HDTV systems. It is often assumed to be what is meant by “1080p”, although many 1080p sets do not run at 60 fps. While 1080p60 is not technically part of the ATSC or DVB broadcast standards, it is rapidly gaining ground in the areas of set-top boxes and video recordings.
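To make the NTSC and PAL figures above concrete, here is a short Python calculation of the exact frame and field rates and the resulting frame durations; the rates come from the broadcast standards, the snippet is only illustrative.

    # NTSC colour video runs at 30 * 1000/1001 frames per second, delivered as
    # twice that many interlaced fields; PAL runs at a flat 25 frames per second.
    ntsc_fps = 30 * 1000 / 1001           # ~29.97 frames per second
    ntsc_fields = ntsc_fps * 2            # ~59.94 fields per second
    pal_fps = 25
    print(f"NTSC: {ntsc_fps:.3f} fps ({ntsc_fields:.2f} fields/s)")
    print(f"One NTSC frame lasts {1000 / ntsc_fps:.2f} ms, one PAL frame {1000 / pal_fps:.0f} ms")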
Audio
Sampling Rate
Sampling rate is defined as the number of samples per second taken from a continuous signal to make a discrete signal. For time-domain signals, it is measured in hertz (Hz). This means that for a CD with a sampling rate of 44.1 kHz, a “snapshot” of the audio is taken 44,100 times per second.
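The sketch below turns that definition into numbers: how many samples a three-minute track contains at 44.1 kHz, and the highest frequency that rate can represent (the track length is just an example).

    sample_rate = 44_100                  # samples per second (CD quality)
    duration_s = 3 * 60                   # an example three-minute track
    samples_per_channel = sample_rate * duration_s
    print(f"{samples_per_channel:,} samples per channel over {duration_s} seconds")
    # Nyquist limit: the highest representable frequency is half the sample rate.
    print(f"Highest representable frequency: {sample_rate / 2:,.0f} Hz")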
Audio in Scratch Live is always internally resampled to achieve the vinyl scratch effects, so it can accept audio at any sample rate up to 48 kHz, with 44.1 kHz and 48 kHz being the most common sampling rates used.
The end result for the user is that it doesn't matter whether the audio comes from a CD or a video; Scratch Live will handle the sampling rate conversion for you and output what you expect to hear.
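Sample rate conversion itself can be done in many ways; the sketch below is a minimal linear-interpolation resampler that shows the basic idea only, and is not how Scratch Live implements it internally.

    def resample_linear(samples, src_rate, dst_rate):
        # Resample a mono signal by linearly interpolating between neighbouring samples.
        out_len = int(len(samples) * dst_rate / src_rate)
        step = src_rate / dst_rate        # how far to move through the source per output sample
        out = []
        for n in range(out_len):
            pos = n * step
            i = int(pos)
            frac = pos - i
            nxt = samples[i + 1] if i + 1 < len(samples) else samples[i]
            out.append(samples[i] * (1 - frac) + nxt * frac)
        return out

    # Convert a short 48 kHz burst to 44.1 kHz.
    converted = resample_linear([0.0, 0.5, 1.0, 0.5, 0.0], 48_000, 44_100)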
Bit Depth
When converting an analog signal into digital audio, the bit depth determines the dynamic range and signal-to-noise ratio. The bit depth is not related to the frequency range, which is determined by the sample rate.
The general rule of thumb relating bit depth to dynamic range is that for each 1-bit increase in bit depth, the dynamic range increases by about 6 dB. 24-bit digital audio has a theoretical maximum dynamic range of 144 dB, while 16-bit has a theoretical range of 96 dB.
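A quick calculation confirms the rule of thumb; the exact figure is 20 * log10(2), roughly 6.02 dB per bit, which is where the 96 dB and 144 dB figures come from.

    import math

    def dynamic_range_db(bit_depth):
        # Theoretical dynamic range of linear PCM: 20 * log10(2 ** bits), ~6.02 dB per bit.
        return 20 * math.log10(2 ** bit_depth)

    for bits in (16, 24):
        print(f"{bits}-bit: ~{dynamic_range_db(bits):.0f} dB")   # prints ~96 dB and ~144 dB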
Display Resolution
The display resolution of a digital television or computer display typically refers to the number of distinct pixels in each dimension that can be displayed. It can be an ambiguous term especially as the displayed resolution is controlled by different factors in cathode ray tube (CRT) and flat panel or projection displays using fixed picture-element (pixel) arrays.
One use of the term “display resolution” applies to fixed-pixel-array displays such as plasma display panels (PDPs), liquid crystal displays (LCDs), digital light processing (DLP) projectors, or similar technologies, and is simply the physical number of columns and rows of pixels creating the display (e.g., 800×600 or 1024×768). A consequence of having a fixed grid display is that for multiformat video inputs all displays need a “scaling-engine” (a digital video processor that includes a memory array) to match the incoming picture format to the display.
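As a toy illustration of what a scaling engine does, the sketch below maps each pixel of a fixed display grid back to the nearest pixel of the incoming picture; real scalers use much more sophisticated filtering than this nearest-neighbour approach.

    def scale_nearest(src, dst_w, dst_h):
        # For each display pixel, pick the closest source pixel.
        src_h, src_w = len(src), len(src[0])
        return [[src[y * src_h // dst_h][x * src_w // dst_w] for x in range(dst_w)]
                for y in range(dst_h)]

    tiny = [[0, 1],
            [2, 3]]                        # a 2x2 "image"
    for row in scale_nearest(tiny, 4, 4):  # stretched to fill a 4x4 pixel grid
        print(row)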
2-Pass Encoding
2-pass encoding is an encoding method where the encoder distributes the available bandwidth (as determined by the bitrate and quantizer settings) in an intelligent manner. On the first pass the encoder runs through the file and creates a profile that maps out where more data allocation is needed; on the second pass it encodes the file, redistributing the available bandwidth where it is needed the most.
For example, a high motion scene may receive a greater share of bandwidth than a low motion scene.
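The toy sketch below shows the allocation idea only: a first pass measures per-scene complexity, and the second pass splits a fixed bit budget in proportion to it. Real encoders use far more elaborate models, and the numbers here are hypothetical.

    def allocate_bits(scene_complexities, total_bits):
        # Second-pass step: split the overall bit budget in proportion to the
        # complexity profile collected on the first pass.
        total = sum(scene_complexities)
        return [total_bits * c / total for c in scene_complexities]

    # Hypothetical first-pass profile: a talking head, a fast pan, a static title card.
    profile = [1.0, 4.0, 0.5]
    print(allocate_bits(profile, 11_000_000))   # the busy scene receives the largest share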
CPU vs File Size
When storing video, users are often confronted with a choice: higher quality videos that need little CPU power to decode (at the expense of much higher disk usage), or highly compressed videos that take up far less disk space at the expense of the higher CPU usage needed to decode the video in real time.
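File size is easy to estimate from bitrate and duration, which makes the trade-off simple to quantify; the bitrates below are purely illustrative, not recommendations.

    def clip_size_mb(video_kbps, audio_kbps, minutes):
        # Approximate size in MB: total bitrate (bits/s) * duration (s) / 8 bytes.
        total_bits = (video_kbps + audio_kbps) * 1000 * minutes * 60
        return total_bits / 8 / 1_000_000

    print(f"4-minute clip at 8000 kbit/s video: {clip_size_mb(8000, 192, 4):.0f} MB")
    print(f"4-minute clip at 1500 kbit/s video: {clip_size_mb(1500, 192, 4):.0f} MB")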
Aspect Ratio
The aspect ratio of an image is its displayed width divided by its displayed height. The most common aspect ratios you're likely to come across are 4:3 (1.33:1), which is universal for standard-definition video formats, and 16:9 (1.78:1), which is universal for high-definition television and European digital television. Other cinema and video aspect ratios exist, but they are not in common use.
Converting between formats of unequal ratios is done either by cropping the original image, or by adding horizontal black bars (letterboxing) or vertical black bars (pillarboxing) to retain the original format's aspect ratio.
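The sketch below computes the scaled picture size and the bar thickness when fitting one aspect ratio inside another; the resolutions used are just examples.

    def fit_with_bars(src_w, src_h, dst_w, dst_h):
        # Scale the source to fit inside the destination, then centre it; the
        # leftover area becomes letterbox (top/bottom) or pillarbox (side) bars.
        scale = min(dst_w / src_w, dst_h / src_h)
        new_w, new_h = round(src_w * scale), round(src_h * scale)
        return new_w, new_h, (dst_w - new_w) // 2, (dst_h - new_h) // 2

    print(fit_with_bars(1920, 1080, 640, 480))   # 16:9 source on a 4:3 display -> letterbox bars
    print(fit_with_bars(720, 576, 1920, 1080))   # 4:3 source on a 16:9 display -> pillarbox bars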
4:3 standard
The 4:3 ratio for standard television has been in use since the beginning of television and many computer monitors use the same aspect ratio. 4:3 is the aspect ratio defined by the Academy of Motion Picture Arts and Sciences as a standard after the advent of optical sound-on-film.
16:9 standard
16:9 is the international standard format for HDTV and many digital video cameras have the capability to record in 16:9. Anamorphic DVD transfers store the information vertically stretched in a 4:3 aspect ratio. If the TV can handle an anamorphic image, it will horizontally decompress the signal to 16:9. If not, the DVD player can reduce scan lines and add letterboxing before sending the image to the TV. Wider ratios such as 1.85:1 and 2.40:1 are accommodated within the 16:9 DVD frame by additional black bars within the image itself.
Interlacing
Interlace is a technique of improving the picture quality of a video signal without consuming any extra bandwidth. It was invented by RCA engineer Randall C. Ballard in 1932. It was commonly used in television until the 1970s, when the needs of computer monitors resulted in the reintroduction of progressive scan. Interlace is still used for most standard definition TVs, and the 1080i HDTV broadcast standard, but not for LCD, micromirror (DLP), or plasma displays.
With progressive scan, an image is captured, transmitted and displayed in a similar way to text on a page: line by line, from top to bottom. An interlaced scan does this too, but only for every alternate line. It runs from the top left corner down to the bottom right corner, then repeats, this time starting at the second row to fill in the gaps left behind by the first scan.
Such a scan of every second line is called interlacing. A field is an image that contains only half of the lines needed to make a complete picture. The two scanned fields are perceived as one continuous image, which allows full horizontal detail to be viewed with half the bandwidth that a full progressive scan would use.
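The sketch below “weaves” two fields back into one full frame, which is the simplest way to visualise how the alternating lines fit together; real deinterlacers also have to cope with motion between the two fields.

    def weave_fields(top_field, bottom_field):
        # Interleave two fields (lists of scan lines) into one progressive frame:
        # even rows come from the top field, odd rows from the bottom field.
        frame = []
        for top_line, bottom_line in zip(top_field, bottom_field):
            frame.append(top_line)
            frame.append(bottom_line)
        return frame

    # Two 2-line fields combine into a 4-line frame.
    print(weave_fields(["line 0", "line 2"], ["line 1", "line 3"]))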
NTSC
NTSC stands for National Television System Committee. It is a standard for scanning television signals that is primarily used in the United States, although it has been adopted by numerous other countries. Frames are displayed at approximately 30 frames per second (29.97 to be exact).
PAL
PAL stands for Phase Alternating Line and it is the European TV standard for scanning television signals. Frames are displayed at 25 frames per second. PAL is used in most European countries and is gaining popularity worldwide.