HTML 5 Video on the Web
Video on the Web
❧
Diving In
Anyone who has visited YouTube.com in the past four years knows that you can embed video in a web page. But prior to HTML5, there was no standards-based way to do this. Virtually all the video you’ve ever watched “on the web” has been funneled through a third-party plugin — maybe QuickTime, maybe RealPlayer, maybe Flash. (YouTube uses Flash.) These plugins integrate with your browser well enough that you may not even be aware that you’re using them. That is, until you try to watch a video on a platform that doesn’t support that plugin.
HTML5 defines a standard way to embed video in a web page, using a <video> element. Support for the <video> element is still evolving, which is a polite way of saying it doesn’t work yet. At least, it doesn’t work everywhere. But don’t despair! There are alternatives and fallbacks and options galore.
<video> element support
| IE8 | IE7 | Fx3.5 | Fx3.0 | Saf4 | Saf3 | Chrome | Opera |
| --- | --- | --- | --- | --- | --- | --- | --- |
| · | · | ✓ | · | ✓ | ✓ | ✓ | · |
But support for the <video> element itself is really only a small part of the story. Before we can talk about HTML5 video, you first need to understand a little about video itself. (If you know about video already, you can skip ahead to What Works on the Web.)
❧
Video Containers
You may think of video files as “AVI files” or “MP4 files.” In reality, “AVI” and “MP4” are just container formats. Just like a ZIP file can contain any sort of file within it, video container formats only define how to store things within them, not what kinds of data are stored. (It’s a little more complicated than that, because not all video streams are compatible with all container formats, but never mind that for now.)
A video file usually contains multiple tracks — a video track (without audio), plus one or more audio tracks (without video). Tracks are usually interrelated. An audio track contains markers within it to help synchronize the audio with the video. Individual tracks can have metadata, such as the aspect ratio of a video track, or the language of an audio track. Containers can also have metadata, such as the title of the video itself, cover art for the video, episode numbers (for television shows), and so on.
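To make the container/track split concrete, here is a minimal Python sketch of the structure described above. The class and field names (Container, Track, codec, metadata) and the sample values are illustrative assumptions, not part of any real container specification.

```python
from dataclasses import dataclass, field


@dataclass
class Track:
    kind: str                                      # "video" or "audio"
    codec: str                                     # how the stream is encoded
    metadata: dict = field(default_factory=dict)   # per-track metadata


@dataclass
class Container:
    format: str                                    # how the tracks are stored
    tracks: list = field(default_factory=list)
    metadata: dict = field(default_factory=dict)   # container-level metadata

    def tracks_of(self, kind):
        """Return every track of the given kind ("video" or "audio")."""
        return [track for track in self.tracks if track.kind == kind]


# One video track plus two audio tracks in different languages (made-up values).
movie = Container(
    format="Ogg",
    metadata={"title": "Example video"},
    tracks=[
        Track("video", "Theora", {"aspect_ratio": "4:3"}),
        Track("audio", "Vorbis", {"language": "en"}),
        Track("audio", "Vorbis", {"language": "fr"}),
    ],
)
```

The point of the sketch is only that the container's format and each track's codec are separate pieces of information.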
There are lots of video container formats. Some of the most popular include:
- MPEG 4, usually with an .mp4 or .m4v extension. The MPEG 4 container is based on Apple’s older QuickTime container (.mov). Movie trailers on Apple’s website still use the older QuickTime container, but movies that you rent from iTunes are delivered in an MPEG 4 container.
- Flash Video, usually with an .flv extension. Flash Video is, unsurprisingly, used by Adobe Flash. Prior to Flash 9.0.60.184 (a.k.a. Flash Player 9 Update 3), this was the only container format that Flash supported. More recent versions of Flash also support the MPEG 4 container.
- Ogg, usually with an .ogv extension. Ogg is an open standard, open-source-friendly, and unencumbered by any known patents. Firefox 3.5, Chrome 4, and Opera 10 support — natively, without platform-specific plugins — the Ogg container format, Ogg video (called “Theora”), and Ogg audio (called “Vorbis”). On the desktop, Ogg is supported out-of-the-box by all major Linux distributions, and you can use it on Mac and Windows by installing the QuickTime components or DirectShow filters, respectively. It is also playable with the excellent VLC on all platforms.
- Audio Video Interleave, usually with an .avi extension. The AVI container format was invented by Microsoft in a simpler time, when the fact that computers could play video at all was considered pretty amazing. It does not officially support many of the features of more recent container formats. It does not officially support any sort of video metadata. It does not even officially support most of the modern video and audio codecs in use today. Over time, various companies have tried to extend it in generally incompatible ways to support this or that, and it is still the default container format for popular encoders such as MEncoder.
❧
Video Codecs
When you talk about “watching a video,” you’re probably talking about a combination of one video stream and one audio stream. But you don’t have two different files; you just have “the video.” Maybe it’s an AVI file, or an MP4 file. These are just container formats, like a ZIP file that contains multiple kinds of files within it. The container format defines how to store the video and audio streams in a single file.
When you “watch a video,” your video player is doing several things at once:
1. Interpreting the container format to find out which video and audio tracks are available, and how they are stored within the file so that it can find the data it needs to decode next
2. Decoding the video stream and displaying a series of images on the screen
3. Decoding the audio stream and sending the sound to your speakers
A video codec is an algorithm by which a video stream is encoded, i.e. it specifies how to do #2 above. (The word “codec” is a portmanteau, a combination of the words “coder” and “decoder.”) Your video player decodes the video stream according to the video codec, then displays a series of images, or “frames,” on the screen. Most modern video codecs use all sorts of tricks to minimize the amount of information required to display one frame after the next. For example, instead of storing each individual frame (like a screenshot), they will only store the differences between frames. Most videos don’t actually change all that much from one frame to the next, so this allows for high compression rates, which results in smaller file sizes.
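The “only store the differences between frames” trick can be sketched in a few lines of Python. This is a toy model under loud assumptions: each frame is a flat list of pixel values, all frames have the same length, and the scheme is lossless. Real codecs such as H.264 or Theora are lossy and far more elaborate; the function names here are made up for illustration.

```python
def encode_frames(frames):
    """Store the first frame whole, then only the pixels that changed."""
    encoded = [("key", list(frames[0]))]
    previous = frames[0]
    for frame in frames[1:]:
        # Record (position, new value) for every pixel that differs.
        changes = [
            (index, new)
            for index, (old, new) in enumerate(zip(previous, frame))
            if old != new
        ]
        encoded.append(("delta", changes))
        previous = frame
    return encoded


def decode_frames(encoded):
    """Rebuild every frame by applying each delta to the previous frame."""
    frames = []
    current = None
    for kind, payload in encoded:
        if kind == "key":
            current = list(payload)
        else:
            current = list(current)      # copy, so earlier frames stay intact
            for index, value in payload:
                current[index] = value
        frames.append(current)
    return frames


# Three tiny six-pixel "frames" in which only one pixel changes at a time.
frames = [
    [0, 0, 0, 0, 0, 0],
    [0, 0, 9, 0, 0, 0],
    [0, 0, 9, 9, 0, 0],
]
encoded = encode_frames(frames)
```

After the first frame, each later frame costs one (position, value) pair instead of six pixel values, which is the whole idea in miniature.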
There are lossy and lossless video codecs. Lossless video is much too big to be useful on the web, so I’ll concentrate on lossy codecs. A lossy video codec means that information is being irretrievably lost during encoding. Like copying an audio cassette tape, you’re losing information about the source video, and degrading the quality, every time you encode. Instead of the “hiss” of an audio cassette, a re-re-re-encoded video may look blocky, especially during scenes with a lot of motion. (Actually, this can happen even if you encode straight from the original source, if you choose a poor video codec or pass it the wrong set of parameters.) On the bright side, lossy video codecs can offer amazing compression rates, and many offer ways to “cheat” and smooth over that blockiness during playback, to make the loss less noticeable to the human eye.
There are tons of video codecs. The three most relevant codecs are MPEG-4 ASP, H.264, and Theora.
MPEG-4 ASP
MPEG-4 ASP is also known as “MPEG-4 Advanced Simple Profile.” MPEG-4 ASP was developed by the MPEG group and standardized in 2001. You may have heard of DivX, Xvid, or 3ivx; these are all competing implementations of the MPEG-4 ASP standard. Xvid is open source; DivX and 3ivx are closed source. The company behind DivX has had some mainstream success in branding “DivX” as synonymous with “MPEG-4 ASP.” For example, a “DivX-certified” DVD player can actually play most MPEG-4 ASP videos in an AVI container, even if they were created with a competing encoder. (To confuse things even further, the company behind DivX has now created their own container format.)
MPEG-4 ASP is patent-encumbered; licensing is brokered through the MPEG LA group. MPEG-4 ASP video can be embedded in most popular container formats, including AVI, MP4, and MKV.
H.264
H.264 is also known as “MPEG-4 part 10,” a.k.a. “MPEG-4 AVC,” a.k.a. “MPEG-4 Advanced Video Coding.” H.264 was also developed by the MPEG group and standardized in 2003. It aims to provide a single codec for low-bandwidth, low-CPU devices (cell phones); high-bandwidth, high-CPU devices (modern desktop computers); and everything in between. To accomplish this, the H.264 standard is split into “profiles,” which each define a set of optional features that trade complexity for file size. Higher profiles use more optional features, offer better visual quality at smaller file sizes, take longer to encode, and require more CPU power to decode in real-time.
To give you a rough idea of the range of profiles, Apple’s iPhone supports Baseline profile, the AppleTV set-top box supports Baseline and Main profiles, and Adobe Flash on a desktop PC supports Baseline, Main, and High profiles. YouTube (owned by Google, my employer) now uses H.264 to encode high-definition videos, playable through Adobe Flash; YouTube also provides H.264-encoded video to mobile devices, including Apple’s iPhone and phones running Google’s Android mobile operating system. Also, H.264 is one of the video codecs mandated by the Blu-Ray specification; Blu-Ray discs that use it generally use the High profile.
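If you need one encoded file to play on several of these devices, you are limited to the highest profile they all share. Here is a small Python sketch of that lookup, using only the three devices and profiles named above; the table and function names are my own, and real devices have further limits (resolution, bitrate) that this ignores.

```python
# Profiles ordered from least to most demanding.
PROFILE_ORDER = ["Baseline", "Main", "High"]

# Which profiles each device can decode, as listed in the text above.
DEVICE_PROFILES = {
    "iPhone": {"Baseline"},
    "AppleTV": {"Baseline", "Main"},
    "Adobe Flash (desktop)": {"Baseline", "Main", "High"},
}


def highest_common_profile(devices):
    """Return the most demanding profile that every listed device supports."""
    common = set(PROFILE_ORDER)
    for device in devices:
        common &= DEVICE_PROFILES[device]
    for profile in reversed(PROFILE_ORDER):
        if profile in common:
            return profile
    return None
```

For example, targeting the AppleTV and desktop Flash together allows Main profile, but adding the iPhone to the list drops you to Baseline.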
Most non-PC devices that play H.264 video (including iPhones and standalone Blu-Ray players) actually do the decoding on a dedicated chip, since their main CPUs are nowhere near powerful enough to decode the video in real-time. Many desktop graphics cards also support decoding H.264 in hardware. There are a number of competing H.264 encoders, including the open source x264 library. The H.264 standard is patent-encumbered; licensing is brokered through the MPEG LA group. H.264 video can be embedded in most popular container formats, including MP4 (used primarily by Apple’s iTunes Store) and MKV (used primarily by non-commercial video enthusiasts).
Theora
Theora evolved from the VP3 codec and has subsequently been developed by the Xiph.org Foundation. Theora is a royalty-free codec and is not encumbered by any known patents other than the original VP3 patents, which have been irrevocably licensed royalty-free. Although the standard has been “frozen” since 2004, the Theora project (which includes an open source reference encoder and decoder) only released version 1.0 in November 2008 and version 1.1 in September 2009.
Theora video can be embedded in any container format, although it is most often seen in an Ogg container. All major Linux distributions support Theora out-of-the-box, and Mozilla Firefox 3.5 includes native support for Theora video in an Ogg container. And by “native”, I mean “available on all platforms without platform-specific plugins.” You can also play Theora video on Windows or on Mac OS X after installing Xiph.org’s open source decoder software.
❧
Audio Codecs
Unless you’re going to stick to films made before 1927 or so, you’re going to want an audio track in your video. Like video codecs, audio codecs are algorithms by which an audio stream is encoded. Like video codecs, there are lossy and lossless audio codecs. And like lossless video, lossless audio is really too big to put on the web. So I’ll concentrate on lossy audio codecs.
Actually, it’s even narrower than that, because there are different categories of lossy audio codecs. Audio is used in many places where video is not (telephony, for example), and there is an entire category of audio codecs optimized for encoding speech. You wouldn’t rip a music CD with these codecs, because the result would sound like a 4-year-old singing into a speakerphone. But you would use them in an Asterisk PBX, because bandwidth is precious, and these codecs can compress human speech into a fraction of the size of general-purpose codecs. However, due to lack of support in both native browsers and third-party plugins, speech-optimized audio codecs never really took off on the web. So I’ll concentrate on general purpose lossy audio codecs.
As I mentioned earlier, when you “watch a video,” your computer is doing several things at once:
1. Interpreting the container format
2. Decoding the video stream
3. Decoding the audio stream and sending the sound to your speakers
The audio codec specifies how to do #3 — decoding the audio stream and turning it into digital waveforms that your speakers then turn into sound. As with video codecs, there are all sorts of tricks to minimize the amount of information stored in the audio stream. And since we’re talking about lossy audio codecs, information is being lost during the recording → encoding → decoding → listening lifecycle. Different audio codecs throw away different things, but they all have the same purpose: to trick your ears into not noticing the parts that are missing.
One concept that audio has that video does not is channels. We’re sending sound to your speakers, right? Well, how many speakers do you have? If you’re sitting at your computer, you may only have two: one on the left and one on the right. My desktop has three: left, right, and one more on the floor. So-called “surround sound” systems can have six or more speakers, strategically placed around the room. Each speaker is fed a particular channel of the original recording. The theory is that you can sit in the middle of the six speakers, literally surrounded by six separate channels of sound, and your brain synthesizes them and feels like you’re in the middle of the action. Does it work? A multi-billion-dollar industry seems to think so.
Most general-purpose audio codecs can handle two channels of sound. During recording, the sound is split into left and right channels; during encoding, both channels are stored in the same audio stream; during decoding, both channels are decoded and each is sent to the appropriate speaker. Some audio codecs can handle more than two channels, and they keep track of which channel is which so your player can send the right sound to the right speaker.
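One simple way to picture “both channels in the same stream” is interleaving: left sample, right sample, left sample, and so on. The Python sketch below assumes that layout purely for illustration. It is how uncompressed samples are commonly arranged, not a description of how MP3, AAC, or Vorbis store channels internally, and the function names are hypothetical.

```python
def interleave(left, right):
    """Merge two equal-length channels into one stream: L, R, L, R, ..."""
    if len(left) != len(right):
        raise ValueError("channels must have the same number of samples")
    stream = []
    for left_sample, right_sample in zip(left, right):
        stream.extend((left_sample, right_sample))
    return stream


def deinterleave(stream, channels=2):
    """Split a stream back into one list of samples per channel."""
    return [stream[channel::channels] for channel in range(channels)]
```

The same deinterleave call works for more than two channels, as long as the player knows how many channels the stream carries.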
There are lots of audio codecs. Did I say there were lots of video codecs? Forget that. There are a metric fuck-ton of audio codecs, but on the web, there are really only three you need to know about: MP3, AAC, and Vorbis.
MPEG-1 Audio Layer 3
MPEG-1 Audio Layer 3 is colloquially known as “MP3.” If you haven’t heard of MP3s, I don’t know what to do with you. Walmart sells portable music players and calls them “MP3 players.” Walmart. Anyway…
MP3s can contain up to 2 channels of sound. They can be encoded at different bitrates: 64 kbps, 128 kbps, 192 kbps, and a variety of others from 32 to 320. Higher bitrates mean larger file sizes and better quality audio, although the ratio of audio quality to bitrate is not linear. (128 kbps sounds more than twice as good as 64 kbps, but 256 kbps doesn’t sound twice as good as 128 kbps.) Furthermore, the MP3 format allows for variable bitrate encoding, which means that some parts of the encoded stream are compressed more than others. For example, silence between notes can be encoded at a very low bitrate, then the bitrate can spike up a moment later when multiple instruments start playing a complex chord. MP3s can also be encoded with a constant bitrate, which, unsurprisingly, is called constant bitrate encoding.
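The link between bitrate and file size is plain arithmetic. Here is a rough Python sketch for a constant-bitrate stream, assuming 1 kbps means 1,000 bits per second and ignoring any container or tag overhead; the function name is mine.

```python
def audio_size_bytes(bitrate_kbps, duration_seconds):
    """Approximate size of a constant-bitrate audio stream, in bytes."""
    total_bits = bitrate_kbps * 1000 * duration_seconds
    return total_bits // 8   # 8 bits per byte


# A four-minute (240-second) song at three common bitrates.
sizes = {kbps: audio_size_bytes(kbps, 240) for kbps in (64, 128, 320)}
```

Doubling the bitrate doubles the file size, even though (as noted above) it does not double the perceived quality.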
The MP3 standard doesn’t define exactly how to encode MP3s (although it does define exactly how to decode them); different encoders use different psychoacoustic models that produce wildly different results, but are all decodable by the same players. The open source LAME project is the best free encoder, and arguably the best encoder period for all but the lowest bitrates.
The MP3 format was standardized in 1991 and is patent-encumbered, which explains why Linux can’t play MP3 files out of the box. Pretty much every portable music player supports standalone MP3 files, and MP3 audio streams can be embedded in any video container. Adobe Flash can play both standalone MP3 files and MP3 audio streams within an MP4 video container.
Advanced Audio Coding
Advanced Audio Coding is affectionately known as “AAC.” Standardized in 1997, it lurched into prominence when Apple chose it as their default format for the iTunes Store. Originally, all AAC files “bought” from the iTunes Store were encrypted with Apple’s proprietary DRM scheme, called FairPlay. Many songs in the iTunes Store are now available as unprotected AAC files, which Apple calls “iTunes Plus” because it sounds so much better than calling everything else “iTunes Minus.” The AAC format is patent-encumbered; licensing rates are available online.
AAC was designed to provide better sound quality than MP3 at the same bitrate, and it can encode audio at any bitrate. (MP3 is limited to a fixed number of bitrates, with an upper bound of 320 kbps.) AAC can encode up to 48 channels of sound, although in practice no one does that. The AAC format also differs from MP3 in defining multiple profiles, in much the same way as H.264, and for the same reasons. The “low-complexity” profile is designed to be playable in real-time on devices with limited CPU power, while higher profiles offer better sound quality at the same bitrate at the expense of slower encoding and decoding.
All current Apple products, including iPods, AppleTV, and QuickTime, support certain profiles of AAC in standalone audio files and in audio streams in an MP4 video container. Adobe Flash supports all profiles of AAC in MP4, as do the open source mplayer and VLC video players. For encoding, the FAAC library is the open source option; support for it is a compile-time option in mencoder and ffmpeg.
Vorbis
Vorbis is often called “Ogg Vorbis,” although this is technically incorrect. (“Ogg” is just a container format, and Vorbis audio streams can be embedded in other containers.) Vorbis is not encumbered by any known patents and is therefore supported out-of-the-box by all major Linux distributions and by portable devices running the open source Rockbox firmware. Mozilla Firefox 3.5 supports Vorbis audio files in an Ogg container, or Ogg videos with a Vorbis audio track. Android mobile phones can also play standalone Vorbis audio files. Vorbis audio streams are usually embedded in an Ogg container, but they can also be embedded in an MP4 or MKV container (or, with some hacking, in AVI). Vorbis supports an arbitrary number of sound channels.
There are open source Vorbis encoders and decoders, including OggConvert (encoder), ffmpeg (decoder), aoTuV (encoder), and libvorbis (decoder). There are also QuickTime components for Mac OS X and DirectShow filters for Windows.
❧
What Works on the Web
If your eyes haven’t glazed over yet, you’re doing better than most. As you can tell, video (and audio) is a complicated subject — and this was the abridged version! I’m sure you’re wondering how all of this relates to HTML5. Well, HTML5 includes a <video> element for embedding video into a web page. There are no restrictions on the video codec, audio codec, or container format you can use for your video. One <video> element can link to multiple video files, and the browser will choose the first video file it can actually play. It is up to you to know which browsers support which containers and codecs.
As of this writing, this is the landscape of HTML5 video:
- Mozilla Firefox (3.5 and later) supports Theora video and Vorbis audio in an Ogg container.
- Google Chrome (3.0 and later) supports Theora video and Vorbis audio in an Ogg container. It also supports H.264 video (all profiles) and AAC audio (all profiles) in an MP4 container.
- Safari on Macs and Windows PCs (3.0 and later) will support anything that QuickTime supports. In theory, you could require your users to install third-party QuickTime plugins. In practice, very few users are going to do that. So you’re left with the formats that QuickTime supports “out of the box.” This is a long list, but it does not include Theora video, Vorbis audio, or the Ogg container. However, QuickTime does support H.264 video (main profile) and AAC audio in an MP4 container.
- Mobile phones like Apple’s iPhone and Google Android phones support H.264 video (baseline profile) and AAC audio (“low complexity” profile) in an MP4 container.
- Adobe Flash (9.0.60.184 and later) supports H.264 video (all profiles) and AAC audio (all profiles) in an MP4 container.
- Internet Explorer has no HTML5 video support at all, but virtually all Internet Explorer users will have the Adobe Flash plugin. Later in this chapter, I’ll show you how you can use HTML5 video but gracefully fall back to Flash.
That might be easier to digest in table form.
Video codec support
| Codecs/container | Firefox 3.5 | Safari | iPhone | Android | Chrome 3 |
| --- | --- | --- | --- | --- | --- |
| Theora+Vorbis+Ogg | ✓ | · | · | · | ✓ |
| H.264+AAC+MP4 | · | ✓ | ✓ | ✓ | ✓ |
And now for the knockout punch:
Professor Markup Says
There is no single combination of containers and codecs that works in all HTML5 browsers.
To make your video watchable across all of these devices and platforms, you’re going to have to encode your video more than once.
Here’s what your video workflow looks like:
- Make one version that uses Theora video and Vorbis audio in an Ogg container.
- Make another version that uses H.264 baseline video and AAC “low complexity” audio in an MP4 container.
- Link to both video files from a single <video> element.
- If you detect a lack of HTML5 video support, replace the <video> element with a Flash-based video player.
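The “first file it can actually play” rule from this section can be simulated in Python. This is only a model of the selection logic, built from the support table above. The file names, the "flash-fallback" marker, and the function name are hypothetical, and real browsers make this decision themselves.

```python
OGG = "Theora+Vorbis+Ogg"
MP4 = "H.264+AAC+MP4"

# Which combinations each browser can play, per the table in this section.
SUPPORT = {
    "Firefox 3.5": {OGG},
    "Chrome 3": {OGG, MP4},
    "Safari": {MP4},
    "iPhone": {MP4},
    "Android": {MP4},
    "Internet Explorer": set(),   # no HTML5 video support at all
}


def pick_source(browser, sources):
    """Return the first listed file the browser can play, else the fallback."""
    playable = SUPPORT.get(browser, set())
    for filename, combination in sources:
        if combination in playable:
            return filename
    return "flash-fallback"


# Both encodings of the same (hypothetical) video, MP4 listed first.
sources = [("movie.mp4", MP4), ("movie.ogv", OGG)]
```

Note that order matters: Chrome 3 can play both files, so it gets whichever one you list first.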
❧
Licensing Issues with H.264 Video
Before we continue, I need to point out that there is a cost to encoding your videos twice. Well, there’s the obvious cost, that you have to encode your videos twice, and that takes more computers and more time than just doing it once. But there’s another very real cost associated with H.264 video: licensing costs.
Remember when I first explained H.264 video, and I mentioned offhand that the video codec was patent-encumbered and licensing was brokered by the MPEG LA consortium? That turns out to be kind of important. To understand why it’s important, I direct you to The H.264 Licensing Labyrinth:
MPEG LA splits the H.264 license portfolio into two sublicenses: one for manufacturers of encoders or decoders and the other for distributors of content. …
The sublicense on the distribution side gets further split out to four key subcategories, two of which (subscription and title-by-title purchase or paid use) are tied to whether the end user pays directly for video services, and two of which (“free” television and internet broadcast) are tied to remuneration from sources other than the end viewer. …
The licensing fee for “free” television is based on one of two royalty options. The first is a one-time payment of $2,500 per AVC transmission encoder, which covers one AVC encoder “used by or on behalf of a Licensee in transmitting AVC video to the End User,” who will decode and view it. If you’re wondering whether this is a double charge, the answer is yes: A license fee has already been charged to the encoder manufacturer, and the broadcaster will in turn pay one of the two royalty options.
The second licensing fee is an annual broadcast fee. … [T]he annual broadcast fee is broken down by viewership sizes:
- $2,500 per calendar year per broadcast markets of 100,000–499,999 television households
- $5,000 per calendar year per broadcast market of 500,000–999,999 television households
- $10,000 per calendar year per broadcast market of 1,000,000 or more television households
… With all the issues around “free” television, why should someone involved in nonbroadcast delivery care? As I mentioned before, the participation fees apply to any delivery of content. After defining that “free” television meant more than just [over-the-air], MPEG LA went on to define participation fees for internet broadcasting as “AVC video that is delivered via the Worldwide Internet to an end user for which the end user does not pay remuneration for the right to receive or view.” In other words, any public broadcast, whether it is [over-the-air], cable, satellite, or the internet, is subject to participation fees. …
The fees are potentially somewhat steeper for internet broadcasts, perhaps assuming that internet delivery will grow much faster than OTA or “free” television via cable or satellite. Adding the “free television” broadcast-market fee together with an additional fee, MPEG LA grants a reprieve of sorts during the first license term, which ends on Dec. 31, 2010, and notes that “after the first term the royalty shall be no more than the economic equivalent of royalties payable during the same time for free television.”
Did you catch that last part? Legally encoding and distributing H.264 video already costs money. But starting in 2011, it’s going to cost a whole lot more.
❧
Encoding Ogg Video with Firefogg
(In this section, I’m going to use “Ogg video” as a shorthand for “Theora video and Vorbis audio in an Ogg container.” This is the combination of codecs+container that works natively in Mozilla Firefox and Google Chrome.)
Firefogg is an open source, GPL-licensed Firefox extension for encoding Ogg video. To use it, you’ll need to install Mozilla Firefox 3.5 or later, then visit firefogg.org.
FireFogg home page ↷
Click “Install Firefogg.” Firefox will prompt whether you really want to allow the site to install an extension. Click “Allow” to continue.
↶ Allow FireFogg to install
Firefox will present the standard software installation window. Click “Install” to continue.
Install FireFogg ↷
Click “Restart Firefox” to complete the installation.
↶ Restart Firefox
After restarting Firefox, firefogg.org will confirm that Firefogg was successfully installed.
Installation successful ↷
Click “Make Ogg Video” to start the encoding process.
↶ Let’s make some video!
Click “Select file” to select your source video.
Select your video file ↷
Firefogg has six “tabs”:
- Presets. The default preset is “web video,” which is fine for our purposes.
- Encoding range. Encoding video can take a long time. When you’re first getting started, you may want to encode just part of your video (say, the first 30 seconds) until you find a combination of settings you like.
- Basic quality and resolution control. This is where most of the important options are.
- Metadata. I won’t cover it here, but you can add metadata to your encoded video like title and author. You’ve probably added metadata to your music collection with iTunes or some other music manager. This is the same idea.
- Advanced video encoding controls. Don’t mess with these unless you know what you’re doing. (Firefogg offers interactive help on most of these options. Click the “i” symbol next to each option to learn more about it.)
- Advanced audio encoding controls. Again, don’t mess with these unless you know what you’re doing.
The only options I’m going to cover are in the “Basic quality and resolution control” tab. It contains all the important options:
- Video Quality. This is measured on a scale of 0 (lowest quality) to 10 (highest quality). Higher numbers mean bigger file sizes, so you’ll need to experiment to determine the best size/quality ratio for your needs.
- Audio Quality. This is measured on a scale of -1 (lowest quality) to 10 (highest quality). Higher numbers mean bigger file sizes, just like the video quality setting.
- Video Codec. This should always be “theora.”
- Audio Codec. This should always be “vorbis.”
- Video Width and Video Height. These default to the actual width and height of your source video. If you want to resize the video during encoding, you can change the width (or height) here. Firefogg will automatically adjust the other dimension to maintain the original proportions (so your video won’t end up smooshed or stretched).
In this example, I’m going to resize the video to half its original width. Notice how Firefogg automatically adjusts the height to match.
Adjust video width and height ↷
Once you’ve fiddled with all the knobs, click “Save Ogg” to start the actual encoding process. Firefogg will prompt you for a filename for the encoded video.
“Save Ogg” ↷
Firefogg will show a nice progress bar as it encodes your video. All you need to do is wait (and wait, and wait)!
↶ Encoding in progress
❧
Batch Encoding Ogg Video with ffmpeg2theora
(Just as in the previous section, in this section I’m going to use “Ogg video” as a shorthand for “Theora video and Vorbis audio in an Ogg container.” This is the combination of codecs+container that works natively in Mozilla Firefox and Google Chrome.)
There are a number of offline encoders for Ogg video. If you’re looking at batch encoding a lot of video files and you want to automate the process, you should definitely check out ffmpeg2theora.
ffmpeg2theora is an open-source, GPL-licensed application for encoding Ogg video. Pre-built binaries are available for Mac OS X, Windows, and modern Linux distributions. It can take virtually any video file as input, including DV video produced by many consumer-level camcorders.
To use ffmpeg2theora, you need to call it from the command line. (On Mac OS X, open Applications → Utilities → Terminal. On Windows, open your Start Menu → Programs → Accessories → Command Prompt.)
ffmpeg2theora can take a large number of command line flags. (Type ffmpeg2theora --help to read about them all.) I’ll focus on just three of them.
- --videoquality Q, where “Q” is a number from 0 to 10.
- --audioquality Q, where “Q” is a number from -2 to 10.
- --max_size=WxH, where “W” and “H” are the maximum width and height you want for the video. (The “x” in between is really just the letter “x”.) ffmpeg2theora will resize the video proportionally to fit within these dimensions, so the encoded video might be smaller than W×H. For example, encoding a 720×480 video with --max_size 320x240 will produce a video that is 320×213.
Thus, here is how you could encode a video with the same settings as we used in the previous section (encoding with Firefogg).
you@localhost$ ffmpeg2theora --videoquality 5 --audioquality 1 --max_size 320x240 NewOrleans2006.dv
The encoded video will be saved in the same directory as the original video, with a .ogv extension added. You can specify a different location and/or filename by passing an --output=/path/to/encoded/video command line flag to ffmpeg2theora.
❧
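Two things from the ffmpeg2theora section lend themselves to a short Python sketch: the proportional resize arithmetic behind --max_size, and building the same command line for a batch of files. The rounding rule (round down) and the choice never to enlarge a small video are my assumptions, picked because they reproduce the 720×480 → 320×213 example; ffmpeg2theora’s own rules may differ. The file names in the usage note are hypothetical, and the sketch only builds argument lists without running anything.

```python
def fit_within(width, height, max_width, max_height):
    """Shrink (width, height) proportionally to fit inside the maximum box."""
    # Compare aspect ratios by cross-multiplying, to stay in integers.
    if width * max_height >= height * max_width:
        # The width is the limiting dimension.
        new_width = min(width, max_width)
        new_height = height * new_width // width
    else:
        # The height is the limiting dimension.
        new_height = min(height, max_height)
        new_width = width * new_height // height
    return new_width, new_height


def build_command(source, video_quality=5, audio_quality=1, max_size="320x240"):
    """Build the ffmpeg2theora argument list used in this section."""
    return [
        "ffmpeg2theora",
        "--videoquality", str(video_quality),
        "--audioquality", str(audio_quality),
        "--max_size", max_size,
        source,
    ]


def build_batch(sources):
    """One command per source file, all with the same settings."""
    return [build_command(source) for source in sources]
```

To encode for real, you could pass each list to subprocess.run(command, check=True), for example for every command in build_batch(["first.dv", "second.dv"]).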
Encoding H.264 Video with HandBrake
(In this section, I’m going to use “H.264 video” as a shorthand for “H.264 baseline profile video and AAC low-complexity profile audio in an MPEG-4 container.” This is the combination of codecs+container that works natively in Safari, in Adobe Flash, on the iPhone, and on Google Android devices.)
Licensing issues aside, the easiest way to encode H.264 video is HandBrake. HandBrake is an open source, GPL-licensed application for encoding H.264 video. (It used to do other video formats too, but in the latest version the developers have dropped support for most other formats and are focusing all their efforts on H.264 video.) Pre-built binaries are available for Windows, Mac OS X, and modern Linux distributions.
HandBrake comes in two flavors: graphical and command-line. I’ll walk you through the graphical interface first, then we’ll see how my recommended settings translate into the command-line version.
After you open the HandBrake application, the first thing to do is select your source video. Click the “Source” dropdown button and choose “Video File” to select a file. HandBrake can take virtually any video file as input, including DV video produced by many consumer-level camcorders.
Select your source video ↷
HandBrake will complain that you haven’t set a default directory to save your encoded videos. You can safely ignore this warning, or you can open the options window (under the “Tools” menu) and set a default output directory.
↶ Ignore this
On the right-hand side is a list of presets. Selecting the “iPhone & iPod Touch” preset will set most of the options you need.
Select iPhone preset ↷
One important option that is off by default is the “Web optimized” option. Selecting this option reorders some of the metadata within the encoded video so you can watch the start of the video while the rest is downloading in the background. I highly recommend always checking this option. It does not affect the quality or file size of the encoded video, so there’s really no reason not to.
↶ Always optimize for web
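If you are curious whether a given MP4 file has been “web optimized,” you can peek at the order of its top-level boxes. The Python sketch below assumes that the reordered metadata is the MP4 “moov” box (the index) and that the media data lives in the “mdat” box, and that each box starts with a 4-byte big-endian size followed by a 4-character type. The function names are made up, and the sketch reads an in-memory byte string rather than a real file.

```python
import struct


def make_box(box_type, payload=b""):
    """Build a minimal box: 4-byte size, 4-character type, then the payload."""
    return struct.pack(">I4s", 8 + len(payload), box_type.encode("ascii")) + payload


def top_level_box_types(data):
    """Return the types of the top-level boxes, in file order."""
    types = []
    offset = 0
    while offset + 8 <= len(data):
        size, box_type = struct.unpack(">I4s", data[offset:offset + 8])
        if size == 1 and offset + 16 <= len(data):
            # A size of 1 means a 64-bit size follows the type field.
            size = struct.unpack(">Q", data[offset + 8:offset + 16])[0]
        elif size == 0:
            # A size of 0 means the box runs to the end of the data.
            size = len(data) - offset
        if size < 8:
            break   # malformed header; stop rather than loop forever
        types.append(box_type.decode("latin-1"))
        offset += size
    return types


def index_before_media(data):
    """True if the "moov" box appears before the "mdat" box."""
    types = top_level_box_types(data)
    return (
        "moov" in types
        and "mdat" in types
        and types.index("moov") < types.index("mdat")
    )
```

A player that receives the index first can start playback while the media data is still downloading, which is the behavior described above.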
In the “Picture” tab, you can set the maximum width and height of the encoded video. You should also select the “Keep Aspect Ratio” option to ensure that HandBrake doesn’t smoosh or stretch your video while resizing it.
Set width and height ↷
In the “Video” tab, you can set several important options.
- Video Codec. Make sure this is “H.264 (x264).”
- 2-Pass Encoding. If this is checked, HandBrake will run the video encoder twice. The first time, it just analyzes the video, looking for things like color composition, motion, and scene breaks. The second time, it actually encodes the video using the information it learned during the first pass. As you might expect, this takes about twice as long as single-pass encoding, but it results in better video without increasing file size. I always enable two-pass encoding for H.264 video. Unless you’re building the next YouTube and encoding videos 24 hours a day, you should probably use two-pass encoding too.
- Turbo First Pass. Once you enable 2-pass encoding, you can get a little bit of time back by enabling “turbo first pass.” This reduces the amount of work done in the first pass (analyzing the video), while only slightly degrading quality. I usually enable this option, but if quality is of the utmost importance to you, you should leave it disabled.
- Quality. There are several different ways to specify the “quality” of your encoded video. You can set a target file size, and HandBrake will do its best to ensure that your encoded video is not larger than that. You can set an average “bitrate,” which is quite literally the number of bits required to store one second’s worth of encoded video. (It’s called an “average” bitrate because some seconds will require more bits than others.) Or you can specify a constant quality, on a scale of 0 to 100%. Higher numbers will result in better quality but larger files. There is no single right answer for what quality setting you should use.
Ask Professor Markup
☞Q: Can I use two-pass encoding on Ogg video too?
A: Yes, but due to fundamental differences in how the encoder works, you probably don’t need to. Two-pass H.264 encoding almost always results in higher quality video. Two-pass encoding of Ogg video is only useful if you’re trying to get your encoded video to be a specific file size. (Maybe that is something you’re interested in, but it’s not what these examples show, and it’s probably not worth the extra time for encoding web video.) For best Ogg video quality, use the video quality settings, and don’t worry about two-pass encoding.
In this example, I’ve chosen an average bitrate of 600 kbps, which is quite high for a 320×240 encoded video. (Later in this chapter, I’ll show you a sample video encoded at 200 kbps.) I’ve also chosen 2-pass encoding with a “turbo” first pass.
↶ Video quality options
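Since an average bitrate is just bits per second, you can estimate the size of the video track with a little arithmetic. The sketch below assumes 1 kilobit is 1000 bits and ignores the audio track and container overhead, so treat the results as rough lower bounds. Both function names are made up for this example.

```javascript
// Estimate the size in bytes of a video track from its average bitrate.
// bitrateKbps is kilobits per second (1 kilobit = 1000 bits here).
function estimateVideoBytes(bitrateKbps, durationSeconds) {
  const totalBits = bitrateKbps * 1000 * durationSeconds;
  return totalBits / 8; // 8 bits per byte
}

// The inverse: the average bitrate needed to hit a target size in bytes.
function bitrateForTargetSize(targetBytes, durationSeconds) {
  return (targetBytes * 8) / durationSeconds / 1000;
}

console.log(estimateVideoBytes(600, 60)); // one minute at 600 kbps: 4500000 bytes
console.log(estimateVideoBytes(200, 60)); // one minute at 200 kbps: 1500000 bytes
```

The second function is the relationship behind a “target file size” setting: fix the size and the duration, and the average bitrate follows.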
In the “Audio” tab, you probably don’t need to change anything. If your source video has multiple audio tracks, you might need to select which one you want in the encoded video. If your video is mostly a person talking (as opposed to music or general ambient sounds), you can probably reduce the audio bitrate to 96 kbps or so. Other than that, the defaults you inherited from the “iPhone” preset should be fine.
Audio quality options ↷
Next, click the “Browse” button and choose a directory and filename to save your encoded video.
↶ Set destination filename
Finally, click “Start” to start encoding.
Let’s make some video! ↷
HandBrake will display some progress statistics while it encodes your video.
↶ Patience, Grasshopper
❧
Batch Encoding H.264 Video with HandBrake
(Just as in the previous section, in this section I’m going to use “H.264 video” as a shorthand for “H.264 baseline profile video and AAC low-complexity profile audio in an MPEG-4 container.” This is the combination of codecs+container that works natively in Safari, in Adobe Flash, on the iPhone, and on Google Android devices.)
HandBrake also comes in a command-line edition. As with the graphical edition of HandBrake, you should download a recent snapshot.
As with ffmpeg2theora, the command-line edition of HandBrake offers a dizzying array of options. (Type HandBrakeCLI --help to read about them.) I’ll focus on just seven:
- --preset "X", where “X” is the name of a HandBrake preset. The preset you want for H.264 web video is called “iPhone & iPod Touch”, and it’s important to put the entire name in quotes.
- --width W, where “W” is the width of your encoded video. HandBrake will automatically adjust the height to maintain the original video’s proportions.
- --vb Q, where “Q” is the average bitrate (measured in kilobits per second).
- --two-pass, which enables 2-pass encoding.
- --turbo, which enables turbo first pass during 2-pass encoding.
- --input F, where “F” is the filename of your source video.
- --output E, where “E” is the destination filename for your encoded video.
Here is an example of calling HandBrake on the command line, with command line flags that match the settings we chose with the graphical version of HandBrake.
you@localhost$ HandBrakeCLI --preset "iPhone & iPod Touch" \
                            --width 320 \
                            --vb 600 \
                            --two-pass \
                            --turbo \
                            --input NewOrleans2006.dv \
                            --output NewOrleans2006.mp4
From top to bottom, this command runs HandBrake with the “iPhone & iPod Touch” preset, resizes the video to 320×240, sets the average bitrate to 600 kbps, enables two-pass encoding with a turbo first pass, reads the file NewOrleans2006.dv, and encodes it as NewOrleans2006.mp4. Whew!
❧
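If you have a whole directory of videos to encode, you will probably end up scripting the HandBrake command. Here is a minimal sketch that builds the argument list for each file, using only the seven flags covered in the batch encoding section. The helper names (buildHandBrakeArgs, mp4NameFor) and the option names are my own invention, not part of HandBrake.

```javascript
// Hypothetical helper: build an argument list for HandBrakeCLI from a
// plain settings object.
function buildHandBrakeArgs(options) {
  const args = [
    '--preset', options.preset,
    '--width', String(options.width),
    '--vb', String(options.bitrateKbps),
  ];
  if (options.twoPass) {
    args.push('--two-pass');
    // Turbo only makes sense for the first pass of a two-pass encode.
    if (options.turbo) args.push('--turbo');
  }
  args.push('--input', options.input, '--output', options.output);
  return args;
}

// Derive "clip.mp4" from "clip.dv" (or append ".mp4" if there is no extension).
function mp4NameFor(inputName) {
  return inputName.replace(/\.[^./]+$/, '') + '.mp4';
}

const sourceFiles = ['NewOrleans2006.dv', 'pr6.dv'];
const encodeJobs = sourceFiles.map(function (inputName) {
  return buildHandBrakeArgs({
    preset: 'iPhone & iPod Touch',
    width: 320,
    bitrateKbps: 600,
    twoPass: true,
    turbo: true,
    input: inputName,
    output: mp4NameFor(inputName),
  });
});

encodeJobs.forEach(function (args) {
  console.log('HandBrakeCLI', args);
});
```

Keeping the arguments in an array, and handing that array to something like Node’s child_process.execFile instead of pasting it into a shell string, sidesteps the quoting problem with the spaces and ampersand in the preset name.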
At Last, The Markup
I’m pretty sure this was supposed to be an HTML book. So where’s the markup?
HTML5 gives you two ways to include video on your web page. Both of them involve the <video> element. If you only have one video file, you can simply link to it in a src attribute. This is remarkably similar to including an image with an <img> tag.
One video file ↷
<video src="NewOrleans2006.ogv"></video>
Technically, that’s all you need. But just like an <img> tag, you should always include width and height attributes in your <video> tags. The width and height attributes can be the same as the maximum width and height you specified during the encoding process. Don’t worry if one dimension of the video is a little smaller than that. Your browser will center the video inside the box defined by the <video> tag. It won’t ever be smooshed or stretched out of proportion.
<video src="NewOrleans2006.ogv" width="320" height="240"></video>
By default, the <video> element will not expose any sort of player controls. You can create your own controls with plain old HTML, CSS, and JavaScript. The <video> element has methods like play() and pause() and a read/write property called currentTime. There are also read/write volume and muted properties. So you really have everything you need to build your own interface.
If you don’t want to build your own interface, you can tell the browser to display a built-in set of controls. To do this, just include the controls attribute in your <video> tag.
<video src="NewOrleans2006.ogv" width="320" height="240" controls></video>
There are two other optional attributes I want to mention before we go any further: autobuffer and autoplay. Don’t shoot the messenger; let me explain why these are useful. The autobuffer attribute tells the browser that you would like it to start downloading the video file as soon as the page loads. This makes sense if the entire point of the page is to view the video. On the other hand, if it’s just supplementary material that only a few visitors will watch, then maybe you can leave autobuffer off. It’s off by default.
Here’s an example of a video that will start downloading (but not playing) as soon as the page loads:
<video src="NewOrleans2006.ogv" width="320" height="240" autobuffer></video>
The autoplay attribute does exactly what it sounds like: it tells the browser that you would like it to start downloading the video file as soon as the page loads, and you would like it to start playing the video automatically as soon as possible. Some people love this; some people hate it. But let me explain why it’s important to have an attribute like this in HTML5. Some people are going to want their videos to play automatically, even if it annoys their visitors. If HTML5 didn’t define a standard way to auto-play videos, people would resort to JavaScript hacks to do it anyway. (For example, by calling the video’s play() method during the window’s load event.) This would be much harder for visitors to counteract. On the other hand, it’s a simple matter to add a checkbox to your browser (or write an extension that adds such an option) that basically says “ignore the autoplay attribute, I don’t ever want videos to play automatically.”
Here’s an example of a video that will start downloading and playing as soon as possible after the page loads:
<video src="NewOrleans2006.ogv" width="320" height="240" autoplay></video>
And here is a Greasemonkey script that you can install in your local copy of Firefox that prevents HTML5 video from playing automatically. It uses the autoplay DOM attribute defined by HTML5, which is the JavaScript equivalent of the autoplay attribute in your HTML markup. [disable_video_autoplay.user.js]
// ==UserScript==
// @name           Disable video autoplay
// @namespace      http://diveintomark.org/projects/greasemonkey/
// @description    Ensures that HTML5 video elements do not autoplay
// @include        *
// ==/UserScript==

// Turn off autoplay on every <video> element in the page.
var arVideos = document.getElementsByTagName('video');
for (var i = arVideos.length - 1; i >= 0; i--) {
    var elmVideo = arVideos[i];
    elmVideo.autoplay = false;
}
But wait a second… If you’ve been following along this whole chapter, you don’t have just one video file; you have two. One is an .ogv file that you created with Firefogg or ffmpeg2theora, and the second is an .mp4 file that you created with HandBrake. HTML5 provides a way to link to both of them: the <source> element. Each <video> element can contain as many <source> elements as you need. Your browser will go down the list of video sources, in order, and play the first one it’s able to play.
That raises another question: how does the browser know which video it can play? Well, in the worst case scenario, it loads each of the videos and tries to play them. That’s a big waste of bandwidth, though. You’ll save a lot of network traffic if you tell the browser up-front about each video. You do this with the type attribute on the <source> element.
Here’s the whole thing:
Two video files ↷
<video width="320" height="240">
  <source src="NewOrleans2006.ogv" type='video/ogg; codecs="theora, vorbis"'>
  <source src="NewOrleans2006.mp4" type='video/mp4; codecs="avc1.42E01E, mp4a.40.2"'>
</video>
Let’s break that down. The <video> element specifies the width and height for the video, but it doesn’t actually link to a video file. Inside the <video> element are two <source> elements. Each <source> element links to a single video file (with the src attribute), and it also gives information about the video format (in the type attribute).
The type attribute looks complicated. Hell, it is complicated. It’s a combination of three pieces of information: the container format, the video codec, and the audio codec. For the .ogv video file, the container format is Ogg, represented here as video/ogg. (Technically speaking, that’s the MIME type for Ogg video files.) The video codec is Theora, and the audio codec is Vorbis. That’s simple enough, except the format of the attribute value is a little screwy. The value itself has to include quotation marks, which means you’ll need to use a different kind of quotation mark to surround the entire value.
<source src="NewOrleans2006.ogv" type='video/ogg; codecs="theora, vorbis"'>
The H.264 video is even more complicated. Remember when I said that both H.264 video and AAC audio can come in different “profiles”? We encoded with the H.264 “baseline” profile and the AAC “low-complexity” profile, then wrapped it all in an MPEG-4 container. All of that information is included in the type attribute.
<source src="NewOrleans2006.mp4" type='video/mp4; codecs="avc1.42E01E, mp4a.40.2"'>
The benefit of going to all this trouble is that the browser will check the type attribute first to see if it can play a particular video file. If it decides it can’t play a particular video, it won’t download the file. Not even part of the file. You’ll save on bandwidth, and your visitors will see the video they came for, faster.
If you follow the instructions in this chapter for encoding your videos, you can just copy and paste the type attribute values from this example. Otherwise, you’ll need to work out the type parameters for yourself.
❧
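The “go down the list, in order, and play the first one it’s able to play” rule is simple enough to sketch in a few lines of JavaScript. This is only an illustration of the idea, not how any browser actually implements it. The pickFirstPlayable function and the stand-in browser are hypothetical; in a real page, the check could be wrapped around the video element’s canPlayType() method, which returns an empty string, "maybe", or "probably".

```javascript
// The same two sources as in the markup, as plain data.
const videoSources = [
  { src: 'NewOrleans2006.ogv', type: 'video/ogg; codecs="theora, vorbis"' },
  { src: 'NewOrleans2006.mp4', type: 'video/mp4; codecs="avc1.42E01E, mp4a.40.2"' },
];

// Walk the list in order and return the first source the player claims
// it can handle, or null if none qualifies. The canPlayType function is
// passed in so this sketch also runs outside a browser.
function pickFirstPlayable(sourceList, canPlayType) {
  for (const source of sourceList) {
    // An empty string means "no"; "maybe" and "probably" both count as yes.
    if (canPlayType(source.type) !== '') return source;
  }
  return null;
}

// Stand-in for a browser that only plays MPEG-4 video (hypothetical).
function mp4OnlyBrowser(type) {
  return type.indexOf('video/mp4') === 0 ? 'probably' : '';
}

console.log(pickFirstPlayable(videoSources, mp4OnlyBrowser).src); // NewOrleans2006.mp4
```

Notice that the decision is made from the type string alone. That is the whole point of the type attribute: the Ogg file is rejected without a single byte of it being downloaded.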
What About IE?
As I write this, the current version of Internet Explorer is 8.0. No version of Internet Explorer supports HTML5 video. But all is not lost! Most people who use Internet Explorer also have the Adobe Flash plugin installed. Modern versions of Adobe Flash (starting with 9.0.60.184) support H.264 video and AAC audio in an MPEG-4 container, just like Safari and the iPhone. Once you’ve encoded your H.264 video for Safari, you can play it in a Flash-based video player if you detect that one of your visitors doesn’t have an HTML5-capable browser.
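How do you detect that a visitor doesn’t have an HTML5-capable browser? One common approach, sketched here under the assumption that a browser without HTML5 video support creates a generic element with no media methods, is to create a video element and see whether it has a canPlayType() method. The function name and the two stand-in document objects are made up so the example can run outside a browser.

```javascript
// Returns true if the given document can create a <video> element that
// exposes the HTML5 media API. The document is passed in as a parameter
// so the function can be exercised with stand-in objects.
function supportsHtml5Video(doc) {
  const element = doc.createElement('video');
  return !!(element && element.canPlayType);
}

// Stand-in for a browser with HTML5 video support (hypothetical).
const html5Document = {
  createElement: function () {
    return { canPlayType: function () { return ''; } };
  },
};

// Stand-in for a browser that treats <video> as an unknown element.
const legacyDocument = {
  createElement: function () {
    return {};
  },
};

console.log(supportsHtml5Video(html5Document));  // true
console.log(supportsHtml5Video(legacyDocument)); // false
```

In a real page you would call supportsHtml5Video(document) and fall back to the Flash player only when it returns false.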
FlowPlayer is an open source, GPL-licensed, Flash-based video player. (Commercial licenses are also available.) To use FlowPlayer, you need three files: two .swf files and one JavaScript file. To start with, include the following <script> element at the bottom of your page.
...
You will also need the two .swf files, flowplayer-X.swf and flowplayer.controls-X.swf (where “X” is the version number of FlowPlayer). You don’t need to link to these files from your HTML markup. When you call FlowPlayer from script, one of the parameters you pass to the flowplayer() function is the location of these .swf files.
There are two more pieces to the puzzle. First, FlowPlayer doesn’t know anything about the <video> element. It won’t magically transform a <video> tag into a Flash object. But I’ve written a piece of JavaScript that does exactly that. I call it html5-video.js, because I suck at naming things. Nevertheless…
...
Second, you need to get Internet Explorer to recognize the <video> and <source> elements by including the HTML5 enabling script at the top of your page. This is important! If you skip this step, Internet Explorer will build the wrong DOM from your <video> element. If Internet Explorer builds the wrong DOM, the html5-video.js script won’t be able to find your <video> element, and the whole house of cards will fall apart. Thus:
…
And presto-chango! Your standards-based HTML5 <video> element will automatically turn into a FlowPlayer-powered Flash video player in browsers that do not support the <video> element natively. The html5-video.js script even respects the controls, autobuffer, and autoplay attributes, and translates them into the appropriate FlowPlayer commands to show the player controls, auto-download, and auto-play the video.
Here is a live example of a video that uses these techniques. I encoded the source video with these two commands:
you@localhost$ ffmpeg2theora --videobitrate 200 --max_size 320x240 --output pr6.ogv pr6.dv
you@localhost$ HandBrakeCLI --preset "iPhone & iPod Touch" --vb 200 --width 320 --two-pass --turbo --optimize --input pr6.dv --output pr6.mp4
The markup is straightforward HTML5:
Mark Pilgrim
An excellent breakdown of html 5 video background and conversion options.