Introduction
You now have everything in place to embark upon the most gruelling step of the entire encoding process - the encoding itself. This step is the core of the entire process and takes the largest amount of time.
You can generally run the video and audio encoding at roughly the same time; the audio encoding will almost certainly take a lot less time than the video encoding.
Video encoding
If you are encoding for a console intended to be displayed on a TV, you will need to determine the pixel aspect ratio (PAR)[1] of your target console before proceeding (for handhelds or arcade machines, you can just use 1:1 or not specify this at all). A table of these follows for most standard resolutions.
Resolution | Ratio |
---|---|
256x224 | 7:6 |
256x240 | 5:4 |
320x224 | 14:15 |
512x240 | 5:8 |
320x240 | 1:1 |
x by y | (4/3)*(y/x) (as an integer ratio) |
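The general formula in the last row of the table can be evaluated mechanically. The following Python sketch reduces (4/3)*(y/x) to an integer ratio, assuming a 4:3 display as the table does; the function name `pixel_aspect_ratio` is an illustrative choice and not part of any tool mentioned here.

```python
from fractions import Fraction

def pixel_aspect_ratio(width: int, height: int) -> str:
    """Return the PAR for a 4:3 display as 'num:den', i.e. (4/3)*(height/width)."""
    # Fraction reduces to lowest terms automatically.
    par = Fraction(4, 3) * Fraction(height, width)
    return f"{par.numerator}:{par.denominator}"

# Recompute the rows of the table above.
for width, height in [(256, 224), (256, 240), (320, 224), (512, 240), (320, 240)]:
    print(f"{width}x{height} -> {pixel_aspect_ratio(width, height)}")
```

The returned string is in the `num:den` form that the table uses, so it can be used directly as the `--sar` value.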
Regardless of whether you are using an `.avi` or an `.avs` as your input, the following command line (with some minor tweaks as noted afterwards) will initiate the video encoding process:
x264 --sar <PAR as found above> --crf 20 --keyint <keyint> --ref 16 --no-fast-pskip --bframes 16 --b-adapt 2 --direct auto --me umh --merange 64 --subme 11 --trellis 2 --partitions all --input-range pc --range pc --no-dct-decimate --tcfile-in times.txt -o video.mp4 in.avi
Before dissecting this command line, please note the following:
- `in.avi` is the file from your previous step. For an AVISynth script input, substitute `.avs` for `.avi`.
- If you are using the DeDup plugin, the above command line is intended for the second of the two scripts.
- If you are using direct264, you can specify `--deldup 40:0.1:100:1:0.1` to employ its duplicate frame removal filter[2]. You should not do this if you are using AVISynth's DeDup plugin.
- Also, if you are using direct264, be sure to add `--versioninfo` in your batch file. x264 does this automatically; direct264 does not. This option records the settings used in the batch file to make the encode.
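If you script your encodes, the command line above can be assembled from its two variable parts (the PAR and the key frame interval). The following Python sketch only builds and prints the argument list; it does not run x264. The function name `build_x264_command` and the example values `7:6` and `600` are hypothetical, chosen for illustration; the flags themselves are copied from the command line above.

```python
import shlex

def build_x264_command(sar, keyint, source="in.avi", output="video.mp4",
                       tcfile="times.txt"):
    """Return the x264 argument list from this guide with sar/keyint filled in."""
    args = [
        "x264", "--sar", sar, "--crf", "20", "--keyint", str(keyint),
        "--ref", "16", "--no-fast-pskip", "--bframes", "16", "--b-adapt", "2",
        "--direct", "auto", "--me", "umh", "--merange", "64", "--subme", "11",
        "--trellis", "2", "--partitions", "all",
        "--input-range", "pc", "--range", "pc", "--no-dct-decimate",
    ]
    # Pass tcfile=None when no timecode file exists (no duplicate frame removal).
    if tcfile is not None:
        args += ["--tcfile-in", tcfile]
    args += ["-o", output, source]
    return args

# Example values only: a 256x224 source (PAR 7:6) at roughly 60 frames per second.
print(shlex.join(build_x264_command("7:6", 600)))
```

Keeping the arguments as a list rather than a single string avoids quoting problems if a file name contains spaces.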
Now, let's briefly look over the x264 command line options:
- `--crf 20`: this sets the target constant rate factor of the video, which ultimately determines the video's bit rate. 20 has been semi-arbitrarily chosen as the site's target for visual quality; you may need to adjust this slightly higher or lower in order to obtain decent video quality and/or acceptable file sizes. Lower number means better quality and higher bit rate; higher number means worse quality and lower bit rate.
  - For games with low visual complexity, you may be able to use `--crf 0` (lossless x264) for the best possible visual quality without too much increase in filesize. This is most likely to be possible with 8-bit systems (NES, SMS, Game Gear, and Game Boy).
- `--keyint <keyint>`: the maximum interval between key frames, in frames. This number is normally set to correspond to ten seconds (i.e. ten times the video's (average) frame rate), a value selected to give a reasonable compromise between the ability to seek in the video and file size. This value can be computed from `times.txt` as `((number of lines) - 1) * 10000 / (last timecode)`, rounded to the nearest integer. For a more usable video, you can decrease this value; do not increase it past this point.
- `--ref 16`: This sets the number of frames that any given frame can use as a reference; 16 is the maximum. Larger numbers tend to result in smaller file sizes at the expense of encoding time.
- `--no-fast-pskip`: this disables fast P-frame (one of two types of reference frames) skipping, which improves visual quality at the expense of a significant amount of speed.
- `--bframes 16`: sets the maximum number of B-frames (the other, smaller type of reference frame) between I-frames (fully specified picture) and P-frames; larger values reduce file size at the possible expense of seeking ability (though this has not been reported to be an issue in encodes that have used it). 16 is the maximum.
- `--b-adapt 2`: selects the algorithm used for determining placement of B-frames vs. P-frames. 2 is the 'Optimal' algorithm.
- `--direct auto`: x264 employs two different motion prediction methods (`spatial` and `temporal`); `auto` allows it to switch between them as necessary.
- `--me umh`: Sets the algorithm used for motion estimation. `umh` (uneven multi-hexagonal search) is a balance between encoding time and final file size. If you have plenty of computational power, consider using `esa` or `tesa` (exhaustive searches) instead.
- `--merange 64`: Loosely corresponds to the maximum speed in pixels per frame of motion in the video that can be detected by the motion estimation algorithms. Larger numbers will slow down encoding and will gradually reduce file size; 64 has been chosen as a good trade-off between file size and encoding time.
- `--subme 11`: Sets the algorithm used for subpixel motion estimation; larger numbers use more complex algorithms, reducing file size at the expense of encoding time. 11 is the most complex algorithm. This is most useful for 3D platforms.
- `--trellis 2`: Enables full trellis quantization, which reduces file size at the expense of encoding time.
- `--partitions all`: Allows all types of block partitioning to be used in the encoding, increasing visual quality at the expense of encoding time.
- `--input-range pc`, `--range pc`: Allows the full YV12 colour space to be used in the resulting video. These should only be used if the input is PC-range YV12 or RGB (in which case `x264` is doing the colour conversion itself).
- `--no-dct-decimate`: Disables dropping of DCT blocks for a slight increase in visual quality at the expense of a slight increase in file size.
- `--tcfile-in times.txt`: If you are using duplicate frame removal, this passes the information about frame timecodes to x264 so that it has the correct timing information when deciding how best to encode frames, so that features that are displayed for significant amounts of time can be rendered in detail. If you are not using duplicate frame removal (such as for streaming encodes), this is not necessary (as you will not have timecode information) and can be omitted.
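The `--keyint` formula given in the list above can be applied to the contents of `times.txt` with a few lines of Python. This is a sketch under stated assumptions: the file is taken to contain exactly one header line followed by one timecode per frame in milliseconds (which is what the "minus one" and the factor of 10000 in the formula suggest), and the function name `keyint_from_timecodes` is hypothetical.

```python
def keyint_from_timecodes(text: str) -> int:
    """Apply ((number of lines) - 1) * 10000 / (last timecode), rounded to nearest."""
    lines = [line.strip() for line in text.splitlines() if line.strip()]
    frame_count = len(lines) - 1        # assumes exactly one header line
    last_timecode = float(lines[-1])    # assumed to be in milliseconds
    return int(frame_count * 10000 / last_timecode + 0.5)

# Hypothetical file contents: a header plus 600 frames at 60 frames per second.
sample = "# timecode format v2\n" + "\n".join(
    f"{i * 1000 / 60:.3f}" for i in range(600)
)
print(keyint_from_timecodes(sample))
```

For the sample data this should print a value close to 600, i.e. ten seconds' worth of frames at 60 frames per second. To use it on a real file, pass in the text read from `times.txt`.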
You are encouraged to experiment with tweaking these options; this is intended as a generic set of options to serve as a good starting point, and our encoders tend to develop their own set of options which serves their needs best. The Encoders' Corner is a good place to look for advice.
Audio encoding
This step assumes that the `.wav` from the previous step is called `audio.wav`.
Current encoder guidelines suggest using Ogg Vorbis or AAC as audio codecs depending on the target container format (Ogg Vorbis for MKV and AAC for MP4).
For Ogg Vorbis, use the following command line:
oggenc2 -q 1 audio.wav
`-q 1` specifies the target quality factor, determining the bitrate of the final product relative to the complexity of the audio file (similar to `--crf` with x264). 1 has been semi-arbitrarily selected as a suggested target value known to give consistently acceptable results; this value may need to be raised for games with complex sound.
For AAC, use the following command line:
neroAacEnc -q 0.25 -if audio.wav -of audio.mp4
`-q 0.25` specifies a target quality factor, determining the bitrate of the final product relative to the complexity of the audio file (similar to `--crf` with x264). 0.25 has been semi-arbitrarily selected as a suggested target value known to give consistently acceptable results; this value may need to be raised for games with complex sound.
After you have encoded the file, you can check the audio delay with MP4Box:
MP4Box -info audio.mp4
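As noted in the introduction, the video and audio encodes can generally run at roughly the same time. The following Python sketch shows one way to launch several commands at once and wait for all of them. `run_concurrently` is a hypothetical helper, and the demo uses placeholder Python processes rather than real encoder invocations; in practice you would pass the x264 and oggenc2 or neroAacEnc argument lists from this page.

```python
import subprocess
import sys

def run_concurrently(commands):
    """Start every command at once, wait for all of them, return their exit codes."""
    processes = [subprocess.Popen(cmd) for cmd in commands]
    return [proc.wait() for proc in processes]

if __name__ == "__main__":
    # Placeholder commands standing in for the video and audio encoders.
    demo = [
        [sys.executable, "-c", "print('video encode placeholder')"],
        [sys.executable, "-c", "print('audio encode placeholder')"],
    ]
    print(run_concurrently(demo))
```

An exit code of 0 conventionally indicates success, so any non-zero entry in the returned list points at the encode that needs attention.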
[2]: The '40' is the minimum frame rate of the resulting video; you can reduce this significantly if desired (values as low as 0.1 are in common use). The remaining values are present to prevent most non-duplicate frames from being detected as duplicate frames. If a game still has issues with dropped frames (there is one reported case), increase the minimum frame rate to something a bit higher, or consider not using `--deldup` at all; in that case, it might be a good idea to use DeDup instead.