I am curious how Game Boy TAS videos are currently encoded, because of
my recent experience with my Pokemon Yellow TAS.
When I created a demo encode for YouTube, I used my own special-purpose
code to generate the final rendered file. It was a very simple
implementation that output one PNG file for each frame of video and
wrote all sound to a single WAV file. Internal emulation time was
dilated so that these files could be written regardless of the speed
of my computer. The PNG files and WAV file were then muxed and encoded
to video using ffmpeg.
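A minimal sketch of that final muxing step, for readers who have not
done it before. This is not the exact command used for the encode; the
helper name ffmpeg_mux_command, the frame file naming (000000.png,
000001.png, ...) and the codec choices (libx264 video, AAC audio) are
assumptions for illustration. The helper only builds the argument list
for ffmpeg; it does not run it.

```python
import shlex
from typing import List


def ffmpeg_mux_command(png_dir: str, wav_path: str, out_path: str,
                       fps: int = 60) -> List[str]:
    """Build (but do not run) an ffmpeg command that muxes a numbered PNG
    sequence with one WAV track. Hypothetical helper for illustration."""
    return [
        "ffmpeg",
        "-framerate", str(fps),           # rate at which the image sequence is read
        "-i", f"{png_dir}/%06d.png",      # assumed naming: 000000.png, 000001.png, ...
        "-i", wav_path,                   # the single audio track
        "-c:v", "libx264",                # H.264 video (assumed codec choice)
        "-pix_fmt", "yuv420p",            # widely compatible pixel format
        "-c:a", "aac",                    # AAC audio (assumed codec choice)
        out_path,
    ]


if __name__ == "__main__":
    cmd = ffmpeg_mux_command("frames", "sound.wav", "tas.mp4")
    # Print a shell-quoted version; pass cmd to subprocess.run(cmd, check=True) to run it.
    print(shlex.join(cmd))
```

Note that a command like this assumes the PNG sequence already plays
back at a constant rate that matches the audio.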
My first attempts failed because the audio would gradually drift out
of sync with the video. At first I thought this was because the
emulated Game Boy did not run at exactly 60 fps, so I tried various
other plausible framerates. Eventually, to my surprise, I found that
no constant framerate works.
So I rewrote my A/V rendering code to record the exact amount of time
that had elapsed according to the recorded sound, and to adaptively
drop and duplicate frames so that they stay synced to the sound at a
constant 60 fps, within an imperceptible drift tolerance. I found that
different conditions during emulation cause the frames to go out of
sync with the audio. During restarts and the boot-up sequence, many
frames must be added, and during normal operation, frames must
occasionally be dropped, at a rate of about one frame every three to
four seconds. The source for this program is here:
http://hg.bortreb.com/vba-clojure/file/aeb4b676ba8b/clojure/com/aurellem/run/final_cut.clj
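The drop/duplicate idea can be sketched as follows. This is not a port
of final_cut.clj; it is a simple stand-in technique under stated
assumptions: the renderer is assumed to know how many audio samples
were emitted while each emulated frame was on screen
(samples_per_frame), the audio is assumed to be 44100 Hz, and the name
schedule_frames is hypothetical. Instead of an explicit drift
tolerance, each 60 fps output slot simply shows whichever emulated
frame was on screen at that instant of audio time, so the video never
leads or lags the sound by more than one output frame (1/60 s).

```python
from bisect import bisect_right
from itertools import accumulate
from typing import List


def schedule_frames(samples_per_frame: List[int],
                    sample_rate: int = 44100,
                    fps: int = 60) -> List[int]:
    """For each constant-rate output frame, return the index of the emulated
    frame to show. Hypothetical helper; samples_per_frame[i] is the number of
    audio samples emitted while emulated frame i was on screen (assumed input)."""
    # Start time of each emulated frame, measured in audio samples.
    starts = [0, *accumulate(samples_per_frame)][:-1]
    total_samples = sum(samples_per_frame)
    # Compare in integer units of 1/(sample_rate * fps) seconds to avoid float drift:
    # emulated frame i starts at starts[i] * fps, output frame k at k * sample_rate.
    scaled_starts = [s * fps for s in starts]
    n_out = -(-total_samples * fps // sample_rate)  # ceil: enough frames to cover the audio
    schedule = []
    for k in range(n_out):
        t = k * sample_rate
        # Latest emulated frame that has started by output time t. A frame that
        # spans several output slots is repeated (duplicated); frames that begin
        # and end between two slots are never chosen (dropped).
        schedule.append(bisect_right(scaled_starts, t) - 1)
    return schedule
```

For example, schedule_frames([2, 1], sample_rate=60, fps=60) returns
[0, 0, 1]: the first emulated frame covered two output frames' worth of
audio, so it is shown twice. When every emulated frame produces exactly
735 samples (44100 / 60), the schedule is just 0, 1, 2, ...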
After making this change to my A/V rendering code, I was able to
produce a perfect encode of my 12-minute video over the course of a
few hours on my rather underpowered laptop.
Listen to the encode of my TAS that my program generated, especially
the beginning of the "My Little Pony" theme song at 12:20. Notice that
there are no pops in any of the notes.
http://www.youtube.com/watch?v=p5T81yHkHtI
Now, listen to the official encode of my TAS, especially around 12:25,
where the first note of the song plays.
http://www.youtube.com/watch?v=aYQpl8Jj6Yg
You will hear some pops in the audio when the "My Little Pony" theme
song plays.
You can also hear these pops during the "Pallet Town" song that plays
at the start of each video.
Listening to several other Game Boy encodes, I notice similar pops a
few times a minute.
So, my questions are:
How are Game Boy TAS encodes rendered?
Why are there pops in the sound of many Game Boy TAS encodes?
Is there something wrong with vba-rerecording itself that is
creating these pops?
Would it be useful to add a command-line option to vba-rerecording
that would render a VBM file to a directory of images and a sound file?
Something like:
vba-rerecording <rom> --rendermovie <vbm> \
--png-dir=<output> --audio-file=<audio>
It would, of course, automatically drop and duplicate frames to keep
everything in sync.
As always, it's a pleasure to work with this great TAS community.
--Robert