Post subject: Audio Delays & Discrepancies in the Encoding Package
Joined: 10/14/2013
Posts: 335
Location: Australia
Hey fellow encoders & otherwise knowledgeable folk,
I was watching a couple of my encodes before deleting the lossless dumps and I noticed a couple of discrepancies in relation to the audio. I did some investigation into the matter and discovered a few other potential problems too. I originally messaged feos directly about this, who suggested bringing it to the forums for open discussion - which was a great idea! - and so here it is.
The first issue I noticed is in relation to audio delay. The sync was visibly inconsistent between the YouTube encode (venc), the mp4 (neroaacenc) and the mkv (opusenc) when playing back videos: the YouTube video lined up as expected, but the mkv and mp4 played back the samples too early, so investigation was required.
To do some testing, I first created a simple test wav file, loaded it up in AviSynth (via ffms2, for consistency) & VirtualDub and then exported it as wav. I did the same for each of the other test files, to ensure the results were all treated the same way. I also did a FLAC encode as a control for the tests. Here's an archive with all the wav files for reference.
My initial results were as follows:
"wav_original" was the original file and was therefore perfectly in sync.
"flac_control" was my control file and was also perfectly in sync, but cut off at the end (not related to this test, however).
"venc_scriptdefault" was out - but not by a lot. The amount was minuscule and not audible in the long run, so theoretically it could be considered in sync.
"neroaacenc_scriptdefault_4672s" was out by a fair bit. It played the sounds audibly earlier than they were supposed to sync, which implied that the trim value was incorrect. I changed it to 2464s (something I calculated myself), tested it as "neroaacenc_scriptdefault_2464s" and it appeared to be back in sync.
"opusenc_scriptdefault" was out by even more than the previous clip (I noticed it in an encode and this is why I was performing these tests). There didn't appear to be a setting to fix this in the encoder, so instead I took to AviSynth and added the command "DelayAudio(0.07258)". This file was then saved as "opusenc_delayaudio" and appears to have fixed the issue as well.
In terms of implementation, I used the following line in AviSynth:
DelayAudio(0.07258)
There's probably a better way to do this, potentially by tying it to "(i444 == true) ?", although that would mean global.bat would need to be re-ordered so that i444 is set to true prior to the audio encoding. This way, the delay could be tied specifically to the MKV encodes (which as far as I can tell are the only ones which use opusenc for audio).
Because these were made with a simple test clip, I decided to test these findings and values with a few other runs (specifically Paper Mario, as I still had the lossless files) and a small handful of SNES and NES runs (as they were fairly quick to dump and encode). My goal here was to ensure the values stay consistent in any situation.
There was a discrepancy regarding MP4 and sample rates in these tests. My original test was for 44100 Hz, but 48000 Hz yields slightly different results. Both cases, however, are far more accurately in sync than when encoding using the original trim value. The variance is slightly less than what venc was out by originally, and if we've considered that to be in sync, then this is also acceptable. Of course, this isn't my call to make.
As a separate problem, I encountered two interesting situations on my NES test. I was using Morimoto's SMB3 run (the run that introduced me to the TAS world, so long ago) - probably a poor choice, but I have a soft spot for it and ideally we want this to work in any situation, so it'll do (but apologies if any of the issues I mention are exclusive to Famtasia and not an issue elsewhere).
Firstly, I noticed an audible click at the start and end of the logo's silence. Changing the following in the script fixed the problem (the only line that was altered was the last line):
Before:
d = ImageSource(file=file, start=0, end=int((g.FrameRate * 2) - 1), fps=g.FrameRate) \
   .ConvertToRGB32().AssumeFPS(g.FrameRateNumerator, g.FrameRateDenominator)
e = BlankClip(d, audio_rate=g.AudioRate, channels=g.AudioChannels)
f = AudioDub(d, BlankClip(g)).LanczosResize(g.width, g.height, taps=2)
After:
d = ImageSource(file=file, start=0, end=int((g.FrameRate * 2) - 1), fps=g.FrameRate) \
   .ConvertToRGB32().AssumeFPS(g.FrameRateNumerator, g.FrameRateDenominator)
e = BlankClip(d, audio_rate=g.AudioRate, channels=g.AudioChannels)
f = AudioDub(d, e).LanczosResize(g.width, g.height, taps=2)
To be entirely honest, "e" isn't used elsewhere in that section of the script and looks like it was meant to be used this way all along. This fixes the above issue and keeps the sync identical, so it should be usable without any adverse effects and should prevent this problem from occurring in the future.
And lastly, venc crashes when given an 8-bit input, such as the one Famtasia produced, and doesn't output audio at all. ConvertAudioTo16bit() was required in this case to ensure it worked properly. I'm not sure of the correct AviSynth syntax, but the idea would be a check for the "hd" variable (to ensure this tweak is only applied to the YouTube stream, which is the encode that uses venc) along with a second check to see if the audio is 8-bit. If BOTH are true (so it's only applied when completely necessary), having it convert the audio to 16-bit might be a useful addition to the script, to avoid potential errors if these circumstances are ever replicated.
So I guess there are three points of discussion there - the audio sync, the logo silence and the 8-bit to 16-bit conversion. I've not been a publisher long, so I'd love to see what some of the veterans think about this and the potential solutions!
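On the 8-bit point, here is a rough Python sketch of the idea (not AviSynth syntax, and not a claim about how AviSynth implements ConvertAudioTo16bit() internally). It assumes the 8-bit audio is unsigned PCM, as in WAV files, and the helper names `needs_16bit` and `u8_to_s16` are made up purely for illustration. The first function mirrors the proposed double check; the second shows what widening 8-bit samples to 16-bit amounts to.

```python
def needs_16bit(hd, bits_per_sample):
    """Proposed check: convert only for the venc (YouTube) stream with 8-bit audio."""
    return bool(hd) and bits_per_sample == 8


def u8_to_s16(samples):
    """Map unsigned 8-bit PCM (0..255, silence at 128) to signed 16-bit values."""
    return [(s - 128) * 256 for s in samples]


if __name__ == "__main__":
    print(needs_16bit(True, 8))     # 8-bit audio on the YouTube stream
    print(needs_16bit(False, 8))    # other streams are left alone
    print(needs_16bit(True, 16))    # 16-bit audio needs no conversion
    print(u8_to_s16(bytes([0, 128, 255])))
```

Keeping the check as narrow as possible means the existing behaviour is untouched for every dump that already has 16-bit audio.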
I'm not as active as I once was, but I can be reached here if I should be needed.
Site Admin, Skilled player (1255)
Joined: 4/17/2010
Posts: 11495
Location: Lake Char­gogg­a­gogg­man­chaugg­a­gogg­chau­bun­a­gung­a­maugg
Warning: When making decisions, I try to collect as much data as possible before actually deciding. I try to abstract away and see the principles behind real world events and people's opinions. I try to generalize them and turn into something clear and reusable. I hate depending on unpredictable and having to make lottery guesses. Any problem can be solved by systems thinking and acting.
Publisher
Joined: 4/23/2009
Posts: 1283
The NeroAAC delays are listed in the very old encoding guide here: http://tasvideos.org/EncodingGuide/PreEncoding.html There is a difference between HE-AAC and AAC. I think this is moot though, as we should be using qaac, which has different delays. For qaac, it is 2112 samples for AAC and 5186 samples for HE-AAC.
When I went through the opus-tools parameters, I didn't see anything for the delay, like you said, but since I knew about the AAC delays, I could have sworn I did testing to make sure it didn't have a delay. I guess I'll revisit that. In the past, we used a square wave to easily tell when there is a delay, as shown here: http://tasvideos.org/forum/p/282143#282143 I also do use the param "--padding 0", but that's for metadata.
As for oggenc, I thought someone else did the test to see if there is a delay, or I now faintly remember there was a known delay, but as you said, too small to care.
PS: I highly recommend not using ffms2, which has terrible video syncing (unsure about audio). At least it is consistent in being bad in video, I guess.
PPS: This may help: https://en.wikipedia.org/wiki/Gapless_playback
Edit: Well, thanks to the notes in the Wiki link, I found the flaw in my testing. Since I used opusdec to decode the opus file, it took into account the gapless support at the Ogg level and removed the delay automatically. Once I mux the opus file to an MKV and then output to WAV with ffmpeg, I do indeed see a delay. I'll do more research and check what the exact delay is.
Edit 2: For my test file of 44100 Hz with a ~64 kbit/s opus file, I got an audio delay of 286 samples. That is ~6.5 ms longer than the original, which must be trimmed off. So it would have been a negative delay in AviSynth, but the recommended method is to use SoX to remove the exact sample amount. I'll do more tests at 48 kHz.
Edit 3: Good job to thecoreyburton for noticing this!
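For anyone wanting to repeat the square wave style of test, here is a minimal Python sketch of how the offset could be measured. Everything in it is an assumption for illustration: the function names are invented, the "decoded" clip is simulated by prepending silence rather than produced by a real encoder, and the threshold is a guess (lossy codecs can smear the onset, so a real measurement would need a sanity check by eye in an audio editor).

```python
def square_wave(n_samples, period, amplitude=20000):
    """Generate a simple square wave as a list of signed 16-bit sample values."""
    half = period // 2
    return [amplitude if (i % period) < half else -amplitude
            for i in range(n_samples)]


def first_onset(samples, threshold=1000):
    """Index of the first sample whose magnitude exceeds the threshold, or None."""
    for i, s in enumerate(samples):
        if abs(s) > threshold:
            return i
    return None


def delay_in_samples(reference, decoded, threshold=1000):
    """Positive result: the decoded audio starts later than the reference."""
    a = first_onset(reference, threshold)
    b = first_onset(decoded, threshold)
    if a is None or b is None:
        raise ValueError("no onset found; is one of the clips silent?")
    return b - a


if __name__ == "__main__":
    # Reference: 100 samples of silence, then a square wave.
    reference = [0] * 100 + square_wave(1000, period=100)
    # Simulated decode: pretend 286 samples were prepended by the encoder.
    decoded = [0] * 286 + reference
    print(delay_in_samples(reference, decoded))
```

With real files, the two sample lists would come from the original WAV and from the decoded output of the muxed file, read at the same sample rate.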
Joined: 10/14/2013
Posts: 335
Location: Australia
I appreciate you looking into this for me! I'm still seeing (both when compared to an accompanying video & in waveform lineups in Audacity) the opus stream playing significantly earlier than the original. Without the trim of the 286 samples, it's roughly 75 ms early (just an estimate). This was using opus-tools 0.1.10.
Publisher
Joined: 4/23/2009
Posts: 1283
I'm using the same version. Can you post the source WAV somewhere?
Joined: 10/14/2013
Posts: 335
Location: Australia
Edit: Link removed.
Publisher
Joined: 4/23/2009
Posts: 1283
I still think there is something wrong with your test method, as opus can't be cutting off samples, or else gapless playback would be impossible. I just did one test, and opus again added 287 samples. I'll finish the others and update this.
Edit: Okay, I've finished testing. All 44.1 kHz samples had a 287 sample delay and the only 48 kHz sample had a 312 sample delay. If you convert both to seconds, it's about 6.5 ms. So in general, I think you always need to cut off 6.5 ms.
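The conversion from samples to time is just a division by the sample rate. A quick Python sketch of that arithmetic, using the two measured values from this post (the function name is only for illustration):

```python
def samples_to_ms(samples, sample_rate):
    """Convert a sample count to milliseconds at the given sample rate."""
    return samples * 1000.0 / sample_rate


# (delay in samples, sample rate in Hz) as measured above
measurements = [(287, 44100), (312, 48000)]

for samples, rate in measurements:
    ms = samples_to_ms(samples, rate)
    print(f"{samples} samples at {rate} Hz = {ms:.3f} ms")
```

Both values land within a hundredth of a millisecond of 6.5 ms, which is why a single time-based trim can plausibly serve both sample rates.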
Joined: 10/14/2013
Posts: 335
Location: Australia
"./programs/avs2pipemod" -wav encode.avs | "./programs/sox" -t wav - -t wav - trim 2112s | "./programs/qaac64" --quality 0.25 - -o "./temp/audio.mp4"
"./programs/avs2pipemod" -wav encode.avs | "./programs/sox" -t wav - -t wav - trim 0.0065 | "./programs/opusenc" --bitrate 64 - "./temp/audio.opus"
I just used those two command lines (and the existing YT one) and got timings which were perfect on a series of 44100 Hz and 48000 Hz files. It's all fixed and I'm satisfied with that. Thanks again for the time you put into testing this, Aktan!
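One thing worth spelling out about those two lines: the qaac one trims by sample count (the "s" suffix in SoX means samples), while the opus one trims by time in seconds. A small Python sketch of what each trim corresponds to at the two sample rates follows; the function names are only for illustration, and it says nothing about how SoX itself rounds a fractional sample.

```python
def seconds_to_samples(seconds, sample_rate):
    """Length of a time span expressed in samples at the given rate."""
    return seconds * sample_rate


def samples_to_ms(samples, sample_rate):
    """Convert a sample count to milliseconds at the given sample rate."""
    return samples * 1000.0 / sample_rate


for rate in (44100, 48000):
    opus_trim = seconds_to_samples(0.0065, rate)  # "trim 0.0065" is in seconds
    aac_trim = samples_to_ms(2112, rate)          # "trim 2112s" is in samples
    print(f"{rate} Hz: 0.0065 s = {opus_trim:.2f} samples, "
          f"2112 samples = {aac_trim:.3f} ms")
```

The 0.0065 s trim works out to about 286.65 samples at 44100 Hz and 312 samples at 48000 Hz, which lines up with the 287 and 312 sample delays measured for opus.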
Publisher
Joined: 4/23/2009
Posts: 1283
After chatting with thecoreyburton, we found out that the HE-AAC delay for qaac was wrong. So I went and checked (almost) all the samples and found that it is missing 957 samples every time (I didn't check one 44.1 kHz sample). I now think the params for qaac should be:
-v 0 --he -q 2 --delay 957s --threading --no-smart-padding
The "--no-delay" param could not be used since it only works in AAC-LC. This is a big change vs what I had before, so I'm not sure if it is due to iTunes or qaac changing it. I will list the versions of each I am using:
qaac Version 2.64
Apple Application Support (64-bit) Version 5.6
Joined: 10/14/2013
Posts: 335
Location: Australia
Well, this is interesting. I'm getting 80.5 ms of delay on the output files compared with the original - that is, at 32 kHz, 44.1 kHz and 48 kHz the aac files all seem to have a delay of roughly 80.5 ms when done that way. What bothers me about this is that with the 957 samples cut, all the output files line up correctly with each other - implying your findings are correct. For testing purposes, I'm using this batch file to quickly encode and decode wav tests to aac and back:
qaac64 -v 0 --he -q 2 --delay 957s --threading --no-smart-padding 32.wav -o 32-aac.mp4
qaac64 -v 0 --he -q 2 --delay 957s --threading --no-smart-padding 44.wav -o 44-aac.mp4
qaac64 -v 0 --he -q 2 --delay 957s --threading --no-smart-padding 48.wav -o 48-aac.mp4
ffmpeg -i 32-aac.mp4 32-output.wav
ffmpeg -i 44-aac.mp4 44-output.wav
ffmpeg -i 48-aac.mp4 48-output.wav
My software versions are identical to the ones you posted.
Publisher
Joined: 4/23/2009
Posts: 1283
It's not a cut, it's an addition. With qaac (at least the version I currently have, which is relatively new), HE-AAC encoding actually removes samples from the original source, so you need to add samples. It doesn't make sense, since gapless playback wouldn't work at all, but maybe HE-AAC isn't meant for gapless? I have no idea.
Joined: 10/14/2013
Posts: 335
Location: Australia
Apologies, there was a flaw in my files. Everything was done at the right sample rate but I forgot to change the export settings and in the end they were all at 44100hz. I fixed the files and now they're out by varying amounts (like I'd expect). Here's a link to my test files, batch file and output files so you can see what's going on.
Joined: 10/14/2013
Posts: 335
Location: Australia
Just as an update for anyone else following this, Aktan and I are still regularly chatting about it. Aktan noticed there are inconsistencies in the AAC delay before and after it's muxed (and the delay also varies depending on the muxer). That still needs to be worked out and then trialed.
Publisher
Joined: 4/23/2009
Posts: 1283
Okay, I did more testing. If I encode to AAC using qaac, and then mux to MP4 using MP4Box, the delay is 5187 samples (very close to the original number I said for HE-AAC of 5186 samples!). I think we went on a wild goose chase when it was all fine all this time >_>.
Joined: 10/14/2013
Posts: 335
Location: Australia
Whoops! Well, at least now we can be sure we were thorough about this in the future.
"./programs/avs2pipemod" -wav encode.avs | "./programs/qaac64" -v 0 --he -q 2 --delay -5187s --threading --no-smart-padding - -o "./temp/audio.mp4"
"./programs/avs2pipemod" -wav encode.avs | "./programs/sox" -t wav - -t wav - trim 0.0065 | "./programs/opusenc" --bitrate 64 - "./temp/audio.opus"
Do these commands look alright to you, Aktan?
Publisher
Joined: 4/23/2009
Posts: 1283
thecoreyburton wrote:
Whoops! Well, at least now we can be sure we were thorough about this in the future.
"./programs/avs2pipemod" -wav encode.avs | "./programs/qaac64" -v 0 --he -q 2 --delay -5187s --threading --no-smart-padding - -o "./temp/audio.mp4"
"./programs/avs2pipemod" -wav encode.avs | "./programs/sox" -t wav - -t wav - trim 0.0065 | "./programs/opusenc" --bitrate 64 --padding 0 - "./temp/audio.opus"
Yep, except add --padding 0 to the opus line.
Site Admin, Skilled player (1255)
Joined: 4/17/2010
Posts: 11495
Location: Lake Char­gogg­a­gogg­man­chaugg­a­gogg­chau­bun­a­gung­a­maugg
With the qaac Corey sent me, and with this build: https://www.videohelp.com/software/qaac
G:\Encode\TASEncodingPackage>"./programs/avs2pipemod" -wav encode.avs | "./programs/qaac64" -v 0 --he -q 2 --delay -5187s --threading --no-smart-padding - -o "./temp/audio.mp4"
ERROR: CoreAudioToolbox.dll: The specified module could not be found.
avs2pipemod[info]: writing 1455.376 seconds of 44100 Hz, 2 channel audio.
avs2pipemod[info]: finished, wrote 0.023 seconds [0%].
avs2pipemod[info]: total elapsed time is 0.000 sec.
avs2pipemod[error]: only wrote 1012 of 64182103 samples.
qaac is a command line AAC/ALAC encoder frontend based on Apple encoder. Since 1.00, qaac directly uses CoreAudioToolbox.dll. Therefore, QuickTime installation is no more required. However, Apple Application Support is required. AAC-LC, AAC-HE, ALAC encoding are supported.
Can we avoid actually installing this thing and just have an exe or dll portably?
Site Admin, Skilled player (1255)
Joined: 4/17/2010
Posts: 11495
Location: Lake Char­gogg­a­gogg­man­chaugg­a­gogg­chau­bun­a­gung­a­maugg
Alright, so I switched to qaac in the package, and added all the dependencies right there. Test!!! Make sure you don't have iTunes/QuickTime installed, since it'd invalidate the portability I'm trying to get here. https://github.com/TASVideos/TASEncodingPackage/archive/x64.zip
Joined: 10/14/2013
Posts: 335
Location: Australia
As far as I can tell it's all good - at least on my machines! It's worth noting that iTunes was removed prior to the testing. Whilst I don't think that this would affect the testing process (and removing the newly included files caused qaac to fail as expected), there could still be some interference if there were unforeseen components left over.
Site Admin, Skilled player (1255)
Joined: 4/17/2010
Posts: 11495
Location: Lake Char­gogg­a­gogg­man­chaugg­a­gogg­chau­bun­a­gung­a­maugg
The only question is, are we even allowed to distribute their files like this?
Joined: 10/14/2013
Posts: 335
Location: Australia
I'm not sure, that's something that's definitely beyond my field of expertise. If not though, it could probably be re-organized to have a prereq installer of some sort that grabs the appropriate stuff, perhaps.