Hey fellow encoders & otherwise knowledgeable folk,
I was watching a couple of my encodes before deleting the lossless dumps and I noticed a couple of discrepancies in relation to the audio. I did some investigation into the matter and discovered a few other potential problems too. I originally messaged feos directly about this, who suggested bringing it to the forums for open discussion - which was a great idea! - and so here it is.
The first issue I noticed is in relation to audio delay. The sync was visibly inconsistent between the YouTube encode (venc), the mp4 (neroaacenc) and the mkv (opusenc) on playback: the YouTube video lined up as expected, but the mkv and mp4 played back the samples too early, so investigation was required. To do some testing, I first created a simple test wav file, loaded it in AVISynth (via ffms2, for consistency) and VirtualDub, and then exported it as wav. I did the same for each of the other test files, to ensure they were all treated the same way. I also did a FLAC encode as a control for the tests.
Here's an archive with all the wav files for reference.
My initial results were as follows:
"wav_original" was the original file and therefore was perfectly in sync.
"flac_control" was my control file and was also perfectly in sync, but cut off at the end (not related to this test, however).
"venc_scriptdefault" was out, but not by a lot. The amount was minuscule and not audible in the long run, so it could reasonably be considered in sync.
"neroaacenc_scriptdefault_4672s" was out by a fair bit. It played the sounds audibly earlier than they were supposed to be, which implied that the trim value was incorrect. I changed it to 2464s (a value I calculated myself), tested it as "neroaacenc_scriptdefault_2464s", and it appeared to be back in sync.
"opusenc_scriptdefault" was out by even more than the previous clip (I first noticed this in an encode, which is why I was performing these tests). There didn't appear to be a setting to fix this in the encoder, so instead I took to AVISynth and added the command "DelayAudio(0.07258)". The result was saved as "opusenc_delayaudio" and the delay appears to have fixed the issue as well. In terms of implementation, I used the following line in AVISynth:
DelayAudio(0.07258)
There's probably a better way to do this, perhaps by tying it to a "(i444 == true) ?" check, although that would mean re-ordering global.bat so that i444 is set to true before the audio encoding. That way, the delay would apply only to the MKV encodes (which, as far as I can tell, are the only ones that use opusenc for audio).
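To make that idea a bit more concrete, here's a rough sketch of what the conditional could look like. It assumes "i444" is a boolean that has already been set by the time the audio is processed (which, as mentioned, isn't the case with the current ordering). "opusDelay" is a name I made up for illustration, and the BlankClip is only a stand-in source so the snippet runs on its own. Treat it as a sketch rather than a tested patch.

```
# Sketch only: "i444" is assumed to be set before this point.
# "opusDelay" is a hypothetical variable name, not part of the existing script.
i444 = true
opusDelay = 0.07258  # seconds, the value that worked in my tests

# Stand-in source so the snippet is self-contained; the real script would use the dump.
src = BlankClip(length=600, width=320, height=240, fps=60, audio_rate=48000, channels=2)

# Delay the audio only for the encode that goes through opusenc.
out = (i444 == true) ? src.DelayAudio(opusDelay) : src
return out
```

With i444 set to false, the clip passes through untouched, so the other encodes would keep their current timing.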
Because these were made with a simple test clip, I decided to test these findings and values with a few other runs (specifically Paper Mario, as I still had the lossless files) and a small handful of SNES and NES runs (as they were fairly quick to dump and encode). My goal here was to ensure the values stay consistent in any situation. There was a discrepancy regarding MP4 and sample rates in these tests. My original test was at 44100 Hz, but at 48000 Hz the MP4 yields slightly different results. Both cases, however, are far more accurately in sync than when encoding with the original trim value. The variance is slightly less than what venc was out by originally, and if we've considered that to be in sync, then this is also acceptable. Of course, this isn't my call to make.
Separately, I ran into two other problems on my NES test. I was using Morimoto's SMB3 run (the run that introduced me to the TAS world, so long ago). It's probably a poor choice, but I have a soft spot for it, and ideally we want this to work in any situation, so it'll do (apologies if any of the issues I mention are exclusive to Famtasia and not an issue elsewhere). Firstly, I noticed an audible click at the start and end of the logo's silence. Changing the following in the script fixed the problem (only the last line was altered):
Before:
d = ImageSource(file=file, start=0, end=int((g.FrameRate * 2) - 1), fps=g.FrameRate) \
.ConvertToRGB32().AssumeFPS(g.FrameRateNumerator, g.FrameRateDenominator)
e = BlankClip(d, audio_rate=g.AudioRate, channels=g.AudioChannels)
f = AudioDub(d, BlankClip(g)).LanczosResize(g.width, g.height, taps=2)
After:
d = ImageSource(file=file, start=0, end=int((g.FrameRate * 2) - 1), fps=g.FrameRate) \
.ConvertToRGB32().AssumeFPS(g.FrameRateNumerator, g.FrameRateDenominator)
e = BlankClip(d, audio_rate=g.AudioRate, channels=g.AudioChannels)
f = AudioDub(d, e).LanczosResize(g.width, g.height, taps=2)
To be entirely honest, "e" isn't used elsewhere in that section of the script, and it looks like it was meant to be used this way all along. The change fixes the click and keeps the sync identical, so it should be usable without any adverse effects and should prevent the problem from occurring in the future.
And lastly, venc crashes when given an 8-bit input, such as the one Famtasia produced, and doesn't output audio at all. ConvertAudioTo16bit() was required in this case to make it work properly. I'm not sure of the correct AVISynth syntax, but a useful addition to the script might be two checks: one on the "hd" variable (so the tweak is only applied to the YouTube stream, which is the encode that uses venc) and one on whether the audio is 8-bit. Only if BOTH are true would the audio be converted to 16-bit, so the conversion happens only when it's strictly necessary and the crash is avoided if these circumstances ever come up again.
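Here's my rough attempt at what that double check might look like. It assumes "hd" is the script's existing boolean for the YouTube encode and that it is set before this point. The BlankClip is just a stand-in 8-bit source so the snippet runs on its own. Again, a sketch under those assumptions rather than a tested patch.

```
# Sketch only: "hd" is assumed to be the script's boolean for the YouTube encode.
hd = true

# Stand-in 8-bit mono source so the snippet is self-contained.
src = BlankClip(length=600, width=320, height=240, fps=60, audio_rate=44100, channels=1, sample_type="8bit")

# Convert only when BOTH conditions hold: YouTube encode AND 8-bit audio.
out = ((hd == true) && (AudioBits(src) == 8)) ? src.ConvertAudioTo16bit() : src
return out
```

Audio that is already 16-bit, and any non-YouTube encode, passes through unchanged.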
So I guess there are three points of discussion there - the audio sync, the logo silence and the 8bit>16bit conversion. I've not been a publisher long, so I'd love to see what some of the veterans think about this and the potential solutions!