I've been experimenting with [video] forcerate on Worms Armageddon with high refresh rates (using its DirectDraw 8-bit hardware renderer, for much better emulation speed).
At first, I tried to make it work well only by increasing [cpu] cycles. This did not work very well: at 120 fps, there was always lag no matter how high I pushed the cycles. There were four main kinds of lag:
1. A lag spike during the fade-in at the beginning of a level, in which many game engine frames were skipped. Not only is this unaesthetic, but it breaks level TASes in which the player's worm's turn starts very early, because some of the needed input would have fallen on the skipped frames.
2. A game engine frame got only 1 emulator frame, making it impossible to release a key that had been pressed on the previous game frame and press it again.
3. A game engine frame seemed to get 2 emulator frames, but the second one had the same timestamp as the first, so input was polled only on the first and ignored on the second.
4. A game engine frame was dropped entirely.
In WA, these can be patched over by pressing Escape to pause time and pressing it again later, which swallows up the lag harmlessly. However, this makes the TAS longer, especially for type #1. (The replay files saved by the emulated WA onto its .hdd image will still be just as optimal with this method.)
The test at 300 fps went better: lag types #2, #3, and #4 were eliminated entirely (presumably because 300 is an exact multiple of the engine framerate of 50 fps), leaving only type #1, and only in some levels. That was still annoying. Worse, though, there were occasional double-buffering flicker glitches, and cycles had to be set so high that the loading screens were emulated very slowly. (This is paradoxical, since in the movie itself the loading would be faster due to the increased cycles.)
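To illustrate why an exact multiple of 50 fps matters, here is a small Python sketch of an idealized model. It assumes perfectly regular emulator and engine clocks that start together, with no jitter at all, and counts how many emulator frames begin within each 20 ms engine frame. At 300 fps every engine frame gets exactly 6; at 120 fps the count alternates unevenly between 2 and 3. This model does not reproduce the actual lag types above (those presumably come from timing jitter on top of this unevenness), and the function name is just for illustration.

```python
import math
from fractions import Fraction

ENGINE_FPS = 50  # WA's game engine framerate

def emu_frames_per_engine_frame(emu_fps: int, engine_frames: int = 10) -> list[int]:
    """For each engine frame n, count the emulator frames whose start time
    falls within that engine frame. Idealized: no jitter, and both clocks
    start together at time zero."""
    ratio = Fraction(emu_fps, ENGINE_FPS)  # emulator frames per engine frame
    counts = []
    for n in range(engine_frames):
        # Engine frame n spans [start, end) in units of emulator frames.
        start, end = ratio * n, ratio * (n + 1)
        # Emulator frame k starts at time k; count integers k with start <= k < end.
        counts.append(math.ceil(end) - math.ceil(start))
    return counts

print(emu_frames_per_engine_frame(300))  # [6, 6, 6, 6, 6, 6, 6, 6, 6, 6]
print(emu_frames_per_engine_frame(120))  # [3, 2, 3, 2, 2, 3, 2, 3, 2, 2]
```

Exact fractions are used instead of floats so that boundaries landing exactly on an emulator frame (as at 300 fps) are not thrown off by rounding.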
In both of these tests, increasing cycles didn't make the in-game emulation (i.e. the levels themselves) slower, seemingly because Vsync let the emulated CPU idle most of the time.
But now I've found that DOSBox-X does not actually detect Vsync-waiting loops and give them an idleness bonus. The only reason the idling happens is that [dosbox] iodelay is about 1000 ns by default, so a Vsync-waiting loop spends most of its time sleeping through that 1000 ns every time it polls port 0x3DA.
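A rough back-of-the-envelope model of that loop, as a Python sketch. It assumes a 3-instruction polling loop (something like IN AL,DX / TEST AL,8 / JZ back), that each instruction costs one cycle, that "cycles" means instructions per emulated millisecond, and that each port read additionally consumes iodelay nanoseconds of emulated time without the host executing anything. These are simplifying assumptions for illustration, not a description of DOSBox-X's actual code.

```python
def vsync_wait_cost(iodelay_ns: float, cycles_per_ms: float,
                    loop_instructions: int = 3) -> tuple[float, float]:
    """Return (instructions executed per emulated ms, fraction of emulated
    time spent inside the I/O delay) for a loop polling port 0x3DA.
    Simplified model: one cycle per instruction, one port read per iteration."""
    ns_per_instruction = 1e6 / cycles_per_ms
    iteration_ns = loop_instructions * ns_per_instruction + iodelay_ns
    polls_per_ms = 1e6 / iteration_ns
    instructions_per_ms = polls_per_ms * loop_instructions
    idle_fraction = iodelay_ns / iteration_ns
    return instructions_per_ms, idle_fraction

# Hypothetical cycles value; compare a few iodelay settings.
for delay in (1000, 300, 5):
    work, idle = vsync_wait_cost(delay, cycles_per_ms=1_000_000)
    print(f"iodelay={delay:>4} ns: {work:9.0f} instr/ms, "
          f"{idle:.1%} of wait time in I/O delay")
```

With these made-up numbers, going from 1000 ns to 300 ns makes the host execute roughly 3.3 times as many instructions per emulated millisecond of waiting. The real-world slowdown would be smaller than that, since the wait loop isn't the only thing being emulated.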
I've found a fairly good way to make high refresh rates work well: decreasing some of the I/O delays. It's not practical to decrease iodelay itself, because emulation then becomes much slower, as it spends less of its time sleeping during port 0x3DA polls (though the resulting movie would be faster). But the 16-bit and 32-bit delays can be decreased way down harmlessly, which makes loading screens go by faster and gets rid of the double-buffering flicker artifacts.
(Oh, and [video] vesa set display vsync = 1 had no effect when I tried it.)
Here's what is now working quite well in my tests, with all types of lag and flicker eliminated. Showing just the relevant portions:
```ini
[cpu]
core = dynamic_x86

[video]
machine = svga_s3vision968
forcerate = 300

[dosbox]
iodelay = 1000
iodelay16 = 5
iodelay32 = 5
irq delay ns = 5
```
Edit: Lowering it to iodelay = 300 isn't too bad. I still get about 1/14 realtime speed that way, compared with about 1/11 realtime speed with iodelay = 1000.
I've also added a couple of other settings to my .conf to try to improve speed further, though I'm not sure how much effect (if any) they have. I haven't tried isolating them yet, since each test with WA takes a lot of time:
```ini
[cpu]
interruptible rep string op = 0

[dos]
unmask timer on disk io = false
```
And hard drive data rate limit = 0 didn't need to be added, since it's already in BizHawk's machine preset.
But it seems impossible to make 300 fps perfect. Sometimes it really does render a smooth 300 fps, but much of the time its frames are doubled, giving something effectively closer to 150 fps. I tried setting [cpu] rdtsc rate, but it didn't help at all.
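One way to put a number on the doubling is to compare consecutive frames from a frame dump and count how many actually change. Below is a minimal Python sketch, assuming the raw frames are already available as bytes (how to dump them from BizHawk isn't shown here, and the function name is hypothetical). It can't tell doubled frames apart from a scene that simply isn't moving, so it's only meaningful over a stretch with continuous motion.

```python
import hashlib

def effective_fps(frames: list[bytes], nominal_fps: float) -> float:
    """Scale the nominal rate by the fraction of frames that differ from the
    frame before them; consecutive identical frames count as repeats."""
    if len(frames) < 2:
        return float(nominal_fps)
    digests = [hashlib.sha1(frame).digest() for frame in frames]
    changed = sum(1 for prev, cur in zip(digests, digests[1:]) if prev != cur)
    return nominal_fps * changed / (len(frames) - 1)

# Every frame shown twice: effectively half the nominal rate.
doubled = [b"A", b"A", b"B", b"B", b"C", b"C", b"D"]
print(effective_fps(doubled, 300))  # 150.0
```

Hashing first just keeps the comparison cheap for large frames; comparing the byte strings directly would give the same result.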
Also note that before I tried changing these iodelay settings, I found that increasing "cycles" alone had severely diminishing returns for loading and boot times. The total time spent emulating a loading screen kept increasing dramatically, while the emulated-machine time taken by the loading screen shrank only slightly. This didn't make sense to me: I expected the total time spent emulating a loading screen to stay fairly constant, with increasing "cycles" just changing how many emulator frames it takes. Now, with the iodelays decreased way down, the behavior is much closer to that expectation, and the total time spent emulating the loading screens is dramatically reduced.
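A toy model that reproduces the diminishing returns in emulated-machine time (it doesn't try to model the time spent emulating): suppose a loading screen needs a fixed amount of CPU work plus a fixed number of 16/32-bit port accesses, each costing a flat I/O delay in emulated time. Raising cycles shrinks only the first term, so the second acts as a floor, and shrinking the delay lowers that floor. All numbers below are made up, and treating the delay as a flat per-access cost is an assumption.

```python
def loading_time_ms(cycles_per_ms: float, cpu_instructions: float,
                    io_accesses: int, io_delay_ns: float) -> float:
    """Emulated milliseconds for a hypothetical loading screen (toy model)."""
    cpu_ms = cpu_instructions / cycles_per_ms   # shrinks as cycles grow
    io_ms = io_accesses * io_delay_ns / 1e6     # independent of cycles
    return cpu_ms + io_ms

WORK = 2e9            # hypothetical instructions of real work
ACCESSES = 5_000_000  # hypothetical 16/32-bit port accesses

for delay_ns in (1000, 5):
    times = [loading_time_ms(c, WORK, ACCESSES, delay_ns)
             for c in (100_000, 400_000, 1_600_000)]
    print(f"delay {delay_ns} ns:", [f"{t:.0f} ms" for t in times])
```

In this model, with a 1000 ns delay, quadrupling cycles from 400,000 to 1,600,000 only cuts the loading screen from 10,000 ms to 6,250 ms, while with a 5 ns delay the same change cuts it from 5,025 ms to 1,275 ms.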
It's also worth noting that I tried setting BizHawk's FPS numerator and denominator to zero to make them automatic, and it did not set them to match the forcerate value; in that test the input FPS just came out as 60 fps. So it seems that both values must always be set explicitly.
I'm fairly sure BizHawk's Machine Presets would be much more realistic if they incorporated changes to the I/O delays. Again, though, it's not practical to decrease the 8-bit one if Vsync loop idling efficiency is important.
It would be nice if DOSBox-X added an iodelay setting just for port 0x3DA.