Post subject: lots of gui.drawBox calls seems to slow framerate
MetalMachine
He/Him
Joined: 5/22/2018
Posts: 2
Location: Phoenix, AZ, USA
Hello, I have recently written a Lua script to visualize what is going on off-screen in NES Metroid. I've written it so that it runs on both FCEUX and BizHawk. The script works great in FCEUX, but when I run it in BizHawk, it brings the framerate down to anywhere from 20 to 40 FPS. I'm assuming it's due to the large number of calls to gui.drawBox I am making (in the ballpark of 500 to 1000 per frame); however, I'm quite surprised at the poor performance. Like I said, this exact same script works great on FCEUX (it doesn't slow the emulation down). Am I doing something wrong? Here is a link to the script: https://www.dropbox.com/s/x8jdhu3rfctfa7g/metroid-vis-v3.lua Thanks for any help!
Judge, Skilled player (1289)
Joined: 9/12/2016
Posts: 1645
Location: Italy
Do you have a single-core processor? Maybe BizHawk executes the NES core and the Lua core as separate threads, which would require at least two (maybe three) independent CPU cores to run asynchronously.
Editor, Experienced player (818)
Joined: 5/2/2015
Posts: 671
Location: France
A possible tradeoff would be to use a LuaCanvas to draw to another window. It runs faster on BizHawk, but you lose the overlay. I've made a quick example with your script here: http://tasvideos.org/userfiles/info/68775810272403449
Site Admin, Skilled player (1237)
Joined: 4/17/2010
Posts: 11275
Location: RU
Every function that involves calling the C# side of hawk from Lua will be slower than working with just Lua. In FCEUX it's fast, because FCEUX binds Lua in the most straightforward way: C++ <-> C. LuaInterface, which hawk uses, involves a freaking ton of overhead coming from reflection (because why not), and then from hawk's C# itself. If you want it to be fast in hawk, reduce the number of such calls, or create an APIHawk tool instead.
MetalMachine
He/Him
Joined: 5/22/2018
Posts: 2
Location: Phoenix, AZ, USA
Alright. I figured it was something about the implementation rather than the script itself, but wanted to make sure. Thanks for the suggestions! I'll see what I can do.
Editor, Player (182)
Joined: 4/7/2015
Posts: 330
Location: Porto Alegre, RS, Brazil
I edited the script to implement some optimizations I use in similar scripts I've made. I see three main changes you can make:
1) Avoid drawing outside the emu screen, because you won't see it anyway;
2) Avoid reading memory address by address inside for loops: read the whole memory region of interest in one call, and only when some address in that region changes;
3) Draw polygons that merge similar tiles, instead of many individual boxes.
For optimization 1), I simply check whether the tile at px, py is inside the emu screen:
Language: lua

if px + 4 > 0 and px < 256 and py + 4 > 0 and py < 232 then
  -- draw tiles
end
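The same bounds check can be wrapped in a small helper so it works for boxes of any size. This is a minimal sketch: box_on_screen, SCREEN_W and SCREEN_H are hypothetical names, the 256x232 bounds are taken from the check above, and the 4-pixel box size is an assumption based on the px + 4 term.

```lua
-- Hypothetical helper: true if a w-by-h box with its top-left corner at
-- (px, py) overlaps the visible area. 256x232 is assumed to be the visible
-- screen size, matching the check used in the script.
local SCREEN_W, SCREEN_H = 256, 232

local function box_on_screen(px, py, w, h)
  return px + w > 0 and px < SCREEN_W
     and py + h > 0 and py < SCREEN_H
end

-- In the tile loop, only call the drawing function for visible boxes:
-- if box_on_screen(px, py, 4, 4) then gui.drawBox(...) end

print(box_on_screen(10, 10, 4, 4))   -- true
print(box_on_screen(-4, 10, 4, 4))   -- false: right edge ends at x = 0
print(box_on_screen(-3, 10, 4, 4))   -- true: one pixel column is visible
print(box_on_screen(256, 10, 4, 4))  -- false: starts past the right edge
```

Every box rejected this way is one less call into the emulator's drawing API per frame.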
For optimization 2), I replaced the per-tile memory reads in the nametable loop with lookups into a previously loaded table. This table, nametable_tiles, is filled with memory.readbyterange, with a special conversion for FCEUX:
Language: lua

local nametable_tiles = memory.readbyterange(0x6000, 0x7C0)
if FCEU ~= nil then
  -- FCEUX's readbyterange returns a byte string; convert it to a table
  nametable_tiles = {string.byte(nametable_tiles, 1, 0x7C0)}
end
This table is updated only when the hash of this region changes. BizHawk already has a fast memory hash function, memory.hash_region. FCEUX doesn't, so my workaround is simply to concatenate the whole nametable into a single comma-separated string, which is unique for each nametable configuration, so I can still compare:
Language: lua

local nametable_hash, nametable_hash_prev

local function Get_nametable_hash()
  if FCEU ~= nil then
    -- concatenating the whole region serves as a hash for comparison;
    -- the "," separator keeps different byte sequences from producing the same string
    nametable_hash = table.concat({string.byte(memory.readbyterange(0x6000, 0x7C0), 1, 0x7C0)}, ",")
  elseif bizstring ~= nil then
    -- memory.hash_region is BizHawk only
    nametable_hash = memory.hash_region(0x6000, 0x7C0)
  end
end
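Why a separator matters when the concatenated string stands in for a hash: without one, different byte sequences can produce the same string, so a nametable change could go unnoticed. The small tables below are made-up values chosen only to show the collision.

```lua
-- Two different byte sequences that concatenate to the same string
-- when no separator is used.
local a = {1, 23, 4}
local b = {12, 3, 4}

print(table.concat(a) == table.concat(b))            -- true: both become "1234"
print(table.concat(a, ",") == table.concat(b, ","))  -- false: "1,23,4" vs "12,3,4"
```

Since FCEUX's memory.readbyterange already returns a byte string with one character per byte, comparing that raw string directly would also work and skips the conversion to a table.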
To check whether the hash changed, every frame I compute the current hash and compare it with the previous frame's hash:
Language: lua

Get_nametable_hash()
if nametable_hash ~= nametable_hash_prev then
  nametable_tiles = memory.readbyterange(0x6000, 0x7C0)
  if FCEU ~= nil then
    -- FCEUX's readbyterange returns a byte string; convert it to a table
    nametable_tiles = {string.byte(nametable_tiles, 1, 0x7C0)}
  end
  nametable_hash_prev = nametable_hash -- remember this hash for the next frame
end
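The "re-read only when the region changes" pattern can be checked in plain Lua by swapping the emulator calls for stand-ins. This is a sketch under that assumption: fake_ram, read_region, hash_region and update_tiles are hypothetical names, and the 8-byte region is arbitrary.

```lua
-- Stand-in for emulator memory: an 8-byte region, all zeros.
local fake_ram = {}
for i = 1, 8 do fake_ram[i] = 0 end

local reads = 0  -- how many full reads of the region actually happen

-- Stand-in for memory.readbyterange: copies the region into a new table.
local function read_region()
  reads = reads + 1
  local copy = {}
  for i = 1, #fake_ram do copy[i] = fake_ram[i] end
  return copy
end

-- Stand-in for memory.hash_region: comma-separated so it stays unambiguous.
local function hash_region()
  return table.concat(fake_ram, ",")
end

local tiles, hash_prev

-- Called once per frame: re-reads only when the hash differs from last time.
local function update_tiles()
  local hash = hash_region()
  if hash ~= hash_prev then
    tiles = read_region()
    hash_prev = hash
  end
  return tiles
end

-- Simulate five frames; the region changes only before frame 4.
for frame = 1, 5 do
  if frame == 4 then fake_ram[3] = 7 end
  update_tiles()
end

print(reads)     -- 2: once on the first frame, once after the change
print(tiles[3])  -- 7
```

In BizHawk the per-frame cost then drops to a single memory.hash_region call instead of one read per tile, which matters given the C#-side call overhead described earlier in the thread.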
I didn't do optimization 3) because I'm not sure how it's done; I never did it myself, but I know jlun or ThunderAxe did it for Wario Land, and it should be simple. Anyway, just by implementing 1) and 2), the FPS in the first room of the game jumped from 30 to 55.
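Since optimization 3) wasn't implemented here, the following is only a sketch of the general idea in plain Lua, not the Wario Land approach: merge_row is a hypothetical helper that collapses each run of identical tile values in one row into a span, so a whole span could be drawn with a single gui.drawBox call instead of one call per tile. Treating "similar" as "equal value" is an assumption.

```lua
-- Hypothetical helper: turn a row of tile values into spans of equal values.
-- Each span records its first and last column and the shared value.
local function merge_row(row)
  local spans = {}
  local start = 1
  for x = 2, #row + 1 do
    -- close the current run when the value changes or the row ends
    if x > #row or row[x] ~= row[start] then
      spans[#spans + 1] = { first = start, last = x - 1, value = row[start] }
      start = x
    end
  end
  return spans
end

local row = { 0, 0, 5, 5, 5, 0, 9 }
local spans = merge_row(row)
print(#spans)  -- 4 boxes instead of 7
for _, s in ipairs(spans) do
  print(s.first, s.last, s.value)
end
-- spans: columns 1-2 value 0, 3-5 value 5, 6-6 value 0, 7-7 value 9
```

A fuller version could also merge identical spans from consecutive rows into taller rectangles, and skip spans whose value means "empty" entirely; both cut the number of drawing calls further.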