Lua internals
Alright, lets start from some info about Lua internals.
As for many other script languages, each mention of some variable in some Lua expression is actually taking time to get its value. For example, if I write following code:
Language: lua
for i = 0, 10 do
print(i)
end
It takes some time to get value of
i and pass it into print. More important that here
print is also variable. It's so called
global variable. Here it's storing API function provided by environment. So, getting
print function is also taking time. Again, more important, that taking here value of
print is much slower than taking value of
i. Because
i here is
local variable.
Local variables is working faster than global variables, mostly because local variables doesn't have name in runtime, instead of that, they have indexes. And all actions "take local variable value" is just getting value by index, which is fast. On the other hand, global variables are stored in
_G table. It's global table for all global variables. And getting some global variable is essentially taking value from table by its name.
So following code is same:
Language: lua
_G["print"]("qwe")
print("qwe")
And because indexing by string is more harder task, including hash calculations if it's using hashtables as implementation, or other stuff - it works much slower than just getting value by index. This means that this stupid-looking code is actually works faster:
Language: lua
local print = print -- store global value into local
for i = 0, 10 do print(i) end -- use local variable
Now, a bit about
print itself. As I said, it's API function provided by environment.
Each time you call environment function, Lua doing following:
Lua->API->Environment
1) Lua calls API function provided by environment
2) API function pulling some info about: arguments count, their types
3) API function does some sanity checks about call correctness
4) API function does actual work with environment internals
Lua<-API<-Environment
5) API function is taking results from environment
6) API function is pushing results to Lua
7) Lua continues execution of Lua code
So, basically each environment API call consists of two translations: from Lua to Environment and backwards. Even though you can't avoid calling API, you may try to reduce count of calls, to reduce unneccessary calls to improve speed.
Case
We want to draw collision data in 2D platformer. In this post I'll talk mostly about this case.
What do I mean by "collision data"? Simple: if some object is changing behavior when it's touched <- it's "collision data". As for now, I'll talk only about tile based 16x16 pixel collision data.
Alright. We have 16x16 collision blocks, they have pixels, each value of pixel may have different meaning, like: ground, ceil, wall, ladder, and so on. It's very handy to draw overlay over game screen where each collision pixel is colored by its color. But how code would look like? It would be something like this:
Language: lua
for x = ?, ? do -- here ? is some expression calculating bounds for x
for y = ?, ? do -- here ? is some expression calculating bounds for y
local tile = get_collision_tile(x, y) -- to avoid finding corresponding tile 16*16=256 times
for i = 0, 15 do -- 16 by y
for j = 0, 15 do -- 16 by x
local v = get_collision_data(tile, i, j) -- get collision pixel
if colors[v] ~= nil then -- if it has color, then you want to draw here
gui.drawPixel(x+i-scrollx, y+j-scrolly, colors[v]) -- scrollx, scrolly is camera position
end
end
end
end
end
If
colors[v] is
nil this means that
v value doesn't affect behavior at all, in other words: nothing to draw. For example you should have
v for air, and you'll set
colors[air] =
nil, to avoid draw every space where you can walk freely.
(Demolition Man for Sega Genesis)
This code above is actually already optimized. It'll only translate values to colors and position relative to tile into global position. Nothing to optimize, right? No!
First optimization that is often done is: process only tiles that is on screen. It's done by more carefully cooking ranges of first two for loops. Example is something like that:
Language: lua
local left,right,top,bottom = scrollx,scrollx+320,scrolly,scrolly+224
for y = max(0,floor(top/256)), min(hblocks-1,floor((bottom-1)/256)) do
for x = max(0,floor(left/512)), min(wblocks-1,floor((right-1)/512)) do
I don't want to explain it in details, but idea is simple.
x and
y should take values of block if and only if some part of it is visible on the screen. For 320x224 screen as in my case it will process at most 21x15=315 blocks. Also, for each tile we can calculate
sx,
sy =
x-
scrollx,
y-
scrolly and use them in
gui.drawPixel call reducing few operations: get value of
scrollx,
scrolly, and two subtractions.
Now, what is next? Next is what is all lua internals part was about. What API environment function do we see here in code above? It's
gui.drawPixel. It should be something about it. Nope. (actually yes but... later)
We are calling
get_collision_tile and
get_collision_data. But wait, it's not API environment functions. It's our custom made functions for reading game data. Yep. But what it has inside? It has multiple calls
memory.read_something calls, each of them is environment API call. First, you should make local variable for them, to avoid looking for them in global table
_G every time you want to read something from memory.
Time to remind our goal: we want to reduce count of API calls. Here API calls are
memory.read_something. How can we get rid of them? Answer is simple: read level only once, and store its whole structure on Lua side! But how can we do that? Just each time when you want to draw collision overlay, check for current level. If it changed, forget all data about previous level, and load whole level into Lua again. It'll require time only when you change level. After that, code of
get_collision_tile and
get_collision_data should not have any API calls.
After you did that, you may think: wait, why do we always convert collision pixel value(
v) into its color? (
colors[v]) Yes, now we can "bake in" collision color instead of collision value straight into collision tile data!
What's next? You may notice that many of collision tiles are just empty. Full of air. (wait, you said empty!) So, you don't want to spend time on those tiles. You may make some table of empty tiles and before processing 16x16 pixels check is it in list of empty. If it is, then skip processing. Note: when I say "check in list" - I don't actually mean that you should have it as some kind of list. Best way to implement that is just one big table, and if value at index of tile is true then it is empty. So, code is just
Language: lua
if empty_tiles[tile_id] then continue end
You may came up with some similar ideas, like store table of full_somevalue tiles, and so on...
Well, it's way long introduction to actual new approach.
Baking time!
What if we would be able to make list of calls for each collision tile? For example, if it has 16 non-empty pixels, why not to store function with its arguments into list, and then just execute it? Fortunately, we able to do that! Like this:
Language: lua
calls_list = {
{gui.drawPixel, 5, 7, 0xFFFF00FF},
{gui.drawPixel, 10, 10, 0xFF00FFFF}
}
for i = 1, #calls_list do
local b = calls_list[i] -- single list entry
b[1](x + b[2], y + b[3], b[4]) -- where x, y is shift to align with camera position.
end
Using this technique, you may bake all collision tiles into lists of calls, during level loading on Lua side.
Now we have solution for making and running call lists. How about optimizing call lists?
What else do we have besides
gui.drawPixel? We have
gui.drawLine! Can we utilize it? Of course we can!
You may think of
gui.drawLine as batch of
gui.drawPixel with same color. So, if we able to split collision pixels into bunch of lines, then we can draw them instead. It will reduce count of API calls and thus, it'll improve performance of script.
How can we do that? For example like this. If color of previous line is same as next pixel, and there is no gap in between - just increase length of line. Here is a picture:
https://i.imgur.com/k7Xcw9e.gifv
Also, good implementations of
drawLine method is doing sanity checks only in the beginning, and drawing all pixels straight ahead. This is done using so called "clipping". Original line reduced to segment of it that is on screen, and only pixels of that segment is efficiently enumerated, without checks for leaving bounds. On the other hand, each call of
drawPixel does check weather position is on screen or out of bounds. But
drawLine does it once for the whole line.
Next, you may think of
gui.drawBox as batch run of
gui.drawLine. Can we utilize it? Of course we can! Similarly to lines, we will look for previous line, and if it has appropriate starting and ending position, then we will convert it into box. Here is a picture:
https://i.imgur.com/PxeTKei.gifv
Similarly, good implementation of
drawBox does coordinates check only once. It's done by "clipping" - transforming coords passed to the method into new coords - part of box that is on screen. And no additional checks required during fill.
Now we have a small trouble.
gui.drawPixel,
gui.drawLine,
gui.drawBox have different arguments. To handle variable count of arguments for functions, you would need some workarounds. You may want to use
unpack function, but you need to add camera shift to coordinates, so it's not an elegant way. You may check vs function that you want to call, and then call it as intended. You may make supplementary functions that use always four coordinates and then calling API function, but this will lead to unnecessary addition for coords that is not used.
gui.drawPixel for example. Also,
gui.drawBox has two colors: fill and outline.
Checks like that, intermediate functions, and other stuff may reduce gain. Perhaps we can "bake" even more?
We need to bake more!
Even though Lua is interpreter, each time you include some file using
require, it's actually compiling code that is loaded in some kind of intermediate language code. This step is required to have decent performance. Otherwise you'll end up spending most of the time parsing each line over and over again.
This means, in theory we can generate .lua files and then load them using
require, with all tiles recompiled into functions, and then simply call that functions. But in practice, there is better function. There is
loadstring. It is loading code pretty much like
require does, but is loading code straight from the string! Also, it doesn't check "was it loaded already?" which is annoying part if you would use
require because you would have to always generate new name for generated file.
That's it! Just generate Lua code for each collision tile, and then simply call it like this:
Language: lua
cdata[idx1](sx,sy) -- shift x, shift y to align with camera
And, to make functions work fast, you should make local variables for functions, like this:
Language: lua
local gui_box, gui_line, gui_pixel = gui.drawBox, gui.drawLine, gui.drawPixel
So, for the following picture:
https://i.imgur.com/4kmJbBK.gifv
Code is following:
Language: lua
local gui_box, gui_line, gui_pixel = gui.drawBox, gui.drawLine, gui.drawPixel
return function (x,y)
gui_pixel(x+0,y+0,16777113)
gui_box(x+1,y+0,x+2,y+1,16777113,16777113)
gui_pixel(x+2,y+2,16777113)
gui_box(x+3,y+0,x+4,y+3,16777113,16777113)
gui_pixel(x+4,y+4,16777113)
gui_box(x+5,y+0,x+6,y+5,16777113,16777113)
gui_pixel(x+6,y+6,16777113)
gui_box(x+7,y+0,x+8,y+7,16777113,16777113)
gui_pixel(x+8,y+8,16777113)
gui_box(x+9,y+0,x+10,y+9,16777113,16777113)
gui_pixel(x+10,y+10,16777113)
gui_box(x+11,y+0,x+12,y+11,16777113,16777113)
gui_pixel(x+12,y+12,16777113)
gui_box(x+13,y+0,x+14,y+13,16777113,16777113)
gui_pixel(x+14,y+14,16777113)
gui_line(x+15,y+0,x+15,y+15,16777113)
end
So, whole problem is reduced to following task: how to split picture into minimum count of axis aligned rectangular areas of same color?
Depending on your solution of this task, you may gain more speed.
In the end, if you have way to draw pictures, you may "bake" each collision tile into a picture, and then draw it instead. But solution discussed above works even if you can't draw picture, or if it is slow.
Ahhh. Here is part of code that is actually implementation of algorithm above:
https://gist.github.com/realmonster/93090d4ef0b40c1b88d62f8840fad675/5310f3bf8463ae6e0a766e4608f268fd28762184#file-demolition-man-lua-L107
And, according to pictures above, atm it has some bug with building blocks not as intended. Anyway, it works.