Lua Performance Optimization and New Serializer and Compressor
SafeteeWoW opened this issue ยท 9 comments
Before next expansion, I will release two standalone libraries. One serializer and one compressor, which generates significantly smaller data than the currently commonly used AceSerializer and LibCompress.
-
The planned name of those libraries:
- RCSerializer
Not sure. Should I call this RC? Less readable than AceSerializer, but produce much smaller data by string reuse and several other small improvements. I was planning to implement a special optimization targeting item string, but it's not very needed with the help of a compressor, and it's not good for an general library. - LibDeflate
(Or call this RCCompressor? )A compressor uses DEFLATE algorithm. (See RFC1951). A litle bit slower, but provides much smaller data, especially if there are lots of redundancies.
- RCSerializer
-
The perfomance of LibDeflate
- The overhead of LibDeflate is of course bigger than LibCompress. So small data suitable within one WoW message (255 bytes) including header will not be compressed, to save CPU time. (Storing 5 extra bytes as header, without actual compressing, still following DEFLATE spec)
- This library is unfinished. I haven't implement decompressor. The correctness of compressor part is checked by the help of an external decompressor.
smalltest.txt
This a reconnect data serialized by RCSerializer. The original size is 28459 bytes.
LibCompress: 37ms, 18595 bytes, 65.3% of original
LibDeflate: 73ms, 10872 bytes. 38% of original, 57% of LibCompress.
gzip -1: 40ms(including loading and IO time, so not that useful), 7539 bytes
gzip -1: 44ms(including loading and IO time, so not that useful), 5873 bytes.
There are still lots of room to improve compression ratio and speed.
-
The performance of RCSerialzer was mentioned in #116, and I have another good idea to reduce the serialized size of nested table.
{{link=A, bagged=B}, {link=C, bagged=D}} will be serialized similar to { link, bagged, {A, B}, {C, D}}
- Suggestion to the name of those libraries? (Haven't pushed the code. It's too unfinished.)
- Opinion to the performance.
- Can you provide any more test data?
If you have time, can you help to compare the performance of the following IN-GAME?
All of the following needs to be executed many times to make a difference, but there is difference
- bit.rshift(x, 8) VS (x-x%256)/256 (Dont forget to localize bit.rshift)
- bit.lshift(x, 8) VS x*256 (Don't forget to localize bit.lshift)
- bit.band(x, 255) VS x%256 (Don't forget to localize bit.band)
- table.insert(t, x) VS t[#t+1]=x. Dont forget to localize table.insert
My current observation is that the performance of function is always worse than its non-function equivalence. - string.char(x) vs byteToCharTable(x), where byteToCharTable is a pre-generated table such that byteToCharTable[i] == string.char(i), for i=0..255, Dont forget to localize string.char
https://github.com/SafeteeWoW/LibDeflate
Only compression has been implemented and it's unfinished.
I have improved the performance to 41ms (better than LibCompress).
a good wiki page for lua performance tips
https://springrts.com/wiki/Lua_Performance
I am close to finish the compressor side of the LibDeflate, and has made significant improvement.
Still Use the above file as the example https://github.com/evil-morfar/RCLootCouncil2/files/1895504/smalltest.txt
Unlike LibCompress, LibDeflate does not guarantee there is no NULL character in the compressed data if there is no NULL in the input. However, NULL in the compressed data is very rare, no more than 1%, so I am not considering the impact that I need to use extra space to encode NULL characters
The original size of the file is 28459 bytes.
LibCompress reduces this to 18595 bytes (1.53 ratio), it takes 40ms
LibDeflate currently uses the same compression algorithm as GZIP, and uses the same compress configuration. Note that LibDeflate produces slightly smaller file than GZIP, because it does not have file header that stores filename, filemodification file, etc.
Compression Levels and result. Each level outputs file size slightly smaller than gzip with the corresponding settings
April 14th
- 7498 bytes in 36ms
- 7032 bytes in 39ms
- 6758 bytes in 48ms
- 6423 bytes in 60ms
- 6072 bytes in 87ms
- 5907 bytes in 142ms
- 5851 bytes in 198ms
- 5841 bytes in 232ms
- 5841 bytes in 232ms
Update: April 20th
- 7460 bytes in 17ms
- 7026 bytes in 18ms
- 6758 bytes in 22ms
- 6418 bytes in 29ms
- 6070 bytes in 37ms
- 5906 bytes in 51ms
- 5852 bytes in 69ms
- 5841 bytes in 81ms
- 5841 bytes in 81ms
Another example is https://gist.github.com/evil-morfar/b1ee4aadc17ba07024df2014b2d46194 mentioned in #125
The original size is 68831 bytes.
LibCompress reduces this to 33955 bytes in 90ms
Comparing this to LibDeflate in different compression settings.
April 14th
- 10664 bytes in 57ms
- 9993 bytes in 62ms
- 9765 bytes in 73ms
- 8717 bytes in 104ms
- 8090 bytes in 137ms
- 7670 bytes in 214ms
- 7560 bytes in 281ms
- 7524 bytes in 377ms
- 7524 bytes in 381ms
Update April 20th - 10684 bytes in 33ms
- 9938 bytes in 34ms
- 9781 bytes in 38ms
- 8704 bytes in 56ms
- 8072 bytes in 67ms
- 7662 bytes in 92ms
- 7554 bytes in 113s
- 7518 bytes in 148ms
- 7522 bytes in 149ms
Update April 22th:
In order to reduce memory usage, when the string size is > 64K, additional operation to free memory will be done, and the time of the above file increases by ~60ms.
It is significant improvement considering WoW only has 1KB/s addon channel bandwidth.
Now consider a more realistic example.
The following table contains all item strings equipped by a player, plus some additional meta data.
Encoded data by current RCSerializer: itemStrings.txt
local t = {
p=true,
ilvl=987.65,
g= {
"147013:5890:::::::110:105::6:3:3563:1517:3336",
"134497:5890:151584::::::110:105::35:4:3418:1808:1587:3337",
"137535:5890:::::::110:105::16:3:3418:1567:3528",
"147014:5437:::::::110:105::5:3:3562:1517:3337",
"142207:5439:::::::110:105::35:3:3418:1562:3337",
"152020::::::::110:105::3:3:3610:1472:3528",
"147040::::::::110:105::6:3:3563:1517:3336",
"152021::::::::110:105::5:3:3611:1487:3528",
"137361::::::::110:105::35:4:3418:42:1587:3337",
"151986::::::::110:105::5:3:3611:1487:3528",
"132366::::::::110:105:::3:3529:3459:3570",
"147039::::::::110:105::3:3:3561:1512:3337",
"137361::151585::::::110:105::35:4:3418:1808:1597:3337",
"132366::::::::110:105:::3:3529:3459:3570",
"147039::::::::110:105::6:3:3563:1522:3336",
"132864::::::::110:105:::2:1811:3630",
"152006::::::::110:105::5:3:3611:1487:3528", },
}
The origin size is 738 bytes.
All following time is measured by repeating the compress function 100 times, out of game.
LibCompress reduces this to 366 bytes in 1.46ms
Comparing this to LibDeflate in different compression settings.
April 14th
265 bytes in 1.8ms
260 bytes in 1.9ms
257 bytes in 1.8ms
249 bytes in 2.1ms
243 bytes in 2.5ms
239 bytes in 2.8ms
238 bytes in 3.3ms
238 bytes in 3.3ms
238 bytes in 3.7ms
Update Apriil 20th
- 265 bytes in 0.96ms
- 260 bytes in 0.98ms
- 257 bytes in 1.0ms
- 249 bytes in 1.18ms
- 243 bytes in 1.27ms
- 239 bytes in 1.3ms
- 238 bytes in 1.46ms
- 238 bytes in 1.53ms
- 238 bytes in 1.53ms
Do you think it's worthwhile to transmit all equipped gear if it doesn't cost much bandwidth (Fit within one SendAddonMessage API)?
I think this is useful information for people. I am implementing compression settings with higher compression ratio.
Further improve the compression ratio would need to define a preset dictionary, but it is a much harder problem.
The compression part is now functional.
Can you try the performance in-game with different compression level (1-9)?
There is only 1 file in this library "LibDeflate.lua" and currently one API LibDeflate:Compress(str, level)
https://github.com/SafeteeWoW/LibDeflate
In RCv3, I want to have the following workflow. A good habit for programmers.
- In
.git/hooks/pre_commit
, (Note that this file is local and is NOT in the Git source control.) store the following script
#!/bin/sh
luacheck -g -u . c || exit -1 # static analysis without checking global and unused related stuffs.
# Also run a test script?
This will do lua static analysis and linter check before all commits, ignoring all global and unused variables warnings. If any warnings or errors presented, such as trailing whitespaces, commit will fail.
What is git hooks: https://git-scm.com/book/en/v2/Customizing-Git-Git-Hooks
LuaCheck download: https://github.com/mpeterv/luacheck/releases
LuaCheck documentation: http://luacheck.readthedocs.io/en/stable/cli.html#command-line-options
-
Atom Packages I have installed.
- Linter: base linter package
- language-lua: For lua API autocomplete
- language-lua-wow: To highlight WoW keywords.
- linter-luacheck: To show luacheck result on the fly. You need to set up a
.luacheckrc
in the project folder to define what global variable are available and to filter some warnings. I have made one in LibDeflate repo. I will add more WoW globals in the future. See doc in http://luacheck.readthedocs.io/en/stable/config.html - linter-lua-findglobals: Similar to above, but I don't define a whitelist for globals, thus all global variables used will be warned by this. We should ensure global variables only appears at the beginning of the file. (Linter-luacheck is to ensure we never use a global that does not exist. Linter-lua-findglobals helps us to localized global variables.). In the setting of linter-lua-findglobals, enable GETGLOBALFILE, GETGLOBALFUNC, SETGLOBALFILE, SETGLOBALFUNC, then set the error level of GETGLOBAL and SETGLOBAL to "info", so it looks different with linter-luacheck
- Script: Run Lua script.
-
Unit Tests
LuaUnit is the better Unit Test Framework so far: https://github.com/bluebird75/luaunit -
Continuous integration (Travis CI and Appveyor)
I have learned Travis CI and Appveyor in few days. They run test for every push I made. It is so convenient and fun. I have tested my partial working LibDeflate in Lua5.1/5.2/5.3, LuaJIT2.0/2.1 (5 interpreters) * (Linux/OSX/Windows, 3 operating systems) = 15 environments!
LibDeflate is now actually working.
There are two APIs currently.
LibDeflate:Compress(str, level) -- Level is a number indicating the compression level. 1 is fastest, 9 is the slowest.
LIbDeflate:Decompress(str)
Can you try it out?
- Is there any compress/decompress bug? That is. Error duing compression, or decompression result does not match the origin string. If you have seen any, please record the input string and I'll fix the bug and add the string to the test case. There is no hard cap on the length of the input string, but for the performance reasoning, < 200KB is recommended.
- Does this Library cause noticeable FPS drop?
- Do you think if it is okay that this Lib costs 2.5MB memory when nothing is happening? I am using a big table to speed things up.
Note that this library is still development status, and the current decompression speed is very slow, because I have just made it working and haven't apply any optimization. Also the initial memory usage is very high. I'll fix it later.
Hope you can get some free time recently. (I don't have much either. Developing this at midnight).
I'm sorry, I've been really stomped at work and with exams lately. I might get some time during this week. Next week I'm on vacation, and hopefully after that I'll have a few weeks with more time for RCLootCouncil.
Welcome back. Help me to test out LibDeflate?
https://github.com/safeteeWow/LibDeflate
Really spend me some time to make this project. Thanks.
Actually, I haven't tested once in-game, but it is really tested very extensively out of game.