CC: Tweaked

Use a string buffer within textutils.serialize

SquidDev opened this issue · 3 comments

commented

textutils.serialize (or rather serializeImpl) currently builds up its result via repeated string concatenation. This means its complexity is quadratic in the length of the output, which isn't fantastic.

While textutils.serialize's performance is not a serious concern, it would still be nice to improve it. The best way to do that would be replacing the repeated concatenation with a classic string buffer (a table of fragments) and a single final table.concat:

local function serializeImpl( t, tTracking, sIndent, buffer )
    local sType = type( t )
    if sType == "string" then
        buffer[#buffer + 1] = string.format( "%q", t )
    elseif -- etc...
end

function serialize( t )
    local tTracking, buffer = {}, {}
    serializeImpl( t, tTracking, "", buffer )
    return table.concat(buffer)
end

One other thing to note is that getting the length of a table with # is O(lg(n)), so it may be worth keeping a running count of the buffer's length rather than recomputing #buffer on every append. However, it's probably worth profiling that to see whether it is actually worthwhile.
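A rough sketch of what that length-caching variant might look like, with the counter threaded through as an extra parameter (the handled cases here are illustrative, not the full implementation):

```lua
-- Sketch: track the buffer length in a local counter instead of
-- recomputing #buffer (O(lg n)) on every append.
local function serializeImpl( t, tTracking, sIndent, buffer, n )
    local sType = type( t )
    if sType == "string" then
        n = n + 1
        buffer[n] = string.format( "%q", t )
    elseif sType == "number" or sType == "boolean" then
        n = n + 1
        buffer[n] = tostring( t )
    end
    return n -- caller threads the updated length through
end

local function serialize( t )
    local buffer = {}
    serializeImpl( t, {}, "", buffer, 0 )
    return table.concat( buffer )
end
```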

commented

Just had a stab at this, and was unable to find a solution which consistently ended up faster than the naive implementation. Which leaves me slightly worried, as it calls into question everything I know about Lua performance. Welp.

commented

If people want to have a look at this themselves, happy to share my benchmarking scripts and various attempts.


Edit: Just to clarify the above, there are solutions which work better in specific (arguably pathological) cases, but not in the general case.

Here's a snapshot of profiling data. Numbers are serialize calls completed in a 3 second window, so higher is better:

Strategy               Small   Deep   Wide   Long
str-concat             42426    586     18     36
cache-insert           27533   3285     13    127
str-concat-buffer      32774    503     16    221
inline-insert-nocache  32516   4510     11    178
inline-insert          31428   4607     13    169
upvalue                28732   3265     11    140
table-insert           35770   3804     11    156

For reference, str-concat is the current implementation. You can see it totally falls apart on some edge cases (very heavily nested tables or long arrays), but for the most part it's still significantly better.
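The "calls in a 3 second window" methodology above can be sketched roughly like this; the harness below is illustrative, not the actual benchmark scripts:

```lua
-- Illustrative harness: count how many times a serializer runs on a
-- fixed input before a time budget (default 3 seconds) expires.
local function bench( name, fn, input, seconds )
    seconds = seconds or 3
    local count, deadline = 0, os.clock() + seconds
    while os.clock() < deadline do
        fn( input )
        count = count + 1
    end
    print( ( "%-20s %d calls" ):format( name, count ) )
    return count
end
```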

commented

I had a little go at this and got a pretty good result.

[benchmark-results image]

As shown, the new approach is better in every benchmark.

The benchmark script and impl are available here (pastebin get Kr9BdBWJ). See line 174 for changing the benchmarks run, and the tables the benchmarks are run on.

Edit: One potential issue with the function is that it relies on some possibly implementation-specific behaviour: next has been observed to return the array-part keys first, then the hash-part keys. The serialize implementation uses this to skip over the array part by fetching the first hash key with next(value, length).
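The trick described above might look something like the sketch below (a hypothetical helper, not the pastebin code). Note the behaviour it leans on — next visiting array keys before hash keys — is not guaranteed by the Lua manual:

```lua
-- Sketch of the array-skipping trick: resume next() from the last
-- array index so only the hash-part keys remain to be visited.
-- CAUTION: relies on implementation-specific next() iteration order.
local function keys_after_array( value )
    local length, rest = #value, {}
    local k = next( value, length > 0 and length or nil )
    while k ~= nil do
        rest[#rest + 1] = k
        k = next( value, k )
    end
    return rest
end
```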