LibDeflate

LibDeflate

29k Downloads

Fastest way to detect a difference between 2 LibDeflate:EncodeForPrint strings?

jmaldon1 opened this issue ยท 14 comments

commented

I have to check the difference between a bunch of encoded for print strings because I want to see if any of them have updated. I do this at the load of my addon, so preferably I'd like this to be as fast as possible to not make my addon a burden when people load up the game.

This is what I have so far. Based on what we know about the encoded string is there anything faster we can do here?

So far if nothing has updated, I am doing an O(N) comparison on every string for what feels like no reason.

local function isEncodedStringDifferent(stringA, stringB)
    if #stringA ~= #stringB then
        -- O(1) check if lengths are different.
        return true
    end

    -- Compare the characters of each string in reverse because the back of the encoded strings change the most.
    for i = #stringA, 1, -1 do
        local charA = string.sub(stringA, i, i)
        local charB = string.sub(stringB, i, i)
        if charA ~= charB then
            return true
        end
    end
    return false
end

Are we able to safely assume that no matter how small the change that was made to the data, the last character of the encoded string will definitely be different?

commented

I don't think there exists algorithm faster than O(N)
EncodeForPrint maps every 3 bytes of original data into 4 printable characters.
Or to be precise, the encoded byte length = ceil( 8* len(input) / 6)

The generation of encoded table is just a linear traversal of the original input.
The last character of the encoded data is not special, and has no relationship to the earlier portion of the input.

commented

I would recommended to also store a hash value for the encoded string,
in your addon's configuration file.
You need to define a hash function.

When the encoded string is exported, also export its hash value along with it

commented

Or use LibDeflate:Adler32() as a hash function

commented

Are you saying to hash the encoded string on export and then instead of comparing the encoded strings, we just compare the hashes?
And if the hash function returns a fixed length output checking for changes would just technically be O(1)?

commented

Ok interesting, and what will the hash value provide for me?

And do you recommend any hashing algo?

commented

Also, compare Lua String simply by
a == b

Lua uses the following C code to compare them, which is significantly faster than Lua calls

int luaS_eqlngstr (TString *a, TString *b) {
  size_t len = a->u.lnglen;
  lua_assert(a->tt == LUA_VLNGSTR && b->tt == LUA_VLNGSTR);
  return (a == b) ||  /* same instance or... */
    ((len == b->u.lnglen) &&  /* equal length and ... */
     (memcmp(getstr(a), getstr(b), len) == 0));  /* equal contents */
}
commented

Copy from Lua source code. You can use this hash function

unsigned int luaS_hash (const char *str, size_t l, unsigned int seed) {
  unsigned int h = seed ^ cast_uint(l);
  for (; l > 0; l--)
    h ^= ((h<<5) + (h>>2) + cast_byte(str[l - 1]));
  return h;
}

unsigned int luaS_hashlongstr (TString *ts) {
  lua_assert(ts->tt == LUA_VLNGSTR);
  if (ts->extra == 0) {  /* no hash? */
    size_t len = ts->u.lnglen;
    ts->hash = luaS_hash(getstr(ts), len, ts->hash);
    ts->extra = 1;  /* now it has its hash */
  }
  return ts->hash;
}

Make sure hash function is not called when initializing your addon.

commented

Attempting to format the result of the hash results in an overflow error:
integer overflow attempting to store 246806894

with the following code:

local hash = LibDeflate:Adler32(printableEncode)
local d = string.format("0x%08X", hash)
commented

Adler32 may cause false positive due to its collision rate.

https://github.com/philanc/plc/ has some pure Lua implementation of sha1, md5, etc,
which is better.

But its implementation is for Lua 5.3. You may need to modify it to make it work for Lua 5.1

commented

I tried this CRC32 algo (https://github.com/openresty/lua-nginx-module/blob/master/t/lib/CRC32.lua) and this md5 algo (https://github.com/kikito/md5.lua/blob/master/md5.lua) but I get the same overflow error.

I am not too familiar with how WoW works with Lua but I assume there is something with WoW that is preventing this. Or maybe I am doing something wrong. But so far its not looking like printing the hex string of a hash is happening :/

commented

It seems to be a WoW specific issue.
But I don't think the code string.format("0x%08X", hash) is required for you.

commented

How do we get the result of these hash functions to be a fixed length hex so we can print it?

commented

Assume string.format does not exist. Write one for 32bit number from scratch.

commented

https://stackoverflow.com/a/66819728/9506134

Found this code that converts numbers to hex, just needs to be modified to pad the hex to a fixed length. Otherwise, it seems to work great!

Thanks for the help @SafeteeWoW. I am not familiar with Lua at all and the fact that it acts differently under WoW is something to get used too.

If I end up modifying the code to pad the hex, I'll add it here later, otherwise we can close this.

EDIT:
Turns out all you need in order to pad the result of that function is this: string.format("0x%08s", hex)