Unexpected extra time taken when import() is used with a large array in a procedure.
LadyCailinBot opened this issue · 18 comments
CMDHELPER-2811 - Reported by Tom.Gebbett
Here's an example that causes the problem:
proc('_test',
proc('_test2',
@before = nano_time()
@get = import('hdd.p.player')
@get['uzalu']['test'] = time()
msg((nano_time() - @before) / 1000000)
)
@before = nano_time()
_test2()
msg((nano_time() - @before) / 1000000)
)
This outputs two numbers, the time each part takes in milliseconds. The first is always around 0.02, since that's about how long setting the array value takes. However, the second is always at least 80, even though there is no code between these two outputs. In my experience, the two numbers should differ by 0.1 at most.
It seems that modifying a big array that has been imported in a procedure causes the procedure to take a long time to do something when it ends. I guess it's unloading the array in some way, but it's never done this in previous CH versions, afaik.
If I take the code out of _test2(), then the same thing happens at the end of _test(). I've put it in a procedure to demonstrate the problem clearly.
I should mention that the value in import('hdd.p.player') is a large array with all the player data in it. It is 194450 characters long.
Just to give some context, I use import and export to cache SQL persistence retrievals, since otherwise retrieving one key takes 50+ milliseconds rather than less than one millisecond. (I've tried playing with the persistence config and can't seem to make it cache the database.) Because this is used so often, the extra time being taken is a massive hit to my server's performance.
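Roughly, the caching layer looks something like this (just a sketch with illustrative key and proc names, not my actual code):
proc('_cached_player_data',
    # Try the in-memory copy first; import() returns null if nothing was exported yet.
    @cache = import('hdd.p.player')
    if(is_null(@cache)) {
        # Cache miss: do the slow SQL-backed retrieval once, then keep the reference in memory.
        @cache = get_value('hdd.p.player')
        export('hdd.p.player', @cache)
    }
    return(@cache)
)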
I'm not posting to get suggestions on how to handle persistence; I'm just reporting what seems like a bug.
This issue was fixed with semi-colon statements, as mentioned in the comments below, with this commit: bd725d0
It was also previously fixed in strict mode here:
69f6baa
Performance will still be worse with functions returning very large arrays without semi-colons in non-strict mode, but officially recommended solutions now exist without workarounds.
Comment by LadyCailin
This is curious. import/export doesn't serialize the array (unlike set/get_value); it just passes a tiny reference around, so I wouldn't expect that to ever take much time, much less just between the end of the procedure and the beginning of the resumed code.
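To illustrate what I mean by a reference (a minimal sketch, assuming the exported array stays shared rather than being copied):
@a = array(1, 2, 3)
export('test.ref', @a)
@b = import('test.ref')
# @a and @b should point at the same underlying array, so this mutation
# is visible through either variable, and nothing large is ever copied.
@b[0] = 99
msg(@a[0]) # expected: 99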
Comment by Tom.Gebbett
Yeah, that's what I thought it did... I have seen values much greater than 80 as well, and I would imagine it scales with array size, but I haven't tested that. It's possible that this is some sort of oddity on my end, so if you can't find anything, I'll investigate more here. As it stands, I am currently programming a new interface for dealing with persistence, which will involve fewer huge arrays, since they can cause other unforeseen consequences (though nothing I would consider a bug).
Comment by LadyCailin
I'm reproducing the issue locally, and the code is valid, so I can definitely confirm that this is a bug in CH, not your code. Returning from a proc should be near instant, regardless of what you're doing in the proc itself.
Comment by Tom.Gebbett
That's reassuring, thank you :) Good luck finding the cause.
Comment by LadyCailin
Ok, so I've figured out the issue. Turns out there is code between the two msgs at the end, because the optimizer turns the contents of the proc into a sconcat, due to auto concatenation. Of course, this is normally transparent to you, but it is in fact a known runtime waste. (The sconcat runs, then the results are ignored.) This is unfortunately necessary atm, because functions aren't smart enough to know if the result of a function is actually required to be held onto or not. So your call to @get = import('hdd.p.player') actually returns @get (because assign returns the variable it assigned, for chaining purposes). So, one solution is to use the very old, but undocumented g() function, like so:
proc(_test2,
g(
msg('Time at top of proc: '.time()) , #Note the commas here
@get = import('hdd.p.player') , #and here
@get['uzalu']['test'] = time() , #and here
msg('Time at bottom of proc: '.time())
)
)
This prevents the compiler from inserting the sconcat into the AST. The sconcat is what's taking so long, because the result of the @get = assignment is a very large array that gets concatenated into a string, and so the msg at the bottom has actually already run before the sconcat does.
Of course, the workaround I put above shouldn't be necessary. I'm going to see if I can change all procs' outer sconcats into g() in the backend, because this is one place where we can say for sure that the results of the sconcat will be ignored.
Comment by LadyCailin
Fixed in next build. Procs now know not to run sconcats at the root of the proc. Ultimately it would be nice if the compiler just didn't put in the sconcat to begin with (though even if the user inserted an sconcat there themselves, it still shouldn't be executed, because the proc will just ignore the results anyway), but adding that to the compiler will be difficult. Results from the timings are now as expected, virtually instant.
Comment by Tom.Gebbett
After testing with both versions, I can conclude that the problem is not solved. The workaround you suggested with g() did not work in either version. I did, however, find a workaround. It is as follows:
This was slow:
@val = @big_array
This was much faster, seemingly fixed completely:
if(@val = @big_array){if(@val = @big_array){1}} else {if(@val = @big_array){1}}
A very cumbersome workaround, but I should mention that using any fewer if() statements did not fix the problem. My method for the workaround was based on your g() suggestion: I figured that if I could just hide the output from the assignment, it wouldn't be used. Putting it inside a single if() statement did not fix the problem, but for some reason, nesting them like this did work. I was surprised when it did.
Luckily I just had to use this in one location in my code, which is my main array handling proc.
Currently using CH build #2439, Bukkit build #2893, Minecraft 1.6.4.
Comment by LadyCailin
Alright, I'm not sure why it's not working, I'll have to look at it again and see what's going on. I've reopened the bug for now though.
Comment by LadyCailin
I should mention that eventually, I think I'm going to solve this problem with semicolons. Basically, putting a semicolon at the end of a statement will prevent it from "bubbling up" so doing:
@array = @largeArray;
moreOperations();
will essentially cause the following optimized code:
g(assign(@array, @largeArray), moreOperations())
# as opposed to
sconcat(assign(@array, @largeArray), moreOperations())
This will then prevent the sconcat deficiencies mentioned above. Currently, semicolons are allowed but unused, so feel free to pepper them in at the end of statements as necessary. As time goes on and I make the improvements, it'll optimize automatically.
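For example, the proc from the original report could already be written like this (just a sketch; right now the semicolons are simply ignored, but they will enable the g() optimization once it's in place):
proc('_test2',
    @before = nano_time();
    @get = import('hdd.p.player');
    @get['uzalu']['test'] = time();
    msg((nano_time() - @before) / 1000000);
)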
This bug is still on my mind though, as it's clearly a large runtime hit for many scripts.
Comment by Tom.Gebbett
I have used g() before, but not in the way you're suggesting. I actually used it in trying to understand this problem, to hide the output of my persistence retrieval procedure, to make the time taken easier to see.
Luckily I won't have to use this in a bunch of different places, just the one or two procedures that deal with persistence directly.
I've often wondered what g() was originally meant for... was it not possible to put multiple functions together on a line in a very old version of CH?
Thanks for looking into this, I'll go and test the fix and then report back.
Just saw your most recent post. Thank you for fixing it.
Comment by LadyCailin
In the veeery first version I released, each statement in code was considered a parameter, so something like
if(@condition,
func1() func2()
, #else
func3()
)
was actually converted to this:
if(@condition, func1(), func2(), func3())
which, as you can imagine, caused serious problems for all but the most primitive aliases, though actually not until later on, because the very first version only had 5 functions anyway. After I started adding more features, though, it became a problem. As a very quick stopgap measure, I added the "g()" function, which stands for "group," allowing scripts to combine multiple nested statements into one, while I figured out a better way. (Adding that one function took 5 minutes, so I went ahead and did it.) Obviously I figured out the right way to do it soon after, but I kept the g function in and left it undocumented, because it has come in handy for various workarounds like this throughout the years. It's not documented, though, because there's really no point in using it; it generally only serves to make the program just that much slower, and any required usage of it is actually a bug.
There are a few other undocumented functions that are used for workarounds and testing, namely p() ("p"assthrough) and dyn() ("dyn"amic). They are in the code, and I suspect that they will return documentation if you use reflect_docs on them, if you're curious. But both of those could potentially be used for workarounds in the future if ever a need arose.
And actually, my fix is a bit more elegant than simply wrapping the proc's contents in g(): what I'm doing is ignoring the root sconcat, which is what the compiler ultimately turns code into when it needs auto concatenation. Technically it would ignore it if you hardcoded an sconcat in too (which you would probably never do normally), but this is in fact a common case, since the compiler will automatically do that for ALL procs with more than one statement in them.
Comment by Tom.Gebbett
I'm aware of p() and dyn(), though I haven't been able to figure out what they do. I've also seen __cbrace__(), __cbracket__() and __autoconcat__(). I can guess what they do to an extent.
Yes, g(), p(), and dyn() all have documentation, although p()'s doesn't give any functional information. Trying to get docs for those other three doesn't work, though, since reflect_docs() seems to be designed for procedures to have documentation in the future, so it thinks those functions are procedures because of the underscores.
I've also known about moo() for quite a while. I assume this has some sort of vital usage in debugging CH? :P
It's cool to learn about CH history, thanks for sharing.
Comment by Tom.Gebbett
I've just been able to test the workaround you suggested earlier, it does not reduce the time taken:
proc('_test',
proc('_test2',
g(
@before = nano_time() ,
@get = import('hdd.p.player') ,
@get['uzalu']['test'] = time() ,
msg((nano_time() - @before) / 1000000)
)
)
@before = nano_time()
_test2()
msg((nano_time() - @before) / 1000000)
)
Still outputs:
0.02
100
Numbers vary but are always close to those given. Did I get something wrong in implementing the workaround?
Comment by LadyCailin
Um, yes, yes, moo, very vital for testing stuff. ;p
There is no more need for a workaround with the latest versions. Is the timing not the same, with or without the workaround?
Comment by Tom.Gebbett
I have not tested the latest build; I'll do so tomorrow. The workaround did not seem to help, no. I was wondering: would simply hiding the output of the assign() fix this? Like:
if(@get = import('big_array')){null}
or even just
g(@get = import('big_array'))
Perhaps we could also include the array_set() in these?
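Something like this is what I had in mind, combining them (just a sketch):
g(
    @get = import('big_array') ,
    @get['uzalu']['test'] = time()
)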
There was a build that mentioned 1.7, but I won't be updating for a while; I assume the latest still works with 1.6.4 CraftBukkit versions?
Comment by LadyCailin
The latest should work, I think, though I haven't tested. Here's the code that I'm using for testing the time difference:
@a = array()
array_resize(@a, 10000, 'This is a large array')
export('hdd.p.player', @a)
proc('_test2'){
msg('Time at top of proc: '.time())
@get = import('hdd.p.player')
@get['uzalu']['test'] = time()
msg('Time at bottom of proc: '.time())
}
msg('Time before proc: '.time())
_test2()
msg('Time after proc: '.time())
Basically, I export a large array, then use the rest of your code. The time stamps occur just before and after calling the proc from the main code, and at the top and bottom of the proc itself. I'm not sure why the workaround isn't working for you; it was working for me before I fixed it the right way...
Comment by Tom.Gebbett
That's fine, I'm sure it'll work when I update. I haven't had much time recently, but I will report back when I can.
You're welcome to use your own names with the import/export; I'm not even using that data management system any more :D
Thanks for your attention to this problem :)