Performance Issues

KR33PYK1NG opened this issue · 24 comments

commented

Issue description:

Ever since I opened my public server with Mekanism installed (a couple of months ago), I've been struggling with performance issues on the Mekanism side.
The two main bottlenecks I want to point out are tile-entity ticking (takes 30-50%) and network ticking (takes around 20%).
Yesterday, after seeing commit 5fa4cf1, I grew hopeful about updating to the 10.1 dev branch. After doing so, I gathered 8-hour-long sampling results and - unfortunately - found that nothing has changed in terms of bottlenecks: same 30-50% for tile entities, same 15-20% for networks.
So, from what I've seen, Mekanism always takes up ~50-70% of server processing time.
Updating Forge or Mekanism did nothing.
I also want to make it clear that I've never had any similar issues with other mods, and Mekanism tile entities/networks on player bases don't significantly surpass other mods' mechanisms in count (Refined Storage/Silent Mechanisms/Industrial Foregoing/etc.).

Versions:

Forge: 36.0.42 up to 36.1.13
Mekanism: 10.0.19.446 up to 10.1 dev branch (commit 07fa2a9)

Sampling reports:

https://spark.lucko.me/SsjnPXvuB1 - Most recent report, using 10.1 dev branch commit 07fa2a9
https://spark.lucko.me/pmtcYGKArI - using Mekanism 10.0.21.448
https://spark.lucko.me/ilj4QGOqtu - ~24h long report, using Mekanism 10.0.21.448

commented

Closing this as "fixed" for now, given how many improvements there have been; if there are more performance issues, they should be opened as a new issue.

commented

Profilers tend to blame the tick method of our base class a lot, which doesn't do much by itself (no heavy looping, anyway).

public void tick() {
    for (ITileComponent component : components) {
        component.tick();
    }
    if (isRemote()) {
        if (hasSound()) {
            updateSound();
        }
        onUpdateClient();
    } else {
        if (isActivatable()) {
            if (updateDelay > 0) {
                updateDelay--;
                if (updateDelay == 0 && getClientActive() != currentActive) {
                    //If it doesn't match and we are done with the delay period, then update it
                    level.setBlockAndUpdate(worldPosition, Attribute.setActive(getBlockState(), currentActive));
                }
            }
        }
        onUpdateServer();
        if (persists(SubstanceType.HEAT)) {
            // update heat after server tick as we now have simulate changes
            // we use persists, as only one reference should update
            updateHeatCapacitors(null);
        }
        lastEnergyReceived = FloatingLong.ZERO;
    }
    ticker++;
    if (supportsRedstone()) {
        redstoneLastTick = redstone;
    }
}

If you can work out where that 10% self-time is spent, that would really help. The 30% as a whole is not really relevant, as it's just a sum of the other bits that get called from all of our Tiles.

commented

Profilers tend to blame the tick method of our base class a lot, which doesn't do much by itself (no heavy looping, anyway).
If you can work out where that 10% self-time is spent, that would really help.

I also noticed that extra 10% on base tick. Sure, I'll look into it further.

The 30% as a whole is not really relevant, as it's just a sum of the other bits that get called from all of our Tiles.

It's somewhere around 40%, actually. Don't forget about the transmitters, which are outside of the TileEntityMekanism.tick call.
What concerns me is that the sum of all non-Mekanism tiles doesn't even come close to Mekanism's 40%.

commented

The 30% refers to just the tick method; I wasn't talking about the complete Mekanism footprint.

Transmitters are also affected by the actions other blocks take on receipt of the contents, and drilling down on them doesn't show anything significant that I could see.

commented

It may also help to get a rough count of how many of the various Mekanism blocks you have. There is also this, which has some of my ideas for how to potentially improve things further that I am planning for 10.1. Though thinking about it, maybe an even better solution would be to lazily mark the comparators as dirty: just set a boolean when contents change, check it in the tick, and only then see about updating comparators. While that may make the tick call slightly more expensive, in a lot of the places where contents change it should be quite a bit less expensive, and I think it may even reduce the overall cost of the ticks. I am currently working on moving some stuff to our API (mostly just have to finish documenting it), but afterwards I can at least make the simpler of the two improvements I have in mind, so we can see if/how much that improves performance for you. (Even though it may not have quite as much of an effect as when I get around to implementing what I have in the listed TODO.)
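Roughly, the lazy-dirty idea would look something like the sketch below; the names here (comparatorDirty, markComparatorDirty, updateComparatorOutputs) are just placeholders for illustration, not actual Mekanism API:

public abstract class LazyComparatorTile {

    // Set by the many places where contents change; checked/cleared once per tick.
    private boolean comparatorDirty;

    // Cheap call for content-change sites: just flag that comparators need refreshing.
    protected void markComparatorDirty() {
        comparatorDirty = true;
    }

    // Called once per tick; only pays the comparator/neighbor update cost if something changed.
    protected void tickComparators() {
        if (comparatorDirty) {
            comparatorDirty = false;
            updateComparatorOutputs();
        }
    }

    // The relatively expensive part: notifying comparators/neighbors of the new output.
    protected abstract void updateComparatorOutputs();
}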

commented

While it probably won't solve everything, I would be at least somewhat interested in seeing if things are better with 2d438e3; maybe it will clean up the profiling enough to see what is actually going on.

commented

Most (~8%) of the 10% overhead in TileEntityMekanism.tick is caused by this loop

The profiler shouldn't be collapsing that to self time then, as the actual work is contained in the separate method. Looping itself should be trivial (most tiles only have one component).

commented

Most (~8%) of the 10% overhead in TileEntityMekanism.tick is caused by this loop

The profiler shouldn't be collapsing that to self time then, as the actual work is contained in the separate method.

If I understand you correctly, then yes - the actual component-related work is indeed contained in a separate method (component.tick), but the 'for' loop itself (not including the component.tick call) still remains in TileEntityMekanism.tick.
So, in Spark, time spent in this loop contributes to TileEntityMekanism.tick self time, while component.tick is separated into its own method as expected.
I can provide another Spark report demonstrating what I've just described if you want to see it yourself.

commented

Are the rough component amounts per tile or per type? I wouldn't expect numbers anywhere near that high per tile, and if they are per tile I will look into whether there is some issue causing components to be added many times. If they are per type, then dividing the rough amounts you give by how many of each tile there are seems a lot more reasonable.

commented

Are the rough component amounts per tile or per type? I wouldn't expect numbers anywhere near that high per tile, and if they are per tile I will look into whether there is some issue causing components to be added many times. If they are per type, then dividing the rough amounts you give by how many of each tile there are seems a lot more reasonable.

Damn, you're fast! I deleted my reply because I was concerned I had messed up the debug output related to components somehow.
I will repost the comment in a few minutes once I've made sure the output is correct.

commented

This is a repost: below is the original deleted comment. I have verified that the values provided in the report are indeed correct.
To clarify, everything in the rough amounts report is counted per class, not per tile.


Came back with news:

It may also help to get a rough count of how many of the various Mekanism blocks you have.

  1. Here is the rough amounts report for all networks/tiles/components that are being ticked: click

If you can work out where that 10% self-time is spent, that would really help.

  2. Most (~8%) of the 10% overhead in TileEntityMekanism.tick is caused by this loop:

        for (ITileComponent component : components) {
            component.tick();
        }

     The loop itself is heavy because of the high component amounts (kind of confirmed by the rough amounts report above).
     The remaining 2% is probably just the sum of time spent on the base tick (excluding the component loop) across all tiles, which is believable.

While it probably won't solve everything, I would be at least somewhat interested in seeing if things are better with 2d438e3; maybe it will clean up the profiling enough to see what is actually going on.

I will try running with commit 2d438e3 tomorrow and will provide feedback.

commented

Please do; the time spent on the loop itself should be nanoseconds.

commented

Please do; the time spent on the loop itself should be nanoseconds.

Here it is. I moved the loop into a new separate method named tickComponents.
In this report you can see that TileEntityMekanism.tick self time dropped to ~2%, while tickComponents self time is ~8%, which visualizes what I was describing earlier.
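For reference, the extraction was purely mechanical and looked roughly like this (tickComponents is just my own name for the new method; the loop body is unchanged from TileEntityMekanism.tick):

public void tick() {
    tickComponents();
    // ... rest of the original tick() body unchanged ...
}

private void tickComponents() {
    for (ITileComponent component : components) {
        component.tick();
    }
}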

I will try running with commit 2d438e3 tomorrow and will provide feedback.

Results.

commented

For experimentation, can you try f639d3c?

Sure thing; Results with commit f639d3c.

commented

For experimentation, can you try f639d3c?

This would affect the transmitters' tick event timing.

commented

I believe aa77451 might end up improving things a fair bit, given how much you currently have going on in your world, so it may be worth testing when convenient. One thing to note is that it doesn't include thiakil's changes above yet, as I think there are a few bugs in his changes currently that I am discussing with him on Discord.

commented

I think we're reaching the end of Spark's usefulness - one thing it does not appear to track is how many times something got called; it only gives total times, which will eventually add up just because you have a bunch of Mekanism stuff in your world (e.g. that thing about the loop).

You have 26,694 Mekanism TileEntities based on your component stats. To maintain 20 TPS, each tile would need to take only about 0.0018 milliseconds to tick. Some of those shouldn't be doing much at all, but others can't keep that up. That's not even taking into account other mods' blocks.
Also, at those time scales, profilers/samplers are not accurate; VisualVM only goes down to 20 ms resolution.
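As a back-of-the-envelope check of that number (purely illustrative, using the 26,694 tile count from the component stats above):

public class TickBudget {
    public static void main(String[] args) {
        double tickBudgetMs = 1000.0 / 20; // 20 TPS leaves a 50 ms budget per server tick
        int mekanismTiles = 26_694;        // tile count taken from the component stats above
        double perTileMs = tickBudgetMs / mekanismTiles;
        System.out.printf("per-tile budget: %.4f ms%n", perTileMs); // ~0.0019 ms
    }
}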

To put it another way, take the example of repeatedly going to your fridge - you can move it closer, but eventually you can't move it closer and have to realise you should be reducing the number of times you go to it to grab something.

If you have physical/shell access to the server, running VisualVM's sampler (not profiler) instead and exporting a snapshot file will help.

commented

If you have physical/shell access to the server, running VisualVM's sampler (not profiler) instead and exporting a snapshot file will help.

Yes, I can do that. I just hope my continuous feedback is being somewhat useful.
Are you sure you want to see a VisualVM sampling report instead of a profiling one, though? The sampler does not track the number of calls; the profiler does.
Also, I'd like to propose moving our conversation to Discord so we can stay in touch more easily and so I can share my VisualVM snapshots, since they could contain more sensitive information than the Spark ones.

commented

image

Erm, what sensitive data do you think it would contain? The snapshots don't contain memory dumps (that I know of).

commented

Huh, interesting, you proved me wrong on both points related to VisualVM.
As for the sensitive info - I thought VisualVM might include OS/JVM details alongside.
I guess I must have mixed it up with another profiler.
Give me some time and I'll return with a sampling snapshot, then.

commented

I've added some more things on https://github.com/mekanism/Mekanism/tree/v10.1-experimental which might help your server's performance.

These changes seem promising.
I'll try them out and report back, as usual.

commented

Are you sure you want to see a VisualVM sampling report instead of a profiling one, though? The sampler does not track the number of calls; the profiler does.

image

It turns out the "Hits" column is hidden by default, and I didn't even know it existed until I saw your screenshot and googled it. That's what was bugging me.
Anyway, here's the desired VisualVM snapshot and accompanying Spark report, both running commit aa77451.

commented

I've added some more things on https://github.com/mekanism/Mekanism/tree/v10.1-experimental which might help your server's performance.
The bug(s) that pupnewfster mentioned are still there, though.

NB: VisualVM doesn't default to the best sampling resolution; it can be adjusted like so:
image

commented

I'll try them out and report back, as usual.

Performance results with commit b722c8d; sampling frequency set to 20 ms.