Botany Pots

Botany Pots

33M Downloads

Discussion: Aggressive optimization of pot ticking

vadcx opened this issue · 3 comments

commented

I have looked at and read the pot TE code with the love it deserves and noticed great optimization efforts in the changelogs. Looks like it's still not enough, because playing on a public server, our admins identified the mass usage of BotanyPots as one of the contributing problems to low TPS. I had over 64 pots to be precise, doesn't seem too much to me when you want some of that growth product and fast. However the rulers have spoken and mandated a maximum of 10 pots per player. 1.19.2-9.0.31 btw.

What optimization approaches can be done? At a first glance (without a profiler yet)...

1. Timer-based

It would seem the permanent fix is to convert the growth from tick-based to timer-based. This would decouple the progress from the current tickrate and reduce the load to nearly zero.

I cannot find the project where this has been successfully done for furnaces (though these two links are a good overview of what exists out there). The server-side patch for a Vanilla (Spigot) server converted all furnace ticking from the regular "tick on each world tick" approach to be "timer-based" instead.

This is a huge architectural change and I'm not sure this can be applied to seeds and plants at all, if they can potentially have custom growth logic triggered on any given tick? If only Minecraft had been designed with this architecture in mind from the very beginning...

2. Batching of ticks

Failing that method, I suppose one could skip ticking on every world simulation tick, to accumulate ticks and execute them all at once ("timer coalescing" is the keyword). For example, process 5 ticks at once on every 5th TE tick (twice a second). This would allow each TE, while it's being ticked, to hit the hot code path for all but the very first tick out of the batch. In other words, utilize CPU caches and branch prediction.

One should be mindful of caches etc. when benchmarking the pots for performance. An empty creative world wouldn't have any other TEs ticking other than the Botany Pots you've setup for the test. To simulate a real environment you would need to introduce some noise by adding other ticking Tile Entities.

3. Profiling-guided and micro-optimizations

Do not change the architecture but try to get the most out of the current code. I don't expect much to be gained here, else I wouldn't have started this discussion to better understand the internals. Of course there's no way ahead without a profiler.

iirc, reducing multiple reads of this.getLevel() is one such micro-optimization example, because the JVM is not allowed to do this on its own. Yeah, the article deals with concurrency, but the relevant part is the explanation of JLS and JMM. But don't quote me, I haven't looked at the compiled assembly and it has been a while since I read it.

if (this.getLevel() != null && !this.getLevel().isClientSide) {

4. Reworking Minecraft's (tile) entity ticking

I simply present it here as an idea, that's too big in scope to implement. It's really an overhaul of the ticking loop (don't forget to ensure consistency with not ticking entities and unloaded chunks).

I do not know in what order the entities are ticked, but it's a common performance issue when regarding OOP vs data-oriented programming. While we can't escape the fact of each entity being it's own encapsulated object, maybe the performance can be helped by ticking the entities of one type together.

Suppose that currently the entities are ticked in a random order. The change would sort entities of one type and tick:

  1. All Zombies
  2. All Creepers
  3. All Furnaces
  4. All Pots

The benefits? Always hitting caches for code, but not data. If the current order is random then neither is hit.


I would like to know if (1) is technically possible and what you think of (2), the more likely approach. Of course the real work will begin once armed with a reproducible benchmark map and a profiler.

Thanks for reading, I hope your tea (or coffee) isn't cold yet :)

commented

Hey, thanks for opening this issue. In my experience server side lag is often misattributed to Botany Pots. This is often due to the connected pipe mods being poorly optimized, or client side lag being misinterpreted as server side lag. While I am open to further optimizations of the code I can not act on speculation or hypothesis without something more concrete to back it up.

Mojang has already implemented aspects of your first two suggestions into the game however the performance gains are marginal. A lot of the performance gains are lost elsewhere as the CPU still needs to do the work to calculate and update the states of the blocks. There is also a known phenomena where task scheduling and timer coalescing can counter intuitively produce less stable results as computationally heavy tasks cluster together.

Your third suggestion is the primary way I have been optimizing the code. The performance gains here can be pretty massive as there are often unnecessary and unintentionally expensive calls being made. In the past a major optimization was only looking up crop and soil definitions when the crop/soil item stack had changed. While removing duplicate reads of this.getLevel() could be seen as a micro-optimization, the impact of such code is so negligible it does not register in the profiler I use. There is still some potential for gains here though. Duplicate reads in item/soil slot lookups causes Mojang's code to validate the slot index bounds twice, and BlockEntityBotanyPot#isHopper could be avoided entirely.

commented

Hi, I took an inexcusably long amount of time to study all the popular optimization mods (list) out there, I've become something of an eksperd myself now :)

In my experience server side lag is often misattributed to Botany Pots. This is often due to the connected pipe mods being poorly optimized

I know that feel :) I was going to take it into account for benchmarking, by using hopper pots into trash cans. Afaik my admins' actions were supported by a mod that shows individual TE ticking times, so the pipes I was using (Mekanism) shouldn't have wound up in the pot timing or did it?

I can not act on speculation or hypothesis without something more concrete to back it up.

Fair, fair.

Mojang has already implemented aspects of your first two suggestions into the game however the performance gains are marginal.

I found no mentions of this anywhere.

Example: the furnace recipe lookup optimization (FastFurnace) still hooks the entity ticking for its purpose. If the furnace didn't tick each tick (→ timer-based) then this optimization would've been useless. If the ticks were coalesced, the difference wouldn't be as impressive. Code, Profiler before, after. The author says there were 9 furnaces for the test, 13k calls gives us 1444 ticks or 72s for test duration. The screenshot is from 2018, 1.12-1.13.

Looking at the wiki changelog pages for 1.13-1.19.4, searching for (optimiz, entit) yielded nothing of relevance.

"1. Timer-based" is still not a breaking change if done carefully. "2. Batching of ticks" will never be done in core Minecraft because it changes how the entities tick - but it's acceptable for pots. The current ticking heavily relies on a deterministic order too, so reordering ticks to sort by entity type would break mechanics like pistons. Example of OrderedTick in Lithium.

Your third suggestion is the primary way I have been optimizing the code.

I should've clearly separated the profiling from micro-optimizations in there. The "reads" as you used them I would call regular "function calls" whereas reads in the above mentioned article are literally repeated reads from memory. When considering this, you should also be mindful of how the code/function calls will be inlined.

the impact of such code is so negligible it does not register in the profiler I use.

To give you another fun link for reading "Lies, darn lies and sampling bias". What profiler did you use? Spark appears to support the latest and greatest async-profiler (I assume the doc refers to that same thing) but only under Linux x86-64. With that in mind I purposefully mentioned "looking at assembly" to verify the claim (ah it always is like that in optimization, lots of assumptions and hypotheticals until you dissect it).

https://github.com/tasgon/observable is too imprecise for development because it attempts to profile each tile entity using the default System.nanoTime() (Nanotrusting the Nanotime) - you would need the poll the CPU performance counters for reliable reliability ;)

Finally is there any kind of automated benchmarking? The only automated thing I found was Mojang's testing suite for code https://github.com/2No2Name/McTester (https://www.youtube.com/watch?v=vXaWOJTCYNg). I can't believe everybody has been doing manual performance testing to date... Luckily in case of pots, all we need is a headless server with loaded chunks and a prepared world.

commented

Afaik my admins' actions were supported by a mod that shows individual TE ticking times, so the pipes I was using (Mekanism) shouldn't have wound up in the pot timing or did it?

Any code that the pots interface with during their tick would be included in their tick time. For example if inserting into a pipe takes 1ms that will be added to the pots tick time if the insert was initiated by the pot.

Feel free to open a new issue when you have some actionable profiler results I can look at.