
Conduit lags (chunk uloading)
Opened this issue ยท 40 comments
Issue Description:
conduit drops TPS
See: SpongePowered/SpongeForge#1253
What happens:
Conduits have too many calls and cause TPS drop
What you expected to happen:
no tps drop
Steps to reproduce:
see: SpongePowered/SpongeForge#1253
1.
2.
3.
...
Affected Versions (Do not use "latest"):
- EnderIO: build 156
- EnderCore: build 62
- Minecraft: 1.10.2
- Forge: 2221
Your most recent log file where the issue was present:
https://timings.aikar.co/?id=606fbc9959d74cb692001d0c91a53e39
https://timings.aikar.co/?id=ae4f3b583ec74c42b078d1085110455f
PS: That tool reports that conduit network ticking is spending time firing arrows. I am 110% sure we have no arrow-firing conduits. Those Aikar timings are pure trash and completely useless.
We don't advertise. It's just there, Minecraft Timings is built into many minecraft server implementations (outside of Forge and Spigot).
And in this specific case it makes most sense for Ender to implement Timings, forcing an identification for every mod that registers itself into Ender.
Anytime your code passes into another mods code, you can wrap it with timings making it very clear which is your code and which is children.
It's up to if you want your users to have more data from your mod.
@HenryLoenwind I think you may of misread his report.
I do not see anything about Arrows in it, here is the part that is important:
http://i.imgur.com/jq5YpXr.png
Note that the lag is reported in SELF. Meaning not caused by children nodes.
Good joke. Open a ticket against you jvm then. There is no way this:
can take any measurable time.
And the arrow firing I found here:
PS: I stand by it; your tool in its current form is useless. The timings you report appear random and even nonsensical at times and there's no information there that could help me to find out what's happening. Over the last months I have spent hours trying to read anything out of those reports---and it gave me nothing. On the other hand, I've gotten jvisualvm screenshots that nailed a problem on the head so hard that it took me 10 seconds to find a code location that could use optimisations.
PS2: @hunter688: In case you missed it, I tagged this "out of date" because you are on an old version from before the time I last changed something on the server and conduit tick handling.
Theres no way looping through all of these https://github.com/SleepyTrousers/EnderIO/blob/1.10/src/main/java/crazypants/enderio/conduit/ConduitNetworkTickHandler.java#L63-L68 and ticking them can take any time?
(that arrow timing is your EntityJoinWorldEvent handler probably being called under the tick from an entity being spawned in the course of ticking the network).
Self does not mean that immediate function only. Its anything under it thats not timed by another timed section.
In other words, that you can't place the blame on someone else's code that's being executed as a child of your code.
Every single TickListener.tickStartTick/tickEnd is part of SELF, UNLESS you add Timings API to your tick listeners and give each listener their own identifiable name, so you can drill down to find which listener is causing the trouble.
Providing support for environments that do not contain timings is simple, you can see an example here:
https://github.com/aikar/commands/blob/master/src/main/java/co/aikar/commands/CommandUtil.java#L866
https://github.com/aikar/commands/blob/master/src/main/java/co/aikar/commands/CommandTiming.java
https://github.com/aikar/commands/blob/master/src/main/java/co/aikar/commands/MinecraftTiming.java
You obviously don't need the Timings v1 part.
And as Deamon said, anything child event that fires as part of your tick listeners will come under your tick event handler.
As that code looks pretty generic, it is not confusing to find misc items under it.
Yes, in this case timings is not helping you, and a profiler would help. I don't disagree there. Just please don't call the tool useless. The data can only be as good as it is fed, and Ender is not providing it any information about how to define what work the tick handler is doing.
But Timings absolutely helps server owners identify culprits in the middle of unexpected lag or spikes (if a spike is unpredictable, a profiler isn't going to do you much good there)
Then its up to developers to help the system in labeling its code sections.
Then it's still useless for tracking down the reasons for those lag spikes. It may help server owners in the decisions which mods to remove (or give false leads, e.g. Ender IO is usually spending more than half its runtime in other mods' code, may it item, fluid or energy API calls), but it certainly doesn't help mod authors in finding hotspots.
Maybe you should have advertised using your API to mod authors before you advertised installing it to server owners?
@HenryLoenwind, looks like almost all the timings reports are showing EnderIO and you're still unsure on how it would lag? I would like if you provide any test build that could help debugging this issue to test in my server. That's another Timings report: https://timings.aikar.co/?id=70f9343bed304a648fcc03a5f8146964
@Eufranio as already pointed out - EnderIO conduits do little work on their own - they mostly interact with other blocks. So what is needed is either a reproducible test case or a usable detailed profiler report.
And these "Timing reports" do not contain enough information to even start debugging the problem.
ok, you connected a fluid conduit in one chunk to a tank or machine in another (now unloaded) chunk. That case I can handle. But please don't complain if your liquid gets voided because the high-prio output was in an unloaded neighbor chunk and the conduit put the liquid into your ultra-low-prio trashcan instead.
All chunks except spawn area should be unloaded. During warmroast was server empty. Only tileentities from EnderIO remains loaded. Idk why.
I think problem is too many calls of Ender IO::EnergyUpgradePowerAdapter_attachCapabilities.
Maybe because you are still using that age-old version that has no protection against being ticked after a chunk is unloaded and thus re-loading it?
Please kill me now. I was hoping this was isolated issue then I started googling this issue and oh man this is not so isolated.
We are having the same issue with AoE pack. When someones base is unloaded as in logged off or left their chunks area without any third party chunk loaders mind you then the lag starts to happen. We at first thought was IC2 Enet and Calculator as every 30 seconds same time they would spam console about a registration error. Then we removed calculators compat support with IC2 and it was still happening. So I started to profile it and found with 2 different profilers that it kept pointing to conduits in Ender IO.
The 2 locations that caused the most lag when they logged off our left their chunks both have Ender IO Conduits quite a bit but not insane amounts and like I said you load the chunks no lag or performance issues you unload them and good bye server and hello stalls!
Here is a profile I did via tickprofiler. I'm more then happy to provide other profiler tests, etc s long as I'm not wasting my time reporting it and not getting any action back.
I think issue should be reopened. @HenryLoenwind
Whatever. I can do nothing with the available data, so it doesn't matter for me if the ticket is open or not
I've got tons of reports showing that EnderIO Conduits are causing lag, don't know if these are specifically to chunk loading, do you want them?
(Using YourKit profiler)
Funny, I do not have a single unhandled profiler report. Just some unhelpfull reporrts like "Ender IO TileEntities stay loaded", or "me, too!!!11!!!!"
Well do you want them or not?.. If you do, I will check if I can profile my servers tomorrow and send you some results.
I have about 40 nps reports to go through but, on 2 different servers(sf3 and a beyond) both servers had severe issues with enderio conduits of this nature. Any time any sort of system involving the item pipes and with/without redstone conduit control I cause a huge strain. Between the two networks several sponge/forge versions have been tested. http://i.imgur.com/Nof2aEk.png was a quick snip I grabbed before coming here. Every spike trigger report, every profiler report, and every sampler report shows those particular pipes consistently. Like yourself I am extremely busy so if there are specific nps reports you would like (Low load -2-5, higher load 10-20 people, 30 people?) or with a specific forge version let me know. I will do my best to get those compiled using whatever tool you prefer to generate them. As it stands I will likely have to remove those entities already present and blacklist those pipes in particular soon.
Just send me whatever you think shows what's happening. Something that actually breaks down the conduit work as much as possible. A lump sum of all item transport+all fluid transport+all redstone logic+all AE2 transport+all OC connections just isn't helpful.
Going to have to give me a moment. I think I am going to have to migrate the main server to my dev machine to get some accurate reports.. I think our host has some ram going out.
edit: Working on gen'ing some good reports and in the process I noticed that it appears itempipes/powerpipes are acting as chunkloaders even if the player(or other players) are not online/near the area. Also should I enable verbose logging in the ender config? It seems most of the output is pretty generic atm.
Grabbed this during one of the mini spikes. Filtered using Ender IO timings
Hey dev here is your proof that conduits are broken
and heres the NPS file
https://www.dropbox.com/s/2p1vmnvu2lbdv02/10minsample.nps?dl=0
@DreiPixel What mods are you using, what versions, what forge, whats going on in the world, etc. He'll probably want to know this extra information.
I've also got that particular AbstractConduitNetwork thing in my own server, using up quite a bit of ticks. Don't have any additional information to provide to help solve any possible problems. Quite a few players constantly use the Ender IO conduits on my server and a lot are in use. Might try and get them to use other pipes to see if it is the Ender IO conduits causing a bit of our server lag.
Basically, if most of the conduits are gone but the same amount of stuff is going on and the tickrate/lag has decreased, then the conduits might be a problem.
Also keep in mind that conduits interact with lots of inventories / tile entities - and these also take time but it is accounted to the conduits.
If you setup conduits to pull from a large amount of storage (eg many chests or very big inventories with many slots) and let it insert into many full or incompatible inventories then this will constantly consume CPU time just to check if one of the many items from the source inventories can fit into any of the destination inventories.
@DreiPixel this stack trace (from your screenshot) is part of a network setup - eg on chunk (un)load. If you have setup something which constantly loads and unloads chunks with conduits crossing this chunk boundary then this will cause constant network recreation (which uses much more CPU time then just running a not changing network). Keep in mind that many things can cause chunk loading - including vanilla things like hoppers or redstone updates on a chunk boundary.
@wghats
the dev wont fix it until he has "proof" that the conduits create lag. (even tho there is enough proof)
he is too lazy to search the problem for himself since the conduits "work".
If you know what is lagging, feel free to fix it yourself.
Up to now, the only "proof" if lagging I got is people giving me reports from more or less questionable tools. Not a single one has managed to provide me with a profiler report of the built-in Minecraft profiler---the one where we put extensive data logging to find what is actually happening. Reports that simple point to "all conduits" are more than useless. For one, conduits have tens of thousands of lines of code, and then most of what conduits do is interacting with other blocks.
You are like a guy who sends his wife to fetch beer from the store and then gives her a beating for taking too long.
For fun (I admit this is an odd definition of 'fun'; I don't get out much) I used VisuamVM to open up a snapshot (created by Player's sampler).
1093 TileConduitBundle (predominantly power, followed by item, fluid, and a smattering of redstone) were taking around two percent of the total sample time. One fifth that of EntityLiving.
While I'm happy to admit that one sample does not a definitive conclusion make, I'm not seeing the TPS issue alleged here.
@Ommina heh yeah - I had a world where I only had 5-10 TPS - until I "removed" ~300 sheep and chickens around my starting area.
If 300 sheep and chickens at spawn caused you 5-10 tps then you got other issues lol.