Extreme lag and client disconnects when shuffling multiple storage cells in an IO Port - Possible Functional Storage interaction?
mjevans opened this issue · 11 comments
Describe the bug
NOTE: No crashlog generated yet, Please see my request for how to collect useful data in the Additional Details section.
I've been trying to shuffle 18x 4k cells through an IO Port to unload items into a Functional Storage (storage drawers) setup. As usual, it's a huge external inventory added by another mod, attached to a single Storage Bus.
This caused EXTREME lag spikes: enough to disconnect other users on the server, and enough to disconnect me while doing it.
My goal is to eventually migrate entirely away from cell storage so that I can use multiple mods against the same storage devices (e.g. AE2, RS, XNet, etc.).
How to reproduce the bug
I'm on a 1.18.2 pack, Enigmatica 8 2.2.1 ( + the Performant mod Beta )
forge/1.18.2-40.1.73
appliedenergistics2-11.1.6.jar
Applied-Mekanistics-1.2.1.jar
AE2-Things-1.0.3-beta.jar
functionalstorage-1.18.2-1.0.5.jar
Server specific addition:
performant-1.18.2-6.9.jar (Patreon paywalled Beta for 1.18.2)
++
My exact setup includes an Import Bus and an ME Crafting Terminal 128+ blocks away (3D distance) on a Fluix ME Glass Cable line in the chunkloaded sleeping area. It's close enough laterally that the intervening blocks should be chunkloaded by a player standing in either location, and both endpoints of the cable are chunkloaded via FTB Chunks.
There's a single ME Controller block, 2 drive bays, and the IO Port. A Storage Bus attaches to the Functional Storage controller, which sits roughly in the middle of an F3+G-aligned 16x16x16 chunk; several large banks of storage drawers on the walls of that same chunk attach to it. The drawers have been configured to DISABLE item counts, DISABLE item renders, and LOCK so they don't accept unregistered items (there are many empty drawers). The drawers still show 'drawer upgrades' on them (hiding those is an option, but it hasn't been disabled since it shouldn't change anything or draw many resources).
The server's filesystem is ZFS on spinning rust (magnetic media) drives, and probably has low spare IOPS and low bandwidth generally. The lag feels as if fsync() were being called after every transferred item; hopefully that isn't actually the case, but it's descriptive of the sort of stop-the-world, everything-is-waiting-on-this-one-thing event.
Expected behavior
No server lag, no tick lag, possibly slower item transfer if there's a wait.
Additional details
Is there any way to collect additional debug logging about why these events might be happening? Possibly some way of turning on diagnostic messages for e.g. the Storage Bus (cache / use), the IO Port's progression, and slow operations generally?
I would prefer to enable any such additional diagnostics up front, even if reproducing takes several attempts (which I might not be able to do on my local systems, which aren't RAID-Z but do have SSDs). I'd like good, useful telemetry to be generated if extreme measures end up being required, such as restoring the backups to a local test case and crashing it mid-shuffle.
Which minecraft version are you using?
1.18
On which mod loaders does it happen?
Forge
Crash log
No logs yet; I didn't think to crash the server last night at 2 AM when I was shuffling items.
Retested the same test case with only Functional Storage updated to 1.0.6 -- I don't even need to look closely at the results; it's clearly and massively improved. Zero insertion lag.
I thought about the larger meta issue in the back of my mind for a bit. It's possible the root cause is the lack of a good interface/API, compatible across multiple first-party storage systems, for mods to exchange routes to storage with.
A storage tuple might include e.g. (Hops, Mod, Dimension, BlockPos, UnitDetails[list of items/fluids and counts, plus whether each item is distinct or a variation]). This would be useful when considering things like Compacting Drawers, which expose e.g. Block, Ingot, and Nugget forms from a common pool of storage. It would also allow a list of items to be exported while maintaining their 'location' in the world, letting other mods know there might be multiple paths to a single stored item, and it would allow multiple storage systems to export their reachable blocks to each other and cull less direct routes.
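For concreteness, here is a minimal Java sketch of what such a tuple and route culling could look like. None of these types exist in Forge, AE2, or any mod; StorageRouteSketch, StorageRoute, and UnitDetail are invented names for illustration only.

```java
// Hypothetical sketch of the cross-mod "storage tuple" idea above. None of
// these types exist in Forge, AE2, or Functional Storage; StorageRoute and
// UnitDetail are invented names for illustration only.
import java.util.List;

public class StorageRouteSketch {
    // One stored item/fluid variant and its count; `distinct` marks whether it
    // is a unique stack or one view of a shared pool (e.g. a Compacting Drawer
    // exposing Block/Ingot/Nugget forms of the same pool).
    record UnitDetail(String id, long count, boolean distinct) {}

    // A route another mod could consume: the hop count lets consumers cull
    // less direct routes to the same block.
    record StorageRoute(int hops, String mod, String dimension,
                        int x, int y, int z, List<UnitDetail> units) {}

    public static void main(String[] args) {
        // The same drawer controller reachable through two storage systems:
        StorageRoute viaAe2 = new StorageRoute(2, "ae2", "minecraft:overworld",
                10, 64, -3, List.of(new UnitDetail("minecraft:iron_ingot", 4096, true)));
        StorageRoute viaRs = new StorageRoute(5, "refinedstorage", "minecraft:overworld",
                10, 64, -3, List.of(new UnitDetail("minecraft:iron_ingot", 4096, true)));
        // Cull the less direct route to the shared block.
        StorageRoute best = viaAe2.hops() <= viaRs.hops() ? viaAe2 : viaRs;
        System.out.println(best.mod() + " via " + best.hops() + " hops");
    }
}
```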
It isn't a crash, but it is a series of thread dumps taken while trying to reproduce the issue. Severe lag (waiting for server) did occur during these tests, but I was not booted off the server, probably because many of the items had already been stored in the external drawers.
https://gist.github.com/519b4fed6fea503ac2b84c0f838ff64f
This is the shell command I used to generate the dump files, while attempting to lag myself out with storage cell shuffles as described in the initial report.
while true ; do jcmd 28165 Thread.print > $(date +"jcmd-pid-Thread.print--%Y%m%d-%H%M%S--%s.txt") 2>&1 ; sleep 1 ; done
Forgot to add this to my collection list:
java --version
openjdk 17.0.4.1 2022-08-12 LTS
OpenJDK Runtime Environment Zulu17.36+17-CA (build 17.0.4.1+1-LTS)
OpenJDK 64-Bit Server VM Zulu17.36+17-CA (build 17.0.4.1+1-LTS, mixed mode, sharing)
- One more edit +
This seems to be a likely suspect from one of the thread traces I collected, from the jcmd-pid-Thread.print--20220905-074313--1662363793.txt file.
Though I don't know how to figure out whether it's AE2, Functional Storage, neither of them (some other mod), a Forge issue (does it need a better API for a proper fix?), or something I hadn't considered.
"Server thread" #66 prio=8 os_prio=0 cpu=141373.89ms elapsed=240.59s tid=0x00007fc113c89e60 nid=0x6e75 runnable [0x00007fc0754fa000]
java.lang.Thread.State: RUNNABLE
at com.buuz135.functionalstorage.block.tile.ControllableDrawerTile.isVoid([email protected]/ControllableDrawerTile.java:251)
at com.buuz135.functionalstorage.block.tile.DrawerTile$1.isVoid([email protected]/DrawerTile.java:54)
at com.buuz135.functionalstorage.inventory.BigInventoryHandler.getSlots([email protected]/BigInventoryHandler.java:33)
at com.buuz135.functionalstorage.inventory.ControllerInventoryHandler.insertItem([email protected]/ControllerInventoryHandler.java:46)
at appeng.me.storage.ExternalStorageFacade$ItemHandlerFacade.insertExternal([email protected]/ExternalStorageFacade.java:134)
at appeng.me.storage.ExternalStorageFacade.insert([email protected]/ExternalStorageFacade.java:54)
at appeng.me.storage.CompositeStorage.insert([email protected]/CompositeStorage.java:47)
at appeng.me.storage.DelegatingMEInventory.insert([email protected]/DelegatingMEInventory.java:41)
at appeng.me.storage.MEInventoryHandler.insert([email protected]/MEInventoryHandler.java:80)
at appeng.me.storage.NetworkStorage.insert([email protected]/NetworkStorage.java:145)
at appeng.blockentity.storage.IOPortBlockEntity.transferContents([email protected]/IOPortBlockEntity.java:350)
at appeng.blockentity.storage.IOPortBlockEntity.tickingRequest([email protected]/IOPortBlockEntity.java:294)
at appeng.me.service.TickManagerService.unsafeTickingRequest([email protected]/TickManagerService.java:358)
at appeng.me.service.TickManagerService.tickQueue([email protected]/TickManagerService.java:121)
at appeng.me.service.TickManagerService.tickLevelQueue([email protected]/TickManagerService.java:93)
at appeng.me.service.TickManagerService.onLevelEndTick([email protected]/TickManagerService.java:78)
at appeng.me.Grid.onLevelEndTick([email protected]/Grid.java:262)
at appeng.hooks.ticking.TickHandler.onServerLevelTickEnd([email protected]/TickHandler.java:282)
at appeng.hooks.ticking.TickHandler.onLevelTick([email protected]/TickHandler.java:248)
at appeng.hooks.ticking.TickHandler$$Lambda$7986/0x0000000802211e20.accept([email protected]/Unknown Source)
at net.minecraftforge.eventbus.EventBus.doCastFilter([email protected]/EventBus.java:247)
at net.minecraftforge.eventbus.EventBus.lambda$addListener$11([email protected]/EventBus.java:239)
at net.minecraftforge.eventbus.EventBus$$Lambda$3031/0x00000008019bc8e0.invoke([email protected]/Unknown Source)
at net.minecraftforge.eventbus.EventBus$$Lambda$4314/0x0000000801b0f038.invoke([email protected]/Unknown Source)
at net.minecraftforge.eventbus.EventBus.post([email protected]/EventBus.java:302)
at net.minecraftforge.eventbus.EventBus.post([email protected]/EventBus.java:283)
at net.minecraftforge.event.ForgeEventFactory.onPostWorldTick([email protected]/ForgeEventFactory.java:900)
at net.minecraft.server.MinecraftServer.m_5703_([email protected]/MinecraftServer.java:908)
at net.minecraft.server.dedicated.DedicatedServer.m_5703_([email protected]/DedicatedServer.java:326)
at net.minecraft.server.MinecraftServer.m_5705_([email protected]/MinecraftServer.java:842)
at net.minecraft.server.MinecraftServer.m_130011_([email protected]/MinecraftServer.java:685)
at net.minecraft.server.MinecraftServer.m_177918_([email protected]/MinecraftServer.java:261)
at net.minecraft.server.MinecraftServer$$Lambda$41209/0x0000000805457260.run([email protected]/Unknown Source)
at java.lang.Thread.run([email protected]/Thread.java:833)
Hi, could you provide us with a spark profile* ? That would make figuring out this issue a lot easier, as the raw thread dump is not super useful.
* Install the spark mod on your server, run /spark profiler --timeout 30s
(or similar, I don't remember exactly), and then reproduce the issue while the profiler is running.
Another question: how many drawers are there in the drawer bank, more or less ?
To give you a bit more insight on what I think is going on... We actually don't need to figure out the exact path across the drawer bank; that should already be handled transparently and optimized by Functional Storage. However, Forge's item transfer API (how item-transferring mods communicate with each other, based around the IItemHandler interface) works with a list of "slots", so if you have 10000 drawers, it's possible that each attempted insertion has to go through all 10000 individual drawers.
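A toy simulation of the slot-scan cost described above (this is NOT Forge's or either mod's real code): with every drawer locked to one item, a failed insert of a non-matching item still probes every slot.

```java
// Toy simulation (NOT Forge's or either mod's real code) of the slot-scan
// cost: an IItemHandler-style insert has no index of contents, so each
// attempted insert probes slots one by one until one accepts the stack.
public class SlotScanSketch {
    static long probes = 0;

    // Each drawer "slot" is locked to a single item id and rejects the rest.
    static boolean insert(int[] slotLockedTo, int itemId) {
        for (int slot = 0; slot < slotLockedTo.length; slot++) {
            probes++; // one getStackInSlot()/lock check per slot
            if (slotLockedTo[slot] == itemId) {
                return true; // found a slot that accepts the item
            }
        }
        return false; // nothing accepted it: a fully wasted scan
    }

    public static void main(String[] args) {
        int[] drawers = new int[10_000]; // 10000 drawers, all locked to item 0
        // Inserting 100 stacks of a non-matching item scans every drawer each time.
        for (int i = 0; i < 100; i++) {
            insert(drawers, 42);
        }
        System.out.println(probes); // 1000000 probes for 100 failed inserts
    }
}
```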
We can work with Forge to address this, but first I would like to see a profile to see if this is really the issue, or if something else is showing in the profile that we can address on our end.
At least 244 + 28 (1444 + 14*2), but about a third of those drawers are probably empty at the moment, waiting on block forms of the ingots if I ever want to store them.
I think currently, partly installed, one wall is 6 high and the others are 4 or 5 high, though about two of those rows are the 2x2 drawers. Some of the drawers have stack-size and void upgrades; I'm not sure if either affects how the storage is presented. The end goal was to expand that as necessary to store most of the items and use chests or something for the one-offs. Smart Storage or something like that added 'netherite' chests with 11x9 slots; there are a ton of unique loot items and armor drops in the system, so when I tried dumping across those too at a lower priority, it filled up 4 of them and still hadn't drained the drives. I would classify the storage setup as towards the extreme end, but not uncommon to find. It's likely the obvious answer, and one step under the classic storage shenanigans of DSUs banked into virtual chests with subnets.
I wasn't sure if they were 'too big', but some oddities with items dynamically fluctuating in and out of filter results made me drain them back out with an Import Bus.
I'll also add that I have almost nothing else going on in my base so far; this is mostly the result of a lot of Mining Gadget output and dungeon Lootr chest loot being stored. There's no automated ore processing nor any major industry interacting with the system so far.
+++
How can this issue be profiled? Ideally without making life a pain for others on the server?
Could be the netherite chests, who knows. These have a lot of slots.
How can this issue be profiled?
Install spark on the server, then profile while you reproduce the issue. The others will be a bit annoyed but it doesn't need to last for too long, just one minute or so for the profile to be able to capture the issue.
Thank you very much for the help, Spark is exactly the right tool to get the data I needed.
Before I open another bug report, is my understanding and interpretation of the information mostly correct, and is there anything to correct or helpfully add?
I ran the test case on my home server's dev copy of the world (restored from just after last night's daily backup). It seems sufficient to assign blame, and it experienced almost the same results, as a reproducible but lesser version of the test case.
This is my dev server idle, just my player login. https://spark.lucko.me/SGcRoYxoRW
This is my dev server, with the capture started just before and stopped just after the laggy transfer. It was not able to reproduce the extreme desync that happened the night before, when initially draining out all of the items that had matching drawer slots.
https://spark.lucko.me/YJt0xNNrWX
Blame lines
net.minecraftforge.event.ForgeEventFactory.onPostWorldTick() 84.18%
net.minecraftforge.eventbus.EventBus.post() 84.18%
net.minecraftforge.eventbus.EventBus.post() 84.18%
net.minecraftforge.eventbus.EventBus$$Lambda$4307.0x0000000801bdaba8.invoke() 84.17%
net.minecraftforge.eventbus.EventBus$$Lambda$3030.0x00000008019b7400.invoke() 83.30%
net.minecraftforge.eventbus.EventBus.lambda$addListener$11() 83.30%
net.minecraftforge.eventbus.EventBus.doCastFilter() 83.30%
appeng.hooks.ticking.TickHandler$$Lambda$8029.0x0000000802219930.accept() 83.17%
appeng.hooks.ticking.TickHandler.onLevelTick() 83.17%
appeng.hooks.ticking.TickHandler.onServerLevelTickEnd() 83.17%
appeng.me.Grid.onLevelEndTick() 83.16%
appeng.me.service.TickManagerService.onLevelEndTick() 83.15%
appeng.me.service.TickManagerService.tickLevelQueue() 83.15%
appeng.me.service.TickManagerService.tickQueue() 83.15%
appeng.me.service.TickManagerService.unsafeTickingRequest() 83.15%
appeng.blockentity.storage.IOPortBlockEntity.tickingRequest() 83.07%
<<< Pulled left
appeng.blockentity.storage.IOPortBlockEntity.transferContents() 83.07%
appeng.me.storage.NetworkStorage.insert() 73.20%
(1) appeng.me.storage.MEInventoryHandler.insert() 38.44%
appeng.me.storage.DelegatingMEInventory.insert() 38.44%
appeng.me.storage.CompositeStorage.insert() 38.44%
appeng.me.storage.ExternalStorageFacade.insert() 38.43%
appeng.me.storage.ExternalStorageFacade$ItemHandlerFacade.insertExternal() 38.43%
com.buuz135.functionalstorage.inventory.ControllerInventoryHandler.insertItem() 38.33%
com.buuz135.functionalstorage.inventory.BigInventoryHandler.getSlots() 34.16%
com.buuz135.functionalstorage.block.tile.DrawerTile$1.isVoid() 12.47%
com.buuz135.functionalstorage.block.tile.ControllableDrawerTile.isVoid() 12.47%
net.minecraftforge.items.ItemStackHandler.getStackInSlot() 1.56%
asm_sysvec_apic_timer_interrupt (native) 0.01%
java.util.ArrayList$Itr.hasNext() 1.22%
com.buuz135.functionalstorage.inventory.BigInventoryHandler.insertItem() 0.48%
com.buuz135.functionalstorage.inventory.CompactingInventoryHandler.insertItem() 0.02%
appeng.api.stacks.AEItemKey.toStack() 0.04%
net.minecraft.world.item.ItemStack.isEmpty() 0.02%
I2C/C2I adapters (native) 0.01%
(2) appeng.me.storage.MEInventoryHandler.isPreferredStorageFor() 34.75%
appeng.me.storage.DelegatingMEInventory.extract() 34.75%
appeng.me.storage.CompositeStorage.extract() 34.73%
appeng.me.storage.ExternalStorageFacade.extract() 34.73%
appeng.me.storage.ExternalStorageFacade$ItemHandlerFacade.extractExternal() 34.73%
com.buuz135.functionalstorage.inventory.ControllerInventoryHandler.getStackInSlot() 34.69%
com.buuz135.functionalstorage.inventory.BigInventoryHandler.getSlots() 32.28%
com.buuz135.functionalstorage.block.tile.DrawerTile$1.isVoid() 10.65%
asm_common_interrupt (native) 0.01%
asm_sysvec_apic_timer_interrupt (native) 0.01%
com.buuz135.functionalstorage.inventory.BigInventoryHandler.getStackInSlot() 0.49%
appeng.api.stacks.AEItemKey.matches() 0.02%
appeng.me.cells.BasicCellInventory.extract() 0.02%
appeng.me.storage.DriveWatcher.insert() 0.01%
appeng.api.storage.StorageHelper.poweredInsert() 9.74%
appeng.api.storage.StorageHelper.poweredInsert() 9.74%
appeng.me.storage.NetworkStorage.insert() 9.73%
appeng.me.storage.MEInventoryHandler.insert() 5.16%
appeng.me.storage.DelegatingMEInventory.insert() 5.16%
appeng.me.storage.CompositeStorage.insert() 5.16%
appeng.me.storage.ExternalStorageFacade.insert() 5.16%
appeng.me.storage.ExternalStorageFacade$ItemHandlerFacade.insertExternal() 5.16%
com.buuz135.functionalstorage.inventory.ControllerInventoryHandler.insertItem() 5.15%
com.buuz135.functionalstorage.inventory.BigInventoryHandler.getSlots() 4.65%
com.buuz135.functionalstorage.block.tile.DrawerTile$1.isVoid() 1.68%
com.buuz135.functionalstorage.block.tile.ControllableDrawerTile.isVoid() 1.68%
net.minecraftforge.items.ItemStackHandler.getStackInSlot() 0.16%
net.minecraftforge.items.ItemStackHandler.validateSlotIndex() 0.09%
net.minecraft.core.NonNullList.get() 0.07%
java.util.ArrayList$Itr.hasNext() 0.14%
com.buuz135.functionalstorage.inventory.BigInventoryHandler.insertItem() 0.08%
appeng.api.stacks.AEItemKey.toStack() 0.01%
appeng.me.storage.MEInventoryHandler.isPreferredStorageFor() 4.57%
appeng.me.storage.DelegatingMEInventory.extract() 4.57%
appeng.me.storage.CompositeStorage.extract() 4.57%
appeng.me.storage.ExternalStorageFacade.extract() 4.57%
appeng.me.storage.ExternalStorageFacade$ItemHandlerFacade.extractExternal() 4.57%
com.buuz135.functionalstorage.inventory.ControllerInventoryHandler.getStackInSlot() 4.55%
com.buuz135.functionalstorage.inventory.BigInventoryHandler.getSlots() 4.25%
com.buuz135.functionalstorage.block.tile.DrawerTile$1.isVoid() 1.36%
com.buuz135.functionalstorage.block.tile.ControllableDrawerTile.isVoid() 1.36%
com.buuz135.functionalstorage.inventory.BigInventoryHandler.getStackInSlot() 0.07%
appeng.api.stacks.AEItemKey.matches() 0.01%
There are two parts of this which seem questionable to me.
(2) About 34% of the cost seems to be checking what's in a slot. A short-duration cache (5 s or less), or one refreshed on a failed operation, might really help here. I'm not sure whether it's expected for this operation to take as long as it does. If it's expected to be an expensive function, then AE2 is to blame for calling it so much; if it's expected to be fast, then I should probably open a Functional Storage bug report about it.
appeng.me.storage.ExternalStorageFacade$ItemHandlerFacade.extractExternal() 34.73%
com.buuz135.functionalstorage.inventory.ControllerInventoryHandler.getStackInSlot() 34.69%
com.buuz135.functionalstorage.inventory.BigInventoryHandler.getSlots() 32.28%
com.buuz135.functionalstorage.block.tile.DrawerTile$1.isVoid() 10.65%
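A minimal sketch of the short-duration cache idea floated above, with invented names (TtlCache is not a real class in either mod, and neither mod necessarily works this way): the expensive slot snapshot is rebuilt at most once per TTL window instead of on every query, with an explicit invalidate() for the refresh-on-fail case.

```java
// Hedged sketch of a short-TTL, refresh-on-fail cache; TtlCache is an
// invented name and this is not how Functional Storage is implemented.
import java.util.function.Supplier;

public class SlotCacheSketch {
    static class TtlCache<T> {
        private final Supplier<T> loader; // the expensive rebuild
        private final long ttlMillis;
        private T value;
        private long loadedAt;

        TtlCache(Supplier<T> loader, long ttlMillis) {
            this.loader = loader;
            this.ttlMillis = ttlMillis;
        }

        T get(long now) {
            if (value == null || now - loadedAt > ttlMillis) {
                value = loader.get();
                loadedAt = now;
            }
            return value;
        }

        // "Refresh on fail": drop the snapshot when an insert/extract misses.
        void invalidate() { value = null; }
    }

    static int rebuilds = 0;

    public static void main(String[] args) {
        TtlCache<String> cache = new TtlCache<>(() -> { rebuilds++; return "snapshot"; }, 5_000);
        // Query every 50 ms for 20 s, like a busy tick loop hammering getStackInSlot.
        for (long t = 0; t < 20_000; t += 50) {
            cache.get(t);
        }
        System.out.println(rebuilds); // 4 rebuilds instead of 400 raw queries
    }
}
```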
(1) About 38% seems to be insertItem() attempts. I wonder if that's because the drawers expose all of the open but locked drawers as empty slots, yet refuse to store items that don't match the lock? All of the items that can go into the drawers should already be inside them, so I'm unclear why this is happening; this is my best guess based purely on behavior and general logic.
appeng.me.storage.ExternalStorageFacade$ItemHandlerFacade.insertExternal() 38.43%
com.buuz135.functionalstorage.inventory.ControllerInventoryHandler.insertItem() 38.33%
com.buuz135.functionalstorage.inventory.BigInventoryHandler.getSlots() 34.16%
com.buuz135.functionalstorage.block.tile.DrawerTile$1.isVoid() 12.47%
com.buuz135.functionalstorage.block.tile.ControllableDrawerTile.isVoid() 12.47%
(0) Between the two cases, why the heck is it spending 23% of the server time (against the 73% used overall) re-checking whether the drawer is void? That one is for sure a bug report I need to look into filing.
Before I open another bug report, is my understanding and interpretation of the information mostly correct, and is there anything to correct or helpfully add?
Looks like they closed one of the parent issues in the mod.
Buuz135/FunctionalStorage#75
Buuz135/FunctionalStorage#81
Hopefully it's fixed in the next version. I might have to re-test my test case against this later.