Storage Drawers

Storage Drawers

151M Downloads

Performance problems with massive item transfers

yueh opened this issue ยท 17 comments

commented

A heads up as I am currently testing some massive/extreme cases of item transfers with AE2 in 1.10 for IItemHandler. Using 1.10.2-3.2.7 for it.

I am speaking about item transfers it the range of 10k-100k items/t and a cube with 12x12x(4 to 12) drawers, which certainly fits the ridiculuous level. But that is where the magic starts ๐Ÿ˜‰.

Attached a visualvm snapshot.
Most notable is the Vec3i.hashCode(), which is actuall used by TileEntityController.isDrawerEnabled / TileEntityController.getGroupForCoord. Thus most is not actually spend in inserting or extracting the items but finding the drawers and if they are enabled.

https://dl.dropboxusercontent.com/u/7021308/1.10.2-storage-drawers.nps

commented

Thanks, I saw some reference to this test case on reddit, must have been yours. Would you mind sharing your test case world? I'm not quite sure what ridiculous scale is, but it would be helpful to iterate on performance.

What is AE2's approach to inserting/extracting through the interface? I expected to see more traffic in getStackInSlot but it basically never shows up.

On the other hand I'm not too surprised to see insertItemFullScan light up as hard as it is, which is the fallback mechanism when you think you've looked at all the slots and there's no space for an item. That last slot is virtual and will try and find a suitable place along with ore-dictionary-coercion if necessary. There's some room for improvement in that process.

commented

Basically just IO Ports with 3 Acceleration Card and creative cells to contain cobble of whatever you want. Each will infinitely produce 2k items/tick of the configured item. Put a storage bus on the drawer controller and add as many IO Ports as needed. Switch toggle the transfer mode to extract stuff from the drawers (Note: Only works with the configured items and you might want to switch them to be only moved once the cell is "full").

Also IO Ports go to sleep, once the target inventories are full. You can wake them by cycling the cell or any button notably the mode setting should be the easiest one.

In terms of actually inserting or extracting items, we use a pretty naive implementation, which just starts at slot 0 and tries to directly insert or extract the needed items. Mostly to avoid scanning the whole inventory, compare every item and then iterate again over the found slots to satisfy the request. As well as enable more advanced inventores to provide a virtual slot for the first slot, which can use a faster way to access items and prevent hoppers from iterating over thousands of slots or so. (As well as oredict conversion etc). But the design for IItemHandler is just so bad. It should never be slot aware. just a basic insert/extract from inventory and then a specialised subinterface for slot aware access, when actually needed.

We also use our own cap to use a potentially better access to an inventory once they provide it. But currently I would prefer to try and improve the IItemHandler approach as best as possible. As this benefits every other mod. For example to not just use a cache of the monitored inventory, but also map them to the first known slot and allow a two pass system, which tries this slot first before switching to the search mode.

And as said, a huge part is actually spent with searching for an active drawer, which could already mask other performance problems due to slower it so much down alone.

commented

How many drawers are attached behind the controller? (maybe a picture would be fine here). And to confirm, you're just dealing with one item type (cobblestone) and filling all the slots with it?

Your implementation is pretty good for SD, since any slot index you pick can result in the system re-routing the item to another location and terminating your search early. I also considered moving the virtual slot from the back to the front so that inventories potentially never search, but it seemed like an exotic idea at the time. Maybe not so much after all. I agree that the decision to persist slot indexes is bad.

Filling all slots with the same item type presents an interesting case. For normal mixed use where you insert an item into a system that already has that item in a partially filled slot, the search should terminate in one step. But in this all-same case, it's as bad as a linear scan while there's still capacity in the system, and then twice as bad when the system is totally full. A precise cache in the controller would be able to solve it; it would just be error-prone to develop.

commented

cube with 12x12x(4 to 12)

Some tests used just stone, another set used 6 different items (basically the first 6 shown bei JEI).
Also upgrade to max storage size. Either it simply fills too fast.

Your implementation is pretty good for SD, since any slot index you pick can result in the system re-routing the item to another location and terminating your search early. I also considered moving the virtual slot from the back to the front so that inventories potentially never search, but it seemed like an exotic idea at the time. Maybe not so much after all. I agree that the decision to persist slot indexes is bad.

It might be an idea to have the first and last slot use a virtual slot. Some mods might want to iterate in reverse over it.

A precise cache in the controller would be able to solve it; it would just be error-prone to develop.

Potentially overkill. Half of the performance is wasted in Vec3.hashCode(). Optimising this would already solve a good portion.

commented

Looks like a pretty good improvement.

The Vec3.hashCode() part is completely gone.

It is still nothing you really want on your server. The case with insert 6 * 2k/t (different items) is somewhat acceptable. For me it takes around 1/3 of the server thread. Probably ok for a SP. But then seriously. If I did not overlook an upgrade, each drawer has a max capacity of 2080 stacks and a 12x12x12 cube would be filled in 15min. Placing down about 166k drawers each day is ridiculuous.

What now appears is that ItemStack.copy() sucks. Hard.
To some degree it is AE2s fault, as we do a simualted extract for every inject to prefer inventories, which already contain these items. It might be an idea to move from an extract to query a cache. But that might be problematic as it would now also consider read only inventories as potential destination. But that might be an acceptable tradeoff. In a few cases I saw getStackInSlot turn up in the sampling as it copies the itemstack. The docs state nobody should modify them directly. But I can understand taking precaution as there are certainly some devs who will ignore it.

Out of curiosity, if have change the order we iterate over the inventory, so we enforce using the virtual slot in the last positon. Which basically changed the not that insane test with 12k/t from taking about 1/3 third of the CPU time for the server to just high enough that the sampler can pick it up. So having the first one being virtual would be a major improvement for inserting. (Real) Extracting is still bad thanks to ItemStack.copy(), but that is not something we can fix.
Also keeping both the last and first slot as virtual ones might be a good idea. I was thinking about reversing the order for extract. Thus it prefers a stack on the end of the inventory and it is more likely to keep the start of an inventory untouched, should someone sort it by hand.

commented

Does this release offer any meaningful improvement?

https://github.com/jaquadro/StorageDrawers/releases/tag/sd-3.4.0-a1

I've gone after some of the lower-hanging fruit. Avoiding world lookups and coordinate hashing for most of the controller ops, and fusing the enable/get operation everywhere else.

commented

The stack copy in getStackInSlot is less for protection and more that it's not really avoidable. Item counts aren't stored on item stacks as they are sometimes inferred procedurally. However I might still be able to improve the common path case and maybe avoid the copy. Along with officially disabling the legacy inventory support.

The virtual slot placement sounds like what I would have expected. I'll apply that change. Though maintaining the last v-slot shouldn't have much effect since it's not useful for extract.

So I tried to do some visualvm sampling while working on changes but didn't see a whole lot of conclusive results. What is your methodology for running the tests? Do you have a line of 6 IO ports in a row for example, and try to switch them all to output by hand as quickly as possible? Then switch them back when the target inventory is full? I also forgot to drop emerald upgrades on everything so that might be influencing my results as well.

commented

Maybe a screenshot might help http://i.imgur.com/kj4gHvf.png and maybe the world https://dl.dropboxusercontent.com/u/7021308/storage-drawers.zip

Left side to insert, right side to extract from the drawers. The redstone is just to force wake the IO Ports once they should go to sleep and some levers to shut down each side.

But essentially leave the IO ports off by switching the levers off, start visualvm, start profiling sampling, turn the lever + press button to force wake them and just wait some time.

commented

Spatial copying? Man. I thought I was a decent hand at AE2 but I didn't even remember that existed. Or that IO ports responded to redstone.

I'll grab the world later tonight. Thanks.

commented

Only when the dev tools are enabled in the config. It is really handy to stamp down prebuild stuff. Like here cubes of fully upgraded drawers. Otherwise it would be a PITA to fill 1728 or more drawers with upgrades.

Replicator card, shift right click a pylon to save the source spatial structure, right click to paste it. It will place it relative to the block you clicked before.

The ports only respond directly to redstone with a redstone card. But the redstone will cause a neighor update and wake them. You could also have pistons pushing blocks around. But why complicated.

Oh and be very careful with the eraser. It will delete delete up to 90000 identical and connected blocks. So do not use it on a filled 12x12x12 cube of drawers. Minecraft does NOT like dropping millions of items. (If you wonder why everthing is 1 block lower in this area...)


EDIT In case you want to see it. Tested a massive amount of different items instead of massive amounts of a few ones. With this http://i.imgur.com/AfcIJGX.png importing 576 distinct items/tick.
.nps with using the virtual vs non virtual approach:
https://dl.dropboxusercontent.com/u/7021308/1.10.2-sd-many-items-not-virtual.nps
https://dl.dropboxusercontent.com/u/7021308/1.10.2-sd-many-items-virtual.nps
Notice the massive difference in terms of DrawerItemHandler.getStackInSlot() in both.

commented

I tracked down some more hot spots outside of stack copy. GetMaxCapacity() was something that really stuck out, and turned out to have a stupid implementation that would walk the upgrade list every time. And it's called a lot. I've also added the front virtual slot.

The test world you posted has been useful, but I wonder if I'm just getting different sample results, either because of JVM/VVM versions or something else. Stack copy hasn't ever stuck out much, but stupid amounts of time are being attributed to TileEntityDrawers.getDrawerIfEnabled() for reasons I fail to understand.

New alpha build: https://github.com/jaquadro/StorageDrawers/releases/tag/sd-3.4.0-a2

commented

Yes, there's a virtual slot at the front and back. I didn't really get anything conclusive from sampling though, because I never saw getStackInSlot appear anyway. Item cloning did appear in my results, before and after, but the incidence didn't appear to be much different.

I also expect that optimization to be less helpful in the all-same-item (or most-same-item) test. Maybe a bit regressive, even, because it needs to fire up a generator just to scan a list of slots that contain X.

commented

I saw getMaxCapacity show up in 1 or 2 tries. But that might just be coincidence and the JVM decided to inline some code there. Sampling only goes so far here. But it usually gives some good hints.

Did you sample it with the virtual slot being moved to the front? As cloning an itemstack (actually ItemStack.setItem()) did only show up after I switched the order and let it insert into the last slot.
Or maybe the forge version we currently use has a bug.

commented

I did a few tests and did notice TileEntityDrawers.getDrawerIfEnabled() show up. But only in the really massive test with 73k items/tick.

With just around 10k it just is enough to show up during the sampling. If at all. I have 2 tries with extracting, which reports the server thread as 100% sleeping..

But even 10k is certainly an unrealistic case. I doubt many will ever achieve this throughput over a long time. Maybe something in the range of 1-5k on a server with many players could happen. This was not bad previously, but the changes are certainly a huge improvement. Especially once many mods start to fight for resources.

commented

Thanks for coming up with the benchmarks and running through a few builds. Lots of good improvements and they should help all mod interaction.

For the most part, the IItemHandler interaction should match what the IExternalStorageHandler implementation could have provided. The major exception is when no space can be found for a particular item. In that case I think the cost will be much higher due to the two virtual slots searching and failing and AE2 performing its own linear scan. Ideally, any attempt to further insert an item would be halted when the first slot insert failed. Not something that is applicable to generic inventories.

The changes are now out in 3.4.0.

commented

I don't expect the new v-slot to introduce problems that didn't already exist. Inserting into any other slot will potentially re-route the items if the slot is empty. Any insert/extract operation will potentially invalidate previous getStackInSlot results, regardless of slot index. Haven't heard it breaking with another mod yet.

commented

I could could even imagine someone caching the slot they inserted a stack and compare it to getStackInSlot and crash should they not be the same..

If any real issue should appear, there is certainly an option for a slotless capability, we could make use of.
AE2 already has a special one, which can replace the internal caching with a real observer pattern. But as long as there are improvements possible for the normal IItemHandler and it is still reasonable fast, we should stick to it. This of course does not solve the insert/extract part, just the getStackInSlot verification part. But there could certainly be a better solution provided for insert/extract. Either we introduce a cap or storage drawers. But it is probably a better idea to have it on our side as it is more AE2 specific.