
Server becomes unresponsive, when admin closes they see this crash report
KayoticEntropy opened this issue ยท 5 comments
Hi. I have an issue with FTB Infinity Evolved expert mode version 2.2.0. The server will hang, admin commands are not even going through, and all players time out. Once reaching this state, shutting down the server creates a crash log. When restarting the server, we either get the same crash log or a different one that is still related to logistics pipes. Then we revert to a backup and are able to play for 1-48 hours before it happens again.
Included are a list of crash logs that took place yesterday:
http://pastebin.com/iy7NZ9xy
http://pastebin.com/WJ7mbSZU
http://pastebin.com/Y9FEKXzK
http://pastebin.com/U36G2rWh
My hunch is that some of these have to do with automated requests that were still being fulfilled when the server was closed; upon bringing the server up, potentially not all of the logistics system is in the same chunk, and stale or invalid references are used for the RequestTree compare. Anyway, let me know if you have any questions or need any more info in order to debug this issue.
Hi,
We're running into this too. It looks like there's the possibility of an infinite loop on the main server thread here:
If something happens which means that _lostItems.poll() never returns null, this will lock the server thread forever.
Ah, something just occurred to me. AIUI (I'm not 100% familiar with Java DelayQueue behaviour), the code puts lost items back onto the queue with a 5ms delay, I suspect with the intention of processing the entire queue once per tick.
If, however, the queue takes significantly longer than 5ms to process once, it can potentially process forever, because the first item's delay will be up by the time the last one is processed. I think a different implementation that explicitly processes the queue only once per tick should avoid this lock-up on the server thread. Perhaps a "double-buffer" approach, where you have two queues A+B, and process from A->B one tick and B->A the next, would work.
So I patched the code to write log messages after 9990 iterations and bail out after 10k, then rebuilt LP for our server. I can confirm that that particular loop can reach 10k iterations when there's a lot of crafting load. I tried to make 320 big reactor turbine casing, in this instance. I think the problems arise when > 1 stack of inputs need to go to a machine, and the excess has to be routed away. This results in a lot of pipe traffic, because as soon as space for one input becomes available due to a machine finishing a cycle, all of the remaining inputs are shoved back to that machine and must be routed away again.
[22:38:36] [Server thread/ERROR] [FML]: Re-adding pair: item = '1x minecraft:Ender Pearl, 368:0', lost items length = 43.
[22:38:36] [Server thread/ERROR] [FML]: Re-adding pair: item = '1x minecraft:Ender Pearl, 368:0', lost items length = 43.
[22:38:36] [Server thread/ERROR] [FML]: Re-adding pair: item = '1x minecraft:Ender Pearl, 368:0', lost items length = 43.
[22:38:36] [Server thread/ERROR] [FML]: Re-adding pair: item = '1x minecraft:Ender Pearl, 368:0', lost items length = 43.
[22:38:36] [Server thread/ERROR] [FML]: Re-adding pair: item = '1x minecraft:Ender Pearl, 368:0', lost items length = 43.
[22:38:36] [Server thread/ERROR] [FML]: Re-adding pair: item = '1x minecraft:Ender Pearl, 368:0', lost items length = 43.
[22:38:36] [Server thread/ERROR] [FML]: Re-adding pair: item = '10x minecraft:Ender Pearl, 368:0', lost items length = 43.
[22:38:36] [Server thread/ERROR] [FML]: Re-adding pair: item = '1x minecraft:Ender Pearl, 368:0', lost items length = 43.
[22:38:36] [Server thread/ERROR] [FML]: Re-adding pair: item = '1x minecraft:Ender Pearl, 368:0', lost items length = 43.
[22:38:36] [Server thread/ERROR] [FML]: Re-adding pair: item = '2x minecraft:Ender Pearl, 368:0', lost items length = 43.
[22:38:36] [Server thread/ERROR] [FML]: Reached 10k iterations in ModuleCrafter.tick(), bailing out.
[22:38:36] [Server thread/ERROR] [FML]: Item retrieved = '1x minecraft:Ender Pearl, 368:0', lost items length = 42.
[22:38:36] [Server thread/WARN]: Can't keep up! Did the system time change, or is the server overloaded? Running 12511ms behind, skipping 250 tick(s)
Anyway. While I'm reasonably sure this is where the bug is, I don't have any clever recommendations w.r.t. fixing it. Sorry! I can create a PR with the logging changes if you are interested, but I'm not sure they are really "production ready".