Server becomes unresponsive, when admin closes they see this crash report

Question

Server becomes unresponsive, when admin closes they see this crash report

KayoticEntropy opened this issue 10 years ago · 5 comments

KayoticEntropy commented 10 years ago

Hi. I have an issue with FTB Infinity Evolved expert mode version 2.2.0. The server will hang, admin commands are not even going through, and all players time out. Once reaching this state, shutting down the server creates a crash log. When restarting the server, we either get the same crash log or a different one that is still related to logistics pipes. Then we revert to a backup and are able to play for 1-48 hours before it happens again.

Included are a list of crash logs that took place yesterday:

http://pastebin.com/iy7NZ9xy
http://pastebin.com/WJ7mbSZU
http://pastebin.com/Y9FEKXzK
http://pastebin.com/U36G2rWh

My hunch is that some of these have to do with automated requests that were still being fulfilled when the server was closed; upon bringing the server up, potentially not all of the logistics system is in the same chunk, and stale or invalid references are used for the RequestTree compare. Anyway, let me know if you have any questions or need any more info in order to debug this issue.

davboecki commented 10 years ago

27d823e

fluffle · Answer 1 · 2016-02-05T21:49:14.000Z

Hi,

We're running into this too. It looks like there's the possibility of an infinite loop on the main server thread here:

https://github.com/RS485/LogisticsPipes/blob/stable-bc7/common/logisticspipes/modules/ModuleCrafter.java#L242

If something happens which means that _lostItems.poll() never returns null, this will lock the server thread forever.

fluffle · Answer 2 · 2016-02-05T22:11:22.000Z

Also #877 and probably #871.

fluffle · Answer 3 · 2016-02-07T23:40:39.000Z

Ah, something just occurred to me. AIUI (I'm not 100% familiar with Java DelayQueue behaviour), the code puts lost items back onto the queue with a 5ms delay, I suspect with the intention of processing the entire queue once per tick.

If, however, the queue takes significantly longer than 5ms to process once, it can potentially process forever, because the first item's delay will be up by the time the last one is processed. I think a different implementation that explicitly processes the queue only once per tick should avoid this lock-up on the server thread. Perhaps a "double-buffer" approach, where you have two queues A+B, and process from A->B one tick and B->A the next, would work.

fluffle · Answer 4 · 2016-02-07T23:31:23.000Z

So I patched the code to write log messages after 9990 iterations and bail out after 10k, then rebuilt LP for our server. I can confirm that that particular loop can reach 10k iterations when there's a lot of crafting load. I tried to make 320 big reactor turbine casing, in this instance. I think the problems arise when > 1 stack of inputs need to go to a machine, and the excess has to be routed away. This results in a lot of pipe traffic, because as soon as space for one input becomes available due to a machine finishing a cycle, all of the remaining inputs are shoved back to that machine and must be routed away again.

[22:38:36] [Server thread/ERROR] [FML]: Re-adding pair: item = '1x minecraft:Ender Pearl, 368:0', lost items length = 43.
[22:38:36] [Server thread/ERROR] [FML]: Re-adding pair: item = '1x minecraft:Ender Pearl, 368:0', lost items length = 43.
[22:38:36] [Server thread/ERROR] [FML]: Re-adding pair: item = '1x minecraft:Ender Pearl, 368:0', lost items length = 43.
[22:38:36] [Server thread/ERROR] [FML]: Re-adding pair: item = '1x minecraft:Ender Pearl, 368:0', lost items length = 43.
[22:38:36] [Server thread/ERROR] [FML]: Re-adding pair: item = '1x minecraft:Ender Pearl, 368:0', lost items length = 43.
[22:38:36] [Server thread/ERROR] [FML]: Re-adding pair: item = '1x minecraft:Ender Pearl, 368:0', lost items length = 43.
[22:38:36] [Server thread/ERROR] [FML]: Re-adding pair: item = '10x minecraft:Ender Pearl, 368:0', lost items length = 43.
[22:38:36] [Server thread/ERROR] [FML]: Re-adding pair: item = '1x minecraft:Ender Pearl, 368:0', lost items length = 43.
[22:38:36] [Server thread/ERROR] [FML]: Re-adding pair: item = '1x minecraft:Ender Pearl, 368:0', lost items length = 43.
[22:38:36] [Server thread/ERROR] [FML]: Re-adding pair: item = '2x minecraft:Ender Pearl, 368:0', lost items length = 43.
[22:38:36] [Server thread/ERROR] [FML]: Reached 10k iterations in ModuleCrafter.tick(), bailing out.
[22:38:36] [Server thread/ERROR] [FML]: Item retrieved = '1x minecraft:Ender Pearl, 368:0', lost items length = 42.
[22:38:36] [Server thread/WARN]: Can't keep up! Did the system time change, or is the server overloaded? Running 12511ms behind, skipping 250 tick(s)

Anyway. While I'm reasonably sure this is where the bug is, I don't have any clever recommendations w.r.t. fixing it. Sorry! I can create a PR with the logging changes if you are interested, but I'm not sure they are really "production ready".

Share to