Possible massive lag bug in .8515
BonaireDreams opened this issue · 16 comments
Minecolonies version
version .8518
Expected behavior
20 tps
Actual behaviour
2-7 tps
Steps to reproduce the problem
Sorry, but I do not have logs or details: the players were revolting that I was killing their Sunday gaming time, so I rolled back without detailed testing. Here's what information I do have.
Running a slimmed-down DW20 1.12 pack (down to about 120 mods) with Minecolonies added in.
Server runs on a Linux VM on a Dell R710 (6 cores, 32 GB RAM) with a dedicated 512 GB SSD assigned to the VM. The Minecraft server is allocated 12 GB and is the only thing running in the VM.
With Minecolonies .8402 and all previous versions we had fairly stable TPS with 4-5 players online. Each player has a fairly early colony with only 4 citizens (Townhall, builder, delivery man, warehouse, shepherd). We'd see the occasional TPS drop to 17/18 depending on what people were doing or if dynmap was doing a render, but it was fairly steady at 20.
Updated Minecolonies to .8515 yesterday and saw immediate server lag. With 3 people on, TPS was 2-3. Had people log off one at a time while I spammed /tps from the console. We'd see 5-7 TPS with 2 players, then 8-11 with 1 player.
Had everyone log off. Rolled server and all clients back to .8402, restarted, and we're back to 20 TPS with 5 people online.
We did run a Lag Goggles report, and with 4 people on, the Event Subscribers entry for Minecolonies was at 43000 us/t.
Sorry I don't have logs or more details.
@Raycoms Pinging you.
So we are going to need that Lag Goggles report. And if you can attach a JVM profiler and get the result from that as well, please do. We passed a lag-busting PR last week, so no idea what is happening.
@BonaireDreams Yes please update to the latest Alpha, you are running around 400 versions behind.
@OrionDevelopment I had updated to the latest Alpha (version .8518); that's where we experienced the lag. We rolled back to .8402 as it was stable and has no lag. I checked the server and my client to see if Lag Goggles had saved a log but was unable to find one. I'll have to see if there's some time this week when no one is online, as I'll have to take down the server, update back to .8518 just to run the Lag Goggles report, then roll back to .8402 so the players can get back on. If there's a quiet evening I'll do that, screencap the Lag Goggles report, and run WarmRoast. My phone just explodes with texts the moment I take the server down. :p But I will see what additional info I can capture.
Op'd one of my players remotely from work this morning and grabbed a baseline on .8402. I'll upgrade the server and my instance back to .8515 after work, before everyone starts coming online, and will get a report running on that version. My expectation is that Minecolonies' Event Subscribers will be back up between 27,000 and 43,000.
BASELINE on .8402.
With 2 Players online:
[09:34:00] [Server thread/INFO] [TickProfiler]:
                             | E   | TE   | C    |
overworld/0                  | 531 | 972  | 1516 |
MiningWorld/6                | 0   | 12   | 5    |
ExtraUtils2_Quarry_Dim/-9999 | 0   | 0    | 0    |
2 Players                    | 531 | 984  | 1521 | 89.92%
20.00 TPS [ ######################################## ]
Updated Minecolonies on server and client to .8515. No other changes/updates. With only myself logged in tps would randomly dip to 12 but bounce back to 20. Lag goggles showed minecolonies around 2500 us/t. So aside from the bounce, mostly normal.
[16:32:47] [Server thread/INFO] [TickProfiler]:
                             | E   | TE   | C    |
overworld/0                  | 444 | 1099 | 1328 |
MiningWorld/6                | 0   | 12   | 5    |
ExtraUtils2_Quarry_Dim/-9999 | 0   | 0    | 0    |
1 Players                    | 444 | 1111 | 1333 | 159.98%
12.50 TPS [ #########################~~~~~~~~~~~~~~~ ]
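For readers comparing the two TickProfiler summaries: the percentage at the end of the player line looks like the share of the 50 ms tick budget that was used, which would explain why 159.98% lines up with 12.50 TPS. The sketch below works through that arithmetic. The interpretation of the percentage is an assumption based on the numbers matching, not something TickProfiler's output states, and the function name is made up for illustration.

```python
# Assumption: the TickProfiler percentage is tick time as a share of the
# 50 ms budget that a 20 TPS server has per tick.
TARGET_TPS = 20.0


def tps_from_load(load_percent: float) -> float:
    """Estimate TPS from a tick load percentage (hypothetical helper)."""
    if load_percent <= 100.0:
        # The server never runs faster than its 20 TPS target.
        return TARGET_TPS
    return TARGET_TPS * 100.0 / load_percent


if __name__ == "__main__":
    for load in (89.92, 159.98):
        print(f"{load:.2f}% load is roughly {tps_from_load(load):.2f} TPS")
```

Under that assumption, the .8402 baseline at 89.92% still fits inside the budget, while the .8515 sample at 159.98% does not.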
So I thought about what else we had done when we last experienced the lag. We had 4 players on, and even having them log off one at a time did not improve it. So I /tp'd myself to all 4 bases and sat around for a minute at each. With each tp the lag got worse. When I got to the 4th colony I ran Lag Goggles again. Minecolonies had climbed to 42886 us/t.
Even though I am still the only player on, it's not recovering.
I logged off and back on to see if that would make a difference. Small drop, but still incredibly high.
I then rolled Minecolonies back to .8402. Logged back in, tp'd to all 4 bases again, spending approximately the same amount of time at each one. Then ran Lag Goggles again. Everything back to normal.
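To put the Lag Goggles readings in context, here is a small sketch that converts microseconds per tick into a share of the tick budget. It assumes the standard 20 TPS target, which gives 50,000 us per tick; the helper name is hypothetical and the two inputs are the readings quoted above.

```python
# A 20 TPS server has 1 s / 20 = 50 ms = 50,000 us available per tick.
TICK_BUDGET_US = 50_000


def budget_share(us_per_tick: float) -> float:
    """Return the percentage of one tick consumed (hypothetical helper)."""
    return 100.0 * us_per_tick / TICK_BUDGET_US


if __name__ == "__main__":
    readings = {
        "single colony loaded": 2500,
        "after visiting all 4 colonies": 42886,
    }
    for label, us in readings.items():
        print(f"{label}: {us} us/t is {budget_share(us):.1f}% of a tick")
```

By this measure the Event Subscribers entry alone goes from about 5% of a tick to roughly 86%, leaving very little room for everything else the server has to do.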
Hope that helps.
Can you run WarmRoast, please?
This will give us important details, like exactly which class is causing the most lag. Thank you!
I'll have to see if I can figure out how to install and run that. Players online now so will try tomorrow if I can.
We have no blockbreakers yet aside from 1 rftools quarry running in the mining DIM. We're only a few days in so no one really has any machines or automation started. But I'll try the new version and see what happens.
So I tried .8573 last night. While it was an improvement, Lag Goggles still showed minecolonies using 30,000 us/t under Event Subscribers after I had TP'd to each base and was back at mine. I was the only player online. I have not yet figured out how to install/run warmroast but will see if I can get it installed onto the server this weekend.
Have not managed to get WarmRoast installed. Updated my entire pack last night, including the latest Minecolonies (pre Monster egg patch). Things seem better but there are still stutters. It may be my older-generation CPU; the single-core performance of those older Xeons is not great. I have a new motherboard, i5-8600 and 16 GB DDR4 RAM coming and will build a dedicated MC server with that. Will see how it goes.
@Raycoms Leaving this closed for the time being but thought I'd add some additional info. I have been unsuccessful in my attempts to install WarmRoast. It seems the version of Debian I have does not have some of the files in the jre folder that WarmRoast requires (e.g. attach.dll), and I'm not sure how to get them. I did make some more discoveries. The lag became terrible whenever one specific player joined. I have recently upgraded the server's motherboard, RAM and CPU. The CPU's single-thread performance is a massive jump from our old one (CPU Boss benchmark score increased from 1500 to 2500; it's about 10 from the top of the list). The lag improved: we no longer see TPS dropping to 2 or 3 with this player, now it drops to 8 or 10.
Currently running .9540.
We did some more testing and fired all of his colonists one by one (he only has 10 colonists and only 5 working). With each one the lag got better. Once they were all fired we had zero lag, even with the other players online. As soon as he hires a builder we get TPS lag spikes every 20-30 seconds, dropping to 8/10 TPS. If he hires a DMAN it becomes even worse.
So we broke his Townhall block, let all citizens despawn, replaced townhall, hired first person as builder. Same issue.
We then shut down the server, deleted colony1.dat (his colony), deleted all files in the chunkinfo folder and started the server back up. We placed his townhall, but are now unable to place a builder, getting the error "you must be closer to your Town hall". It was 1am by this point so we stopped for the night.
Tonight I'm going to try using the delete colony command with false so that his buildings remain where they are and we can reconstruct them after.
One interesting note: his colony1.dat file was extremely large at 3717 KB. In comparison, my colony2.dat (with 14 level 5 buildings and 5 citizens) is only 178 KB. {We've actually all been breaking citizen hut blocks to reduce our colonies down to 5 citizens, thinking that may be part of the cause of the lag.} The increased file size comes from over 8000 work order entries. I'll open a separate issue regarding this, but perhaps that was the cause of the lag, as his builder was constantly looking through 8000 work orders to find the one building he is building.
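To illustrate why a backlog of thousands of stale work orders could hurt if the builder really does scan all of them each time, here is a minimal sketch comparing a linear scan with a keyed lookup. Everything in it is hypothetical: the WorkOrder type, its fields and both functions are invented for illustration and are not taken from the Minecolonies source.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class WorkOrder:
    """Hypothetical stand-in for a colony work order."""
    order_id: int
    claimed_by: Optional[int]  # citizen id of the builder, or None if unclaimed


def find_by_scan(orders: List[WorkOrder], builder_id: int) -> Optional[WorkOrder]:
    """Walk every order until one claimed by this builder turns up: O(n)."""
    for order in orders:
        if order.claimed_by == builder_id:
            return order
    return None


def index_by_builder(orders: List[WorkOrder]) -> Dict[int, WorkOrder]:
    """Build a one-time index so later lookups avoid rescanning the list."""
    return {o.claimed_by: o for o in orders if o.claimed_by is not None}


# 8000 stale, unclaimed orders followed by the one the builder actually holds.
orders = [WorkOrder(i, None) for i in range(8000)]
orders.append(WorkOrder(8000, claimed_by=42))
index = index_by_builder(orders)

if __name__ == "__main__":
    print(find_by_scan(orders, 42) == index.get(42))
```

The scan has to pass all 8000 stale entries before it reaches the claimed order, and it would do so on every check, whereas the indexed lookup does not grow with the backlog. Whether the mod behaves like the scan is exactly what a profiler run would confirm or rule out.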
Let me know if I can provide you any more details... I wish I could get Warmroast running to help.
Did you try the rs reset command for the colony? And can you provide us the colony.dat?
Hi @Raycoms, I had tried /mc colony refresh colony: 1 and it said refreshed. I'm guessing the rs reset is different. It's a little slow at work so I just finished removing all 13K work order entries from colony1.dat. The player sits across from me and he will test things tonight after work. I'm teaching Scuba tonight so won't be home until after 11pm PST. He'll still be online and can report to me if this fixed the lag issue. I did open Issue #2549 to track this colony.dat challenge. Here is a link to the big colony1.dat before I modified it. (I'll include it in 2549 as well.) https://www.dropbox.com/s/f8z3hnxy1zg0pud/colony1.zip?dl=0