LagGoggles

Does Lag Goggles Throttle Itself During Completion?

FriendlyGamer opened this issue · 14 comments

commented

Hi, I am concerned that during completion the mod has no regard for the game's performance or even the watchdog. Does it really try to use all the resources it can get while completing a profile? I ask because my logs said this:

[00:12:12] [Thread-14/INFO] [laggoggles]: LagGoggles finished profiling!
[00:12:28] [Server thread/WARN] [minecraft/MinecraftServer]: Can't keep up! Did the system time change, or is the server overloaded? Running 16150ms behind, skipping 323 tick(s)

So I suspect the server was halted for 15 or so seconds, instead of the work finishing in the background so the game can continue to run.
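
The figures in the warning are consistent with each other: at the usual 20 ticks per second, one tick is 50 ms, so 16150 ms behind corresponds to 323 skipped ticks. A minimal sketch of that arithmetic (the class and method names are made up for illustration):

```java
// Illustrative helper, not part of LagGoggles or Minecraft.
final class TickMath {
    // A server targeting 20 ticks per second has 50 ms per tick.
    static final long MS_PER_TICK = 50;

    // Whole ticks that fit into the time the server is running behind.
    static long skippedTicks(long msBehind) {
        return msBehind / MS_PER_TICK;
    }

    public static void main(String[] args) {
        long msBehind = 16150; // taken from the "Can't keep up!" log line
        System.out.println(skippedTicks(msBehind) + " ticks skipped");
    }
}
```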

Please let me know and thanks in advance!

commented

Here is a ton of data I gathered to back this up:
https://sparkprofiler.github.io/#585R10vl6r at 25 seconds

https://sparkprofiler.github.io/#dAgysZuyWY at 35

https://sparkprofiler.github.io/#ShKXce3aXZ at 45

https://sparkprofiler.github.io/#jGxCKtlUwr at 60

https://sparkprofiler.github.io/#b83NVKRdeu at 75

https://sparkprofiler.github.io/#E5SSbJE84R at 100

Then right before crashing, at 130: https://sparkprofiler.github.io/#5iMn27FuBk

At 160 I got this crash report: https://hastebin.com/eluhoyojeg.makefile
So that is why I feel there are no safeguards in place to prevent mass lag goggling from literally taking down a server. Yes, 160 seconds may be a while in most cases, but on a 20-slot server even the defaults could bring the server down if enough players ran a profile.

commented

Correct. Keeping the server thread running whilst accessing its data would result in race conditions. Therefore, processing is done in the server thread to mitigate this, at the cost of freezing the server whilst doing so. The other alternative would be letting the server crash every time, which is why this was implemented.

This is also the reason that NON_OPS_PROFILE_COOL_DOWN_SECONDS, NON_OPS_MAX_PROFILE_TIME and NON_OP_PERMISSION_LEVEL exist in the server config.
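
As a rough illustration of how limits like these can bound the freeze, here is a sketch of a cooldown-and-cap gate for non-op players. It assumes the options behave the way their names suggest. The numeric values, the `ProfileGate` class and the `isOp` flag (standing in for a permission-level check) are hypothetical and not taken from LagGoggles.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.UUID;

// Hypothetical sketch; not the LagGoggles implementation.
final class ProfileGate {
    // Placeholder values; the real ones come from the server config.
    static final int NON_OPS_PROFILE_COOL_DOWN_SECONDS = 120;
    static final int NON_OPS_MAX_PROFILE_TIME = 20;

    private final Map<UUID, Long> lastRunSeconds = new HashMap<>();

    /** Returns the allowed profile length in seconds, or 0 while the player is cooling down. */
    int request(UUID player, boolean isOp, int requestedSeconds, long nowSeconds) {
        if (isOp) {
            return requestedSeconds; // ops are assumed to be exempt from both limits
        }
        Long last = lastRunSeconds.get(player);
        if (last != null && nowSeconds - last < NON_OPS_PROFILE_COOL_DOWN_SECONDS) {
            return 0; // still inside the cooldown window
        }
        lastRunSeconds.put(player, nowSeconds);
        return Math.min(requestedSeconds, NON_OPS_MAX_PROFILE_TIME);
    }
}
```

With these placeholder values, a non-op asking for 60 seconds gets 20, and a second request within the next 120 seconds is refused.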

There's no other way to reliably process data other than halting the server, and then performing calculations.

That being said, we can halt the server and perform calculations on multiple threads to minimize impact, which is what I will do in the rewrite.
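
A minimal sketch of that idea, assuming the data tolerates concurrent reads while nothing mutates it: the calling thread (standing in for the server thread) blocks in `invokeAll` until every worker has finished, so the world is not ticking while it is being read. The `ParallelResolve` class and the `"entry-" + id` lookup are placeholders, not LagGoggles code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical sketch of "halt the caller, fan the work out, join".
final class ParallelResolve {
    static List<String> resolveAll(List<Integer> rawIds, int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Callable<String>> tasks = new ArrayList<>();
            for (Integer id : rawIds) {
                tasks.add(() -> "entry-" + id); // stand-in for an expensive name lookup
            }
            List<String> names = new ArrayList<>();
            // invokeAll blocks until all tasks are done and returns
            // the futures in the same order as the submitted tasks.
            for (Future<String> f : pool.invokeAll(tasks)) {
                names.add(f.get());
            }
            return names;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while resolving", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("lookup failed", e);
        } finally {
            pool.shutdown();
        }
    }
}
```

The freeze still happens, but its length shrinks roughly with the number of worker threads, as long as the lookups are independent of each other.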

commented

"That being said, we can halt the server and perform calculations on multiple threads to minimize impact, which is what I will do in the rewrite."
I see, but wouldn't flooding all the cores of a system similarly cause it to hang if that goes on for too long?
Also, if there is indeed no way to throttle this process, then how do you flipping pinpoint lag on a server that is running at near capacity, e.g. using most of its CPU cycles (like over 90% of the main thread)?

commented

LagGoggles only tracks data that has been sent during the update, which consists of the world, chunk, block position or the entity UUID being ticked.

Tracking entity names DURING profiling would be way too expensive. Tracking data has to be processed as fast as possible, because there's no point in a profiler that makes the server lag even more while it is profiling.

This means that after LagGoggles prints "finished profiling!" it starts grabbing entity names, block names and their respective classes, which unfortunately takes a lot of time, hence the freeze in the world thread.
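
The split described above can be sketched as two phases: a hot path that only accumulates time against a cheap key, and a later pass that resolves each key to a name once. The `TwoPhaseProfile` class and the `nameLookup` function are illustrative assumptions, not LagGoggles internals.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.UUID;
import java.util.function.Function;

// Hypothetical sketch of "record cheaply now, resolve names afterwards".
final class TwoPhaseProfile {
    // Phase 1 state: nanoseconds per entity UUID, no name lookups.
    private final Map<UUID, Long> nanosByEntity = new HashMap<>();

    // Called for every tick of an entity while profiling is running.
    void record(UUID entity, long nanos) {
        nanosByEntity.merge(entity, nanos, Long::sum);
    }

    // Phase 2: called once after profiling has finished.
    // Entities that resolve to the same name are summed together.
    Map<String, Long> resolve(Function<UUID, String> nameLookup) {
        Map<String, Long> byName = new HashMap<>();
        for (Map.Entry<UUID, Long> e : nanosByEntity.entrySet()) {
            byName.merge(nameLookup.apply(e.getKey()), e.getValue(), Long::sum);
        }
        return byName;
    }
}
```

The expensive lookup runs once per distinct key rather than once per tick, which is why the cost shows up after "finished profiling!" instead of during the measurement.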

Please note that any lag that occurs during this time isn't recorded (it just finished profiling ;) )

Multithreading seems to be the answer to this. Flooding the cores is no problem (that's why you have a multi-core CPU, right?). We just have to make sure we have some safety guarantees.

commented

Is there no way to increase the amount of time it takes to consolidate all the info while lowering the effect it has on the server's lag?

I can't even do 20-second scans reliably without the server (which is a dedicated server on good hardware) timing out, kicking me and losing the results.
So I have to do 15-second scans that throw up a lot of false positives, meaning it's unreliable and therefore useless.

I definitely can't do 15-second scans when I have 10 players online at once.

Also, could anything be done to the current LagGoggles to help accuracy and reduce the effect it has on the server while we are waiting for the rewrite? Or is the rewrite close to coming out, meaning that doing anything on the current version is pointless?

Also, I'm using the Forge version, not Sponge.

commented

Honestly, I am completely swamped in work right now. I have had no time to work on the rewrite in quite some time because of college and a little company I have set up in order to get some money.

At this moment, Tiquality takes priority and seems pretty stable, with 1 error I need help figuring out: TerminatorNL/Tiquality#20
Tiquality takes priority because I believe that solving the problem generally is better than having to keep hunting it down manually, which becomes infeasible if your players start creating more and more spread-out machinery.

The LagGoggles rewrite is currently in its infancy: the web server GUI (HTTPS encryption and everything) is working, but core functionality is still missing.

Another thing that will take some time is making LagGoggles work completely on its own if Tiquality is not available, because I don't want to force Tiquality down everyone's throat. People need time to adjust to change, and Tiquality has a pretty unique approach...

I will obviously review pull requests, if they come along, and merge them if they're up to spec.

TL;DR
I don't have the time to update LagGoggles in its current state, and if I do have time I'd like to spend it on the rewrite, so we can finally move on.

commented

Ahh that's fair enough then and understandable.

There are a few issues with Tiquality that will likely mean I will not use it. One is setups like NuclearCraft (mod) reactors: if they get held up or lose a tick, they will explode, and someone I know thinks the mod would cause that to happen. (They are the better person to explain it; I'll try to get them to check this and comment.)

Ok thanks fair

commented

But you still didn't address our concerns. How are we supposed to audit our servers in the meantime? Or will a fix be made soon enough that a server in production can run an audit without terminating connections, which both of us are saying happens if the server is used/lagging enough?

Thanks in advance, because as @MacWhinny said, the mod is useless if we cannot use it while the server is running.

commented

@MacWhinny

I have seen that concern before, and Tiquality already has quite a lot of safeguards to prevent this (the ticking queue, which makes sure all blocks belonging to one individual owner tick in the correct order).

If the reactor still explodes somehow, you can fiddle with the config and try to solve the issue by playing with the whitelist, or by disabling explosions in the reactor mod itself.

@FriendlyGamer

Make a pull request

commented

Well, honestly, as you've stated before, Tiquality needs to be configured correctly. So I'd put in a mod suggestion to my modpack's (ATM3 Remix) dev team to look at adding it to the pack, configured for the pack. I think that would be the best way of doing it, if what you have said works and we don't get issues.

As I've stated before (and I believe you are allowing this), we'd be using both mods (LagGoggles and Tiquality, unless you'll have this feature in Tiquality) so that players can find out where they are causing lag, fix it, and stop being penalised for having bad setups.

I don't know if this would be possible or not, but just as LagGoggles has the local area scan, would it be possible in Tiquality for a player's scan to print out all their blocks and their total hit on the server? That could be an extra statistic added to what is there or planned, so that players can find out how much all the blocks they have placed are affecting the server (and maybe server owners too, so they can find the least server-friendly players) and try to fix it, or stay under the set amount that gets them on the penalised list.

commented

So effectively Tiquality will be getting the changes first, THEN the focus will move to LagGoggles, correct?

Tiquality is indeed a good "proactive" measure, I agree, but not all of us want to limit players to a specific amount, if that is what it does (set cycles / players = what one player gets?). Personally, the problem I see with that is that some players are more dedicated than others and thus will naturally have more "laggy" setups. With LagGoggles I could, for example, make sure they're not overdoing it, whereas Tiquality may "punish" long-standing players for having lots of machinery that adds up but isn't causing normal lag spikes. Or am I wrong about what that mod is or will be doing?

commented

Yes, Tiquality will receive updates first, and then LagGoggles.
Tiquality ticks in two stages, but all within one server tick.

Roughly speaking:

granted time = (50 ms - Config_defined_time_between_ticks) / players online

1 Natural tick order: if there are updates in the queue, tick all of them, or tick until time runs out for the player. If there's time left for the player, don't throttle. If time ran out, schedule the tick in the queue.

2 After throttling, there's probably some time left. During this period, the remaining queues of all trackers are ticked, in the correct order on a per-owner basis until the remaining time runs out.
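
The formula and the two stages can be sketched roughly as follows. This is a simplification, not Tiquality's code: every update is modelled as already sitting in its owner's queue with a known cost in milliseconds, the number of trackers stands in for "players online", and all class and method names are invented for the example.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.List;

// Hypothetical sketch of per-player tick budgets with a leftover pass.
final class TickBudget {
    static final double TICK_MS = 50.0;

    /** granted time = (50 ms - configured time between ticks) / players online */
    static double grantedMs(double timeBetweenTicksMs, int playersOnline) {
        return (TICK_MS - timeBetweenTicksMs) / playersOnline;
    }

    /** One owner's pending updates, each with a simulated cost in ms. */
    static final class Tracker {
        final Deque<Double> queue = new ArrayDeque<>();

        /** Ticks queued updates in order while budget remains; returns the unused budget. */
        double drain(double budgetMs) {
            while (!queue.isEmpty() && budgetMs > 0) {
                budgetMs -= queue.poll(); // an update may overrun the budget slightly
            }
            return Math.max(budgetMs, 0);
        }
    }

    static void serverTick(List<Tracker> trackers, double timeBetweenTicksMs) {
        double grant = grantedMs(timeBetweenTicksMs, trackers.size());
        double leftover = 0;
        // Stage 1: every owner spends only their own granted time.
        for (Tracker t : trackers) {
            leftover += t.drain(grant);
        }
        // Stage 2: unused time is spent on whatever queues remain.
        for (Tracker t : trackers) {
            leftover = t.drain(leftover);
        }
    }
}
```

With two owners and 10 ms reserved between ticks, each owner is granted 20 ms. An owner with ten 5 ms updates gets four of them done in stage 1, then three more in stage 2 using the 15 ms the other owner did not need, and the remaining three wait for the next tick.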

Keep in mind that players are not 'punished', but rather face the consequences themselves instead of everyone on the server facing them. There is no 'threshold', only a queue and the time to execute it. With Tiquality you don't instantly drop to 5 TPS; just like the actual server, you will gradually drop lower and lower if you keep building carelessly.

It's possible to share your tick time with someone else, so basemates effectively get twice the time.

If you're offline and have chunks loaded, you can set a multiplier in the config so your machines will still work, but at a slower rate if you are using up all your time after the multiplier has been applied.

There's loads of functionality in Tiquality, but there's one major flaw:
Some mods don't actually tick in the block update, but do this using an event subscriber on the tick. This effectively bypasses Tiquality's functionality. It's up to the respective mod developers to fix this if it's feasible. It takes time for mod developers to adjust accordingly, which is why I won't force Tiquality.

commented

@MacWhinny

Yes! It's going to be exactly like you said. 👍

commented

I can also confirm that this is happening on LagGoggles 4.3

However, Tiquality is a completely different plugin to LagGoggles, and cannot be used to profile for lag. Therefore, why has all support for LG suddenly just dropped? I don't understand.

I don't find that TiQuality actually 'fixes' lag, it purely just tries to distribute it, which may not even work at most times, especially right out of the box.

This issue is directed towards LagGoggles, and I feel that it should be acknowledged further, instead of forwarding people to Tiquality which is something entirely different.

However, I do understand that there will be a rewrite for LagGoggles in the far future, and I do appreciate that.