Blocking Methods in Event Handlers

Question

Blocking Methods in Event Handlers

Username4723 opened this issue 4 years ago · 1 comments

Username4723 commented 4 years ago

Description

If a plugin subscribing to certain events calls a blocking LuckPerms API call, commands that update an online player, such as promoting on a track, will appear to succeed (and even send a message using Redis) while not writing changes to the database (or writing them extremely delayed), making it seem as if the database has suddenly gone read-only.

Reproduction steps

Have near-zero (real) programming talent and think running a server is a good idea
Install BungeeCord and Spigot, following the Network Install settings
Use MongoDB for storage and Redis for messaging
Create a track with several groups
Create a plugin subscribing to the NodeMutateEvent. Source code locateable below
On said event, have the UserManager load the user from the database and call .join() so its blocking.
Install said plugin on your BungeeCord
Start both the BungeeCord and Spigot server
Login with 1.16.4
Try and spam promote your account to the last group in the track via BungeeCord's console

Expected behaviour

Commands should not report a success nor send a redis ping if data is not updated in the database. In addition, event handlers that block the API during an event or last longer than a reasonable amount of time should generate warnings for the developer instead of failing silently.

Environment details

Server type/version: PaperSpigot running version 1.16.4 build 344
LuckPerms version: v5.2.67

https://gist.github.com/Username4723/7d2239b2b8773ced22945291d931bf4e
Full logs, test environment build script, source code, and database credentials (*don't worry, they've been changed. And the database is only listening on 127.0.0.1) can be found in the link above.
The build.sh file is used to create a consistent and reproducible testing environment.

Any other relevant details

I tried to replicate other sources of event blocking via several methods such as Thread.sleep, a loop that counts to two billion and does nothing, or a while loop that adds two numbers together until time has passed. None of them were able to create the same error.

In addition, while the code used in the example (looking up a user whos data is already available) is... questionable... at best, I have not tested whether other blocking lookups will replicate the same bug.

Lastly, for some reason, the bug only appears to manifest itself while the user is logged in.

Edit: Changed specified version from .63 to .67. No clue where the 63 came from, as 67 is displayed everywhere in the logs. I'm starting to see (even more) why I was the first person to try and load from the database a user I already had in memory.

lucko · Answer 1 · 2021-01-01T20:14:27.000Z

Thanks for the detailed report.

Yes - it's not a good idea to make calls to loadUser within a NodeMutateEvent, because loadUser indirectly causes more NodeMutateEvents... so you end up effectively with a stack overflow that will never end (because the result gets spawned on new threads).

The same would happen (more obviously) for example if you made a call to loadGroup in a GroupLoadEvent listener.

Thing is though, there's really no reason to make a call to loadUser within that method because a User instance is already provided by the event (#getTarget()).

It's non-trivial to add warnings/errors for API users attempting to do this sort of thing, and given the unlikeliness of it happening, probably not worthwhile to implement. I can appreciate it probably has caused quite a bit of frustration trying to debug, but I hope you can appreciate why there's not really much worth doing in LP to warn against it - sorry!

Share to

Description

Reproduction steps

Expected behaviour

Environment details

Any other relevant details