Option to disable offline player matching
SergioK29 opened this issue · 8 comments
The openinv command autocompletes to the nearest player if there isn't one found, which is fine, but on a server with hundreds of thousands of player data files it causes the command to take minutes to complete and is overall frustrating, some option to disable the autocomplete would be much appreciated
Eyeballing the string similarity metric, there's a good bit of room for optimization, so doing that might help. It looks like the strings get cloned a bunch of times (3-6 depending on how you count), including just to discard the cloned "common prefix" string after calling length()
on it.
Alternatively, a higher-level solution would be to use a trie and tab-complete offline player names from it, so you can enter johnminecraft
and tab the numbers in. Wouldn't work for matching johnminecraft
to j0hnminecraft
as the similarity search does, but I'm not sure how common that is in practice.
Off the top of my head, I think traversing a trie to find the nearest-neighbor by Levenshtein distance should also be possible in logarithmic time/space, so the two approaches aren't mutually exclusive. Not so sure about Jaro-Winkler distance.
The string comparison generally is a lot less impactful than the fact that however many hundred thousand files have to be read from disk to fetch players' last known names. O/I is really expensive relative to other operations. This is why I think a db is the way to go to solve it - being able to make bulk queries would greatly reduce disk reads.
The reason I moved away from Levenshtein is that its results are less-than-impressive for partial matches. This might be slightly inaccurate, because it's been a long time since I looked at either, but to my memory, if I'm looking for "johnminecraft12345" and I can only remember "johnminecraft" then "frankminecraft" is a better match as far as Levenshtein is concerned. Jaro-Winkler solves that by valuing the matching starts higher.
Re: optimizing metric in general: Probably a good idea, but not my code, so I never worried about it much. Looking at it now I appear to have somehow removed the licensing header from it (presumably some time during an auto-update of the project's copyright headers?), which is solidly not good... It's an adaptation of code from Simmetrics, so it's absolutely not mine and I don't want to be laying claim to it beyond some light modification to mush it all into one place and accept slightly different formats.
/e: I can't seem to find the old licensing header at all poking through blame and history... possible I never noticed when the pre-commit copyright wiped it out the first time, but either way, that's messed up, I need to fix that.
Not saying this isn't some degree of problem (one of the reasons I inherited OI was because I was working on preventing cases like these from crashing the server - that used to be a main thread check!) but if you know exactly who you're looking for this shouldn't be a problem because matching is done in this order:
- UUID
- Exact online
- Cached from previous offline lookup results
- Exact offline (cached if valid)
- Inexact online
- Inexact offline (cached when found)
Your average experience should be about as fast as on any other server regardless of player file count. If your first lookup of an offline player who has played before by exact name is slow, that probably indicates that they are not in the server's cache and it has to fetch their UUID from Mojang, which isn't something OI can change without its own larger cache. Even in that case it shouldn't be more than a second.
Simply type perfectly all the time and don't make a single error on the name of IliIill1lII11, of course. Issue solved. /s
Not really a fan of disabling the feature as a whole. When I was administrating a server (a meager few thousand players, not your numbers) it was handy in a lot of cases where we'd have "that guy johnminecraft<numbers>" et al. where most of the name was available but the last little bits were an excursion into logs from 10 days ago. I get that this may not be your personal experience with OI, but I think allowing server owners to disable it would be detrimental to the average admin's experience.
Imo the easiest course of action here is for OI to maintain a complete name attempt->UUID table in a database. It already does maintain a small one in memory (very small, way too small for my taste, but it's mostly intended to cover enter-uparrow-enter on malformed names). Lookups would then be largely as performant as the database is responsive, and even new typos would likely be able to produce a result much faster due to being able to fetch many more possibilities in a single read.
This would remove the need to load player files from disk at all (barring an initial setup that could be run once on startup and loosely confirmed good by just checking that the number of players on file matches expectations).
An alternative would be to disable exact offline matching entirely and add a command to look up the best match for a name, rewording the language of the "no such player" error message to suggest it.
This would be more compatible behavior-wise with feature requests like /clearinv
with only exact offline matching, and could benefit from the same database concept.
Going to re-title your issue because I originally thought you meant tab completion, which only runs on online players.
The string comparison generally is a lot less impactful than the fact that however many hundred thousand files have to be read from disk to fetch players' last known names. O/I is really expensive relative to other operations. This is why I think a db is the way to go to solve it - being able to make bulk queries would greatly reduce disk reads.
The reason I moved away from Levenshtein is that its results are less-than-impressive for partial matches. This might be slightly inaccurate, because it's been a long time since I looked at either, but to my memory, if I'm looking for "johnminecraft12345" and I can only remember "johnminecraft" then "frankminecraft" is a better match as far as Levenshtein is concerned. Jaro-Winkler solves that by valuing the matching starts higher.
Re: optimizing metric in general: Probably a good idea, but not my code, so I never worried about it much. Looking at it now I appear to have somehow removed the licensing header from it (presumably some time during an auto-update of the project's copyright headers?), which is solidly not good... It's an adaptation of code from Simmetrics, so it's absolutely not mine and I don't want to be laying claim to it beyond some light modification to mush it all into one place and accept slightly different formats. /e: I can't seem to find the old licensing header at all poking through blame and history... possible I never noticed when the pre-commit copyright wiped it out the first time, but either way, that's messed up, I need to fix that.
yeah don't add a db, would be bloat, also can we get the option to remove the items in the openinv inventory? They are unnecessary and cause confusion
yeah don't add a db, would be bloat
A DB would mean a single extra file open (as opposed to opening each player file to read it), and would require 2 16-character strings per entry. Disk space is cheap, processor cycles less so. It would also make lookups far, far faster via proper indexing. Additionally, SQLite drivers are baked into the server already.
also can we get the option to remove the items in the openinv inventory? They are unnecessary and cause confusion
In the future, please open separate issues for separate problems.
Set the items to air. https://github.com/Jikoo/OpenInv/wiki/Customizing-Inventories#customizing-placeholders
yeah don't add a db, would be bloat
A DB would mean a single extra file open (as opposed to opening each player file to read it), and would require 2 16-character strings per entry. Disk space is cheap, processor cycles less so. It would also make lookups far, far faster via proper indexing. Additionally, SQLite drivers are baked into the server already.
also can we get the option to remove the items in the openinv inventory? They are unnecessary and cause confusion
In the future, please open separate issues for separate problems. Set the items to air. https://github.com/Jikoo/OpenInv/wiki/Customizing-Inventories#customizing-placeholders
a vast majority of the users of this plugin use it because of /openinv and /openender being able to work for offline players.
IMO just focus on making the plugin good at these 2 commands and up to date with minimal bloat.
That being said i'd rather just have the huge fuzzy search lag then implementing a whole ass database, tables and threads and a driver etc, there is enough of that crap already on my server 😭
a vast majority of the users of this plugin use it because of /openinv and /openender being able to work for offline players. IMO just focus on making the plugin good at these 2 commands and up to date with minimal bloat.
The thing you're missing here is that for the vast majority of users, those commands already work very well. The vast majority of servers have 1-2 players online on average (bstats puts weekend peak totals at ~220k players and 155k servers), and probably have seen under 10 unique players.
That being said i'd rather just have the huge fuzzy search lag then implementing a whole ass database, tables and threads and a driver etc, there is enough of that crap already on my server 😭
I think you may not understand what OI currently does and what the goal here would be.
What OI currently does when fuzzy matching is effectively use a new thread to open and read every single player file. This requires back-and-forth with the OS for each and every file read, which takes time. This is why the fuzzy match is slow for you - you have approximately one cubic heckton of player files, and each and every one must be opened, read, and closed.
The goal here would be to instead (when forced to fuzzy match) use a thread, negotiate access to a single file (which should ideally already be open), and read large chunks of data. Assuming we can bulk load 100 names per call, this would be the equivalent of replacing 300 system calls in the current system with a single call.
Additionally, we could leverage the fact that data is indexed nicely to potentially do things like only load names whose first letter matches the first input letter. That would be a massive reduction in lookup counts, and is not currently possible at all.
I would probably use SQLite for this. Its drivers are already included in Spigot (no extra bloat). It would not require configuration - the database and relevant tables would be created inside the OpenInv folder and Just Work(TM).
Yes, this will require that a file exist in the OpenInv plugin folder. Yes, it will take up some extra disk space (at a minimum, it'd be an entry per player). Frankly though, if that's such a big strain, you should probably be purging old players' data instead. Seriously, check out the size of the playerdata and advancements folders via du /path/to/minecraft_server/world/ -ch
. The fact that advancements are stored in uncompressed json is crazy.
I'm going to reopen this, but feel free to unsubscribe from it if you don't think the feature has merit. The good news is that I usually take a couple years to get to changes like this, so if you still hate the idea and don't want to discuss it, you have plenty of time to find a replacement plugin.
I'm going to reopen this, but feel free to unsubscribe from it if you don't think the feature has merit. The good news is that I usually take a couple years to get to changes like this, so if you still hate the idea and don't want to discuss it, you have plenty of time to find a replacement plugin.
If you keep closing this, I'm going to just open another issue pointing here.