Database performance
sfPlayer1 opened this issue · 12 comments
Database operations in BetterQuesting appear to take several ms per tick on a busy server. I checked the code and call stacks and noticed that there's considerable room for a more efficient implementation.
E.g. PartyManager.getUserParty calls SimpleDatabase.getEntries, which yields a shallow copy of the entire DB content, then iterates it to find the desired item. It would be much faster and more scalable to have an index (HashMap<UUID, DBEntry>) that can retrieve the result in constant time.
In general a DB should never be iterated unless the entire contents are of interest (like for saving). TreeMap is relatively slow and should be avoided unless ordered navigation is required. TreeMap.toArray itself is surprisingly slow, taking ~90% of the lookup time in the example above.
Dense int ID operations like finding the lowest unused ID can be done fairly efficiently with BitSet (bitSet.nextClearBit(0)).
If the DB is mostly about assigning IDs densely with recycling freed IDs, I'd use a BitSet to manage ID allocations, an ArrayList where index == ID for instant id->data lookup and HashMaps for indexing frequent searches, like the UUID lookup.
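A minimal sketch of that layout might look like the following. The class and method names are illustrative, not BetterQuesting's actual API: a BitSet tracks which IDs are in use, an ArrayList indexed by ID gives instant id->data lookup, and a HashMap indexes the frequent UUID search.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.UUID;

// Hypothetical sketch of the suggested layout; names are illustrative only.
class DenseIdStore<T> {
    private final BitSet usedIds = new BitSet();
    private final List<T> entries = new ArrayList<>();      // index == ID
    private final Map<UUID, Integer> uuidIndex = new HashMap<>();

    /** Allocates the lowest free ID, recycling holes left by remove(). */
    synchronized int add(UUID owner, T value) {
        int id = usedIds.nextClearBit(0);   // lowest unused ID in one call
        usedIds.set(id);
        while (entries.size() <= id) entries.add(null);
        entries.set(id, value);
        uuidIndex.put(owner, id);
        return id;
    }

    synchronized T get(int id) {
        return id >= 0 && id < entries.size() ? entries.get(id) : null;
    }

    /** Constant-time lookup instead of scanning the whole DB. */
    synchronized T getByUuid(UUID owner) {
        Integer id = uuidIndex.get(owner);
        return id == null ? null : entries.get(id);
    }

    synchronized boolean remove(int id) {
        if (id < 0 || id >= entries.size() || entries.get(id) == null) return false;
        usedIds.clear(id);                  // ID becomes reusable
        entries.set(id, null);
        uuidIndex.values().remove(id);      // drop the matching index entry
        return true;
    }
}
```

All three structures are maintained together under one lock, so a lookup never has to iterate the database.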
I'm aware that my solution isn't the fastest of implementations and still has its issues, but when I wrote it I was primarily aiming to get away from this mess. I also included a BigDatabase variant to try to mitigate some of these lingering performance issues in places where there are potentially several hundred entries. Unfortunately getEntries() was one of the things I never got around to resolving properly, partly because some editors still used it to browse entries and also because I'm not well versed in advanced data structures.
Some of the requirements I've come up with:
- Entries must be sorted by ID. Some use cases are dependent on this fact.
- IDs must remain fixed even after add/remove operations (can't use a list order index)
- Database access must be thread safe
- Editors need to be able to browse/search entries one way or another (via a threaded task maybe)
- Implementation needs to be reusable for multiple data types
For the performance issues around getUserParty(), I was thinking of just directing that to a HashMap cache for now, similar to your suggestion, and only browsing the database to update it when the player is no longer a part of that cached party.
I'll eventually get around to resolving these scalability issues, but right now I'm trying to avoid rewriting any significant portions of code now that everything is fairly stable and I can finally get to other non-critical UX issues.
Unfortunately BigDatabase won't help where it still visits all blocks for some operations, which is often the case. You could improve on this immediately by using SimpleDatabase with a TreeMap (key=id) instead of the TreeSet to get faster id->entry lookups as in BigDatabase. QuestDatabase is actually better, it just has no efficient way to allocate IDs (->BitSet as suggested) and it is not thread safe.
A concurrent map only ensures that the actual method calls to it are safe and atomic, but anything earlier or later isn't necessarily consistent. E.g. ConcurrentHashMap.containsKey will yield the correct result for the point in time it was called, but an immediately subsequent ConcurrentHashMap.get may disagree since another thread can have modified the map in the meanwhile. If a consistent state is required beyond one access, like in QuestDatabase.nextKey, a normal map with synchronized, ReadWriteLock or StampedLock needs to be used.
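The check-then-act hazard can be sketched as follows. The class and method names are hypothetical; nextKey() only mirrors what something like QuestDatabase.nextKey needs to do:

```java
import java.util.concurrent.ConcurrentHashMap;

// Illustrative sketch of the check-then-act hazard; names are hypothetical.
class KeyAllocator {
    private final ConcurrentHashMap<Integer, Object> map = new ConcurrentHashMap<>();

    // BROKEN: containsKey and put are each atomic, but another thread can
    // claim `key` between the two calls, so two threads may get the same key.
    int nextKeyRacy(Object value) {
        int key = 0;
        while (map.containsKey(key)) key++;
        map.put(key, value);
        return key;
    }

    // SAFE: the whole scan-and-insert happens under one lock, so the map
    // cannot change between the containsKey check and the put.
    synchronized int nextKeySafe(Object value) {
        int key = 0;
        while (map.containsKey(key)) key++;
        map.put(key, value);
        return key;
    }
}
```

With the synchronized variant, a plain HashMap would serve just as well, since the lock already serializes all access.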
I wouldn't treat the UUID->Entry HashMap (and others) as a cache; that asks for invalidation/missed-update bugs. Actively maintaining it so it's always in sync with the DB contents is easier to get right.
How do these changes look to you so far? (I know that getEntries() can be modified in this version but I'm holding off on modifying the interface for now)
f0be894
This looks better, a few remarks/tips for the new code:
- With TreeMap the blocking should be superfluous; the binary tree is already inherently sub-dividing in a fairly optimal way. This simplifies everything: methods like getID can iterate TreeMap.values() without the expensive getEntries(), and your size() becomes just a call to TreeMap.size().
- I'd synchronize on "this", allowing you to use the synchronized modifier on the methods for cleaner code and adding the ability to synchronize on the db state externally, e.g. for iteration while locking concurrent changes out
- dbBlockMap doesn't need to be wrapped in Collections.synchronizedSortedMap since all accesses are already synchronized
- removeValue can be implemented as removeID(getID(..))
- add can use putIfAbsent and check the return value for null to generate the duplicate error
- if you happen to do many T->id lookups (directly or indirectly), you could consider adding a HashMap<T, Integer> to speed that up
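The putIfAbsent remark can be sketched like this, assuming a TreeMap-backed SimpleDatabase (the class name and error message are illustrative, not the actual code):

```java
import java.util.TreeMap;

// Illustrative sketch of the putIfAbsent suggestion; names are hypothetical.
class SimpleDb<T> {
    private final TreeMap<Integer, T> mapDB = new TreeMap<>();

    // One map call both inserts and detects duplicates, replacing a
    // separate containsKey check followed by put.
    synchronized void add(int id, T value) {
        if (mapDB.putIfAbsent(id, value) != null) {   // non-null => id was taken
            throw new IllegalArgumentException("Duplicate ID: " + id);
        }
    }

    synchronized int size() {
        return mapDB.size();   // no need to materialize getEntries()
    }
}
```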
Oh yea, completely forgot I can use synchronized(this). I'll fold these new suggestions into what I've already got and send you an updated commit when that's done. Guess all this makes BigDatabase redundant then (besides bulk lookup, which could probably be reworked later).
Thanks for helping with this by the way. It's quite nice learning some very handy data structure knowledge.
Looks good, I have 2 more improvements you could consider now or if performance is still insufficient in the future:
- getValue: mapDB.get(id) should always return null for !idMap.get(id), so the later check is redundant
- mapDB could still be replaced with a sparsely populated array where the index equals the id for much faster lookups if you have no huge gaps between the ids (->
synchronized T getValue(int id) {
return id >= 0 && id < mapDB.length ? mapDB[id] : null;
}
synchronized DBEntry<T> add(int id, T value) {
...
if (id >= mapDB.length || mapDB[id] == null) {
if (id >= mapDB.length) mapDB = Arrays.copyOf(mapDB, Math.max(id + 1, mapDB.length * 2));
mapDB[id] = value;
idMap.set(id);
refCache = null;
return new DBEntry<>(id, value);
} else {
throw...
synchronized boolean removeID(int key) {
if(key < 0 || key >= mapDB.length || mapDB[key] == null) {
return false;
} else {
idMap.clear(key);
mapDB[key] = null;
refCache = null;
return true;
}
}
etc.
getEntries() iterates 0..mapDB.length where index = id and skips null values; the result is obviously always sorted by id. size could be kept in a field and incremented/decremented on add/remove)
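That getEntries() idea might look roughly like this (an illustrative sketch of the sparse-array variant, not the actual code; the field and class names are made up):

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Illustrative sketch of the sparse-array variant; names are hypothetical.
class SparseArrayDb<T> {
    private Object[] mapDB = new Object[16];
    private int size = 0;   // maintained on add/remove instead of recounting

    synchronized void add(int id, T value) {
        if (id >= mapDB.length) {
            mapDB = Arrays.copyOf(mapDB, Math.max(id + 1, mapDB.length * 2));
        }
        if (mapDB[id] == null) size++;
        mapDB[id] = value;
    }

    // Walks 0..length-1, skipping nulls; index order == ID order, so the
    // result is sorted by ID with no TreeMap traversal or toArray call.
    @SuppressWarnings("unchecked")
    synchronized List<T> getEntries() {
        List<T> out = new ArrayList<>(size);
        for (Object e : mapDB) {
            if (e != null) out.add((T) e);
        }
        return out;
    }
}
```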
Thing is I can't guarantee that a pack dev won't delete a large chunk of quests, leaving a considerable hole/fragmentation in ID use. In actuality I've seen it happen when devs discard whole quest lines, particularly after importing more than necessary. I'm not sure resizing an array would be a good idea, especially with bulk copy/delete actions. Sure it may be faster, but there's also the editors to be mindful of.
I think it should be fine unless the hole is in the tens of thousands or even more. This is however just a stretch goal and probably better left for when profiling data demands it :)
I'll close this as the problem should be resolved. Thanks for fixing.
Actually... that gives me an idea. What if I used idMap to figure out how many unused IDs there are below the queried ID and then use that as an index offset to get its actual location in refCache? Best of both, right?
It is possible, but unfortunately a really slow operation. BitSet is not particularly magic, it still represents a linear scan, which is the slowest way to do any lookup. It however scans 64 entries at once, so it's still the best choice for some applications, like your "find first unused id".
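The offset in question is a "rank" query, which can be sketched as follows to show why it's linear (assuming idMap is the BitSet of used IDs and refCache a dense array, as in the discussion; the method name is made up):

```java
import java.util.BitSet;

// Illustrative sketch: counting unused IDs below a queried ID is a rank
// query, and BitSet can only answer it by scanning, 64 bits per word.
class RankDemo {
    /** Unused (clear) IDs below `id`, i.e. the offset into a dense refCache. */
    static int unusedBelow(BitSet idMap, int id) {
        // get(0, id) copies the prefix and cardinality() scans it: O(id/64)
        return id - idMap.get(0, id).cardinality();
    }
}
```

Every lookup thus pays a cost proportional to the ID range, which is exactly what the index structures were meant to avoid.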
The only in-between that could make sense is using a HashMap instead of the TreeMap, but then you lose in-order iteration and may have to sort your entry array manually with Arrays.sort after building it.
The problem is that TreeMap is not slow, HashMap even less so, giving you very little leverage for further improvement without drastic measures like the array. IMO the majority of the remaining problems will be scanning uses of getEntries similar to the party lookup in the OP, which will need targeted tweaks/caches anyway.
Fair enough. I'll leave it as is for now then, until someone is crazy enough to somehow overload this (doubt it).
Thanks again for all your help on this.
Seems to still be working fine after the changes. Didn't add the HashMap because ID lookups really shouldn't be necessary all that often. Most code should be relying on IDs more than object references.