Dynmap-Forge/Fabric

Dynmap-Forge/Fabric

888k Downloads

Dynmap fails with large SQLite database using internal webserver or standalone - PARTIAL FIX INCLUDED

argon000s opened this issue ยท 8 comments

commented

Issue Description: I have found 2 issues which I believe are related. I have a large world ~15GB and use SQLite database as storage ~25GB which has over one million tiles.

1. Using internal webserver, accessing the map fails completely, the page loads a few components but then stalls, refreshing the page causes it load no data and time-out. Shutting down the server causes multiple SQLException errors.

2. When using a standalone webserver with the SQLite database, tiles load extremely slowly, around one tile a second or even slower when there are empty regions on the map.

I have conducted my own research on the slow retrieval of tiles using an sqlite database browser. I ran a similar sql command found in SQLite_tiles.php.
example:

SELECT Tiles.Image,Tiles.Format,Tiles.HashCode,Tiles.LastUpdate,Tiles.ImageLen
FROM Maps JOIN Tiles WHERE Maps.WorldID='world' AND Maps.MapID='t' AND Maps.Variant='STANDARD'
AND Maps.ID=Tiles.MapID AND Tiles.x=-1114826 AND Tiles.y=1123189 AND Tiles.zoom=0;

Execution finished without errors.
Result: 1 rows returned in 804ms

That takes nearly a second!
And for a tile that does not exist:

Result: 0 rows returned in 1971ms

That's 2 seconds!
So the database is getting severely bogged down looking for tiles whether or not they exist. The automatic Index feature of SQLite prevents databases operations from taking this long even on extremely large databases. However it seems to be not using it as shown below when debugging the request.

EXPLAIN QUERY PLAN ... (the select query above)
Result:
SCAN TABLE Tiles
SEARCH TABLE Maps USING INTEGER PRIMARY KEY (rowid=?)

Notice: "SCAN TABLE Tiles". Its not using the index on the table but scanning the entire table (over 1 million rows), when using a table JOIN in the query. I don't know if this is a SQLite driver bug or some other issue.

However, you can force it to use the index by preceding join with cross in the select statement.

Using the new query:

SELECT (...) FROM Maps CROSS JOIN Tiles WHERE Maps.WorldID=(...)
Result: 1 rows returned in 6ms

EXPLAIN QUERY PLAN ... (new query)
Result:
SCAN TABLE Maps
SEARCH TABLE Tiles USING INDEX sqlite_autoindex_Tiles_1 (MapID=? AND x=? AND y=? AND zoom=?)

Much better, it's using the index like it's supposed to...

See https://www.sqlite.org/optoverview.html#manual_control_of_query_plans_using_cross_join for more info on sqlite index.

Im not 100% sure but I think something similar is happening inside the dynmap plugin causing it to completely stall

This may also be the cause for issue #3434
This may possibly also affect mySQL database, although I have not tested this.

  • Dynmap Version: dynmap v3.4-SNAPSHOT-768
  • Server Version: paper 1.18.1 build git-Paper-216
  • Pastebin of Configuration.txt: https://pastebin.com/Ne1HYuLE
  • Server Host (if applicable): Selfhosted
  • Pastebin of crashlogs or other relevant logs: https://pastebin.com/6uWDYpSX
  • Other Relevant Data/Screenshots: none?
  • Steps to Replicate: Have a large map with a large SQLite database

[x] I have looked at all other issues and this is not a duplicate
[x] I have been able to replicate this

commented

I am testing this, if fix deems fruitful I will create a pull report to apply this fix, thanks for pointing it out!

commented

I forget sometimes how stupid SQLite's execution planner is. Lemme fix the indexing on the Maps table (which is a tiny table, so most DBs are smart enough to resolve it first and then leverage its zero or one row hit into an indexed lookup in the Tiles table.... and an index on a table that rarely has more than a handful of rows is sort of silly...). CROSS JOIN is generally viewed as a terrible thing to use because the logical row set produced is N x M rows for a N row table CROSS JOINed with an M row table, so you get a full cartesian product and then prune the result with the WHERE, while the ON for an INNER JOIN does the same thing but prunes the product before the WHERE is resolved: the notion of it working better here is, frankly, bizarre, and really kind of implies that SQLite is applying the WHERE during the cross join vs after (e.g. aliasing 'A CROSS JOIN B WHERE XYZ' as 'A INNER JOIN B ON XYZ' perhaps), besides ignoring table size when deciding about scanning vs searching.

Anyway - build 772 (which is running right now - should be done in about 20 minutes) will add the index on the Maps table, which I've confirmed removes the scans on the query.

commented

just tested, the website loads even faster with the magic you did, thanks for again not choosing for the quick fix but using your in depth knowlegde to go that step further!

commented

Cool - I appreciate you guys bringing this one up - the "CROSS JOIN" thing really hurt my head, which made it a bit more of a puzzle I wanted to solve :). I'm interested in @argon000s 's feedback as well.

commented

Cross join is unnecessary - the code really just needs to include an appropriate ON on the join - check current code in current dev build

commented

The change in the dev build doesn't fix the issue.

Query's on the same database file using new method:

SELECT Tiles.Image,Tiles.Format,Tiles.HashCode,Tiles.LastUpdate,Tiles.ImageLen
FROM Maps JOIN Tiles ON Maps.ID=Tiles.MapID
WHERE Maps.WorldID='world' AND Maps.MapID='t' AND Maps.Variant='STANDARD'
AND Tiles.x=-1114826 AND Tiles.y=-1123189 and Tiles.zoom=0;

result:
BLOB	2	2753651856	1647215402155	332

Execution finished without errors.
Result: 1 rows returned in 808ms

This takes the same time as before.

EXPLAIN QUERY PLAN 

SCAN TABLE Tiles
SEARCH TABLE Maps USING INTEGER PRIMARY KEY (rowid=?)

Still not using auto-index.

Non-existent tile:

Execution finished without errors.
Result: 0 rows returned in 1969ms

Still around 2 seconds.

Also deleting all the world.pending files causes dynmap to function. I think what I did before this issue was to do a full render of the end - so there must be something buggy going on when attempting to read chunks and/or tiles from that dimension.

I've attached those files to see if anyone can decipher them:

world.pending.txt
world_nether.pending.txt
world_the_end.pending.txt

(Couldn't put it on pasebin because it too big)

commented

Yes, this has seemed to of fixed the standalone, map viewer. Thakyou.

However I am still having issues with Dynmap stopping functioning like in issue #3434 when doing a full render. (The only way to fix this is to stop the server, delete all world.pending files, and start the server again.) So perhaps the cause for this is not sqlite index related. I am really puzzled.

Last part of the server log at shutdown:

[01:02:27] [Server thread/INFO]: [dynmap] Disabling dynmap v3.4-SNAPSHOT-777
[01:02:27] [Server thread/INFO]: [dynmap] Unloaded 11 components.
[01:02:27] [Server thread/INFO]: [dynmap] Stopping map renderer...
[01:02:28] [Server thread/INFO]: [dynmap] Saved 584546 pending tile renders in world 'world'
[01:02:28] [Server thread/INFO]: [dynmap] Full render job saved for world 'world'
[01:02:33] [Dynmap Render Thread/ERROR]: [dynmap] SQLException: Tile matches hash error
[01:02:33] [Dynmap Render Thread/ERROR]: [dynmap]   ErrorCode: 0, SQLState=null
[01:02:33] [Dynmap Render Thread/ERROR]: [dynmap]   Message: Interruped
[01:02:33] [Dynmap Render Thread/ERROR]: [dynmap] SQLException: Tile read error
[01:02:33] [Dynmap Render Thread/ERROR]: [dynmap]   ErrorCode: 0, SQLState=null
[01:02:33] [Dynmap Render Thread/ERROR]: [dynmap]   Message: Interruped
[01:02:33] [Dynmap Render Thread/ERROR]: [dynmap] SQLException: Tile exists error
[01:02:33] [Dynmap Render Thread/ERROR]: [dynmap]   ErrorCode: 0, SQLState=null
[01:02:33] [Dynmap Render Thread/ERROR]: [dynmap]   Message: Interruped
commented

The 'interrupted' during shutdown is noise - basically, if DB writes are as slow as they often are for SQLite, I have to cancel them during shutdown in order to avoid the Minecraft server whining about taking too long to shutdown.... Basically cannot win, since the server doesn't have a way to be told that the longer shutdown is needed and not to call a timeout.