Performance is abysmal

Question

Performance is abysmal

twothe opened this issue 5 years ago · 15 comments

twothe commented 5 years ago

I've setup a very simple network with 2 chassis that allow to autocraft bricks from stone. One providing the stone, the other one attached to a crafter. This setup alone consumes about .5 ms on the server while it is actively crafting stuff, tested with LagGoggles and /cofh tps. As much as I love Logistic Pipes back then in Infinity Evolved it was also a major source of lag. And in the current implementation it would be abysmal for the performance on multiplayer servers once again.

In comparison the same setup using Thermal Logistics uses no measurable performance, ProjectRed uses a quarter of what LP uses.

bziemons · Answer 1 · 2020-04-20T16:14:11.000Z

Thanks for your feedback. A few months back we were running a server with multiple smaller and a large LP network and the biggest lags we noticed were the chunk baking of LP pipes and the render lag caused by crafting signs. We were running the server with 20 tps except for when 1000 animals were doing their path finding, but we were also at maximum 4 players.

That said I always agree that there is room for improvement, especially for old code that already survives since Java 6. Other than that you may want to think about how to approach an open source project with your first issue.

hron84 · Answer 2 · 2020-04-23T11:23:04.000Z

@twothe I second on @bziemons with a small addition: every issue report should contain information about your setup. Minecraft version, Forge version, LP version, etc. If you don't provide them, the developers will not be able to reproduce your issue and will not be able to even think about the solution for you. Please be kind for the developers who work on this project in their free time, and also be helpful in reports, provide as much info as you can.

twothe · Answer 3 · 2020-04-24T10:16:59.000Z

I made a performance measurement. It showed that this mod uses up a tremendous amount of performance. What more is there to say? If the developer feels hurt about this then.. yeah it's a measurement, not a personal attack, just milliseconds.

You can now ignore that if you want to, there are no obligations about anything you do in your free time. But if you do then people will most likely not include this in popular packs, because it would slow things down too much if more than 10 people use this mod. Or you can take it as an encouragement to fix things.

bziemons · Answer 4 · 2020-04-24T10:46:39.000Z

There is no reason to feel hurt. I've seen far worse comments and laugh them off. Following up on your answer I don't see a reason to discuss this further as there is no valuable information in this issue.

hron84 · Answer 5 · 2020-05-05T10:12:06.000Z

@twothe for later conversations: you did performance measurement but you forget to share the results of it with us (numbers). Also, you did not share the environment where you did these measurements. This information is critical to not being ignored and also to be helpful. Nobody took it as a personal attack, but we need more information to reproduce the issue you talk about. Without it, the developers have absolutely no chance to fix the problem you report. I kindly ask you to think about it.

twothe · Answer 6 · 2020-05-05T11:08:42.000Z

I did a flat-world redstone preset in peaceful. So without anything else that could negatively impact the performance. Performance measurement was done with LagGoggle and double-checked with Cofh TPS, both showed the same result.

The test-setup was the creation of Galgadorian ingots from raw materials, which were stored in BSDaM with infinity upgrade. 2 maximum upgraded Smelter Factories from Mekanism were used for the smelting. Additionally the pipe setup contained an overflow Diamond Chest from Iron Chests.

I made two measurements: one with a single Galgadorian ingot requested as warmup and general check of the setup, and then the real test with 64 requested. The later also to test the overflow handling of pipes. I made several tps checks with Cofh while the machine was running and afterwards, and a 1 minute world lag analysis with LagGoggles in between, and took note of the TPS time increase vs. the idle state before and after. I later compared the same setup to other mods. The results were as follows:

Logistic Pipes 1.2ms
TE Logistics 0.58ms
ProjectRed 0.28ms
Refined Storage: 0.07ms

Additionally Logistic Pipes begun dropping countless materials into the world when the crafters became overloaded as items were dumped into the overload chest, but new items were already taken from storage before those items even landed there. I estimate that several thousand items dropped from the pipes when doing the 64 Galgadorian ingot test.

Considering that ProjectRed does the exact same thing that Logistic Pipes does, but takes only a quarter of time for that and does not spill countless items into the world, I came to the conclusion I wrote in the title.

hron84 · Answer 7 · 2020-05-07T10:31:56.000Z

@twothe could you please, please provide versions? MC, Forge and LogisticPipes at least?

twothe · Answer 8 · 2020-05-07T10:43:49.000Z

Latest 1.12 at the time of writing, which would be
Forge: 14.23.5.2854
Logistic Pipes: 0.10.3.9

twothe · Answer 9 · 2020-05-07T12:28:04.000Z

I took a look at the code and found this excellent example that explains many of the reasons why there is most likely such a bad performance:

public IRouter getOrCreateRouter(UUID UUid, World world, int xCoord, int yCoord, int zCoord) {
  IRouter r;
  int id = getIDforUUID(UUid);
  if (id > 0) {
    getRouter(id);
  }
  if (MainProxy.isClient(world)) {
    synchronized (_routersClient) {
      for (IRouter r2 : _routersClient) {
        if (r2.isAt(world.provider.getDimension(), xCoord, yCoord, zCoord)) {
          return r2;
        }
      }
      r = new ClientRouter(UUid, world.provider.getDimension(), xCoord, yCoord, zCoord);
      _routersClient.add(r);
    }
  } else {
    synchronized (_routersServer) {
      for (IRouter r2 : _routersServer) {
        if (r2 != null && r2.isAt(world.provider.getDimension(), xCoord, yCoord, zCoord)) {
          return r2;
        }
      }
      r = new ServerRouter(UUid, world.provider.getDimension(), xCoord, yCoord, zCoord);

      int rId = r.getSimpleID();
      if (_routersServer.size() > rId) {
        _routersServer.set(rId, r);
      } else {
        _routersServer.ensureCapacity(rId + 1);
        while (_routersServer.size() <= rId) {
          _routersServer.add(null);
        }
        _routersServer.set(rId, r);
      }
      _uuidMap.put(r.getId(), r.getSimpleID());
    }
  }
  return r;
}

First of all: the use of synchronized is deprecated, and locks in general should be avoided. Using locks frequently in inner loops will cause threading to implode, and at some point the cost of locking will out-weight the gain of threading. Java gives you all the necessary tools to avoid locking at all or let it be handled internally by highly optimized functions.

In this example you can replace _routersServer with a ConcurrentHashMap, which will then give you the computeIfAbsent functionality to do the exact same code thread-safe and without a lock (on your side).

From an algorithm point of view you are trying to find a unique id for routers and have several background operations going on to ensure that this happens, like the UUID, but also when generating the simpleID internally. There is however a much simpler way to ensure that each router has a unique ID, as there can only be one router at a specific dim, x, y, z coordinate. So the simplest unique ID you can make for a router is:

final int uniqueRouterId = Object.hash(world.provider.getDimension(), xCoord, yCoord, zCoord);

Below that you try to map the id to a specific point in the array and have to bloat and re-arrange that array multiple times to get that 'simple' id->array mapping, while that is completely unnecessary. Maps using simple integer hashes are lightning fast. Using an array to speed that up even further might gain you a tiny bit of additional performance, but the pitfalls of handling all those array shifting and filling by far outweights that tiny bit of performance. Especially if all of that is then lost in the look-up loop above.

The same function (with some additional adaptations in other classes) could be written like this:

public IRouter getOrCreateRouter(UUID UUid, World world, int xCoord, int yCoord, int zCoord) {
  final int uniqueRouterId = Object.hash(world.provider.getDimension(), xCoord, yCoord, zCoord);

  if (MainProxy.isClient(world)) {
    return _routersClient.computeIfAbsent(uniqueRouterId, key-> new ClientRouter(key, world.provider.getDimension(), xCoord, yCoord, zCoord));
  } else {
    return _routersServer.computeIfAbsent(uniqueRouterId, key-> new ServerRouter(key, world.provider.getDimension(), xCoord, yCoord, zCoord));
  }
}

That function would do essentially the same and be thread-safe at the same time without having to put the whole server on hold each time it is called.

bziemons · Answer 10 · 2020-05-07T15:17:18.000Z

Looking at the code is indeed a very good idea. I fully agree that the use of synchronized must be avoided and different methods of multi-thread compliance must be used.

I think the example you chose does not reflect a good one, because that code is not run very often and LP would not work anymore after the simple changes you present here. You seem to be simplifying the problem too much. If we needed a unique ID of a router at this point, we would just use the passed UUid parameter instead of using a hash value that is most certainly not unique at some point. I am not getting into the ifs and whens of the array shifting issue. As far as I can see that issue is real, but very uncommon.

I will not join a discussion about what we (as in we developers of LP) should do to improve performance. In the end that discussion can only be done at a very technical level with a lot of knowledge of the code and its performance problems. I will gladly accept any pull requests that aim to solve those problems and discuss them at a concrete level.

AartBluestoke · Answer 11 · 2020-05-10T02:24:10.000Z

I believe the best way to get devs to act on a performance issue is to fire up jvisualvm and showing a performance trace where logistics pipes (and its locks) are stalling threads.

twothe · Answer 12 · 2020-05-10T14:51:17.000Z

@AartBluestoke I am not a dev of this mod and have no plans to become one, but I can give advice as someone with 20 years of experience in Java and performance engineering. The devs can take this advice or not, that is up to them.

The test I provided is something a professional performance engineer would do to measure this mod, and I presented the results and the comparison to other mods. Where exactly the problem is rooted I cannot tell from a quick glance at the code. And it would certainly take a few days to understand that code well enough to pinpoint the locations precisely. Doing a code performance trace without knowing the code will usually produce useless reports, and I am certain the devs of this mod are well capable of doing that on their own with a lot more knowledge on the internals. But what I can see are very basic threading issues, hence my other post. And that again is a free advice and not a request.

AartBluestoke · Answer 13 · 2020-05-12T11:18:54.000Z

@twothe , sorry i didn't mean to sound aggressive; but 1ms of cpu use per minute seems to be a reasonable use? - Thats also why i suggested peformance tracing - if all that is blocked is a secondary worker thread for a few ms, then those locks may not be an issue, and for certain access patterns (which are extremely read heavy) you actually get better performance using non-concurrent data types, and careful programming.
I havn't tested scaling limits recently, but last time i did, it scaled to millions of crafts/hr in a network of thousands of nodes without hitting performance limits (i hit the limit of number of entities that could be rendered on the screen before i hit cpu limits due to logisitics)

The reason that lp can take a little more cpu than the other named mods is that it does more - eg, we move our entities around, we do shortest path routing.

LP trades of a higher initialisation cost for lower running costs, and better scaling - try running a creative quarry through the system, and smelting all the ores into ingots, then routing each ore to a separate chest.

twothe · Answer 14 · 2020-05-12T11:49:03.000Z

I am considering this mod for large scale servers with 50 players average. People there build constructions that lag even with Applied Energistics 2, and I've seen bases that used up every available channel on AE2, so about 8704 connections. And they were in constant use, because that person was constantly generating all kinds of resources to fulfill end-game tasks for the mod-pack. That single base was quiet straining for the server.

Now the 1.2ms increase in tick time (not per minute, a tick can only last for 50ms max), means: even that simple construct could only support like 20 players (if you take into account other things that happen in the world), then it would start to lag. If that player above would have built his base with LP instead of AE2, the server would be below 1 TPS from that single player alone.

1.2ms tick time LP vs 0.28ms tick time PR - even though those numbers seem to be pretty small on their own - means in the end whether a server will be able to support 20 players or 50.

About the locks: I've been working on systems running millions and some even billions of concurrent tasks for the past ten years. This is of course not a requirement for LP, but I gained some experience in threaded programming in the meantime, and when I saw that code I could tell from experience where and when that would cause issue.

Java is very smart when it comes to concurrency and gives you great tools to maximize performance with minimal implementation, if those tools are actually used. The biggest sign that something is wrong is the word synchronized and the second sign is what is called "Reinventing the squared wheel", which essentially means: existing functionality from Java is copied, but worse. Both happens in LP a lot.

LP trades of a higher initialisation cost for lower running costs, and better scaling - try running a creative quarry through the system, and smelting all the ores into ingots, then routing each ore to a separate chest.

I did that and scaling is actually the thing LP performs worse than every other mod. I did the exact same setup with LP and ProjectRed pipes, and the bigger the installation becomes, the more time LP takes compared to PR. And if you do the same with Refined Storage instead, the timings become even worse, but that is kinda of an unfair comparison, as RS basically "cheats" and does not route items.

AartBluestoke · Answer 15 · 2020-05-12T11:58:06.000Z

which is why i asked if you could run a performance trace of cpu usage on your server, so that would highlight the components of LP that are consuming the time;
it could be those locks and a performance trace saying that 10% of all the main thread LP time was spent waiting on those locks would be specifically solvable problem,
it could be that accessing the inventory of a specific object, it could be that there is a lag spike to begin with when things are initialising, and a typical tick is low, it could be LP is accidentally recalculating something every tick.

JVisualVM doesn't significantly impact the performance of a runnign server, you can attach, record for a few minutes (the longer the better, as there will be better statistics about what is consuming cpu time), then can take a screenshot and post it here , or post the resulting file somewhere.

Share to