CC: Tweaked

Investigate alternative Lua VMs

SquidDev opened this issue · 31 comments

commented

LuaJ/Cobalt has been CC's Lua runtime for as long as the mod has existed (9 years?). While it's served us fine, there's still a fair number of annoying deficiencies and idiosyncrasies (cc-tweaked/Cobalt#31, cc-tweaked/Cobalt#22, cc-tweaked/Cobalt#51), and it is falling increasingly behind PUC Lua.

One idea asie suggested a few years ago was compiling Lua to Java bytecode via WASM and asmble. I think it'd be quite fun to look into this and see how practical it is.

I'm not expecting this to be a panacea - even if we get it working, there's a lot of changes which need to be made to get the Lua VM working in a way which is compatible with what CC needs (though craftos2-lua may be a useful starting point). However, at this stage I'm more interested in seeing what's possible at all.


Who am I kidding? I just want to make our tooling even more complex :D.

commented

A related question: why isn't LuaJC used, since it compiles Lua to Java bytecode?

commented

A couple of reasons:

  • It's very brittle, and these bugs are incredibly hard to diagnose and fix. See SquidDev/luaj.luajc#4, which took 10 months to fix. It'd probably be easier now (I've learned a lot in the last 5 years), but still ouch.
  • It's incompatible with many of the changes which have been made to Cobalt since (true coroutines, much higher stack depth).
  • All-in-all, it's not actually that much faster. A lot of the speed improvements came from it doing things wrong (i.e. no debug hooks). As those were fixed, it got much slower.

It's definitely possible to do a decent Lua->JVM bytecode compiler (see Rembulan), but it's not an effort I'm willing to undertake myself.

commented

What Java version did you benchmark Waluaigi on? ByteBuffer writes didn't always use Unsafe.

commented

8 and 16 IIRC.

commented

Just adding an idea from Discord:
What if the PUC Lua VM was run in another process? This would meet (1) (as it's in a separate process, so crashes can be handled without bringing the server down), (2) (as we can set it up with memory limits through custom allocation functions), (3) (as it'll be all in C/native code), and (4) (as it is the same standard Lua code). However, if just the VM was in another process, there would be a lot of context switches to communicate things like filesystem access, terminal I/O, etc. for each call.

A way to fix this would be to run everything in that process, except for the things that directly interface with the world. This pretty much comes down to displaying the terminal, taking input, peripherals, and turtle movement. For I/O, this could be as simple as having the process send screen updates every so often, and then Minecraft sends back events to queue on each computer. Peripheral and turtle calls would have to be manually redirected back to the main process, however.

Perhaps an easy way to test this out would be to use a CraftOS-PC raw mode background task that runs over the world file (self-promotion not intended). Then CC would essentially be a raw mode renderer for all of the running computers, like the VS Code extension. This would not support peripherals or turtles as-is, but an extra plugin or code modification could be added to implement the required IPC to make the calls. One issue with this idea is that raw mode currently only supports 256 windows at once, but the protocol could be updated to support more windows.

commented

To reduce the number of times a single event has to cross to the other process, the Java side and the native side could each have their own queue. When the native side empties its event queue, it asks the Java side to send its whole queue. If the Java side's queue is also empty, it forwards the next event as soon as it happens.
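
A minimal Java sketch of the two-queue idea, to make the batching concrete. Everything here is hypothetical: `BatchedEventQueue`, `push` and `onRemoteEmpty` are illustrative names, events are plain strings, and the IPC channel is modelled as a `Consumer` rather than a real process boundary.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.function.Consumer;

/** Java-side event queue that hands events to the other side in batches. */
final class BatchedEventQueue {
    private final ArrayDeque<String> pending = new ArrayDeque<>();
    // Stands in for the IPC channel; an assumption, not a real transport.
    private final Consumer<List<String>> transport;
    private boolean remoteWaiting = false;
    private int crossings = 0;

    BatchedEventQueue(Consumer<List<String>> transport) {
        this.transport = transport;
    }

    /** Called when Minecraft produces an event for this computer. */
    synchronized void push(String event) {
        if (remoteWaiting) {
            // The other side is idle, so forward the event straight away.
            remoteWaiting = false;
            send(Collections.singletonList(event));
        } else {
            pending.add(event);
        }
    }

    /** Called when the other side reports that its local queue is empty. */
    synchronized void onRemoteEmpty() {
        if (pending.isEmpty()) {
            remoteWaiting = true;
        } else {
            // Ship everything queued so far in a single crossing.
            List<String> batch = new ArrayList<>(pending);
            pending.clear();
            send(batch);
        }
    }

    /** Number of times anything was sent across the (simulated) boundary. */
    synchronized int crossings() {
        return crossings;
    }

    private void send(List<String> batch) {
        crossings++;
        transport.accept(batch);
    }
}
```

With this shape, three events queued while the other side is busy cost one crossing rather than three, and an idle computer still sees its next event without waiting for a poll.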

Computers could also check whether their GUI is open before asking for terminal data to be streamed. This might cause a delay when opening the GUI, but one could justify that as the in-game computer having gone to sleep or something.

commented

I managed to nerd-snipe myself a couple of times this year into having another look at Waluaigi (I know, I know), and felt it was worth doing a bit of a year-end braindump.

Requirements

Firstly, worth reiterating some things I'd like to see from an alternative Lua VM:

Safe

Effectively the Lua runtime shouldn't be able to touch anything else. The runtime can crash, but this shouldn't affect other computers or Minecraft's own stability.

Unfortunately, this rules out PUC Lua. It's fairly easy to cause a segfault when you've access to the debug API. It's possible we could harden against that, but I honestly don't know if you can avoid all bugs. Thanks C (derogatory).

Any Lua VM which is written in Java is inherently safe (thanks Java! (affectionate)), though of course you could get the same effect by writing a new Lua VM in a memory safe language (Rewrite It In Rust) or running the PUC Lua in a safer runtime (WASM).

Resource constraints

Computers should shut down if they execute for too long without yielding[1]. This is fairly easy to do - Cobalt (mostly) does it well, and it's easy to do the same in other Lua VMs.
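
As a rough illustration of the "too long without yielding" check, here is a toy Java sketch. It is not how Cobalt implements it: `YieldWatchdog`, `ToyInterpreter` and the check interval of 1024 instructions are all assumptions made for the example, and the clock is injected so the behaviour is easy to test.

```java
import java.util.function.LongSupplier;

/** Tracks how long a computer has run since it last yielded. */
final class YieldWatchdog {
    private final long budgetNanos;
    private final LongSupplier clock;
    private long sliceStart;

    YieldWatchdog(long budgetNanos, LongSupplier clock) {
        this.budgetNanos = budgetNanos;
        this.clock = clock;
        this.sliceStart = clock.getAsLong();
    }

    /** The computer yielded, so a fresh time slice begins. */
    void onYield() {
        sliceStart = clock.getAsLong();
    }

    /** True once the current slice has run for longer than the budget. */
    boolean overBudget() {
        return clock.getAsLong() - sliceStart > budgetNanos;
    }
}

/** Stand-in for an interpreter loop; it executes no real instructions. */
final class ToyInterpreter {
    // Checking on every instruction would be too costly, so poll occasionally.
    static final int CHECK_INTERVAL = 1024;

    /** Returns how many instructions ran before completion or abort. */
    static long run(YieldWatchdog watchdog, long maxInstructions) {
        for (long executed = 0; executed < maxInstructions; executed++) {
            if (executed % CHECK_INTERVAL == 0 && watchdog.overBudget()) {
                return executed; // a real VM would shut the computer down here
            }
            // ... execute one instruction ...
        }
        return maxInstructions;
    }
}
```

In real use the clock would be something like `System::nanoTime`, and `onYield` would be called each time the computer's coroutine yields back to the host.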

What Cobalt doesn't allow us to do is constrain the memory the computer is using. Computers basically have access to all the memory available to Java. In practice, this isn't a problem very often (there's much easier ways to bring a server to its knees), so I don't think it's a must have, but would still be nice.

However, tracking memory usage basically means the Lua VM needs to manage/track its own memory, which pretty much rules out any Java-based solution (including anything using Graal).

Finally, we should be able to suspend the runtime at any point and then resume it afterwards. This is fairly easy to do in a naive way (just pause the current native thread). However, that's not suitable for the next point:

Persistence

This really is a stretch goal, but also I feel if we're going to switch Lua VMs, we should do it once and get it right. Persisting a paused Lua VM is pretty easy (conceptually, there's still a lot of code you need to write): just dump the Lua state.

However, if you allow suspending at arbitrary points, things get much trickier. If you have implemented suspending by just pausing the thread, there's not really any meaningful way to persist the current callstack.

Unfortunately, doing suspending "properly" (i.e. unwinding the stack each time) is really hard to do. The easiest way is to make it possible to yield from any C function, which gets really hard for something like load.

One thing I experimented with in waluaigi is performing a CPS-rewrite to the whole program, using binaryen's asyncify pass. It actually works surprisingly well, but does come with a pretty hefty performance hit (half as fast). I'm not convinced it's worth it :(.
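
To show why unwinding makes persistence possible, here is a hand-written Java toy in which all execution state lives in fields rather than on the call stack. This is only an analogy: asyncify generates the equivalent save/restore code automatically for the whole program, and `ResumableSum` is a hypothetical example, not code from waluaigi.

```java
/** A toy task written as an explicit state machine so it can stop and resume. */
final class ResumableSum {
    // All progress is held in fields, so suspending leaves nothing
    // on the Java call stack that would need to be captured.
    private final int limit;
    private int next = 1;   // next number to add
    private long total = 0; // running sum of 1..next-1

    ResumableSum(int limit) {
        this.limit = limit;
    }

    /**
     * Runs at most {@code steps} additions.
     * Returns true when the sum is complete, false when suspended.
     */
    boolean resume(int steps) {
        while (next <= limit) {
            if (steps-- == 0) {
                return false; // unwind to the caller; fields keep our place
            }
            total += next++;
        }
        return true;
    }

    long total() {
        return total;
    }
}
```

Because a suspended `ResumableSum` is just three numbers, saving it to disk is straightforward. A thread that was merely paused has its state spread across native stack frames, which is the persistence problem described above.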

Notes on Waluaigi

Last year I had a look at compiling Lua to WASM and then to Java, which unfortunately turned out to be unbearably slow. The main issue here is that the main interpreter loop gets compiled to a function so large that the JIT won't touch it. I think there's some smart ways this could be avoided (method splitting, etc...), but haven't looked into it.

Instead, I spent some time fiddling with native ways of running WASM, namely wasmtime and wasm2c.

wasmtime is nice because it does a lot for us (memory safety, timeouts). Performance is better than Cobalt (about 25% faster), though I would say disappointingly slow (native Lua is 150% faster than Cobalt). wasm2c is better (~80% faster; I didn't spend a lot of time tuning optimisation flags, so I suspect it can be faster).

The big problem with any native approach is that you have to use the JNI, and oooh boy it's slow! While Java -> native calls are fine, native -> Java calls have a lot of overhead. While computers on average are quicker, any call into Java (so the term API, peripherals) is noticeably slower.

One approach here would be to convert some of CC's core libraries into native code (inside or outside of the wasm sandbox). I think this is similar to what Jack is proposing, though I have some concerns:

  • While this improves the performance of core libraries, it doesn't help with peripherals. This is especially a problem with monitors, which have the same performance needs as the term API.
  • If day-to-day maintenance of the mod requires touching native code, this massively raises the knowledge required to contribute. I think it's fine having the Lua runtime be native (most people don't need to care about it, and it's not like Cobalt is easy to contribute to), but if that bleeds into CC itself that's much more messy.

There's some tiny optimisations you can do to reduce the native -> Java overhead, but nothing which helps enough. Project Panama (aka Java's new FFI API) will help a lot (I forgot to write my numbers down, but IIRC it's ~50% faster), but it's unclear when that will be available inside Minecraft.

I also have concerns about how to handle speaker.playAudio: fetching all 130k numbers from a table is gonna be slow, so would need some additional API to make batch calls.


Sorry, bit of a ramble here. Hopefully this is useful for someone, even if that someone is just me in another 6 months!

Footnotes

  1. Now that we can pre-emptively interrupt computers, one could probably get away without this. However, it forces people to not write infinite loops, so I think it's a property worth having.

commented

Unfortunately, doing suspending "properly" (i.e. unwinding the stack each time) is really hard to do. The easiest way is to make it possible to yield from any C function, which gets really hard for something like load.

It occurs to me that one could sidestep this by doing a CPS-rewrite of just the parser. One possible way to do that would be to port it to Rust and then use async/await to handle the transform for you.

The other problem is suspending in __gc methods. This is basically impossible to do correctly(TM), as the GC may be invoked at any point. It may be that we choose not to support __gc outside of userdata.

commented

However, tracking memory usage basically means the Lua VM needs to manage/track its own memory, which pretty much rules out any Java-based solution (including anything using Graal).

Graal has an API for monitoring allocations, for what it's worth: https://www.graalvm.org/truffle/javadoc/com/oracle/truffle/api/instrumentation/Instrumenter.html#attachAllocationListener-com.oracle.truffle.api.instrumentation.AllocationEventFilter-T-

There's also convenient resource limits... which are only available on GraalVM EE, so aren't viable: https://www.graalvm.org/22.1/reference-manual/embed-languages/sandbox-resource-limits/

commented

Graal has an API for monitoring allocations, for what it's worth

So we don't actually need to use Graal to do this - we could do this manually by invoking some hook whenever an array is allocated[1]. There's a couple of options of what you could do in this hook:

  • Track a subset of allocated objects, using them to estimate the total memory usage. I've got some more comprehensive notes about this on the Cobalt repo (cc-tweaked/Cobalt#66), but I've some concerns:

    • I'm not convinced this is foolproof against someone trying to break the system.
    • You're very sensitive to the JVM's memory usage. If the GC runs very infrequently (common when dealing with large heaps), computers may OOM due to unused objects waiting to be GCed, with no way to fix that.

    It's definitely better than the current situation, but maybe not enough that I'd want to commit to it long-term.

  • Periodically walk the entire object graph and compute the retained memory. From what I can tell, this is what the GraalVM EE implementation does[2]. It's more accurate, though it really means you're implementing another GC (well, the marking element at least) on top of Java's existing one, which is a bit sad!
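
A minimal Java sketch of the allocation-hook idea, under loud assumptions: `MemoryBudget` and `TrackedArrays` are hypothetical names, the size estimate (16-byte header plus 8 bytes per slot) is a guess that varies by JVM, and `release` is called explicitly here, whereas a real VM would only learn that an array is dead after the Java GC has run. This is not the scheme described in cc-tweaked/Cobalt#66.

```java
/** Per-computer memory account, charged whenever the VM allocates an array. */
final class MemoryBudget {
    private final long limitBytes;
    private long usedBytes;

    MemoryBudget(long limitBytes) {
        this.limitBytes = limitBytes;
    }

    /** Hook invoked just before an array is allocated for this computer. */
    void charge(long bytes) {
        if (usedBytes + bytes > limitBytes) {
            throw new IllegalStateException("computer exceeded its memory budget");
        }
        usedBytes += bytes;
    }

    /** Gives memory back; in practice this can only happen after a GC. */
    void release(long bytes) {
        usedBytes = Math.max(0, usedBytes - bytes);
    }

    long used() {
        return usedBytes;
    }
}

/** Allocation helper the VM would use instead of calling {@code new} directly. */
final class TrackedArrays {
    // Assumed layout: 16-byte header plus 8 bytes per reference slot.
    static long estimate(int length) {
        return 16L + 8L * length;
    }

    static Object[] newArray(MemoryBudget budget, int length) {
        budget.charge(estimate(length)); // fails before the memory is taken
        return new Object[length];
    }
}
```

The gap between `charge` and a GC-driven `release` is exactly the sensitivity to the JVM's collection frequency raised in the concerns above.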

Footnotes

  1. The reasoning here being that arrays are responsible for most of our memory usage.

  2. See this section. Thanks for the link here, wasn't aware of this!
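
For the second option (periodically walking the object graph), the marking pass could look roughly like the Java sketch below. It is a toy under stated assumptions: the "object graph" is just nested `Object[]` arrays, only arrays are counted, and the per-array size (16-byte header plus 8 bytes per slot) is a guess. `RetainedSize` is a hypothetical name and is unrelated to GraalVM's implementation.

```java
import java.util.ArrayDeque;
import java.util.Collections;
import java.util.IdentityHashMap;
import java.util.Set;

/** Estimates the memory retained by a root by visiting each reachable array once. */
final class RetainedSize {
    static long of(Object root) {
        // Identity-based "mark" set, so shared and cyclic references count once.
        Set<Object> seen = Collections.newSetFromMap(new IdentityHashMap<>());
        ArrayDeque<Object> work = new ArrayDeque<>();
        long total = 0;

        if (root != null) {
            work.push(root);
        }
        while (!work.isEmpty()) {
            Object next = work.pop();
            if (!seen.add(next)) {
                continue; // already marked
            }
            if (next instanceof Object[]) {
                Object[] array = (Object[]) next;
                total += 16L + 8L * array.length; // assumed array footprint
                for (Object child : array) {
                    if (child != null) {
                        work.push(child);
                    }
                }
            }
            // Non-array values are marked but treated as free in this toy.
        }
        return total;
    }
}
```

The identity set is the "marking element" mentioned above: it duplicates work the JVM's own collector already does, which is the cost of this approach.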

commented

You're very sensitive to the JVM's memory usage. If the GC runs very infrequently (common when dealing with large heaps), computers may OOM due to unused objects waiting to be GCed, with no way to fix that.

Could we add a way for computers to tell the GC to collect specific objects now? So if someone has a large object that they are done with then they can clean up memory sooner.

commented

Could we add a way for computers to tell the GC to collect specific objects now?

Not without invoking a whole Java GC, which is something we really don't want to allow doing.

commented

I think this is looking very unlikely at this point.

commented

On the back of a whole slew of more Cobalt issues (#811 and friends), going to re-open this.

What I'd really like here is a Lua runtime which satisfies the following requirements:

  1. Safe - we should be able to run Lua code without the risk of sandbox escapes, for a VM with the debug library enabled. I'm less fussed about bytecode, though it would be nice to have.
  2. Allows restricting memory - this is one of the areas CC:T is the worst at. Frankly, if you want to crash a server, there are easier ways to do it than OOMing via CC, but we should still guard against it.
  3. Fast. Or at least no worse than Cobalt most of the time.
  4. Maintainable.

Cobalt satisfies 1 and 3. Waluaigi satisfies 1, 2, and 4 (well, I think - the patches are pretty minimal, but no clue how back-compat will work). Native Lua satisfies 2, 3, and 4 (not 1, because Lua bugs can and do crash the game).

I'm a little wedded to Waluaigi I must confess, but obviously rather biased here. One possible solution to the performance hit would be to bundle a wasm runtime (wasmtime seems the easiest) with the jar, though that would bump the size by about 10MB (from <2) - still smaller than OC, but hardly a trifle either!

commented

In an ideal world, we could force Minecraft to launch with -XX:+EnableJVMCI, and use Graal's wasm runtime, but I don't think that's very likely :(.

commented

1. Safe - we should be able to run Lua code without the risk of sandbox escapes, for a VM with the debug library enabled. I'm less fussed about bytecode, though it would be nice to have.

So I'm guessing anything that compiles to java bytecode would be difficult to sandbox...

commented

It's worth noting that something which compiles to bytecode doesn't have the same risks as running arbitrary bytecode. Most sandbox escapes come from library code, which isn't an issue if you can control what's emitted!

Curious if you've got anything in mind @sir-maniac?

commented

It's worth noting that something which compiles to bytecode doesn't have the same risks as running arbitrary bytecode. Most sandbox escapes come from library code, which isn't an issue if you can control what's emitted!

I was thinking similarly, actually. Most of the checks can be done at compile time, and I assume the design of the language would simplify that task as well.

Curious if you've got anything in mind @sir-maniac?

Nothing much more than brainstorming, but after further thought, probably not a solution to improving maintainability, or helpful to this discussion.

Pipe dreams

Wondering if it might be possible to implement a close 1-to-1 java version of the PUC code. I theorize it might make maintenance easier, as any changes to PUC can be walked through with side-by-side diffing. (But I haven't gone through code bases that much yet, for all I know cobalt already does that.)

Also wondering if maintaining a fork of a C-To-Java compiler like this one, with a few hand-written bits for code that doesn't translate well, might be easier than maintaining a java clone of lua.

commented

Wondering if it might be possible to implement a close 1-to-1 java version of the PUC code

There's some things which could be done to make Cobalt closer to PUC Lua. For instance, I'd quite like to change the function calling behaviour to be closer to it (i.e. accept a Lua state and just poke the stack instead of receiving a Varargs).
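
To make the two calling conventions concrete, here is a deliberately simplified Java sketch. None of these types are Cobalt's or PUC Lua's real APIs: `VarargsFunction`, `StackFunction`, `ToyState` and `Adders` are hypothetical, and Lua values are modelled as plain `Object`s.

```java
import java.util.ArrayList;
import java.util.List;

/** Varargs style (simplified): arguments arrive as a value, results are returned. */
interface VarargsFunction {
    Object[] invoke(Object[] args);
}

/** Stack style (simplified): read and write a shared stack, return the result count. */
interface StackFunction {
    int call(ToyState state);
}

/** Minimal stand-in for a Lua state: just a value stack with 1-based indexing. */
final class ToyState {
    private final List<Object> stack = new ArrayList<>();

    void push(Object value) {
        stack.add(value);
    }

    Object get(int index) {
        return stack.get(index - 1);
    }

    int top() {
        return stack.size();
    }
}

/** The same "add two numbers" function written in both styles. */
final class Adders {
    static final VarargsFunction VARARGS_ADD =
        args -> new Object[] { (Double) args[0] + (Double) args[1] };

    static final StackFunction STACK_ADD = state -> {
        double sum = (Double) state.get(1) + (Double) state.get(2);
        state.push(sum);
        return 1; // one result left on top of the stack
    };
}
```

The stack style avoids allocating an argument and result container per call and maps more directly onto PUC Lua's C functions, at the cost of index bookkeeping in every function body.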

It'd definitely make some things easier, and in some ways isn't too much work (well, can't be any worse than some of the other Cobalt refactorings :D). However, it definitely wouldn't solve all my problems (memory management being the key one), so would like to explore other options before dedicating too much time.

Also wondering if maintaining a fork of a C-To-Java compiler like this one, with a few hand-written bits for code that doesn't translate well, might be easier than maintaining a java clone of lua.

So waluaigi was sort-of meant to be this - compile C to Java via wasm (though to bytecode, so it's not exactly hand-editable). The issue here is that the JIT refuses to go near it, and so performance is bad (well, on Java 8. Might have changed on 16)[1]. Generating more idiomatic Java code might improve things - haven't tested!

Footnotes

  1. Update: It's not.

commented

This translation from Chinese lists several resources. Direct translation from C to Java (bytecode) is doable, but none of the projects seem to be maintained.

Found this. Not a solution, but interesting. GraalVM could theoretically run PUC Lua itself.

commented

Yeah, Sulong (or Graal's wasm compiler) would be neat, but sadly not really feasible yet. Maybe on 1.17 (and so JDK 16) we could start a new Java VM running in Graal mode, but I suspect the cross-process communication overhead would be too expensive.

commented

Probably something already known about: Rembulan apparently compiles Lua directly to Java bytecode.

commented

The Rembulan developer and I had a discussion about it when it came out, and I did integrate it into CCTweaks. It'd actually be much easier to get working now too, as CC's coroutine model is more sane.

There's some really nice bits about it - it's incredibly fast from memory, and there's some nice design choices. Some of Cobalt's later changes were inspired by it for sure. However, I'm pretty anxious about starting to use it - it's pretty unmaintained at this point (last commit in 2016), and that makes me pretty worried about picking it up.

commented

Yeah, I noticed that too. There does still seem to be some activity on this fork, but it does seem like it would require a new maintainer.

It's a shame, though. I can see switching from their current coroutine system to Kotlin's coroutine CPS, which would allow the native parts to be rewritten in Kotlin to improve readability and maintainability. IMHO it would be a really fun challenge, but would need enough people interested in contributing.

But in time Project Loom will be released, and Java will natively support tail optimization and lightweight threads, and the problem would be easier to solve. Of course, that assumes Minecraft eventually moves on from Java 8.

commented

With how small the core Lua language is, implementing it on GraalVM seems like it could be an interesting idea...

commented

One idea asie suggested a few years ago was compiling Lua to Java bytecode via WASM and asmble. I think it'd be quite fun to look into this and see how practical it is.

I've pushed some code to this repository (CC-related code here), which sort of gets this working (I can get in game and write code on a computer, nothing more is implemented).

However, in its current state, things are incredibly slow (~10x worse than Cobalt). I've not really looked into this (should probably do some fiddling with JITWatch), but I imagine the main interpreter loop (luaV_execute) is either not being jitted at all, or spitting out some truly terrible machine code (it compiles to 33k instructions, compared with 4k for Cobalt's, so hardly a surprise).

Either way, while I'd love to see this working, I don't think I can really justify any more time spent on it.

With how small the core Lua language is, implementing it on GraalVM seems like it could be an interesting idea...

Agreed. However, Graal is not really widely available (especially not given MC still defaults to Java 8), so isn't really a practical option right now.

commented

Agreed. However, Graal is not really widely available (especially not given MC still defaults to Java 8), so isn't really a practical option right now.

GraalVM supports Java 8 and can be installed via Maven: https://www.graalvm.org/reference-manual/js/RunOnJDK/

commented

While it supports it, my understanding is that one doesn't get the performance benefits of it unless running under a GraalVM (or VMs which support -XX:+EnableJVMCI, but that requires passing additional arguments to Java).

I may be wrong though - haven't looked into this enough.