Memory limits for computers

SquidDev opened this issue · 6 comments

commented

One of ComputerCraft's longest standing issues is that computers can use an unbounded amount of memory. This means that malicious users can allocate large amounts of memory, slowing down or outright crashing servers, in a way that is hard to detect.

I was originally hoping to solve this as part of #769. However, several years later, I think it's looking unlikely that issue is going anywhere - I've done a lot of research and experimental work, and I'm not satisfied with any of the options.

There is some discussion in that issue on how we might impose memory limits in Cobalt, and after some thought, I think it offers a way forward. My initial plan was to track a small number of allocations in the Cobalt VM, and use that to approximate the computer's memory usage (cc-tweaked/Cobalt#66). However, as that issue details, I don't think this approach is as robust as I originally hoped.

Instead, my plan is to go with something closer to what Graal does: periodically compute retained memory, and error if it's over the limit.


There's quite a lot of risk here, and I've definitely not got a concrete plan¹, so instead I want to do this incrementally, in a way where each step adds some value.

  1. Measure allocations of computers: Use `ThreadMXBean.getThreadAllocatedBytes` to track how much memory a computer allocates², and expose this as a per-computer metric.

    While this probably isn't very useful for finding actively malicious computers, it should help with passive monitoring of computers (i.e. cc-prometheus). Mostly, I think it's useful for gathering some data on what the allocation pattern of "normal" computers looks like (see the sketch after this list).

  2. Measure retained memory of computers: Add a system to compute the reachable/retained memory of a LuaState, compute that after each task (or after every $N$ bytes allocated), and expose this as a metric.

    My biggest concern here is performance, and whether we can do a scan of the whole VM in a reasonable time. I'm not aware of any sensible GC-like strategies we can apply here (e.g. incremental or three-colour GCs³), as most of those will work against Java's existing GC. Computers tend not to use much memory, so I'm hoping this should be pretty fast, but I honestly don't know!

    I'm not 100% sure how this should interface with CC yet. We need to include the main-thread queue and event queue in this retained memory, but ideally we'd also track all Lua-related objects (especially HTTP handles).

  3. Managed allocations in Cobalt: All (non-trivial) allocations in Cobalt should now be done through some `Allocator` class, which (for now) just counts how many bytes are allocated. This effectively gives us the same as 1., but will allow us to hook into it in the future (a rough sketch follows this list).

    We might want to expose this as a separate metric, so we can see how it compares to the allocation rate given by JMX.

    I think Lua -> Java calls also need to go through this interface. The `LuaString` to `String` conversion allocates quite a bit, and we want a scheme which avoids MightyPirates/OpenComputers#1774 without having to do string interning.

  4. Hard limits on large allocations: If some code attempts to allocate more than $N$ bytes (probably where $N$ is half the computer's soft limit), throw an "out of memory" error immediately.

  5. Interrupt the VM after lots of allocations: Using the monitoring from 1. and 3., interrupt the VM after it allocates $N$ bytes. At that point, we suspend the VM and compute retained memory.

    There are then two cases here:

    • If we're above the "hard" memory limit (defined either as the soft limit * 1.5, or the soft limit + some constant), terminate the computer immediately.
    • If we're above the soft limit, resume the computer with an error.

    Like with timeouts, we probably want some mechanism to force-terminate the computer if it continues to run after a pause has been requested.

    I'm not thrilled about resuming the computer with an error - it does mean the error will occur after the allocation, so source positions won't be entirely accurate.
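
To make step 1 a little more concrete, here's a minimal sketch of the sort of JMX-based tracking involved. Everything here (the class name, how tasks are wrapped) is a hypothetical illustration rather than the actual implementation; the only real API used is `com.sun.management.ThreadMXBean`, a HotSpot-specific extension with per-thread allocation counters.

```java
import java.lang.management.ManagementFactory;
import java.util.function.Supplier;

/**
 * A minimal sketch of step 1 (hypothetical names): measure how many bytes a
 * computer's task allocates on its worker thread, and accumulate that into a
 * per-computer counter.
 */
public final class ComputerAllocationTracker {
    // HotSpot-specific extension of ThreadMXBean with per-thread allocation counters.
    private static final com.sun.management.ThreadMXBean THREADS =
        (com.sun.management.ThreadMXBean) ManagementFactory.getThreadMXBean();

    private long allocatedBytes;

    /** Run a task on the current (computer) thread, recording what it allocates. */
    public <T> T runTask(Supplier<T> task) {
        long threadId = Thread.currentThread().getId();
        long before = THREADS.getThreadAllocatedBytes(threadId);
        try {
            return task.get();
        } finally {
            // The counter is approximate, but cheap enough to sample after
            // every task (and again from the computer thread monitor).
            allocatedBytes += THREADS.getThreadAllocatedBytes(threadId) - before;
        }
    }

    /** Exposed as the per-computer metric. */
    public long allocatedBytes() {
        return allocatedBytes;
    }
}
```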
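And a rough sketch of how the `Allocator` from steps 3-5 might fit together, again with hypothetical names and limits (the real interface, and how suspension actually works, are very much open questions):

```java
/**
 * A rough sketch of the Allocator from steps 3-5. All names, defaults, and the
 * error strategy are assumptions for illustration, not Cobalt's actual API.
 */
public final class Allocator {
    /** Stand-in for Cobalt's LuaError, to keep this sketch self-contained. */
    public static final class LuaOutOfMemory extends RuntimeException {
        LuaOutOfMemory() { super("out of memory"); }
    }

    private final long softLimit;     // per-computer soft limit, in bytes
    private final long hardLimit;     // e.g. softLimit * 1.5 (step 5)
    private final long checkInterval; // recompute retained memory every N bytes

    private long totalAllocated;      // step 3: for now, just count bytes
    private long sinceLastCheck;

    public Allocator(long softLimit, long checkInterval) {
        this.softLimit = softLimit;
        this.hardLimit = softLimit + softLimit / 2;
        this.checkInterval = checkInterval;
    }

    /** Every non-trivial allocation in the VM reports through here. */
    public void onAllocate(long bytes) {
        // Step 4: refuse very large single allocations outright, so one giant
        // string.rep can't blow past the limit before we ever re-check.
        if (bytes > softLimit / 2) throw new LuaOutOfMemory();

        totalAllocated += bytes;
        sinceLastCheck += bytes;

        // Step 5: after N bytes, pause the VM and compute retained memory.
        if (sinceLastCheck >= checkInterval) {
            sinceLastCheck = 0;
            long retained = computeRetainedMemory(); // the expensive graph scan
            if (retained > hardLimit) terminateComputer();             // kill immediately
            else if (retained > softLimit) throw new LuaOutOfMemory(); // resume with error
        }
    }

    /** Exposed as a metric, to compare against the JMX-derived rate. */
    public long totalAllocated() { return totalAllocated; }

    // Stubs standing in for the retained-memory machinery from step 2.
    private long computeRetainedMemory() { return 0; }
    private void terminateComputer() { }
}
```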

Like I said, definitely not a concrete plan, but hopefully it helps us move towards something working!

Footnotes

  1. Though if you've been paying attention to the news in the UK, you'll know concrete isn't all it's cracked up to be.

  2. Computed both after each task, and in the computer thread monitor.

  3. We can use a two-colour marking system for most objects (aside from strings, as those are shared across VMs), so at least we don't need to generate massive HashSets.

commented

Used `string.rep('\n',0xFFFFFFF)` and filled an array with it until it threw a Java `OutOfMemoryError`.

commented

Marking as "feedback wanted", as thoughts/ideas/glaring-holes-in-the-plan are appreciated!

commented

Measure retained memory of computers [...] My biggest concern here is performance, and whether we can do a scan of the whole VM in a reasonable time.

It's definitely not as quick as I'd like, which is probably a good hint as to how little I understand low-level performance :D.

A newly booted computer uses about 700KB¹ of memory, which we can scan in 1-2ms.

However, if we create a lot of objects (testing with `local a = {} for i = 1, 1e6 do a[i] = {} end`, which takes up 36MB), the resulting object graph takes 60ms to scan. This, as one would expect, scales linearly - using `2e6` in the same loop means the object graph takes 120ms to scan.

This means we definitely don't want to compute retained memory after every computer resume, but only after every N bytes allocated.

  • Memory counting should be included in computer execution time, and should probably be made interruptible. There's a risk of a malicious program creating a large/complex object graph and then forcing it to be scanned multiple times, which we need to protect against.
  • While computers don't really have any scheduling guarantees, introducing a 10-100ms delay between two events is not ideal. I don't really have a good proposal for this.

VisualVM is not especially useful here, as most things seem to get inlined into the `MemoryCounter` loop. We should probably try running some small benchmarks with JITWatch and perf to see what exactly is going on.

A screenshot of VisualVM. 34s is spent inside MemoryCounter.count, with almost all of that being self-time.
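
For context, a simplified sketch of what a `MemoryCounter`-style scan looks like, under the assumption of a `Traversable`-style interface over the object graph (the real internals differ, and strings need special-casing as they're shared across VMs):

```java
import java.util.ArrayDeque;
import java.util.Collections;
import java.util.IdentityHashMap;
import java.util.Set;

/**
 * A simplified sketch of a MemoryCounter-style retained-memory scan. The
 * Traversable interface and its size estimates are assumptions standing in
 * for Cobalt's internals.
 */
public final class MemoryCounter {
    /** Anything in the object graph: tables, closures, upvalues, queues, ... */
    public interface Traversable {
        long shallowSize();               // estimated bytes for this object alone
        Iterable<Traversable> children(); // objects this one keeps reachable
    }

    public static long count(Traversable root) {
        // An identity-based set stands in for the mark bit of a two-colour
        // marking scheme (see footnote 3 of the OP).
        Set<Traversable> visited = Collections.newSetFromMap(new IdentityHashMap<>());
        ArrayDeque<Traversable> stack = new ArrayDeque<>();
        stack.push(root);

        long total = 0;
        while (!stack.isEmpty()) {
            Traversable object = stack.pop();
            if (!visited.add(object)) continue; // already counted
            total += object.shallowSize();
            for (Traversable child : object.children()) stack.push(child);
        }
        return total;
    }
}
```

The visited set is why the scan is linear in the number of live objects, matching the timings above; making this interruptible would mean checkpointing the stack and visited set between slices.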

Footnotes

  1. According to Cobalt. The numbers used to compute used memory are somewhat made up.

commented

Have implemented the initial allocation tracking in 76968f2. This should at least give us some basic metrics on average memory growth.

commented

Have implemented the initial allocation tracking in 76968f2. This should at least give us some basic metrics on average memory growth.

Thanks! I myself am concerned about malicious users: I, as the server owner, accidentally crashed my own server messing around with websockets. It used up all of the Java heap and dropped the tick rate to 1 tick over 60 seconds.

commented

Mostly, I think it's useful to gather some data for what the allocation pattern of "normal" computers looks like.

We've been running the above commit on SwitchCraft for a few weeks now. Here is a plot of the median, 90% quantile, 99% quantile and max allocation rate over the last 3 hours:

A line chart plotting allocation over time. There are four relatively straight lines, at approximately 250KiB/s, 2MiB/s, 20MiB/s and 128MiB/s.

As we can see, the majority of computers don't allocate very much at all (the median hovers around 250KiB/s). 90% are sitting at less than 2MiB/s, but the remaining 10% can go all the way up to 100+MiB/s.

Here's also a histogram breakdown, which gives a better idea of what the long tail looks like.

A histogram showing the distribution of allocation rates across computers.

This is definitely a larger long tail than I expected. I was assuming things would cap out at around 8MB/s or so, except in rare cases (reading files). Combined with the performance issues of memory tracing¹, this makes me less sure that the method outlined in the OP is viable.

Maybe we do need to go with the sampling approach after all :/. It should be fairly easy to prototype at least.

Footnotes

  1. See https://github.com/cc-tweaked/CC-Tweaked/issues/1580#issuecomment-1711612378. I've got it about 2x faster since then, but it really needs to be 100x faster, and I don't think that's possible.