
[1.12.2] Regression in 1.8.4 related to OpenOS exception handling
RealityAnomaly opened this issue ยท 10 comments
In 1.8.4, when I cause an exception to be thrown in OpenOS, occasionally the console will be spammed with text, which boils down to the following error: /lib/process.lua:63: in function </lib/process.lua:59>attempt to yield across a C-call boundary
.
This does not happen in 1.8.3, but it requires a fresh install of OpenOS, for unknown reasons. This commit 9d4f7ea looks potentially suspicious as a cause for the problem as it was introduced in 1.8.4.
And also, can you tell me a bit more about the circumstances (MC version, progamm running...), so I can attempt to reproduce ?
I agree with you. It is certainly caused by resolving GHSA-54j4-xpgj-cq4g, after all it is something solved in OC 1.8.4 and it affected also pcall / xpcall. Knowing these functions are used for error handling, it wouldn't surprise me if it was the cause, but the problem is that it resolved a security issue allowing to DDoS a server by throwing an exception...
the following (as a standalone script) reproduces a similar issue, however here the error loop is caused by a yield in a call to gpu.copy
, not gpu.set
(as in the original error)
additionally, the error loop stops after a while
local function explode()
error(("\n"):rep(100))
end
local magicTable = setmetatable({}, {
__tostring = explode,
})
tostring(magicTable)
It appears that this isn't strictly speaking a regression; the same "C-call yielding" crash occurs in 1.8.3 and below, but these did not implement xpcall()
according to Lua's specification (which allows the error handler to be called recursively). This is more akin to a bugfix which uncovered a bug.
Seems this still isn't completely resolved. a46e77b causes a regression of its own. Specifically, the error message is lost since msg
on line 79 of /lib/process.lua
is not defined anywhere, since the original definition as a function argument on line 67 is now in a different scope. To fix this, the following changes must be made:
- change line 71 to
table.pack(debug.traceback(), msg)
- on line 78 replace
result[2]
withresult[2][1]
- on line 79 replace
msg
withresult[2][2]
Since lines 87 and 89 use result
, the value of which is modified by my change to line 71, there's a chance this fix will cause another regression. I simply don't know enough about OpenOS to say one way or the other, and don't have the energy right now to find out. Examining the code from prior to a46e77b suggests that in case of an error, result[2]
should be 128
, so adding result[2] = 128
between lines 83 and 84 may also be in order.
For sanitation purposes, I also suggest that line 69 be somehow modified to ensure that it necessarily returns a number. Otherwise code like error({reason="terminated", code="this is a string, not a number"})
will cause problems. The simplest fix is probably to just modify the condition on line 68 by adding and type(msg.code) == "number"
to the end.
Ok, I found another issue. My fix doesn't address what happens in an out of memory error. To properly report what happens, it appears better to fix it a different way, not like I described above. Replace lines 60 to 84 of the code, which in 1.8.7 looks like:
-- pcall code so that we can remove it from the process list on exit
local result =
{
xpcall(function(...)
init = init or function(...) return ... end
return code(init(...))
end,
function(msg)
if type(msg) == "table" and msg.reason == "terminated" then
return msg.code or 0
end
return debug.traceback()
end, ...)
}
if not result[1] and type(result[2]) ~= "number" then
-- run exception handler
xpcall(function()
local stack = result[2]:gsub("^([^\n]*\n)[^\n]*\n[^\n]*\n","%1")
io.stderr:write(string.format("%s:\n%s", msg or "", stack))
end,
function(msg)
io.stderr:write("process library exception handler crashed: ", tostring(msg))
end)
end
with this version:
-- pcall code so that we can remove it from the process list on exit
local traceback = "" -- <-- additional line
local result =
{
xpcall(function(...)
init = init or function(...) return ... end
return code(init(...))
end,
function(msg)
if type(msg) == "table" and msg.reason == "terminated" then
return msg.code or 0
end
traceback = debug.traceback() -- <-- changed "return" to "traceback ="
return msg -- <-- additional line
end, ...)
}
if not result[1] and type(result[2]) ~= "number" then
-- run exception handler
xpcall(function()
local stack = traceback:gsub("^([^\n]*\n)[^\n]*\n[^\n]*\n","%1") -- <-- changed "result[2]" to "traceback"
io.stderr:write(string.format("%s:\n%s", result[2] or "", stack)) -- <-- changed "msg" to "result[2]"
end,
function(msg)
io.stderr:write("process library exception handler crashed: ", tostring(msg))
end)
result[2] = 128 -- <-- additional line
end
If I'm not mistaken, this should return behavior to 1.8.6 in all important situations, while retaining the fix for the issue from 1.8.4.
Do you have test scripts to reproduce the bugs in behaviour, so it doesn't happen again?
Ok, I don't know if this is related to this thread or not, but I found a possible error handling bug with some really cursed code. Here's the output:
And here's the code that produced it (in test.lua
):
local x = 0
local y = 0
xpcall(error, function()
x = x + 1
y = 0
xpcall(error, function()
y = y + 1
error()
end)
print("x: ", x)
--print("y: ", y)
error()
end)
print(x)
print(y)
The code does have a lot of extraneous stuff but the output seems very sensitive to normally inconsequential details so I'm hesitant to remove the fluff.
In order to make the error, it needs to be run like so:
./test.lua | less
And even like this it only produces the error maybe 1/2 to 1/3 of the time.
Note that this result is produced both with my second fix and on a fresh install of OpenOS 1.8.7.
Update: after running it a bunch more times it randomly added attempt to yield across a C-call boundary
to the end of the error message:
Update 2: using this instead reliably produces the problem 100% of the time:
./test.lua | lua
Here:
error("err")
Intended output:
/home/test.lua:1: err:
stack traceback:
<traceback>
with exit code 128.
Actual output (1.8.7):
:
stack traceback:
<traceback>
with exit code 2.
The other two cases that need to be handled correctly are os.exit(<number>)
, which should produce the right exit code, and out of memory, which should have any output that indicates that OOM happened. These two are in fact satisfied in 1.8.7, but my original fix broke OOM reporting (but did not affect os.exit(<number>)
). If I'm not mistaken, my second fix handles all three cases correctly.
On a technical level, when OOM happens, the original xpcall
that executed the program returns false, "out of memory"
without invoking the callback function. With my original fix, this crashed the exception handler without causing the string "out of memory"
to make its way to the user.