Connection timed out: EndRead Failure (on Linux ARM using Mono)
cn04 opened this issue · 22 comments
I've successfully configured a server to run on a Raspberry Pi using the 'mono' command to run the DMPServer.exe executable. The server works perfectly when connected to from the LAN, however when I port-forward the server (port 6702) and have a friend try to connect to it, the connection always times out while syncing. The client says, "Connection timed out" and the server console says "Lost connection: EndRead Failure".
The client will usually be in the middle of handshaking or 'syncing kerbals' when this occurs. I have noticed that the sync happens much more quickly on LAN (it only takes a few seconds) and from outside the network, it is very slow and times out.
I'm sure that the connection is actually being established, because the server does acknowledge that a new player has connected.
Any ideas would be appreciated! Thanks. I'm using the latest DMP client and server versions and the latest version of KSP. The client that times out while trying to connect will successfully connect to other public servers.
This is the original 512 MB Model B Raspberry Pi, overclocked to 900 MHz.
I actually also have a Minecraft server running, however there were no players on it when I tested DMP and the 'java' process CPU load was about 18%. The Pi's RAM is only about 75% used by the idle Minecraft server. I have the official Raspbian OS.
Here's the terminal output from a player connecting remotely:
[22:14:32][INFO] : Client Cademasterflash handshook successfully, version: v0.2.2.0
[22:14:32][DEBUG] : Online players is now: 2, connected: 2
[22:14:53][DEBUG] : Sending Cademasterflash to subspace 49
[22:14:53][DEBUG] : Sending Cademasterflash 0 craft library entries
[22:14:53][DEBUG] : Sending Cademasterflash 20 kerbals...
[22:15:42][INFO] : Client Cademasterflash disconnected in ReceiveCallback, endpoint 68.112.242.253:49449, error: EndRead failure (Connection reset by peer)
[22:15:42][DEBUG] : Online players is now: 1, connected: 1
This is the output for a LAN player:
[21:57:24][INFO] : Client Kieran handshook successfully, version: v0.2.2.0
[21:57:24][DEBUG] : Online players is now: 1, connected: 1
[21:57:24][DEBUG] : Sending Kieran to subspace 47
[21:57:24][DEBUG] : Sending Kieran 0 craft library entries
[21:57:24][DEBUG] : Sending Kieran 20 kerbals...
[21:57:25][DEBUG] : Sending Kieran 31 vessels, cached: 0...
As you can see, there is a large gap (about 21 seconds) between Cademasterflash's handshake and the sync beginning. Then, the timeout happens after the 'Sending Cademasterflash 20 kerbals' message has been displayed for 49 seconds. During this time, the client did show that the sync was happening, but it was progressing very slowly.
For 'Kieran', everything happens successfully within about 2 seconds.
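The gaps quoted above can be read straight off the log timestamps. The following is a small sketch, not part of DMP, that assumes every line starts with a `[HH:MM:SS]` stamp as in the output above and that the log does not cross midnight; the function name `gaps_between_log_lines` is made up for illustration.

```python
from datetime import datetime


def gaps_between_log_lines(lines):
    """Return (seconds_since_previous_line, line) for every line after the first.

    Assumes each line begins with "[HH:MM:SS]" and that the log does not
    wrap past midnight.
    """
    stamps = [datetime.strptime(line[1:9], "%H:%M:%S") for line in lines]
    return [
        (int((later - earlier).total_seconds()), line)
        for earlier, later, line in zip(stamps, stamps[1:], lines[1:])
    ]


# Four of the lines from the remote player's log above.
remote_log = [
    "[22:14:32][INFO] : Client Cademasterflash handshook successfully, version: v0.2.2.0",
    "[22:14:53][DEBUG] : Sending Cademasterflash to subspace 49",
    "[22:14:53][DEBUG] : Sending Cademasterflash 20 kerbals...",
    "[22:15:42][INFO] : Client Cademasterflash disconnected in ReceiveCallback, "
    "endpoint 68.112.242.253:49449, error: EndRead failure (Connection reset by peer)",
]

for seconds, line in gaps_between_log_lines(remote_log):
    print(f"+{seconds:>3}s  {line[:60]}")
```

Running the same function over the LAN player's log lines gives gaps of 0 or 1 second throughout.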
Thanks!
Seems like an issue with the player Cademasterflash. Maybe his latency to your server is too high, making him time out while connecting?
Other players also experience the same problem.
I can't imagine why LAN is so much faster - this same person (Cademasterflash) can connect to the Minecraft server without issue.
It does seem like the server is deciding that the sync took too long, though.
I'm running a server on a Pi 2 with Arch Linux using mono as well, and no problems connecting from local or the Internet!
@cn04 did you try turning off the minecraft server and running the dmp server?
The problem persists when the DMP Server is the only thing running on the Pi.
The mono-complete package is also up to date.
I have tried changing the port that the DMP server runs on to 25566. There is no difference in the results of a player trying to connect.
I have noticed, however, that the connection of a remote player (outside the LAN) will sometimes succeed if nobody else is connected (e.g. there is not a player connected from LAN). If the internet connection of the server is being used by another device, the chances of a successful connection seem less.
This suggests that somewhere in the DMP client or server (likely the client, since it is what reports the 'connection timeout') there is a piece of code that gives up the connection if the initial sync is progressing too slowly. For slower internet connections (such as mine), the client gives up the connection when the speed falls below a certain threshold. Then, the server complains that it can't read from the client (because the client has disconnected).
My internet connection is 1Mbps (1 megabit per second) up and slightly more (about 1.5Mbps) down.
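For a rough sense of what a 1 Mbps uplink means for the sync, here is a back-of-the-envelope sketch. The payload sizes are purely hypothetical (the thread never states how large the sync data is), and the calculation ignores latency, protocol overhead, and other devices sharing the link.

```python
def transfer_seconds(payload_bytes, link_megabits_per_second):
    """Ideal time to push payload_bytes through a link of the given speed.

    Uses 1 Mbps = 1,000,000 bits per second and ignores latency and overhead,
    so real transfers will take longer.
    """
    bits = payload_bytes * 8
    return bits / (link_megabits_per_second * 1_000_000)


# Hypothetical payload sizes, chosen only to illustrate the scale.
for label, size_bytes in [("100 kB", 100_000), ("1 MB", 1_000_000), ("5 MB", 5_000_000)]:
    print(f"{label:>6} over 1 Mbps: {transfer_seconds(size_bytes, 1.0):.1f} s")
```

Under these assumptions, every megabyte the server has to send costs at least 8 seconds of upload time, and more if the uplink is shared.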
I have also seen the connection time out for a player after they had already successfully connected to the server (the 'sync' had finished).
Does anyone know if there is a mechanism such as this in the client or server code that would 'time out' the connection if the sync (or any communication) takes too long? I personally know my internet connection is a bit slower than average - other users who have no intermittent problems may have faster internet connections.
Thanks!
So, you mean that the sync cannot in any case take longer than 20 seconds, or the incoming heartbeat from the client 'breaks' the server code?
Or could the client be waiting for a heartbeat from the server, and not receiving it because the server is too busy slowly transferring the sync data? I think this because the client is what reports the 'timeout'. The server simply seems to say that it can't communicate with the client ('EndRead'?).
Is there any way to set the heartbeat variable without editing the source? If not, where in the source code is the heartbeat variable?
Ok, great.
I can find, in Server/Messages/Heartbeat.cs, a function which checks whether the server did not receive a heartbeat from the client in time (determined by a variable called CONNECTION_TIMEOUT, in Common).
Where is the "EndRead Failure" message generated on the server side?
Where is the "Connection timeout" message generated on the client side?
Oh, I've found where the client-side timeout message comes from: NetworkWorker.cs, which contains lots of network-related error handling functions.
Would changing that variable to a greater amount fix the issue?
CONNECTION_TIMEOUT is set to 20000 (20 seconds). INITIAL_CONNECTION_TIMEOUT (I assume this applies when first connecting to the server) is only set to 5000 (5 seconds!). That seems too short.
So, either my internet connection or the Raspberry Pi is being too slow, and causing the client to give up the connection.
I really should comment things a little better:
Heartbeats are only sent when there is absolutely no network traffic.
INITIAL_CONNECTION_TIMEOUT is the max time it takes from "Connecting" to "Connected" when you hit connect.
CONNECTION_TIMEOUT is the amount of time there has to be 0 bytes received in order to drop out.
It's unlikely that CONNECTION_TIMEOUT is what's hitting you.
It's possible there may be some lock issues going on, but unless people have been running into this it's unlikely as well :-/
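To illustrate the CONNECTION_TIMEOUT description above, here is a minimal sketch of an idle timeout that only trips when zero bytes arrive for the whole window. This is not DMP's code: the class and function names are hypothetical, and whether the real comparison is strict is an assumption. Only the 20000 ms value comes from this thread.

```python
CONNECTION_TIMEOUT_MS = 20000  # value quoted in this thread for CONNECTION_TIMEOUT


class IdleTimeoutTracker:
    """Hypothetical helper, not DMP code: tracks time since bytes last arrived."""

    def __init__(self, now_ms):
        self.last_receive_ms = now_ms

    def on_receive(self, now_ms, byte_count):
        # Any non-empty read counts as traffic, whether sync data or a heartbeat.
        if byte_count > 0:
            self.last_receive_ms = now_ms

    def is_timed_out(self, now_ms):
        # Assumption: more than the full timeout of silence drops the peer.
        return now_ms - self.last_receive_ms > CONNECTION_TIMEOUT_MS


def slow_sync_survives(duration_ms, interval_ms):
    """Simulate a sync that delivers at least one byte every interval_ms."""
    tracker = IdleTimeoutTracker(now_ms=0)
    for now_ms in range(interval_ms, duration_ms + 1, interval_ms):
        if tracker.is_timed_out(now_ms):
            return False
        tracker.on_receive(now_ms, byte_count=1)
    return True


print(slow_sync_survives(49000, 1000))   # trickle: some bytes every second
print(slow_sync_survives(49000, 25000))  # stall: 25 s of silence between reads
```

Under this model a slow but steady sync never trips the timeout, no matter how long it takes overall. Only a stall of more than 20 seconds with nothing received does, which fits the comment that CONNECTION_TIMEOUT is an unlikely culprit for a sync that is merely slow.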
After doing more testing, I've found that the problem very likely was not caused by a bug in DMP. Instead, it was likely due to the server machine running out of RAM. Anyway, it's working now. Thanks!
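For anyone who wants to rule out memory pressure on a similar setup, here is a small sketch that parses text in the format of Linux's `/proc/meminfo`. The sample numbers are made up for illustration and are not measurements from the Pi in this thread. `MemAvailable` only exists on newer kernels (3.14 and later), so the sketch falls back to `MemFree`, which understates what is really available.

```python
def parse_meminfo(text):
    """Parse /proc/meminfo-style text into {field name: integer value}."""
    values = {}
    for line in text.splitlines():
        name, _, rest = line.partition(":")
        fields = rest.split()
        if fields:
            values[name.strip()] = int(fields[0])  # most fields are in kB
    return values


def used_fraction(meminfo):
    """Fraction of RAM not available to new work (swap is not considered)."""
    total = meminfo["MemTotal"]
    # Older kernels lack MemAvailable; MemFree is a pessimistic stand-in.
    available = meminfo.get("MemAvailable", meminfo["MemFree"])
    return 1 - available / total


# Made-up sample in the /proc/meminfo format.
sample = """\
MemTotal:         448000 kB
MemFree:           20000 kB
MemAvailable:      44800 kB
SwapTotal:        102396 kB
SwapFree:          51200 kB
"""

print(f"RAM in use: {used_fraction(parse_meminfo(sample)):.0%}")
```

On the server itself, the text would come from reading `/proc/meminfo` while a remote player is syncing, rather than from a hard-coded sample.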