[Pogamut-list] Recovering from Fatal Errors

jacob.schrum pogamut-forum at diana.ms.mff.cuni.cz
Thu Jan 27 15:36:36 CET 2011


(Strange, my post was cut off in the middle. Here's the rest)

(UCC)               [INFO]    07:08:09.312                                                                   ID30 
(UCC)               [INFO]    07:08:09.312                                                                   ID30 Exiting due to error
(UCC)               [INFO]    07:08:09.312                                                                   ID30 Exiting.
(UCC)               [INFO]    07:08:09.312                                                                   ID30 FileManager: Reading 0 GByte 59 MByte 110 KByte 873 Bytes from HD took 0.292000 seconds (0.238000 reading, -1.#IND00 seeking).
(UCC)               [INFO]    07:08:09.313                                                                   ID30 FileManager: 2.307000 seconds spent with misc. duties
(UCC)               [INFO]    07:08:10.427                                                                   ID30 Name subsystem shut down
(NBUTServer33)      [SEVERE]  07:08:10.928                                                                    UT2004Parser: Can't parse next message: java.net.SocketException: Connection reset (caused by: java.net.SocketException: Connection reset)
cz.cuni.amis.pogamut.base.communication.parser.exception.ParserException: UT2004Parser: Can't parse next message: java.net.SocketException: Connection reset (caused by: java.net.SocketException: Connection reset) (at cz.cuni.amis.pogamut.base.communication.parser.impl.yylex.YylexParser.parse(YylexParser.java:107))
caused by: cz.cuni.amis.utils.exception.PogamutIOException: java.net.SocketException: Connection reset (at cz.cuni.amis.pogamut.base.communication.connection.impl.AbstractConnection$ConnectionReader.handleException(AbstractConnection.java:445))
caused by: java.net.SocketException: Connection reset (at java.net.SocketInputStream.read(SocketInputStream.java:168))
Stack trace:
ParserException[UT2004Parser: Can't parse next message: java.net.SocketException: Connection reset (caused by: java.net.SocketException: Connection reset)]
        at cz.cuni.amis.pogamut.base.communication.parser.impl.yylex.YylexParser.parse(YylexParser.java:107)
        at cz.cuni.amis.pogamut.base.communication.translator.impl.WorldMessageTranslator.getEvent(WorldMessageTranslator.java:121)
        at cz.cuni.amis.pogamut.base.communication.mediator.impl.Mediator$Worker.run(Mediator.java:299)
        at java.lang.Thread.run(Thread.java:619)
Caused by: PogamutIOException[java.net.SocketException: Connection reset]
        at cz.cuni.amis.pogamut.base.communication.connection.impl.AbstractConnection$ConnectionReader.handleException(AbstractConnection.java:445)
        at cz.cuni.amis.pogamut.base.communication.connection.impl.AbstractConnection$ConnectionReader.read(AbstractConnection.java:418)
        at cz.cuni.amis.pogamut.ut2004.communication.messages.gbinfomessages.Yylex.zzRefill(Yylex.java:4534)
        at cz.cuni.amis.pogamut.ut2004.communication.messages.gbinfomessages.Yylex.yylex(Yylex.java:4777)
        at cz.cuni.amis.pogamut.base.communication.parser.impl.yylex.YylexParser.parse(YylexParser.java:97)
        ... 3 more
Caused by: java.net.SocketException: Connection reset
        at java.net.SocketInputStream.read(SocketInputStream.java:168)
        at sun.nio.cs.StreamDecoder.readBytes(StreamDecoder.java:264)
        at sun.nio.cs.StreamDecoder.implRead(StreamDecoder.java:306)
        at sun.nio.cs.StreamDecoder.read(StreamDecoder.java:158)
        at java.io.InputStreamReader.read(InputStreamReader.java:167)
        at cz.cuni.amis.pogamut.base.communication.connection.impl.AbstractConnection$ConnectionReader.read(AbstractConnection.java:402)
        ... 6 more
---------------------------------------------------------------------------------------------------

One thing to keep in mind when reading this log: I'm running six servers at once, and this error appears to have hit all of them at the same time. Naturally, I would like this fixed.

However, my more general question is this: how do I detect errors like this from within my code and handle them so that I can recover? I'm doing evolution, so I run hundreds of evaluations, each on a new server. An occasional failed evaluation on its own doesn't bother me. I would like my code to deal with the error automatically by shutting down the offending server, perhaps waiting a few minutes, and then relaunching it. However, I'm not sure where in my code I should intervene to keep these fatal errors from shutting down the Pogamut platform. All of the stack traces end at Thread.run, which makes it hard to tell where in the code these threads are started.
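
To make the Thread.run part of the question concrete, here is a minimal sketch of the kind of hook I'm imagining. It uses only the standard Thread.setDefaultUncaughtExceptionHandler from the JDK. The class name FatalErrorWatch is made up for illustration and is not part of Pogamut. It also rests on an assumption I have not verified: that the exception actually escapes the Pogamut worker thread. If Pogamut catches and logs it internally, this handler will never fire.

```java
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical helper, not a Pogamut class: remembers the first throwable
// that escaped any thread's run() method in this JVM.
class FatalErrorWatch {
    // null means no uncaught failure has been seen since the last reset.
    private static final AtomicReference<Throwable> FIRST_FAILURE =
            new AtomicReference<Throwable>();

    /** Installs a JVM-wide handler for exceptions that escape a thread. */
    static void install() {
        Thread.setDefaultUncaughtExceptionHandler(
                new Thread.UncaughtExceptionHandler() {
                    public void uncaughtException(Thread thread, Throwable error) {
                        // Keep only the first failure; later ones are ignored.
                        FIRST_FAILURE.compareAndSet(null, error);
                    }
                });
    }

    /** Returns the first recorded failure, or null if there was none. */
    static Throwable firstFailure() {
        return FIRST_FAILURE.get();
    }

    /** Clears the recorded failure, e.g. before starting the next evaluation. */
    static void reset() {
        FIRST_FAILURE.set(null);
    }
}
```

The idea would be to call FatalErrorWatch.install() once at startup and then poll firstFailure() from the evaluation loop. It only tells me that something died, not which server it belonged to.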

So basically, I would like to be able to detect the fatal errors within my code and reset the server instead of closing the platform.
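
For illustration, this is roughly the recovery loop I would like to end up with, as a sketch in plain Java. Everything named here (EvaluationSupervisor, ServerRun, evaluate, shutdown) is hypothetical and not a Pogamut API. The sketch assumes the failure reaches the calling thread as an exception, which is exactly the part I don't know how to arrange.

```java
// Hypothetical sketch: retry one evaluation on a fresh server after a failure.
class EvaluationSupervisor {

    /** Placeholder for whatever launches, uses, and kills one server. */
    interface ServerRun<T> {
        T evaluate() throws Exception; // start a fresh server and run one evaluation
        void shutdown();               // stop the server used by the last attempt
    }

    /**
     * Runs the evaluation, shutting the server down after every attempt.
     * On failure it waits waitMillis and tries again, up to maxAttempts times.
     * If every attempt fails, the last exception is rethrown.
     */
    static <T> T runWithRetries(ServerRun<T> run, int maxAttempts, long waitMillis)
            throws Exception {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be at least 1");
        }
        Exception last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return run.evaluate();
            } catch (Exception e) {
                last = e; // remember the failure and fall through to the retry
            } finally {
                run.shutdown(); // runs after success and after failure alike
            }
            if (attempt < maxAttempts) {
                Thread.sleep(waitMillis); // give the machine time to settle
            }
        }
        throw last;
    }
}
```

With a wait of a few minutes and a small number of attempts, one bad server would cost a single evaluation some time rather than ending the whole run.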

-Jacob
