[omniORB] Server hangs

Duncan Grisby dgrisby@uk.research.att.com
Thu, 30 Nov 2000 10:02:15 +0000


On Wednesday 29 November, Mike Olson wrote:

>   Thanks for the quick turn around.  It seems to have fixed the
> concurrency problem, kinda.  The server now runs _much_ longer, however
> eventually, one of the connections returns a COMM_FAILURE exception. 
> The interesting thing is when this happens, all connections receive the
> COMM_FAILURE (minor 11).  The server is then unresponsive to connections
> for some period of time (2-3 seconds), then it will start returning
> objects and everything is fine.  Granted, the times depend on your
> machine.

Oh yes, I was going to mention that. The problem is mainly with the
OS. On each iteration of its loop, the client drops all its references
to objects on the server, so omniORB closes the TCP connection and
opens a new one the next time round the loop. Each new TCP connection
opened on the server starts a new thread. The operating system can't
keep up with closing down the dead threads and TCP connections, so
eventually it gives up and returns a failure to the ORB. That results
in the COMM_FAILURE exception. After a while, the OS manages to clean
up what it has left behind, and things run normally again.

In future, omniORB might keep connections open a bit after they are no
longer needed, but that has its own set of problems. For now, the
work-around is to keep at least one reference to the server. In that
case, omniORB keeps the TCP connection open until the connection has
been idle for a while. Alternatively, a while ago David Riddoch posted
a patch to keep connections open:

  http://www.uk.research.att.com/omniORB/archives/2000-11/0020.html
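
To make the "keep at least one reference" work-around concrete, here is
a toy Python model of the behaviour described above. It is only a
sketch and not omniORB code: ToyORB, ToyRef and run_client_loop are
made-up names, the model has a single connection, and it closes that
connection the moment the last reference goes (it ignores the idle
timeout that the real ORB applies when a reference is still held).

```python
class ToyORB:
    """Toy client-side ORB (hypothetical, not the omniORB API)."""

    def __init__(self):
        self.live_refs = 0           # object references currently held
        self.connected = False       # is the single TCP connection open?
        self.connections_opened = 0  # how often a new connection was needed

    def get_reference(self):
        # The first live reference forces a new connection to be opened.
        if not self.connected:
            self.connected = True
            self.connections_opened += 1
        self.live_refs += 1
        return ToyRef(self)

    def _drop(self):
        self.live_refs -= 1
        if self.live_refs == 0:
            # Assumption of this model: the connection is closed as soon
            # as the last reference is released.
            self.connected = False


class ToyRef:
    """Stand-in for an object reference obtained from the server."""

    def __init__(self, orb):
        self._orb = orb
        self._released = False

    def release(self):
        # Releasing twice is harmless; only the first call counts.
        if not self._released:
            self._released = True
            self._orb._drop()


def run_client_loop(iterations, keep_one_reference):
    """Return how many connections the toy client opened."""
    orb = ToyORB()
    # The work-around: hold one extra reference for the whole run.
    anchor = orb.get_reference() if keep_one_reference else None
    for _ in range(iterations):
        ref = orb.get_reference()  # use some server object...
        ref.release()              # ...then drop it, as the client loop does
    if anchor is not None:
        anchor.release()
    return orb.connections_opened
```

In this model, run_client_loop(100, False) opens 100 connections, one
per iteration, while run_client_loop(100, True) opens just one. On a
real server each of those extra connections would also mean a thread
for the OS to create and tear down.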


Cheers,

Duncan.

-- 
 -- Duncan Grisby  \  Research Engineer  --
  -- AT&T Laboratories Cambridge          --
   -- http://www.uk.research.att.com/~dpg1 --