[omniORB] Stuck in socket recv() ?

Thu Jan 18 10:33:00 GMT 2007

On Wednesday 10 January, "Wernke zur Borg" wrote:

> I have a 1:1 client/server application, i.e. one Java client, one
> omniORB server. The server must become aware of client failures
> immediately, therefore I have defined a callback() method on the client,
> which the server calls right at the beginning of a lengthy session. The
> Java client simply blocks the call and only returns when the session is
> to be closed. This way the server also recognises client crashes with a
> COMM_FAILURE.
> 
> The scheme works nearly perfectly, but once in a while the
> callback() does not return with the expected failure when the client is
> killed. A stack trace is given below, it shows that the call is blocked
> in the socket recv(), even though the TCP connection cannot exist any
> more, as the other side was killed. 
> 
> I did a netstat -a on both sides and, strangely enough, the TCP connection
> is shown as ESTABLISHED on the omniORB server side and is not listed at
> all on the Java side. How is this possible?

The issue is that a TCP connection doesn't inherently notice if one side
vanishes. omniORB doesn't enable TCP keep-alives, and they're generally
not very useful anyway, so it's entirely possible for the OS not to know
that the other end of a TCP connection has gone. It will only notice if
an attempt is made to send some data across the connection, and a
server blocked in recv() never sends anything.
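
For reference, keep-alives are a per-socket option. The following is a
minimal Python sketch of switching them on for a plain socket. It has
nothing to do with omniORB's own transport code; enable_keepalive and
its parameters are made-up names, and the TCP_KEEP* tuning options are
platform-specific, hence the hasattr checks. Without the tuning, the
usual default is about two hours of idle time before the first probe,
which is far too slow for this kind of use.

```python
import socket

def enable_keepalive(sock, idle=60, interval=10, count=5):
    """Turn on TCP keep-alive probes for sock; return True if now enabled.

    idle, interval and count are illustrative values, not recommendations.
    """
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)

    # Optional tuning knobs. They are not available on every platform,
    # so each one is only set where the socket module exposes it.
    if hasattr(socket, "TCP_KEEPIDLE"):    # idle seconds before first probe
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPIDLE, idle)
    if hasattr(socket, "TCP_KEEPINTVL"):   # seconds between probes
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPINTVL, interval)
    if hasattr(socket, "TCP_KEEPCNT"):     # failed probes before giving up
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPCNT, count)

    return sock.getsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE) != 0
```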

The usual way to implement the kind of thing you're talking about is to
periodically ping one way or the other so that data is actually
transferred between the processes.
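
As a rough illustration of the ping idea, here is a sketch in Python
over a plain socket rather than CORBA. The one-byte protocol and the
names answer_pings, peer_alive and watch_peer are invented for the
example. In a CORBA application the ping would instead be an ordinary
operation called with a timeout. The point is only that each check
sends data and refuses to wait forever for the reply.

```python
import socket
import time

PING, PONG = b"P", b"A"   # hypothetical one-byte ping protocol

def answer_pings(sock):
    """Peer side: reply to every ping until the connection goes away."""
    try:
        while sock.recv(1) == PING:
            sock.sendall(PONG)
    except OSError:
        pass

def peer_alive(sock, timeout=2.0):
    """Send one ping and wait up to timeout seconds for the reply.

    Returns False on a send/receive error, a timeout, or end-of-file.
    """
    sock.settimeout(timeout)
    try:
        sock.sendall(PING)
        return sock.recv(1) == PONG
    except OSError:       # includes socket.timeout
        return False

def watch_peer(sock, interval=5.0, timeout=2.0):
    """Ping every interval seconds until the peer stops answering.

    Returns the number of pings that were answered before the failure.
    """
    answered = 0
    while peer_alive(sock, timeout):
        answered += 1
        time.sleep(interval)
    return answered
```

One side would run answer_pings() in a thread, and the other would run
watch_peer() and treat its return as the failure notification. A peer
that has silently vanished shows up as a timeout, which is exactly the
case a bare blocking recv() never detects.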

Cheers,

Duncan.

-- 
 -- Duncan Grisby         --
  -- duncan at grisby.org     --
   -- http://www.grisby.org --