[omniORB] Problems with corbaloc

Duncan Grisby duncan at grisby.org
Thu Nov 23 12:15:15 GMT 2006


On Tuesday 21 November, Nigel Rantor wrote:

[...]
> 1) start service on A
> 
> 2) service on A attempts to contact B, B is not running yet, fine.
> 
> 3) start service on B
> 
> 4) service on B attempts to contact A, A is running and replies.
> 
> 5) kill service on B
> 
> 6) start service on B
> 
> 7) service on B attempts to contact A, A is running and has an
> operation invoked on it successfully by B. A then attempts to invoke an
> operation on B and a CORBA.COMM_FAILURE is raised.

The issue is that A has cached a connection to B. Since B restarts
listening on the same port, A thinks its cached connection is valid.
It's not until it tries to use it that it notices that it's broken.

Because of the way the network stack works, the socket send call that A
performs actually succeeds even though the data has nowhere to go. It's
only when it tries to do a receive that it finds out about the failure.
Since as far as it's concerned it has sent the message, it has no way to
know whether B did actually get the message, or whether it's safe to
retry, so it has to throw a COMM_FAILURE exception. In other situations
(like sending a big request message) the socket send fails, and omniORB
knows it's safe to retry.
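
The send-succeeds-then-receive-fails behaviour can be reproduced with plain
sockets, without any ORB involved. The following is only a rough sketch on the
loopback interface: the function name is made up for illustration, and the
exact receive outcome (end-of-file or a reset error) depends on the platform
and on timing.

```python
import socket


def send_then_receive_after_peer_close():
    """Return (bytes_sent, receive_outcome) for a peer that has gone away."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
    listener.listen(1)

    client = socket.create_connection(listener.getsockname())
    client.settimeout(5.0)
    server_side, _ = listener.accept()

    # Simulate B dying: its end of the connection disappears.
    server_side.close()
    listener.close()

    # A small send is normally accepted by the local network stack,
    # even though nobody is left to read it.
    sent = client.send(b"request")

    # Only the receive reveals that the connection is dead.
    try:
        data = client.recv(1024)
        outcome = "eof" if data == b"" else "data"
    except OSError as exc:  # e.g. ConnectionResetError or a timeout
        outcome = "error: " + type(exc).__name__
    finally:
        client.close()
    return sent, outcome


sent, outcome = send_then_receive_after_peer_close()
print(sent, outcome)
```

At the point where the receive fails, the sender has no record of whether the
7 bytes were processed, which is the same position omniORB is in when it
raises COMM_FAILURE.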

The solution for you is to install a COMM_FAILURE exception handler that
retries once. After the COMM_FAILURE, omniORB will open a new connection
and it'll be fine.
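
As a rough sketch of the retry-once idea: in omniORBpy the handler is, as far
as I recall, registered with
omniORB.installCommFailureExceptionHandler(cookie, function) and called as
function(cookie, retries, exc), returning true to retry. Check the omniORBpy
manual for the exact signature. So that the example runs without an ORB,
CommFailure, invoke and RestartedPeer below are hypothetical stand-ins, not
omniORB code.

```python
class CommFailure(Exception):
    """Stand-in for CORBA.COMM_FAILURE (assumption: no ORB is available here)."""


def retry_once(cookie, retries, exc):
    """Handler in the (cookie, retries, exc) shape: allow exactly one retry.

    retries is the number of retries already made for this invocation.
    """
    return retries < 1


def invoke(operation, handler, cookie=None):
    """Toy model of an invocation loop that consults the handler on failure."""
    retries = 0
    while True:
        try:
            return operation()
        except CommFailure as exc:
            if not handler(cookie, retries, exc):
                raise  # handler declined: the exception reaches the caller
            retries += 1


class RestartedPeer:
    """Hypothetical peer: the first call hits the stale cached connection."""

    def __init__(self):
        self.attempts = 0

    def ping(self):
        self.attempts += 1
        if self.attempts == 1:
            raise CommFailure("stale cached connection")
        return "pong"


peer = RestartedPeer()
result = invoke(peer.ping, retry_once)
print(result, peer.attempts)
```

Bear in mind that a retry can run the operation twice if B did receive the
first request, so this suits operations that are safe to repeat.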

> If I leave the service on B dead for long enough this problem does not
> occur, so I turned tracing on and found that once the service on A
> gets to the point where it prints the below message out I can then
> kill and restart the service on B and everything works.

[...]
> omniORB: sendCloseConnection: to giop:tcp:172.16.69.250:9991 12 bytes
> omniORB: Client connection refcount (forced) = 0
> omniORB: Client close connection to giop:tcp:172.16.69.250:9991

This is omniORB closing the idle connection (which is actually broken,
but it doesn't know that). Once it's closed, a new call will open a new
connection, hence the lack of exception.
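
A toy model of that idle-connection behaviour, to show why waiting long enough
avoids the exception. This is not omniORB's implementation: the
ConnectionCache class and the 120-second timeout are made up for illustration
(the real idle timeout for outgoing connections is, I believe, set by the
outConScanPeriod configuration parameter).

```python
class ConnectionCache:
    """Caches one connection per endpoint and drops the idle ones."""

    def __init__(self, idle_timeout):
        self.idle_timeout = idle_timeout
        self._last_used = {}  # endpoint -> time of last use, in seconds

    def get(self, endpoint, now):
        """Reuse the cached connection if there is one, else open a new one."""
        state = "reused" if endpoint in self._last_used else "new"
        self._last_used[endpoint] = now
        return state

    def scan(self, now):
        """Close every connection that has been idle for idle_timeout or more."""
        for endpoint, last in list(self._last_used.items()):
            if now - last >= self.idle_timeout:
                del self._last_used[endpoint]


cache = ConnectionCache(idle_timeout=120)
b = "giop:tcp:172.16.69.250:9991"

first = cache.get(b, now=0)    # A opens a connection to B
cache.scan(now=30)             # B restarted quickly: entry is not idle yet
quick = cache.get(b, now=30)   # stale connection is reused, the call fails
cache.scan(now=200)            # idle for 170 s: the scan closes it
late = cache.get(b, now=200)   # a new connection is opened, the call works
print(first, quick, late)
```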

> omniORB: throw giopStream::CommFailure from
> giopStream.cc:835(0,NO,COMM_FAILURE_UnMarshalArguments)

This is merely an internal implementation detail. It's caught by another
bit of omniORB, so it's not expected to propagate to your application
code.

Cheers,

Duncan.

-- 
 -- Duncan Grisby         --
  -- duncan at grisby.org     --
   -- http://www.grisby.org --


