[omniORB] timeout detection (client-side)

Duncan Grisby duncan at grisby.org
Mon Nov 26 15:40:44 GMT 2007


On Wednesday 21 November, "Michael Kilburn" wrote:

> Hmm... From my tests I have a feeling that COMM_FAILURE gets generated
> if underlying ORB had non-closed socket to the server (which died) and
> I attempt to send a request there. Which is kind of unpredictable,
> since you never know if ORB has connection or not...
> Question: in such cases, will omniORB (after detecting that socket
> connection is dead) attempt to reestablish connection in the same call
> or I need to catch these cases and call again, expecting TRANSIENT?

If a connection has previously been used successfully, and a subsequent
call fails when marshalling the request, omniORB will transparently try
to reconnect within the same call. Only if that reconnection fails will
you see a TRANSIENT. If the connection breaks in the middle of sending a
request (e.g. because the server crashes in response to the call), you
see COMM_FAILURE.

> My original intention was to find reliably whether:
> - server is 100% dead (i.e. box is reachable, but related port is not
> listening)

In that case, you will definitely get TRANSIENT with minor code
TRANSIENT_ConnectFailed.

> - server is potentially alive (i.e. socket can't connect due to
> timeout (due to network problems, or server overload))

If the server is too busy to accept the new connection, you will still
see TRANSIENT_ConnectFailed, since the client can't tell the
difference. If it accepts the connection but is too busy to start
processing it, you'll get TRANSIENT_CallTimedOut.

> - server is 100% alive, but has problems in business logic (i.e.
> request was sent and received successfully, but request processing
> timed out, e.g. due to deadlock)

In that case, you'll get TRANSIENT with minor code
TRANSIENT_CallTimedOut. There is no way a client can tell the difference
between this situation and the case that the server is just too busy to
respond.

> - server is 100% alive, but there are problems in CORBA layer (no
> resources, protocol incompatibilities and so on)

You'll most likely get some other system exception like MARSHAL,
BAD_PARAM, BAD_OPERATION, etc. If the server gets confused in a way that
causes it to drop the connection, you'll get a COMM_FAILURE.
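The cases above can be summarised as a small lookup. This is only a
sketch: diagnose is a hypothetical helper, and the exception and minor
code names are passed in as plain strings rather than taken from any
real ORB binding. The wording of each diagnosis paraphrases the answers
in this message.

```python
def diagnose(exception, minor=None):
    """Map an exception name (and optional minor code name) to a diagnosis."""
    if exception == "TRANSIENT":
        if minor == "TRANSIENT_ConnectFailed":
            # Port not listening, or server too busy to accept; the
            # client cannot tell these apart.
            return "cannot connect: server dead or too busy to accept"
        if minor == "TRANSIENT_CallTimedOut":
            # Connection accepted, but no reply arrived in time.
            return "call timed out: server too busy or stuck processing"
        return "other transient failure"
    if exception == "COMM_FAILURE":
        # The connection broke or was dropped mid-request.
        return "connection lost during the request"
    if exception in ("MARSHAL", "BAD_PARAM", "BAD_OPERATION"):
        return "problem in the CORBA layer"
    return "unclassified"
```

For example, diagnose("TRANSIENT", "TRANSIENT_CallTimedOut") returns the
same string for an overloaded server and a deadlocked one, which matches
the point that the client cannot distinguish the two.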


-- 
 -- Duncan Grisby         --
  -- duncan at grisby.org     --
   -- http://www.grisby.org --


