[omniORB] Disappearing oneway calls: Maybe I found the bug

Sai-Lai Lo S.Lo@orl.co.uk
10 Jul 1998 17:17:20 +0100


Jan Lessner <jan@c-lab.de> writes:

> In our application scenario we are using two cooperating servers which
> transmit both two-way and one-way calls in both directions. Consequently
> the involved file descriptors are registered as outgoing and incoming
> connections on both sides. Sometimes the checks for idle incoming and
> outgoing connections are performed very quickly one after the other
> without any message dispatching occurring in between. For connections
> registered as both incoming and outgoing, the first check resets
> the heartbeat flag and the second one finds it reset, assuming the
> connection to be idle. It therefore shuts down the Strand, causing any
> data received while the Scavengers are running to be lost.

With respect to a process, a connection can only be outgoing or incoming,
*not* both. This is due to the asymmetric nature of GIOP. So in your
scenario there are two connections between the two cooperating servers,
each carrying invocations in one direction only.

Therefore, within a process, a connection is checked either for idle incoming
or for idle outgoing, but *never* both. Your explanation does not seem to fit
what actually happens in the ORB.
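
The heartbeat argument can be made concrete with a minimal Python model.
It assumes, following Jan's description, that each connection carries a
flag that is set on activity and cleared by each scan, and that a scan
finding the flag already clear treats the connection as idle. The names
`Conn` and `scan` are hypothetical and do not describe omniORB internals.
The model shows why a back-to-back double scan would wrongly reap a busy
connection, and why that cannot happen when each connection sits on
exactly one scan list.

```python
from dataclasses import dataclass


@dataclass
class Conn:
    """Hypothetical model of one connection as seen by one process."""
    name: str
    direction: str            # "incoming" or "outgoing", never both
    heartbeat: bool = False   # set whenever a message is sent or received
    is_open: bool = True

    def activity(self) -> None:
        self.heartbeat = True


def scan(conns, direction):
    """One pass of a hypothetical idle scanner for a single direction.

    A connection whose heartbeat is still clear from the previous pass is
    treated as idle and closed; otherwise the heartbeat is cleared so that
    the next pass can detect inactivity.
    """
    closed = []
    for c in conns:
        if not c.is_open or c.direction != direction:
            continue  # each scanner only visits connections of its direction
        if c.heartbeat:
            c.heartbeat = False
        else:
            c.is_open = False
            closed.append(c.name)
    return closed


# Two connections between the servers, one per direction, both recently used.
to_peer = Conn("to_peer", "outgoing")
from_peer = Conn("from_peer", "incoming")
conns = [to_peer, from_peer]
for c in conns:
    c.activity()

# Incoming and outgoing scans run back to back with no traffic in between.
# Each connection is visited only once, so neither is mistaken for idle.
closed_now = scan(conns, "incoming") + scan(conns, "outgoing")

# Jan's concern, modelled by visiting one active connection twice in a row.
busy = Conn("busy", "incoming")
busy.activity()
first = scan([busy], "incoming")   # clears the heartbeat
second = scan([busy], "incoming")  # finds it clear and closes the connection
```

Because the direction check partitions the connections, the first case
can never degrade into the second within one process.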

However, you raise a point that I've not thought of before.

By default, the ORB runtime at each end of a connection performs its idle
check independently. A race condition can occur:

       +-------+                              +--------+
       |       | (outgoing)        (incoming) |        |
       |  A    |----------------------------->|   B    |
       |       |                              |        |
       +-------+                              +--------+


   Event sequence
   --------------
    1. A sends a oneway message.
    2. While A's message is in transit, B decides that the connection
       is idle and shuts it down.
    3. A's original message was never read from the socket and hence is lost.
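
The event sequence above can be replayed deterministically in a small
Python model. The assumptions are hypothetical and not taken from omniORB
code: B's heartbeat is set only when B actually reads a request, so bytes
sitting unread in the socket buffer do not count as activity, and closing
the connection discards anything still unread. `Link`, `send_oneway`,
`idle_scan` and `dispatch` are illustrative names only.

```python
from collections import deque


class Link:
    """Hypothetical one-way GIOP connection: A (client) writes, B (server) reads."""

    def __init__(self):
        self.buffer = deque()    # sent by A but not yet read by B
        self.open = True
        # B's heartbeat is set only when B reads a request. It starts clear,
        # as if an earlier scan had reset it and nothing has been read since.
        self.b_heartbeat = False
        self.delivered = []      # requests B actually dispatched

    def send_oneway(self, msg):
        # A writes to the socket and returns at once; it expects no reply.
        if self.open:
            self.buffer.append(msg)
        return self.open

    def idle_scan(self):
        # B's independent idle check: clear the heartbeat, or close the
        # connection if it was already clear. Unread bytes are discarded.
        if self.b_heartbeat:
            self.b_heartbeat = False
        else:
            self.open = False
            self.buffer.clear()

    def dispatch(self):
        # B reads and dispatches whatever has arrived, if still open.
        while self.open and self.buffer:
            self.delivered.append(self.buffer.popleft())
            self.b_heartbeat = True


# The race: B's scan runs while A's message is still unread.
lost = Link()
sent = lost.send_oneway("ping")  # 1. A sends a oneway message
lost.idle_scan()                 # 2. B sees no activity and closes
lost.dispatch()                  # 3. nothing left to read: "ping" is gone

# The benign order: B reads the message before its scan runs.
ok = Link()
ok.send_oneway("ping")
ok.dispatch()                    # the read sets B's heartbeat
ok.idle_scan()                   # heartbeat cleared, connection kept
```

The only difference between the two runs is the order of B's read and B's
scan, which is exactly what neither side controls when each end checks
independently.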

If A is making a request-reply call, A sees B's shutdown as an orderly
shutdown and hence retries the request on a new connection.

But if A is sending a oneway, A does not wait to see whether the message has
gone through, so it never detects the connection shutdown and cannot resend
the message on a new connection.
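
The asymmetry between the two call types can be sketched as client-side
logic. This is a hypothetical Python model, not omniORB's retry code: the
`Server` class simply reaps connection 1 before reading it, a two-way call
waits for the outcome and so retries on a fresh connection, and a oneway
call never examines the outcome at all.

```python
class Server:
    """Hypothetical server whose idle scan reaps connection 1 before reading it."""

    def __init__(self):
        self.last_id = 1        # connection 1 is already open
        self.received = []

    def connect(self):
        # The client opens a fresh connection.
        self.last_id += 1
        return self.last_id

    def read(self, conn_id, msg):
        # Returns False if the connection was shut down (orderly) unread.
        if conn_id == 1:
            return False
        self.received.append(msg)
        return True


def call_twoway(server, conn_id, msg):
    # A two-way call waits for the outcome, so it sees the orderly shutdown
    # and retries once on a new connection.
    if not server.read(conn_id, msg):
        conn_id = server.connect()
        server.read(conn_id, msg)
    return conn_id


def call_oneway(server, conn_id, msg):
    # A oneway call is fire-and-forget: the outcome is never examined,
    # so a shutdown that races with the write goes unnoticed.
    server.read(conn_id, msg)
    return conn_id


s_two = Server()
used_conn = call_twoway(s_two, 1, "twoway request")
s_one = Server()
call_oneway(s_one, 1, "oneway request")
```

Here `s_two` ends up with the request, delivered on connection 2, while
`s_one` never receives anything.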

If your application cannot tolerate oneway messages being dropped under any
condition, I suggest turning off the idle incoming check on both A and B and
relying only on the outgoing check to retire idle connections. The
disadvantage of this approach is that each server loses control over the
number of connections clients can open.
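
A minimal Python model of the suggested configuration, again with
hypothetical names: B never runs an incoming scan, and only A's outgoing
scan retires the connection. Since A closes only from its own side, after
its writes, and an orderly TCP close is delivered after any data already
sent, B still reads the oneway before it sees end-of-stream. The cost is
the one noted above: B has no say in when the connection goes away.

```python
from collections import deque


class Link:
    """Hypothetical connection from A to B on which only A's idle scan runs."""

    def __init__(self):
        self.buffer = deque()     # sent by A, not yet read by B
        self.a_closed = False     # A has performed an orderly close
        self.a_heartbeat = False  # A's view: set whenever A sends
        self.delivered = []

    def send_oneway(self, msg):
        # A writes the request and marks the connection as recently used.
        self.buffer.append(msg)
        self.a_heartbeat = True

    def outgoing_scan(self):
        # A's idle check: close only after a full scan period without sends.
        # Data already written stays ahead of the end-of-stream marker.
        if self.a_heartbeat:
            self.a_heartbeat = False
        else:
            self.a_closed = True

    def dispatch(self):
        # B drains everything A sent; after A's close it then sees EOF.
        while self.buffer:
            self.delivered.append(self.buffer.popleft())
        return self.a_closed  # True means B has reached end-of-stream


link = Link()
link.send_oneway("ping")  # A sends a oneway
link.outgoing_scan()      # recent send: heartbeat cleared, link kept open
link.outgoing_scan()      # a full idle period: A retires the connection
eof = link.dispatch()     # B still reads "ping", then sees end-of-stream
```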

It seems to me this race condition is a feature rather than a bug, as long
as one wants the server to be able to shut down connections without the
clients' cooperation.

Good detective work Jan!

Sai-Lai


--
Dr. Sai-Lai Lo                          |       Research Scientist
                                        |
E-mail:         S.Lo@orl.co.uk          |       Olivetti & Oracle Research Lab
                                        |       24a Trumpington Street
Tel:            +44 223 343000          |       Cambridge CB2 1QA
Fax:            +44 223 313542          |       ENGLAND