[omniORB] Corrupt reply header causes multiple invocations

Chris Newbold cnewbold@laurelnetworks.com
Fri Jun 21 21:29:01 2002


I think there is a problem with omniORB's handling of corrupt reply
headers. It looks like the receipt of a corrupt header by a client on a
"reused" connection results in a CORBA::TRANSIENT exception, which in
turn causes the client to retry the call. Depending on how the server
reacts to duplicate invocations, this may result in an endless loop.

(The cause of the corrupt reply header is still under investigation but
is most likely the result of some application-level heap catastrophe.)

We're using omniORB 3.0.4pN where N is pretty recent...

In giopClient::ReceiveReply() omniConnectionBroken is thrown if the
reply header is bad; it would appear that the completion status
COMPLETED_MAYBE is correct (since we don't really know what happened on
the server):

      if (hdr[0] != MessageHeader::Reply[0] ||
          hdr[1] != MessageHeader::Reply[1] ||
          hdr[2] != MessageHeader::Reply[2] ||
          hdr[3] != MessageHeader::Reply[3] ||
          hdr[4] != MessageHeader::Reply[4] ||
          hdr[5] != MessageHeader::Reply[5] ||
          hdr[7] != MessageHeader::Reply[7])
        {
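          // Wrong header: magic, version or message type is not that
          // of a Reply (byte 6, the byte-order flag, is not compared)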
          setStrandIsDying();
          OMNIORB_THROW_CONNECTION_BROKEN(0,CORBA::COMPLETED_MAYBE);
        }

At the end of omniRemoteIdentity::dispatch(), where omniConnectionBroken
is ultimately caught, the exception is turned into CORBA::TRANSIENT if
the connection was reused (CORBA::COMM_FAILURE otherwise), and the
completion status is preserved:

      }
      catch(omniConnectionBroken& ex) {
        if( reuse ){
          CORBA::TRANSIENT ex2(ex.minor(), ex.completed());
          throw ex2;
        }
        else {
          OMNIORB_THROW(COMM_FAILURE, ex.minor(), ex.completed());
        }
      }

All of this is kicked off by omniObjRef::_invoke(), which just spins in
a while() loop for transient exceptions as long as the transient
exception handler says to keep going.
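
To make the loop concrete, here is a minimal sketch of the retry logic
described above. It is not the omniORB source: Completion, Transient,
TransientHandler, RemoteCall and invoke_with_retry are hypothetical
stand-ins, and the max_attempts cap exists only so the sketch
terminates (the loop described above has no such cap).

```cpp
#include <cassert>
#include <functional>

// Hypothetical stand-ins for illustration; none of these are omniORB types.
enum class Completion { Yes, No, Maybe };

struct Transient {
  unsigned long minor_code;
  Completion completed;
};

// Called after each failure with the number of retries made so far.
// Returning true means "try again".
using TransientHandler = std::function<bool(unsigned long, const Transient&)>;

// A simulated remote call: returns normally on success, throws Transient
// when the reply could not be understood.
using RemoteCall = std::function<void()>;

// Retry loop in the spirit of the one described for _invoke().
// Returns the number of times the call was attempted.
// max_attempts is an assumption of this sketch, added only to bound it.
unsigned long invoke_with_retry(const RemoteCall& call,
                                const TransientHandler& handler,
                                unsigned long max_attempts) {
  assert(call && handler);
  unsigned long attempts = 0;
  while (attempts < max_attempts) {
    ++attempts;
    try {
      call();
      return attempts;            // success, stop looping
    } catch (const Transient& ex) {
      // attempts - 1 is the number of retries already made
      if (!handler(attempts - 1, ex)) throw;  // handler said stop
      // otherwise fall through and send the request again
    }
  }
  return attempts;                // cap reached (sketch only)
}
```

With a handler that always returns true and a call that always fails
with Completion::Maybe, the call is issued on every pass through the
loop, so the server may see the same request each time.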

Perhaps the solution is to have the default transient exception handler
return false when the completion status is COMPLETED_MAYBE? I can't
imagine that you'd ever want to retry a call unless you're certain that
the server did not receive the message on the first try.
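
A rough sketch of the policy proposed above, again with hypothetical
stand-in types rather than the real omniORB or CORBA declarations. It
goes slightly further than the proposal by also refusing to retry when
the call is known to have completed, which follows from the same
reasoning; treat that as an assumption of the sketch.

```cpp
#include <cassert>

// Hypothetical stand-ins for illustration; not omniORB or CORBA types.
enum class Completion { Yes, No, Maybe };

struct Transient {
  unsigned long minor_code;
  Completion completed;
};

// Proposed policy: retry only when the request is known not to have been
// processed by the server. Completion::Maybe (for example after a corrupt
// reply header) and Completion::Yes both stop the retry loop.
// n_retries is unused here; a real handler might also cap the retry count.
bool retry_only_if_not_completed(unsigned long /*n_retries*/,
                                 const Transient& ex) {
  return ex.completed == Completion::No;
}

// Small usage check of the three cases.
void policy_self_check() {
  assert(retry_only_if_not_completed(0, Transient{0, Completion::No}));
  assert(!retry_only_if_not_completed(0, Transient{0, Completion::Maybe}));
  assert(!retry_only_if_not_completed(0, Transient{0, Completion::Yes}));
}
```

With a handler like this, the corrupt-header case would surface to the
application as a single TRANSIENT exception instead of a repeated
invocation.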

-- 
Chris Newbold <cnewbold@laurelnetworks.com>
Laurel Networks, Inc.
phone: +412 809 4242