[omniORB] transient exception handler and call timeout problem

Vladislav Vrtunski vladislav.vrtunski at dmsgroup.co.yu
Tue May 10 21:13:34 BST 2005


Vladislav Vrtunski wrote:

> Hi,
>
>    We are trying to limit call duration to 2 seconds, and if the
> timeout is reached we want to repeat the call, hoping that it won't
> take that long the second time. This is part of an attempt to build a
> fault-tolerant system. To achieve that, we have set the client call
> timeout and installed both TRANSIENT and COMM_FAILURE exception
> handlers. Each handler is written to let the ORB retry the operation
> once if an exception occurs.
>    Here is the problem. When the timeout is reached and a
> TRANSIENT_CallTimedout exception is thrown, our transient exception
> handler is called and returns 1 so that the ORB repeats the call.
> Immediately after that, TRANSIENT_ConnectFailed is thrown without the
> server-side method being called, although we can see in the omniORB
> trace that the server has accepted the second connection from the
> client. Is this the expected behavior? We would expect the second call
> to at least reach the server method.
>
> Server side trace:
>
> omniORB: Server accepted connection from giop:tcp:192.168.0.15:1695
> omniORB: AsyncInvoker: thread id = 3 has started. Total threads = 3
> omniORB: Accepted connection from giop:tcp:192.168.0.15:1695 because of this rule: "* tcp"
> omniORB: Handling a GIOP LOCATE_REQUEST.
> omniORB: AsyncInvoker: thread id = 4 has started. Total threads = 4
> omniORB: throw giopStream::CommFailure from giopStream.cc:831(0,NO,COMM_FAILURE_UnMarshalArguments)
> omniORB: Server accepted connection from giop:tcp:192.168.0.15:1696
> omniORB: Accepted connection from giop:tcp:192.168.0.15:1696 because of this rule: "* tcp"
> omniORB: throw giopStream::CommFailure from giopStream.cc:831(0,NO,COMM_FAILURE_UnMarshalArguments)
> omniORB: Server close connection from giop:tcp:192.168.0.15:1696
> omniORB: Server close connection from giop:tcp:192.168.0.15:1695
> omniORB: AsyncInvoker: thread id = 4 has exited. Total threads = 4
> omniORB: AsyncInvoker: thread id = 3 has exited. Total threads = 3
>
> Client side trace:
>
> omniORB: LocateRequest to remote: root<0>
> omniORB: Client opened connection to giop:tcp:192.168.0.16:1044
> omniORB: throw giopStream::CommFailure from giopStream.cc:831(0,MAYBE,TRANSIENT_CallTimedout)
> omniORB: Client close connection to giop:tcp:192.168.0.16:1044
> Transient handler called. Retries = 0
> omniORB: throw giopStream::CommFailure from giopStream.cc:1073(0,NO,TRANSIENT_ConnectFailed)
> Transient handler called. Retries = 1
> omniORB: throw TRANSIENT from omniObjRef.cc:759 (NO,TRANSIENT_ConnectFailed)
> TRANSIENT exception caught! Code = 1096024066
>
> Client code fragment:
>
> CORBA::Boolean TransientHandler(void* pCookie, CORBA::ULong nRetries,
>                                 const CORBA::TRANSIENT& ex)
> {
>   cerr << "Transient handler called. Retries = " << nRetries << endl;
>   return ((nRetries < 1) ? 1 : 0);
> }
>
> //part of the main func
> {
>        ...
>        omniORB::installCommFailureExceptionHandler(NULL, CommFailureHandler);
>        omniORB::installTransientExceptionHandler(NULL, TransientHandler);
>        CORBA::ULong ct = 2000;
>        omniORB::setClientCallTimeout(ct);
>        try
>        {
>            nStatus = pServer->GetStatus();
>        }
>        catch (CORBA::TRANSIENT& ex)
>        {
>            tcout << _T("TRANSIENT exception caught! Code = ")
>                  << ex.minor() << endl;
>        }
>        catch (CORBA::COMM_FAILURE&)
>        {
>            tcout << _T("Caught system exception COMM_FAILURE -- unable to contact the object!")
>                  << endl;
>        }
>        ...
> }
>
> We are using omniORB-4.0.5 on Win2k machines with the VC++ 6.0 compiler.
>
After weeks of testing, we finally figured out what is going on here. It
seems that when a TRANSIENT_CallTimedout exception reaches the installed
exception handler and the handler asks for a retry (return value != 0),
the deadline computed for the original call is not reset. The retry
therefore fails with TRANSIENT_ConnectFailed, because the timeout has
already expired. So once a call times out, there is no way to have the
ORB retry it automatically. From inspecting the omniORB source code, we
concluded that this is done on purpose. What is the reasoning behind it?

Details:

From giopStream.cc:

giopStream::sendChunk(giopStream_Buffer* buf) {

  if (!pd_strand->connection) {
    OMNIORB_ASSERT(pd_strand->address);

    if (pd_strand->state() != giopStrand::DYING) {
      if (omniORB::trace(25)) {
        omniORB::logger log;
        log << "Client attempt to connect to "
            << pd_strand->address->address() << "\n";
      }
      giopActiveConnection* c =
        pd_strand->address->Connect(pd_deadline_secs,
                                    pd_deadline_nanosecs);

      if (c) pd_strand->connection = &(c->getConnection());
    }
    if (!pd_strand->connection) {
      errorOnSend(TRANSIENT_ConnectFailed,__FILE__,__LINE__,0);
    }
    if (omniORB::trace(20)) {
      omniORB::logger log;
      log << "Client opened connection to "
          << pd_strand->connection->peeraddress() << "\n";
    }
  }

When the call is retried, pd_deadline_secs and pd_deadline_nanosecs still
hold the values computed for the first call, so
pd_strand->address->Connect(pd_deadline_secs, pd_deadline_nanosecs)
returns NULL (because the call to SocketSetTimeout() from
SocketCollection.cc fails) and TRANSIENT_ConnectFailed is raised. That is
very confusing, since the real cause is the expired timeout.
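
To make the difference concrete, here is a small self-contained model of the two behaviours. It is only a sketch and not omniORB code: FakeClock, Outcome, attempt(), invoke_shared_deadline() and invoke_fresh_deadline() are hypothetical names invented for illustration, time is a plain millisecond counter, and a C++11 compiler is assumed.

```cpp
#include <cassert>
#include <vector>

// Hypothetical model of the behaviour described above. None of these names
// exist in omniORB; time is a plain millisecond counter, so nothing sleeps.
struct FakeClock { long now_ms = 0; };

enum class Outcome { Ok, CallTimedOut, ConnectFailed };

// One attempt against an absolute deadline. Connecting fails at once if the
// deadline has already passed; otherwise the call runs for call_ms and times
// out if it would finish after the deadline.
Outcome attempt(FakeClock& clock, long deadline_ms, long call_ms) {
  if (clock.now_ms >= deadline_ms) return Outcome::ConnectFailed;
  if (clock.now_ms + call_ms > deadline_ms) {
    clock.now_ms = deadline_ms;  // the caller gives up exactly at the deadline
    return Outcome::CallTimedOut;
  }
  clock.now_ms += call_ms;
  return Outcome::Ok;
}

// Deadline computed once and shared by every attempt (what we observe).
Outcome invoke_shared_deadline(FakeClock& clock, long timeout_ms,
                               const std::vector<long>& call_durations_ms) {
  assert(!call_durations_ms.empty());
  const long deadline_ms = clock.now_ms + timeout_ms;
  Outcome last = Outcome::ConnectFailed;
  for (long call_ms : call_durations_ms) {
    last = attempt(clock, deadline_ms, call_ms);
    if (last == Outcome::Ok) break;
  }
  return last;
}

// Deadline recomputed before each attempt (what we expected).
Outcome invoke_fresh_deadline(FakeClock& clock, long timeout_ms,
                              const std::vector<long>& call_durations_ms) {
  assert(!call_durations_ms.empty());
  Outcome last = Outcome::ConnectFailed;
  for (long call_ms : call_durations_ms) {
    const long deadline_ms = clock.now_ms + timeout_ms;
    last = attempt(clock, deadline_ms, call_ms);
    if (last == Outcome::Ok) break;
  }
  return last;
}
```

With a 2000 ms timeout, a slow first call (3000 ms) and a fast second call (100 ms), the shared-deadline variant ends in ConnectFailed on the retry, which matches the client trace, while the fresh-deadline variant succeeds on the second attempt.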

Is there any reason not to reset pd_deadline_secs when the call is
retried, for cases where we want to retry even though the timeout has
already expired (e.g. a temporary network failure made the original call
take too long)?
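
Until that is clarified, one possible workaround is to retry in application code rather than in the exception handler. This rests on the untested assumption that every new invocation made by the application gets its own deadline. The sketch below shows only the retry pattern: TransientError is a hypothetical stand-in for CORBA::TRANSIENT so the code compiles without omniORB, call_with_retry() is an invented helper, and a C++11 compiler is assumed (VC++ 6.0 would need a hand-written loop instead).

```cpp
#include <cassert>
#include <stdexcept>

// Hypothetical stand-in for CORBA::TRANSIENT, so the sketch builds without
// omniORB. In real code the catch clause would name CORBA::TRANSIENT.
struct TransientError : std::runtime_error {
  explicit TransientError(const char* what) : std::runtime_error(what) {}
};

// Calls fn() up to max_attempts times, retrying only on TransientError.
// If the last attempt also fails, its exception is rethrown to the caller.
template <typename Fn>
auto call_with_retry(Fn fn, int max_attempts) -> decltype(fn()) {
  assert(max_attempts > 0);
  for (int attempt = 1; ; ++attempt) {
    try {
      return fn();  // each pass through the loop is a brand-new invocation
    } catch (const TransientError&) {
      if (attempt >= max_attempts) throw;
    }
  }
}
```

In our client this would wrap the pServer->GetStatus() call, with the transient handler returning 0 so that the ORB does not retry on its own.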

Regards,

-- 
Vladislav Vrtunski
DMS Group
Serbia & Montenegro
