[omniORB] Physical network disconnection/reconnection?

Sai-Lai Lo S.Lo@uk.research.att.com
24 Nov 1999 12:08:49 +0000


Terje,

I think you have to establish whether the problem with reconnection is
to do with the TCP stack in the OS. On each invocation, omniORB always
tries to create a connection if there isn't one. The fact that you see an
exception may indicate that the connect() call to the OS returns an error
even though the network has been reconnected.

Two things you can do:

1. Install a COMM_FAILURE exception handler and make the ORB retry a few
   more times before it gives up. If the TCP layer is working properly,
   this should not be necessary. The details are in the user guide, but
   here is some sample code:


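        // Called by the ORB when an invocation raises COMM_FAILURE.
        // Returning true asks the ORB to retry; false lets the
        // exception propagate to the caller.
        CORBA::Boolean
        comm_exc(void*, CORBA::ULong retries, const CORBA::COMM_FAILURE&) {

            // Retry 5 times before giving up.
            return (retries < 5) ? 1 : 0;
        }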


   And in the initialisation code:

          orb = CORBA::ORB_init(argc,argv,"omniORB2");
          omniORB::installCommFailureExceptionHandler(0,comm_exc);


2. If the above does not work around your problem, please verify that
   COMM_FAILURE is being raised because connect() returns an error.
   I expect you will see this in
   src/lib/omniORB2/orbcore/tcpSocketMTfactory.cc, function realConnect(),
   where the connect call returns an error. If that is the case, ask why
   the OS is not coping with the disconnection/reconnection.
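
   To check the OS side independently of omniORB, something like the
   following could be used. It is only a sketch, assuming a POSIX sockets
   API; probe_connect is a hypothetical helper name (not an omniORB
   function) and it accepts dotted IPv4 addresses only. It makes one
   blocking connect() attempt and returns the errno value the OS reports.

```cpp
// Standalone probe, independent of omniORB: attempt one TCP connect()
// and hand back the errno value reported by the OS.
#include <arpa/inet.h>
#include <netinet/in.h>
#include <sys/socket.h>
#include <unistd.h>

#include <cerrno>
#include <cstring>

// Returns 0 if connect() succeeds, otherwise the errno value.
// probe_connect is a hypothetical name used for illustration only.
int probe_connect(const char* ipv4, unsigned short port) {
    sockaddr_in addr;
    std::memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_port = htons(port);

    // inet_pton returns 1 on success and 0 if the string is not a
    // valid dotted IPv4 address; treat that as EINVAL.
    if (inet_pton(AF_INET, ipv4, &addr.sin_addr) != 1) return EINVAL;

    int fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0) return errno;

    // Blocking connect: may take a while if the peer is unreachable.
    int rc = connect(fd, reinterpret_cast<sockaddr*>(&addr), sizeof(addr));
    int err = (rc == 0) ? 0 : errno;  // capture errno before close()
    close(fd);
    return err;
}
```

   Calling this in a loop against the server's address and port, before
   unplugging, while unplugged and after replugging, and printing
   strerror() of the result, should show whether connect() keeps failing
   once the cable is back. If it does, the problem is below the ORB.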

Sai-Lai



>>>>> Terje Strand writes:

> We are having a major problem using omniOrb in our embedded real-time
> system. The problem arises when we try to use a reference which resides on a
> computer that is physically disconnected from the LAN. The first call to the
> reference results in an exception after a time-out as expected. However, any
> subsequent calls return immediately with an exception even if the computer
> running the referenced object has been reconnected onto the LAN. This is
> also true for resolving the reference to the naming service.

> The problem stems from the fact that we need to be able to handle physical
> disconnection from the LAN while data is being transferred continuously and
> that the reconnection (2 min to 100 hours later) must happen seamlessly. 




-- 
Sai-Lai Lo                                   S.Lo@uk.research.att.com
AT&T Laboratories Cambridge           WWW:   http://www.uk.research.att.com 
24a Trumpington Street                Tel:   +44 1223 343000
Cambridge CB2 1QA                     Fax:   +44 1223 313542
ENGLAND