[omniORB] Uncontrolled reconnects again

Duncan Grisby duncan@grisby.org
Sun Mar 23 22:12:02 2003


On Wednesday 19 March, ondrej@frcatel.fri.utc.sk wrote:

> This is what we have found. We used an omniORB-4.0.1 client and an
> Orbix2000 server using the Orbix locator (in our example, on port 10001).
> When everything goes fine and both the server and the locator are running,
> there is no problem. When the server crashes, the client correctly reverts
> to its initial identity and asks the locator again. But when the server and
> the locator both crash (which is our case, because the Orbix2000 services
> seem to be a little unstable), the omniORB core enters an endless loop
> trying to establish a connection.

Strange.

> We have also found a hotfix for this: disabling the scanning for idle
> connections prevents the bug from appearing.

Even more strange.

The trace shows that the call to the location-forwarded reference
fails, so omniORB correctly reverts to the original location. It then
tries to connect to the Orbix locator and fails. At this stage it
should give up, but it doesn't. The code that does the retry is in
the invoke() function in omniObjRef.cc:

    catch(const giopStream::CommFailure& ex) {
      if (ex.retry()) continue;
      if( fwd ) {
        RECOVER_FORWARD;
        continue;
      }
    ...

The RECOVER_FORWARD macro is what runs on the first failure, when the
forwarded location cannot be reached. The only way I can see for the
later attempts to retry is if ex.retry() returns true. That flag is
set in the notifyCommFailure() function in GIOP_C.cc. It is only meant
to be set to true if the IOR contains more than one address and not
all of the addresses have been tried yet. It's possible there's a bug
in that code, but I can't see it.
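
To make the expected control flow concrete, here is a minimal, self-contained C++ model of the loop quoted above. It is a sketch under stated assumptions, not omniORB code: ObjRefModel, shouldRetry() and invokeModel() are hypothetical names, every connection attempt is assumed to fail (server and locator both down), and RECOVER_FORWARD is modelled as simply restoring the original address list. Under those assumptions the loop always terminates; an endless loop would require shouldRetry() to keep returning true.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical, simplified model of the retry logic in invoke(). None of
// these names exist in omniORB; they only mirror the roles of
// CommFailure::retry(), notifyCommFailure() and RECOVER_FORWARD.
struct ObjRefModel {
  std::vector<std::string> addresses;  // addresses in the IOR currently in use
  std::vector<std::string> original;   // addresses in the original IOR
  std::size_t tried = 0;               // addresses tried so far in this IOR
  bool fwd = false;                    // true while using a forwarded IOR
};

// Mirrors the intent described for notifyCommFailure(): retry only if the
// IOR has more than one address and some of them are still untried.
bool shouldRetry(const ObjRefModel& ref) {
  return ref.addresses.size() > 1 && ref.tried < ref.addresses.size();
}

// Returns the number of connection attempts made before giving up, or -1 if
// the loop was still retrying after maxAttempts. Every attempt is assumed to
// fail, as when both the server and the locator are down.
int invokeModel(ObjRefModel ref, int maxAttempts = 100) {
  assert(maxAttempts > 0);
  for (int attempts = 1; attempts <= maxAttempts; ++attempts) {
    ++ref.tried;                     // the current address failed
    if (shouldRetry(ref)) continue;  // like: if (ex.retry()) continue;
    if (ref.fwd) {                   // like: RECOVER_FORWARD; continue;
      ref.addresses = ref.original;  // revert to the original IOR
      ref.tried = 0;
      ref.fwd = false;
      continue;
    }
    return attempts;  // give up: the CommFailure reaches the caller
  }
  return -1;
}
```

For example, a forwarded reference with a single (hypothetical) server address, whose original IOR points only at the locator on port 10001, gives up after two attempts; a non-forwarded IOR with three addresses gives up after three.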

I'd suggest adding some logging to omniObjRef.cc and GIOP_C.cc at the
places I've mentioned, to see what's going on.
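
One way to follow that suggestion is to log each retry decision and flag any invocation that makes more attempts than its address lists allow, which is the symptom reported here. The sketch below is hypothetical: RetryTracer is not part of omniORB, it logs with plain fprintf rather than omniORB's own tracing facilities, and the caller must supply the expected maximum number of attempts (the addresses in the forwarded IOR plus those in the original IOR).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>

// Hypothetical debugging aid, not an omniORB class: records each retry
// decision and reports when one invocation has made more connection
// attempts than its address lists should allow.
class RetryTracer {
 public:
  // maxExpected: addresses in the forwarded IOR plus the original IOR.
  explicit RetryTracer(std::size_t maxExpected) : maxExpected_(maxExpected) {
    assert(maxExpected > 0);
  }

  // Call once per CommFailure, e.g. just before "if (ex.retry()) continue;".
  // Logs the decision to stderr and returns true once the attempt count
  // exceeds maxExpected.
  bool record(bool retry, bool fwd) {
    ++attempts_;
    std::fprintf(stderr, "attempt %zu: retry=%d fwd=%d\n", attempts_,
                 retry ? 1 : 0, fwd ? 1 : 0);
    if (attempts_ > maxExpected_) {
      std::fprintf(stderr, "suspicious: %zu attempts, expected at most %zu\n",
                   attempts_, maxExpected_);
      return true;
    }
    return false;
  }

  std::size_t attempts() const { return attempts_; }

 private:
  std::size_t maxExpected_;
  std::size_t attempts_ = 0;
};
```

Fed from the catch block for the failing call in this report (one forwarded address plus the locator), such a tracer would print one line per failure and warn from the third attempt onwards, showing which of ex.retry() or fwd keeps the loop going.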

Cheers,

Duncan.

-- 
 -- Duncan Grisby         --
  -- duncan@grisby.org     --
   -- http://www.grisby.org --