[omniORB] Urgent: omniORB::fatalException in omni2.6.1

Sai-Lai Lo S.Lo@uk.research.att.com
06 Aug 1999 19:13:59 +0100


Randy,

It would be helpful if I could have a copy of the omniORB source you are
using. Please upload it to the directory
 ftp://ftp.uk.research.att.com/pub/incoming/omniorb

It would also help to understand what interactions the client and server
are engaging in.
 
 1. Does the crash occur in the server?

 2. Does the server make callbacks to the clients?
      The location of the fatal exception suggests that this is
      an outgoing rope. 

 3. Is GIOP LOCATION_FORWARD used? If both sides are omniORB and you do not
    use the dynamic object loading hook, then LOCATION_FORWARD is not
    generated.

The bugs related to race conditions with the scavengers have been fixed but
I want to be sure your source has a consistent set of fixes.


> I have several questions:

> (1) We turn the scavengers off after the ORB object is created, but
> before the BOA object is created.  Is this sufficient to make sure that
> the scavengers stop running?  (From examining the omniORB code, I think
> the answer is yes, but now I am not 100% sure)

YES.

> (2) What else could cause this fatalException?  It seems to occur
> because of a mismatch in the "idle" states between the Rope and the
> Strand -- the Rope is idle, but the Strand is not.  Is there any other
> way that a Rope could be set to idle, and the Strand not be set to idle,
> other than by the action of the scavenger?  Idleness appears to be
> related to the reference counts on these objects, so perhaps there is a
> problem there?

The reference count on a Rope equals the no. of proxy objects created in
the address space that use the Rope. A remote address space maps to a Rope.

One possible cause of the problem, although I think it is unlikely, is that
a thread has called release on an object reference while another thread is
using that object reference to do a remote invocation. The release causes
the ref count on the rope to go to 0 but a strand within the rope is
still active.

Another scenario is that a thread is using an object reference which has
been released. The memory has actually been returned to the heap but has
not been modified yet. Again the rope ref count goes to 0 but a strand is
still active.
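
As a rough illustration of the first scenario, here is a toy model in plain
C++. It is only a sketch: ToyRope, ToyStrand and ToyProxy are hypothetical
stand-ins, not the omniORB classes, and std::shared_ptr stands in for object
reference counting (copying it plays the role of duplicating a reference,
reset() the role of release). The interleaving of the two threads is written
out sequentially so the outcome is deterministic.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

// Hypothetical stand-in: one connection within a rope.
struct ToyStrand {
    bool busy = false;  // true while a remote invocation is in flight
};

// Hypothetical stand-in: ref count = number of live proxies using the rope.
class ToyRope {
public:
    explicit ToyRope(std::size_t strands) : strands_(strands) {}

    void incrRefCount() { ++refCount_; }
    void decrRefCount() { assert(refCount_ > 0); --refCount_; }

    bool idle() const { return refCount_ == 0; }
    bool anyStrandBusy() const {
        for (const ToyStrand& s : strands_) {
            if (s.busy) return true;
        }
        return false;
    }
    // The mismatch discussed above: rope idle, but a strand is still in use.
    bool inconsistent() const { return idle() && anyStrandBusy(); }

    ToyStrand& strand(std::size_t i) { return strands_[i]; }

private:
    std::vector<ToyStrand> strands_;
    int refCount_ = 0;
};

// A proxy holds one reference on its rope for as long as it exists.
class ToyProxy {
public:
    explicit ToyProxy(ToyRope& r) : rope_(r) { rope_.incrRefCount(); }
    ~ToyProxy() { rope_.decrRefCount(); }
    ToyProxy(const ToyProxy&) = delete;
    ToyProxy& operator=(const ToyProxy&) = delete;

    void beginCall() { rope_.strand(0).busy = true; }
    void endCall()   { rope_.strand(0).busy = false; }

private:
    ToyRope& rope_;
};

// The caller borrows the reference; another thread releases it mid-call.
bool releaseDuringCall() {
    ToyRope rope(1);
    auto shared = std::make_shared<ToyProxy>(rope);
    ToyProxy* borrowed = shared.get();  // no reference of its own
    borrowed->beginCall();              // "thread A" starts an invocation
    shared.reset();                     // "thread B" releases the last reference
    return rope.inconsistent();         // borrowed is dangling; not touched again
}

// Same interleaving, but the caller holds its own reference for the call.
bool releaseWithOwnReference() {
    ToyRope rope(1);
    auto shared = std::make_shared<ToyProxy>(rope);
    auto own = shared;                  // caller's own reference
    own->beginCall();
    shared.reset();                     // proxy survives, rope count stays at 1
    bool bad = rope.inconsistent();
    own->endCall();
    return bad;
}
```

In this model releaseDuringCall() ends with the rope idle and its strand
busy, while releaseWithOwnReference() does not, because the calling side
keeps the proxy alive until its invocation has finished.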

> (3) Could we fix the mismatch of "idle" states in another way -- i.e.,
> could we perhaps un-idle the Rope if we discover one of the Rope's
> Strands is not idle?  I am wondering here if we could avoid throwing
> this exception altogether by cleaning up the inconsistency
> automatically.

Yes, it is safe to avoid throwing the exception. The safe thing to do is to
leave the rope alone. Or you can "un-idle" the Rope by calling incrRefCount
on the rope.
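
The choices can be sketched as follows. This is a minimal illustration under
stated assumptions, not the omniORB code: MiniRope, OnMismatch and
mayCloseIdleRope are hypothetical names, and the real check lives wherever
the fatal exception is currently raised. Only the name incrRefCount is taken
from the text above.

```cpp
#include <cassert>
#include <stdexcept>

// Hypothetical model of a rope, not the omniORB implementation.
struct MiniRope {
    int  refCount = 0;        // 0 means the rope is considered idle
    bool strandBusy = false;  // true if some strand is still in use
    void incrRefCount() { ++refCount; }
};

// What to do when an idle rope turns out to have a busy strand.
enum class OnMismatch { Throw, LeaveAlone, UnIdle };

// Called at the point where an idle rope would be closed down.
// Returns true only if the rope may be closed now.
bool mayCloseIdleRope(MiniRope& rope, OnMismatch policy) {
    assert(rope.refCount >= 0);            // a negative count is a separate bug
    if (rope.refCount != 0) return false;  // not idle, nothing to do
    if (!rope.strandBusy) return true;     // consistent: rope and strands idle

    switch (policy) {
    case OnMismatch::Throw:
        throw std::logic_error("rope is idle but a strand is not");
    case OnMismatch::UnIdle:
        rope.incrRefCount();               // rope is no longer idle
        return false;
    case OnMismatch::LeaveAlone:
        return false;                      // skip the rope, change nothing
    }
    return false;
}
```

With LeaveAlone the rope is simply skipped and looked at again on a later
pass. With UnIdle the extra count is never given back in this sketch, so the
rope would stay open indefinitely unless something else decrements it.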

However, I think this is just a symptom of something else being wrong. So
"un-idling" the Rope might just shift the crash to somewhere else.

I suggest you double check your code to make sure that it is not doing the 2
things I suggested above. The suspect is how you manage the callback object
reference.

It may also be a bug in omniORB2 but I've not seen a crash report that
exhibits the same symptom.

Sai-Lai


-- 
Sai-Lai Lo                                   S.Lo@uk.research.att.com
AT&T Laboratories Cambridge           WWW:   http://www.uk.research.att.com 
24a Trumpington Street                Tel:   +44 1223 343000
Cambridge CB2 1QA                     Fax:   +44 1223 313542
ENGLAND