[omniORB] Urgent: omniORB::fatalException in omni2.6.1

Randy Shoup rshoup@tumbleweed.com
Sun, 08 Aug 1999 13:00:12 -0700


Sai-Lai --

  The problem was ours, as I describe below.  It was exactly the
scenario you described:  one thread released an object reference while
another thread was still invoking on it.  Thanks for the help.

Randy Shoup wrote:
> 
> Randy Shoup wrote:
> >
> > Sai-Lai Lo wrote:
> > >
> 
> > We have two processes involved here, one a "gateway" and the other an
> > "extension".  The extension registers itself with the gateway, and
> > indicates through its interface that it wants to override or augment
> > certain transactions of the gateway.  As the gateway is processing
> > (HTTP) transactions, it delegates some of the processing to the
> > extension as appropriate.  For this call, the gateway is the client of
> > the extension.  However, the extension also calls back to the gateway
> > during its processing of the transaction, so the extension is also a
> > client of the gateway.  In addition, during the registration phase, the
> > extension is also a client of the gateway.
> >
> > I should mention that the extension sometimes unregisters and
> > reregisters with the gateway.  This is meant to happen only when the
> > gateway has gone down and come back up, but because of the
> > inconsistencies we have experienced with _non_existent(), the
> > extension sometimes thinks that the gateway has gone down and come up
> > when in fact it never went down at all.  This reregistration behavior
> > is triggered by another "watchdog" process, so it is effectively
> > asynchronous with the rest of the processing.
> >
> > The problem does not seem to occur with any particular transaction --
> > that is, it does not appear to be related to any particular transaction
> > that the gateway or the extension is handling.  This lack of pattern
> > made us suspect the scavenger.
> >
> 
> > > > (2) What else could cause this fatalException?  It seems to occur
> > > > because of a mismatch in the "idle" states between the Rope and the
> > > > Strand -- the Rope is idle, but the Strand is not.  Is there any other
> > > > way that a Rope could be set to idle, and the Strand not be set to idle,
> > > > other than by the action of the scavenger?  Idleness appears to be
> > > > related to the reference counts on these objects, so perhaps there is a
> > > > problem there?
> > >
> > > The reference count on a Rope equals the no. of proxy objects created in
> > > the address space that use the Rope. A remote address space maps to a Rope.
> > >
> > > One possible cause of the problem, although I think it is unlikely, is that
> > > a thread has called release on an object reference while another thread is
> > > using that object reference to do a remote invocation. The release causes
> > > the ref count on the rope to go to 0 but a strand within the rope is
> > > still active.
> > >
> 
> This seems likely to have been it.
> 
> After your suggestion, we re-reviewed the code, looking for a place
> where we were not properly duplicating/releasing a reference.  We
> found one in the gateway code which uses the extension.  This code was
> not duplicating the reference, so if the extension unregistered itself
> (thereby decrementing the reference count) while we were invoking or
> preparing to invoke on the extension reference, the ref count could go
> to zero and cause the behavior you describe.  Bottom line:  always
> duplicate when you are using a reference! :-)
> 
> This seems extremely likely to have been the problem, but we would
> still be interested in any other suggestions.  I'll update the list
> when we are more sure.
> 
> Thanks,
> -- Randy
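
The duplicate-while-in-use rule from the quoted message can be sketched
roughly as follows.  This is only an illustration, not the gateway code
and not omniORB internals:  ExtensionRef, Gateway and every member name
are hypothetical stand-ins, and duplicate()/release() merely play the
role that _duplicate() and CORBA::release() play in the CORBA C++
mapping.  The idea is that the invoking thread takes its own reference
under a lock before calling out, so a concurrent unregister cannot drop
the count to zero in the middle of the call.

```cpp
#include <atomic>
#include <cassert>
#include <mutex>

// Hypothetical stand-in for a reference-counted object reference.
class ExtensionRef {
public:
    static std::atomic<int> live;          // objects not yet destroyed

    ExtensionRef() { ++live; }

    // Analogous to _duplicate(): one more holder of the reference.
    ExtensionRef* duplicate() { ++refs_; return this; }

    // Analogous to CORBA::release(): the last holder destroys the object.
    void release() {
        assert(refs_ > 0);
        if (--refs_ == 0) delete this;
    }

    int handle() { return 42; }            // pretend remote invocation

private:
    ~ExtensionRef() { --live; }            // only reachable via release()
    std::atomic<int> refs_{1};             // the creator holds one reference
};

std::atomic<int> ExtensionRef::live{0};

// Hypothetical gateway-side registry holding the registered extension.
class Gateway {
public:
    // Takes over the caller's reference.
    void registerExtension(ExtensionRef* ext) {
        std::lock_guard<std::mutex> lock(mutex_);
        if (ext_) ext_->release();
        ext_ = ext;
    }

    // May be called at any time, e.g. from a watchdog-triggered path.
    void unregisterExtension() {
        std::lock_guard<std::mutex> lock(mutex_);
        if (ext_) { ext_->release(); ext_ = nullptr; }
    }

    // Returns a duplicated reference (or nullptr); caller must release().
    ExtensionRef* acquireExtension() {
        std::lock_guard<std::mutex> lock(mutex_);
        return ext_ ? ext_->duplicate() : nullptr;
    }

private:
    std::mutex mutex_;
    ExtensionRef* ext_ = nullptr;
};
```

A caller would do something like "ExtensionRef* e = gw.acquireExtension();
if (e) { e->handle(); e->release(); }".  The invocation itself happens
outside the lock, and the object stays alive until the caller's own
release(), even if unregisterExtension() runs in between.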

_________________________________________________________________  
Randy Shoup                                     (650)216-2038  
Software Architect                              rshoup@tumbleweed.com  
Tumbleweed Communications Corporation