[omniORB] RE: serious stability problems with omniORB4 snapshots on Solaris 8

Duncan Grisby dgrisby@uk.research.att.com
Thu, 31 Jan 2002 16:43:07 +0000


On Thursday 31 January, "Bastiaan Bakker" wrote:

> I *do* know where the 'pure virtual method call' comes from: in
> notifyWkDone, Peek() is called on a deleted connection. What I do not
> know is where the connection is deleted (actually decRefCount'ed) while
> it's still in use.

Something's clearly broken. I won't have a chance to look at this
until the week after next, since I'm busy this week, and next week I'm
at the International Python Conference.

Some thoughts if you want to track this down yourself: I assume the
problem doesn't occur under Linux? Does the Solaris machine have more
than one processor? If so (and the Linux machine doesn't), it might be
a race condition. Are you able to try Sun's compiler rather than
gcc? It might be a gcc problem.

Most likely, though, it's a bug in the new omniORB transport code. Try
running with a high trace level (e.g. -ORBtraceLevel 25) to see if that
prints anything interesting before the crash (if it's a race condition,
the tracing might change the timing enough to hide it, of course...).
To find out when the connection is deleted, try adding tracing to its
destructor. If you print out the this pointer, you can match it against
the pointer involved in the crash and find the relevant destruction.
Code something like:

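  {
    // Goes in the connection's destructor. The braces scope the logger
    // so the message is written out when l is destroyed.
    omniORB::logger l;
    l << "connection deleted: " << (void*)this << "\n";
  }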
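
If it helps to see the idea in isolation, here is a minimal
self-contained sketch of the same kind of tracing on a reference-counted
object. It is not omniORB code: TracedConnection, pointerTag and the
log format are made up for illustration, and the count is a plain int
with no locking, so it only shows how to match up log lines by pointer,
not how to make the counting thread-safe.

```cpp
#include <cassert>
#include <ostream>
#include <sstream>
#include <string>

// Format a pointer the same way the trace lines do, so that an address
// taken from a crash can be searched for in the log.
// (Hypothetical helper, not part of omniORB.)
std::string pointerTag(const void* p) {
  std::ostringstream s;
  s << p;
  return s.str();
}

// Hypothetical stand-in for a reference-counted connection.
// NOT thread-safe: real code would protect refs_ with a mutex.
class TracedConnection {
public:
  explicit TracedConnection(std::ostream& log) : log_(log), refs_(1) {
    log_ << "connection created: " << static_cast<const void*>(this) << "\n";
  }

  void incRefCount() {
    ++refs_;
    trace("incRefCount");
  }

  // Drops one reference. Returns true if this call destroyed the object,
  // in which case the caller must not touch it again.
  bool decRefCount() {
    assert(refs_ > 0);
    --refs_;
    trace("decRefCount");
    if (refs_ == 0) {
      delete this;
      return true;
    }
    return false;
  }

private:
  // Private so the object can only die through decRefCount().
  ~TracedConnection() {
    log_ << "connection deleted: " << static_cast<const void*>(this) << "\n";
  }

  void trace(const char* what) const {
    log_ << what << ": " << static_cast<const void*>(this)
         << " refs=" << refs_ << "\n";
  }

  std::ostream& log_;
  int refs_;
};
```

With something like this in place, every change to the count is logged
with the object's address, so searching the log for the address seen in
the crash shows which decRefCount took the count to zero and what
happened to that object beforehand.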

Hope that helps. If you haven't tracked it down by the time I'm back,
I'll look into it properly then.

Cheers,

Duncan.

-- 
 -- Duncan Grisby  \  Research Engineer  --
  -- AT&T Laboratories Cambridge          --
   -- http://www.uk.research.att.com/~dpg1 --