[omniORB] omniNames woes

Bruce Visscher visschb@rjrt.com
Fri, 28 Apr 2000 18:25:06 -0400


Hello omniORBers,

We had been running omniNames on OpenVMS Alpha since the beginning of the year
with no problems when, suddenly, the assertion in Strand::decrRefCount failed:

void
Strand::decrRefCount(CORBA::Boolean held_rope_mutex)
{
  if (!held_rope_mutex)
    pd_rope->pd_lock.lock();
  pd_refcount--;
  assert(pd_refcount >= 0);
  if (!held_rope_mutex)
    pd_rope->pd_lock.unlock();
  return;
}

Unfortunately, for some reason the process then reported an improperly handled
condition, which means that it did not produce a stack traceback, making it
difficult to determine where in the code it failed.

So I wrote an omniNames "stress tester" that builds a naming graph from several
threads: a matrix of contexts with Echo objects bound at the leaves.  Each
thread loops, creating its part of the name graph and its objects, pausing,
then destroying that part of the graph and the objects.
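
To make the shape of that loop concrete, here is a minimal sketch of the create/pause/destroy pattern. It is not the actual tester: FakeNaming is a hypothetical in-memory stand-in for the naming service, std::thread replaces omnithread, and the names stress_worker and run_stress are invented for illustration.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <functional>
#include <mutex>
#include <set>
#include <string>
#include <thread>
#include <vector>

// Hypothetical stand-in for the naming service: a locked set of bound names.
class FakeNaming {
public:
  void bind(const std::string& name) {
    std::lock_guard<std::mutex> guard(lock_);
    names_.insert(name);
  }
  void unbind(const std::string& name) {
    std::lock_guard<std::mutex> guard(lock_);
    names_.erase(name);
  }
  std::size_t size() {
    std::lock_guard<std::mutex> guard(lock_);
    return names_.size();
  }
private:
  std::mutex lock_;
  std::set<std::string> names_;
};

// One worker: bind its row of the matrix, pause, unbind it again, repeat.
void stress_worker(FakeNaming& ns, int row, int cols, int iterations) {
  for (int it = 0; it < iterations; ++it) {
    std::vector<std::string> mine;
    for (int c = 0; c < cols; ++c) {
      std::string name = "ctx" + std::to_string(row) + "/ctx" +
                         std::to_string(c) + "/Echo";
      ns.bind(name);
      mine.push_back(name);
    }
    std::this_thread::sleep_for(std::chrono::milliseconds(1));  // the pause
    for (const std::string& name : mine)
      ns.unbind(name);
  }
}

// Run one worker per row concurrently; returns how many names are still
// bound once every worker has finished (0 if teardown was complete).
std::size_t run_stress(int rows, int cols, int iterations) {
  assert(rows > 0 && cols > 0);
  FakeNaming ns;
  std::vector<std::thread> threads;
  for (int r = 0; r < rows; ++r)
    threads.emplace_back(stress_worker, std::ref(ns), r, cols, iterations);
  for (std::thread& t : threads)
    t.join();
  return ns.size();
}
```

Calling run_stress(4, 3, 5), for example, runs four workers over a 4x3 matrix for five rounds each. The real tester would replace the bind and unbind calls with CosNaming operations against omniNames.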

I also created a client to traverse the name graph looking for Echo objects in
multiple threads.

After a few runs, omniNames crashes, apparently by dereferencing a null
pointer:

%SYSTEM-F-ACCVIO, access violation, reason mask=00, virtual address=0000000000000000, PC=00000000002DAAEC, PS=0000001B
%TRACE-F-TRACEBACK, symbolic stack dump follows
  image    module    routine             line      rel PC           abs PC      
 PTHREAD$RTL                                0 0000000000040AEC 00000000002DAAEC
 OMNITHREAD_RT  POSIX  lock              9280 000000000000031C 000000000025831C
 OMNIORB2_RT  STRAND  decrRefCount      21140 0000000000000744 000000000018EBC4
 OMNIORB2_RT  STRAND  run_undetached    21680 0000000000002248 00000000001906C8
 OMNITHREAD_RT  POSIX  omni_thread_wrapper
                                         9587 0000000000001004 0000000000259004
 PTHREAD$RTL                                0 000000000004E3DC 00000000002E83DC
 PTHREAD$RTL                                0 0000000000040674 00000000002DA674
                                            0 0000000000000000 0000000000000000
 PTHREAD$RTL                                                 ?                ?
                                            0 FFFFFFFFA1C8D118 FFFFFFFFA1C8D118

This appears to be omniORB_Ripper::run_undetached calling Strand::decrRefCount,
which invokes omni_mutex::lock, which apparently passes a bad mutex to
pthread_mutex_lock.
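
One way to narrow this down would be a diagnostic variant of decrRefCount that reports the bad state instead of faulting. The sketch below is only an illustration under assumptions: Rope and Strand are hypothetical stand-ins holding just the two fields used above, std::mutex replaces omni_mutex, and DecrResult and checked_decr_ref_count are invented names. It can catch a null pd_rope or an unbalanced decrement, but not a dangling pointer to a rope that has already been destroyed.

```cpp
#include <cassert>
#include <mutex>

// Hypothetical stand-ins for the omniORB types; only the fields used here.
struct Rope {
  std::mutex pd_lock;
};

struct Strand {
  Rope* pd_rope;
  int pd_refcount;
};

enum class DecrResult { Ok, NullRope, Underflow };

// Diagnostic variant of decrRefCount: check before locking and before
// decrementing, and report the problem rather than crash.
DecrResult checked_decr_ref_count(Strand& s, bool held_rope_mutex) {
  if (s.pd_rope == nullptr)
    return DecrResult::NullRope;   // locking would go through a null pointer

  // Take the rope lock only if the caller does not already hold it.
  std::unique_lock<std::mutex> guard(s.pd_rope->pd_lock, std::defer_lock);
  if (!held_rope_mutex)
    guard.lock();

  if (s.pd_refcount <= 0)
    return DecrResult::Underflow;  // the original assertion would fail here

  --s.pd_refcount;
  assert(s.pd_refcount >= 0);      // same invariant as the original code
  return DecrResult::Ok;
}
```

Logging the NullRope and Underflow cases along with the calling thread might show whether the strand is being released twice or used after its rope has gone away.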

Has anyone had any problems like this?  Note that I have some unresolved VMS
issues, so it may well not be a problem with omniORB per se.  [The other
problem I have with this test is that the client hangs after a while, even
though the main thread should be displaying the number of active threads every
minute and doesn't lock anything at all.  I definitely think this is a VMS bug,
but I may have a hard time proving it.]

Bruce
-- 
All generalities are false - including this one.

Bruce Visscher                                        visschb@rjrt.com