[omniORB] Strange access violation -- What do you think?

Brenneis, Steven W. BRENNES1@RJRT.com
Wed, 26 Aug 1998 16:27:57 -0400


I got an access violation at the next-to-last line of omni_condition::timedwait. The pointer to the mutex was 0xcdcdcdcd, indicating
that the omni_condition object had been deleted. If you look at the code, the mutex is already unlocked once earlier in the function,
so the pointer must still have been valid at that point. The omni_condition object must therefore have been deleted while the function was executing.
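The reasoning about the earlier unlock can be made concrete with a simplified timed wait. This is a hypothetical stand-in written against the standard C++ <mutex> and <condition_variable> headers, not omni_condition's actual implementation. The class name TimedCondition and its members are made up for this sketch. Like omni_condition, it holds a pointer to a mutex owned elsewhere. The sketch shows that a timed wait releases the mutex while blocked. The object can therefore be valid when the wait starts and already freed by the time the waiter wakes up and touches its members again.

```cpp
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <mutex>

// Hypothetical stand-in for a condition object that, like omni_condition,
// stores a pointer to a mutex owned elsewhere. Not omniORB code.
class TimedCondition {
public:
    explicit TimedCondition(std::mutex* m) : mutex_(m) { assert(m != nullptr); }

    // Caller must hold *mutex_. Returns true if signalled, false on timeout.
    // *mutex_ is still held on return, as it was on entry.
    bool timedwait(std::chrono::steady_clock::time_point deadline) {
        std::unique_lock<std::mutex> lk(*mutex_, std::adopt_lock);
        // While blocked, wait_until releases *mutex_. In this window another
        // thread can lock it and delete the object that owns *this.
        bool ok = cv_.wait_until(lk, deadline, [this] { return signalled_; });
        // On wakeup wait_until reacquires the mutex and evaluates the
        // predicate, which reads this->signalled_; the line below writes it
        // again. If *this was freed during the window, these are accesses to
        // freed memory, the same kind of access that faulted in the report.
        signalled_ = false;
        lk.release();  // hand the still-locked mutex back to the caller
        return ok;
    }

    // Caller must hold *mutex_.
    void signal() {
        signalled_ = true;
        cv_.notify_one();
    }

private:
    std::mutex* mutex_;
    std::condition_variable cv_;
    bool signalled_ = false;
};
```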

The function had been called from Strand::Sync::WrTimedLock. In that function, s is the Strand that contains the pointer to the
omni_condition. Most, but not all, of the contents of s had been set to either 0xdddddddd or 0xcdcdcdcd, indicating that the Strand
object was in the process of being deleted when the access violation occurred. WrTimedLock had been called from scanForIdle, which
had been called from outScavenger_t::run_undetached.

This is not the first time that I have experienced either a crash or a deadlock at this location.  Are there timing issues involved
here?

The states were as follows:

scanForIdle called Strand::Sync::WrTimedLock with:
s = 0x02afd1c0
heartbeat = 1
abs_sec = 904155361
abs_nsec = 830000000

Contents of s were:
pd_rdcond.mutex = 0x3a766544
pd_rdcond.waiting_head = 0xcd004133
pd_rdcond.waiting_tail = 0xcdcdcdcd
pd_rd_nwaiting = 0xcdcdcdcd
pd_wrcond.mutex = 0xcdcdcdcd

The remaining data bytes of the Strand object had been set to 0xdd. This indicates that the NT debug heap was in the process of filling
the memory being returned to the heap with the test pattern it uses for this purpose.
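For reference when reading dumps like the one above, the following sketch assumes the program was built against the Microsoft Visual C++ debug runtime, which is what normally produces these patterns. That heap fills freed blocks with 0xDD and newly allocated, not-yet-initialized blocks with 0xCD. It also surrounds each block with 0xFD guard bytes. The small C++ sketch below encodes that table. The names Fill, classify_word and describe are made up for illustration. Read this way, a mix of 0xDD and 0xCD inside s may also mean that the strand was freed and part of its memory was handed out again by a new allocation. Either way, the memory no longer held a live Strand.

```cpp
#include <cstdint>

// Fill bytes written by the Microsoft Visual C++ debug CRT heap
// (only when the program is linked against the debug runtime).
enum class Fill {
    Clean,       // 0xCD: block allocated but not yet initialized
    Dead,        // 0xDD: block freed
    NoMansLand,  // 0xFD: guard bytes surrounding every block
    None         // not one of the debug-heap fill patterns
};

// Classify a 32-bit word read from a memory dump (requires C++14 constexpr).
constexpr Fill classify_word(std::uint32_t w) {
    switch (w) {
        case 0xCDCDCDCDu: return Fill::Clean;
        case 0xDDDDDDDDu: return Fill::Dead;
        case 0xFDFDFDFDu: return Fill::NoMansLand;
        default:          return Fill::None;
    }
}

// Human-readable description, e.g. for annotating a dump.
constexpr const char* describe(Fill f) {
    switch (f) {
        case Fill::Clean:      return "allocated, never initialized (0xCD)";
        case Fill::Dead:       return "freed (0xDD)";
        case Fill::NoMansLand: return "heap guard bytes (0xFD)";
        default:               return "no debug-heap pattern";
    }
}

// Words from the dump of s: pd_rdcond.mutex looks like an ordinary value,
// while pd_rd_nwaiting and the trailing bytes match the fill patterns.
static_assert(classify_word(0x3a766544u) == Fill::None, "ordinary value");
static_assert(classify_word(0xcdcdcdcdu) == Fill::Clean, "0xCD fill");
static_assert(classify_word(0xddddddddu) == Fill::Dead, "0xDD fill");
```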

The contents of the Rope whose strands were being iterated appeared to be valid: pd_maxStrands was 5, pd_head was 0,
pd_next was 0x294f5c0, pd_anchor was 0x005b2b24, and pd_refcount was 3.

To me, this kind of error suggests that some other thread was deleting this strand while the out scavenger was still processing it.
Is this possible? Under what conditions could this happen?
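The following is one general way to make this interleaving impossible. It is a sketch and does not describe how omniORB actually manages strands. The scanning thread holds a reference on each strand it works on. A concurrent delete then only unlinks the strand, and the memory is freed when the last reference is dropped. The example shows the idea in modern standard C++ with std::shared_ptr. Rope, Strand, pin_all, remove and demo here are hypothetical simplifications, and the real classes predate std::shared_ptr.

```cpp
#include <cassert>
#include <memory>
#include <mutex>
#include <vector>

// Hypothetical stand-ins for a rope and its strands; not omniORB's classes.
struct Strand {
    int id;
    explicit Strand(int i) : id(i) {}
};

class Rope {
public:
    void add(int id) {
        std::lock_guard<std::mutex> g(mu_);
        strands_.push_back(std::make_shared<Strand>(id));
    }

    // Copy the strand list under the rope lock. Every shared_ptr in the copy
    // is a reference that keeps its Strand alive, even if remove() unlinks
    // it from the rope while the caller is still working on it.
    std::vector<std::shared_ptr<Strand>> pin_all() {
        std::lock_guard<std::mutex> g(mu_);
        return strands_;
    }

    // Unlink a strand. Its memory is freed only when the last reference
    // (the rope's or a scavenger's) goes away.
    void remove(int id) {
        std::lock_guard<std::mutex> g(mu_);
        for (auto it = strands_.begin(); it != strands_.end(); ++it) {
            if ((*it)->id == id) {
                strands_.erase(it);
                return;
            }
        }
    }

private:
    std::mutex mu_;
    std::vector<std::shared_ptr<Strand>> strands_;
};

// The interleaving from the report, run on one thread for clarity:
// the scavenger pins the strands, another thread unlinks one, and the
// scavenger can still use it safely until it drops its reference.
inline void demo() {
    Rope rope;
    rope.add(1);
    auto pinned = rope.pin_all();      // scavenger: pin before scanning
    rope.remove(1);                    // other thread: unlink strand 1
    assert(pinned.size() == 1 && pinned[0]->id == 1);  // still valid
    std::weak_ptr<Strand> watch = pinned[0];
    pinned.clear();                    // scavenger: done with the scan
    assert(watch.expired());           // strand 1 is destroyed only now
}
```

In this design, deletion never has to wait for the scavenger. The deleting thread simply gives up its own reference, and the scavenger's reference keeps the object alive until its scan is finished.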

Any thoughts?