[omniORB] serious stability problems with omniORB4 snapshots on Solaris 8.

Bastiaan Bakker Bastiaan.Bakker@lifeline.nl
Thu, 31 Jan 2002 16:33:34 +0100


Hi,

I'm porting some omniORB4-based CORBA servers from Linux to Solaris 8
and have started experiencing stability problems. At first I suspected
the application code, but I can reproduce the problem with the echo
example as well. The error is 'Unrecoverable error for this endpoint',
after which the server crashes due to access to a deleted mutex. See
below for an example.

Repeat by:
1) Start the eg2_impl server.
2) Repeatedly start a group of concurrently running eg2_clt processes
   calling the server. In my tests I used 12 concurrent clients; fewer
   will probably work as well, but take longer.
3) Wait a few minutes for the server to crash.
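
The repeat-by steps above could be scripted roughly as follows. This is
only a minimal sketch: the helper names run_round and hammer are
hypothetical, and it assumes that eg2_clt takes the server's object
reference (the IOR printed by eg2_impl) as its only argument and that a
failing client exits with a non-zero status. Neither assumption has been
checked against the snapshots listed below.

```python
import subprocess
import sys
import time


def run_round(cmd, n_clients):
    """Start n_clients copies of cmd at once and return their exit codes."""
    procs = [subprocess.Popen(cmd) for _ in range(n_clients)]
    return [p.wait() for p in procs]


def hammer(cmd, n_clients=12, rounds=100, pause=0.0):
    """Repeat run_round until a client fails.

    Returns the 1-based number of the first round in which any client
    exited non-zero, or None if all rounds completed cleanly. A non-zero
    exit is only a hint (assumption): the server log is what confirms
    the crash.
    """
    for i in range(1, rounds + 1):
        codes = run_round(cmd, n_clients)
        print(f"round {i}: exit codes {codes}", file=sys.stderr)
        if any(code != 0 for code in codes):
            return i
        time.sleep(pause)
    return None
```

With those assumptions, something like hammer(["./eg2_clt", ior]) would
run 12 clients per round, where ior is the string printed by eg2_impl
at startup.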

Test platform:
SPARC Solaris 8, gcc 2.95.3 and snapshots of omniORB4: 20011013,
20020103, 20020130.
I tried both TCP and Unix socket endpoints and both threadPerConnection
and threadPool policies. The threadPool policy seemed to trigger the
problem more quickly.

Does anyone else experience similar problems or know how to work around
them?

Thanks!

Bastiaan

PS. omniORB developers: this is an update to the bug report I sent
yesterday, not a separate issue.

Output of 'gdb eg2_impl' with 'set args -ORBendPoint giop:unix:/tmp/echo.bb':

Upcall Hello!
Upcall Hello!
omniORB: Unrecoverable error for this endpoint: giop:unix:/tmp/echo.bb, it will no longer be serviced.
omniORB: Assertion failed -- attempt to lock deleted mutex.
 This is a bug in omniORB. Please submit a report (with stack
 trace if possible) to <omniorb@uk.research.att.com>.

Program received signal SIGSEGV, Segmentation fault.
[Switching to LWP 7]
0xff2232f0 in omni_tracedmutex::lock (this=0x2adf8) at tracedthread.cc:142
142         BOMB_OUT();
(gdb) bt
#0  0xff2232f0 in omni_tracedmutex::lock (this=0x2adf8) at tracedthread.cc:142
#1  0xff25d054 in omni::SocketCollection::setSelectable (this=0x2ac68,
    sock=10, now=false, data_in_buffer=false, hold_lock=false)
    at SocketCollection.cc:150
#2  0xff283678 in omni::unixConnection::setSelectable (this=0x2b420,
    now=false, data_in_buffer=false) at ./unix/unixConnection.cc:279
#3  0xff240cdc in omni::giopServer::notifyWkPreUpCall (this=0x2a9f0,
    w=0x2b4f0, data_in_buffer=false) at giopServer.cc:928
#4  0xff246aa8 in omni::GIOP_S::ReceiveRequest (this=0x2e4b8, desc=@0xfe00f868)
    at GIOP_S.cc:570
#5  0xff221cc8 in omniCallHandle::upcall (this=0xfe00fa68, servant=0x2b358,
    desc=@0xfe00f868) at callHandle.cc:140
#6  0x14324 in _impl_Echo::_dispatch (this=0x2b368, _handle=@0xfe00fa68)
    at echoSK.cc:213
#7  0xff20c4b0 in omni::omniOrbPOA::dispatch (this=0x2b130,
    handle=@0xfe00fa68, id=0x2b380) at poa.cc:1640
#8  0xff1ebb48 in omniLocalIdentity::dispatch (this=0x2b380,
    handle=@0xfe00fa68) at localIdentity.cc:202
#9  0xff2454b4 in omni::GIOP_S::handleRequest (this=0x2e4b8) at GIOP_S.cc:279
#10 0xff244c1c in omni::GIOP_S::dispatcher (this=0x2e4b8) at GIOP_S.cc:206
#11 0xff241e8c in omni::giopWorker::execute (this=0x2b4f0) at giopWorker.cc:167
#12 0xff298d58 in omniAsyncWorker::run (this=0x2e450) at invoker.cc:146
#13 0xff3741fc in omni_thread_wrapper (ptr=0x2e450) at posix.cc:423

