[omniORB] server lockup from connection close race condition

Jeremy Van Grinsven jeremvan at rocketmail.com
Wed Mar 31 16:28:17 BST 2004


I am running omniORB 4.0.3 on Solaris 8 with a thread pool.  I have
uncovered a race condition during connection close that locks the
server up in an infinite loop.

SocketCollection::Select gets stuck retrying endlessly on RC_EBADF
because an invalid file descriptor is left in the pd_fdset_1 and
pd_fdset_2 lists.  During connection close, setSelectable is called on
a file descriptor after the connection has been closed and
clearSelectable has finished.
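
To illustrate why a stale descriptor makes the retry loop endless, here
is a minimal POSIX sketch.  It is not omniORB code, and the helper names
(make_stale_fd, select_errno_for, fd_is_open) are made up for this
example.  It shows that select() fails with EBADF for as long as a
closed descriptor remains in the set, so retrying without removing the
descriptor can never succeed.

```cpp
#include <cerrno>
#include <fcntl.h>
#include <sys/select.h>
#include <unistd.h>

// Hypothetical helper: open a pipe, close both ends, and return the
// number of the (now closed) read end.  Returns -1 if pipe() fails.
int make_stale_fd() {
    int p[2];
    if (pipe(p) != 0) return -1;
    close(p[1]);
    close(p[0]);
    return p[0];
}

// Hypothetical helper: poll fd for readability with a zero timeout and
// return the errno select() reports, or 0 if select() succeeds.
int select_errno_for(int fd) {
    fd_set rfds;
    FD_ZERO(&rfds);
    FD_SET(fd, &rfds);
    timeval tv{0, 0};
    int rc = select(fd + 1, &rfds, nullptr, nullptr, &tv);
    return rc < 0 ? errno : 0;
}

// Hypothetical helper: a descriptor is open if fcntl() can query it.
// A select loop could use a check like this to prune stale entries
// after EBADF instead of retrying with the same set.
bool fd_is_open(int fd) {
    return fcntl(fd, F_GETFD) != -1;
}
```

Calling select_errno_for(make_stale_fd()) reports EBADF on every call,
which matches the repeated "select() returned EBADF, retrying" lines in
the log.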

This happens when a connection is closed right after a CORBA operation
on the server has completed.  The successful operation result is
marshaled and sent back to the client, then giopServer::notifyWkDone is
executed.  If the client closes the connection fast enough, a select
event can fire off another worker thread, which gets a read failure.
That failure causes the new worker's giopServer::notifyWkDone call to
execute giopServer::removeConnectionAndWorker, which clears the file
descriptor from the SocketCollection and closes the socket.  The first
worker then calls setSelectable on the descriptor that has just been
closed.  All this happens because notifyWkDone can't hold pd_lock while
calling setSelectable.  There are several places in this method where a
dying connection can be made selectable again even if it is already
closed.
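
The interleaving can be replayed deterministically in a toy model.  This
is only a sketch: ToySocketCollection and replay_log_interleaving are
invented names, the std::set members merely stand in for pd_fdset_1 and
pd_fdset_2, and no real threads or sockets are involved.  The steps are
executed in the order the log shows for threads 5 and 6.

```cpp
#include <set>

// Toy stand-in for SocketCollection (hypothetical, not the omniORB class).
struct ToySocketCollection {
    std::set<int> selectable;  // descriptors select() will watch
    std::set<int> open_fds;    // descriptors that are actually open

    void setSelectable(int fd)   { selectable.insert(fd); }
    void clearSelectable(int fd) { selectable.erase(fd); }
    void closeSocket(int fd)     { open_fds.erase(fd); }

    // True if select() would be handed a closed descriptor (EBADF).
    bool hasStaleFd() const {
        for (int fd : selectable)
            if (open_fds.count(fd) == 0) return true;
        return false;
    }
};

// Replays the order of events seen in the log for descriptor 17.
bool replay_log_interleaving() {
    ToySocketCollection sc;
    sc.open_fds.insert(17);

    // Thread 5 has sent its reply and is inside notifyWkDone, about to
    // call setSelectable without holding the server lock.

    // Thread 6 wakes up, fails to read, and tears the connection down.
    sc.clearSelectable(17);
    sc.closeSocket(17);

    // Thread 5 resumes and re-registers the already closed descriptor.
    sc.setSelectable(17);

    return sc.hasStaleFd();
}
```

replay_log_interleaving() returns true: descriptor 17 is back in the
selectable set although the socket is gone.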

I have not been able to find a clean way to fix this condition.
pd_lock can't be held while calling setSelectable, or a deadlock is
possible when notifyRzReadable is called from SocketCollection while it
holds pd_fdset_lock.  With the current mutex structure, pd_fdset_lock
has to be taken before locking pd_lock and calling setSelectable with
held_lock=1.  This is incredibly messy, because pd_lock and
pd_fdset_lock are owned by two different objects that are not cleanly
accessible from either tcpConnection::setSelectable or
giopServer::notifyWkDone.
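
The ordering constraint can be written down as a small lock-rank check.
This is a sketch of the general lock-hierarchy idea, not anything in
omniORB: LockRank, RankChecker and order_is_allowed are hypothetical
names, and the ranks simply encode "pd_fdset_lock before pd_lock".  A
real checker would keep one such record per thread.

```cpp
#include <stdexcept>
#include <vector>

// Hypothetical ranks: a lock with a lower rank must be taken first.
enum LockRank { FDSET_LOCK = 1, SERVER_LOCK = 2 };

// Records the ranks of the locks currently held by one thread and
// rejects any acquisition that goes against the agreed order.
class RankChecker {
    std::vector<int> held_;
public:
    void acquire(int rank) {
        if (!held_.empty() && held_.back() >= rank)
            throw std::logic_error("lock order violation");
        held_.push_back(rank);
    }
    void release(int rank) {
        if (held_.empty() || held_.back() != rank)
            throw std::logic_error("unbalanced release");
        held_.pop_back();
    }
};

// Returns true if taking `first` and then `second` respects the ranks.
bool order_is_allowed(int first, int second) {
    RankChecker checker;
    try {
        checker.acquire(first);
        checker.acquire(second);
    } catch (const std::logic_error&) {
        return false;
    }
    return true;
}
```

With these ranks, taking FDSET_LOCK and then SERVER_LOCK is accepted,
while the reverse order (the one that can deadlock against
notifyRzReadable) is rejected.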

As a temporary fix I am passing giopServer's pd_lock into
tcpConnection::setSelectable and hijacking SocketCollection's private
pd_fdset_lock so that I can take both locks, check pd_dying, and call
SocketCollection::setSelectable with lock_held=1.
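
The shape of that workaround, reduced to a toy, looks roughly like the
following.  This is not the actual patch: ToyCollection, ToyServer,
ToyConnection and guardedSetSelectable are invented for illustration,
and std::mutex stands in for the omni_mutex objects.  The point is only
the sequence: take the fd-set lock, then the server lock, check the
dying flag, and register the descriptor only if the connection is still
alive.

```cpp
#include <mutex>
#include <set>

// Toy stand-in for SocketCollection (hypothetical).
struct ToyCollection {
    std::mutex fdset_lock;      // plays the role of pd_fdset_lock
    std::set<int> selectable;   // plays the role of the fd sets

    // Caller must already hold fdset_lock, in the spirit of calling
    // setSelectable with the lock already held.
    void setSelectableLocked(int fd) { selectable.insert(fd); }
};

// Toy stand-in for giopServer (hypothetical).
struct ToyServer {
    std::mutex lock;            // plays the role of pd_lock
};

// Toy stand-in for a connection (hypothetical).
struct ToyConnection {
    int  fd;
    bool dying;                 // plays the role of pd_dying
};

// Registers conn.fd only if the connection is not dying.  The locks are
// taken in the order fd-set lock first, server lock second.
bool guardedSetSelectable(ToyCollection& sc, ToyServer& srv,
                          ToyConnection& conn) {
    std::lock_guard<std::mutex> fdset_guard(sc.fdset_lock);
    std::lock_guard<std::mutex> server_guard(srv.lock);
    if (conn.dying) return false;   // connection is closing; skip it
    sc.setSelectableLocked(conn.fd);
    return true;
}
```

Because the dying check and the registration happen under both locks, a
close that sets the flag under the server lock can no longer slip in
between them.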

I hope there is a better way to fix this.

Here is a log of the events at level 25, plus my setSelectable and
clearSelectable log additions:

4: omniORB: Server accepted connection from giop:tcp:127.0.0.1:52201
4: omniORB: setSelectable: 17 now
4: omniORB: setSelectable: 10 now
5: omniORB: giopWorker task execute.
5: omniORB: Accepted connection from giop:tcp:127.0.0.1:52201 because of this rule: "* tcp,ssl"
5: omniORB: inputMessage: from giop:tcp:127.0.0.1:52201 77 bytes
5: omniORB: setSelectable: 17
5: omniORB: Handling a GIOP LOCATE_REQUEST.
5: omniORB: sendChunk: to giop:tcp:127.0.0.1:52201 20 bytes
5: omniORB: setSelectable: 17 now
5: omniORB: giopWorker task execute.
5: omniORB: inputMessage: from giop:tcp:127.0.0.1:52201 165 bytes
5: omniORB:  recieve codeset service context and set TCS to (ISO-8859-1,UTF-16)
5: omniORB: setSelectable: 17
5: omniORB: sendChunk: to giop:tcp:127.0.0.1:52201 208 bytes
5: omniORB: setSelectable: 17 now
5: omniORB: giopWorker task execute.
5: omniORB: inputMessage: from giop:tcp:127.0.0.1:52201 222 bytes
5: omniORB: setSelectable: 17
5: omniORB: sendChunk: to giop:tcp:127.0.0.1:52201 441 bytes
6: omniORB: giopWorker task execute.
6: omniORB: throw giopStream::CommFailure from giopStream.cc:831(0,NO,COMM_FAILURE_UnMarshalArguments)
6: omniORB: Server connection refcount = 1
6: omniORB: clearSelectable: 17
6: omniORB: Server connection refcount = 0
6: omniORB: Server close connection from giop:tcp:127.0.0.1:52201
5: omniORB: setSelectable: 17 now
4: omniORB: select() returned EBADF, retrying
4: omniORB: select() returned EBADF, retrying
4: omniORB: select() returned EBADF, retrying
4: omniORB: select() returned EBADF, retrying
4: omniORB: select() returned EBADF, retrying
4: omniORB: select() returned EBADF, retrying
.. etc ... forever

Jeremy Van Grinsven






