[omniORB] SocketCollection bug

Martin Kocian kocian at slac.stanford.edu
Wed Jan 20 16:40:32 GMT 2010


Hi Duncan,

Thank you for your reply. My answers are embedded below.

On Tue, 19 Jan 2010, Duncan Grisby wrote:

> On Sat, 2010-01-16 at 12:51 -0800, Martin Kocian wrote:
>
>> When I run omniorb a thread will sometimes hang. I traced this to a
>> conflict between one thread doing a blocking read in tcpConnection::Recv
>> and another thread doing a blocking select call in the posix
>> implementation of SocketCollection::Select.
>> If both threads are listening to the same socket and data arrives then
>> only one of the two threads gets woken up. If Recv gets woken up then it's
>> fine because there is a timeout on Select, but if it's
>> the Select thread that gets woken up then Recv will block indefinitely.
>> Please correct me if I'm wrong but as far as I know BSD sockets do not
>> allow several threads to do blocking select/read calls on the same socket
>> at the same time in which case this is a bug in omniorb. The omniorb release
>> I'm using is 4.1.4.
>
> What platform are you using?

RTEMS 4.9.2 on a PowerPC 405 in a Xilinx Virtex-4 chip.

>
> select() returns if one of the file descriptors in the read set can be
> read without blocking. That doesn't change whether another thread doing
> recv() is able to actually read the data. It would certainly be bad to
> have two threads doing recv() on the same socket, but select() doesn't
> consume any data, so it shouldn't prevent recv() from returning. What
> diagnosis have you done to show that the problem you describe is
> actually what is happening?
>
I made tcpConnection do a select before every recv call (rather than just
the first call), in which case the thread hangs in select instead of
recv. I then put print statements before and after the blocking select
call in RTEMS to tell me which thread is calling select on which socket
and which thread is waking up from the select call. If, in addition, I
change the indefinitely blocking select call to one with a timeout of a
few seconds, I observe that in the cases where the read hangs, both the
read thread and the SocketCollection thread do a select on the same
socket, and the SocketCollection thread returns from it immediately. The
read thread's select call, on the other hand, times out, but when recv is
then called it reads the data normally, as if nothing had happened. I then
made SocketCollection use a patched, non-blocking version of select that
just looks at the socket without signalling this to RTEMS. With this
ad hoc fix the problem is gone.
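
A minimal sketch of the "select with a timeout before every recv" change
described above. This is not the omniORB tcpConnection code: the helper
name recv_with_timeout and its return convention (-2 on timeout) are made
up for illustration, and only standard BSD socket calls are used.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/select.h>
#include <sys/socket.h>
#include <sys/time.h>
#include <sys/types.h>

/* Hypothetical helper, not omniORB code: wait up to timeout_sec seconds
 * for fd to become readable, then read from it.
 * Returns the byte count from recv() (0 means the peer closed),
 * -1 on error (errno is set), or -2 if select() timed out. */
static ssize_t recv_with_timeout(int fd, void *buf, size_t len,
                                 int timeout_sec)
{
    assert(fd >= 0 && fd < FD_SETSIZE);

    for (;;) {
        fd_set readfds;
        struct timeval tv;
        int n;

        /* select() may modify both arguments, so rebuild them each time. */
        FD_ZERO(&readfds);
        FD_SET(fd, &readfds);
        tv.tv_sec = timeout_sec;
        tv.tv_usec = 0;

        n = select(fd + 1, &readfds, NULL, NULL, &tv);
        if (n < 0) {
            if (errno == EINTR)
                continue;       /* interrupted by a signal: retry */
            return -1;
        }
        if (n == 0)
            return -2;          /* timed out: nothing was signalled */
        return recv(fd, buf, len, 0);
    }
}
```

In the failing case described above, a helper like this would report a
timeout even though data is queued; calling recv afterwards still returns
the data, which is what makes the lost wake-up visible.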

From the RTEMS code it seems clear that doing two blocking selects on
the same socket won't work, because the first thread to be woken up
clears the RTEMS event that signalled that the socket had data waiting.
So I think the question is whether this is a bug in RTEMS, or whether
multiple blocking selects (or select plus recv) on the same socket are
not allowed for BSD sockets, in which case it's a bug in omniORB.
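
One way to compare socket stacks on this point is a small probe along the
following lines. It is only a sketch: the names wait_readable and
count_select_wakeups are hypothetical, it uses an AF_UNIX socketpair
rather than a TCP connection, and it assumes POSIX threads are available.
Two threads block in select on the same socket, one byte is written to
the peer, and the probe counts how many of the selects reported the
socket readable. Since select only reports readiness and does not consume
data, both threads should be woken on a stack where concurrent selects
are supported.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <sys/select.h>
#include <sys/socket.h>
#include <sys/time.h>
#include <unistd.h>

struct waiter {
    int fd;         /* socket to watch */
    int readable;   /* set to 1 if select() reported fd readable */
};

/* Thread body: block in select() on the socket for at most 2 seconds. */
static void *wait_readable(void *arg)
{
    struct waiter *w = arg;
    fd_set readfds;
    struct timeval tv = { 2, 0 };

    FD_ZERO(&readfds);
    FD_SET(w->fd, &readfds);
    w->readable = (select(w->fd + 1, &readfds, NULL, NULL, &tv) == 1);
    return NULL;
}

/* Returns how many of the two selecting threads saw the socket become
 * readable after a single byte was written to its peer, or -1 if the
 * socket pair could not be created. */
static int count_select_wakeups(void)
{
    int sv[2];
    struct waiter w[2];
    pthread_t t[2];
    struct timeval pause = { 0, 200000 };
    int i, rc, woken = 0;

    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) != 0)
        return -1;

    for (i = 0; i < 2; i++) {
        w[i].fd = sv[0];
        w[i].readable = 0;
        rc = pthread_create(&t[i], NULL, wait_readable, &w[i]);
        assert(rc == 0);
    }

    /* Give both threads time to block; select() with no descriptors
     * is used here as a portable sub-second sleep. */
    select(0, NULL, NULL, NULL, &pause);

    /* One byte of data; nobody reads it, so it stays queued. */
    if (write(sv[1], "x", 1) != 1)
        woken = -1;

    for (i = 0; i < 2; i++) {
        pthread_join(t[i], NULL);
        if (woken >= 0 && w[i].readable)
            woken++;
    }

    close(sv[0]);
    close(sv[1]);
    return woken;
}
```

A result of 2 means both selects were woken; a result of 1 would match
the lost wake-up described above.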

Thanks,

Martin


> Cheers,
>
> Duncan.
>
> -- 
> -- Duncan Grisby         --
>  -- duncan at grisby.org     --
>   -- http://www.grisby.org --
>
>

| Martin Kocian                                       |
| kocian at slac.stanford.edu                            |
| Stanford Linear Accelerator Center                  |
| M.S. 98, P.O. Box 20450                             |
| Stanford, CA 94309                                  |
| Tel. (650)926-2887  Fax (650)926-2923               |



