[omniORB] crash in openssl code

Sat Feb 24 17:42:42 UTC 2024

On 24/02/2024 at 17:52, Duncan Grisby wrote:
> On Fri, 2024-02-23 at 15:50 +0100, Michael Teske wrote:
> 
>>
>>
>>> It has always worked in the past with OpenSSL as well. Has
>>> something
>>> changed with OpenSSL that means it is no longer thread safe for
>>> this?
>>
>> Every reference I find tells me that this can cause severe problems
>> and crashes. In the github issue I've opened
> 
> [...]
>> It works surprisingly well until something unexpected happens, e.g. a
>> handshake in between... I get the crash almost exclusively on
>> startup/opening of a connection; to reproduce it I had to restart our
>> system up to 100 times.
> 
> It is indeed surprising that this is not seen more often. It will only
> happen with bidirectional GIOP, because normal GIOP strictly alternates
> between writes and reads, and it appears to be the case that OpenSSL is
> nearly thread safe, in that mostly you get away with overlapped reads
> and writes. I suspect it might be a change in TLS 1.3 that makes
> renegotiation happen more often that provokes this.

Yes, that could be.
BTW, we have it in production at one customer without bidirectional GIOP and it hasn't crashed so far,
so I guess it's fine in that configuration.

> It will be very easy to change the omniORB code to always use non-
> blocking sockets, and to have a per-connection lock around all calls
> into OpenSSL. Aside from the lock, it already has all that
> functionality, to be able to handle calls with timeouts. The question
> is whether doing so will have a noticeable impact on performance. If it
> is too costly to do it always, it will have to be something that is
> only used with bidirectional connections, which is also possible, but a
> bit more tricky.

That's good news. If I can be of any help, let me know. In my experience on Linux,
a mutex is not very expensive when it is not held by another thread, but other systems may differ
(I know that, at least a few years ago, it was rather expensive on OS X).
I'll try to write a small test program next week that triggers the problem more often, so I can test more easily.

>> Another thing, apart from that, shouldn't
>>   
>> sslConnection::Peek()
>>
>> lock pd_belong_to->pd_collection_lock
>>   
>> as well (before calling SSL_pending()) ?
> 
> Why do you think that?  Once there is a lock to protect the SSL object,
> it certainly should acquire that, but it does not need to hold the
> SocketCollection lock unless it calls into the base SocketHolder::Peek,
> and that does acquire the lock.

It was some kind of uneducated guess without fully understanding the code :-)

Regards,
   Michael
> 
> Regards,
> 
> Duncan.
>