[omniORB] crash in openssl code

Michael Teske subscribe at teskor.de
Sun Feb 25 09:50:02 UTC 2024


Hi,

On 2/24/24 19:00, Duncan Grisby wrote:
> On Sat, 2024-02-24 at 18:42 +0100, Michael Teske wrote:
>> That's good news. If I can be of any help let me know. From my
>> experience on Linux, a mutex is not very expensive if it is not
>> locked by another thread, but other systems may be different
>> (I know that at least a few years ago it was rather expensive on
>> OSX).
>> I'll try to write a small test program next week which triggers the
>> problem more often, so I can test more easily.
> That would be helpful. I have tried to reproduce the crash, but I
> haven't managed. I can see bidirectional GIOP causing it to call
> SSL_write while it's blocked in SSL_read, but it has not once crashed.
> This is with OpenSSL 3.1.1 on Fedora 39.

Since I get crashes pretty easy, even with ORBtraceLevel 40, I hope it's not too hard to put it in a program.

But it looks like some exception happens:

Example (this time crash on write, don't be irritated by line number, I had some code added, but comented out):

(gdb) where
#0  0x00007f8ee7fbc4df in tls_get_message_header (mt=<synthetic pointer>, s=0x7f8e84007080) at ssl/statem/statem_lib.c:1167
#1  read_state_machine (s=0x7f8e84007080) at ssl/statem/statem.c:587
#2  state_machine (s=0x7f8e84007080, server=<optimized out>) at ssl/statem/statem.c:442
#3  0x00007f8ee7faa76e in ssl3_write_bytes (s=0x7f8e84007080, type=23, buf_=0x7f8e8401b218, len=47, written=0x7f8ed5219460) at ssl/record/rec_layer_s3.c:415
#4  0x00007f8ee7f8d6e7 in SSL_write (s=<optimized out>, buf=<optimized out>, num=<optimized out>) at ssl/ssl_lib.c:2139
#5  0x00007f8ee8efd72c in omni::sslConnection::Send (this=0x7f8e84011b78, buf=0x7f8e8401b218, sz=<optimized out>, deadline=...) at sslConnection.cc:156
#6  0x00007f8ee8e36942 in omni::giopStream::sendChunk (this=this at entry=0x7f8e84012ab8, buf=0x7f8e8401b200) at giopStream.cc:1126
#7  0x00007f8ee8e4d4e7 in omni::giopImpl12::outputMessageEnd (g=0x7f8e84012ab8) at giopImpl12.cc:1317
#8  0x00007f8ee8e3bedc in omni::GIOP_C::IssueLocateRequest (this=0x7f8e84012ab0) at ../../../../include/omniORB4/internal/giopStream.h:143
#9  0x00007f8ee8e188e9 in omniRemoteIdentity::locateRequest (this=0x7f8e84017e10, call_desc=...) at remoteIdentity.cc:211
#10 0x00007f8ee8e18f9c in omniRemoteIdentity::dispatch (this=0x7f8e84017e10, call_desc=...) at remoteIdentity.cc:86
#11 0x00007f8ee8dfcb0c in omniObjRef::_invoke (this=0x7f8e840197b0, call_desc=..., do_assert=false) at ../../../../include/omniORB4/omniObjRef.h:202
#12 0x00007f8ee8dfc8e4 in omniObjRef::_locateRequest (this=<optimized out>) at omniObjRef.cc:800
#13 0x00007f8ee8dfc915 in omniObjRef::_assertExistsAndTypeVerified (this=0x7f8e840197b0) at omniObjRef.cc:295
#14 0x00007f8ee8dfcb28 in omniObjRef::_invoke (this=this at entry=0x7f8e840197b0, call_desc=..., do_assert=do_assert at entry=true) at omniObjRef.cc:649
#15 0x00000000005ef5ec in exchangeAgent::_objref_eaLoginServer::GetLoginServerCallback (this=this at entry=0x7f8e84019790, ServerType=..., ServerName=..., Version=..., Description=...)
     at libidl/EAGeneral_skel.cpp:6197
#16 0x0000000000526ae0 in LS::UsedEA::run (this=0x24932f0) at /home/build/Builds/Trader-build1605/Trader/LINUX_RHEL9_64/GCC12/Debug/include/omniORB4/stringtypes.h:508
#17 0x00000000006ca225 in omniJTCThread::entrance_hook (this=0x2493500) at /home/build/Builds/Trader-build1605/Trader/src/libomniJTC/Thread.cpp:1037
#18 0x00007f8ee8f0e14e in omni_thread_wrapper (ptr=0x2493870) at /home/michael.teske/build/thirdparty/trader/omniORB/src/lib/omnithread/posix.cc:449
#19 0x00007f8ee789f802 in start_thread (arg=<optimized out>) at pthread_create.c:443
#20 0x00007f8ee783f450 in clone3 () at ../sysdeps/unix/sysv/linux/x86_64/clone3.S:81
(gdb) frame 5
#5  0x00007f8ee8efd72c in omni::sslConnection::Send (this=0x7f8e84011b78, buf=0x7f8e8401b218, sz=<optimized out>, deadline=...) at sslConnection.cc:156
156         tx = SSL_write(pd_ssl,buf,sz);
(gdb) p pd_myaddress
$2 = {
   _data = 0x7f8e8402b020 "giop:ssl:10.49.64.81:49374"
}
(gdb) p pd_peeraddress
$3 = {
   _data = 0x7f8e8402bac0 "giop:ssl:10.49.64.81:5610"


In the log for port 5610:

omniORB: (21) 2024-02-25 09:13:33.512917: Switch rope to use address giop:ssl:10.49.64.81:5610
omniORB: (21) 2024-02-25 09:13:33.515291: Client attempt to connect to giop:ssl:10.49.64.81:5610
omniORB: (21) 2024-02-25 09:13:33.515403: TLS connect to 10.49.64.81:5610
omniORB: (21) 2024-02-25 09:13:33.528565: TLS peer identity for giop:ssl:10.49.64.81:5610 : localhost.localdomain
omniORB: (21) 2024-02-25 09:13:33.528579: Client opened connection to giop:ssl:10.49.64.81:5610
omniORB: (21) 2024-02-25 09:13:33.528596: Client registered bidirectional connection to giop:ssl:10.49.64.81:5610
omniORB: (21) 2024-02-25 09:13:33.529822: Send bidir IIOP service context: ( rhel9build.finw.net:47707 10.49.64.81:29052 90.14.223.174:29052 )
omniORB: (21) 2024-02-25 09:13:33.529881: sendChunk: to giop:ssl:10.49.64.81:5610 179 bytes
omniORB: (65) 2024-02-25 09:13:33.531633: inputMessage: from giop:ssl:10.49.64.81:5610 25 bytes
omniORB: (65) 2024-02-25 09:13:33.531644:
4749 4f50 0102 0101 0d00 0000 0200 0000 GIOP............
0000 0000 0000 0000 01                  .........

omniORB: (21) 2024-02-25 09:13:33.534223: sendChunk: to giop:ssl:10.49.64.81:5610 84 bytes
omniORB: (21) 2024-02-25 09:13:33.534230:
4749 4f50 0102 0100 4800 0000 0400 0000 GIOP....H.......
0300 0000 0000 0000 0d00 0000 4578 6368 ............Exch
616e 6765 4167 656e 7440 0084 1000 0000 angeAgent at ......
6372 6561 7465 436f 6d70 6f6e 656e 7400 createComponent.
0000 0000 0100 0000 0300 0000 4c53 0000 ............LS..
0100 0000

omniORB: (65) 2024-02-25 09:13:33.534588: inputMessage: from giop:ssl:10.49.64.81:5610 324 bytes

omniORB: (21) 2024-02-25 09:13:33.534857: sendChunk: to giop:ssl:10.49.64.81:5610 77 bytes
omniORB: (21) 2024-02-25 09:13:33.534864:
4749 4f50 0102 0100 4100 0000 0600 0000 GIOP....A.......
0300 0000 0000 0000 0d00 0000 4578 6368 ............Exch
616e 6765 4167 656e 7440 0084 1000 0000 angeAgent at ......
4765 7445 7863 6861 6e67 6544 6174 6500 GetExchangeDate.
0000 0000 0100 0000 0100 0000 00        ..........

omniORB: (65) 2024-02-25 09:13:33.535080: inputMessage: from giop:ssl:10.49.64.81:5610 37 bytes
omniORB: (65) 2024-02-25 09:13:33.535091:
4749 4f50 0102 0101 1900 0000 0600 0000 GIOP............
0000 0000 0000 0000 0900 0000 3230 3234 ............2024
3032 3235 00

omniORB: (21) 2024-02-25 09:13:33.535522: sendChunk: to giop:ssl:10.49.64.81:5610 68 bytes
omniORB: (21) 2024-02-25 09:13:33.535534:
4749 4f50 0102 0100 3800 0000 0800 0000 GIOP....8.......
0300 0000 0000 0000 0d00 0000 4578 6368 ............Exch
616e 6765 4167 656e 7440 0084 1000 0000 angeAgent at ......
6765 7443 6170 6162 696c 6974 6965 7300 getCapabilities.
0000 0000

omniORB: (65) 2024-02-25 09:13:33.536854: inputMessage: from giop:ssl:10.49.64.81:5610 68 bytes
omniORB: (65) 2024-02-25 09:13:33.536872:
4749 4f50 0102 0101 3800 0000 0800 0000 GIOP....8.......
0000 0000 0000 0000 0a00 0000 0000 0000 ................
0000 0000 0200 0000 0500 0000 0c00 0000 ................
1500 0000 1c00 0000 0700 0000 0300 0000 ................
1600 0000


omniORB: (21) 2024-02-25 09:13:33.540943: Name 'rhel9build.finw.net' resolved to 10.49.64.81
omniORB: (21) 2024-02-25 09:13:33.540979: Enable rope for bidirectional GIOP.
omniORB: (21) 2024-02-25 09:13:33.540985: Client attempt to connect to giop:tcp:10.49.64.81:0
omniORB: (21) 2024-02-25 09:13:33.540987: Switch rope to use address giop:ssl:10.49.64.81:5610
omniORB: (21) 2024-02-25 09:13:33.540990: Unable to open new connection: giop:tcp:10.49.64.81:0
omniORB: (21) 2024-02-25 09:13:33.540992: throw giopStream::CommFailure from giopStream.cc:1299(1,NO,TRANSIENT_ConnectFailed)
omniORB: (21) 2024-02-25 09:13:33.541047: LocateRequest to remote: root/bidirPOA<1>
omniORB: (21) 2024-02-25 09:13:33.541063: Client attempt to connect to giop:ssl:10.49.64.81:5610
omniORB: (21) 2024-02-25 09:13:33.541164: TLS connect to 10.49.64.81:5610


omniORB: (21) 2024-02-25 09:13:33.547567: TLS peer identity for giop:ssl:10.49.64.81:5610 : localhost.localdomain
omniORB: (21) 2024-02-25 09:13:33.547580: Client opened connection to giop:ssl:10.49.64.81:5610
omniORB: (21) 2024-02-25 09:13:33.547584: Client registered bidirectional connection to giop:ssl:10.49.64.81:5610
omniORB: (21) 2024-02-25 09:13:33.547650: sendChunk: to giop:ssl:10.49.64.81:5610 47 bytes

omniORB: (88) 2024-02-25 09:13:33.549133: giop:ssl:10.49.64.81:5610 recv error: error:0A0C0103:SSL routines::internal error
omniORB: (88) 2024-02-25 09:13:33.549149: Error on client receiving from bi-directional connection on strand 0x7f8e84012970. Will scavenge it.
omniORB: (88) 2024-02-25 09:13:33.549155: Error in network receive (start of message): giop:ssl:10.49.64.81:5610
omniORB: (88) 2024-02-25 09:13:33.549166: throw giopStream::CommFailure from giopStream.cc:849(0,NO,COMM_FAILURE_UnMarshalArguments)

omniORB: (88) 2024-02-25 09:13:33.549232: Client connection giop:ssl:10.49.64.81:5610 refcount = 1
omniORB: (88) 2024-02-25 09:13:33.549239: AsyncInvoker: thread id 88 finished immediate server task.

  I can send the complete output to you in private, if you wish.

I currently don't know yet why the receiver opens  a second connection at  09:13:33.540943 to giop:tcp:10.49.64.81:0 (tcp??)

but it looks to me that the SSL internal error in thread

88 causes the connection being scavenged while still having the writer thread on it.

Maybe our software does do some very strangs things after all?


> My performance concern is not just the lock, but also the overhead of
> non-blocking sockets, which means it needs to call poll() or select()
> to see when the socket is available, and potentially makes multiple
> calls into OpenSSL instead of just one. A brief experiment that doesn't
> properly implement the solution but does try to simulate the overheads
> suggests that it is likely to add about 10% to the call times, so it is
> not a trivial cost.
>
> Duncan.
>
Agreed, 10% is not negligible. But using openssl in a way its developers explicitly forbid could be a problem, too.

But finding out what happens exactly first would be the right way to go IMHO.

Greetings,

   Michael





More information about the omniORB-list mailing list