[omniORB] Sun C++ 5 or 6 "throw" is not thread safe.

Sai-Lai Lo S.Lo@uk.research.att.com
31 Aug 2000 10:16:12 +0100


Arnault,

>>>>> Arnault Bonafos writes:

> After having compiled (with SunCompiler 5.0) the Sai Lai Lo revised
> program from the omniORB archive on a Ultra 5 Solaris2.7 single processor
> machine (with all necessary patches), it appears that this test works
> fine.  The test machine is a SMP machine Solaris2.7, 4 processors, with
> the OS related patches applied.  Before applying the patches the program
> was not runnable because of some linker issue.

> Can Sai Lai Lo confirm, or someone else, that this test is supposed to
> fail (crash) on a unpatched SMP machine, the message seems to be clear, I
> just want to confirm this point.  Should the success of this test confirm
> the thread safety of SunCompiler 5.0?

No. As I have said in the message you quoted, the problem is not as simple
as it first looks. In fact the revised test program works fine on an SMP
machine but my concurrent omniORB test still SEGVs occasionally.

The following is the test code I used. It can be run against the example
servers, eg3_impl for instance. If you run the test program and kill it
while it is running, the server will occasionally SEGV. The test program
fires off concurrent requests to the server; by default the ORB opens 5
connections to the server. When the test program is killed, the server-side
threads throw COMM_FAILURE (or omniConnectionBroken) simultaneously. Looking
at the core dump or running the server under dbx, the SEGV is always inside
the exception unwinding. This happens on a Solaris 2.7 4-processor SMP
machine. It doesn't core dump on a uniprocessor machine, but I don't think
we can be certain it is an SMP-only problem.

When COMM_FAILURE is thrown, the exception unwinding releases all the
resources associated with the connection and performs other miscellaneous
clean-up before the thread handling the connection exits. In other words,
the unwinding code is more complicated than in the previous test program
I posted and may exercise some part of Sun's runtime that is not thread
safe.

If one is to get to the bottom of this, I'd suggest looking at the stack of
the tcpSocketWorker thread, which should block in tcpSocketStrand::ll_recv.
For each stack frame, look for auto variables that will have their dtor
called when the COMM_FAILURE (or omniConnectionBroken) exception is thrown.
Maybe a subset of the dtor code hits the non-thread-safe code in Sun's
runtime. Maybe a simpler test case can be produced or a workaround can
be found. This, however, is a time-consuming exercise which we can't
undertake at the moment.

You may have to modify the test code a bit as it is part of our testsuite
and has some extra code to link with the test environment.

Sai-Lai

------------------------------------
// Testing code: client of echo objects
//
//   interface Echo {
//        string echoString(in string mesg);
//   };
//

#include <iostream.h>
#include <string.h>   // for strcmp
#include <testecho.hh>
#include <common/omnitest.h>


omni_mutex cerr_sync;

static
void
contact(char* id, Echo_ptr e)
{
  int loopcount = 10;
  while (loopcount--) {
    try {
      char * echostr;
      echostr = e->echoString((char *)"abcde");
      {
	omni_mutex_lock s(cerr_sync);
	cerr << id << ": reply " << echostr << endl;
      }
      if (strcmp((const char *)echostr,"abcde")) {
	cerr << loopcount << " : echo string differs ('" << "abcde', '"
	     << (char *)echostr << "')" << endl;
	OMNI_FAILED("echo string differs");
      }
      CORBA::string_free(echostr);
    }
    catch (...) {
      OMNI_FAILED("Caught system exception. Abort");
    }
  }
}

class worker : public omni_thread {
public:
  worker(char* id,Echo_ptr e) : omni_thread(id) {
    pd_e = e;
    start_undetached();
    return;
  }
  virtual void* run_undetached(void*id) {
    contact((char*)id,pd_e);
    return 0;
  };
  virtual ~worker() {}
private:
  Echo_var pd_e;
};


class MyApp : public OmniTestApp {
public:
  virtual int main(int argc, char* argv[]);
};

static MyApp a;


int
MyApp::main(int argc, char** argv)
{
  OMNI_SIMPLE_CLIENT_INIT(Echo, e);


  worker* worker1 =  new worker("worker 1:",Echo::_duplicate(e));
  worker* worker2 =  new worker("worker 2:",Echo::_duplicate(e));
  worker* worker3 =  new worker("worker 3:",Echo::_duplicate(e));
  worker* worker4 =  new worker("worker 4:",Echo::_duplicate(e));
  worker* worker5 =  new worker("worker 5:",Echo::_duplicate(e));
  worker* worker6 =  new worker("worker 6:",Echo::_duplicate(e));
  worker* worker7 =  new worker("worker 7:",Echo::_duplicate(e));
  worker* worker8 =  new worker("worker 8:",Echo::_duplicate(e));
  worker* worker9 =  new worker("worker 9:",Echo::_duplicate(e));
  worker* worker10 =  new worker("worker 10:",Echo::_duplicate(e));
  contact("main",e);
  worker1->join(0);
  worker2->join(0);
  worker3->join(0);
  worker4->join(0);
  worker5->join(0);
  worker6->join(0);
  worker7->join(0);
  worker8->join(0);
  worker9->join(0);
  worker10->join(0);


  test_complete();
  return 1;
}
-------------------------------------------------


-- 
Sai-Lai Lo                                   S.Lo@uk.research.att.com
AT&T Laboratories Cambridge           WWW:   http://www.uk.research.att.com
24a Trumpington Street                Tel:   +44 1223 343000
Cambridge CB2 1QA                     Fax:   +44 1223 313542
ENGLAND