[omniORB] assert failure during thread creation

Carlson, Andy andycarlson@ipo.att.com
Thu, 7 Oct 1999 04:40:23 -0700


Sai-Lai,

Thanks. I think your suggestion should be fine - you probably
understand what a strand reference count means better than 
I do, so I went for a very minimal change.

It's early days in my concurrency testing and this problem was
not happening under controlled conditions, so I can't be certain
exactly how many threads were active.

What I can say is that on one test I was seeing 50 threads
active when things were going wrong. I'm seeing some Solaris
lwp thread error messages and lots of file descriptor limit
messages (I've upped omniORB's own connection limit to 
beyond what the OS can handle for now). The assertion 
problem was stopping me getting to a reproducible test, hence 
my mail.

Andy
----------------------------------------------------------------------------
Andy Carlson. AT&T Labs (UK)   	Tel: +44 1527 495258
E-Mail: andycarlson@ipo.att.com	Fax: +44 1527 495229


> -----Original Message-----
> From:	Sai-Lai Lo [SMTP:S.Lo@uk.research.att.com]
> Sent:	Thursday, October 07, 1999 12:02 PM
> To:	Carlson, Andy
> Cc:	omniorb-list@uk.research.att.com
> Subject:	Re: [omniORB] assert failure during thread creation
> 
> Andy,
> 
> Thanks for the report. How about the alternative:
> 
> class tcpSocketWorker : public omni_thread {
> public:
>   tcpSocketWorker(tcpSocketStrand* s) : omni_thread(s), pd_sync(s,0,0) {
>     start();                 
>     s->decrRefCount();       // <-- Reach here means we have created a
>                              //     thread successfully.
>   }
>   virtual ~tcpSocketWorker() { }
>   virtual void run(void *arg);
>   static void _realRun(void* arg);
> 
> private:
>   Strand::Sync    pd_sync;
> };
> 
> 
> Just out of interest, how many threads are you using at the time
> thread creation fails?
> 
> Sai-Lai
> 
> 
> 
> >>>>> Carlson, Andy writes:
> 
> > I am testing some code based on the omniORB 2.7.1 source under
> > heavy concurrent load. The following problem happens when resources
> > are being exhausted, but could be handled more gracefully.
> 
> > Here is the code (from tcpSocketRendezvouser::run_undetached)...
> 
> > {
> > 	// locking & state check stuff omitted
> 
> > 	newSt = new tcpSocketStrand(r,new_sock,1);
> > 	newSt->incrRefCount(1);
> > }
> 
> > // logging stuff omitted
> 
> > try {
> > 	newthr = new tcpSocketWorker(newSt);
> > }
> > catch(...) {
> > 	newthr = 0;
> > }
> 
> > if (!newthr) {
> > 	// big comment omitted
> > 	newSt->decrRefCount();
> > 	newSt->shutdown();
> > }
> 
> > The problem is that I sometimes get an assertion failure from
> > the newSt->decrRefCount() call. This checks that the refcount
> > doesn't go negative after being decremented.
> 
> > I think that the problem is this... The tcpSocketWorker is created
> > and decrements the strand refcount in its constructor. It then calls
> > start() to start the thread. This (I think) is throwing an exception
> > which is caught by tcpSocketRendezvouser::run_undetached.
> 
> > As newthr hasn't been set, the 'if (!newthr)' code is run, which
> > calls decrRefCount again, which dies because the refcount was
> > already zero.
> 
> > Suggested fix: prefix the decrRefCount with if (!newSt->is_idle())
> 
> 
> -- 
> Sai-Lai Lo                                   S.Lo@uk.research.att.com
> AT&T Laboratories Cambridge           WWW:   http://www.uk.research.att.com
> 24a Trumpington Street                Tel:   +44 1223 343000
> Cambridge CB2 1QA                     Fax:   +44 1223 313542
> ENGLAND
>
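
The double decrement discussed in this thread can be reproduced in
isolation. What follows is only a minimal sketch, not omniORB code:
MockStrand, MockWorker, accept_connection and check_orderings are
hypothetical stand-ins, thread creation failure is simulated by throwing
an exception, and decrRefCount returns false where the real code would
assert. It compares decrementing before start() (the ordering described
in the report) with decrementing after start() (the alternative
suggested in the reply).

```cpp
#include <cassert>
#include <stdexcept>

// Hypothetical stand-in for tcpSocketStrand: only the refcount is modelled.
class MockStrand {
public:
  void incrRefCount() { ++refs_; }
  // Returns false instead of asserting when the count would go negative,
  // so the failure can be observed without aborting the program.
  bool decrRefCount() {
    if (refs_ == 0) return false;
    --refs_;
    return true;
  }
  int refCount() const { return refs_; }
private:
  int refs_ = 0;
};

// Hypothetical stand-in for tcpSocketWorker. start_fails simulates
// thread creation failing with an exception.
class MockWorker {
public:
  MockWorker(MockStrand& s, bool start_fails, bool decrement_first) {
    if (decrement_first) s.decrRefCount();   // ordering from the report
    start(start_fails);
    if (!decrement_first) s.decrRefCount();  // ordering from the reply
  }
private:
  static void start(bool fails) {
    if (fails) throw std::runtime_error("simulated thread creation failure");
  }
};

// Mirrors the shape of the quoted accept path. Returns true if every
// decrement was matched by an earlier increment and the count ends at zero.
bool accept_connection(bool start_fails, bool decrement_first) {
  MockStrand strand;
  strand.incrRefCount();
  bool created = true;
  try {
    MockWorker worker(strand, start_fails, decrement_first);
    (void)worker;
  } catch (...) {
    created = false;
  }
  bool balanced = true;
  if (!created) {
    balanced = strand.decrRefCount();  // cleanup path after a failed create
  }
  return balanced && strand.refCount() == 0;
}

// Exercises both orderings, with and without a simulated failure.
void check_orderings() {
  assert(!accept_connection(true, true));   // decrement first: double decrement
  assert(accept_connection(true, false));   // decrement after start(): balanced
  assert(accept_connection(false, true));   // no failure: both orderings agree
  assert(accept_connection(false, false));
}
```

Calling check_orderings() runs all four combinations. In this model only
the combination "decrement first, then start() throws" ends with an
unbalanced count, which matches the diagnosis in the quoted report.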