[omniORB] Unrecoverable error for this endpoint - EBADF

Thu Sep 26 15:02:02 2002

> Was this race condition ever fixed for omniORB4?
It looks like a variation on the patch was applied back in February after
all.  I'll investigate further why we still seem to be seeing similar random
behaviour on Windows 2000.

Norrie

> -----Original Message-----
> From: Norrie Quinn [mailto:norrie.quinn@tumbleweed.com]
> Sent: Wednesday, September 25, 2002 2:57 PM
> To: omniORB-list@omniorb-support.com
> Cc: Bastiaan Bakker
> Subject: [omniORB] Unrecoverable error for this endpoint - EBADF
> 
> 
> Hi,
> 
> Was this race condition ever fixed for omniORB4?
> 
> We are seeing the same behaviour on SMP Windows 2000 machines under heavy
> load, and the patch below (or similar) does not seem to have been applied to
> the CVS source.
> 
> Regards
> Norrie
> 
> > -----Original Message-----
> > From: Bastiaan Bakker [mailto:Bastiaan.Bakker@lifeline.nl]
> > Sent: Tuesday, February 05, 2002 1:43 AM
> > To: Duncan Grisby
> > Cc: omniorb-list@uk.research.att.com
> > Subject: RE: [omniORB] RE: serious stability problems with omniORB4
> > snapshots on Solaris 8: bug located!
> > 
> > 
> > Hi,
> > 
> > I've created a small patch to work around the EBADF problem.
> > As I suggested yesterday, it simply retries the fd_set
> > creation and select() in case of EBADF. In a couple of quick
> > tests using 20 concurrent eg2_clts, it retries once every
> > 1000 to 4000 SocketCollection::Select() calls. Of course, on
> > very busy systems the retry rate may become impractically high.
> > 
> > Please let me know what you think.
> > 
> > Cheers,
> > 
> > Bastiaan Bakker
> > LifeLine Networks bv
> > 
> > 
> > -----Original Message-----
> > From: Bastiaan Bakker [mailto:Bastiaan.Bakker@lifeline.nl]
> > Sent: Monday, February 04, 2002 7:05 AM
> > To: Duncan Grisby
> > Cc: omniorb-list@uk.research.att.com
> > Subject: RE: [omniORB] RE: serious stability problems with omniORB4
> > snapshots on Solaris 8: bug located!
> > 
> > 
> > Hi all,
> > 
> > I've located a race condition in SocketCollection::Select, 
> > which causes at least one of my problems:
> > 
> > The 'Unrecoverable error for this endpoint:
> > giop:unix:/tmp/echo.bb, it will no longer be serviced.' message is
> > caused by a race condition in SocketCollection::Select. This
> > method first creates a file descriptor set and then performs
> > a select on it. However, between the fd_set creation and the
> > select call, another thread may have called close() on a connection
> > file descriptor in this set. This causes select() to fail with
> > EBADF ('invalid file descriptor'). Way up in the call chain
> > this is translated to an 'unrecoverable error', with known
> > results....
> > 
> > I guess the easiest solution to this problem is to check for 
> > EBADF and retry the fd_set creation and select() in that case. 
> > 
> > Any suggestions?
> > 
> > Cheers,
> > 
> > Bastiaan Bakker
> > LifeLine Networks bv
> 
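
For reference, the workaround described in the quoted messages (rebuild the fd_set and retry select() when it fails with EBADF) can be sketched roughly as below. This is not the omniORB patch itself. selectWithRetry, SelectResult, the currentFds callback and the maxRetries cap are hypothetical names invented for illustration, and the sketch assumes a POSIX select(), so it does not cover the Windows case.

```cpp
#include <sys/select.h>
#include <unistd.h>
#include <cassert>
#include <cerrno>
#include <functional>
#include <vector>

// Hypothetical helper, not omniORB code: outcome of one guarded select().
struct SelectResult {
    int ready;    // return value of the last select() call
    int retries;  // how many times the fd_set was rebuilt after EBADF
};

// currentFds is an assumed callback that returns the descriptors that are
// open right now. It is called again on every retry, so a descriptor that
// another thread closed in the meantime drops out of the rebuilt set.
SelectResult selectWithRetry(const std::function<std::vector<int>()>& currentFds,
                             timeval timeout, int maxRetries = 3) {
    assert(maxRetries >= 0);
    SelectResult result{-1, 0};
    for (;;) {
        fd_set readSet;
        FD_ZERO(&readSet);
        int maxFd = -1;
        for (int fd : currentFds()) {
            if (fd < 0 || fd >= FD_SETSIZE) continue;  // cannot go in an fd_set
            FD_SET(fd, &readSet);
            if (fd > maxFd) maxFd = fd;
        }

        timeval tv = timeout;  // select() may modify its timeout argument
        result.ready = select(maxFd + 1, &readSet, nullptr, nullptr, &tv);

        // Success, an error other than EBADF, or retry budget used up:
        // hand the result back to the caller unchanged.
        if (result.ready >= 0 || errno != EBADF || result.retries >= maxRetries)
            return result;

        ++result.retries;  // stale descriptor in the set: rebuild and retry
    }
}
```

The cap on retries is a design choice of this sketch, so that a descriptor which is permanently bad still surfaces as an error instead of looping forever. Other errors such as EINTR are left to the caller.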