[omniORB] RE: OmniORB 4.0.1 server application hangs: endpoint shutdown problem

Fri Oct 1 17:28:41 BST 2004

I have also seen this problem on Solaris 8.  It can be reproduced by
running an nmap -n localhost port scan.  The port scan will consistently
cause omniNames to stop listening for incoming connections.

A patch is attached that fixes the problem by handling the errors in
tcpEndpoint::notifyReadable.  I do not know how this patch will affect
other system types.  I would guess it won't have a negative impact,
assuming the error types are defined.
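
For illustration only (the attached patch itself is not reproduced
here), the general idea of "handling the errors" can be sketched as
below: classify the errno left by a failed accept() and give up on the
listening socket only when the error is not a per-connection one.  The
names is_transient_accept_error, try_accept and AcceptResult are
hypothetical and are not part of omniORB; the list of errno values is
an assumption about which failures are safe to retry, and EPROTO is
guarded because it may not be defined everywhere.

```cpp
#include <cassert>
#include <cerrno>
#include <sys/types.h>
#include <sys/socket.h>

// Hypothetical helper, not omniORB code: true when the errno from a failed
// accept() describes one failed connection attempt rather than a broken
// listening socket. The set of values is an assumption.
inline bool is_transient_accept_error(int err) {
  switch (err) {
    case ECONNABORTED:  // client aborted before accept() picked it up
    case EINTR:         // call was interrupted by a signal
    case EAGAIN:        // nothing pending on a non-blocking socket
#if defined(EWOULDBLOCK) && (EWOULDBLOCK != EAGAIN)
    case EWOULDBLOCK:
#endif
#ifdef EPROTO
    case EPROTO:        // some SVR4-derived systems report aborts this way
#endif
      return true;
    default:
      return false;
  }
}

enum AcceptResult { ACCEPT_OK, ACCEPT_RETRY, ACCEPT_FATAL };

// Hypothetical wrapper: one accept() attempt, reporting whether the caller
// should use new_fd, keep listening, or stop servicing the endpoint.
inline AcceptResult try_accept(int listen_fd, int& new_fd) {
  new_fd = ::accept(listen_fd, 0, 0);
  if (new_fd >= 0) return ACCEPT_OK;
  return is_transient_accept_error(errno) ? ACCEPT_RETRY : ACCEPT_FATAL;
}
```

With something along these lines, a caller would keep the endpoint
alive on ACCEPT_RETRY and shut it down only on ACCEPT_FATAL.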

Jeremy VanGrinsven

> ****Forgot to mention that the following problem was reported on SOLARIS. On
> WINDOWS it was not tested.****
> Hello All,
>
> We are using OmniORB 4.0.1 for our server application, which consists of
> various distributed components across the globe. The problem is that our
> server hangs after giving the following error:
> omniORB: Unrecoverable error for this endpoint:
> giop:tcp:10.91.201.202:2222,
> it will no longer be serviced.
> There are no reproducible steps for the above error, but it recurs within a
> few hours of operation. However, upon investigation we have found that one
> of the places in the OmniORB code that could lead to the above error is the
> following:
> CORBA::Boolean
> tcpEndpoint::notifyReadable(SocketHandle_t fd) {
>   if (fd == pd_socket) {
>     SocketHandle_t sock;
>     sock = ::accept(pd_socket,0,0);
>     if (sock == RC_SOCKET_ERROR) {
>       return 0;
>     }
> ....
> ...
> }
> As is clear from the above, whenever the accept system call fails (in our
> case accept fails with error ECONNABORTED, which means "Software caused
> connection abort") this routine returns 0 and eventually OmniORB shuts down
> the endpoint, e.g. giop:tcp:10.91.201.202:2222 in our case.
>
> Question 1: Is it desired that whenever such a failure occurs OmniORB
> should stop servicing the affected endpoint? In real-world operation accept
> could fail merely because of a network problem on the side of the clients
> that are connecting to the server.
>
> To solve this we have changed giopRendezvouser::execute() in
> giopRendezvouser.cc so that it does NOT break out of the while loop in case
> AcceptAndMonitor returns a NULL pointer, i.e. when accept fails internally.
> Please see the following code snippet from the changed
> giopRendezvouser::execute() method:
> void
> giopRendezvouser::execute()
> {
> ....
> ....
>   CORBA::Boolean exit_on_error;
>
>   do {
>     exit_on_error = 0;
>     giopConnection* newconn = 0;
>     try {
>       newconn = pd_endpoint->AcceptAndMonitor(notifyReadable,this);
>       if (newconn) {
>          pd_server->notifyRzNewConnection(this,newconn);
>       }
>       else {
>         /******** COMMENTED OUT THE FOLLOWING TWO LINES *********
>           exit_on_error = 1;
>           break;
>         ***********************************************************/
>       }
>     }
> ....
> ....
> } // end function
>
> After making the above change our server now logs the SAME error message,
> but resumes and keeps listening on the SAME endpoint, e.g.
> giop:tcp:10.91.201.202:2222 in our case.
>
> Question 2: Is the above fix right, or does it violate the CORBA specs in
> any way?
>
> Also, in the current scope we cannot use multiple endpoints to keep the
> server application available, as that does not solve our problem.
>
> Regards,
>
> --Kamal
-------------- next part --------------
A non-text attachment was scrubbed...
Name: omniORB-ECONNABORTED.patch
Type: application/octet-stream
Size: 1876 bytes
Desc: not available
Url : http://www.omniorb-support.com/pipermail/omniorb-list/attachments/20041001/ed9cabb5/omniORB-ECONNABORTED.obj