[omniORB] Re: ThreadPool model and ORBmaxServerThreadPerConnection (Corrected Patch)

Mon Mar 1 11:21:06 GMT 2004

Hi,

I have realized that the previous patch was wrong: it breaks the
mutex-holding strategy, which might result in a deadlock. Here is a
corrected patch.

*** giopServer.cc    Fri Feb 27 11:30:57 2004
--- giopServer-new.cc    Mon Mar  1 10:38:12 2004
***************
*** 1004,1026 ****
        }
      }

-     // Connection is selectable now
-     conn->setSelectable(1);
-
      // Worker is no longer needed.
      {
!       omni_tracedmutex_lock sync(pd_lock);

        if (conn->pd_n_workers == 1 && ( conn->pd_dying || pd_state == INFLUX )) {
      // Connection is dying. Go round again so this thread spots
      // the condition.
      omniORB::logs(25, "Last worker sees connection is dying.");
      return 1;
        }
        w->remove();
        delete w;
        conn->pd_n_workers--;
        pd_n_temporary_workers--;
      }
      return 0;
    }
--- 1004,1031 ----
        }
      }

      // Worker is no longer needed.
      {
!       pd_lock.lock();

        if (conn->pd_n_workers == 1 && ( conn->pd_dying || pd_state == INFLUX )) {
      // Connection is dying. Go round again so this thread spots
      // the condition.
      omniORB::logs(25, "Last worker sees connection is dying.");
+     pd_lock.unlock();
+    
+     // Connection is selectable now
+     conn->setSelectable(1);
      return 1;
        }
        w->remove();
        delete w;
        conn->pd_n_workers--;
        pd_n_temporary_workers--;
+       pd_lock.unlock();
+      
+       // Connection is selectable now
+       conn->setSelectable(1);
      }
      return 0;
    }
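
For readers who do not have giopServer.cc at hand, the ordering the patch aims for can be sketched in isolation. This is only a toy model. It assumes that setSelectable takes a lock of its own, which would explain why it must not be called while pd_lock is held. Connection, Server, select_lock and notifyWorkerDone below are made-up stand-ins, not omniORB classes, and the pd_state == INFLUX test is left out for brevity.

```cpp
#include <cassert>
#include <mutex>

// Hypothetical stand-in for a connection; not the omniORB class.
struct Connection {
    std::mutex select_lock;   // assumed: setSelectable takes its own lock
    int  n_workers  = 1;      // guarded by Server::lock in this sketch
    bool dying      = false;
    bool selectable = false;

    void setSelectable(bool on) {
        std::lock_guard<std::mutex> guard(select_lock);
        selectable = on;
    }
};

// Hypothetical stand-in for the server; 'lock' plays the role of pd_lock.
struct Server {
    std::mutex lock;
    int n_temporary_workers = 1;

    // Returns true when the last worker should go round again because the
    // connection is dying, false when the worker has been retired.
    bool notifyWorkerDone(Connection& conn) {
        bool go_round_again;
        {
            std::lock_guard<std::mutex> guard(lock);
            go_round_again = (conn.n_workers == 1 && conn.dying);
            if (!go_round_again) {
                // Retire the worker while the bookkeeping lock is held.
                conn.n_workers--;
                n_temporary_workers--;
            }
        }   // 'lock' is released here, before any other lock is taken

        // Only now make the connection selectable, so the select loop
        // always sees the updated worker count.
        conn.setSelectable(true);
        return go_round_again;
    }
};
```

The scoped guard makes it impossible to reach setSelectable with the lock still held, on either return path. The patch gets the same effect with explicit lock()/unlock() calls.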

Serguei Kolos wrote:

> Hello
>
> I believe I have found the cause of this problem. It is a race condition
> between giopServer::notifyRzReadable and giopServer::notifyWkDone.
> It happens because giopServer::notifyWkDone makes the connection's socket
> selectable (giopServer.cc:1008) before it destroys the current worker.
> giopServer::notifyRzReadable can therefore be called between these two
> steps, and it might set conn->pd_has_hit_n_workers_limit to 1 because
> conn->pd_n_workers is still equal to 1 (giopServer.cc:815). The worker is
> then destroyed by giopServer::notifyWkDone, and nobody takes care of the
> connection, which really is readable. Below is the patch.
> Cheers,
> Sergei
>
>
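
The interleaving described in the quoted message can be replayed step by step without any threads. The sketch below is a simplified model, not the omniORB code: ConnState, onReadable and retireWorker are hypothetical names, the per-connection worker limit is assumed to be 1, and onReadable only mimics the decision that the message attributes to notifyRzReadable.

```cpp
#include <cassert>

// Hypothetical per-connection state, named after the fields in the message.
struct ConnState {
    int  n_workers = 1;                    // one worker is finishing a request
    bool has_hit_n_workers_limit = false;
    bool selectable = false;
};

const int kMaxWorkersPerConnection = 1;    // assumed limit for this scenario

// Simplified stand-in for notifyRzReadable: start a worker unless the
// connection already has as many workers as it is allowed.
// Returns true if a worker was dispatched.
bool onReadable(ConnState& c) {
    if (!c.selectable) return false;       // the select loop is not watching
    if (c.n_workers >= kMaxWorkersPerConnection) {
        c.has_hit_n_workers_limit = true;  // relies on a worker noticing later
        return false;
    }
    c.n_workers++;
    return true;
}

// Simplified stand-in for the worker removal in notifyWkDone.
void retireWorker(ConnState& c) { c.n_workers--; }

// Old order: selectable first, worker retired afterwards.
// Returns true if the readable connection ends up with no worker.
bool strandedWithOldOrder() {
    ConnState c;
    c.selectable = true;
    onReadable(c);      // slips in between the two steps, sees n_workers == 1
    retireWorker(c);    // the only worker goes away
    return c.n_workers == 0 && c.has_hit_n_workers_limit;
}

// New order: worker retired first, selectable afterwards.
bool strandedWithNewOrder() {
    ConnState c;
    retireWorker(c);
    c.selectable = true;
    onReadable(c);      // sees n_workers == 0 and dispatches a worker
    return c.n_workers == 0;
}
```

In this model strandedWithOldOrder() returns true and strandedWithNewOrder() returns false, which matches the behaviour the patch is meant to change.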