[omniORB] Canceling a blocking function

Han Kiliccote kiliccote@cmu.edu
Sun, 2 Apr 2000 14:00:09 -0400


At Carnegie Mellon University, we are developing a prototype for a
distributed system that contains a very large number of servers (e.g.,
100,000 servers). In this prototype, we need to send a request to a large
subset of these servers (e.g., 100).

Currently we have a loop that uses a thread pool to attach a thread to each
server in the subset, and each thread calls a function on a different server.

When a given percentage (e.g., 50%) of these calls have returned, we would
like to cancel the operation in the remaining threads, which are still
blocked either because their servers are down or faulty, or because the call
is about to complete but has not yet done so.
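
As a rough illustration of the pattern described above, here is a minimal
sketch in standard C++ with no ORB calls at all. The names QuorumState,
do_function_stub and call_until_quorum are hypothetical, and the stub simply
sleeps to imitate a fast or a slow/faulty server. The sketch only shows the
caller returning once enough replies have arrived; it does not cancel the
calls that are still blocked, which is exactly the open question.

```cpp
#include <chrono>
#include <condition_variable>
#include <cstddef>
#include <memory>
#include <mutex>
#include <thread>
#include <vector>

// Shared between the caller and the workers. It is held by shared_ptr so
// that stragglers can still touch it after the caller has moved on.
struct QuorumState {
    std::mutex m;
    std::condition_variable cv;
    std::size_t replies = 0;
};

// Hypothetical stand-in for server[i]->do_function(argument): it just
// sleeps for the given delay to simulate a fast or a slow/faulty server.
void do_function_stub(std::chrono::milliseconds delay) {
    std::this_thread::sleep_for(delay);
}

// Starts one thread per simulated server and returns as soon as `quorum`
// replies have arrived or `timeout` has elapsed, whichever comes first.
// Returns the number of replies seen at that moment.
std::size_t call_until_quorum(
        const std::vector<std::chrono::milliseconds>& delays,
        std::size_t quorum,
        std::chrono::milliseconds timeout) {
    auto state = std::make_shared<QuorumState>();
    for (auto delay : delays) {
        std::thread([state, delay] {
            do_function_stub(delay);          // blocks until the "reply"
            {
                std::lock_guard<std::mutex> lock(state->m);
                ++state->replies;
            }
            state->cv.notify_one();           // wake the waiting caller
        }).detach();                          // stragglers are never joined
    }
    std::unique_lock<std::mutex> lock(state->m);
    state->cv.wait_for(lock, timeout,
                       [&] { return state->replies >= quorum; });
    return state->replies;
}
```

With 100 simulated servers and a quorum of 50, the caller would return after
the 50th reply, but the other threads would stay alive until their own calls
return or time out, so the thread build-up is not solved by this alone.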

Currently we don't know how to do this. In each remaining thread, there is a
call

server[i]->do_function(argument);  // blocked (no reply yet)

How can we unblock this? We don't want to wait more than 10 seconds for these
calls to time out: once the overall operation is deemed complete, another
request will follow soon, and the leftover blocked threads would accumulate
until a very large number of threads exist in the system at any given point.
We also don't want to lower the timeout below 10 seconds, because that would
cause an early abort in some cases.

Your advice and help are greatly appreciated.

P.S. Should we switch to oneway operations instead?

Han Kiliccote
ICES/CMU