Connection and Thread Management

In CORBA, the ORB is the ‘middleware’ that allows a client to invoke an operation on an object without regard to its implementation or location. In order to invoke an operation on an object, a client needs to ‘bind’ to the object by acquiring its object reference. Such a reference may be obtained as the result of an operation on another object (such as a naming service or factory object) or by conversion from a stringified representation. If the object is in a different address space, the binding process involves the ORB building a proxy object in the client’s address space. The ORB arranges for invocations on the proxy object to be transparently mapped to equivalent invocations on the implementation object.

For the sake of interoperability, CORBA mandates that all ORBs should support IIOP as the means to communicate remote invocations over a TCP/IP connection. IIOP is usually¹ asymmetric with respect to the roles of the parties at the two ends of a connection. At one end is the client which can only initiate remote invocations. At the other end is the server which can only receive remote invocations.

Notice that in CORBA, as in most distributed systems, remote bindings are established implicitly without application intervention. This provides the illusion that all objects are local, a property known as ‘location transparency’. CORBA does not specify when such bindings should be established or how they should be multiplexed over the underlying network connections. Instead, ORBs are free to implement implicit binding by a variety of means.

The rest of this chapter describes how omniORB manages network connections and the programming interface to fine tune the management policy.

6.2 The model

omniORB is designed from the ground up to be fully multi-threaded. The objective is to maximise the degree of concurrency and at the same time eliminate any unnecessary thread overhead. Another objective is to minimise the interference by the activities of other threads on the progress of a remote invocation. In other words, thread ‘cross-talk’ should be minimised within the ORB. To achieve these objectives, the degree of multiplexing at every level is kept to a minimum by default.

Minimising multiplexing works well when the system is relatively lightly loaded. However, when the ORB is under heavy load, it can sometimes be beneficial to conserve operating system resources such as threads and network connections by multiplexing at the ORB level. omniORB has various options that control its multiplexing behaviour.

6.3 Client side behaviour

On the client side of a connection, the thread that invokes on a proxy object drives the GIOP protocol directly and blocks on the connection to receive the reply. The first time the client makes a call to a particular address space, the ORB opens a suitable connection to the remote address space (based on the client transport rule as described in section 7.5). After the reply has been received, the ORB caches the open network connection, ready for use by another call.

If two (or more) threads in a multi-threaded client attempt to contact the same address space simultaneously, there are two different ways to proceed. The default way is to open another network connection to the server. This means that neither the client or server ORB has to perform any multiplexing on the network connections—multiplexing is performed by the operating system, which has to deal with multiplexing anyway. The second possibility is for the client to multiplex the concurrent requests on a single network connection. This conserves operating system resources (network connections), but means that both the client and server have to deal with multiplexing issues themselves.

In the default one call per connection mode, there is a limit to the number of concurrent connections that are opened, set with the maxGIOPConnectionPerServer parameter. To tell the ORB that it may multiplex calls on a single connection, set the oneCallPerConnection parameter to zero. If the oneCallPerConnection parameter is set to the default value of one, and there are more concurrent calls than specified by maxGIOPConnectionPerServer, calls block waiting for connections to become free.

Note that some server-side ORBs, including omniORB versions before version 4.0, are unable to deal with concurrent calls multiplexed on a single connection, so they serialise the calls. It is usually best to keep to the default mode of opening multiple connections.

6.3.1 Client side timeouts

omniORB can associate a timeout with a call, meaning that if the call takes too long a CORBA::TIMEOUT exception² is thrown. Timeouts can be set for the whole process, for a specific thread, or for a specific object reference.

namespace omniORB { void setClientCallTimeout(CORBA::ULong millisecs); void setClientCallTimeout(CORBA::Object_ptr obj, CORBA::ULong millisecs); void setClientThreadCallTimeout(CORBA::ULong millisecs); void setClientThreadCallDeadline(const omni_time_t& tt); void setClientConnectTimeout(CORBA::ULong millisecs); };

setClientCallTimeout() sets either the global timeout or the timeout for a specific object reference.

setClientThreadCallTimeout() sets the timeout for the calling thread. Alternatively, setClientThreadCallDeadline() sets an absolute deadline for calls on thread. The calling thread must have an omni_thread associated with it.

Accessing per-thread state is a relatively expensive operation, so per thread timeouts are disabled by default. The supportPerThreadTimeOut parameter must be set true to enable them.

To choose the timeout value to use for a call, the ORB first looks to see if there is a timeout for the object reference, then to the calling thread, and finally to the global timeout.

When a client has no existing connection to communicate with a server, it must open a new connection before performing the call. setClientConnectTimeout() sets an overriding timeout for cases where a new connection must be established. The effect of the connect timeout depends upon whether the connect timeout is greater or less than the timeout that would otherwise be used.

Connect timeout > usual timeout

If the connect timeout is set to 20 seconds, then a call that establishes a new connection will be permitted 20 seconds before it times out. Subsequent calls using the same connection have the normal 10 second timeout. If establishing the connection takes 8 seconds, then the call itself takes 5 seconds, the call succeeds despite having taken 13 seconds in total, longer than the usual timeout.

If an object reference has multiple possible endpoints available, and connecting to the first endpoint times out, only that one endpoint will have been tried before an exception is raised. However, once the timeout has occurred, the object reference will switch to use the next endpoint. If the application attempts to make another call, it will use the next endpoint.

Connect timeout < usual timeout

If the connect timeout is set to 2 seconds, the actual network-level connect is only permitted to take 2 seconds. As long as the connection is established in less than 2 seconds, the call can proceed. The 10 second call timeout still applies to the time taken for the whole call (including the connection establishment). So, if establishing the connection takes 1.5 seconds, and the call itself takes 9.5 seconds, the call will time out because although it met the connection timeout, it exceeded the 10 second total call timeout. On the other hand, if establishing the connection takes 3 seconds, the call will fail after only 2 seconds, since only 2 seconds are permitted for the connect.

If an object reference has multiple possible endpoints available, the client will attempt to connect to them in turn, until one succeeds. The connect timeout applies to each connection attempt. So with a connect timeout of 2 seconds, the client will spend up to 2 seconds attempting to connect to the first address and then, if that fails, up to 2 seconds trying the second address, and so on. The 10 second timeout still applies to the call as a whole, so if the total time taken on timed-out connection attempts exceeds 10 seconds, the call will time out.

This kind of configuration is useful where calls may take a long time to complete (so call timeouts are long), but a fast indication of connection failure is required.

6.4 Server side behaviour

The server side has two primary modes of operation: thread per connection and thread pooling. It is able to dynamically transition between the two modes, and it supports a hybrid scheme that behaves mostly like thread pooling, but has the same fast turn-around for sequences of calls as thread per connection.

6.4.1 Thread per connection mode

In thread per connection mode (the default, and the only option in omniORB versions before 4.0), each connection has a single thread dedicated to it. The thread blocks waiting for a request. When it receives one, it unmarshals the arguments, makes the up-call to the application code, marshals the reply, and goes back to watching the connection. There is thus no thread switching along the call chain, meaning the call is very efficient.

As explained above, a client can choose to multiplex multiple concurrent calls on a single connection, so once the server has received the request, and just before it makes the call into application code, it marks the connection as ‘selectable’, meaning that another thread should watch it to see if any other requests arrive. If they do, extra threads are dispatched to handle the concurrent calls. GIOP 1.2 actually allows the argument data for multiple calls to be interleaved on a connection, so the unmarshalling code has to handle that too. As soon as any multiplexing occurs on the connection, the aim of removing thread switching cannot be met, and there is inevitable inefficiency due to thread switching.

The maxServerThreadPerConnection parameter can be set to limit the number of threads that can be allocated to a single connection containing concurrent calls. Setting the parameter to 1 mimics the behaviour of omniORB versions before 4.0, that did not support calls multiplexed on one connection.

6.4.2 Thread pool mode

In thread pool mode, selected by setting the threadPerConnectionPolicy parameter to zero, a single thread watches all incoming connections. When a call arrives on one of them, a thread is chosen from a pool of threads, and set to work unmarshalling the arguments and performing the up-call. There is therefore at least one thread switch for each call.

The thread pool is not pre-initialised. Instead, threads are started on demand, and idle threads are stopped after a period of inactivity. The maximum number of threads that can be started in the pool is set with the maxServerThreadPoolSize parameter. The default is 100.

A common pattern in CORBA applications is for a client to make several calls to a single object in quick succession. To handle this situation most efficiently, the default behaviour is to not return a thread to the pool immediately after a call is finished. Instead, it is set to watch the connection it has just served for a short while, mimicking the behaviour in thread per connection mode. If a new call comes in during the watching period, the call is dispatched without any thread switching, just as in thread per connection mode. Of course, if the server is supporting a very large number of connections (more than the size of the thread pool), this policy can delay a call coming from another connection. If the threadPoolWatchConnection parameter is set to zero, connection watching is disabled and threads return to the pool immediately after finishing a single request.

In the face of multiplexed calls on a single connection, multiple threads from the pool can be dispatched for one connection, just as in thread per connection mode. With threadPoolWatchConnection set to the default value of 1, only the last thread servicing a connection will watch it when it finishes a request. Setting the parameter to a larger number allows the last n connections to watch the connection.

6.4.3 Policy transition

If the server is dealing with a relatively small number of connections, it is most efficient to use thread per connection mode. If the number of connections becomes too large, however, operating system limits on the number of threads may cause a significant slowdown, or even prevent the acceptance of new connections altogether.

To give the most efficient response in all circumstances, omniORB allows a server to start in thread per connection mode, and transition to thread pooling if many connections arrive. This is controlled with the threadPerConnectionUpperLimit and threadPerConnectionLowerLimit parameters. The upper limit must always be larger than the lower limit. The upper limit chooses the number of connections at which time the ORB transitions to thread pool mode; the lower limit selects the point at which the transition back to thread per connection is made.

For example, setting the upper limit to 50 and the lower limit to 30 would mean that the first 49 connections would receive dedicated threads. The 50th to arrive would trigger thread pooling. All future connections to arrive would make use of threads from the pool. Note that the existing dedicated threads continue to service their connections until the connections are closed. If the number of connections falls below 30, thread per connection is reactivated and new connections receive their own dedicated threads (up to the limit of 50 again). Once again, existing connections in thread pool mode stay in that mode until they are closed.

6.5 Idle connection shutdown

It is wasteful to leave a connection open when it has been left unused for a considerable time. Too many idle connections could block out new connections when the system runs out of spare communication channels. For example, most platforms have a limit on the number of file handles a process can open. Many platforms have a very small default limit like 64. The value can often be increased to a maximum of a thousand or more by changing the ‘ulimit’ in the shell.

Every so often, a thread scans all open connections to see which are idle. The scanning period (in seconds) is set with the scanGranularity parameter. The default is 5 seconds.

Outgoing connections (initiated by clients) and incoming connections (initiated by servers) have separate idle timeouts. The timeouts are set with the outConScanPeriod and inConScanPeriod parameters respectively. The values are in seconds, and must be a multiple of the scan granularity.

Beware that setting outConScanPeriod or inConScanPeriod to be equal to (or less than) scanGranularity means that connections are considered candidates for closure immediately after they are opened. That can mean that the connections are closed before any calls have been sent through them. If oneway calls are used, such connection closure can result in silent loss of calls.

6.5.1 Interoperability Considerations

The IIOP specification allows both the client and the server to shutdown a connection unilaterally. When one end is about to shutdown a connection, it should send a CloseConnection message to the other end. It should also make sure that the message will reach the other end before it proceeds to shutdown the connection.

The client should distinguish between an orderly and an abnormal connection shutdown. When a client receives a CloseConnection message before the connection is closed, the condition is an orderly shutdown. If the message is not received, the condition is an abnormal shutdown. In an abnormal shutdown, the ORB should raise a COMM_FAILURE exception whereas in an orderly shutdown, the ORB should not raise an exception and should try to re-establish a new connection transparently.

omniORB implements these semantics completely. However, it is known that some ORBs are not (yet) able to distinguish between an orderly and an abnormal shutdown. Usually this is manifested as the client in these ORBs seeing a COMM_FAILURE occasionally when connected to an omniORB server. The work-around is either to catch the exception in the application code and retry, or to turn off the idle connection shutdown inside the omniORB server.

GIOP 1.2 supports ‘bidirectional GIOP’, which permits the rôles to be reversed.

Or CORBA::TRANSIENT if the backwards-compatibility throwTransientOnTimeOut parameter is set to 1.

Chapter 6 Connection and Thread Management

6.1 Background