[omniORB] What exactly is the impact of connectionWatchPeriod?

Duncan Grisby duncan at grisby.org
Sun Apr 24 15:16:51 UTC 2022


On Wed, 2022-04-06 at 13:20 +0200, Martin Ba via omniORB-list wrote:

> Context:
> * omniORB 4.2.4 on Windows
> * Windows Application servicing a single client application
> 
> We recently hit some communication delays with omniORB when we had a
> call sequence roughly like this:
> 
> 1. server invokes callback (oneway) function on client
> 2. client does several upcalls from within its callback, but these
> calls are sequenced in a 50ms raster on the server thread servicing
> them.
> (Even though ping is < 2ms)
> 
> In production we hit these delays correlating to the
> [`connectionWatchPeriod`][1] setting and sure enough, reducing it
> from its default of 50ms to 1ms resolved our problem.

This is a sign that the client is using the same connection to the
server as the one carrying the original incoming call. The thread that
was handling that connection is busy, so another thread must be
dispatched for each new call.

For this to be happening, I expect you either

  1. have set maxGIOPConnectionPerServer to 1

or

  2. are using bi-directional GIOP, which has the same effect

Is that the case?
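
For reference, those two setups would look something like this in an
omniORB configuration file (the parameter names are real omniORB
options; the values are just illustrative):

  maxGIOPConnectionPerServer = 1   # client limited to one connection

or, for bi-directional GIOP:

  offerBiDirectionalGIOP  = 1      # client side
  acceptBiDirectionalGIOP = 1      # server side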


What is happening is that omniORB has a thread that is responsible for
watching for incoming calls. When a call comes in on a connection, a
thread unmarshals the data from the connection and starts handling the
incoming call. While the thread is busy doing that, it is possible for
another call to arrive on the same connection. The thread that was
already handling that connection is busy, so another thread must be
dispatched.

The thread that is watching connections spends its time blocked in
select() (or poll() or equivalent). At the instant the thread handling
an incoming call finishes unmarshalling the parameters, the connection
becomes "selectable" again, but the thread that is watching connections
is blocked in select(), so it cannot be told immediately that it now
has another connection to watch. That is what connectionWatchPeriod
controls: after that amount of time, the select() call times out and
the thread re-checks which connections it should be watching. Setting
the period shorter means that the connection watching thread wakes up
more often, so it uses more CPU.
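
To make that concrete, here is a minimal compilable sketch
(Unix-flavoured; illustrative, not omniORB's actual source) of a
watcher loop whose wake-up latency is bounded by the watch period. The
parameter value is in microseconds, so the default of 50000 is the 50ms
you observed, and your fix corresponds to connectionWatchPeriod = 1000:

  #include <sys/select.h>
  #include <cstdio>

  int main() {
    const long watchPeriodUsec = 50000;  // omniORB default: 50000us = 50ms
    for (int i = 0; i < 3; ++i) {
      // The set of watched connections would be rebuilt here on each
      // iteration; this is the only point at which a connection handed
      // back by a worker thread gets noticed again.
      timeval timeout = { 0, watchPeriodUsec };
      select(0, nullptr, nullptr, nullptr, &timeout);  // sleep up to one period
      std::printf("woke up; re-checking which connections to watch\n");
    }
    return 0;
  }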

On Unix-like platforms, the select / poll is also made to wait on a
pipe, which is used to efficiently wake up the connection watching
thread earlier than the connectionWatchPeriod. That's what
connectionWatchImmediate does. That is not possible on Windows because
select() can only watch sockets, and poking a socket to wake up the
thread is costly enough that it is better just to let the timeout take
effect.
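
For reference, the pipe trick looks roughly like this (illustrative
names, not omniORB's internals):

  #include <sys/select.h>
  #include <unistd.h>
  #include <cstdio>

  int main() {
    int wakeup[2];
    if (pipe(wakeup) != 0) return 1;   // [0] = read end, [1] = write end

    // Pretend a worker thread has just handed a connection back and
    // wants the watcher to notice straight away:
    (void)write(wakeup[1], "x", 1);

    fd_set readable;
    FD_ZERO(&readable);
    FD_SET(wakeup[0], &readable);      // the watcher selects on the pipe too
    timeval timeout = { 0, 50000 };    // connectionWatchPeriod as a fallback

    // Returns at once because the pipe is readable; without the pipe it
    // would sleep for the full 50ms timeout.
    select(wakeup[0] + 1, &readable, nullptr, nullptr, &timeout);

    char c;
    (void)read(wakeup[0], &c, 1);      // drain the wake-up byte
    std::printf("watcher woken immediately\n");
    return 0;
  }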

[...]

> *However*, the fact that we are not able to reproduce it outside of
> production -- I tried setting the watch period to 500+ms and still
> couldn't reproduce any delays -- makes me think we might be missing
> something here.

Could your production system have a different configuration for other
connection parameters? omniORB's default configuration allows a client
to open up to 5 concurrent connections to the same server. It does that
in preference to multiplexing concurrent calls on a single connection,
because separate connections let the server handle the calls more
efficiently. Does your production system perhaps set
maxGIOPConnectionPerServer to 1?
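
One way to check, assuming you can restart the processes with an extra
option: any omniORB configuration parameter can also be given on the
command line, and the dumpConfiguration parameter makes the ORB print
every parameter value at initialisation, so you can diff production
against test:

  # in omniORB.cfg:
  dumpConfiguration = 1

  # or on the command line ("myserver" stands in for your binary):
  myserver -ORBdumpConfiguration 1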

Duncan.

-- 
 -- Duncan Grisby --
  -- duncan at grisby.org --
   -- http://www.grisby.org --



