[omniORB] Fwd: OmniORB4.1.6 does not detect remote connection loss on time

vishwanath Bhat vishwamegur at gmail.com
Wed Jun 11 15:42:40 BST 2014


Hello Experts

Looking for some help on a problem that I am facing with detecting a peer
loss in OmniORB4.1.6, which was not an issue in ORB4.1.3.

Here is my setup :
-----------------------------
-----------
Setup :
-----------

Have 2 systems ( A and B ) that are talking over IP in point to point wired
link.

Have identical daemon on both sides that is hosting its services using
OmniORB4.1.6 infrastructure.

Have a Ping code that is a part of this daemon itself that will see if the
other identical daemon is running or not by trying to reset some of remote
daemon's vailables.

----------
Code ::
------------

1/. Ping system A from B.
2/. Sleep 15 seconds
3/. Goto step-1 and ping again and this goes on in a loop..

This functionality is working perfecet.

However, if we have certain condition like :

1/. Ping system A from B. --> Lets say this goes okay..
2/. Started sleeping for 15 seconds.
3/. system-B goes away ( say it did a reset ) and since its a point to
point link, the link on this A is also unloaded.
4/. Sleep of 15 seconds finished
5/. Now the ping call that is destined for system B must return immediately
since the network itself is unloaded.

OR

1/. Ping system A from B. --> Lets say this goes okay..
2/. Sleep for 15 seconds.
3/. System-B started resetting and is dumping.
4/. Attempt to Ping system-B
5/. Now the ping call that is destined for system B must return immediately
or sooner since the network itself is unloaded.


==============

I see this is working perfect in ORB4.1.3   -but- in Orb4.1.6, I see A's
ORB takes a lot of time ( in the order of 200+ seconds ) to actually return
to the ping code saying it failed.
It even waits for B to come up and then fail the ping

================

The Message that are displayed when traces enabled at 40 are as follows

============
**************
In 4.1.3 :
**************


00001509.838454| 1776|TRACE        | 110|INF> ORB: omniORB: (?) 2006-09-29
16:07:43.804590: Invoke 'handleSurvPing' on remote: root/Fips MT
POA<Local.Service>
00001509.838454| 1776|TRACE        | 110|
00001509.838844| 1776|TRACE        | 110|INF> ORB: omniORB: (?) 2006-09-29
16:07:43.804912: sendChunk: to giop:tcp:x.y.z.a:90000 80 bytes
00001509.838844| 1776|TRACE        | 110|
00001509.839234| 1776|TRACE        | 110|INF> ORB: omniORB: (?) 2006-09-29
16:07:43.805266:
00001509.839234| 1776|TRACE        | 110|4749 4f50 0102 0000 0000 0044 0000
0496 GIOP.......D....
00001509.839234| 1776|TRACE        | 110|0300 0000 0000 0000 0000 001a ff46
6970 .............Fip
00001509.839234| 1776|TRACE        | 110|7320 4d54 2050 4f41 004c 6f63 616c
2053 s MT POA.Local S
00001509.839234| 1776|TRACE        | 110|6572 7669 6365 697a 0000 000f 6861
6e64 erviceiz....hand
00001509.839234| 1776|TRACE        | 110|6c65 5375 7276 5069 6e67 000f 0000
0000 leSurvPing......
00001509.839234| 1776|TRACE        | 110|
00001509.846890| 1776|TRACE        | 110|INF> ORB: omniORB: (?) 2006-09-29
16:07:43.812898: inputMessage: from giop:tcp:x.y.z.a:90000 32 bytes
00001509.846890| 1776|TRACE        | 110|
00001509.847166| 1776|TRACE        | 110|INF> ORB: omniORB: (?) 2006-09-29
16:07:43.813301:
00001509.847166| 1776|TRACE        | 110|4749 4f50 0102 0001 0000 0014 0000
0496 GIOP............
00001509.847166| 1776|TRACE        | 110|0000 0000 0000 0000 0000 0000 0fbe
3fa3 ..............?.
00001509.847166| 1776|TRACE        | 110|
00001509.847479| 1776|TRACE        | 110|INF> ORB: omniORB: (?) 2006-09-29
16:07:43.813619: Return 'handleSurvPing' on remote: root/Fips MT
POA<Local.Service>
00001509.847479| 1776|TRACE        | 110|

00001509.847548| 1776|TRACE       | 267| Surveillance success,
Communication IDs - Local: 0xFBE3FA3 Remote 0xFBE3FA3

##>> This says the ping went okay <<##

00001524.850535| 1776|TRACE        | 110|INF> ORB: omniORB: (?) 2006-09-29
16:07:58.816821: Invoke 'handleSurvPing' on remote: root/Fips MT
POA<Local.Service>
00001524.850535| 1776|TRACE        | 110|
00001524.851322| 1776|TRACE        | 110|INF> ORB: omniORB: (?) 2006-09-29
16:07:58.817619: sendChunk: to giop:tcp:x.y.z.a:90000 80 bytes
00001524.851322| 1776|TRACE        | 110|
00001524.851676| 1776|TRACE        | 110|INF> ORB: omniORB: (?) 2006-09-29
16:07:58.817899:
00001524.851676| 1776|TRACE        | 110|4749 4f50 0102 0000 0000 0044 0000
049c GIOP.......D....
00001524.851676| 1776|TRACE        | 110|0300 0000 0000 0000 0000 001a ff46
6970 .............Fip
00001524.851676| 1776|TRACE        | 110|7320 4d54 2050 4f41 004c 6f63 616c
2053 s MT POA.Local S
00001524.851676| 1776|TRACE        | 110|6572 7669 6365 697a 0000 000f 6861
6e64 erviceiz....hand
00001524.851676| 1776|TRACE        | 110|6c65 5375 7276 5069 6e67 000f 0000
0000 leSurvPing......
00001524.851676| 1776|TRACE        | 110|

00001537.998148|  699|           |1078|INFO: device0 removed because buddy
clock was lost!
00001538.038406|  699|           |1078|INFO: device2 removed because buddy
clock was lost!
00001538.039077|  699|           |1091|ERR: device_release[-18]

##>> Point to point link was unloaded since remote was reset <<##

00001540.542412| 1776|TRACE        | 110|INF> ORB: omniORB: (?) 2006-09-29
16:08:14.508872: Error in network receive (start of message):
giop:tcp:x.y.z.a:90000

##>> Comm failure detected immediately <<##

00001540.542412| 1776|TRACE        | 110|
00001540.542652| 1776|TRACE        | 110|INF> ORB: omniORB: (?) 2006-09-29
16:08:14.509137: throw giopStream::CommFailure from
giopStream.cc:874(0,MAYBE,COMM_FAILURE_WaitingForReply)
00001540.542652| 1776|TRACE        | 110|
00001540.543409| 1776|TRACE        | 110|INF> ORB: omniORB: (?) 2006-09-29
16:08:14.509867: Client connection refcount = 0
00001540.543409| 1776|TRACE        | 110|
00001540.543624| 1776|TRACE        | 110|INF> ORB: omniORB: (?) 2006-09-29
16:08:14.510131: Client close connection to giop:tcp:x.y.z.a:90000
00001540.543624| 1776|TRACE        | 110|
00001540.544152| 1776|TRACE        | 110|INF> ORB: omniORB: (?) 2006-09-29
16:08:14.510594: throw COMM_FAILURE from omniObjRef.cc:792
(MAYBE,COMM_FAILURE_WaitingForReply)

=====================================================================================================================

--------------
In 4.1.6 :
-------------------

00000214.088587390| 3124|TRACE        | 114|INF> ORB: omniORB: (?)
2009-02-19 06:36:54.198456: Invoke 'handleSurvPing' on remote: root/Fips MT
POA<Local.Service>
00000214.088587390| 3124|TRACE        | 114|
00000214.088659147| 3124|TRACE        | 114|INF> ORB: omniORB: (?)
2009-02-19 06:36:54.198588: sendChunk: to giop:tcp:x.y.z.a:90000 80 bytes
00000214.088659147| 3124|TRACE        | 114|
00000214.088964359| 3124|TRACE        | 114|INF> ORB: omniORB: (?)
2009-02-19 06:36:54.198836:
00000214.088964359| 3124|TRACE        | 114|4749 4f50 0102 0000 0000 0044
0000 0040 GIOP.......D...@
00000214.088964359| 3124|TRACE        | 114|0300 0000 0000 0000 0000 001a
ff46 6970 .............Fip
00000214.088964359| 3124|TRACE        | 114|7320 4d54 2050 4f41 004c 6f63
616c 2053 s MT POA.Local S
00000214.088964359| 3124|TRACE        | 114|6572 7669 6365 697a 0000 000f
6861 6e64 erviceiz....hand
00000214.088964359| 3124|TRACE        | 114|6c65 5375 7276 5069 6e67 000f
0000 0000 leSurvPing......
00000214.088964359| 3124|TRACE        | 114|
00000214.092186617| 3124|TRACE        | 114|INF> ORB: omniORB: (?)
2009-02-19 06:36:54.202080: inputMessage: from giop:tcp:x.y.z.a:90000 32
bytes
00000214.092186617| 3124|TRACE        | 114|
00000214.092253468| 3124|TRACE        | 114|INF> ORB: omniORB: (?)
2009-02-19 06:36:54.202168:
00000214.092253468| 3124|TRACE        | 114|4749 4f50 0102 0001 0000 0014
0000 0040 GIOP...........@
00000214.092253468| 3124|TRACE        | 114|0000 0000 0000 0000 0000 0000
417a 5503 ............AzU.
00000214.092253468| 3124|TRACE        | 114|
00000214.092315155| 3124|TRACE        | 114|INF> ORB: omniORB: (?)
2009-02-19 06:36:54.202246: Return 'handleSurvPing' on remote: root/Fips MT
POA<Local.Service>
00000214.092315155| 3124|TRACE        | 114|
00000214.092336833| 3124|TRACE        | 271| Surveillance success,
Communication IDs - Local: 0x417A5503 Remote 0x417A5503

##>> Showing the Ping has gone okay <<##

00000229.092562595| 3124|TRACE        | 114|INF> ORB: omniORB: (?)
2009-02-19 06:37:09.202431: Invoke 'handleSurvPing' on remote: root/Fips MT
POA<Local.Service>
00000229.092562595| 3124|TRACE        | 114|
00000229.092634882| 3124|TRACE        | 114|INF> ORB: omniORB: (?)
2009-02-19 06:37:09.202564: sendChunk: to giop:tcp:x.y.z.a:90000 80 bytes
00000229.092634882| 3124|TRACE        | 114|
00000229.092733758| 3124|TRACE        | 114|INF> ORB: omniORB: (?)
2009-02-19 06:37:09.202613:
00000229.092733758| 3124|TRACE        | 114|4749 4f50 0102 0000 0000 0044
0000 0042 GIOP.......D...B
00000229.092733758| 3124|TRACE        | 114|0300 0000 0000 0000 0000 001a
ff46 6970 .............Fip
00000229.092733758| 3124|TRACE        | 114|7320 4d54 2050 4f41 004c 6f63
616c 2053 s MT POA.Local S
00000229.092733758| 3124|TRACE        | 114|6572 7669 6365 697a 0000 000f
6861 6e64 erviceiz....hand
00000229.092733758| 3124|TRACE        | 114|6c65 5375 7276 5069 6e67 000f
0000 0000 leSurvPing......
00000229.092733758| 3124|TRACE        | 114|

##>> Another command in flight but the remote has started resetting so the
call is not honored yet <<##

00000256.696070735|  738|   |1104|INFO: device0 removed because buddy clock
was lost!
00000256.917866762| 2991|    | 709|ERR: device_write[-18]
00000257.208085276|  738|  |1104|INFO: device2 removed because buddy clock
was lost!

##>> At 257 seconds is when the point ti point module  as unloaded but ORB
did not detect the loss  <<##

00000257.208401978|  738|        1117|ERR: device_release[-18]
00000272.826223240| 3453|          |1037|INFO: device2 open, ip  (ifconfig
up)
00000272.833337367| 3452|        |1037|INFO: device0 open, ip (ifconfig up)

##>> Remote system even came back on and we loaded the point to point
module since we detected the link again.. <<##

00000441.880148020| 3124|TRACE        | 114|INF> ORB: omniORB: (?)
2009-02-19 06:40:41.990036: Error in network receive (start of message):
giop:tcp:x.y.z.a:90000
00000441.880148020| 3124|TRACE        | 114|

##> It did wait really a long long time to detect the failure since the
loss of remote  <<##

00000441.880201124| 3124|TRACE        | 114|INF> ORB: omniORB: (?)
2009-02-19 06:40:41.990128: throw giopStream::CommFailure from
giopStream.cc:879(0,MAYBE,COMM_FAILURE_WaitingForReply)
00000441.880201124| 3124|TRACE        | 114|
00000441.880471357| 3124|TRACE        | 114|INF> ORB: omniORB: (?)
2009-02-19 06:40:41.990401: Client connection refcount = 0
00000441.880471357| 3124|TRACE        | 114|
00000441.880511072| 3124|TRACE        | 114|INF> ORB: omniORB: (?)
2009-02-19 06:40:41.990448: Client close connection to
giop:tcp:x.y.z.a:90000
00000441.880511072| 3124|TRACE        | 114|
00000441.880628325| 3124|TRACE        | 114|INF> ORB: omniORB: (?)
2009-02-19 06:40:41.990557: throw COMM_FAILURE from omniObjRef.cc:804
(MAYBE,COMM_FAILURE_WaitingForReply)
00000441.880628325| 3124|TRACE        | 114|


Interpreting :: 00000441.880628325  ==> 441 seconds -and- 880628325
microseconds since the boot

==========

Its Very easy to recreate this issue at will.

==============

Here is the Config files :

*********************
CONFIG for 4.1.6 :
*********************

maxGIOPVersion = 1.2

strictIIOP = 1

lcdMode = 0

tcAliasExpand = 0

useTypeCodeIndirections = 1

acceptMisalignedTcIndirections = 0

scanGranularity = 5

nativeCharCodeSet = ISO-8859-1

nativeWCharCodeSet = UTF-16

abortOnInternalError = 0

abortOnNativeException = 0

 InitRef =
NameService_LOCAL=corbaname:omniunix:/tmp/forb.socket.fsp.local.omninames
#deleted some of the internal IPS

clientTransportRule = *
     unix,tcp

clientCallTimeOutPeriod = 0

clientConnectTimeOutPeriod = 15000

supportPerThreadTimeOut = 0

resetTimeOutOnRetries = 0

outConScanPeriod = 30

maxGIOPConnectionPerServer = 5

oneCallPerConnection = 0

offerBiDirectionalGIOP = 0

diiThrowsSysExceptions = 0

verifyObjectExistsAndType = 0

giopTargetAddressMode = 0

immediateAddressSwitch = 0

serverTransportRule = *
    unix,tcp

serverCallTimeOutPeriod = 0

inConScanPeriod = 0

threadPerConnectionPolicy = 0

maxServerThreadPerConnection = 100


maxServerThreadPoolSize = 250

threadPerConnectionUpperLimit = 10000

threadPerConnectionLowerLimit = 9000


threadPoolWatchConnection = 0


connectionWatchPeriod = 1

connectionWatchImmediate = 0

acceptBiDirectionalGIOP = 0

unixTransportDirectory = /tmp/

unixTransportPermission = 0777

supportCurrent = 0

copyValuesInLocalCalls = 1

objectTableSize = 0

poaHoldRequestTimeout = 0

poaUniquePersistentSystemIds = 1

supportBootstrapAgent = 0

===========================

*********************
CONFIG for 4.1.6 :
*********************

maxGIOPVersion = 1.2

strictIIOP = 1

lcdMode = 0

tcAliasExpand = 0

useTypeCodeIndirections = 1

acceptMisalignedTcIndirections = 0

scanGranularity = 5

nativeCharCodeSet = ISO-8859-1

nativeWCharCodeSet = UTF-16

abortOnInternalError = 0

abortOnNativeException = 0

InitRef =
NameService_LOCAL=corbaname:omniunix:/tmp/forb.socket.fsp.local.omninames

clientTransportRule = *unix,tcp

clientCallTimeOutPeriod = 0

clientConnectTimeOutPeriod = 15000

supportPerThreadTimeOut = 0

resetTimeOutOnRetries = 0

outConScanPeriod = 30

maxGIOPConnectionPerServer = 5

oneCallPerConnection = 0

offerBiDirectionalGIOP = 0

diiThrowsSysExceptions = 0

verifyObjectExistsAndType = 0

giopTargetAddressMode = 0

immediateAddressSwitch = 0

serverTransportRule = *unix,tcp

serverCallTimeOutPeriod = 0

inConScanPeriod = 0

threadPerConnectionPolicy = 0

maxServerThreadPerConnection = 100

maxServerThreadPoolSize = 50

threadPerConnectionUpperLimit = 10000

threadPerConnectionLowerLimit = 9000

threadPoolWatchConnection = 0

connectionWatchPeriod = 1

connectionWatchImmediate = 0

acceptBiDirectionalGIOP = 0

unixTransportDirectory = /tmp/

unixTransportPermission = 0777

supportCurrent = 0

copyValuesInLocalCalls = 1

objectTableSize = 0

poaHoldRequestTimeout = 0

poaUniquePersistentSystemIds = 1

supportBootstrapAgent = 0

===================

!! THANK YOU VERY MUCH !!

Really appreciate any help in resolving this.
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://www.omniorb-support.com/pipermail/omniorb-list/attachments/20140611/3f6aef8b/attachment-0001.html>


More information about the omniORB-list mailing list