[omniORB] Py_ServantLocator Core Dump

Gary Pennington Gary.Pennington@uk.sun.com
Wed, 20 Sep 2000 14:39:29 +0100


Duncan Grisby wrote:

> On Monday 18 September, Gary Pennington wrote:
>
> > I've done some more investigating, still no debug build unfortunately, but I
> > have got some information.
>
> What problem do you have in creating a debug build?

Basically I'd lent my compiler CD to a friend and only got it back last night. You
wouldn't believe how difficult it is to get hold of stuff if you are only an
employee...

Anyway, I now have a debug build of omniORB3.0.1 and omniORBpy1.1. Built using
Forte Workshop 6, Solaris 8 and the latest and greatest patches.

The problem is reproducible and occurs at line 1097 in pyServant.cc

t@64 (l@11) signal SEGV (no mapping at the fault address) in
Py_ServantLocator::postinvoke at line 1097 in file "pyServant.cc"
 1097     pyos = (omniPy::Py_omniServant*)serv->_ptrToInterface("Py_omniServant");

(/opt/SUNWspro/bin/../WS6/bin/sparcv9/dbx) print serv
serv = 0x2faa7c

(/opt/SUNWspro/bin/../WS6/bin/sparcv9/dbx) print -r *serv
dbx: read of 4 bytes at address 0x72790090 failed -- Error 0
dbx: reference through nil pointer

(/opt/SUNWspro/bin/../WS6/bin/sparcv9/dbx) where
current thread: t@64
=>[1] Py_ServantLocator::postinvoke(this = 0x272d00, oid = CLASS, poa = 0x272ad8,
operation = 0xfdc07ab0 "getHistory", cookie = 0xb4bd8, serv = 0x2faa7c), line 1097
in "pyServant.cc"
  [2] _0RL_lcfn_3c165f58b5a16b59_90000000(cd = 0xfdc07618, svnt = 0x272d08), line
881 in "poastubs.cc"
  [3] omniCallDescriptor::doLocalCall(this = 0xfdc07618, servant = 0x272d08), line
90 in "callDescriptor.h"
  [4] omniOrbPOA::dispatch(this = 0x27a2a0, call_desc = CLASS, id = 0x12f630),
line 1387 in "poa.cc"
  [5] omniLocalIdentity::dispatch(this = 0x12f630, call_desc = CLASS), line 114 in
"localIdentity.cc"
  [6] omniObjRef::_invoke(this = 0x2a5ab0, call_desc = CLASS, do_assert = '\001'),
line 523 in "omniObjRef.cc"
  [7] PortableServer::_objref_ServantLocator::postinvoke(this = 0x2a5aa0, oid =
CLASS, adapter = 0x272ad8, operation = 0xfdc07ab0 "getHistory", the_cookie =
0xb4bd8, the_servant = 0x2faa7c), line 889 in "poastubs.cc"
  [8] omniOrbPOA::call_postinvoke(this = 0x272ad8, sl = 0x2a5aa0, oid = CLASS, op
= 0xfdc07ab0 "getHistory", cookie = 0xb4bd8, servant = 0x2faa7c), line 2527 in
"poa.cc"
  [9] omniOrbPOA::dispatch_to_sl(this = 0x272ad8, giop_s = CLASS, key = 0x31e030
"\xffMainPOA\xffGameMgrPOA\xfe9\xc8\x90\xe0\x93^P", keysize = 34), line 2511 in
"poa.cc"
  [10] omniOrbPOA::dispatch(this = 0x272ad8, giop_s = CLASS, key = 0x31e030
"\xffMainPOA\xffGameMgrPOA\xfe9\xc8\x90\xe0\x93^P", keysize = 34), line 1356 in
"poa.cc"
  [11] GIOP_S::HandleRequest(this = 0xfdc07a64, byteorder = '\0'), line 616 in
"giopServer.cc"
  [12] GIOP_S::dispatcher(s = 0x337e00), line 406 in "giopServer.cc"
  [13] tcpSocketWorker::_realRun(arg = 0x337e00), line 1610 in
"tcpSocketMTfactory.cc"
  [14] omniORB::giopServerThreadWrapper::run(this = 0x100310, fn = 0xfe635808 =
&tcpSocketWorker::_realRun(void*), arg = 0x337e00), line 547 in "omniORB.h"
  [15] tcpSocketWorker::run(this = 0x2cab30, arg = 0x337e00), line 1581 in
"tcpSocketMTfactory.cc"
  [16] omni_thread_wrapper(ptr = 0x2cab30), line 421 in "posix.cc"

I ran this a couple of times and sometimes I got a different error message :-

t@143 (l@9) signal BUS (invalid address alignment) in
Py_ServantLocator::postinvoke at line 1097 in file "pyServant.cc"
 1097     pyos = (omniPy::Py_omniServant*)serv->_ptrToInterface("Py_omniServant");

I guess this is because there is a corrupted pointer which sometimes causes a SEGV
(when referencing invalid memory) or a BUS (when referencing memory in our address
space which isn't aligned correctly). I checked up through the call stack and the
servant value appears to be the value returned at 7 by sl_preinvoke, so I don't
know why it isn't valid.

I am also getting a problem where occasionally the following occurs :-

t@125 (l@10) signal BUS (invalid address alignment) in PyDict_GetItem at 0x47768
0x00047768: PyDict_GetItem+0x0004:      ld      [%i0 + 0x4], %g3
Current function is omniPy::Py_omniServant::_dispatch
  436     PyObject* desc = PyDict_GetItemString(opdict_,
(char*)giop_s.operation());

I think this is a slightly different problem. opdict_ is set to :-

(/opt/SUNWspro/bin/../WS6/bin/sparcv9/dbx) print opdict_
opdict_ = 0xffffffff

I'm not sure how this is happening because from my reading of pyServant.cc the
opdict_ has an incremented reference in the Py_omniServant constructor and this is
not decremented until the destructor is invoked.
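
The BUS error above is what you would expect from that opdict_ value. Here is a small sketch of the arithmetic. It assumes a 32-bit address space (the pointers in the traces are all 32 bits wide) and that the "ld" shown by dbx is a 4-byte load which needs a 4-byte aligned address. The helper names are made up for illustration:

```python
WORD = 4               # bytes read by the `ld` instruction (assumed 4-byte load)
MASK_32 = 0xFFFFFFFF   # assumption: the process has a 32-bit address space


def effective_address(base, offset):
    """Address touched by a load at [base + offset], wrapped to 32 bits."""
    return (base + offset) & MASK_32


def is_word_aligned(addr):
    """True if addr is a multiple of the word size."""
    return addr % WORD == 0


# PyDict_GetItem+0x4 does: ld [%i0 + 0x4], %g3 with %i0 == opdict_
opdict = 0xFFFFFFFF
target = effective_address(opdict, 0x4)
print(hex(target), is_word_aligned(target))   # 0x3 False

# Addresses from the postinvoke SEGV trace, for comparison
print(is_word_aligned(0x2FAA7C))     # serv itself: True
print(is_word_aligned(0x72790090))   # address dbx failed to read: True
```

So 0xffffffff + 4 wraps round to 0x3, which is misaligned, hence BUS rather than SEGV. The addresses in the SEGV case are aligned, which fits "no mapping at the fault address". This only explains the symptom; it says nothing about how opdict_ came to hold 0xffffffff.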

>
>
> [...]
> > self.factory.persist is locked on entry and released on termination.
> > I mess around with serv state using __getstate__ to copy the serv dict object
> > and delete the "__omni_svt" entry from the copied dictionary.
>
> I suspect that this might be the problem. While you are doing the
> postinvoke(), another call might come in for the same object, causing
> preinvoke() to be called again. If you return the same servant object
> at a time when you are messing with __omni_svt, that could confuse
> things enough to cause a segfault.

This was a red herring. I even get the error when I only run tests which
DO NOT invoke the persistence functionality. I don't actually mess around with
__omni_svt other than to remove its entry in the copied object dictionary, e.g.

def __getstate__(self):
    # Take the lock before entering try, so release() only runs if we hold it
    self.lock.acquire()
    try:
        tempDICT = self.__dict__.copy()
    finally:
        self.lock.release()
    del tempDICT["__omni_svt"]    # Unserializable omni artifact
    del tempDICT["lock"]          # My lock, unserializable
    return tempDICT

This shouldn't cause any problems and as I say, the problem occurs even when this
code path isn't invoked.
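
To show that the copy is the only thing touched, here is a minimal standalone sketch of the same pattern with no omniORB involved. The Game class, its attributes, and the __setstate__ method are hypothetical stand-ins, and the "__omni_svt" entry is faked with a lock object purely so that it is something unserializable. copy.deepcopy is used because it goes through the same __getstate__/__setstate__ hooks as pickle:

```python
import copy
import threading


class Game:
    """Hypothetical stand-in for a servant class; not omniORBpy code."""

    def __init__(self, name):
        self.name = name
        self.history = []
        self.lock = threading.Lock()
        # Fake the entry that omniORBpy adds to the instance dictionary.
        self.__dict__["__omni_svt"] = threading.Lock()

    def __getstate__(self):
        with self.lock:
            state = self.__dict__.copy()   # shallow copy; original untouched
        del state["__omni_svt"]            # unserializable stand-in
        del state["lock"]                  # unserializable lock
        return state

    def __setstate__(self, state):
        self.__dict__.update(state)
        self.lock = threading.Lock()       # recreate the lock on restore


game = Game("chess")
game.history.append("e4")

restored = copy.deepcopy(game)             # calls __getstate__ / __setstate__

print(restored.name, restored.history)
print("__omni_svt" in game.__dict__)       # original still has its entry
print("__omni_svt" in restored.__dict__)   # the copy does not
```

The deletes operate on the copied dictionary only, so the live object keeps its "__omni_svt" entry throughout.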

>
>
> Cheers,
>
> Duncan.
>
> --
>  -- Duncan Grisby  \  Research Engineer  --
>   -- AT&T Laboratories Cambridge          --
>    -- http://www.uk.research.att.com/~dpg1 --

Anyway, I hope that this is enough detail to be going on with. I hope somebody can
tell me what I'm doing wrong here, because I'm lost. I can also send up my
implementation (in Python) of a ServantLocator if that would be helpful; it's only
about 30 lines or so of code.


Gary