[omniORB] segmentation violation at startup

henning.schmidt@philips.com henning.schmidt@philips.com
Fri, 30 Mar 2001 14:00:29 -0600


Just today there was a message on this list about a segmentation
violation at startup when the omni-libs were linked into some other
library that was then linked into the executable. I deleted that message
because I did not know anything about it. So I started my day -- and ran
into exactly the same problem (I guess).

BTW I am running omniORB 3.0.3 on SGI IRIX 6.5.10 (using the SGI C++
compiler MIPSPro 7.3.1.2)

-----------------------------------------------
The situation:
- I have my own library A that encapsulates some omniORB APIs.
- I have a second library B that does some other stuff and uses
  library A.
- Both libA and libB are dynamic libraries, and both reference the
  dynamic versions of the omniORB libs.
- Finally I link libB to my executable. Since libB references libA in
  its .liblist table, the runtime linker will find everything it needs.
-> Starting up the app gives me an immediate SEGV (before main() is
   entered).
-> Purify leads me to a "zero page read" in some system call behind the
   call to pthread_key_create() in posix.cc/line 304.
-> The funny thing is that this does not happen if I jot down a little
   test executable to simulate this. It only happens if my app is a
   pretty big one (in my case a ~40 MB executable).

Note that my executable does not call any code inside libA, libB or the
omni-libs. The thing crashes during initialization just by linking in
the libs.

-----------------------------------------------
The workaround:
- I make libA a static lib
- I leave libB a dynamic lib and make it resolve its symbols from the
  static libA and the *static* versions of the omni-libs
- the app links against the dynamic libB
-> this works

-----------------------------------------------
Some analysis of what is going on (may be inaccurate in some points,
I've investigated the omnithread lib today for the first time ...)
- At the very end of file omnithread.h there's the declaration of a
  static variable omni_thread_init of type omni_thread::init_t.
- This variable gets included into any source file that includes this
  header.
- This means that the variable eventually ends up in a number of
  different modules. That's fine since the variable is static to the
  module.
- The variable is initialized before the main() routine is entered.
  After all, that is the whole point of this variable.
- Initializing this variable means calling its constructor, which is
  defined in file posix.cc and thus comes from module posix.o in the
  omnithread library.
- That c'tor is written so that its critical parts should get executed
  only once (*). Calling it means initializing the omnithread lib.
- That c'tor uses variables that are global to the file posix.cc and
  thus should be there when that module (posix.o) is loaded.
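
To make the pattern described above concrete, here is a minimal
single-file sketch of a header-defined static init object guarded by a
counter. It is not the omnithread source: the namespace counter_demo,
the variables init_count and real_init_runs, and the helper
check_single_init() are hypothetical names, and the two static objects
merely simulate two modules that each include the header.

```cpp
#include <cassert>

// Hypothetical stand-in for the file-global state in posix.cc.
namespace counter_demo {

int init_count = 0;      // number of init_t objects constructed so far
int real_init_runs = 0;  // number of times the guarded part actually ran

struct init_t {
    init_t() {
        // Only the first constructor call performs the real work.
        if (init_count++ == 0) {
            ++real_init_runs;  // stands in for pthread_key_create() etc.
        }
    }
};

}  // namespace counter_demo

// In a real header this definition sits at the end of the header, so
// every module that includes it gets its own private copy.
static counter_demo::init_t unit_a_init;  // simulates module A
static counter_demo::init_t unit_b_init;  // simulates module B

// Checks that two init objects triggered one real initialization.
void check_single_init() {
    assert(counter_demo::init_count == 2);
    assert(counter_demo::real_init_runs == 1);
}
```

The guard only works if init_count is already zero and addressable when
the first constructor runs. Within one module that is guaranteed; across
shared libraries it depends on the library's data being set up first,
which is the assumption the analysis here calls into question.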

-> Initializing this variable will be triggered by whichever module is
loaded first (there is no real control over the order in which modules
are loaded; however, you can influence it to some extent by the order of
arguments on the link line). My guess is that the whole issue has
something to do with the order in which the modules are loaded. If some
module that happens to include omnithread.h (and thus this init
variable) gets loaded before the omnithread library is loaded ... can it
be that the initialization code that runs inside the module posix.o uses
uninitialized variables from its own module, or even variables that are
not really there at that time ... "zero page read" ... ?

-> Again, I am not sure what the cause of all this really is. But it
might be safer to initialize the omnithread library in some other way.
E.g. my linker (ld32 on SGI) has an option when building dynamic
libraries that lets you specify a symbol (function) in the shared lib
that gets executed directly after loading the lib. This way I am sure
that the lib is actually loaded before any of its methods are executed.
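
As a rough illustration of that idea, the sketch below moves the setup
into an explicit entry point with C linkage, which is the kind of symbol
a linker's "run this after loading" option could be pointed at. All
names here (entry_demo, entry_demo_lib_init, entry_demo_is_ready) are
hypothetical, the linker option itself is not shown, and the value 42 is
just a placeholder for real setup work.

```cpp
#include <cassert>

// Hypothetical library-private state that must exist before any use.
namespace entry_demo {

struct thread_state {
    bool ready;
    int key;
};

thread_state state = {false, 0};  // constant-initialized, no c'tor needed

}  // namespace entry_demo

// Explicit entry point with C linkage so that its symbol name is
// predictable. The function is idempotent, so a second call is harmless.
extern "C" void entry_demo_lib_init() {
    if (entry_demo::state.ready) {
        return;
    }
    entry_demo::state.key = 42;  // placeholder for the real setup work
    entry_demo::state.ready = true;
}

// Lets callers check whether the library has been initialized.
bool entry_demo_is_ready() {
    assert(!entry_demo::state.ready || entry_demo::state.key == 42);
    return entry_demo::state.ready;
}
```

With this layout no static object in a client module has to trigger the
initialization, so the order in which client modules are loaded no
longer matters for the library's own setup.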

 (*) As mentioned above, the code for the initialization c'tor
omni_thread::init_t::init_t is written such that it should be executed
exactly once (other calls return immediately). While playing around with
link orders and static and dynamic libs today I have often seen that the
c'tor was actually executed more than once (when I define the DB macro
in posix.cc there's a message on cerr for each invocation of that
c'tor). So the model that relies on the <count> variable does not seem
to be safe.
It might be safer to use pthread_once() to make sure it is called just
once ...
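
A minimal sketch of what the pthread_once() variant could look like is
given below. It is an illustration only, not a patch against posix.cc:
the namespace once_demo, the function once_demo_real_init() and the
counter real_init_runs are hypothetical, and the two static objects
again simulate two modules including the header.

```cpp
#include <cassert>
#include <pthread.h>

namespace once_demo {

pthread_once_t once_control = PTHREAD_ONCE_INIT;  // static initializer
int real_init_runs = 0;  // counts how often the real init ran

}  // namespace once_demo

// The routine that must run exactly once; C linkage to match the
// function pointer type that pthread_once() expects.
extern "C" void once_demo_real_init() {
    ++once_demo::real_init_runs;  // stands in for pthread_key_create() etc.
}

namespace once_demo {

struct init_t {
    init_t() {
        // Every constructor call goes through pthread_once(); only the
        // first one actually executes once_demo_real_init().
        int rc = pthread_once(&once_control, once_demo_real_init);
        assert(rc == 0);
        (void)rc;  // silence unused-variable warnings when NDEBUG is set
    }
};

}  // namespace once_demo

static once_demo::init_t once_unit_a;  // simulates module A
static once_demo::init_t once_unit_b;  // simulates module B

// Returns how many times the real initialization has run.
int once_demo_runs() {
    return once_demo::real_init_runs;
}
```

Compared with a hand-written counter, pthread_once() is also safe when
several threads reach the constructor at the same time. It still relies
on once_control living in properly loaded library data, so it would not
by itself cure a problem caused by the library not being mapped yet.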


;Henning



--
H. Henning Schmidt     <Henning.Schmidt@Philips.com>
Philips Broadcast / Film Imaging Products
phone: +1 (408) 617 5751
fax:        +1 (408) 617 7713
http://www.broadcast.philips.com/Web/FProductType.asp?lNodeId=282