[omniORB] segmentation violation at startup

Sai-Lai Lo S.Lo@uk.research.att.com
02 Apr 2001 10:38:06 +0100


Hi! The IRIX linker is known to be peculiar. Please read README.SGI for details.
If you don't follow the rules described in the readme file, you are most
likely to get a SEGV.

Regards,

Sai-Lai


>>>>> henning schmidt writes:

> Just today there was a message on this list about a segmentation violation at startup when the omni libs were linked into some other library that was then linked into the executable. I deleted that message because I did not know anything about it. So I started my day -- and ran into exactly the same problem (I guess).

> BTW I am running omniORB 3.0.3  on SGI IRIX 6.5.10 (using the SGI C++ compiler MIPSPro 7.3.1.2)

> -----------------------------------------------
> The situation:
> - I have my own library A that encapsulates some omni-ORB APIs.
> - I have a second library B that does some other stuff and uses library A.
> - both libA and libB are dynamic libraries and both reference the dynamic versions of the omni-orb libs
> - finally I link libB to my executable. Since libB references libA in its .liblist table, the runtime linker will find everything it needs
> -> starting up the app gives me an immediate SEGV (before main() is entered)
> -> purify leads me to a "zero page read" in some system call behind the call to pthread_key_create() in posix.cc/line 304.
> -> the funny thing is that this does not happen if I jot down a little test executable to simulate this. It only happens if my app is a pretty big one (in my case a ~40 MB executable).

> note that my executable does not call any code inside libA, libB or the omnilibs. The thing crashes during initialization just by linking in the libs. 

> -----------------------------------------------
> The workaround:
> - I make libA a static lib
> - I leave libB a dynamic lib and make it resolve its symbols from the static libA and the *static* versions of the omni-libs
> - the app links against the dynamic libB
> -> this works

> -----------------------------------------------
> Some analysis of what is going on (may be inaccurate in some points, I've investigated lib omnithread today for the first time ...)
> - At the very end of file omnithread.h there's the declaration of a static variable omni_thread_init of type omni_thread::init_t
> - this variable gets included into any c-file that includes this header.
> - this means that the variable eventually ends up in a number of different modules. That's fine since the variable is static to the module.
> - the variable is initialized before the main() routine is entered. After all, that is the whole point of this variable.
> - Initializing this variable means calling its constructor, which is defined in file posix.cc and thus comes from module posix.o in the omnithread library.
> - That c'tor is written so that its critical parts should get executed only once (*). Calling it means initializing the omnithread lib
> - that c'tor uses variables that are global to the file posix.cc and thus should be there when that module (posix.o) is loaded.
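
A minimal sketch of the counter-guarded initializer pattern described in the list above. This is not the omnithread source: the namespace `sketch`, the variables `init_count` and `real_inits`, and the three objects standing in for three including files are all made up for illustration.

```cpp
#include <cassert>

// Hypothetical stand-in for omni_thread::init_t; names are illustrative only.
namespace sketch {

// File-scope state, as posix.cc would hold it. Plain ints with constant
// initializers are set up before any constructor below can run.
int init_count = 0;   // how many init_t objects have been constructed
int real_inits = 0;   // how many times the one-time set-up actually ran

struct init_t {
    init_t() {
        if (init_count++ != 0) return;  // every later copy returns immediately
        ++real_inits;                   // one-time library set-up would go here
    }
};

// The real header gives each including file its own static copy.
// Three objects in one file imitate three such files.
static init_t unit_a;
static init_t unit_b;
static init_t unit_c;

}  // namespace sketch
```

Inside a single program image the guard holds: three constructors run, but the set-up runs once. The guard can only protect state that lives in the same copy of the library as the counter itself.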

> -> Initializing this variable will happen by whichever module is loaded first (there is no real control over the order of loading modules; however, you can influence it to some extent by the order of arguments on the link line). My guess is that the whole issue has something to do with the order in which the modules are loaded. And if some module that happens to include omnithread.h (and thus this init variable) gets loaded before the omnithread library is loaded ... can it be that the initialization code that runs inside the module posix.o uses uninitialized variables from its own module, or even variables that are not really there at that time ... "zero page read" ... ?

> -> Again, I am not sure what the cause of all this really is. But it might be safer to initialize the omnithread library in some other way. E.g. my linker (ld32 on SGI) has an option when building dynamic libraries that lets you specify a symbol (function) in the shared lib that gets executed directly after loading the lib. This way I am sure that the lib is actually loaded before any of its methods are executed.
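
As a rough analogue of a function that runs when the library is loaded, here is a sketch using the `__attribute__((constructor))` extension of GCC and Clang. This is an assumption for illustration only: it is not the ld32 option mentioned above and is not known to apply to the MIPSpro toolchain. The names `load_sketch`, `lib_ready`, and `lib_init` are hypothetical.

```cpp
#include <cassert>

// Hypothetical names; GCC/Clang extension, not the SGI ld32 mechanism.
namespace load_sketch {

int lib_ready = 0;  // constant-initialized, so it is 0 before lib_init runs

// Runs when the object containing it is loaded, before main() is entered.
__attribute__((constructor)) void lib_init() {
    lib_ready = 1;  // one-time library set-up would go here
}

}  // namespace load_sketch
```

The point of this design is that the set-up is tied to the loading of the library itself rather than to static objects scattered across every file that includes the header.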

>  (*) As mentioned above, the code for the initialization c'tor omni_thread::init_t::init_t is written such that it should be executed exactly once (other calls return immediately). While playing around with link orders and static and dynamic libs today, I have often seen that the c'tor was actually executed more than once (when I define the DB macro in posix.cc there's a message on cerr for each invocation of that c'tor). So the model that relies on the <count> variable does not seem to be safe. It might be safer to use pthread_once() to make sure it is called just once ...
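
A minimal sketch of the pthread_once() idea from the footnote above, assuming a POSIX threads environment. It is not a patch to posix.cc; `once_sketch`, `once_control`, `do_init`, and `real_inits` are hypothetical names.

```cpp
#include <pthread.h>
#include <cassert>

// Hypothetical names; illustrates pthread_once(), not the omnithread code.
namespace once_sketch {

pthread_once_t once_control = PTHREAD_ONCE_INIT;  // static constant initializer
int real_inits = 0;                               // counts actual set-up runs

// The one-time set-up; pthread_once() calls it at most once per control object.
void do_init() { ++real_inits; }

struct init_t {
    init_t() { pthread_once(&once_control, do_init); }
};

// Three objects imitate three files that each include the header.
static init_t unit_a;
static init_t unit_b;
static init_t unit_c;

}  // namespace once_sketch
```

One caveat: pthread_once() gives the guarantee per `pthread_once_t` object. If the static library ends up copied into more than one shared object, each copy would presumably carry its own control variable, so this alone may not prevent a second initialization in that situation.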


> ;Henning



> --
> H. Henning Schmidt     <Henning.Schmidt@Philips.com>
> Philips Broadcast / Film Imaging Products
> phone: +1 (408) 617 5751
> fax:        +1 (408) 617 7713
> http://www.broadcast.philips.com/Web/FProductType.asp?lNodeId=282


-- 
Sai-Lai Lo                                   S.Lo@uk.research.att.com
AT&T Laboratories Cambridge           WWW:   http://www.uk.research.att.com 
24a Trumpington Street                Tel:   +44 1223 343000
Cambridge CB2 1QA                     Fax:   +44 1223 313542
ENGLAND