[omniORB] compiling omniORB3 on NT -- Assertion failure

Ji-Yong D. Chung virtualcyber@erols.com
Mon, 1 Nov 1999 03:57:26 -0500


Dear omniORB developers:

    I have read Mr. Steven Brenneis's email.

    He definitely misunderstood my earlier email.  If a programmer such as
Mr. Brenneis did not understand me, then I probably did not explain my point
very well.

    I hope that you will have the patience for me to go through this long,
boring explanation -- I think this is important for NT debugging --
hopefully, this will benefit you as well as me.

    (For the purposes of this discussion, I will assume that all the
libraries are compiled in debug mode.  In release versions, the following
points do not apply).

    Before I delve into specifics, I need to clarify the fact that there is
a "local" heap associated with each DLL.  The following is an excerpt from
the MSDN Library for Visual Studio 6.0 (just look for the index entry
_CrtIsValidHeapPointer in your MSDN library):


"The _CrtIsValidHeapPointer function is used to ensure that a specific
memory address is within the local heap. The "local" heap refers to the heap
created and managed by a particular instance of the C run-time library. If a
dynamically linked library (DLL) contains a static link to the run-time
library, then it has its own instance of the run-time heap, and therefore
its own heap, independent of the application's local heap. When _DEBUG is
not defined, calls to _CrtIsValidHeapPointer are removed during
preprocessing...."


    The Microsoft ASSERT macro that calls _CrtIsValidHeapPointer is invoked
in EVERY runtime delete call.  This delete call is eventually invoked each
time a C++ destructor releases heap memory.  (By the way,
_CrtIsValidHeapPointer is the cause of the assertion failure for
omniNames.exe and omniORB3_rtd.dll).
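
    As a rough illustration of the check described in the excerpt, here is a
minimal sketch that calls _CrtIsValidHeapPointer (declared in <crtdbg.h>)
when built with MSVC in debug mode.  The helper names is_local_heap_block and
checked_delete are hypothetical, not part of any library, and on other
compilers the helper only tests for a null pointer.  The sketch uses nullptr,
so it assumes a newer compiler than Visual C++ 6.0.

```cpp
#include <cassert>
#if defined(_MSC_VER) && defined(_DEBUG)
#  include <crtdbg.h>
#endif

// Hypothetical helper: reports whether p appears to belong to the heap of
// the run-time library instance that THIS code was linked against.
inline bool is_local_heap_block(const void* p) {
#if defined(_MSC_VER) && defined(_DEBUG)
    return _CrtIsValidHeapPointer(p) != 0;  // the debug CRT's own check
#else
    return p != nullptr;                    // no such check here; only reject null
#endif
}

// Hypothetical helper: delete an int array only after asserting that it
// came from the local heap, roughly what the debug delete does internally.
inline void checked_delete(int* p) {
    if (!p) return;
    assert(is_local_heap_block(p));
    delete[] p;
}
```

    A block allocated and freed inside the same module passes the check; the
failures discussed in this email arise only when the two happen in different
modules.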

--------------------------------------------------------

    With that out of the way, I will now explain how the assertion failure
related to _CrtIsValidHeapPointer occurs.  (This is what happens when you run
the debug versions of omniORB3_rtd.dll and omniNames.exe).

     To begin, let us assume that there are 2 modules.  One executable
module (an .exe module) calls the second module (a dll).  We will also
assume that there is a single header file shared by the two modules.  The
header file contains declarations for two functions: (1) a memory
allocation function and (2) a memory deallocation function.

    Given the preceding, we can now illustrate how one can INCORRECTLY
delete a piece of memory from the heap locally associated with the dll.

    To incorrectly delete a piece of memory, first, we write our code so
that (1) the dll exports the definition of the memory allocation routine to
the .exe module, and (2) the .exe module contains its own definition of the
deallocation routine.  To repeat, the deletion operation for the object is
never exported from the dll, but defined within the .exe module.  When one
invokes (from inside a static function of the .exe module) the memory
allocation routine, the allocator defined within the dll is invoked.  This
causes the allocated memory to be associated with the dll's heap.  Later, if
one attempts to delete the allocated memory using the memory deallocation
routine DEFINED WITHIN the .exe file, the debug run-time library will execute
the ASSERT( _CrtIsValidHeapPointer ...) check.  This WILL cause the assertion
to fail, because the memory does not belong to the heap that is "local" to
the calling function (it is "local" to the dll).

   The correct way to allocate/deallocate is as follows:  First, write
our code so that the dll exports BOTH (1) the memory allocation and (2) the
deallocation routines.  The .exe module does not contain its own routine for
deallocating memory that is allocated through functions that are defined
within the dll.

    If one calls the memory allocation routine FROM the .exe module, then
one must deallocate the memory by invoking, whether from that .exe module or
from ANY OTHER module, the memory deallocation routine defined within the
dll.
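
    The correct arrangement can be sketched as follows.  Everything here is
hypothetical and only for illustration: the Buffer type, the buffer_*
functions, and the DEMO_API macro, which on Windows would expand to
__declspec(dllexport) when building the dll and __declspec(dllimport) when
building the .exe.  The macro is left empty so that the sketch compiles as a
single file.

```cpp
#include <cassert>
#include <cstddef>

#define DEMO_API  // hypothetical export/import macro, empty in this sketch

// ---- shared header: declarations only, no function bodies ----
struct Buffer;                                     // opaque to the .exe
DEMO_API Buffer*     buffer_create(std::size_t size);
DEMO_API void        buffer_destroy(Buffer* b);
DEMO_API std::size_t buffer_size(const Buffer* b);
DEMO_API int         buffer_live_count();

// ---- dll implementation file: the only place new/delete appear ----
struct Buffer {
    std::size_t size;
    char*       data;
};

static int g_live = 0;  // buffers currently allocated by the dll

Buffer* buffer_create(std::size_t size) {
    Buffer* b = new Buffer;
    b->size = size;
    b->data = new char[size]();
    ++g_live;
    return b;
}

void buffer_destroy(Buffer* b) {
    if (!b) return;
    delete[] b->data;   // freed by the same module that allocated it
    delete b;
    --g_live;
}

std::size_t buffer_size(const Buffer* b) { return b->size; }
int buffer_live_count() { return g_live; }

// ---- .exe module: allocates and frees only through the dll's functions ----
inline std::size_t exe_side_roundtrip(std::size_t n) {
    Buffer* b = buffer_create(n);
    assert(b != 0);
    std::size_t seen = buffer_size(b);
    buffer_destroy(b);  // never a plain `delete b` in the .exe
    return seen;
}
```

    Because Buffer is opaque in the header, the .exe cannot accidentally
delete it with its own copy of operator delete.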

    ------------------------------------------------------------------------

    When inline functions are involved, the .exe file can invoke memory
allocation functions from the dll AND invoke memory deallocation functions
defined within the .exe file.  This is because the inline specifier allows
the definition to be compiled into BOTH modules.  Of course, the preceding
scenario will cause the memory assertion to fail.

    Unfortunately, this is exactly what is wrong with omniNames.exe and
omniORB3_rtd.dll (they contain inline functions from the same header file).
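
    The effect of a shared inline definition can be imitated in a single
file.  This is a toy model, not the omniORB code or the real run-time
library: two namespaces stand in for the two modules, ToyHeap stands in for
each module's copy of the run-time heap, and a macro stands in for the header
text that gets compiled into whichever module includes it.  All names are
hypothetical.

```cpp
#include <cassert>
#include <cstdlib>
#include <set>

// Toy stand-in for one instance of the run-time heap's bookkeeping.
struct ToyHeap {
    std::set<void*> live;
};

// Stands for the text of a shared header.  Each module that "includes" it
// gets its own compiled copy of release_id(), bound to that module's heap.
#define SHARED_HEADER_INLINE_BODY                                           \
    inline bool release_id(void* p) {                                       \
        if (local_heap.live.erase(p) == 0) return false; /* would assert */ \
        std::free(p);                                                       \
        return true;                                                        \
    }

namespace dll_module {
    ToyHeap local_heap;
    // Exported allocator: memory is recorded in the dll's heap.
    void* make_id() {
        void* p = std::malloc(8);
        assert(p != nullptr);
        local_heap.live.insert(p);
        return p;
    }
    SHARED_HEADER_INLINE_BODY   // the dll's copy of the inline function
}

namespace exe_module {
    ToyHeap local_heap;
    SHARED_HEADER_INLINE_BODY   // the exe's copy, tied to the exe's heap
}
```

    Calling exe_module::release_id on a pointer from dll_module::make_id is
rejected, because the exe's copy consults the exe's heap.  The same call
through dll_module::release_id succeeds.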

    To validate what I have summarized, one can perform the following steps:

    (1) build the omniNames.exe and omniORB3 in debug mode (omniNames.exe
and omniORB3_rtd.dll).

    (2) Link them and run omniNames.exe -- watch the assertion fail.  With
the MSVC++ debugger, locate the point at which the assertion failed (you
will see that it is caused by _CrtIsValidHeapPointer).

    (3) Now, take poa.h and remove the inline specifiers from the
definitions of ObjectId_var, ObjectId, and ObjectId_out.  Place the
definitions of all their member functions (including constructors and
destructors) inside the poa.cc file, and ERASE the definitions from the
header file.

    (4) Rebuild all the files.

    (5) Rerun it -- you will notice that the earlier ASSERTION failure has
gone away -- but now, another one will appear.

    This is easy to duplicate.
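
    The kind of change meant in step (3) can be pictured with a small
hypothetical class.  IdVar and the file names id_var.h and id_var.cc are
invented for illustration and are not the real ObjectId_var from poa.h.  On
Windows the class would also need to be exported from the dll, which is
omitted here so that the sketch compiles as a single file.

```cpp
#include <cassert>
#include <cstddef>

// ---- id_var.h (hypothetical header shared by the dll and the exe) ----
// Before the change, the bodies lived here as inline functions, e.g.
//     inline IdVar::~IdVar() { delete[] data_; }
// so every module including the header compiled its own copy.
// After the change, the header holds declarations only.
class IdVar {
public:
    explicit IdVar(std::size_t length);
    ~IdVar();
    std::size_t length() const;
private:
    IdVar(const IdVar&);             // copying disabled to keep the sketch short
    IdVar& operator=(const IdVar&);
    unsigned char* data_;
    std::size_t    length_;
};

// ---- id_var.cc (compiled into the dll only) ----
IdVar::IdVar(std::size_t length)
    : data_(new unsigned char[length]()), length_(length) {}

IdVar::~IdVar() { delete[] data_; }  // same module as the matching new[]

std::size_t IdVar::length() const { return length_; }

// ---- caller in the .exe ----
inline std::size_t exe_side_use() {
    IdVar id(12);
    assert(id.length() == 12);
    return id.length();  // ~IdVar then runs the single out-of-line definition
}
```

    With this layout the new[] and the delete[] are both compiled into the
same module, whichever module happens to destroy the object.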

------------------------------------------------------------------------

    As I said in my earlier email, one can simply ignore these assertion
failures, or fix one's code to remove them.  I ask the omniORB3 developers to
remove these assertion failures by removing inline function DEFINITIONS (I
do not mean the declarations) from the various header files that are built
into DIFFERENT modules, and moving those definitions into the proper
implementation files.  In this way, only one module will contain a
definition for each declaration.  (Watch out especially for those inline
functions used to allocate and deallocate memory -- other inline functions
do not cause the particular type of assert failure described above).

     The reason for removing the assertion failures is simple: this helps
find memory errors.  By keeping track of which module allocated/deallocated
which piece of memory, one can easily narrow down the source of a memory
error.  This was the reason why the _CrtIsValidHeapPointer function was
implemented: to help track down which dll caused the memory failure.

    Having worked with memory problems, believe me, the assertion failures
caused by _CrtIsValidHeapPointer are useful, and it is good to keep one's
code clean so that future assertions mean something.  For omniORB, this
basically means cleaning up the code to remove all the present assertion
failures caused by inlining.

       As I said in my earlier email, the efficiency gain from inlining
memory allocation/deallocation is small (as a percentage of the overall
computational cost).

------------------------------------------------------------------------

         For more info, see my previous email.

------------------------------------------------------------------------

P.S.  By the way, Steve, your point was that there is a single memory
space.  I was not disputing that, though.  I was not disputing your point
that the DLL's memory and the application's memory are probably mapped to a
"single" space.  I simply meant that the debug run-time library keeps track
of which memory was allocated by which dll (to be more precise, which piece
of memory was allocated from the heap of which instance of the run-time
library).


----- Original Message -----
From: Steven W. Brenneis <brennes1@rjrt.com>
To: Ji-Yong D. Chung <virtualcyber@erols.com>
Cc: <omniorb-list@uk.research.att.com>; <djr@uk.research.att.com>
Sent: Friday, October 29, 1999 2:52 PM
Subject: Re: [omniORB] compiling omniORB3 on NT -- Assertion failure


> Ji-Yong,
>
> There is a basic flaw in one of your assumptions:
>
> There is no requirement for heap allocated within a DLL to be freed
> within the same DLL.  The only assertions in the debug runtime delete
> overload are to check that the block is valid, i.e. that the no-man's
> land has the proper initialization pattern, and that the heap block size
> is valid, i.e. the heap pointer size value equals the stored heap block
> size.  I am reasonably sure that the process heap is the same for the DLL
> and the process.  When a DLL is connected to a process, the operating
> system simply maps the DLL's code and global variables into the process'
> virtual address space. From this point onward, the process no longer is
> aware whether it is executing DLL or local code.  You can verify this by
> using the CMemoryState MFC class to check the heap size before and
> after a DLL allocates memory.
>
> If the rule you assert was true, MFC would not function.  Many of the
> MFC classes, particularly the Frame Window and its derivations, require
> the user to invoke a static member function to acquire a pointer to one
> of these classes.  These static member functions perform the heap
> allocation within the MFC DLL.  The returned pointer may then be deleted
> at the will of the programmer.
>
> Without knowing the exact assertion failure you are getting, I can only
> guess at the problem, but CrtBlock assertions are almost invariably
> caused by pointer bounding problems or double deletion attempts.  In
> particular, the std::string class had a nasty bug in MSVC 5.0 that would
> cause this to occur on a sporadic basis.  PJ Plauger posted the fix on
> his website but I am not sure whether Microsoft incorporated it in MSVC
> 6.0.
>
> Steve Brenneis
>
> Ji-Yong D. Chung wrote:
> >
> >     Library Involved:  omniORB3_rtd.dll (dll version of orbcore library)
> > and omniNames.exe
> >     Platform: NT4.0 (SP4) MSVC++
> >
> >     As I indicated in my previous email messages, I have built debug
> > versions of all the executables and dll's.  When I run my debug version
> > of omniNames.exe and omniORB3, the program dies with an assertion
> > failure.
> >
> >     I wish this would be "fixed" -- as it would make future debugging
> > easier.  This problem, by the way, probably will NOT show up in the
> > non-debug version of omniORB3 and omniNames.  I will explain later why
> > it is desirable to fix this, even though the problem may not show up in
> > the release version.
> >
> >     The assertion failure happens in omniNames::init(CORBA:: ...) which
> > is in log.cc (see omniNames files).
> >
> > void
> > omniNameslog::init(CORBA::ORB_ptr the_orb,
> >                    PortableServer::POA_ptr the_poa)
> > {
> >     ...
> >     // stuff
> >
> >       {
> >         CORBA::Object_var ref = poa->create_reference(
> >             CosNaming::NamingContext::_PD_repoId);
> >         PortableServer::ObjectId_var refid = poa->reference_to_id(ref);
> >
> >         putCreate(refid, logf);
> >       }                                    <=== ASSERTION FAILURE #1
> >
> >     // more stuff
> >
> >   _omni_set_NameService(rootContext);
> >   delete p;                           <==== ASSERTION FAILURE #2
> >
> >     // stuff
> > }
> >
> >       The first assertion failure happens just after the execution of
> > putCreate(refid, logf).  If one looks at the code in that "area," it is
> > locally scoped within a pair of braces.
> >
> >     Just after the execution of putCreate(refid, logf), as the execution
> > point is about to exit the locally scoped area, the MSVC debugger checks
> > if it can delete refid.  Before it deletes the variable, however, it
> > first attempts to validate the wholesomeness of the memory which it is
> > about to de-allocate.  When the debugger finds that the memory pointer
> > is not what it should be, it generates an assertion failure.  And the
> > program stops executing.
> >
> >     The assertion failure is made within the MSVC delete operator (debug
> > version).  This operator happens to check WHETHER the memory which is
> > about to be de-allocated was originally allocated FROM the heap of the
> > executing module (local heap, that is).  Note that this heap (for
> > omniNames.exe) is not the same as the heap of omniORB3_rtd.dll.
> >
> >     In the above code, the allocation of memory occurs inside the
> > poa->reference_to_id(...) procedure.  This eventually calls
> > omniORB3_rtd.dll's memory allocation routines.  Therefore, the memory is
> > from the heap of omniORB3_rtd.dll, NOT from the local heap.
> >
> >      Later, refid will be deleted as the execution point moves outside
> > the braced area.  But the deletion operators that are invoked at this
> > point are LOCAL to omniNames.  The debug versions of these delete
> > operators expect the memory being de-allocated to be from the local
> > heap; but it is not (it is from omniORB3_rtd.dll's heap).  Thus the
> > assertion failure.
> >
> >     The reason why the deletion operators are local to omniNames is
> > simple: omniNames, when it gets built, includes poa.h.  This file
> > contains inline functions that allocate and de-allocate memory.  These
> > inline functions get incorporated into omniNames as local functions.
> > If the execution of the above code actually called the delete operators
> > from omniORB3_rtd.dll, since the calling delete operator's local heap
> > is the same as that of omniORB3_rtd.dll, there would not be assertion
> > failures.
> >
> >     In other words, if one were to remove the "inline" specifiers from
> > the constructors/destructors/member functions of ObjectId_var,
> > ObjectId, and ObjectId_out (all inside poa.h), the assertion error above
> > would go away.  Of course, poa.cc would then need to contain the
> > definitions of those declarations in poa.h.
> >
> >
> > ------------------------------------------------------------------------
> >
> >     There are other memory assertion failures that occur for the same
> > reasons described above.  To repeat, these assertions occur because of
> > inline specifiers in header files that are present in both exporting and
> > importing dll/libraries.
> >
> >     It is a good idea to remove the inline specifier for these functions
> > and to move their definitions into implementation files.  There are two
> > good reasons for doing this.  First, the efficiency gain from saving a
> > few function calls is generally overshadowed by the cost of the memory
> > allocation/de-allocation operations.  The memory allocation and
> > de-allocation operations are about 500 times more expensive than a
> > single machine instruction (of course, a function call is also more
> > expensive than a single instruction).  The point here is that the gain
> > in speed is not as great as one might expect.
> >
> >     The second reason is much more important: these inlined functions
> > that cross the DLL boundaries make debugging just pure hell --- one dll
> > starts deleting objects and stuff from another heap, and one cannot
> > trace the source of error.  As you might expect, debugging omniORB3 may
> > take months (or perhaps longer).  During all these debugging sessions,
> > one really needs to be able to narrow down the potential source of
> > problems.