[J3] Compiler internal type tables and shared vs static libraries

Malcolm Cohen malcolm at nag-j.co.jp
Tue May 14 02:42:59 UTC 2024


Hi Tom,

 

Creating the shared library U, which references M, by linking it against
libM.a will embed in that shared library a copy of all the bits of M that U
(transitively) references. This will almost certainly include the type
signatures for the types defined by M, unless by mere chance there are no
references that need the type signatures.

 

Linking the user application with M and not with U will likewise put a copy
of all the bits of M that the program references into the executable. That
will almost certainly include the type signatures, too.

 

I note that if the shared library U is linked into the program at link time,
any parts of M that U contains will satisfy the user program's references to
them, so you never get multiple copies of the same .o fragment in the
executable.

 

Using dlopen and dlsym bypasses that resolution, so now you have two copies
of the same .o fragment in the executing program, one from the executable,
the other from the shared library.

 

IF the shared library mechanism preserves "references to M from a dlopen'ed
U use the copy of M inside U", that would produce the symptoms you describe.


 

BUT IF the shared library mechanism allows overriding them dynamically (i.e.
at dlopen time rather than just at link time), then "references to M from a
dlopen'ed U use the copy of M in the executable", so the program would work
as intended.
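To make the two scenarios concrete, here is a toy C model, not actual compiler output. It assumes, purely for illustration, that the dynamic-type test behind SELECT TYPE compares the address of a per-type descriptor rather than its contents; the names `type_descriptor`, `desc_in_exe`, `desc_in_libU`, `poly_object`, `type_is` and `same_contents` are hypothetical, and gfortran or NAG may implement the check differently.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical per-type descriptor: a stand-in for the "type signature"
   a compiler emits for a derived type defined in module M. */
struct type_descriptor {
    const char *name;
};

/* Two copies with identical contents, modelling the copy of M linked into
   the executable from libM.a and the copy embedded in libU.so from libM.a.
   They are distinct objects, so their addresses differ. */
static const struct type_descriptor desc_in_exe  = { "m_type" };
static const struct type_descriptor desc_in_libU = { "m_type" };

/* A polymorphic object records its dynamic type by pointing at a descriptor. */
struct poly_object {
    const struct type_descriptor *dyn_type;
    int payload;
};

/* Model of "TYPE IS (m_type)": compares descriptor addresses, not contents
   (the illustrative assumption stated above). */
static bool type_is(const struct poly_object *obj,
                    const struct type_descriptor *expected)
{
    return obj->dyn_type == expected;
}

/* Contents comparison, showing the two copies are identical by value. */
static bool same_contents(const struct type_descriptor *a,
                          const struct type_descriptor *b)
{
    return strcmp(a->name, b->name) == 0;
}
```

An object created by the executable's copy of M points at `desc_in_exe`. Checked against that descriptor, `type_is` succeeds; checked against `desc_in_libU` (U's references resolved to the copy of M inside U), it fails even though the payload and the descriptor contents are identical, which in a real SELECT TYPE would mean falling through to CLASS DEFAULT. If U's references were instead bound to the executable's copy at dlopen time, both checks would use `desc_in_exe` and succeed.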

 

So it sounds like Linux does not do dynamic overriding (though I think it
does allow overriding at link time), and that MacOS (which has a very
different shared library mechanism) does do it.

 

I think there might be a way to make this work: build U with a command line
referencing libM.so instead of libM.a. I think (therefore I am) that this
will produce a dynamic dependency in libU.so, and that this dynamic
dependency *might* be satisfied by the copy of M from libM.a that the user
linked into his program, even with libU.so being brought in by dlopen. And
if the user linked with libM.so, it should be satisfied by that, too. It
might be worth investigating this if your customers want static linking of
M.

 

Cheers,

-- 

..............Malcolm Cohen, NAG Oxford/Tokyo.

 

From: J3 <j3-bounces at mailman.j3-fortran.org> On Behalf Of Clune, Thomas L.
(GSFC-6101) via J3
Sent: Monday, May 13, 2024 11:41 PM
To: Steve Lionel via J3 <j3 at mailman.j3-fortran.org>
Cc: Clune, Thomas L. (GSFC-6101) <thomas.l.clune at nasa.gov>
Subject: [J3] Compiler internal type tables and shared vs static libraries

 

This question is only tangentially about Fortran, but I'm hoping some of the
compiler-savvy people on this list may take the time to explain and/or aim me
at relevant documentation.

 

I develop a Fortran library (let's call it M to keep it simple). I can
build it either shared or static, but one of our customers prefers to have
it static.

 

Somewhere inside M, I make a call into a user-provided dynamically linked
library, U. However, M is not linked to U; rather, I use dlopen() and
dlsym() to access and call a user-provided procedure that conforms to a
standard interface.

The interesting twist is that U makes calls back into M.  (U is linked to
M.)
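For readers less familiar with the pattern, here is a minimal C sketch of the call path described above: M loads U at run time, looks up a procedure with an agreed interface, and passes the top-level object through a pointer in a C struct. The names `m_context`, `user_hook_fn` and `call_user_hook`, and the error codes, are hypothetical; the thread does not show the real interface. On glibc older than 2.34, link with `-ldl`.

```c
#include <dlfcn.h>
#include <stdio.h>

/* Hypothetical C struct that carries a pointer to M's top-level Fortran
   object, so U receives the object itself rather than a copy. */
struct m_context {
    void *fortran_object;
};

/* Hypothetical "standard interface" that the user's procedure in U must match. */
typedef int (*user_hook_fn)(struct m_context *ctx);

/* Load the user library, look up the agreed procedure, and call it.
   Returns the procedure's result, -1 if the library cannot be loaded,
   or -2 if the symbol cannot be found. */
static int call_user_hook(const char *lib_path, const char *symbol,
                          struct m_context *ctx)
{
    /* RTLD_NOW: resolve all of U's undefined symbols before dlopen returns,
       so a missing dependency is reported here rather than at first call.
       RTLD_LOCAL: do not make U's own symbols available to libraries
       loaded later. */
    void *handle = dlopen(lib_path, RTLD_NOW | RTLD_LOCAL);
    if (handle == NULL) {
        fprintf(stderr, "dlopen failed: %s\n", dlerror());
        return -1;
    }

    dlerror(); /* clear any stale error state before calling dlsym */
    user_hook_fn hook = (user_hook_fn)dlsym(handle, symbol);
    if (hook == NULL) {
        const char *err = dlerror();
        fprintf(stderr, "dlsym failed: %s\n",
                err ? err : "symbol has a NULL value");
        dlclose(handle);
        return -2;
    }

    /* The library is deliberately left loaded after a successful lookup. */
    return hook(ctx);
}
```

Note that nothing in this call path tells the dynamic linker that U's references into M should use the executable's copy of M; that binding is decided by how libU.so itself was linked and by what the executable exports.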

 

When M is built as a shared library, everything seems to work. But when M
is built as a static library, the system sometimes fails. To be clear, when
the system fails, it fails reproducibly. But seemingly unrelated changes to
M can result in a system that works. When the system fails, it fails for
multiple compilers, and at least for NAG and gfortran it fails at exactly
the same line in M. Also, we've never seen a related failure under OSX, only
under Linux.

 

The failure appears to be a problem with the polymorphic part of a data
structure that is passed from M to U and then back to M. The top-level
(non-polymorphic) object is passed by a pointer stored inside a C struct,
so no copies are involved. The same operations on the Fortran object work
fine before the call to U but break when the same call is made from inside
U. The values inside the data structure appear to be correct, and the
unexpected behavior happens at a SELECT TYPE construct, where the code
"incorrectly" reaches the CLASS DEFAULT block.

 

My theory, which I can elaborate on if requested, is that some internal
table of TYPEs in M is different when accessed from outside U than from
inside U. But I have no theory as to why that would be.

 

The good news is that the obvious fix here is to just use dynamic linking
for M. (The customer mentioned above has agreed to this change.) But I
need to be certain that this is not actually a bug in M that is merely being
exposed by library shenanigans. To that end I would appreciate any
insight into why this scenario should or should not work.

 

Thanks,

 

Tom
