Object file is 2.5x larger on Linux than on macOS or Windows





I have a file which, when compiled to an object file, has the following size:



The file (detailed below) is a representation of (a part of) a Unicode table along with character properties. The encoding is UTF-8.



It occurred to me that the problem might be that libstdc++ can't handle the file well, so I tried libc++ with clang on Gentoo, but it made no difference (the object file size remained the same).



Then I thought that it might be some optimization doing something odd, but once again I had no size improvements when I went from -O3 to -O0.





On line 50, the file includes UnicodeTable.inc, which contains a std::array of the Unicode code points.





I tried changing the std::array to a C-style array, but again, the object file size did not change.
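A minimal sketch of why this swap is not expected to change anything, assuming (hypothetically) that each table entry holds four pointers to string literals. The Entry type and its contents are invented for illustration and are not the real table. On mainstream standard libraries, std::array<T, N> is a thin wrapper around T[N], so both forms occupy the same number of bytes:

```cpp
#include <array>
#include <cstddef>

// Hypothetical stand-in for the real entry type: four string pointers.
struct Entry {
    const char* original;
    const char* normal;
    const char* folded_case;
    const char* swapped_case;
};

// The same two entries, once as std::array and once as a C-style array.
constexpr std::array<Entry, 2> as_std_array{{
    {"A", "A", "a", "a"},
    {"b", "b", "b", "B"},
}};
constexpr Entry as_c_array[2] = {
    {"A", "A", "a", "a"},
    {"b", "b", "b", "B"},
};

// Holds on the common implementations (libstdc++, libc++, MSVC STL);
// the standard itself does not spell out this equality.
static_assert(sizeof(as_std_array) == sizeof(as_c_array),
              "both containers occupy the same number of bytes");
```

Either way, every entry still carries four pointers, so whatever cost is attached to the pointers stays the same.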





I have the preprocessed version of CodePoint.cpp, which can be compiled with $CC -xc++ CodePoint.i -c -o CodePoint.o. CodePoint.i contains about 40k lines of STL code and about 130k lines of Unicode table.





I tried uploading the preprocessed CodePoint.i to gists.github.com and to paste.pound-python.org, but both refused the 170k-line file.





At this point I'm out of ideas and would greatly appreciate any help regarding finding out the source of the "bloated" object file size.





What is sizeof(RawCodePoint) on each platform?
– PaulR
Jun 27 at 13:49







I assume you are using the .o file as part of a shared library or executable at some point -- is that executable also 12MB larger under Linux? Even after you run strip on the executable?
– Jeremy Friesner
Jun 27 at 13:51







How big are the executables? Since object files are temporary, is the size important?
– Thomas Matthews
Jun 27 at 14:11





What was the full compiler command line you used?
– PaulR
Jun 27 at 14:11






This question may help with analyzing the result further: stackoverflow.com/questions/11720340/…
– PaulR
Jun 27 at 14:15




1 Answer



From the output of size you linked, you can see that there are 12 MB of relocations in the ELF object (section .rela.dyn). If a 64-bit relocation takes 24 bytes and you have 132624 table entries with 4 pointers to strings each, this pretty much explains the 12 MB difference (132624 * 4 * 24 = 12731904 bytes, roughly 12 MB).
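The arithmetic can be checked at compile time. This is a sketch, not part of the original code: Rela64 mirrors the three 8-byte fields of an ELF64 relocation entry with addend (Elf64_Rela) so that it compiles without <elf.h>, and the constant names are made up for illustration. It assumes one relocation per stored pointer, as the estimate above does.

```cpp
#include <cstdint>

// Field layout of an ELF64 relocation entry with addend (Elf64_Rela),
// mirrored here so the sketch compiles on any platform.
struct Rela64 {
    std::uint64_t r_offset;  // where the relocation is applied
    std::uint64_t r_info;    // symbol index and relocation type
    std::int64_t  r_addend;  // constant added to the symbol value
};
static_assert(sizeof(Rela64) == 24, "three 8-byte fields");

constexpr std::uint64_t kTableEntries     = 132624;  // figure from the answer
constexpr std::uint64_t kPointersPerEntry = 4;       // string pointers per entry

// Assumption: one relocation entry per pointer that must be fixed up.
constexpr std::uint64_t relocation_bytes() {
    return kTableEntries * kPointersPerEntry * sizeof(Rela64);
}
static_assert(relocation_bytes() == 12731904, "roughly 12 MB");
```

Note that the relocation entry (24 bytes) is three times the size of the 8-byte pointer it describes, which is why the relocation section outweighs the pointer data itself.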





Apparently the other formats either use a more efficient relocation type or link the references directly and just relocate the whole block together with the strings as one piece of memory.



Since you are linking this to a shared library the dynamic relocations will not go away.



I am not sure whether it is possible to avoid this with the code you currently use.
However, an encoded Unicode code point has a maximum size. Why don't you store the code points by value in char arrays in the RawCodePoint struct? Each code point string should then be no larger than the pointer you currently store, and the locality of reference of the table lookup may actually improve.


#include <array>
#include <cstddef>
#include <cstdint>

constexpr std::size_t MAX_CP_SIZE = 4; // Check if that is correct

// Each string is stored inline, so the struct holds no pointers.
struct RawCodePointLocal {
    const std::array<char, MAX_CP_SIZE> original;
    const std::array<char, MAX_CP_SIZE> normal;
    const std::array<char, MAX_CP_SIZE> folded_case;
    const std::array<char, MAX_CP_SIZE> swapped_case;
    bool is_letter;
    bool is_punctuation;
    bool is_uppercase;
    std::uint8_t break_property;
    std::uint8_t combining_class;
};



This way you should not need relocations for the entries.
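A sketch of how such a table could be filled and read back, assuming C++17. The names make_bytes, view_of, InlineCodePoint, and kTable are hypothetical helpers for illustration, and the struct is cut down to three fields. It also assumes every field fits in 4 bytes, which holds for a single UTF-8 code point but should be checked for normalized or case-folded forms that expand to several code points.

```cpp
#include <array>
#include <cstddef>
#include <string_view>
#include <type_traits>

constexpr std::size_t kMaxBytes = 4;  // assumption: at most 4 bytes per field
using Utf8Bytes = std::array<char, kMaxBytes>;

// Hypothetical helper: copy a short string into zero-padded inline storage.
// Input longer than kMaxBytes is silently truncated here; a real table
// generator should reject such input instead.
constexpr Utf8Bytes make_bytes(std::string_view s) {
    Utf8Bytes out{};  // value-initialized, so unused bytes are '\0'
    for (std::size_t i = 0; i < s.size() && i < kMaxBytes; ++i) {
        out[i] = s[i];
    }
    return out;
}

// Hypothetical helper: view the stored bytes up to the first '\0' padding byte.
constexpr std::string_view view_of(const Utf8Bytes& b) {
    std::size_t n = 0;
    while (n < b.size() && b[n] != '\0') {
        ++n;
    }
    return std::string_view(b.data(), n);
}

// Reduced entry type: only inline bytes and flags, no pointer members.
struct InlineCodePoint {
    Utf8Bytes original;
    Utf8Bytes folded_case;
    bool is_letter;
};
static_assert(std::is_trivially_copyable_v<InlineCodePoint>,
              "plain bytes that can be emitted as constant data");

// The whole table is built at compile time from byte values only.
constexpr std::array<InlineCodePoint, 2> kTable{{
    {make_bytes("A"), make_bytes("a"), true},
    {make_bytes("\xC3\x9F"), make_bytes("ss"), true},  // U+00DF folds to "ss"
}};
```

Because the entries contain only characters and flags, there are no addresses in the table for the linker or loader to patch. The trade-off is that a field exactly kMaxBytes long has no terminating '\0', so readers should go through something like view_of rather than treating the bytes as a C string.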





Thanks for the help! That surely pointed me in the right direction. Replacing char* with (appropriately sized) char made .rela.dyn drop to negligible size, but then MSVC took forever to compile (however, that's another issue).
– bstaletic
2 days ago







