Went AWOL for most of the day. At night, started tinkering around with the GC and my version of “leak soup”.
Checked in improvements for the Linux RTTI heuristics for the Boehm collector.
Spent the day trying to get some reasonable semblance of output from
my Perl version of leak-soup. I fixed up the problem with
“root detection”: I was incorrectly assuming that anything without a
parent was a root. That breaks down once you have cycles: objects that
only point at each other all have parents, so an orphaned cycle never
shows up as a root. Don't have a good algorithm for computing entrained
size, so I've kinda punted on that. Tried implementing a “leak server” so
that you can load a large data set once and then query it: this worked,
sorta, but the deluge of information was still pretty unusable. I
think I might just step back and see how far I can get with some
simple histograms and deltas.
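One way to get cycle-aware root detection, sketched here in C++ rather than the Perl the entry mentions, is to collapse the object graph into strongly connected components (Tarjan's algorithm) and report every object whose component has no incoming edge from a different component. This is not necessarily what the leak-soup script does; the `Graph` adjacency list and the `sccIds`/`findRoots` names are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <functional>
#include <vector>

// Hypothetical object graph: g[i] lists the objects that object i points to.
using Graph = std::vector<std::vector<int>>;

// Tarjan's algorithm: returns comp[i], the strongly connected component of object i.
std::vector<int> sccIds(const Graph& g, int& numComps) {
    const int n = static_cast<int>(g.size());
    std::vector<int> index(n, -1), low(n, 0), comp(n, -1), stack;
    std::vector<bool> onStack(n, false);
    int nextIndex = 0;
    numComps = 0;

    std::function<void(int)> visit = [&](int v) {
        index[v] = low[v] = nextIndex++;
        stack.push_back(v);
        onStack[v] = true;
        for (int w : g[v]) {
            if (index[w] == -1) {
                visit(w);
                low[v] = std::min(low[v], low[w]);
            } else if (onStack[w]) {
                low[v] = std::min(low[v], index[w]);
            }
        }
        if (low[v] == index[v]) {  // v heads a component: pop all its members
            int w;
            do {
                w = stack.back();
                stack.pop_back();
                onStack[w] = false;
                comp[w] = numComps;
            } while (w != v);
            ++numComps;
        }
    };
    for (int v = 0; v < n; ++v)
        if (index[v] == -1) visit(v);
    return comp;
}

// An object is a root if its component has no incoming edge from another component.
std::vector<int> findRoots(const Graph& g) {
    int numComps = 0;
    std::vector<int> comp = sccIds(g, numComps);
    std::vector<bool> hasOutsideParent(numComps, false);
    for (int v = 0; v < static_cast<int>(g.size()); ++v)
        for (int w : g[v])
            if (comp[v] != comp[w]) hasOutsideParent[comp[w]] = true;
    std::vector<int> roots;
    for (int v = 0; v < static_cast<int>(g.size()); ++v)
        if (!hasOutsideParent[comp[v]]) roots.push_back(v);
    return roots;
}
```

For example, with edges 0→1, 1→2, 2→1 and 3↔4, the “no parent” rule reports only object 0 (plus any isolated objects), while `findRoots` also reports 3 and 4, because the 3↔4 cycle has no parent outside itself. The recursive walk is fine for a sketch but could overflow the stack on a heap with millions of objects; an iterative version would be needed there.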
Cleaned up and checked in scripts for processing
trace-malloc and Boehm output, including one that
infers type information from stack crawls. Interesting to note
that the /proc filesystem reports a “data size” of about 3.4MB for
gtkEmbed on startup, while trace-malloc only finds about 1.6MB of heap
allocations, leaving roughly 1.8MB unaccounted for! Lots of nasty static
data lying around...
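The entry doesn't describe how the checked-in script infers types. One simple heuristic, sketched below under that caveat, skips past allocator frames (malloc, operator new, and so on), matches the remaining frames, innermost first, against a table of hand-written rules, and falls back to labeling the block by its first non-allocator caller. The `TypeRule` struct, the allocator list, and the rule patterns are all hypothetical.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical rule: if a frame's symbol contains `pattern`, call the block `typeName`.
struct TypeRule {
    std::string pattern;
    std::string typeName;
};

// Allocator entry points to skip over; illustrative, not exhaustive.
const std::vector<std::string> kAllocatorFrames = {
    "malloc", "calloc", "realloc", "operator new", "PR_Malloc"};

static bool isAllocatorFrame(const std::string& frame) {
    for (const auto& a : kAllocatorFrames)
        if (frame.rfind(a, 0) == 0) return true;  // frame starts with `a`
    return false;
}

// `stack` is innermost-first. Skip allocator frames, then return the type of the
// first rule matching a remaining frame (innermost frame tried first). If no rule
// matches, label the block by its first non-allocator caller.
std::string inferType(const std::vector<std::string>& stack,
                      const std::vector<TypeRule>& rules) {
    std::string firstCaller;
    for (const auto& frame : stack) {
        if (isAllocatorFrame(frame)) continue;
        if (firstCaller.empty()) firstCaller = frame;
        for (const auto& r : rules)
            if (frame.find(r.pattern) != std::string::npos) return r.typeName;
    }
    return firstCaller.empty() ? "(unknown)" : "<" + firstCaller + ">";
}
```

For instance, a stack of `malloc`, `PR_Malloc`, `FooString::Grow` would be labeled by a `FooString::` rule if one exists, and as `<FooString::Grow>` otherwise.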
Got foldelf checked in to the tree. Spent some time
trying to collect trace-malloc output from a gtkEmbed run
that had loaded a couple of web pages: I'm getting stuffed by another
bogo-vtable in NSPR.
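The RTTI heuristics aren't spelled out here, but a common trick for typing C++ heap objects without real RTTI is to read the first word of each block as a candidate vptr and check whether it lands inside a known vtable symbol. A “bogo-vtable” would then be a block whose first word happens to fall inside a vtable even though the block is something else. A minimal sketch, assuming vtable addresses and sizes have already been pulled from the symbol table (the `VtableMap` type and `guessClass` function are hypothetical, and demangling is omitted):

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <map>
#include <optional>
#include <string>

// Hypothetical symbol info for one vtable.
struct VtableSym {
    std::uintptr_t size;    // bytes occupied by the vtable symbol
    std::string className;  // class that owns the vtable
};

// Vtable symbols keyed by start address (loading them from the binary is not shown).
using VtableMap = std::map<std::uintptr_t, VtableSym>;

// Speculatively guess a heap block's class from its first word. The vptr may point
// inside the vtable symbol rather than at its start (some ABIs put offset/RTTI slots
// first), so check containment, not equality. Any hit is only a guess: data that
// happens to equal a vtable address yields a false positive.
std::optional<std::string> guessClass(const unsigned char* block, std::size_t blockSize,
                                      const VtableMap& vtables) {
    if (blockSize < sizeof(std::uintptr_t)) return std::nullopt;
    std::uintptr_t word;
    std::memcpy(&word, block, sizeof word);  // avoids unaligned/aliasing problems
    auto it = vtables.upper_bound(word);     // first vtable starting after `word`
    if (it == vtables.begin()) return std::nullopt;
    --it;                                    // last vtable starting at or before `word`
    if (word - it->first < it->second.size) return it->second.className;
    return std::nullopt;
}
```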
Spent a bit of time rooting around in vtables to see how much
__declspec(novtable) is buying us on Win32 (versus Linux,
where we can't use it). I converted all the DOM-generated interfaces to
use NS_NO_VTABLE, which shrank the layout DLL by a modest
25KB. I'd guess there's another 8KB of pure-virtual vtables
that we could strip out. (It turns out that each pure virtual slot in a
vtable points to a function called __purecall, which'll
presumably abort the process; I just looked through the
.data segments for that pointer and counted up the number
of times I saw it.)
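The counting trick in the parenthetical is easy to reproduce: given the raw bytes of a `.data` section and the address of `__purecall`, count the pointer-sized slots equal to that address, then multiply by the pointer size (4 bytes on 32-bit Win32) for a rough byte estimate. This sketch assumes the section has already been read into memory and the address resolved (both steps omitted), only checks aligned slots, and uses hypothetical function names.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Count aligned, pointer-sized slots in `section` equal to `target`
// (e.g. the address of __purecall). Ptr is a template parameter so a 32-bit
// image can be scanned from a 64-bit host; assumes image and host byte order match.
template <typename Ptr>
std::size_t countPointerSlots(const std::vector<unsigned char>& section, Ptr target) {
    std::size_t count = 0;
    for (std::size_t off = 0; off + sizeof(Ptr) <= section.size(); off += sizeof(Ptr)) {
        Ptr slot;
        std::memcpy(&slot, section.data() + off, sizeof slot);
        if (slot == target) ++count;
    }
    return count;
}

// Rough space estimate: each matching slot is one pure-virtual vtable entry.
template <typename Ptr>
std::size_t purecallBytes(const std::vector<unsigned char>& section, Ptr target) {
    return countPointerSlots(section, target) * sizeof(Ptr);
}
```

At 4 bytes per slot, the 8KB guess corresponds to roughly 2,000 `__purecall` entries.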
Undaunted, I spent some time with foldelf trying to see
if the space taken up by vtables is similar on Linux. Turns out that
it is, so stripping the pure virtual tables would shrink layout by
about 30KB. librdf.so had about 20KB of pure-virtual
vtables; not sure how the other DSOs would be affected, but assuming
10KB per DSO on average, doing some magic to strip them would save
about 230KB across the twenty-three “necessary” DSOs.
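For the record, the Linux vtable figure is just the assumed per-DSO average (a guess, not a measurement) multiplied across the DSO count:

$$
23\ \text{DSOs} \times 10\ \text{KB/DSO} = 230\ \text{KB}
$$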
Another interesting discovery: not only does
gcc create a vtable for each of these pure abstract
classes, it also creates a constructor! Each of these ctors is 50
bytes. There are 305 in libgklayout.so, so that makes for
about 15KB of useless code (which is dead-stripped on Win32, BTW).
So, assuming 5KB of ctors per DSO on average, ripping them out would
yield about 115KB of savings across the twenty-three “necessary”
DSOs, or 345KB when combined with the vtable savings.
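Spelled out, the constructor estimate and the combined total work out as follows (the 5KB per-DSO average is an assumption, as above):

$$
\begin{aligned}
305 \times 50\ \text{B} &= 15{,}250\ \text{B} \approx 15\ \text{KB} && \text{(ctors in libgklayout.so)}\\
23 \times 5\ \text{KB} &= 115\ \text{KB} && \text{(ctors, all 23 DSOs)}\\
230\ \text{KB} + 115\ \text{KB} &= 345\ \text{KB} && \text{(vtables + ctors)}
\end{aligned}
$$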
FWIW, I spent some time digging around in gcc trying to figure out if
there was an equivalent to __declspec(novtable). Turns
out that there sort of is:
#pragma interface. This is controlled on a
file-by-file basis, rather than a class-by-class basis, and happens to
also generate references to the ctor; however, it might be possible to
massage that #pragma into a feature that'd do what we
want...
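Because `__declspec(novtable)` is MSVC-only, the usual approach is to hide it behind a macro that expands to nothing elsewhere, which is the role `NS_NO_VTABLE` plays. The macro below (`EXAMPLE_NO_VTABLE`) and the interface are hypothetical stand-ins, not the tree's actual definitions. The attribute is only safe on a class that is never instantiated on its own and whose constructor and destructor make no virtual calls, since MSVC skips initializing its vptr.

```cpp
#include <cassert>
#include <type_traits>

// MSVC can drop the vtable (and the vptr setup) for interface-only classes;
// other compilers get an empty expansion and keep emitting both.
#if defined(_MSC_VER)
#define EXAMPLE_NO_VTABLE __declspec(novtable)
#else
#define EXAMPLE_NO_VTABLE
#endif

// Hypothetical pure interface: never instantiated directly, so its own vtable
// would hold only pure-virtual entries and is safe to omit.
class EXAMPLE_NO_VTABLE IExampleCounter {
public:
    virtual void Increment() = 0;
    virtual int Value() const = 0;
protected:
    ~IExampleCounter() = default;  // no deletion through the interface
};

// Concrete class: gets a normal vtable on every compiler.
class ExampleCounter final : public IExampleCounter {
public:
    void Increment() override { ++mCount; }
    int Value() const override { return mCount; }
private:
    int mCount = 0;
};

static_assert(std::is_abstract<IExampleCounter>::value,
              "no-vtable interfaces must not be instantiable");
```

On gcc the macro vanishes, so the pure-virtual vtable is still emitted, which is the overhead measured above.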
Improved the speculative RTTI heuristics a bit more: I was having trouble
getting trace-malloc dumps after loading several “real”
web pages. With this in hand, I was able to get data on
gtkEmbed after loading one URL and after loading forty URLs,
and was able to do some analysis on the differences.
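One simple way to compare two dumps like these is the “simple histograms and deltas” approach: bucket live objects by inferred type, then subtract one snapshot's histogram from the other's. A minimal sketch, assuming each dump has already been reduced to (type, size) records; the `Live` record and function names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Hypothetical record for one live heap object, after type inference.
struct Live {
    std::string type;
    std::size_t bytes;
};

struct Bucket {
    long count = 0;
    long bytes = 0;
};

using Histogram = std::map<std::string, Bucket>;

// Bucket live objects by type: how many objects and how many bytes.
Histogram histogram(const std::vector<Live>& objects) {
    Histogram h;
    for (const auto& o : objects) {
        Bucket& b = h[o.type];
        ++b.count;
        b.bytes += static_cast<long>(o.bytes);
    }
    return h;
}

// Per-type (after - before), including types present in only one snapshot.
// Types whose count and bytes both net to zero are dropped.
Histogram delta(const Histogram& before, const Histogram& after) {
    Histogram d = after;
    for (const auto& [type, b] : before) {
        Bucket& x = d[type];
        x.count -= b.count;
        x.bytes -= b.bytes;
    }
    for (auto it = d.begin(); it != d.end();) {
        if (it->second.count == 0 && it->second.bytes == 0) it = d.erase(it);
        else ++it;
    }
    return d;
}
```

Sorting the delta by bytes (not shown) puts the types that grew the most between the two runs at the top.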
Mostly meetings. Managed to collect data on gtkEmbed's
live objects after loading 140 URLs.