Wednesday, January 3, 2001

Another setback with footprint tracing. With the work that jar and thesteve had done, I thought we'd narrowed the problem down to fragmentation. But then I remembered that the data included all of the `dark matter' from trace-malloc's own bookkeeping. Brendan's going to massage trace-malloc to allocate stacks and table entries from his own `private pool', so they won't show up in the malloc() heap.

Tried the hoard allocator, and discovered that it had a similar growth rate to vanilla glibc malloc(), which leads me to believe that either (1) hoard has the same problems as glibc malloc(), or (2) the data that I gave to jar and thesteve to process was shite. I'm leaning toward the latter.

Talked to mjudge about trace-malloc on windows; he's struggling with some of jband's fancy macro-fu; asked jband to help him tomorrow if there's still trouble.

On a lark, tried using the BSD4.4 allocator that can be built as part of NSPR. I was surprised to find that it actually doesn't exhibit the `perpetual growth' problems that hoard and glibc malloc() exhibit. Kooky.

Thursday, January 4, 2001

Posted some more results from analyzing the lo-tech output from the BSD4.4 allocator last night, but looking at it again, I think I see a problem with this data. Specifically, since I computed `VM size' by subtracting the address of the highest pointer seen to date, it should monotonically increase. But it doesn't! Looks like I was screwing up the Excel graphing.
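
A minimal sketch of why that number should never go down, assuming `VM size' means the distance between the lowest and highest pointers seen so far (my reading of the entry; it doesn't spell out what gets subtracted). The function names and the sample addresses are made up for illustration.

```python
def vm_extent(pointers):
    """Yield the running address-range extent (highest - lowest seen so far)."""
    lowest = highest = None
    for p in pointers:
        lowest = p if lowest is None else min(lowest, p)
        highest = p if highest is None else max(highest, p)
        yield highest - lowest


def is_monotonic(values):
    """Return True if the sequence never decreases."""
    values = list(values)
    return all(a <= b for a, b in zip(values, values[1:]))


# Hypothetical pointer values, not real allocator output.
sample = [0x1000, 0x1800, 0x1400, 0x2000, 0x0800]
extents = list(vm_extent(sample))
print(extents, is_monotonic(extents))
```

Since the running minimum can only fall and the running maximum can only rise, any dip in a plot of this value points at the plotting or the data handling rather than at the allocator.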

Trying to get a better long-term picture of URL usage with the different allocators. I've got a long list of URLs, but have also discovered a problem with the `buster' CGI. Some of the pages have onload handlers that replace document.top, and this cuts the refresh goop off at the knees. I'm just pruning those URLs out of the file as I find them, but it's slow going.

Talked with jst about moving the XUL content model stuff into the layout DLL, and he's going to get heikki to see how much work it'd be to just completely separate the content model stuff into its own DLL.

Friday, January 5, 2001

Got a head-to-head comparison of some different allocators pulled together. Gotta try Doug Lea's allocator, which is apparently pretty good. I'm having some trouble getting it to work with Mozilla out of the box.

Monday, January 8, 2001

Got Doug Lea's allocator up and running. (I was having trouble because it's not threadsafe by default!) Anyway, v2.6.6 does a bit better than glibc, and 2.7.0 does a bit better than 2.6.6. (Well, a bit more than `a bit': its process size was about 9MB smaller after 500 URLs!) Anyway, I've summarized the results and am now trying to collect data on what the live objects are (using lo-fi printf() technology).

Finally checked in a fix for bug 57026 which was a block-in-inline headache with out-of-flow frames and views.

Tuesday, January 9, 2001

Wednesday, January 10, 2001

Talked with jst and heikki briefly: they are moving ahead with content.dll, so I'll wait to land my XUL merge until they're done. Memory meeting: rickg to test SmartHeap in winEmbed; harishd to evaluate parser node arena stuff using lo-fi malloc instrumentation. dprice to collect winEmbed stats using jrgm's tests on slow machine.

Thursday, January 11, 2001

Collected some quick and dirty data on winEmbed performance on a low-end (166MHz, 64MB, Win98) machine using jrgm's stuff. Used TaskInfo2000 to look at working set size: although VM size is equivalent, our working set size is almost five times larger than IE5's.

Spent some time debugging winEmbed netwerk foo along the way: discovered a deadlock as well as some evil surprises left by ruslan.

Friday, January 12, 2001

Comparing builds with and without rfg's patch to strip pure-virtual vtables. The builds end up being slightly smaller (~1-2%), which is a bit less than I'd expected.

Running trace-malloc build to determine what objects are around after loading ~500 URLs.

Tuesday, January 16, 2001

Somebody posted about new glibc equivalents of __declspec(dll[import|export]); need to goad someone into looking into this to see if it'd help reduce the number of symbols we export.

Finally collected trace-malloc data on 100 URLs, and then 400 URLs. I'll dig into what the objects actually are tomorrow. Met with buster, karnaze, kmcclusk, attinasi to discuss excessive invalidates. Came up with some good stuff to work on, including:

Spent some time trying to profile tomshardware.com, and had a hell of a time getting Quantify to work right. Must be some crufty software that I've installed recently. Anyway, it looks like we're thrashing the heck out of the network: a ton of time is showing up in PR_ExitMonitor() and such. I wanna take a look at this from a local file to see if performance is any better.

Wednesday, January 17, 2001

Rounded up some performance data for the Lea allocator, updated the allocator page to reflect it. Looks like the Lea allocators cause a slight slowdown in page load time. Started fiddling around with jband's idea of tracing calls to produce a file that the linker can use to order functions inside a DLL. The `prototype' works, but I'm having some trouble with static ctors that I need to sort out.

To do...

Thursday, January 18, 2001

Figured out problem with tracing and static ctors (needed to save off ecx), started to merge the changes into the build system, and handed it off to dprice.

Started collecting updated gross VM growth information on Doug Lea's pre7 allocator: looks to be about the same as pre6. I'm going to update the Hoard data too, and maybe BSD if I can find it, and re-publish that stuff eventually.

Started analyzing the 100- to 400-URL trace-malloc data in earnest. Reduced the default size for an nsHashtable from 256 to 16. That, plus radha's session history limits, look to knock the VM growth rate down by 40% with Lea's pre7 allocator (from 10KB to 5.8KB). Sweet justice!
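
A quick check of the arithmetic, as a sketch. It assumes the 10KB and 5.8KB figures are growth per URL loaded, and that such a rate comes from the slope between two size measurements; both are my assumptions, as are the function names and the example sizes.

```python
def growth_per_url(size_at_start_kb, size_at_end_kb, urls_loaded):
    """Average growth per URL between two size measurements (assumed method)."""
    return (size_at_end_kb - size_at_start_kb) / urls_loaded


def percent_reduction(before, after):
    """Percentage by which `after` is smaller than `before`."""
    return 100.0 * (before - after) / before


# Made-up sizes: 3000KB of growth over 300 URLs gives 10KB per URL.
example_rate = growth_per_url(20000, 23000, 300)

# The two rates quoted in the entry.
reduction = percent_reduction(10.0, 5.8)
print(example_rate, round(reduction, 1))
```

Going from 10KB to 5.8KB works out to roughly 42%, which squares with the "40%" above.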

Friday, January 19, 2001

Because I couldn't resist, I cobbled together some perl hackery to pass to the Win32 linker via /ORDER. At first blush, it looks like it doesn't have much effect on the working set size, and I'm not sure why. cc'd some of the Big Guns to see if they have any thoughts.
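
Roughly what that kind of script has to do, sketched in Python rather than the perl actually used. It assumes the trace is just a stream of function names, one per call, and emits each name once in order of first call. The linker takes the resulting file as /ORDER:@filename, one decorated name per line. The function names and the sample trace are hypothetical.

```python
def first_call_order(trace):
    """Return each function name once, in order of first appearance."""
    seen = set()
    ordered = []
    for name in trace:
        if name not in seen:
            seen.add(name)
            ordered.append(name)
    return ordered


def write_order_file(names, path):
    """Write one name per line, the layout an /ORDER file expects."""
    with open(path, "w") as f:
        for name in names:
            f.write(name + "\n")


# Hypothetical call trace; real input would be decorated symbol names.
trace = ["_main", "_init_prefs", "_main", "_load_url", "_init_prefs", "_paint"]
order = first_call_order(trace)
print(order)
```

Ordering by first call is the simplest policy: it packs startup code together, but says nothing about how often each function runs afterward.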

Monday, January 22, 2001

Spent most of the day poking around at resident set size. I compared two rebased builds, and noticed only a moderate (300KB, or 3%) improvement with winEmbed, testing startup only.

One thing that I tried to do (but couldn't) was to strip off the relocations (using rebase -f). I'm curious if each DLL's static data is also being properly rebased.

Found a Pietrek article on the Win32 Portable Executable File Format, which described DLL and executable layout on Win32. I stumbled across it trying to figure out what the HIGHLOW relocations are. Found another article that goes into rebasing in detail.

Spent some time debugging 64929 with Bhuvan. It's a crasher that looks to have something to do with a bizarre installation config.

Tuesday, January 23, 2001

Spent some time trying to figure out whether RTTI bloats stuff significantly. Looks like it does (about 5% across the board), but certainly not to the level that the AOL folks think it does. Which means that there's probably another bugbear waiting in the weeds.

Jody mentioned that ``Exports, and the inability to properly control them, are also another cause of bloat from gcc.'' I wonder if the new glibc-2.2 attributes that rth mentioned would be of any use in controlling this problem? (I also asked Jody to elaborate...)

Talked to blizzard and brendan about this: There are (apparently) a large number of export entries in each .so's Global Offset Table. It's possible that using the private attribute (or somesuch) could eliminate this, but cls said last week that bryner and wtc were working on a post-processing tool to strip this stuff out. Need to follow up there...

* * *

Ok, wow. So I was wrong: it turns out that with gcc-2.95.2, the RTTI stuff does generate much larger .so files. Should get someone to look into that.

Talked with rpotts a bit about tracking the Win32 resident set size.

Couple of ``to do'' ideas:

Spent some time hacking on a program that walks the process's page table with VirtualQueryEx(), but realized that this isn't going to tell me if a page is resident or not.

Thursday, January 25, 2001

Started reading the Solomon book that discusses NT memory management in hopes that I'll be able to figure out how to track what pages are resident. Collected trace-malloc data, did quick summary, and posted it. Put together status report that describes where we are now, and tries to estimate how far we'll get going forward.

Friday, January 26, 2001

Cached design review. Filed a few bugs based on trace-malloc data collected yesterday.

Monday, January 29, 2001

jband criticized the way I was trying to detect resident set improvements: he suggested that we wouldn't see much difference just doing a simple startup and shutdown. So, I frittered away some time running a full-blown Mozilla build, collecting data. It might have made some difference: at one point I'd convinced myself that the resident set size was about 1.5 to 2MB lower (from just under 14MB to just over 12MB) when I ran with the ``ordered'' build. A second run to verify showed a more modest (200KB) difference.

Still no progress on determining what pages are actually in memory. I finished that chapter from Solomon, and convinced myself that what we really need to do is try to read the process's page table, and dump that out. Didn't have a chance to experiment with that today.

Gotta catch up on layout bugs so I can pick buster's brain before he bails.

Tuesday, January 30, 2001

Went through layout bugs; four categories: 1) performance, 2) inline borders/margins/padding, 3) block-in-inline, and 4) text runs through non-text leaf inline frames. Another hour-long session with buster and crew. Some back-and-forth with rfg trying to figure out why we're getting different results. Updated footprint estimate stuff with phil's feedback and forwarded to embed-eng to proof-read.

Wednesday, January 31, 2001

Found a small problem with buster's frame hint stuff: a case where floaters (or probably other absolutely positioned items) would send the nsCSSFrameConstructor::FindFrameWithContent off into the weeds.

Did a startup profile and forwarded it to the newsgroup.

Twiddling with more patches from rfg.