Tuesday, August 1, 2000

Spent most of the day triaging layout bugs.

Spent the evening looking at two bugs with the XUL template builder. The first has to do with crashing if you try to call rebuild() twice on a template without switching the ref attribute in between.

My initial impression was that we should just get rid of the ContentSupportMap altogether; unfortunately, this causes a 7% or so slowdown in mailnews. I did a bit of bloat-blamin', and it turns out that the ContentSupportMap accounts for 50Kb of bloat on a 2,000 message folder, so it's too cheap to sacrifice that much speed for. The exercise led to a much better documented code base, and a patch that I think is pretty clean.

(It also led to me discovering that about 360Kb of bloat comes from what appears to be the continual re-initialization of the MIME mailbox parser. I bugged alecf about it, and he's going to look into it.)

The second is the problem where the search engines are just duplicated over and over. The problem here is that I botched nsXULTemplateBuilder::RemoveGeneratedContent() when I first converted this over to using the new stuff. I don't "special case" the tree when the <treechildren> is built explicitly in the XUL: when this happens, RemoveGeneratedContent() doesn't realize that it needs to nuke the <treeitem>s beneath the <treechildren>.

I experimented with the "elegant" fix; that is, running the conflict set to remove the retractions; unfortunately, it gets screwed by our wondrous APIs that ensure any random access will eventually become geometric. So the "right" answer here may be to just hack the tree widget case, unfortunately. (Oh, I should also note that the "elegant" fix ain't all that elegant yet: it seems to bungle the case where you open and close a folder in the tree widget repeatedly: duplicates the content over and over.)

Managed to review about half of the IBM bi-directional text changes. I posted some initial comments to the layout newsgroup.

Wednesday, August 2, 2000

Got a profile of typing into the editor window, and discovered that we were taking a huge hit in nsXULKeyBindings::HandleKeyEvent() because we grovel over the <keyset> in the DOM every time to determine if a key event is an accelerator key or whatever. Filed a bug to compute this once and hash it. It's a well-defined task that someone who's coming up to speed on the codebase could tackle. I still need to finish digging into the profile to see where the other 75-80% of the time is going.

Talked to rjc about the search window bug and ended up deciding that the best way to handle this would be to:

  1. Remove the element from the document
  2. Do a back-to-front breadth-first traversal of the tree
  3. If an element is generated, nuke it.
  4. If an element is not generated, enqueue it for traversal.

We can prune traversal on subtrees that we know won't contain generated content; e.g., anything under the <template> element.

Thursday, August 3, 2000

Spent the day looking at editor typing performance. After talking with kin and jfrancis, we figured out that the problem was that we always would reflow the entire block, not just the damaged lines. After digging around a bit, I found some bizarre code in nsBlockFrame that was basically forcing all the lines to be dirtied after a SplitLine() call. #if 0-ing that out caused us to only reflow until split frame decided that there was no more damage to propogate. And typing into the front of a big document gets real fast.

Unfortunately, CVS is down this evening, so I can't investigate the history behind this, but I'll dig tomorrow...

Friday, August 4, 2000

So I eventually managed to get into CVS last night, and nothing really descriptive lept out at me with respect to the mysterous "dirty the next linebox" code. Furthermore, there is a block beneath it that does the same thing; however, sfraser said he gets crashes if he removes that code. Time to dig deeper, but this is promising.

Got a profile of "mark all read" for the bug that got kicked over to me from hyatt. Turns out there is horrendous fanout from mailnews: for each message that they're marking read, they are generating 20 notifications to RDF.

Finally got around to implementing the "fast but ugly" removal code for the search bug that rjc and I talked about.

Trawled through the tree to add batching APIs to nsIRDFObserver. I'll get a patch into the bug tonight for review.

Saturday, August 5, 2000

Got a profile of tree scrolling in the threadpane on an un-threaded newsgroup. Scrolled down two or three times with the thread pane all the way open. Here's the breakdown:

Changing focus to nsCSSFrameConstruct::ConstructFrameInternal() and excluding those two subtrees shows the following. nsXBLService::LoadBindings() accounts for 72% of the total time. From here, about 36% of the total time (i.e., half of LoadBindings() time) is spent in nsXBLBinding::GenerateAnonymousContent(). That breaks down as follows, with percentages given in terms of the total time:

The next largest offender called from LoadBindings() is nsXBLService::GetBinding(), accounting for 20% of the total time (as before, after removing style resolution and RDF template stuff). This looks like it introduces suck mostly because of an enormous fan-out: 1,820 calls fan out to 21,294 GetAttribute() calls and 18,746 string compares. Ouch!

The next LoadBindings() descendant, at 7% of the total time, is nsXBLBinding::InstallEventHandlers(). This is another big consumer of nsXBLBinding::AllowScripts() (it accounts for 25% of this routine's time), so fixing that will be a win. The rest of the time goes into creating and installing event listeners, which I'm not going to follow right now.

Sunday, August 6, 2000

Started looking at the dread scrollbar-hover leak. After poking at it for a couple hours, the only progress I made was to discover that the <scrollbarbutton> was being held alive by the <scrollbar> above it (possibly among other things). The <scrollbar> thought that it had no children, and no parent, which seemed quite wrong at first blush.

dbaron suggested that the <scrollbar> didn't have a parent because it was anonymous content, but I think that's not right: it should have a parent, IIRC, but it's parent won't think the <scrollbar> is a child. Regardless, I would think that the <scrollbar> should know about its <scrollbarbutton>, unless maybe all of that was created from XBL?

Anyway, at least one of the references to the <scrollbar> was coming from the document's nsBindingManager. I put in a hack to nuke the binding manager when SetScriptGlobalObject(nsnull) is called: that killed the <scrollbar>, but didn't unroot the <scrollbarbutton>'s script object.

I need to talk to hyatt to understand how this is all supposed to work.

Monday, August 7, 2000

Spent the entire day looking at that frigging scrollbar-hover leak, and finally figured it out. When you're in the SetDocument() for the <html> element, you've gotta walk up three frames (from the AreaFrame to the CanvasFrame to the nsScrollPortFrame to the nsGfxScrollFrame), and then do dbaron's fancy nsIAnonymousContentCreator::SetDocument() call. Argh!

Anyway, dbaron has a much cleaner patch than I, so he's going to post that to the bug report.

Landed all the RDF cruft I'd had bottled up in my tree: fixed bug 46616 (search overflow) and 46964 (crash on >1 call to "rebuild"), which both basically had to do with the content model cleanup sucking when calling nsIXULTemplateBuilder::Rebuild(). Also checked in the patch I'd put into a bug 17470, which was the "mark all read is slow" bug.

Monday, August 14, 2000

Cleaned up and checked in a fix for the regression that I introduced last week when trying to optimize the content model tear-down code.

(Re)discovered a bug with whitespace that got introduced when somebody was tinkering around with nsTextTransformer a week or two ago. We ended up backing the change out.

Tuesday, August 15, 2000

Tried to re-profile mail scrolling, but without much luck. I think I'm just going to hold off on that until I get a beefier machine and a new version of Quantify.

Got my fix reviewed and checked in for bug 44480 (the document.width bug). This ended up having repercussions over in the commercial tree that I wasn't expecting. AIM was relying on the incorrect value of document.height to "lock" the scrollbar at the bottom of the screen.

Also checked in a fix for bug 47451 (newlines and carriage returns not treated as whitespace).

Wednesday, August 16, 2000

Started looking at bug 3490 which has to do with margins set on inline elements. Currently, we subtract the combined margins of the inline from the available width on the line in nsLineLayout::ApplyLeftMargin(). This is only correct in the case that the inline isn't split across multiple lines.

I think that the correct way to handle this is to leave the line's available width "as is", and pass the margins down to the child frame in the reflow state (or some such). When we reach a child frame, the left-most frame will have to consider the left margin, and the right-most will have to consider the right margin. I'm not saying this well, but the idea is fairly straight-forward. This:

<span style="margin: 10px"> Here is some text </span>

would be treated as if it were:

<span> <span style="margin-left: 10px">Here</span> is some <span style="margin-right: 10px">text</span> </span>

Ran this by dbaron, and he suggested that I take a look at borders to piggy-back on that mechanism. Turns out borders are broken, too! (When we split 'em across lines, anyway.) So I need to dig into this code a bit more.

Spent the afternoon poking at the crasher that appeared to be caused by dbaron's new anonymous content whacking leak fix. Turned out it was due to some bad code that was trying to set the primary-frame-for on an image map's <area> elements. Worked up a fix for that with saari's help.

Thursday, August 17, 2000

Helped Edward get started on some leak stuff. Got fixes checked in for bugs 49122 (which had to do with cleaning up the frame manager after clobbering an image with a map) and 29641 ("long text is slow"). For the latter, I just dealt with the problem of sending the layout engine off on a mission to the moon by changing the default value for content.notify.backoffcount to -1.

Healthy bit o' bug triaging. Talked to jar about scary new theme stuff I heard rumor of. Met with David Ascher for an hour or so.

Friday, August 18, 2000

Tracked down bug 47843, which was that we'd hang on Linux when you put overflow: scroll onto the <body> element. Turns out that the pres shell was not treating the initial reflow as being "in reflow" (i.e., the mIsReflowing member wasn't set), so setting an attribute on a scrollbar caused us to get into some unstable state that kept wobbling back and forth.

I closed a bug that I shouldn't have, 46846. dbaron re-opened it; it was straight-forward to fix. In this case, we just had to handle the href attribute on <area> and <link> tags just like we do on anchor tags.

Debugged a cache problem with gordon, and worked up some sanity checking code that will at least not let us crash if the cache database gets corrupted.

Sunday, August 20, 2000

Fixed some random string bustage that was causing all kinds of stuff to go berzerk.

Dug in to bug 46043 and was able to get rid of the table and <iframe> stuff. Looks like a floater problem.

Monday, August 21, 2000

Managed to get a profile of new window last night. Dug into it a little bit and discovered that we're still loading files when we open a new window! We load eight of them, to be exact, via nsXULKeyListenerImpl::LoadKeyBindingDocument(). According to Quanitfy, this accounts for 13% of the time to open a new window; probably more on Mac. (N.B. that it accumulates elapsed time, so this may not be a fair measure of how much time it's really taking.) There are only two calls to nsXULKeyListener::GetKeyBindingDocument(), so I'm not sure how the fanout is working here. Overlays?

Hmm, on second thought, I hit CTRL+N to open the new window: I wonder if we're being good and lazily loading the keybindings only at the instant we need them?

The rest of the profile looks depressingly flat when looking at it bottom-up.

Wednesday, August 23, 2000

Another day in the bad place, learning about floaters.

Poked at bug 32191, incorrect line breaking before and after the image tag. Changed the nsImageFrame to return true for CanContinueTextRun(). Unfortunately, this isn't enough, because the text measurement code doesn't account for the width of the image frame when returning its metrics.

Long discussion with jar, vidur, and warren about how to get some traction on bloat. jar suggested that the tools we had were too full of noise to be useful, recommended we get wade on the case to get the Boehm collector to "snapshot" memory in use and how it had been allocated.

Thursday, August 24, 2000

I think I finally figured out what was going wrong in bug 46043. After spinning once to discover that it needed to redo the first line, we'd call the nsSpaceManager to get the space on the next line. Since we pass in a zero-width rect, the space manager decides that it'll just hand us back a band with no trapezoids in it. In this situation, nsBlockBandData::ComputeAvailSpaceRect() has special-case code that'll just set it's mAvailSpace rectangle to (0, 0, 0, 0). The problem is, it wouldn't clear the floater counts, so the line layout would be led to believe that the frame was still going to be impaced by a floater. This would make it miss the special case that would force the frame to be placed in nsLineLayout::CanPlaceFrame(), so we'd end up trying to "break before" over, and over, and over...

dbaron pointed me to some ideas that he an ianh had about fixing floaters in CSS. He also pointed me to this MozillaZine poll which shows performance as the number one gripe. Need to look through this later to see if there's any valuable information.

Got back around to looking at the bloaty implementation of LiteralImpl. Took scc's advice with respect to doing the string allocation.

Tinkered around with bug 47191, which has to do with extra whitespace in the src= attributes of an <img> tag. Got that fix checked in.

Started looking at bug 39944, which is a problem where an image in an absolutely-positioned frame will grow or shrink of its own accord. The problem appears to be a rounding error with auto-sizing width based on height. Came up with an idea for fixing it, ran it past buster to see what he thinks.

Friday, August 25, 2000

Cleaned up the fix for 48486 (bloaty RDF literal stuff). Checked in fix for 39944 (image spontaneously resizes). Discovered 50381 (search.rdf gets corrupted).

Saturday, August 26, 2000

Cobbled together a fix for 50381, but realized that there really isn't a clean way that we can "keep everything around that we'll need". The best I could do was detect that it was too late to flush the datasource, and bail.

Discovered that we're (potentially) no handling one form of the RDF/XML syntax correctly. posted to www-rdf-interest to get some verification, and filed a bug (with a fix!) just in case.

Monday, August 28, 2000

Spent a little bit of time tinkering around with multiple substitutions in the XUL template builder. I was trying to use all of scc's fancy new string stuff, and had some complaints:

Sent him an email to bug him about it.

Spent the rest of the day tinkering around with TraceMallocDumpAllocations(), which brendan hacked into the window objects. Put together a Perl script that'll mash the output into a Quantify-like blame hierarchy. Discovered a couple interesting gems, which I've posted to the bug.

Tuesday, August 29, 2000

Got bug fixes checked in for 50381 (search.rdf corruption), 50548 (not handling one of the RDF/XML productions right), and 46143 (support more than one substitution in the XUL template builder).

Talked to mjudge and dbaron a bit about GetFrameForPoint(), which is apparently sucking down time on long, flat documents (like LXR pages). Learned why we can't terminate the search for a frame after we've found the first frame containing a point (because the last frame is the one that looks like it should contain the point: it's painted on top). bruce suggested looking in to r-trees as an optimization to break up the space to improve search speed. Could be cool.

Bruce gave me a couple of references to stuff on r-trees:

Started poking around at 48138, which is a bug with an <img> tag in a floater that's not reflowing the things it's floating over.

Helped pink debug an AIM crasher that ended up being sort of in the menu code. Turns out a menubar was being created, adding itself as a document observer, and then being superceded by another menu. It failed to unregister itself from observeing the document, and would get blown away leaving the document with a dangling pointer. It looks like it was being superceded because the onload handler was firing twice, so I may yet end up with the bug on my plate...

Wednesday, August 30, 2000

At the footprint meeting today, I signed up to figure out how to collect differential data on memory usage. Got some other eyes on the footprint report that I'd thrown together: ConflictSet came up as an item of contention.

Spent some time looking at 48138 and realized that it is a generic problem with (right?) floaters in table cells. Put together a funky test case that illustrates that the problem has something to do with the spacing above the floated element: specifically, the number of lines that it takes before a reflow triggers depends on the number of lines of text above the floater. It does not depend on the margins (as I'd originally thought). Anyway, narrowing in on it...

Got pldhash.[h|c] added to the Mac build for brendan.

Helped harishd debug a build problem with gcc-2.7.2.3 and the parser tests.