Spent most of the day triaging layout bugs.
Spent the evening looking at two bugs with the XUL template
builder. The
first
has to do with crashing if you try to call rebuild()
twice on a template without switching the ref attribute
in between.
My initial impression was that we should just get rid of the
ContentSupportMap altogether; unfortunately, this causes
a 7% or so slowdown in mailnews. I did a bit of bloat-blamin', and it
turns out that the ContentSupportMap accounts for 50Kb of
bloat on a 2,000 message folder, so it's too cheap to sacrifice that
much speed for. The exercise led to a much better documented code
base, and a patch that I think is pretty clean.
(It also led to me discovering that about 360Kb of bloat comes from what appears to be the continual re-initialization of the MIME mailbox parser. I bugged alecf about it, and he's going to look into it.)
The
second
is the problem where the search engines are just duplicated over and
over. The problem here is that I botched
nsXULTemplateBuilder::RemoveGeneratedContent() when I
first converted this over to using the new stuff. I don't "special
case" the tree when the <treechildren> is built
explicitly in the XUL: when this happens,
RemoveGeneratedContent() doesn't realize that it needs to
nuke the <treeitem>s beneath the
<treechildren>.
I experimented with the "elegant" fix; that is, running the conflict set to remove the retractions; unfortunately, it gets screwed by our wondrous APIs that ensure any random access will eventually become geometric. So the "right" answer here may be to just hack the tree widget case, unfortunately. (Oh, I should also note that the "elegant" fix ain't all that elegant yet: it seems to bungle the case where you open and close a folder in the tree widget repeatedly: duplicates the content over and over.)
Managed to review about half of the IBM bi-directional text changes. I posted some initial comments to the layout newsgroup.
Got a profile of typing into the editor window, and discovered that we
were taking a huge hit in
nsXULKeyBindings::HandleKeyEvent()
because we grovel over the <keyset> in the DOM
every time to determine if a key event is an accelerator key or
whatever. Filed a
bug
to compute this once and hash it. It's a well-defined task that
someone who's coming up to speed on the codebase could tackle. I still
need to finish digging into the profile to see where the other 75-80%
of the time is going.
Talked to rjc about the search window bug and ended up deciding that the best way to handle this would be to:
We can prune traversal on subtrees that we know won't contain
generated content; e.g., anything under the
<template> element.
Spent the day looking at editor typing performance. After talking with
kin and jfrancis, we figured out that the problem was that we always
would reflow the entire block, not just the damaged lines. After
digging around a bit, I found some
bizarre code
in nsBlockFrame that was basically forcing all
the lines to be dirtied after a SplitLine()
call. #if 0-ing that out caused us to only reflow
until split frame decided that there was no more damage to
propogate. And typing into the front of a big document gets real fast.
Unfortunately, CVS is down this evening, so I can't investigate the history behind this, but I'll dig tomorrow...
So I eventually managed to get into CVS last night, and nothing really descriptive lept out at me with respect to the mysterous "dirty the next linebox" code. Furthermore, there is a block beneath it that does the same thing; however, sfraser said he gets crashes if he removes that code. Time to dig deeper, but this is promising.
Got a profile of "mark all read" for the bug that got kicked over to me from hyatt. Turns out there is horrendous fanout from mailnews: for each message that they're marking read, they are generating 20 notifications to RDF.
Finally got around to implementing the "fast but ugly" removal code for the search bug that rjc and I talked about.
Trawled through the tree to add batching APIs to
nsIRDFObserver. I'll get a patch into the
bug
tonight for review.
Got a profile of tree scrolling in the threadpane on an un-threaded newsgroup. Scrolled down two or three times with the thread pane all the way open. Here's the breakdown:
nsXULElement::EnsureContentsGenerated() accounts for 15%
of the time. From here, there are 1,274 calls to
nsXULDocument::CreateContents().
nsCSSFrameConstructor::ResolveStyleContext() accounts for
6% of the time. About half of that time ends up in
nsStyleUtil::IsSimpleXlink(). That's a total of 3%. Oof!
(It's all spent twiddling strings.) So if we could figure out a way to
not call this method, that would be good.
nsCSSFrameConstructor::ConstructFrameInternal() accounts
for 52% of the time. We've accounted for 15% + 6% = 21% of the time
above, so what about the other 31%?
Changing focus to
nsCSSFrameConstruct::ConstructFrameInternal() and
excluding those two subtrees shows the
following. nsXBLService::LoadBindings() accounts for 72%
of the total time. From here, about 36% of the total time (i.e., half
of LoadBindings() time) is spent in
nsXBLBinding::GenerateAnonymousContent(). That breaks
down as follows, with percentages given in terms of the total time:
nsXBLBinding::SetAnonymousContent().
nsXBLBinding::AllowScripts(), which does a
do_GetService(). That seems like it should be a
no-brainer to fix. (Many of the calls are from
SetAnonymousContent().
nsXBLBinding::ConstructAttributeTable() (all from
SetAnonymousContent()). This routine is recursive, and is
completely dominzted by thousands of calls to the heap. Can we create
an arena per binding for this?
nsXMLElement::CloneNode().
The next largest offender called from LoadBindings() is
nsXBLService::GetBinding(), accounting for 20% of the
total time (as before, after removing style resolution and RDF
template stuff). This looks like it introduces suck mostly because of
an enormous fan-out: 1,820 calls fan out to 21,294
GetAttribute() calls and 18,746 string compares. Ouch!
The next LoadBindings() descendant, at 7% of the total
time, is nsXBLBinding::InstallEventHandlers(). This is
another big consumer of nsXBLBinding::AllowScripts() (it
accounts for 25% of this routine's time), so fixing that will be a
win. The rest of the time goes into creating and installing event
listeners, which I'm not going to follow right now.
Started looking at the dread
scrollbar-hover leak.
After poking at it for a couple hours, the only progress I made was to
discover that the <scrollbarbutton> was being held
alive by the <scrollbar> above it (possibly among
other things). The <scrollbar> thought that it had
no children, and no parent, which seemed quite wrong at first blush.
dbaron suggested that the <scrollbar> didn't have a
parent because it was anonymous content, but I think that's not right:
it should have a parent, IIRC, but it's parent won't think
the <scrollbar> is a child. Regardless, I would
think that the <scrollbar> should know about its
<scrollbarbutton>, unless maybe all of that was
created from XBL?
Anyway, at least one of the references to the
<scrollbar> was coming from the document's
nsBindingManager. I put in a hack to nuke the binding
manager when SetScriptGlobalObject(nsnull) is called:
that killed the <scrollbar>, but didn't unroot the
<scrollbarbutton>'s script object.
I need to talk to hyatt to understand how this is all supposed to work.
Spent the entire day looking at that frigging
scrollbar-hover leak,
and finally figured it out. When you're in the
SetDocument() for the <html> element,
you've gotta walk up three frames (from the AreaFrame to
the CanvasFrame to the nsScrollPortFrame to
the nsGfxScrollFrame), and then do dbaron's fancy
nsIAnonymousContentCreator::SetDocument() call. Argh!
Anyway, dbaron has a much cleaner patch than I, so he's going to post that to the bug report.
Landed all the RDF cruft I'd had bottled up in my tree: fixed bug
46616
(search overflow) and
46964
(crash on >1 call to "rebuild"),
which both basically had to do with the content model cleanup sucking
when calling nsIXULTemplateBuilder::Rebuild(). Also
checked in the patch I'd put into a bug
17470,
which was the "mark all read is slow" bug.
Cleaned up and checked in a fix for the regression that I introduced last week when trying to optimize the content model tear-down code.
(Re)discovered a bug with whitespace that got introduced when somebody
was tinkering around with nsTextTransformer a week or two
ago. We ended up backing the change out.
Tried to re-profile mail scrolling, but without much luck. I think I'm just going to hold off on that until I get a beefier machine and a new version of Quantify.
Got my fix reviewed and checked in for
bug 44480
(the document.width bug). This ended up having
repercussions over in the commercial tree that I wasn't expecting. AIM
was relying on the incorrect value of document.height to
"lock" the scrollbar at the bottom of the screen.
Also checked in a fix for bug 47451 (newlines and carriage returns not treated as whitespace).
Started looking at
bug 3490
which has to do with margins set on inline elements. Currently, we
subtract the combined margins of the inline from the available width
on the line in nsLineLayout::ApplyLeftMargin(). This is
only correct in the case that the inline isn't split across multiple
lines.
I think that the correct way to handle this is to leave the line's available width "as is", and pass the margins down to the child frame in the reflow state (or some such). When we reach a child frame, the left-most frame will have to consider the left margin, and the right-most will have to consider the right margin. I'm not saying this well, but the idea is fairly straight-forward. This:
would be treated as if it were:
Ran this by dbaron, and he suggested that I take a look at borders to piggy-back on that mechanism. Turns out borders are broken, too! (When we split 'em across lines, anyway.) So I need to dig into this code a bit more.
Spent the afternoon poking at the
crasher
that appeared to be caused by dbaron's new anonymous content whacking
leak fix. Turned out it was due to some
bad code
that was trying to set the primary-frame-for on an image map's
<area> elements. Worked up a fix for that with
saari's help.
Helped Edward get started on some leak stuff. Got fixes checked in for
bugs
49122
(which had to do with cleaning up the frame manager after clobbering
an image with a map)
and
29641
("long text is slow"). For the latter, I just dealt with the problem
of sending the layout engine off on a mission to the moon by changing
the default value for content.notify.backoffcount to -1.
Healthy bit o' bug triaging. Talked to jar about scary new theme stuff I heard rumor of. Met with David Ascher for an hour or so.
Tracked down bug
47843,
which was that we'd hang on Linux when you put
overflow: scroll onto the <body>
element. Turns out that the pres shell was not treating the initial
reflow as being "in reflow" (i.e., the mIsReflowing
member wasn't set), so setting an attribute on a scrollbar caused us
to get into some unstable state that kept wobbling back and forth.
I closed a bug that I shouldn't have,
46846.
dbaron re-opened it; it was straight-forward to fix. In this case, we
just had to handle the href attribute on
<area> and <link> tags just like
we do on anchor tags.
Debugged a cache problem with gordon, and worked up some sanity checking code that will at least not let us crash if the cache database gets corrupted.
Fixed some random string bustage that was causing all kinds of stuff to go berzerk.
Dug in to
bug 46043
and was able to get rid of the table and <iframe>
stuff. Looks like a floater problem.
Managed to get a profile of new window last night. Dug into it a
little bit and discovered that we're still loading files when we open
a new window! We load eight of them, to be exact, via
nsXULKeyListenerImpl::LoadKeyBindingDocument(). According
to Quanitfy, this accounts for 13% of the time to open a new window;
probably more on Mac. (N.B. that it accumulates elapsed time, so this
may not be a fair measure of how much time it's really taking.) There
are only two calls to
nsXULKeyListener::GetKeyBindingDocument(), so I'm not
sure how the fanout is working here. Overlays?
Hmm, on second thought, I hit CTRL+N to open the new
window: I wonder if we're being good and lazily loading the
keybindings only at the instant we need them?
The rest of the profile looks depressingly flat when looking at it bottom-up.
Another day in the bad place, learning about floaters.
Poked at
bug 32191,
incorrect line breaking before and after the image tag. Changed the
nsImageFrame to return true for
CanContinueTextRun(). Unfortunately, this isn't enough,
because the text measurement code doesn't account for the width of the
image frame when returning its metrics.
Long discussion with jar, vidur, and warren about how to get some traction on bloat. jar suggested that the tools we had were too full of noise to be useful, recommended we get wade on the case to get the Boehm collector to "snapshot" memory in use and how it had been allocated.
I think I finally figured out what was going wrong in
bug 46043.
After spinning once to discover that it needed to redo the first line,
we'd call the nsSpaceManager to get the space on the next
line. Since we pass in a zero-width rect, the space manager decides
that it'll just hand us back a band with no trapezoids in
it. In this situation,
nsBlockBandData::ComputeAvailSpaceRect() has special-case
code that'll just set it's mAvailSpace rectangle to
(0, 0, 0, 0). The problem is, it
wouldn't clear the floater counts, so the line layout would
be led to believe that the frame was still going to be impaced by a
floater. This would make it miss the special case that would force the
frame to be placed in nsLineLayout::CanPlaceFrame(), so
we'd end up trying to "break before" over, and over, and over...
dbaron pointed me to some ideas that he an ianh had about fixing floaters in CSS. He also pointed me to this MozillaZine poll which shows performance as the number one gripe. Need to look through this later to see if there's any valuable information.
Got back around to looking at the
bloaty implementation
of LiteralImpl. Took scc's advice with respect to doing
the string allocation.
Tinkered around with
bug 47191,
which has to do with extra whitespace in the src=
attributes of an <img> tag. Got that fix checked
in.
Started looking at bug 39944, which is a problem where an image in an absolutely-positioned frame will grow or shrink of its own accord. The problem appears to be a rounding error with auto-sizing width based on height. Came up with an idea for fixing it, ran it past buster to see what he thinks.
Cleaned up the fix for 48486 (bloaty RDF literal stuff). Checked in fix for 39944 (image spontaneously resizes). Discovered 50381 (search.rdf gets corrupted).
Cobbled together a fix for 50381, but realized that there really isn't a clean way that we can "keep everything around that we'll need". The best I could do was detect that it was too late to flush the datasource, and bail.
Discovered that we're (potentially) no handling one form of the
RDF/XML syntax correctly.
posted
to www-rdf-interest to get some verification, and filed
a bug
(with a fix!) just in case.
Spent a little bit of time tinkering around with multiple substitutions in the XUL template builder. I was trying to use all of scc's fancy new string stuff, and had some complaints:
Sent him an email to bug him about it.
Spent the rest of the day tinkering around with
TraceMallocDumpAllocations(), which brendan hacked into
the window objects. Put together a
Perl script
that'll mash the output into a Quantify-like blame
hierarchy. Discovered a couple interesting gems, which I've posted to
the bug.
Tuesday, August 29, 2000
Got bug fixes checked in for
50381
(search.rdf corruption),
50548
(not handling one of the RDF/XML productions right), and
46143
(support more than one substitution in the XUL template builder).
Talked to mjudge and dbaron a bit about
GetFrameForPoint(), which is apparently sucking down time
on long, flat documents (like LXR pages). Learned why we
can't terminate the search for a frame after we've found the
first frame containing a point (because the last frame is the
one that looks like it should contain the point: it's painted
on top). bruce suggested looking in to r-trees as an
optimization to break up the space to improve search speed. Could be
cool.
Bruce gave me a couple of references to stuff on r-trees:
Started poking around at
48138,
which is a bug with an <img> tag in a floater
that's not reflowing the things it's floating over.
Helped pink debug an
AIM crasher
that ended up being sort of in the menu code. Turns out a menubar was
being created, adding itself as a document observer, and then being
superceded by another menu. It failed to unregister itself
from observeing the document, and would get blown away leaving the
document with a dangling pointer. It looks like it was being
superceded because the onload handler was firing twice,
so I may yet end up with the bug on my plate...
At the footprint meeting today, I signed up to figure out how to
collect differential data on memory usage. Got some other eyes on the
footprint report that I'd thrown together: ConflictSet
came up as an item of contention.
Spent some time looking at 48138 and realized that it is a generic problem with (right?) floaters in table cells. Put together a funky test case that illustrates that the problem has something to do with the spacing above the floated element: specifically, the number of lines that it takes before a reflow triggers depends on the number of lines of text above the floater. It does not depend on the margins (as I'd originally thought). Anyway, narrowing in on it...
Got pldhash.[h|c] added to the Mac build for brendan.
Helped harishd debug a build problem with gcc-2.7.2.3 and the parser tests.