Spent some time tuning the initial size of the arenas created by the
nsXULTemplateBuilder (this is accounting for about 500KB
of bloat per window). I determined that I'd wildly
over-estimated the amount of space needed: ratcheting down the initial
size from 13KB to 1KB cut the bloat down to about 70KB per window. I
turned on the PLArena instrumentation (and tweaked it a
bit) on one of my builds and will fine-tune it tomorrow.
The
two-bookmarks bug
resurfaced, this time by pressing Ctrl+D twice. Turns out
hyatt and I botched the ordering when fixing
another bug.
The bug that was filed to allow more than one value per attribute in template substitution was re-opened, because of my silly "consume the space" logic.
Tinkered around with
46134
(multiple values in a substituted attribute). I think that the caret
character ('^') would be a good character to indicate
concatenation. It's listed as an "unwise" character that must be
escaped in
RFC 2396,
so it shouldn't ever appear as a variable name or as part of a URI in
the "simple template syntax". Also fixed an error with my use of
nsPromiseSubstring (thought it was first n' last, but
it's really first n' length).
Came up with a couple of guesstimates for how big arenas should be sized for the template stuff based on some analysis that I did of the arena stats.
Ran a profile for a site related to bug
50634
("CNN locks up"), and found out that we're forcing a synchronous
reflow after every ContentInserted(). Based on this and a
message
from pink about async reflows fixing the problem, kicked over to
nisheeth.
Spent some time looking at a mail regression; at first, I thought RDF was innocent, but on a second pass through, I'm not so sure. Need to dig deeper tomorrow...
Dug around at 51813 (can't flag messages sometimes) all day. Ended up being an RDF problem after all: regression that was introduced when hyatt and I tried to hack the menu stuff.
The deal was this. Hyatt and I changed
FireNewlyMatchedRules() to remove a match from
one of the match sets: this was a truly evil thing to do in
retrospect, because the conflict set maintains at least three
match sets! Anyway, the whole thing that we were trying to do was
force a match to be "new" again when a folder or menu was
opened. Well, it turns out that we don't really need to do
that now, because by making it so that the incremental update code is
savvy towards unbuilt XUL content, we've made it so that you don't
need to worry about whether a rule was newly matched or
not. You just collect 'em all.
Found that I'd missed some of the fix for 46134. Got that checked in.
Spent the day working on finishing up the memory flusher stuff for warren:
nsIMemoryPressureObserver, and replaced with
the nsIObserver stuff. Updated current consumers.
IsLowMemory() predicate. If it
detects that it needs to flush, it marshalls an event to the UI
thread to do the work.
PLDHashTable instead of nsHashtable), but
decided the right way to go would be to fix nsHashtable
so that you can delete during enumeration.
Got RDF arena tuning changes in, as well as the fix for the mail threading regression.
Looked at the
problem
with the N6 startup page: I'm pretty sure this was caused by the
change
that I made to fix the document.width
computation. Anyway, I'm pretty sure that we're doing the right thing,
and that the HTML is wrong.
Worked up a fix for bug 50999, which is a regression that dbaron and I introduced when we added the tear-down code to deal with script objects and anonymous content.
Spent some time poking around with this problem that I'm seeing in the thread pane where a message returns itself as a thread-child (and sends us off into space).
Banged around bug 51938 (getting the frame for a window coordinate is inefficient). I think maybe keeping the line boxes sorted by their y-coordinate once we pass a threshold may be the way to go.
Checked in fix for bug 50999 after banging around with hyatt and dbaron a bit. Re-worked and landed initial memory flusher stuff. Still need to implement predicates on mac and linux, as well as hook up flushers for the ZIP cache and XUL prototype cache. Kicked bug 51938 over to rickg because he wanted to work on it.
Debugged the problem with the page cycler hanging. Ended up being a latent bug, exacerbated by nisheeth's async-reflow-during-docload changes. The problem was that we were synchronously starting the next page load while observing the previous page's completion. This was evil because observers are notified before parent doc loaders, and caused a fairly confused situation. Tried to fix using a timer, but it didn't seem to work on linux. So, I just pushed and event through the event queue.
Got a -O2 build to work on Linux, but had to
--disable-pedantic. There's something funky about
-pedantic and code generation...
Helped evaughan debug a problem with some of the image loader fixes he checked in last night.
Spent the rest of the day profiling mail scrolling. I discovered that
we suck down about one percent of the time
twiddling atoms
in nsXULContentUtils.cpp, and'll try to get that pushed
in for beta3.
I saw that about 25% of the time was spent in style resolution, of
that, 1/5th is spent in nsStyleUtils::IsSimpleXLink().
Filed
a bug
on attinasi to look into that.
About 15% of the time is spent in the
nsXULTemplateBuilder's content construction methods. A
whopping 2% of the overall time is spent grovelling to determine
whether or not the builder is actually the right one, because
we ask each builder to build content when an element realizes
that it needs the work done!
Tried to
minimize the attributes
in the thread pane template, and discovered that I hadn't really fixed
the
multiple substitutions per attribute
feature that I was trying to wedge in last week. Went the "extra mile"
and factored the logic cleanly, and made sure that we also called it
when we're constructing the initial set of bindings that the rule will
need. (I'd forgotten that we'd need to set up variables in
AddSimpleRuleBindings().)
Started looking at the problems with inline elements and margins, padding, and borders. The way it's been written works only if the inline box never needs to split: the margins, borders, and padding are all subtracted from the "available width" that's given to the child frame.
Need to look at the crap that buster put in to deal with text fields. Crawling back up to the root of the frame hierarchy for every text frame seems like it'd be slow.
With respect the margins, borders, and padding problems on inline elements, I think I have the sketch for a solution. There are two things that need to be patched:
nsTextFrame::MeasureText() needs to discount the right
margins, borders, and padding of all containing inlines when
determining if a word fits, except for the last word in a
text run.
nsLineLayout::CanPlaceFrame() needs to
discount the right margins, borders, and padding when placing a frame
unless the frame's layout status is "complete". (You can see
that a partial fix has been done: we at least examine the margin when
trying to determine if the frame can fit.)
Since most of the time, inlines will have no
margins, borders, or padding, I think that I can accomplish the first
task by doing a check only when we run out of space and think
we need to break the frame. As far as I can tell, there are three
places that we'll need to patch in MeasureText().
Haven't quite worked out how to address the second task yet...
Worked up a patch for 3490: it doesn't work all the way, but I think it's close...
Thought I'd spend some time looking in to two problems
(52726
and
53901)
that dbaron discovered with how we're setting up the
mDocument member in XUL elements. The problem is that we
set the mDocument member, and this causes the element's
script object to get rooted; unfortunately, if the element is never
added to the document tree, the corresponding
SetDocument(nsnull, ...) will never unroot the script
objects, and will lead to collosal leakage. I discovered that just
fixing these problems leads to worse problems: XBL is
actually depending on this behavior being broken the way it is.
Digging further, I discovered that HTML elements, although not leaky like XUL elements, exhibit behavior where the unrooted script objects will get collected, even though they should theoretically be reachable. The test case is to create a parent DOM element, not inserted in the document; create a child DOM element, append it to the parent; set a property on the child; force a GC; the child's script object will be collected. (It's a little bit trickier than this, because of the JS newborn slot: you actually need to nuke the newborn by creating a second dummy child element.)
Talked with brendan about it, and am going to pursue trying to do some kind of jband-like "when my refcount gets to such-and-such, I'll know that the only reference left is from JS" scheme.
Put together a patch that removes root management from
nsGenericElement::SetDocument() and makes it part of the
AddRef() and Release() implementation for
HTML elements. Unfortunately, this creates a bizarre leak that I
haven't been able to track down yet.
Back to trying to figrue out how to determine if we're at the last
unbreakable word in a text run (bug
3490).
We have a flag LL_ENDSINWHITESPACE in
nsLineLayout, and although we set it in
nsTextFrame::MeasureText(), I don't think that we ever
use it anywhere. Oops, yeah, we do: in
nsTextFrame::Reflow()!
Spent the afternoon looking at the
Win98 crash on exit.
Discovered that we're actually re-starting the memory flusher thread
after XPCOM shutdown. Oops! Fixed that, but am still seeing a bunch of
thread-safety assertions. These seem to be caused because the last
thread to exit (from WS2_32.DLL) is getting the
"privilege" of running the static destructors. In this case, a bunch
of nsCOMPtrs.
Hyatt checked in his prototype stuff for XBL. Doing some profiling of the fruits o' his labor in mailnews scrolling:
nsEventStateManager::ChangeFocus(), which corresponds
roughly to the number of times that I clicked on the scrollbar to page
down. Problem is, this accounts for like 10% of the time, so if we're
doing a bunch of extra crap (like running the command enablers and
stuff), then we could get a decent speedup by just kiling that!
nsXULTemplateBuilder::IsElementInWidget()!).
nsXMLElement::CloneNode(), which is dominated by
heavy-weight copies in nsXULElement::CloneNode(). Makes
me wonder if it'd be possible to make an RDF-savvy XUL element that
could avoid copies and delegate most of its work...
nsXBLBinding::GenerateAnonymousContent(). 41% of that
time (6% overall) is spent in failed calls to
nsXULTemplateBuilder::CreateTemplateContents(); that is,
there were no contents to create! It should be trivial for me
to
detect that I'm at the leaf
of a template and just eagerly set the contents generated flag. The
other 40% (6% overall) goes to the CloneNode() calls,
noted above.
nsStyleUtils::IsSimpleXLink().) Despite the fact that
I've also
hacked my rules
to only depend on classes, we still get an twenty-fold fanout because
we have to enumerate through about ten rules for each call, leading to
about 80,000 calls to SelectorMatches. Looks like we have
a couple of attribute-based selectors, because we fall through to
GetAttribute() about 20,000 times. The 35,000 calls to
GetTag() is, unfortunately, expensive because atoms are
thread-safe nsISupports objects. (How bad would it be if
we just made 'em be unsafe? Are they really ever used across threads?)
Anyway, it'd be interesting to know what rules are being applied here.
Got the fix for the Win98 crash-on-exit reviewed and checked in on the branch and the tip. Checked in the fixed template builder stuff on the tip so that I could finish up the attribute-ectomy on the mail thread-pane. I think I've gotten rid of as many as I could, and fixed up all the skins. We'll see if we can get this in: I did some home-brew measurements that showed this to be about a 9% improvement!
Code reviewed some stuff for hyatt, and spent some time looking at form submit problems with jar and vidur and pollman.
More scrolling profiles. This is with a tip build hacked to use the
minimzed thread-pane
and with nsStyleUtils::IsSimpleXLink()
ripped out.
CharAt() from
nsXULTemplateBuilder::ParseAttribute(), so I've filed a
bug to replace that with iterators.
nsDateTimeFormatWin::FormatPRTime(), which is dominated
by calls to GetDateFormatW() and
GetTimeFormatW(). We could probably cache this formatting
information once we've retrieved it, but...I'll file a bug later. It's
still sorta small potatoes now.
nsXULTemplateBuilder::IsElementInWidget().
nsXULElement::SetAttribute(), setting 6,364 attributes on
2,236 elements.
class attributes (because
we call UpdateClassList() as many times.
template attributes, which we could
remove if we kept an external map from content node to template
node. (Assuming we completely removed that time, it'd be about a 0.33
* 0.25 * 0.16 = 1.3% speedup.)
UpdateClassList() isn't cheap: it accounts for 8% of the
time (or about 1.2% overall). That's mostly spent in new
and NS_NewAtom(), but the code could be re-written to use
the newfangled string foo and avoid quite a bit of copying.
nsXULElement::Create() accounts for about
6% of the time, or about 1% overall. 34% of that is in
EnsureSlots(). We could theortically have a subclass of
nsXULElement with the slots "built in" so as to fuse the
allocation; this is small potatoes though...
Match::GetAssignmentFor account for 17% of
the time (3% overall). nsAssignmentSet::Add accounts for
a 1/3 of that, and is mostly spent in new. We could pick
up a percent or so arena allocating these.
CompositeDataSource::GetTarget account for
10% of the time (1.6% overall). So, basically, mail/news's RDF
interface is not part of the problem here!
nsCSSFrameCosntructor::CreateTreeWidgetContent(). I'll
drill in to that, and the way I figure, any percentage of that time
will correspond to (1 - .17) * .37 = .30 of the time.
nsCSSFrameConstructor::ResolveStyleContext() account for
24% of the time (that's 7.2% overall).
StyleSetImpl::GetContext(). That's dominated by the 224
calls to NS_NewStyleContext(). Why do we need so many?
StyleSetImpl::WalkRuleProcessors(), we
consume about 85% of the time spent in
ResolveStyleContext (maybe about 5% overall). In
WalkRuleProcessors(), the backstop rule processors
account for 61% of the time (a bit over 3% overall), and document rule
processors account for about 37% of the time (a bit under 2% overall).
ResolveStyleContext is spent
thread-safely AddRef-ing and Release-ing
atoms.
SelectorMatches is spent QI()'ing the
element to see if it's HTML Content. We could hoist that.
SelectorMatches is spent in
GetAttribute, which seems to indicate that we've got a
bunch of attribute selectors still ticking. Where are these?
SelectorMatches is spent in
GetTag, which is dominated by
PR_AtomicIncrement and PR_AtomicDecrement.
SelectorMatches, we don't need to call
IsHTMLLink if we know it's not an HTML element, right?