Spent the day tinkering around with the rest of the inline-block
bugs. I think I have a pretty good handle on 'em. Just one or two left
to go. Fixed
35935,
34862, and
41521.
Respectively, we needed to delete all the "special" frames when doing
a ContentInserted; the relatively positioned inline code
had to do the special splitting magic; and "style changed" reflows
need to be savvy with respect to special frames. Fun fun.
Cleaned up some warnings while I was in there; tweaked some of the
HAVE_CPP_* macros to get them consistent.
Cleaned up the patch for
35935,
I needed to make sure that the anynonymous block frame picked up the
position: relative style.
Spent about an hour reviewing Ben Bucksch's crappy MIME patch. What a waste of time: rhp punted and left me holding the bag (sigh).
Spent forever (and then some) discussing rayw's ideas for versioning. Sigh. The group consensus: maybe just doing it by hand like MS COM does isn't so bad. Bitched to brendan about how much time these meetings are taking: I should bug warren tomorrow.
Tinkered around a bit with 27211 which illustrates how screwed up the block-in-inline code gets when you start trying to tinker with it via the DOM. I'm tempted to say "screw this", and say these are cases that we don't handle. It'll take me another couple of days to get this all cleaned up if it really is important. I'll bug buster tomorrow.
Not much accomplished today. Gecko meeting, marathon architecture meetings (group, then bloat, then group angst).
Started poking around bug
25963,
which is my only "dogfood+" bug. I'm pretty sure this has nothing to
do with resolving anchor tags and everything to do with that
boneheaded stuff that
pink found.
Anyway, sent a mail to simon to see if he could try it. I managed to
get a profile of a big-ass LXR page, and I'm seeing shades of the
bug
that pavlov and I were looking at a long time ago with
nsVoidArray eating all of the time it takes to lay out a
long text document.
We spend about 12% of the total time laying out
http://lxr.mozilla.org/mozilla/source/rdf/content/src/nsXULTemplateBuilder.cpp
in nsVoidArray::InsertElementAt. Half the time comes from
nsCheapVoidArray (Vidur's content model hackery to
optimize the zero- or one-child common cases), and the other half
comes from nsLineLayout, which I suspect could also
benefit from the zero- or one-child optimization.
InsertElementAt() is dominated by 8,500 calls to
memcpy() and 16,000 calls to
::operator new() and
::operator delete(). So, it's getting clobbered
copying the array around, which I believe it grows 4 slots at a time!
This makes me want to take Vidur's hack one further, and create
something that's good for zero, one, a small number, and a large
number. I think that's the btree magic I was tinkering around with at
davidmc's prompting several months ago.
Implemented nsAdaptiveVoidArray, which uses vidur's one
element hack, and then faults to a b-tree if you start allocating more
elements. I'm rebuilding layout now to see if it'll work or
not. Adapted my old nsSegmentedSupportsArray, and cleaned
it up a bit to be a bit more compact (e.g., avoided extra allocation
and packed bits). It's still a bit sloppy in some ways (e.g.,
IndexOf() just uses ElementAt() and a loop
to find the element, which'll be O(n log16n) instead
of O(n).
It seems to work, but I'll need to do some more measurement to see how wonderful it really is. warren and I talked about it a bit, and he recommended that I try it against a power-of-two in terms of bloat and speed trade-off. My initial reason for doing this was to avoid wastage risk that power-of-two entails; but, we'll see.
Spent some time last night trying to do an analysis of the situation. My gut feeling here is that using a B-tree is probably not going to buy us much: in order to keep wastage down in the small case we need to use a fairly small node size (e.g., 16). This ends up equating to about 1.2 words required for each entry. The only advantages that I can think of are:
I'm not sure how I could analyze these. The down sides are:
Anyway, I'm looking forward to hearing back from warren and brendan on the analysis.
Moved the nsAdaptiveVoidArray changes over to my
optimized tree: they shave about 4 seconds off of the 40s page load
time (10%, as expected) for a long LXR page.
Realized that I'm not being "honest" with respect to what the node overhead is: I should really count the number of words required for each heap allocation: probably something like two. That makes the ratio be (b+4)/(b-1), so for b=16, we should really be comparing to a power-of-n array with a growth rate of 1.33, instead of 1.2.
Furthermore, I'm comparing the power-of-n array growth's worst case with the B-tree's best case, which isn't fair; however, I don't have the smarts to compute what the power-of-n's overhead is in the "average case". Maybe brendan will have some ideas.
Put the wraps on this b-tree implementation and checked it in. I haven't turned it on in the build yet because I have to add it to the friggin' Mac project file. I'll do that this weekend: maybe take this sad old PowerTower Pro in to work and install the latest Mac juju on it.
Started to get in to grokking the line layout stuff a bit more. I'll
have a handle on this nsLineLayout::FindNextText()
problem in no time. I feel lucky, baby!
Hrm. I put some code in to instrument the size of the
nsTextRun arrays, and saw this peaking at about
twenty. So there is no need to make that into an
nsVoidBTree. I think I was seeing it come up on the radar
so frequently because of then IndexOf that gets slammed
when it linearly searches through the list of text runs.
Okay, I was wrong about nsTextRun being constrained. On
LXR, I really did see this go through the roof. I just
untangled some weirdness in the parser that was causing LXR to not
display right, and saw a text run of about 500 elements on a short
page!
Allright, I implemented nsStatistics last night and
gathered some information about the distribution of
nsGenericHTMLContainer's children. Turns out 72% of them
have zero or one element, 92% have less than 4 elements, and 98% have
less than eight elements. This was on a subset of the "top 100" URLs.
This makes me question the wisdom of the b-tree implementation. Certainly b=16 is way too big: this'll create 12 words of wasted space 30% of the time! Decided to pack the bits even harder so that the b-tree overhead is one word per node rather than two. This makes the best-case wastage drop to (b+1)/(b-1), and makes b=8 come out at 1.28 words per node where n>>b (as opposed to 1.42, previously).
Ok, after thinking about this a bit more, I thought I'd go off and
write a
little program
that would generate utilization data (which is the reciprocal of
overhead, I guess). I came up with the chart to the right.
The blue line is nsVoidArray, as it stands today. The
pink line is nsVoidArray implemented to increase
exponentially by a power of 1.333 after 16 elements have been
inserted. The yellow line is nsVoidBTree with
b=8. The X-axis is the number of elements in the array, and is
logarithmic. The y-axis is memory utilization.
It's pretty clear that exponential expansion of
nsVoidArray does better in terms of memory utilization
until we get up past about 1000 elements. At which point,
nsVoidBTree really only starts to marginally beat the
worst case! It's pretty darn horrible at utilization down in the 10 to
100 node range: it only really gets about 50% utilization on average.
I haven't been able to quantify the time required to do
memcpy() during the growth of the array, but considering
that the case is "so rare" (well, it's rare if you look at the top 100
sites, anyway), that I'm hard pressed to say we need a data structure
that can accomodate it.
The only argument left for doing nsVoidBTree is the
random insertion and removal argument. And this really only holds
water for the (I would claim) extremely rare case that you'd
be doing DOM manipulation on a very large content model.
Or does it? the chart to the left shows how many words are
actually wasted by nsVoidArray with exponential growth
compared to nsVoidBTree.
This shows that, even though the utilization ratio may not be good,
the number of words actually burned is relatively small. It's a
log-log graph, with element count along the x-axis, and words wasted
along the y-axis. Between 10 and 100 nodes, nsVoidBTree
is pretty consistently burning about twenty words of space; after
that, things start to sync up with the nsVoidArray.
When plotted this way, it's easy to see how nsVoidBTree
levels off at a consistent burn rate. nsVoidArray does
too, but with a much wider "spread". And more interesting, if you were
to plot the mean burn of nsVoidArray, it would be
lower than nsVoidBTree by a pretty significant
amount.
Anyway, I've send a message to people asking for some feedback on this
stuff. I think that the middle insertion and removal case would be
good for XUL (mostly because of mailnews), although I'm hard pressed
to see it being really all that useful in vanilla HTML, it sort of
seems like a wash in the other areas of comparison, so you might as
well us nsVoidBTree and get the cheap insertion.
Got changes to nsVoidBTree and nsVoidArray
checked in late last night. nsVoidArray now doubles in
size after reaching 16 elements.
Proposed breaking up librdf.so to factor XUL stuff into a
"transitional" DLL that would live over in the layout
source tree. Want to get some buy-in from everyone who'd be affected,
so I'm gonna wait a day or two for the idea to sink in.
While I was sitting in the bloat meeting today, I discovered why
<pre>-formatted text containing
<span>'s was ending up with such long
nsTextRun's. It's treating the entire block of text as a
single, long text run; i.e., it's not splitting it into lines, like
I'd have thought it would.
Spent the day with Brad Quinn from Gateway going over the leak tools n' stuff.
Got buster's block regression tests checked in. Checked in a fix for the
stack overflow
you'd get loading an XML file that contained XUL. Got rid of
a bogus property
in the TREE_PROPERTY_HACK code in
nsXULTemplateBuilder. Figured out what was wrong with
deleting from an inline-block
and sent the patch off to nisheeth for review.
Spent the rest of the day profiling the test case from 39133, which is a gargantuan table. I put the obvious stuff into the bug report, but have a way to go yet. Here's what I've uncovered so far:
- 3% of the time spent in nsHTMLTags::LookupTag(). It's all spent
grovelling through the AVL tree. This should probably be a
hashtable. I guess 4.x computed a perfect hash?
- CNavDTD::WillHandleStartTag() is using NS_ConvertToString() where it
should be using NS_LITERAL_STRING() while notifying the observer
service. Altogether, inflating the string and notifying the observer
is accounting for 1.77% of the time. Went ahead and changed this:
although we'll still have to pay for the observer service, we won't
need to inflate.
- nsBlockFrame::Reflow() accounts for 30% of the time. Of that:
- 5% is spent in nsLineLayout::AddText(). 70% of that time is spent
in ::operator new() creating nsTextRun objects. We're down in the
noise, but it may make sense to recycle these or something.
- 9% is spent nsBlockFrame::SlideLine() placing frame views. This is
dominated by nsContainerFrame::PositionChildViews(). It's a recursive
function, dominated by the cost of the recursion itself and
the cost of addref-ing and release-ing atoms. Atoms are expensive
to addref and release because it appears that they're assumed to
be threadsafe. Argh! Sheesh, why even use atoms for this? They're
tokens! Use enums!
Need to keep poking around reflow to see what's going on there. Also need to look into frame construction and content construction. Joy!
Finished writing peer reviews. Ran a leak report and filed some bugs on evaughan to fix the temple and obelisk baloney.
Mostly worked on leaks today. Figured out why the
nsIOService was leaking: re-entering the service manager
in an evil way. Spent a couple of hours with warren trying to figure
out why the FTP protocol was leaking an nsSocketTransport
and some nsPipe objects, but to no avail.
Boioioing. Turns out the nsIOService leak was from
ruslan's attempt to avoid assertions from the string bundle. So I
promptly busted his code. Oh well, I didn't like his solution anyway!
(Seriously though, I think warren and rpotts have a whole grand scheme
figured out for string mapping anyway.)
Shepherded a couple more leak fixes from Inaky Gonzales.
Started looking at av's plugin bugs. I thought that I was finished
with these block-in-inline bugs, but it turns out I really did
miss one
where the <object> can't be loaded and we've got a
block-in-inline frame replacement happening. This one could be nasty
to fix, because it's not clear that I can just punt and wipe the
containing block. I guess I really have to understand what's going on
now.
Damn. So part of the problem here is that when we do the frame
replacement, we all of a sudden make the inline that was
previously a "normal" inline into a "special" block-in-inline
beast. So to get this right, I need to pull in pieces from
WipeContainingBlock() and
ConstructInline(). Woof.
Raptor meeting. Talked with editor folks afterwards about retaining whitespace in tables; harishd is going to flip the switch to make that happen. Talked with mjudge and karnaze about performance of ender-lite inside tables; karnaze is going to profile. warren's meeting, footprint meeting.
Spent the rest of the evening tinkering with leaks with dbaron and
looking at (what I believe to be) the
last block-in-inline bug.
It's a nasty one, because it happens when we're replacing the an
<object> tag's frame with the "default"
content. I've been playing around with splitting the frame tree back
up to the nearest containing block without much luck.
Finally got the CantRenderReplacedElement()
hackery
to work when the inline is suddenly changed into a block on
Wednesday. And then a tree fell on my car and knocked out the power!
Doh!
Tinkered with
problem
with a plugin's JavaScript description not being
available on Linux, and uncovered a real rat's nest of evil code in
the plugin initialization stuff. I guess this is what av meant when he
said, "we haven't really figured out how to initialize new-style
plugins yet." Anyway, what should have been going through the
ns4xPlugin
object has just been hard-coded into this abomination that derives
from nsFileSpec. I'm not sure if we want to spend the
time to untangle it (who's writing new-style plugins, anyway?) or if
we should really fix it.
Tinkered with maubi.net to get my machine back on line again. Approved some minor bug fixes from BenB and Andreas.