Started looking over some of the block and line reflow code last night
after thinking about "how one flattens implicit recursion, in
general". Wow. Now I understand why Kipp didn't want to go and do
this. And, in looking at the code, I started to wonder if maybe some
of the bugs might be better served by really digging in and seeing
what the hell is going wrong with block and line layout. For example,
the bug dealing with the long parts list
was a ton of nested <ul> tags: maybe it's running
into the problem with list numbering?
Anyway, I had a few ideas with respect to how to proceed with the interruptable reflow stuff that seemed cool before I started really reading the code. I'll write 'em down, because maybe they'll turn out to make a bit of sense.
Reflow() method as it is.
Reflow()
with the accumulated state, and we'd degenerate to a recursive,
uninterruptable reflow.
Reflow() method into the things that are
done pre-, in-, and post-order; i.e., before the recursion and after
the recursion. The Reflow() method (which would still
need to exist, in case the frame was reflowed by a degenerate
recursion) would then just call those three methods sequentially.
So, while this all sounds fine an nice, it turns out that block and
line frame layout is recursive in a pretty indirect way; we wind our
way through about three or four different routines before we end up
back in nsBlockFrame::Reflow(). And at each level, there
is so much pre- and post-recursion cruft going on, I can see that this
may be very difficult to do interruptably.
Which brings us back to Kipp's (and Troy's) assertion that we just need to make sure that we reflow frequently enough to avoid getting into a state where we're reflowing the entire world. By which is meant, tinker with the content sink notificiations! Now, about two or three months ago, an assertion was made that we reflow too much, and Nisheeth went off and fixed this by coalescing reflows together. I wonder if the real win here is to avoid over coalescing reflows.
Okay, so this SplitToContainingBlock() stuff is kicking
my ass. Here is what I think needs to happen.
In the initial state, you've got a block frame B that contains
a bunch of inline frames eventually winding down to the
<object> frame o
A-->B-->C
|
a-->1-->a'
|
b-->2-->b'
|
o
Now the object frame o gets split into the left inlines (o), the block frames (O), and the right inlines (o').
o O o'
We call SplitToContainingBlock(), which will split
2 as follows.
b-->2 2* 2'-->b'
| | |
o O o'
Note that 2 gets split into 2* and 2', which correspond to the anonymous block and inline frames, respectively.
We recurse again to split 1.
a-->1 1* 1'-->a'
| | |
b-->2 2* 2'-->b'
| | |
o O o'
When we hit B, the recursion terminates and the frame model is fat and happy.
A-->B-->C
|
a-->1------>1*-->1'-->a'
| | |
b-->2 2* 2'-->b'
| | |
o O o'
Since B is a block, it is allowed to contain both block and inline frames, so we can let 1* and 1' be "real siblings" of 1.
So this all seems great, but there's a problem. A bad problem.
As we recurse up, I need to make it so that the "left inline frame" (e.g., 2) gives its next sibling (e.g., b' to the "right inline frame" (e.g., 2'). Unfortunately, the line boxes all assume that the number of frames in the frame list is correct, and when I twiddle with the frames directly like this, it totally screws that count. And makes us crash while reflowing.
Furthermore, there are no "noisy" frame manipulation methods that
would let me temporarily remove a frame from the frame list and wedge
it in someplace else: RemoveFrame() nukes the frame that
you want to remove!
So here's the way I'm going to try to solve the problem. I'll "defer" the transfer of the left inline frame's successors to the right inline frame by "one level" in the recursion. This will mean that when the recursion terminates at the block frame, all that will happen is that the block frame and the right inline frame will be inserted into the containing block. The "left inline frame" won't have prematurely had its successors nuked, so it's line box will still be intact. This certainly makes the logic more squirrely. But hopefully it will work. (The big question is, how much of the nested frame's line box state will be screwed if I do this...)
So a more realistic illustration of what happens would be:
In the initial state, you've got a block frame B that contains
a bunch of inline frames eventually winding down to the
<object> frame o
A-->B-->C
|
a-->1-->a'
|
b-->2-->b'
|
o
Now the object frame o gets split into the left inlines (o), the block frames (O), and the right inlines (o').
o O o'
We call SplitToContainingBlock(), which will split
2 as follows.
.--------.
/ \
b-->2 2* 2' b'
| | |
o O o'
Note that 2 gets split into 2* and 2', which correspond to the anonymous block and inline frames, respectively. Furthermore, note that 2 still refers to b' as its next sibling, and that 2' does not.
We recurse again to split 1. At this point, we'll break the link between 2 and b', and make b' be the next sibling of 2'.
.-----------.
/ \
a-->1 1* 1' a'
| | |
b-->2 2* 2'-->b'
| | |
o O o'
As before, 1 retains a' as its next sibling.
When we hit B, the recursion terminates. We now insert
1* and 1' immediately after 1, resulting in the
following frame model. This is done using the "normal" frame insertion
mechanism, nsIFrame::InsertFrames(), which properly
recomputes the line boxes.
A-->B-->C
|
a-->1------>1*-->1'-->a'
| | |
b-->2 2* 2'-->b'
| | |
o O o'
Since B is a block, it is allowed to contain both block and inline frames, so we can let 1* and 1' be "real siblings" of 1.
With this logic, everything seems to work out pretty much right. (I needed to juggle some of the code that was setting the "special sibling" stuff, and had to make sure that everything got reparented properly.)
Heh. Dammit! I tried running my test with viewer set to a really narrow window, and everything went to hell again. Grr. I think that there are problems if the inline spans multiple line boxes. Well, I guess that gives me something to poke at tomorrow.
I managed to convince myself late last night that it should be okay that the only frame we do "noisy" manipulation on is the containing block. All the nested frames contained by the block should be inline frames, so the only line boxes we need to worry about will be in the containing block.
I created a test case to experiment with wrapped inlines. It looks what I'm botching are continuation frames. Instead of splitting the first-in-flow, I'm trying to break a continuation frame. Can I just walk to the first-in-flow, and split it? Here's trying...
Well, it didn't work: it inserted the block frames directly after the first inline, and then left the continuation frames fat and happy afterwards:
before before before before before before before before block block block block block block block block block block block block block block block block block block before before before before before before before before before before before after after after after after after after after after after after after after after
So now what I'm going to do is not walk back to the first-in-flow, but try to split the continuation frame itself (like I was doing originally). Bug this time, I'll break the next-in-flow and prev-in-flow pointers. Whee!
Woof. That seemed to do it.
So I'm trying my hand at figuring out where the time goes on Linux loading a long LXR page. Thought I'd write down what I find so I can bug people about it later.
The parser's "topic" stuff does string compares to determine whether a topic matches. These seem like they should be atoms.
I found nisheeth and vidur's "magic prefs" that control how the content sink notifies the back end.
content.notify.ontimer is a boolean pref that says
whether or not we should use a timer to throttle notifications. By
default, it's true.
content.notify.backoffcount is an integer that says how
many notifications we should do before just stopping all
notifications to the document. It defaults to three. Ow!
content.notify.interface is an integer that says how
often (in "microseconds") we should notify the document. By default,
it's set to 1sec.
content.maxtextrun is presumably the maximum number of
text runs that we'll let fly by before doing a notification.
I tried tweaking these a bit; specifically, setting the "back-off count" (which should probably be called "turn-off count") to a really large number, and cranking down on the notifications to every 100ms. I managed to really bring this thing to a crawl!
From profiling on Win32, I recall that
nsLineLayout::FindNextText() is a real dog. It does a
linear walk back to the first-in-flow, and then iterates linealy
through the text runs to find a frame.
It's particularly disturbing that Linux is so much less responsive that Win32 while doing a load. If I recall correctly, Win32 would actually scroll fairly decently while loading an LXR page (until it slammed against the back-off count pouring through several thousand lines of text). Why is Linux so crappy?
So speaking of nsTextRun dying on long inline frames, I
went ahead and re-wrote it to use PLHashTable in a
righteous way so that access time to individual frames is
O(1). (I stuffed the frame pointer in as the key, and use the
value field to hold the next frame in the run.) It would be even more
righteous if I had the double-hash table, because then I'd waste a lot
less space. But, it hasn't made an appearance in NSPR yet, so -- oh
well.
Talked to mjudge about the incremental reflow problems in block layout and promised to look at them. This is killing EnderLite so bad that he might have to back stuff out.
Went through fix for bug 22037 with Nisheeth; all looked good, but I need to still check what happens for absolutely positioned frames.
Spent some time with dbaron, warren, and ruslan looking at the service manager assertions and concluded that there are some problems with the threading model. Also discussed the necko string bundle hackery and discussed a couple of alternatives to fix, including just doing away with the string bundle stuff altogether (which is what warren wants to do anyway, I guess).
I need to take a look at the bug with images with percentage widths in table cells. I don't understand the code around there enough to make a good decision on what to do.
Re-wrote my patch for the
{ib} performance bug
where nsLineLayout::FindNextText() was sucking down
time. I changed it to use brendan's new JSDHashTable
stuff. It speeds up a long LXR document by about 10%; that is, 6s
out of 58s. I'm pretty depressed that it wasn't better.
Took a look at a couple of bugs that dealt with building deeply nested
content models because of 4.x-class content that doesn't close
<font> tags; specifically,
one
that dealt with crashing, and
another
that's really just a performance bug in disguise. I almost threw up
when I read some of the bogo-computer science commentary in the
former. Especially when you look at 4.x, which load the page in under
a second. I gave up after three minutes in Mozilla. We're doomed.
Pinkerton helped me figure out why tooltips weren't working (I am an idiot: I'd left off the "hook" in the main file that would let the tooltip stuff overlay properly). So now I should be able to make progress on the tooltip bug that's assigned to me.
Communal hand wringing with warren, then the whole architecture group, over the sorry state of our application. Vidur says, "just do it"; dougt says, "ship and be happy". I think those are the right attitudes to take at this point.
Talked to brendan for a while about interruptable reflow. We rediscovered the idea that troy (and rickg) had been talking about, which was to use the "reflow command" architecture to manage interruptability. Specifically, it might be possible to stop reflow once you've reached an invariant, and then enqueue a reflow command to restart the reflow where you'd left off.
Talked with dougt for a bit about event queues and why the Unix build is so unresponsive when the event queue is flooded. I don't understand that code well enough to have really taken away much from our conversation. Need to go read it, I suppose.
Taking a look at the incremental reflow problem that mjudge was
talking about. nsIReflowCommand::BuildPath()
Last night, brendan and I were talking about using the reflow commands
to do interruptable reflow. I see a member variable
mChildFrame that seems like it could be used to maintain
the partial state...
Spent some time looking at the tooltip bug, but unfortunately, it looks like it can't be reproduced on Linux. At least, it can't be reproduced on Linux by me. I'll reboot in Win32 one of these days and try to reproduce it there.
Fix the
bookmark URL updating bug:
it turned out that I'd failed to re-implement the
OnChange() logic when I re-wrote all the template builder
stuff a couple of months ago. Oops! Anyway, the fix was easy.
Spent a bit of time trying to get jprof to run, but to no
avail. Need to figure out how to run profiles with that thing...
Meeting to go over the "plan for performance" for nsbeta3. I'm supposed to go off and figure out "where we are" for embedding. I figure I'll run viewer, go through the top 100 sites and see how much memory we use, try a couple of degenerate sites, and report back.
Looking at the <embed hidden="true">
bug.
This is not going to be all that easy. It appears to be the case that
there are plenty of platform-specific hacks floating around. I went
back through the Classic codebase and here's what I've found:
lo_FormatEmbedInternal()
by calling
FE_GetEmbedSize(),
one of the wonderful MWContext macros that ends up
dispatching through the callbacks there.
GetEmbedSize() callback pulls its
information out of the LO_EmbedStruct's
objTag.FE_Data member (objTag is an
NPEmbeddedApp struct). If this is null, then the
front-end knows that it's the first time it's creating the object, and
calls NPL_EmbedCreate(). (Otherwise, it goes through
gyrations to reconstitute the object's state from session history.)
LO_ELE_HIDDEN is not set, then
NPL_EmbedCreate()
will call FE_CreateEmbedWindow(), which is another
MWContext-dispatched callback.
I think that's as far as I need to care about:
FE_CreateEmbedWindow() is only be called if the
LO_ELE_HIDDEN attribute is not set, and the
LO_ELE_HIDDEN attribute is only set in the case of an
<applet> or <embed> tag. In
other words, if I encounter an <applet> or
<embed> tag with a hidden="true"
attribute (well, see
this
code
for the exact logic: looks like hidden set to "no",
"false", or "off"), then we'll simply punt on creating the plugin
window.
Well, I guess things are never that easy. I started poking
around nsObjectFrame.cpp and was quickly disheartened. I
also found
this gem,
which seems to indicate that on Mac, we always need to call
SetWindow() on the plugin.
Did some profiling on a page with a ton of Ender widgets: discovered that the bulk of the time was going to create native widgets for each entry field. Duly noted in the bug.
Last night, I went and ripped out nsTextRun. I spent the
better part of the day figuring out how to make it work. Turns out
that there was a bunch of code that
troy put in
to lazily fixup an overflow frame's parent pointer when constructing
new line boxes. Since the frame pointers hadn't been fixed up
by the time nsLineLayout::FindNextText() was being
called, I was trying to walk this crazy frame tree.
Troy put this code in to fix bug 5588, but using the ultra-scientific "one mississippi, two mississippi" method of profiling, I couldn't see much of a difference between an optimized build with his changes, and one without. Sent him a message to see if he can remember why he did this. (Since it was back in March, he should hopefully be able to reconstruct his thinking...)
Talked to warren for a bit about optimizing
nsXULTemplateBuilder::CheckTemplate(). We eventually
decided that the best way to do it would be to add
HasArcOut() and HasArcIn() methods. We'll do
the "end around" interface later, to make bookmarks and drag & drop
faster.
Some prose I wrote to rationalize nsTextRun-ectomy. Might
need it to fight PDT...
nsTextRun is a datastructure that stores the frame pointers to all of the primary text frames in a block. It is an implementation detail of nsLineLayout::FindNextText(), which is called by the line-breaking code to find the next text frame in the block while measuring the text to fit in a line box. The problem is that we are spending an inordinate amount of time keeping the nsTextRun data structure up-to-date. We frequently need to update it during incremental reflow, and the datastructure can grow to be extremely large on long text documents. Furthermore, FindNextText()'s implementation ends up degenerating to a linear crawl through the array! I propose (in bug 19051) that we eliminate this altogether, and replace nsLineLayout::FindNextText()'s implementation with one that directly crawl's a block's frame hierarchy. There are patches attached to the bug that do just this. This changes FindNextText() from an O(n) operation (where n is the number of text frames in the block) to a worst-case O(log n) operation, with the base of the logarithm being the fan-out in the block's frame subtree. Typically, it so close to O(1) as to be indistinguishable. Furthermore, it eliminates all the code that was used to maintain the data structure.
Did a bit of performance analysis, comparing the new code and the old code. Sent some email back and forth with Troy. I'm seeing about a 10-20% slowdown in the new code on documents that end up pushing to overflow lines, so after I verify that my eager parent-setting is the cause, I'm going to see if I can't get both lazy parent setting and tree grovelling.
Got the lazy parent setting stuff working last night: it was pretty
simple after all. I just needed to ensure that as we traversed
"across" frames, we made sure to set any frames further along in the
child list to have the same parent as those to their left. Protects
the assumption that you enter
nsLineLayout::FindNextText() with a node that has proper
heritage.
Went round and round a couple times with edburns on a
bug
that is supposed to use the alt attribute on an applet to
display the text if there is no alternate content between the
applet tags. Turns out that it doesn't really
fix the bug: there are deeper problems in the object frame that cause
the "plugin downloader plugin" to display.
Spent some time banging around with may patch to fix the
<embed hidden="true">
bug.
Found out that I was being an idiot, and that plugins are
working on Win32. Just some JavaScript sniffing stuff appears to be
broken. I'll try to wrap these bugs up this weekend.
For bug 31933,
nsBlockFrame::Reflow(). 20%
of that time is spent creating and destroying
nsSpaceManager objects. We allocate one for each reflow
that we do! (The method is call 70,000 times, and so are
::operator new() and nsCOMPtr::~nsCOMPtr().)
nsBlockFrame::PrepareInitialReflow(), and teh other ends
up calling nsBlockFrame::PrepareResizeReflow().
nsBlockFrame::DoReflowInlineFramesAuto() accounts for 10%
of the overall time. (cf. first point above: getting rid of the heap
allocation for nsSpaceManager would nearly halve block reflow time in
this case.)
nsTextFrame::Reflow() accounts for 5% of the overall
time. That time is split (about 75/25) between
nsTextFrame::MeasureText() and
nsTextFrame::TextStyle::TextStyle().
Spent most of last week getting plug-in stuff wrapped up for
nsbeta2. Oh, my head. Applying dbaron's GetLinkState()
changes to my tree. Tried to do it with scc's L-prefix
turned on for NS_LITERAL_STRING(), but my build barfed
all over. I'll leave fixing that to him: looks like
nsString and friends need some love.
Spent the day looking at bug 46013, which had to do with not being able to use any of the chrome registry the first time you ever start the browser. Ends up this is caused by RDF needing to initialize the layout DLL to create a namespace manager. I started by moving the namespace manager code into the parser, but ended up deciding that the way to go was to do one-off namespace management. Turns out that the namepsace manager APIs are a bad match for the way RDF uses them anyway.
Profiled "new window creation" after applying patches for a bunch of
the content baloney that hyatt and I worked out on Monday. We spend
half of the time (!) calling OLE's RegisterDragDrop()
method six times. Oof. Not sure why we need to do it six times (or
even at all). Of course this may be a "lie", especially if the
funtion's wall clock time was being measured instead of its actual CPU
time. Pinged pink about it; looks like rods wrote most of this
code. I'll bug him about it...
Wonderful orgy of tree dumping as nsbeta2 goes off on a branch today. Ahh. That felt good.
Bug 46013,
redux. This bug got re-opened today :-(. Turns out that there was also
a refcounting problem that was causing this bug to manifest itself in
a different and wonderful way. Changed the composite datasource from
using an nsISupportsArray to a nsVoidArray
so that it wouldn't addref & release while notifying. Cleaned up
refcounting and ownership over in the
nsChromeUIDataSource.
My nsTextRun evisceration caused an
editor regression
because I'd failed to bring over the logic that broke a text run at a
non-inline frame. I put back a method on nsIFrame called
CanContinueTextRun(), which asks a frame if a text run
can be continued through it. Everyone except text, first-letter, and
inline frames answer "no".
Moved nsFixedSizeAllocator out of RDF and into
xpcom/ds for use by harishd, alecf, and whoever else
feels the urge.
Bug triage fun with jud, warren, and vidur.
Looked into a problem with the XUL template builder where calling
rebuild() twice causes a crash. Turns out that we clobber
the conflict set, leaving the content support map with dangling
pointers into it. I'm not sure why mail/news doesn't crash all over
when switching folders. I put together a patch, but am not really
happy with it (for a couple of reasons). First, I'd like to understand
better why mail/news doesn't crash. Second, I'd like for the
code that just changes the ref attribute on the root to
also dump the conflict set. Third, I think that it may make sense to
move the content support map into the conflict set: I'm
skeptical that these really need to be separate objects.
Spent the evening scratching my head over this problem with the
template builder. I've pretty much convinced myself that the
ContentSupportMap is not necessary, and have gone so far
as to yank it out completely (along with commenting a good porition of
the code, which I had a hard time remembering...)
Anyway, as far as I can tell, the only reason that I'd made the
ContentSupportMap was so that I could pick up where I'd
left off in CreateTemplateContents(). Specifically, I
needed to record the Match for which we'd left off
building content.
The thing is, we can get back to this match using the
ClusterKey, constructed from the member and container
variables (which we can fetch by walking up the content model), and
then selecting the Match with the highest priority from
that set of matches. Seems to work, doubt it's much slower or faster,
but it uses less memory. Only problem is that the friggin' bookmarks
window (not the sidebar panel; just the window) doesn't display! So I
need to debug that. I'm also not convinced that it's pulling all the
right rules out. Need to regression test a bit more carefully.