Fixed the problem with the KeySet not being hashed. (And
then, trying to use a stack pointer as the key. Doh!) Restored the old
version of RemoveGeneratedContent(), and pulled out the
conflict set stuff. It's just way faster than trying to remove the
content using the conflict set. So now I really have no excuse not to
figure out how to do the arena stuff.
Fixed up about as much leakage as I could shore up using Win32. A simple test case comes up and shuts down cleanly: there are still some leaks but I'm pretty sure they're not mine.
Put together an incremental update test and got it working. I'd
botched the logic in RDFGenericBuilderImpl::Propogate(),
so that's fixed now.
Re-Quantified after doing all of the performance fixup. The biggest
hotspot in the matching code seems to now be in
ConflictSet::Add() (called from the
InstantiationNode). It's dominated by new
and MatchSet::MatchSet(), which eagerly allocates a
hashtable. First, I want to instrument to see if eagerly allocating
the table is really necessary. Just looking at the raw numbers, it
seems like we're getting about 5,800 MatchSets for 2,900
calls to ConflictSet::Add(), which'd average out to only
about two elements per MatchSet. Second, I want to see if
I can pull a stunt like I did with the KeySet to collapse
allocations. I could probably pull the Binding
and the MatchSet into the hashtable's entry
allocation and avoid two trips to the allocator.
Landed the new template builder code on Monday afternoon, pretty much without event. Broke a couple of builds with my crappy C++, but managed to get them fixed. String wars all Monday night, then had a meeting yesterday afternoon to resolve problems. scc will own string now, I think.
I discovered this problem where an nsXULTemplateBuilder
is still hooked up to the RDF datasource even after the content tree
has been removed from the document. This causes assertions to fire if,
for example, we leak a content tree. But it could also happen if
somebody removes a subtree containing RDF-generated content from the
content tree, holds onto it, and then inserts it somewhere else. This
should probably be handled in nsXULElement::SetDocument()
or nsXULDocument::[Remove|Add]Subtree[From|To]Document().
Argued with the IM folks about supporting HTML4.
Started looking into supporting the extended template syntax. Got as
far as compiling the new rule syntax, cobbling together a "straight"
network, and modifying the content construction to accept a
MatchSet::Match. Still need to deal with substitution and
adding support for the bindings tag.
Mozilla meeting. Put together a list of HOWTO topics that should be written.
Ran into a bit of a snafu getting SubstituteText to
work. I tried to finesse the "come back and finish building the
children later" part by just passing in an empty
MatchSet::Match(). It turns out that this won't cut it. I
need to remember the Match that is being used to
build the content. (This is in the call to
BuildContentFromTemplate() from
CreateTemplateContents().
My first thought was that, since we're climbing up to the "member"
resource anyway, we should be able to troll through the
conflict set to find the Match. Well, it turns out that
with the extended syntax, the binding for the "member" resource could
vary from match to match.
I could keep a hashtable that matches the "resource element" to the
Match that instantiates it...maybe the conflict set is
maintaining this already?
I ended up creating a ContentSupportMap that maps a
"resource element"q to the Match that was used to create
it. More overhead: Yay! Anyway, it gets the job done, and I was able
to get it to pass the "skipping links" test. On to the
<bindings> tag...
Talked with sspitzer yesterday about how fast the
InMemoryDataSource's HasAssertion() method
was. He's stressing that pretty hard with news subscribe, so it may be
a good opportunity to go in and fix stuff up, especially with my new
PLHashTable lore.
Started cleaning out my bug list today: got a lot of the new template
stuff in and documented yesterday. I'm sort of right in the middle of
figuring out
Bug 10208,
which is the empty attribute updating problem.
Spent most of yesterday looking at
Bug 24762,
which is to optimize nsXULTemplateBuilder::IsEmpty() and
nsXULTemplateBuilder::IsContainer(). While fixing bug
10208, I folded the methods together into
CheckContainer() with out parameters for the container
and empty status.
I implemented a fixed-size arena based allocator to ameliorate the overhead of creating enumerator after enumerator while checking to determine if a resource was a container. Although a better fix would be to improve the APIs so iteration wasn't necessary, I figure this is a good thing to do anyway.
I managed to ruin the style system's performance when trying to check in code to canonicalize URIs before sending them along to history. Had to back that out yesterday. There are three history bugs that seem really similar, and I'm almost sure that they all have to do with this canonicalization problem: Bug 12439, Bug 23210, and Bug 25963.
Intel meetings today.
Started poking around with a recycler for
nsArrayEnumerator objects. Here's the deal. The picture
before any optimizations is that
nsXULTemplateBuilder::CheckContainer() accounts for about
10.5% of the time to load a 2,800 message local mailbox. Of that time,
NS_NewArrayEnumerator() accounts for 18.39% and
nsArrayEnumerator::Release() accounts for 18.05%. That's
a total of 36.44%! Although it's less significant as a real measure,
the total time that CheckContainer() takes is
288,318usec.
With a hand-rolled threadsafe recycler, I'm seeing
CheckContainer() account for only 7% of the overall time.
NS_NewArrayEnumerator() accounts for 4.81% of the time
spent in CheckContainer(), and
nsArrayEnumerator::Release() accounts for another 4.72%
of the routine's time, for a total of 9.53%. The total time that
CheckContainer() accounts for is 208,427usec.
Well crap. It turns out that my threadsafe recycler implementation isn't really threadsafe after staring at it for a bit. I thought I was so clever to atomic swap the bitmap, but I realize that I could just as likely be corrupting it. For that scheme to work, I really need atomic test-and-set.
Anyway, back to the drawing board. I tried a simple little hack just
using PR_Lock(), and with that, I see that
CheckContainer() accounts for 8.89% of the overall
time. NS_NewArrayEnumerator() accounts for 12.95% of that
time, and nsArrayEnumerator::Release() accounts for 10.14% of
the time. That's a total of 23.09% of the
time. CheckContainer() accounts for 241,572usec.
So with that in mind, it seems like it might be reasonable to do a
simple spin lock here with PR_AtomicSet() (which is
really exactly what I was trying to do with my mask swapping hackery,
anyway). It seems to be significantly faster than
PR_Lock(), on Windows NT anyway.
Bah. Doing a spin lock isn't going to do me any good, because I still
need to do an atomic set to get out of the spin lock. Two
atomic sets will make me only marginially better than using
PR_Lock(). Feh! Oh, for test-and-set! I could just write
my own...
Allright, just for shits and grins, I did that, using Window's
InterlockedCompareExchange and falling down to a lock on
other platforms. Off to Quantify...
Using InterlockedCompareExchange,
CheckContainer() accounts for 8.17% of the overall time,
coming in at 215,579usec. Of that time, 3.99% is in
nsThreadSafeRecycler::Allocate(), and 4.06% is in
nsThreadSafeRecycler::Free(). I think that's about as
best as we're going to do, I'm going to send it to scc and brendan and
see what they think.
So on to the other
fish. nsXULTemplateBuilder::SubstituteText() is
accounting for about 18% of the overall time. 20% of his time (3.6%
overall) is spent re-fetching the same damn properties from the RDF
service over and over again! There may be an opportunity for a speed
boost here by implementing the <bindings> tag,
because we only need to compute the property resource stuff once per
template. 10% of his time is spent in
nsString::EqualsWithConversion() to see if the attribute
is equal to "..." or "rdf:*". We do that elsewhere (man, it takes 2.5%
of the time overall!), so I'm going to try to inline it. For some
reason, nsLocalMessage::QueryInterface() is
really expensive.
About 25% of the time in SubstituteText() is spent in
nsMsgMessageDataSource::createMessageNode(), and of
that, 75% of the time is spent in
createMessagePriorityNode(). That's about 3.4% overall,
so it's not chump change. Ah, I see. It bottoms out in the Mork
cellar, creating cell objects. I wonder why the other calls don't do
this? Hmm. It looks like they get their data directly out of the row
somehow. Anyway, this looks like an obvious candidate for the
row-based API that davidmc and bienvenu and I talked about.
RDFContainerMemberTestNode::FilterInstantiations accounts
for about 25% of the overall time. Of that, 35% (8.75% overall) is
spent in nsMsgDatabase::CreateMsgHeader(). Oh fuck, this
is because of naive use of those abominations
nsStringHashtable and
nsISupportsHashtable. A good whacking with
PLHashTable and an arena would speed this up like
babulach I bet.
Had an idea today about <bindings>. I took a first
cut at implementing them yesterday, and put the
RDFBindingNode objects above the
InstantiationNode. They let any instantiation
through, regardless of whether it can extend the bindings or not. They
add no support to the conflict set, so removal of a binding assertion
won't cause the rule to deactivate.
Now I'm stuck trying to figure out how to update content nodes when a binding value changes. With the current scheme, this will involve writing a bunch of special-case code to handle bindings.
What if instead, we put the RDFBindingNodes
below the instantiation node? That is, the
InstantiationNode would continue to propogate bindings
down to the RDFBindingNodes below it. Now, if they go
below, then we won't know about the bindings in the rule itself, which
means that each binding node would need to itself become an
instantiation. (This is reasonable, given that what we're really
giving people is a shorthand for writing what would otherwise be
several very similar rules.)
Since this would stress the "retraction" and "unmasking" code (different rules would match when a binding was removed), we'd need to add special-case code that would notice that Rule A and Rule B both refer to the same template. That would allow us to avoid the headache of ripping out a bunch of content just to rebuild an identical copy modulo one attribute.
Oof, wait a sec, just thought of something. We'd need to end up with the factorial number of rules beneath the original instantiation to do it right! Say we have four bound attributes: we could have any combination of those four attributes set and unset, and each combination would require a path! Oh well, scratch that idea! (I'm afraid to stress the match code that much...)
Ok, so part of the problem that I need to deal with is that now you can go more than one "hop" to get to a variable's binding. That means that there is no longer a "primary resource" that we can watch for updates to a content node, but a cluster of resources. A match needs to be aware of changes to any of these resources.
Spent yesterday afternoon into the evening banging on bindings. I have it working in a sloppy kinda way now, and have XXX-commented the stuff that needs to be fixed up.
I need to remember the variable dependencies between binding objects for a couple of reasons. Well, at least one reason! It'll allow me to minimize the number of queries that I need to make to the back-end when an attribute changes. Specifically, I'll only need to query that are dependant on the value that's changing.
Allrgiht, I'm gonna try to do a bit of profiling on this new stuff to see how it looks. As a baseline, I've got numbers on my "Sent" local folder with 3,030 messages. First load takes 5.377s, subsequent load takes 3.806s. For "Trash", with 5,735 messages, first time load is 11.998s, and subsequent loads are 8.311s.
After my changes, "Sent" takes 5.467s, and subsequent loads take
3.875s. For "Trash", 12.083s and 8.502s. Probably because we need to
allocate Match objects.
So I've been thinking that I should change the simple syntax to use the bindings stuff. It'd save several trips to the RDF service to get properties, for example. However, one thing I'm concerned about is that for large mail folders, eagerly setting up all the bindings could be far more expensive than the hashtable lookups. One way to deal with this would be to make the bindings be computed lazily, on demand. The advantage would be that we continue to only do the database lookups that we need to do (just like now), but now, since we'll already have the property we need in the binding object in the rule, they'll be no need to do a lookup for each resolution.
Just did a quick profile of the post-bindings implementation. About
10% of the time is spent in TestNode::Propogate, and of
that, a whopping 70% is spent in malloc and
new. It's all coming from ConflictSet::Add,
so that's gonna be the place to start thinking about using a
fixed-size allocator, I think.
Bienvenu checked in changes to avoid using Mork cells, and it sped my 3k message folder up by 8%. Woohoo! Gonna try using a fixed size allocator for the conflict set.
Checked in all my fixed-size allocator changes, and picked up another 8% or so.
I just noticed that I still have my hacked threadsafe recycler stuff in XPCOM. I need to remove that and see how bad things get...
Poked at getting the <content tag="..."> test
working. Turns out to be a lot tougher than it looks. I need to
combine the logic from ContentIdTestNode and
ContentTagTestNode to avoid unconstrained searches on a
tag name during either propogate or constrain (depending on whether
the tag test is above or below the id test). Unfortunately, my
"simple" rule syntax depends on being able to break those two apart,
and I'm suspicious that combining them will require me to revisit
that. Worse, I think that may have serious performance problems. Grr.
Hrmph. Just realized that extended rules have to pay for the "simple" rule's structure. Need to fix that...
With respect to getting <content tag="...">
working, I think I have the following options:
ContentIdTestNode to ContentTestNode
and add support for testing the tag attribute. Leave
ContentTagTestNode intact for use as an optimization for
the "simple" syntax.
ContentIdTestNode and
ContentTagTestNode separate, and juggle the rule network
such that the tag test appears towards the bottom of the network.
Part of me really wants to just combine the id and tag tests into a
single node. The thing that scares me is that we'll end up with a
worse fan-out than we do right now. The more I think about this,
though, the more I think that fear may be bogus. In the simple rule
case, we would end up with more
RDFContainerMemberTestNodes; however, the fan-out from
each of those would only be one. So the situation would be
isomorphic to what we have today.
Okay, let's do it then. We'll choose door number one.
Feh. I've got to finish this damn template builder! I've managed to
really screw mailnews with my recent changes to turn on the
<bindings> tag. Turns out my scheme of keeping an
extra index that mapped a resource to all of the matches that it
participated in kinda bombs. Specifically, when the resource is a
container, I end up iterating through all of the container's
membership arcs to rebuild the matches. Oops. That gets to be
O(n2) pretty quick.
So, what to do, what to do?
I could maintain an index by source and property, instead of just source. That would get me directly to the match that I need.
If I were really clever, I'd figure out how to re-use the
MemoryElement map. Since "observees" are required to
notify an nsIRDFObserver of triples that they are
removing, either via OnUnassert or
OnChange, I should be able to use the memory
element map to keep things in sync.
In some ways, the second solution is less robust than the first,
because I know that there are datasources out there that rely
on the fact that they can just call OnAssert to nuke an
old value. So the second solution, while more elegant, seems to
violate the addage to "be strict with the output you produce, and
forgiving to the input you accept".
I wonder if it'd be possible to make the
RDFPropertyTestNode::Element only "depend" on the
source and property, and not the value of the
target? That is, the hash value and equivalency testing would only
examine those two properties. In which case, we could just
use the MemoryElement map.
That won't work because it would screw the multi-attributes as containment property case. Since you couldn't discern which triple was actually supporting a particular container-child pair, you'd have to examine all of them.
And I'm less sure that the "elegant" second option will work either: it can't handle the case where a brand new triple would change an attribute's value from "nothing" to, well, "something". In this case, we've got a match, because the bindings are optional, but we can't find that match because all we've got to go on is a resource and a property that's changed. Damn. I don't even think that the first solution would work in this case!
Well, one way to make it work would be to add information about
triples (or parts of triples) that the Match could
possibly be interested in. Hmm. Maybe we do add
triples to a binding table, with the value null as the
target. Then, if that value changes, we could find it.