Had to profile with function-level (as opposed to line-level) timings. Focused on the main thread, 2.5s.

- malloc() and ::operator new() take 631ms (25%) and 295ms (11%) of the time, respectively.
  - There are ~135K calls to malloc(). JS_Malloc() and JS_ArenaAlloc() account for 50% of the time spent in malloc(), but only 38K (~30%) of the calls by count.
  - 25K of the calls come from generic nsString stuff.

Sorting by F+D (function + descendants) time showed the following. (Stuff won't total 100%: I didn't cover everything, and some stuff subsumes other stuff.)

- 39% of the time (999ms) was spent in 109 calls to nsJARChannel::OnStopRequest().
  - 38% of this time (389ms) was spent in 24 calls to nsXULDocument::OnStreamComplete(), which is the XUL JS completion routine. Most (80%) of this time was spent compiling and executing scripts.
  - 18% of this time (189ms) was spent in 31 calls to SheetLoadData::OnStreamComplete(), which is the CSS completion routine. About half of this time is spent parsing 31 style sheets; the other half is spent in the CSSLoaderImpl::SheetComplete() method.
  - 16% of this time (169ms) was spent in 8 calls to nsXBLStreamListener::OnStopRequest(). I presume this is the completion routine for parsing XBL files.
  - 12% of this time (126ms) was spent in 19 calls to nsParser::OnStopRequest().
  - 12% of this time (121ms) was spent in 107 calls to nsLoadGroup::RemoveChannel(). Nearly all this time ended up being spent in 3 calls to DocumentViewerImpl::LoadComplete(), which fires the ``onload'' event and winds up running a bunch of JS.
- 16% of the time (406ms) is spent in 3044 calls to nsComponentManagerImpl::CreateInstance().
  - This is dominated by nsGenericFactory::CreateInstance(), which has a huge, and fairly flat, fanout.
  - Surprisingly, PR_LoadLibrary() didn't show up on the radar at all. Maybe Quantify just doesn't account for it properly.
- 12% of the time (314ms) is spent in 55K (woof!) calls to PR_Unlock(). The bulk of the time appears to be attributed to 114 calls to nsThreadPool::DispatchRequest(), which come from nsJARTransport::AsyncReadJARElement(). Seems like there might be a fair amount of contention here?
- 9% of the overall time is spent in js_CompileTokenStream().
- 7% of the time is spent constructing frames (but this includes loading XBL bindings, which accounts for about half that time).
- 5% of the time is spent creating 24 string bundles.
- 4% of the time (111ms) is spent reading 50 RDF/XML files. (Actually, I should say, ``reading RDF/XML files 50 times'': it's not clear that 50 unique files were read.) It looks like the chrome registry accounts for most of this.
- 3.5% of the time (92ms) is spent resolving style.
- 3% of the time is spent in 4 initial reflows: one XUL document and 3 HTML documents. (Why three? Text fields?)
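The percentages nested under nsJARChannel::OnStopRequest() are shares of its 999ms, not of the whole main thread. The small script below is a sketch that re-expresses them against the main-thread total. It uses only the millisecond figures quoted above and assumes "2.5s" means exactly 2500ms (the notes give only the rounded figure), so the main-thread column is approximate. The names `TOTAL_MS`, `ON_STOP_REQUEST_MS`, `children_ms` and `share` are hypothetical, introduced only for this illustration.

```python
# Hypothetical helper script: re-expresses the nested shares reported under
# nsJARChannel::OnStopRequest() as shares of the whole main-thread time.
# Assumption: "2.5s" is taken as exactly 2500 ms; the notes only give the
# rounded figure, so the main-thread percentages here are approximate.
TOTAL_MS = 2500

# nsJARChannel::OnStopRequest(), 109 calls, F+D time from the profile notes.
ON_STOP_REQUEST_MS = 999

# F+D times (ms) of the callees listed under OnStopRequest() in the notes.
children_ms = {
    "nsXULDocument::OnStreamComplete": 389,
    "SheetLoadData::OnStreamComplete": 189,
    "nsXBLStreamListener::OnStopRequest": 169,
    "nsParser::OnStopRequest": 126,
    "nsLoadGroup::RemoveChannel": 121,
}


def share(part_ms: float, whole_ms: float) -> float:
    """Return part_ms as a percentage of whole_ms."""
    return 100.0 * part_ms / whole_ms


if __name__ == "__main__":
    # One row per callee: its time, its share of OnStopRequest(), and its
    # (approximate) share of the whole main thread.
    for name, ms in children_ms.items():
        print(f"{name:36} {ms:4d} ms "
              f"{share(ms, ON_STOP_REQUEST_MS):5.1f}% of OnStopRequest "
              f"{share(ms, TOTAL_MS):5.1f}% of main thread")
    covered = sum(children_ms.values())
    print(f"listed callees: {covered} ms of {ON_STOP_REQUEST_MS} ms")
```

With these inputs, the XUL completion routine comes to roughly 15.6% of the main thread, and the five listed callees add up to 994ms of the 999ms spent in nsJARChannel::OnStopRequest().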