Speed up proto postprocessing phase in proto decoder
The proto decoder creates pointers to reference each
location/function/mapping based on their id. These ids can be
arbitrary uint64s, but often they are generated in sequence from
1 to N.
The overhead of keeping these indices in a hash is about 20% of
the cost of decoding a profile. Speed it up by using an array to
track values from 1 to N, and a hash for values outside that range.
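A minimal sketch of the array-plus-hash index described above, shown for locations only. The names `Location`, `locationIndex`, `newLocationIndex` and `get` are hypothetical and are not the decoder's actual identifiers. IDs in the range 1..N go into a slice indexed by ID-1, and any other ID falls back to a map:

```go
package main

// Location is a hypothetical stand-in for a decoded profile location;
// only the ID field matters for this sketch.
type Location struct {
	ID uint64
}

// locationIndex maps IDs to locations. IDs in [1, len(dense)] are stored
// in a slice at position ID-1; every other ID is stored in the map.
type locationIndex struct {
	dense  []*Location
	sparse map[uint64]*Location
}

// newLocationIndex builds an index over locs, reserving a dense slot for
// every ID from 1 to len(locs).
func newLocationIndex(locs []*Location) *locationIndex {
	idx := &locationIndex{
		dense:  make([]*Location, len(locs)),
		sparse: make(map[uint64]*Location),
	}
	for _, l := range locs {
		if l.ID >= 1 && l.ID <= uint64(len(idx.dense)) {
			idx.dense[l.ID-1] = l
		} else {
			idx.sparse[l.ID] = l
		}
	}
	return idx
}

// get returns the location registered for id, or nil if there is none.
func (idx *locationIndex) get(id uint64) *Location {
	if id >= 1 && id <= uint64(len(idx.dense)) {
		return idx.dense[id-1]
	}
	return idx.sparse[id]
}
```

When IDs really are 1..N, the map stays empty and every lookup is a bounds check plus a slice access. Arbitrary IDs still work, just through the slower map path.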
Disambiguate names for kcachegrind under the call_tree option
When the call_tree option is used to generate a graph for kcachegrind,
kcachegrind merges back together nodes that are distinct in the tree,
producing confusing results. Add a suffix so that these entries are kept separate.
This addresses the problem described in
http://yosefk.com/blog/how-profilers-lie-the-cases-of-gprof-and-kcachegrind.html ,
particularly in the "Choosing a profiler is hard" summary section.
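One way to keep same-named tree nodes apart is to give every repeat occurrence of a name its own suffix. This is a minimal sketch. The commit does not say which suffix format is used, so the " [N]" format, the `treeNode` type and `disambiguate` are all hypothetical:

```go
package main

import "fmt"

// treeNode is a hypothetical call-tree node: the same function name can
// appear in many distinct nodes of the tree.
type treeNode struct {
	Name string
}

// disambiguate returns a label for every node so that distinct nodes
// sharing a function name get distinct labels. The first occurrence keeps
// the plain name; later ones get an assumed " [N]" suffix.
func disambiguate(nodes []*treeNode) map[*treeNode]string {
	seen := make(map[string]int)
	labels := make(map[*treeNode]string, len(nodes))
	for _, n := range nodes {
		k := seen[n.Name]
		seen[n.Name] = k + 1
		if k == 0 {
			labels[n] = n.Name
		} else {
			labels[n] = fmt.Sprintf("%s [%d]", n.Name, k)
		}
	}
	return labels
}
```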
This will enable symbolization support for Go on Mac OS
It re-enables symbolization using debug/pprof/symbol for
Go profiles in the legacy format, and implements basic
Mach-O support in the binutils package.
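The commit does not spell out what "basic Mach-O support" covers. One small piece that any such support needs is recognizing a Mach-O file from its header. The sketch below does only that, using the magic constants from Go's standard `debug/macho` package. The `isMachO` helper is hypothetical:

```go
package main

import (
	"debug/macho"
	"encoding/binary"
)

// isMachO reports whether header starts with a Mach-O magic number.
// Thin 32- and 64-bit images can be stored in either byte order, while
// universal ("fat") binaries use a big-endian header. 0xcafebabe is also
// the magic of Java class files, so a real implementation would need
// further checks before trusting a fat-binary match.
func isMachO(header []byte) bool {
	if len(header) < 4 {
		return false
	}
	for _, order := range []binary.ByteOrder{binary.LittleEndian, binary.BigEndian} {
		switch order.Uint32(header) {
		case macho.Magic32, macho.Magic64:
			return true
		}
	}
	return binary.BigEndian.Uint32(header) == macho.MagicFat
}
```

A recognized file could then be opened with `macho.Open` to read its symbol table.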
This is to keep the new TrimTree functionality from breaking any code
currently using the public interface. We do this by separating NodeSet
from nodePtrSet and creating different functions for each.
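The sketch below shows why the two sets are kept separate. A set keyed by NodeInfo cannot single out one of several tree nodes that share the same NodeInfo, but a set keyed by node pointer can. The types are simplified stand-ins, and `selectByInfo` and `selectByPtr` are hypothetical names, not the package's actual functions:

```go
package main

// NodeInfo is a simplified stand-in for a node's identity (function,
// file, line, and so on); here it is just a name.
type NodeInfo struct {
	Name string
}

// Node is a simplified graph/tree node. In a tree, distinct Nodes can
// share the same NodeInfo.
type Node struct {
	Info NodeInfo
}

// NodeSet selects nodes by NodeInfo. This is enough for a graph, where
// NodeInfo maps one to one to nodes.
type NodeSet map[NodeInfo]bool

// nodePtrSet selects individual nodes by pointer, so a tree can keep one
// node without keeping every other node that shares its NodeInfo.
type nodePtrSet map[*Node]bool

// selectByInfo keeps every node whose NodeInfo is in s.
func selectByInfo(nodes []*Node, s NodeSet) []*Node {
	var out []*Node
	for _, n := range nodes {
		if s[n.Info] {
			out = append(out, n)
		}
	}
	return out
}

// selectByPtr keeps exactly the nodes contained in s.
func selectByPtr(nodes []*Node, s nodePtrSet) []*Node {
	var out []*Node
	for _, n := range nodes {
		if s[n] {
			out = append(out, n)
		}
	}
	return out
}
```

Keeping the pointer-based set unexported and giving it its own functions leaves the existing NodeSet-based interface unchanged for current callers.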
When deleting a node N, if both the edge between N and its parent and
the edge between N and its child are inline, then the resulting
residual edge from N's parent to the child is also inline.
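The inline rule above can be stated as a single boolean AND. In this minimal sketch, `Edge` is a simplified, hypothetical type and `residualEdge` is an illustrative helper. Only the inline rule comes from the text above:

```go
package main

// Edge is a simplified edge between two named nodes. Inline marks an
// inlined call; Residual marks an edge that stands in for a path through
// deleted nodes.
type Edge struct {
	Src, Dest string
	Inline    bool
	Residual  bool
}

// residualEdge builds the edge that replaces parent->N->child when N is
// deleted. The result is inline only if both replaced edges were inline.
func residualEdge(toN, fromN *Edge) *Edge {
	return &Edge{
		Src:      toN.Src,
		Dest:     fromN.Dest,
		Inline:   toN.Inline && fromN.Inline,
		Residual: true,
	}
}
```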
In a graph, NodeInfo maps one to one to the nodes, so it suffices to
find the top NodeInfos and keep only those nodes in the graph. In
a tree, however, a single NodeInfo may map to many nodes. As of this
commit, a call to 'web 10' in pprof on a tree returns all the nodes
corresponding to the top 10 NodeInfos.
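A minimal sketch of this selection: aggregate weight per NodeInfo, keep the k heaviest NodeInfos, then keep every tree node that maps to one of them. Ranking by summed flat weight is an assumption made for illustration, and `topNodesByInfo` and the simplified types are hypothetical:

```go
package main

import "sort"

// NodeInfo is a simplified node identity; several tree nodes may share one.
type NodeInfo struct {
	Name string
}

// Node is a simplified tree node with its own flat weight.
type Node struct {
	Info NodeInfo
	Flat int64
}

// topNodesByInfo keeps all nodes whose NodeInfo is among the k NodeInfos
// with the largest total flat weight (an assumed ranking criterion).
func topNodesByInfo(nodes []*Node, k int) []*Node {
	weight := make(map[NodeInfo]int64)
	for _, n := range nodes {
		weight[n.Info] += n.Flat
	}
	infos := make([]NodeInfo, 0, len(weight))
	for info := range weight {
		infos = append(infos, info)
	}
	sort.Slice(infos, func(i, j int) bool {
		if weight[infos[i]] != weight[infos[j]] {
			return weight[infos[i]] > weight[infos[j]]
		}
		return infos[i].Name < infos[j].Name // deterministic tie-break
	})
	if len(infos) > k {
		infos = infos[:k]
	}
	keep := make(map[NodeInfo]bool, len(infos))
	for _, info := range infos {
		keep[info] = true
	}
	var out []*Node
	for _, n := range nodes {
		if keep[n.Info] {
			out = append(out, n)
		}
	}
	return out
}
```

As a result, asking for the top 10 on a tree can return more than 10 nodes, because each selected NodeInfo may appear at several places in the tree.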
Before this change, the node count used in the label was the number
requested by the user. If trimming leaves the graph with fewer nodes than
the user asked for, the report label now reflects the actual count.
Allow weblist to work even if assembly is not available
Weblist provides source and assembly combined in a web document.
It needs access to the binary to print the assembly, but currently
refuses to generate even the source if the binary can't be found.
Fall back to just generating the source if the binary isn't found.
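The fallback can be sketched as "try to disassemble, and on failure render source lines with no instructions". The `disasm` callback, `errNoBinary` and `renderListing` are hypothetical names. The real weblist output is HTML, while this sketch produces plain text:

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// errNoBinary is a hypothetical error for a binary that cannot be located.
var errNoBinary = errors.New("binary not found")

// renderListing interleaves source lines with their disassembly. disasm
// is a hypothetical callback that returns instructions keyed by 1-based
// source line. If it fails (for example with errNoBinary), the listing
// falls back to source only instead of failing entirely.
func renderListing(source []string, disasm func() (map[int][]string, error)) string {
	asm, err := disasm()
	if err != nil {
		asm = nil // indexing a nil map yields no instructions
	}
	var b strings.Builder
	for i, line := range source {
		fmt.Fprintf(&b, "%d %s\n", i+1, line)
		for _, inst := range asm[i+1] {
			fmt.Fprintf(&b, "    %s\n", inst)
		}
	}
	return b.String()
}
```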
Add source_path option to point pprof to source files
Currently pprof looks for source files only in the current directory
and its parents. This makes it hard to examine sources for jobs with
multiple source trees (e.g., from different libraries).
Add a variable to provide a search path for source files. It defaults
to the cwd, so the default behavior is unchanged.
Generate kcachegrind reports at line granularity
Callgrind reports are generated with line granularity since kcachegrind
can take advantage of the information. However, kcachegrind reports were
not being generated with that granularity, creating an unnecessary difference.
Graphs are built using Go maps, which do not provide a deterministic
traversal order. We sort the nodes to remove the non-determinism, but
the sort fails to provide a fully deterministic ordering in cases
where the attributes being sorted have equal values. Detect those
cases and fall back to a full comparison of all fields.
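A minimal sketch of the tie-break. Nodes are sorted by the primary key, and when that key is equal the comparison continues through every remaining field. The result is then a total order that does not depend on map iteration order. The `node` struct and its fields are a simplified, hypothetical stand-in for pprof's graph nodes:

```go
package main

import "sort"

// node is a simplified, hypothetical graph node.
type node struct {
	Name string
	File string
	Line int
	Flat int64
	Cum  int64
}

// sortNodes orders nodes by decreasing flat weight. Nodes with equal flat
// weight would otherwise keep whatever order map traversal produced, so
// ties fall back to comparing all remaining fields.
func sortNodes(ns []node) {
	sort.Slice(ns, func(i, j int) bool {
		a, b := ns[i], ns[j]
		if a.Flat != b.Flat {
			return a.Flat > b.Flat
		}
		// Fallback: full comparison of all fields.
		if a.Cum != b.Cum {
			return a.Cum > b.Cum
		}
		if a.Name != b.Name {
			return a.Name < b.Name
		}
		if a.File != b.File {
			return a.File < b.File
		}
		return a.Line < b.Line
	})
}
```

Because the comparator only treats two nodes as equal when every field matches, it does not matter that `sort.Slice` is not stable: any nodes left in an arbitrary relative order are indistinguishable.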
The peek command allows looking at details of a function, and it
avoids any trimming to provide full details. However, when using
tagfocus, it is still useful to restrict the output to the filtered samples.
With this change peek will honor tagfocus, but it will avoid any
other trimming.
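The distinction can be sketched as a filter step with no trimming step after it. In this minimal sketch, the `sample` type, the predicate form of tagfocus and `peekSamples` are hypothetical simplifications:

```go
package main

// sample is a hypothetical, simplified profile sample.
type sample struct {
	Stack  []string // function names, leaf first
	Value  int64
	Labels map[string]string
}

// peekSamples selects the samples a peek-style report would use. A
// tagfocus-style predicate is honored, but no other trimming (node count,
// node fraction, and so on) is applied. A nil focus means no tagfocus was
// given, so every sample is kept.
func peekSamples(samples []sample, focus func(labels map[string]string) bool) []sample {
	if focus == nil {
		return samples
	}
	var out []sample
	for _, s := range samples {
		if focus(s.Labels) {
			out = append(out, s)
		}
	}
	return out
}
```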
Apply -1 adjustment to leaf frames for some legacy profiles
Legacy profiles that do not come from interrupt-based sampling
include only call stack return addresses. For them, all call stack
addresses should be adjusted by -1 so that they land on the call
instruction and permit accurate symbolization.
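A minimal sketch of the adjustment for a leaf-first stack of addresses. With interrupt-based sampling, the leaf is the interrupted PC and is left alone. Otherwise every address, including the leaf, is a return address and is decremented. `adjustStack` is a hypothetical helper, and the zero guard is an added safety assumption:

```go
package main

// adjustStack returns a copy of a leaf-first stack with return addresses
// moved back by one byte so that they fall inside the call instruction.
// If interruptBased is true, the leaf (index 0) is the sampled PC and is
// not adjusted. Zero addresses are left unchanged to avoid underflow.
func adjustStack(addrs []uint64, interruptBased bool) []uint64 {
	out := make([]uint64, len(addrs))
	for i, a := range addrs {
		if (i == 0 && interruptBased) || a == 0 {
			out[i] = a
			continue
		}
		out[i] = a - 1
	}
	return out
}
```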