Allow weblist to work even if assembly is not available
Weblist provides source and assembly combined in a web document.
It needs access to the binary to print the assembly, but currently
refuses to generate the source if the binary can't be found.
Fall back to just generating the source if the binary isn't found.
Add source_path option to point pprof to source files
Currently pprof looks for source files only in the current directory
and its parents. This makes it hard to examine sources for jobs built
from multiple source trees (e.g., from different libraries).
Add a variable to provide a search path for source files. It will default
to the cwd, so there will be no change in behavior by default.
Generate kcachegrind reports under line granularity
Callgrind reports are generated with line granularity since kcachegrind
can take advantage of the information. However, kcachegrind reports were
not being generated with that granularity, creating an unnecessary difference.
Graphs are built using Go maps, which do not provide a deterministic
traversal. We sort the nodes to remove the non-determinism, but
the sort fails to provide a fully deterministic ordering in cases
where the attributes being sorted have equal values. Detect those
cases and fall back to a full comparison of all fields.
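A sketch of the tie-breaking comparison on a simplified, hypothetical `node` type. Real pprof nodes have more fields, and the actual sort criteria may differ. Each comparison moves on to the next field only when the previous ones are equal, so nodes collected from a map in any order end up in the same final order.

```go
package main

import (
	"fmt"
	"sort"
)

// node is a simplified, hypothetical graph node.
type node struct {
	Name string
	File string
	Flat int64
	Cum  int64
}

// sortNodes orders by flat value, then cumulative value (both descending),
// and falls back to comparing the remaining fields when those tie.
func sortNodes(ns []*node) {
	sort.Slice(ns, func(i, j int) bool {
		a, b := ns[i], ns[j]
		if a.Flat != b.Flat {
			return a.Flat > b.Flat
		}
		if a.Cum != b.Cum {
			return a.Cum > b.Cum
		}
		if a.Name != b.Name {
			return a.Name < b.Name
		}
		return a.File < b.File
	})
}

func main() {
	byName := map[string]*node{
		"b": {Name: "b", File: "x.go", Flat: 1, Cum: 2},
		"a": {Name: "a", File: "x.go", Flat: 1, Cum: 2},
		"c": {Name: "c", File: "y.go", Flat: 5, Cum: 5},
	}
	var ns []*node
	for _, n := range byName { // iteration order varies between runs
		ns = append(ns, n)
	}
	sortNodes(ns)
	for _, n := range ns {
		fmt.Println(n.Name)
	}
}
```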
The peek command allows looking at details of a function, and it
avoids any trimming to provide full details. However, when using
tagfocus it is useful to still limit to the filtered samples.
With this change peek will honor tagfocus, but it will avoid any
other trimming.
Apply -1 adjustment to leaf frames for some legacy profiles
Legacy profiles that are not based on interrupt-based sampling
only include addresses that are call stack return addresses. For
them, all call stack addresses should be adjusted by -1 to land
on top of the call instruction and permit accurate symbolization.
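A minimal sketch of the adjustment, assuming a stack is a slice of raw `uint64` addresses. `adjustCallStack` is a hypothetical helper. A return address points just past the call and may already belong to the next source line. Subtracting 1 moves it back inside the call instruction.

```go
package main

import "fmt"

// adjustCallStack subtracts 1 from each address of a stack made only of
// return addresses, so each one symbolizes to the calling line.
func adjustCallStack(stack []uint64) []uint64 {
	out := make([]uint64, len(stack))
	for i, addr := range stack {
		if addr > 0 { // leave a zero (unknown) address untouched
			out[i] = addr - 1
		}
	}
	return out
}

func main() {
	fmt.Printf("%#x\n", adjustCallStack([]uint64{0x401005, 0x402010}))
}
```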
The legacy profilez parser handles duplicate leaf samples, which are a
common artifact of stack unwinding. Apply the same technique to threadz
profiles, where duplicate samples also occur.
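A sketch of duplicate-leaf removal on a leaf-first address slice. It assumes, purely for illustration, that a duplicate shows up as the same address repeated in the first two frames. For example, the leaf PC might be recorded both from the signal context and from unwinding. The real parser's detection rule may differ, for example if one copy was already adjusted. `dropDuplicateLeaf` is hypothetical.

```go
package main

import "fmt"

// dropDuplicateLeaf removes the second frame of a leaf-first stack when it
// repeats the leaf address (assumed duplicate pattern; see lead-in).
func dropDuplicateLeaf(stack []uint64) []uint64 {
	if len(stack) > 1 && stack[0] == stack[1] {
		return append([]uint64{stack[0]}, stack[2:]...)
	}
	return stack
}

func main() {
	fmt.Println(dropDuplicateLeaf([]uint64{10, 10, 20}))
}
```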
When generating a call tree, pprof was using a map to keep track of all the
inline nodes for a location. This is incorrect because it may cause inline functions
at different nesting levels to reuse the same node, causing the resulting graph
to not be a tree.
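A sketch of the fix on a hypothetical `treeNode` type. Each inline frame of a location is attached under the node created for the previous frame, rather than looked up in one per-location map keyed by function. As a result, a function that appears at two nesting levels gets two distinct nodes.

```go
package main

import "fmt"

// treeNode is a hypothetical call-tree node.
type treeNode struct {
	Fn       string
	Children []*treeNode
}

// child returns parent's child for fn, creating it if needed.
func (parent *treeNode) child(fn string) *treeNode {
	for _, c := range parent.Children {
		if c.Fn == fn {
			return c
		}
	}
	c := &treeNode{Fn: fn}
	parent.Children = append(parent.Children, c)
	return c
}

// addInlineFrames walks one location's inline frames (outermost first),
// creating one node per nesting level under the previous level's node.
func addInlineFrames(parent *treeNode, frames []string) *treeNode {
	for _, fn := range frames {
		parent = parent.child(fn)
	}
	return parent
}

func main() {
	root := &treeNode{Fn: "root"}
	leaf := addInlineFrames(root, []string{"f", "g", "f"})
	fmt.Println(leaf == root.Children[0]) // distinct nodes for f at depth 1 and 3
}
```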
When creating a tree, nodes with the same info may appear in multiple places
in the tree. Keeping one of them keeps them all, which may leave disconnected
nodes behind. To ensure the resulting graph is a connected tree, do not include
the children of any removed node. This works for the normal tree refinement
(nodecount and nodefraction) but not for visual refinement, which may eliminate
intermediate nodes. Disable visual mode refinement for call_tree to avoid this issue.
Add a new trimproto option that generates a new profile with symbol
information removed for functions below nodefraction. This significantly
reduces profile size.
Use separate mechanisms to implement graph and tree
creation, which speeds up graph creation by using a
single map from location indices to graph nodes.
pprof fetches profiles concurrently, which allows profiling multiple running
servers at once. However, this may use a large amount of memory if many
profiles are merged, as pprof attempts to decode all profiles in parallel.
Limit the concurrency by chunking the concurrent fetches into groups of up to
64 profiles.
Separate implementation of graph and tree creation to speed it up.
The graph implementation maps all locations to sequences of nodes up front;
the tree implementation uses a per-parent map to keep track of a separate
node per location per parent.
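A sketch contrasting the two strategies on a hypothetical `node` type with root-first stacks of location IDs. To stay short, it maps each location to a single node instead of a sequence of inlined-frame nodes, and it assumes non-recursive stacks. The graph reuses one node per location. The tree looks each location up in the current parent's own map.

```go
package main

import "fmt"

// node is a hypothetical node type shared by both builders.
type node struct {
	Loc      uint64
	Value    int64
	children map[uint64]*node // tree only: per-parent map from location to child
}

// buildGraph resolves each location to one shared node, whatever path reached it.
func buildGraph(stacks [][]uint64, values []int64) map[uint64]*node {
	nodes := make(map[uint64]*node)
	for i, stack := range stacks {
		for _, loc := range stack {
			n := nodes[loc]
			if n == nil {
				n = &node{Loc: loc}
				nodes[loc] = n
			}
			n.Value += values[i]
		}
	}
	return nodes
}

// buildTree gives a location reached from different callers separate nodes.
func buildTree(stacks [][]uint64, values []int64) *node {
	root := &node{}
	for i, stack := range stacks {
		parent := root
		parent.Value += values[i]
		for _, loc := range stack {
			if parent.children == nil {
				parent.children = make(map[uint64]*node)
			}
			child := parent.children[loc]
			if child == nil {
				child = &node{Loc: loc}
				parent.children[loc] = child
			}
			child.Value += values[i]
			parent = child
		}
	}
	return root
}

func main() {
	stacks := [][]uint64{{1, 2, 3}, {1, 4, 3}}
	values := []int64{10, 5}
	g := buildGraph(stacks, values)
	t := buildTree(stacks, values)
	fmt.Println(g[3].Value, t.children[1].children[2].children[3].Value)
}
```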
Profile merging spends a lot of time computing function keys, which are currently strings.
Using structs for the keys whenever possible will reduce the overhead.
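A sketch of the two key styles on a simplified, hypothetical `function` record. The string key formats and allocates on every call. The struct key is comparable and can be used directly as a map key without building a new string. A separator-joined string can also collide when a field contains the separator, which a struct key cannot.

```go
package main

import "fmt"

// function is a simplified, hypothetical profile function record.
type function struct {
	Name, SystemName, Filename string
	StartLine                  int64
}

// funcKey is a comparable struct usable directly as a map key.
type funcKey struct {
	name, systemName, fileName string
	startLine                  int64
}

func (f *function) key() funcKey {
	return funcKey{f.Name, f.SystemName, f.Filename, f.StartLine}
}

// stringKey is the string-based style: it allocates a new string per call.
func (f *function) stringKey() string {
	return fmt.Sprintf("%s|%s|%s|%d", f.Name, f.SystemName, f.Filename, f.StartLine)
}

// dedupe keeps the first function seen for each struct key.
func dedupe(fns []*function) []*function {
	seen := make(map[funcKey]bool)
	var out []*function
	for _, f := range fns {
		if k := f.key(); !seen[k] {
			seen[k] = true
			out = append(out, f)
		}
	}
	return out
}

func main() {
	a := &function{Name: "main.f", Filename: "f.go", StartLine: 10}
	b := &function{Name: "main.f", Filename: "f.go", StartLine: 10}
	fmt.Println(a.key() == b.key(), len(dedupe([]*function{a, b})))
}
```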
Prefer nm based function names to addr2line based names.
The names produced by addr2line are often incomplete. Fix this by
falling back to nm to get function names and using the nm-provided
name if it is longer.
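The selection rule reduces to a length comparison. `chooseName` is a hypothetical helper. The sample names in `main` are illustrative and were not taken from real tool output.

```go
package main

import "fmt"

// chooseName prefers the nm-provided name when it is longer than the
// addr2line name, on the assumption that the longer name is more complete.
func chooseName(addr2lineName, nmName string) string {
	if len(nmName) > len(addr2lineName) {
		return nmName
	}
	return addr2lineName
}

func main() {
	fmt.Println(chooseName("foo", "ns::foo(int)"))
}
```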
When reconciling samples, clean up file paths to allow functions to be
combined if their source filenames match after cleanup (e.g., "src.cpp" and
"./src.cpp").
Also clean up existing uses of filepath to avoid passing empty strings
to it: filepath.Base("") returns "."
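A sketch of empty-safe wrappers around `path/filepath`. `cleanPath` and `baseName` are hypothetical names. `filepath.Clean` turns "./src.cpp" into "src.cpp", so both spellings compare equal. Both `filepath.Clean("")` and `filepath.Base("")` return ".", so empty input is returned unchanged instead.

```go
package main

import (
	"fmt"
	"path/filepath"
)

// cleanPath normalizes a source path, leaving an empty path empty.
func cleanPath(p string) string {
	if p == "" {
		return ""
	}
	return filepath.Clean(p)
}

// baseName is filepath.Base, except that it returns "" for an empty path.
func baseName(p string) string {
	if p == "" {
		return ""
	}
	return filepath.Base(p)
}

func main() {
	fmt.Println(cleanPath("./src.cpp") == cleanPath("src.cpp"), baseName("") == "")
}
```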
Add a field for profile sources to indicate the primary
sample value, which visualizers should display by default.
Previously pprof used the last sample_index by default, by
convention; the new field provides more flexibility.
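A sketch of how a visualizer might resolve the default. It uses the named sample type when present and otherwise keeps the old last-value convention. `valueType`, `defaultSampleIndex`, and the `defaultType` parameter are illustrative names. In released pprof the field appears as `default_sample_type` in profile.proto.

```go
package main

import "fmt"

// valueType mirrors a profile sample type (type and unit).
type valueType struct{ Type, Unit string }

// defaultSampleIndex returns the index of the sample type named defaultType,
// or the last index if it is unset or absent (-1 when there are no types).
func defaultSampleIndex(types []valueType, defaultType string) int {
	if defaultType != "" {
		for i, t := range types {
			if t.Type == defaultType {
				return i
			}
		}
	}
	return len(types) - 1
}

func main() {
	types := []valueType{{"samples", "count"}, {"cpu", "nanoseconds"}}
	fmt.Println(defaultSampleIndex(types, "samples"), defaultSampleIndex(types, ""))
}
```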