Add a new trimproto option that generates a new profile with symbol
information removed for functions below nodefraction. This significantly
reduces profile size.
Use separate mechanisms to implement graph and tree
creation. This speeds up graph creation by using a
single map from location indices to graph nodes.
pprof fetches profiles concurrently, which allows profiling multiple running
servers at once. However, this can use a large amount of memory when many
profiles are merged, because pprof attempts to decode all profiles in parallel.
Limit the concurrency by splitting the concurrent fetches into groups of up to
64 profiles.
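
The chunking described above can be sketched roughly as follows. This is a minimal illustration, not pprof's actual code: fetchChunked, the fetch callback, and the int result type are hypothetical stand-ins for whatever downloads and decodes one profile.

```go
package main

import (
	"fmt"
	"sync"
)

// maxConcurrent is the chunk size; 64 matches the limit in the message above.
const maxConcurrent = 64

// fetchChunked (hypothetical name) calls fetch for every source, running at
// most maxConcurrent fetches at a time. Each chunk must finish before the
// next one starts, which bounds how many results are in flight at once.
func fetchChunked(sources []string, fetch func(string) (int, error)) ([]int, error) {
	results := make([]int, len(sources))
	for start := 0; start < len(sources); start += maxConcurrent {
		end := start + maxConcurrent
		if end > len(sources) {
			end = len(sources)
		}
		errs := make([]error, end-start)
		var wg sync.WaitGroup
		for i := start; i < end; i++ {
			wg.Add(1)
			go func(i int) {
				defer wg.Done()
				// Each goroutine writes only to its own slots.
				results[i], errs[i-start] = fetch(sources[i])
			}(i)
		}
		wg.Wait() // finish this chunk before starting the next
		for _, err := range errs {
			if err != nil {
				return nil, err
			}
		}
	}
	return results, nil
}

func main() {
	sources := make([]string, 150)
	for i := range sources {
		sources[i] = fmt.Sprintf("server-%d", i)
	}
	// The stand-in fetch just reports the length of the source name.
	sizes, err := fetchChunked(sources, func(s string) (int, error) {
		return len(s), nil
	})
	fmt.Println(len(sizes), err)
}
```

A semaphore channel would keep all 64 slots busy instead of waiting for the slowest fetch in each chunk, but fixed chunks are simpler and match the wording above.
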
Separate the implementations of graph and tree creation to speed up creation.
The graph implementation maps all locations to sequences of nodes up front;
the tree implementation uses a per-parent map to keep track of a different
node per location per parent.
Profile merging spends a lot of time computing function keys, which are currently strings.
Using structs for the keys wherever possible reduces the overhead.
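
As a rough sketch of the idea: a Go struct whose fields are all comparable can be used directly as a map key, which avoids formatting and allocating a string for every lookup. The Function and functionKey types below are simplified, hypothetical stand-ins, not the actual profile types.

```go
package main

import "fmt"

// Function is a simplified, hypothetical function record.
type Function struct {
	Name       string
	SystemName string
	Filename   string
	StartLine  int64
}

// functionKey contains only comparable fields, so it is a valid map key.
type functionKey struct {
	startLine  int64
	name       string
	systemName string
	filename   string
}

// key builds the struct key without any string formatting.
func (f *Function) key() functionKey {
	return functionKey{
		startLine:  f.StartLine,
		name:       f.Name,
		systemName: f.SystemName,
		filename:   f.Filename,
	}
}

// dedupe keeps the first function seen for each distinct key.
func dedupe(fns []*Function) []*Function {
	seen := make(map[functionKey]bool)
	var out []*Function
	for _, f := range fns {
		k := f.key()
		if seen[k] {
			continue
		}
		seen[k] = true
		out = append(out, f)
	}
	return out
}

func main() {
	fns := []*Function{
		{Name: "main.run", Filename: "main.go", StartLine: 10},
		{Name: "main.run", Filename: "main.go", StartLine: 10},
		{Name: "main.stop", Filename: "main.go", StartLine: 42},
	}
	fmt.Println(len(dedupe(fns)))
}
```
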
Prefer nm-based function names to addr2line-based names.
The names produced by addr2line are often incomplete. Fix this by
falling back to nm to get function names and using the nm-provided
name if it is longer.
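
The selection rule above amounts to a length comparison. A minimal sketch, where preferName is a hypothetical helper and both names are assumed to have been looked up already:

```go
package main

import "fmt"

// preferName (hypothetical helper) returns the nm-provided name when it is
// longer than the addr2line one, and the addr2line name otherwise.
func preferName(addr2lineName, nmName string) string {
	if len(nmName) > len(addr2lineName) {
		return nmName
	}
	return addr2lineName
}

func main() {
	// The names here are made-up examples of a truncated and a full name.
	fmt.Println(preferName("ns::Widget", "ns::Widget::Draw(int)"))
	fmt.Println(preferName("ns::Widget::Draw(int)", ""))
}
```
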
When reconciling samples, clean up file paths to allow functions to be
combined if their source filenames match after cleanup (e.g. "src.cpp" and
"./src.cpp").
Also clean up existing uses of filepath to avoid passing empty strings
to it: filepath.Base("") returns ".".
Add a field for profile sources to indicate which sample value is the
primary one, which visualizers should display by default.
Previously the convention was for pprof to use the last
sample_index by default; this provides more flexibility.
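
A visualizer could pick its default along these lines. This is only a sketch: the Profile struct and the DefaultSampleType field name are assumptions for illustration, since the message above does not name the field.

```go
package main

import "fmt"

// Profile is a hypothetical, stripped-down profile header.
type Profile struct {
	SampleTypes       []string
	DefaultSampleType string // assumed field name; empty means "not set"
}

// defaultSampleIndex returns the index of the primary sample value if the
// profile names one, and otherwise falls back to the older convention of
// using the last sample type. It returns -1 when there are no sample types.
func defaultSampleIndex(p Profile) int {
	if p.DefaultSampleType != "" {
		for i, t := range p.SampleTypes {
			if t == p.DefaultSampleType {
				return i
			}
		}
	}
	return len(p.SampleTypes) - 1
}

func main() {
	types := []string{"alloc_objects", "alloc_space", "inuse_objects", "inuse_space"}
	fmt.Println(defaultSampleIndex(Profile{SampleTypes: types, DefaultSampleType: "alloc_space"}))
	fmt.Println(defaultSampleIndex(Profile{SampleTypes: types}))
}
```
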