Ensure symbolz is called after local symbolization
Even after local symbolization completes successfully we should
call remote symbolization in case there are some mappings that couldn't
be symbolized otherwise.
Add some tests to ensure both local and symbolz symbolization are called.
Reset mapping file to empty string if it was patched to be the source URL.
When the source is remote and a mapping doesn't have either build ID or
file field set, the file field is set to the source URL so that the
proper source from the source mapping is used during symbolz processing.
Before this change, the file field would continue to point to the URL,
so the URL showed up in pprof output, which confused users.
With this change, the file field is reset back to the empty string once
symbolization is done. URL-like paths are also now skipped during local
symbolization, as reading at such a path is not going to succeed.
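
The patch, skip, and reset steps described above can be sketched roughly
as follows. This is only an illustration, not the code in this change:
the mapping type and the isURLLike, patchSource and resetPatched helpers
are hypothetical names, and treating "has a scheme and a host" as
URL-like is an assumption.

```go
package main

import (
	"fmt"
	"net/url"
)

// mapping is a hypothetical stand-in for a profile mapping record.
type mapping struct {
	File    string
	BuildID string
}

// isURLLike reports whether path parses as a URL with both a scheme and
// a host. Plain file paths such as "/usr/bin/app" return false.
func isURLLike(path string) bool {
	u, err := url.Parse(path)
	return err == nil && u.Scheme != "" && u.Host != ""
}

// patchSource sets File to the source URL on mappings that have neither
// a file nor a build ID, and returns the mappings it changed.
func patchSource(ms []*mapping, source string) []*mapping {
	var patched []*mapping
	for _, m := range ms {
		if m.File == "" && m.BuildID == "" {
			m.File = source
			patched = append(patched, m)
		}
	}
	return patched
}

// resetPatched restores the empty file name on previously patched mappings.
func resetPatched(patched []*mapping) {
	for _, m := range patched {
		m.File = ""
	}
}

func main() {
	src := "http://localhost:8080/pprof/profile"
	ms := []*mapping{{File: "/usr/bin/app", BuildID: "abc"}, {}}
	patched := patchSource(ms, src)
	for _, m := range ms {
		if isURLLike(m.File) {
			continue // nothing to read locally at a URL
		}
		fmt.Println("would symbolize locally:", m.File)
	}
	resetPatched(patched)
	fmt.Printf("file after reset: %q\n", ms[1].File)
}
```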
We discussed switching to generating more unique IDs. On second
thought, I propose leaving these as URLs, since that seems unique enough
and, in case this field leaks into the tool or log output, seeing a URL
still seems friendlier than some arbitrarily prefixed string.
The kernel mapping record has the offset of 0xc000000000000000 on
PowerPC64 along with the same value for the mapping start. Both values
come from arch/powerpc/Kconfig. This case is not handled by any of the
conditions in the current getBase() code, so update the existing code
handling the kernel case to handle the PowerPC64 case.
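
A minimal sketch of the kind of check involved, under stated
assumptions: kernelBase is a hypothetical helper and not the real
getBase(), and it assumes the kernel image's load segment virtual
address is known and that the base is the mapping start minus that
address.

```go
package main

import "fmt"

// pageOffsetPPC64 is the PowerPC64 kernel value quoted in the text above.
const pageOffsetPPC64 = 0xc000000000000000

// kernelBase is a hypothetical helper. It accepts a kernel mapping whose
// file offset is either 0 or the PowerPC64 page offset, and returns the
// difference between the mapping start and the load segment address.
func kernelBase(start, limit, offset, loadVaddr uint64) (base uint64, ok bool) {
	if start < loadVaddr || limit <= start {
		return 0, false // not a plausible kernel mapping
	}
	if offset != 0 && offset != pageOffsetPPC64 {
		return 0, false // offset is neither of the recognized kernel values
	}
	return start - loadVaddr, true
}

func main() {
	// Mapping start and offset both carry the PowerPC64 value.
	base, ok := kernelBase(pageOffsetPPC64, pageOffsetPPC64+0x1000000,
		pageOffsetPPC64, pageOffsetPPC64)
	fmt.Printf("base=%#x ok=%v\n", base, ok)
}
```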
When -mean is selected, currently pprof divides the sample value
by value[0], which is expected to be the number of samples. This
is intended to produce mean value per sample. These means cannot
be added. Instead, we should add the value and the number of samples
independently and perform the division at the end.
To do this we will create a separate function to get the number of samples,
and accumulate it independently from the sample value (weight) and apply
the division after the accumulation is completed.
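
To illustrate why the division has to come last, here is a small
sketch. The sample type, the meanByKey helper and the layout of Value
(count at index 0, measured quantity at index 1) are assumptions made
for the example only.

```go
package main

import "fmt"

// sample is a hypothetical record; Value[0] is assumed to be the number
// of samples and Value[1] the measured quantity.
type sample struct {
	Key   string
	Value []int64
}

// total accumulates weight and count independently.
type total struct{ weight, count int64 }

// meanByKey sums weights and counts per key and divides only at the end.
func meanByKey(samples []sample, weight, count func([]int64) int64) map[string]float64 {
	acc := map[string]*total{}
	for _, s := range samples {
		t := acc[s.Key]
		if t == nil {
			t = &total{}
			acc[s.Key] = t
		}
		t.weight += weight(s.Value)
		t.count += count(s.Value)
	}
	means := make(map[string]float64, len(acc))
	for k, t := range acc {
		if t.count == 0 {
			means[k] = 0 // avoid dividing by zero
			continue
		}
		means[k] = float64(t.weight) / float64(t.count)
	}
	return means
}

func main() {
	samples := []sample{
		{Key: "f", Value: []int64{1, 10}},
		{Key: "f", Value: []int64{3, 90}},
	}
	weight := func(v []int64) int64 { return v[1] }
	count := func(v []int64) int64 { return v[0] }
	// Adding per-sample means would give 10/1 + 90/3 = 40.
	// Accumulating first gives (10+90)/(1+3) = 25.
	fmt.Println(meanByKey(samples, weight, count)["f"])
}
```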
Rather than print options on each prompt, add an "options" command
that prints the current options in a user-friendly format. Also,
make sure that those options can be parsed back as printed.
Only do Seek if there is an EOF in the perf fetcher
This is to handle the case where we have a file that is smaller than the
perf.data header, but is still a valid profile.proto (or can be
converted to one in another way).
When generating callgrind format output, produce cost lines at
instruction granularity. This allows visualizers supporting the
callgrind format to display instruction-level profiling information.
We also need to provide the object file (ob=) in order for tools to find
the object file to disassemble when displaying assembly.
We opportunistically group cost lines corresponding to the same
function together, reducing the number of superfluous description lines.
Subposition compression (relative position numbering) is also used to
reduce the output size.
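
As a rough illustration of subposition compression, the sketch below
rewrites a sequence of instruction addresses so that each one after the
first is expressed relative to the previous one ("+N", "-N", or "*" for
no change). The compressPositions helper is hypothetical, and printing
the first address in hex and the differences in decimal is an assumption
of this example.

```go
package main

import "fmt"

// compressPositions returns one subposition string per address. The first
// is absolute; later ones are relative to the address before them.
func compressPositions(addrs []uint64) []string {
	out := make([]string, len(addrs))
	var prev uint64
	for i, a := range addrs {
		switch {
		case i == 0:
			out[i] = fmt.Sprintf("0x%x", a)
		case a == prev:
			out[i] = "*"
		case a > prev:
			out[i] = fmt.Sprintf("+%d", a-prev)
		default:
			out[i] = fmt.Sprintf("-%d", prev-a)
		}
		prev = a
	}
	return out
}

func main() {
	fmt.Println(compressPositions([]uint64{0x1000, 0x1004, 0x1004, 0x1000}))
}
```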
Speed up proto postprocessing phase in proto decoder
The proto decoder creates pointers to reference each
location/function/mapping based on their id. These ids can be
arbitrary uint64s, but often they are generated in sequence from
1 to N.
The overhead of keeping these indices in a hash is about 20% of
the cost of decoding a profile. Speed it up by using an array to
track values from 1 to N, and a hash for values outside that range.
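
The array-plus-hash idea can be sketched as below. The location and
index types and their put/get methods are hypothetical and only show the
lookup structure, not the decoder itself.

```go
package main

import "fmt"

// location is a hypothetical decoded record keyed by an arbitrary uint64 id.
type location struct {
	ID   uint64
	Name string
}

// index keeps small ids in a slice and everything else in a map.
type index struct {
	dense  []*location          // ids 0..n, indexed directly
	sparse map[uint64]*location // ids above n
}

// newIndex sizes the dense part for n records with ids expected in 1..n.
func newIndex(n int) *index {
	return &index{
		dense:  make([]*location, n+1),
		sparse: make(map[uint64]*location),
	}
}

// put stores l in the slice when its id is in range, else in the map.
func (ix *index) put(l *location) {
	if l.ID < uint64(len(ix.dense)) {
		ix.dense[l.ID] = l
		return
	}
	ix.sparse[l.ID] = l
}

// get returns the record for id, or nil if it was never stored.
func (ix *index) get(id uint64) *location {
	if id < uint64(len(ix.dense)) {
		return ix.dense[id]
	}
	return ix.sparse[id]
}

func main() {
	ix := newIndex(3)
	ix.put(&location{ID: 1, Name: "main"})
	ix.put(&location{ID: 1 << 40, Name: "outlier"})
	fmt.Println(ix.get(1).Name, ix.get(1<<40).Name, ix.get(2) == nil)
}
```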
Disambiguate names for kcachegrind under the call_tree option
When using the call_tree option and generating a graph for kcachegrind,
kcachegrind merges nodes that are distinct on the tree back together,
producing confusing results. Add a suffix so that these entries are
kept separate.
This addresses the problem described in
http://yosefk.com/blog/how-profilers-lie-the-cases-of-gprof-and-kcachegrind.html,
particularly the summary "Choosing a profiler is hard" section.
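
One way such a suffix could work is sketched below. The disambiguate
helper and the " [k/n]" suffix format are assumptions chosen for
illustration; the actual suffix used by this change may differ.

```go
package main

import "fmt"

// disambiguate takes the names of distinct tree nodes and appends a
// " [k/n]" suffix to every name that occurs more than once, so that a
// viewer keyed on names does not merge those nodes.
func disambiguate(names []string) []string {
	totals := map[string]int{}
	for _, n := range names {
		totals[n]++
	}
	seen := map[string]int{}
	out := make([]string, len(names))
	for i, n := range names {
		if totals[n] == 1 {
			out[i] = n // unique names are left untouched
			continue
		}
		seen[n]++
		out[i] = fmt.Sprintf("%s [%d/%d]", n, seen[n], totals[n])
	}
	return out
}

func main() {
	fmt.Println(disambiguate([]string{"main", "foo", "bar", "foo"}))
}
```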