profile: fix legacy format Go heap profile parsing (#382)
This CL fixes a long-standing bug that prevented pprof from recognizing
legacy heap profiles produced by Go. Go reports four types of samples
at once, so the profile includes alloc_objects, alloc_space,
inuse_objects, and inuse_space. The bug caused pprof to misclassify Go
heap profile data and prevented it from selecting the correct
filtering/pruning patterns during analysis.
Update golang/go#25096
Tested with the profile samples included in golang.org/issues/25096
(pprof hides the runtime functions with this change)
profile.parseGoCount: accept non-space characters in profile type.
In addition to the "goroutine" and "threadcreate" profiles,
Go code can generate custom profiles using the runtime/pprof package.
The user must name these profiles, and the docs recommend using the
convention "import/path" to avoid namespace conflicts. This CL
updates the pprof tool to be able to parse legacy profiles whose types
contain slashes and other non-space characters.
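
A minimal sketch of the idea, not the code in this CL: the header of a
legacy count profile is matched with \S+ instead of a word-only class, so
a type such as "example.com/my/pkg.widgets" is accepted. The names
countHeaderRE and profileType, and the exact header layout, are
assumptions made for illustration.

```go
package main

import (
	"fmt"
	"regexp"
)

// countHeaderRE is a hypothetical pattern for the first line of a legacy
// count profile, e.g. "goroutine profile: total 12". Using \S+ rather than
// \w+ lets the profile type contain slashes, dots, and dashes.
var countHeaderRE = regexp.MustCompile(`^(\S+) profile: total (\d+)$`)

// profileType returns the profile type named in the header line, and false
// if the line does not look like a count profile header.
func profileType(line string) (string, bool) {
	m := countHeaderRE.FindStringSubmatch(line)
	if m == nil {
		return "", false
	}
	return m[1], true
}

func main() {
	for _, line := range []string{
		"goroutine profile: total 12",
		"example.com/my/pkg.widgets profile: total 3",
		"two words profile: total 3", // rejected: the type contains a space
	} {
		typ, ok := profileType(line)
		fmt.Printf("%q -> %q %v\n", line, typ, ok)
	}
}
```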
This is the upstream fix for https://github.com/golang/go/issues/13195.
This change will need to be mirrored to
github.com/golang/go/src/cmd/pprof/internal/profile/legacy_profile.go
A previous commit attempted to handle mappings produced using the glog
package by matching the column on which the sentinel was found.
However, some tools generate the mapping using a single log entry,
where the prefix appears only on the first line.
Instead of matching by column, use a regexp to identify and match
prefixes introduced by the glog package.
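
As a rough illustration of prefix matching by regexp, the sketch below
strips a glog-style header from any line that carries one and leaves
other lines untouched, which covers the single-log-entry case. The
pattern logPrefixRE and the helper stripLogPrefix are hypothetical; the
expression used by pprof may differ.

```go
package main

import (
	"fmt"
	"regexp"
)

// logPrefixRE is a hypothetical pattern for a glog-style header such as
// "I0102 15:04:05.000000    1234 heap.go:42] ". It is not the expression
// used by pprof.
var logPrefixRE = regexp.MustCompile(
	`^[IWEF]\d{4} \d{2}:\d{2}:\d{2}\.\d{6}\s+\d+ [^\s:]+:\d+\] `)

// stripLogPrefix removes a leading glog-style header if one is present and
// returns every other line unchanged.
func stripLogPrefix(line string) string {
	return logPrefixRE.ReplaceAllString(line, "")
}

func main() {
	lines := []string{
		// Only the first line of a multi-line log entry carries the prefix.
		"I0102 15:04:05.000000    1234 heap.go:42] --- Memory map: ---",
		"00400000-00401000 r-xp 00000000 fd:01 123 /bin/foo",
	}
	for _, l := range lines {
		fmt.Println(stripLogPrefix(l))
	}
}
```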
If the memory map is generated by logging routines such as glog, the
entries may include some initial text which confuses the parsing of
legacy mappings. The initial text is the same for all mapping entries,
so detect it on the proc map sentinel and remove it from the mapping
entries.
Correctly parse mappings where the binary has been deleted
Mapping entries of the form:
" 02e00000-02e8a000: /foo/bin (deleted)"
currently result in "(deleted)" being set as the build id.
Require the build id to be hex, and allow unrecognized junk to
be at the end of the mapping.
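
The rule can be sketched without regexps: split the entry into fields
and accept the field after the file name as a build id only if it is
entirely hexadecimal. The helpers isHex and fileAndBuildID are
hypothetical and assume the simple "range file [build id] [junk]" layout
shown above.

```go
package main

import (
	"fmt"
	"strings"
)

// isHex reports whether s is a non-empty string of hexadecimal digits.
func isHex(s string) bool {
	if s == "" {
		return false
	}
	for _, r := range s {
		if !strings.ContainsRune("0123456789abcdefABCDEF", r) {
			return false
		}
	}
	return true
}

// fileAndBuildID is a hypothetical helper that extracts the object file and
// build id from a mapping entry. Anything after the file name that is not
// hexadecimal, such as "(deleted)", is treated as junk and ignored.
func fileAndBuildID(entry string) (file, buildID string) {
	fields := strings.Fields(entry)
	if len(fields) < 2 {
		return "", ""
	}
	file = fields[1]
	if len(fields) > 2 && isHex(fields[2]) {
		buildID = fields[2]
	}
	return file, buildID
}

func main() {
	file, id := fileAndBuildID(" 02e00000-02e8a000: /foo/bin (deleted)")
	fmt.Printf("file=%q buildID=%q\n", file, id)
}
```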
Improve regexp matching of mappings for legacy formats
Simplify regexps and define a recommended format:
Start   End      object file name  offset(optional)  linker build id
0x40000-0x80000  /path/to/binary   (@FF00)           abc123456
Also include some additional tests and move existing tests to legacy_profile_test.go,
closer to the code in legacy_profile.go.
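
For illustration only, the recommended format could be parsed along
these lines. The mapping struct, mappingRE, and parseMapping are
assumptions of this sketch and are not the regexps in legacy_profile.go.

```go
package main

import (
	"fmt"
	"regexp"
	"strconv"
)

// mapping is a hypothetical, simplified view of one mapping entry.
type mapping struct {
	Start, Limit, Offset uint64
	File, BuildID        string
}

// mappingRE is an assumed pattern for the recommended format:
// start-end, object file name, optional offset, optional hex build id.
var mappingRE = regexp.MustCompile(
	`^\s*(?:0x)?([[:xdigit:]]+)-(?:0x)?([[:xdigit:]]+):?\s+(\S+)` +
		`(?:\s+\(?@([[:xdigit:]]+)\)?)?` +
		`(?:\s+([[:xdigit:]]+)(?:\s|$))?`)

// parseHex parses a hexadecimal number, treating a missing field as zero.
func parseHex(s string) (uint64, error) {
	if s == "" {
		return 0, nil
	}
	return strconv.ParseUint(s, 16, 64)
}

// parseMapping converts one line in the recommended format into a mapping.
func parseMapping(line string) (mapping, error) {
	m := mappingRE.FindStringSubmatch(line)
	if m == nil {
		return mapping{}, fmt.Errorf("unrecognized mapping: %q", line)
	}
	out := mapping{File: m[3], BuildID: m[5]}
	var err error
	if out.Start, err = parseHex(m[1]); err != nil {
		return mapping{}, err
	}
	if out.Limit, err = parseHex(m[2]); err != nil {
		return mapping{}, err
	}
	if out.Offset, err = parseHex(m[4]); err != nil {
		return mapping{}, err
	}
	return out, nil
}

func main() {
	m, err := parseMapping("0x40000-0x80000 /path/to/binary (@FF00) abc123456")
	fmt.Printf("%+v %v\n", m, err)
}
```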
Do not hide allocation frames under call32/call64.
See https://github.com/google/pprof/issues/54. Frames under
call32/call64 may be user code frames so should be shown to avoid
confusing re-attribution of allocations to the calling system frames.
A previous commit tried to eliminate a panic when reading an empty
profile by stopping support for profiles with no samples. That is
unnecessary and it can obstruct some testing.
The panic was caused by the legacy parser returning a nil profile and
no error when processing an empty profile. Detect that situation and
generate a proper profile instead.
Tested by running 'pprof /dev/null'
Add testcase for profile parsing errors
Apply -1 adjustment to leaf frames for some legacy profiles
Legacy profiles that do not use interrupt-based sampling
only include addresses that are call stack return addresses. For
them, all call stack addresses should be adjusted by -1 to land
on top of the call instruction and permit accurate symbolization.
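
A small sketch of the adjustment, under the assumption that the leaf of
an interrupt-based sample is the interrupted PC and is left alone while
every other frame is a return address. The helper adjustAddresses and
its leafIsReturnAddress flag are hypothetical names.

```go
package main

import "fmt"

// adjustAddresses returns a copy of stack (leaf first) with return
// addresses moved back by one so that they fall inside the call
// instruction. When leafIsReturnAddress is false, the leaf is assumed to be
// an interrupted PC and is kept as is.
func adjustAddresses(stack []uint64, leafIsReturnAddress bool) []uint64 {
	out := make([]uint64, len(stack))
	for i, addr := range stack {
		isReturn := i > 0 || leafIsReturnAddress
		if isReturn && addr > 0 {
			addr--
		}
		out[i] = addr
	}
	return out
}

func main() {
	stack := []uint64{0x1005, 0x2010}
	fmt.Printf("%#x\n", adjustAddresses(stack, true))  // heap-style profile
	fmt.Printf("%#x\n", adjustAddresses(stack, false)) // interrupt-based profile
}
```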
The legacy profilez parser handles duplicate leaf samples that are a
common artifact of stack unwinding. Apply the same technique to threadz
profiles where duplicate samples also occur.
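
One way to picture the technique, assuming the artifact shows up as the
leaf address repeated in the second stack position (an assumption of
this sketch, not something stated above): drop one of the two copies
before building the sample. dropDuplicateLeaf is a hypothetical helper.

```go
package main

import "fmt"

// dropDuplicateLeaf removes the leaf frame when the next frame carries the
// same address, which this sketch assumes is how the unwinding artifact
// appears. Stacks are ordered leaf first.
func dropDuplicateLeaf(stack []uint64) []uint64 {
	if len(stack) > 1 && stack[0] == stack[1] {
		return stack[1:]
	}
	return stack
}

func main() {
	fmt.Printf("%#x\n", dropDuplicateLeaf([]uint64{0x10, 0x10, 0x20}))
	fmt.Printf("%#x\n", dropDuplicateLeaf([]uint64{0x10, 0x20}))
}
```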