Speed up proto postprocessing phase in proto decoder
The proto decoder creates pointers to reference each
location/function/mapping by its id. These ids can be
arbitrary uint64s, but often they are generated in sequence from
1 to N.
The overhead of keeping these indices in a hash is about 20% of
the cost of decoding a profile. Speed it up by using an array to
track values from 1 to N, and a hash for values outside that range.
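
A minimal sketch of the array-plus-hash lookup described above. The
names (Location, locationIndex, put, get) are hypothetical and only
illustrate the idea; they are not the decoder's actual identifiers.

```go
package main

import "fmt"

// Location stands in for any id-keyed profile entity (hypothetical type).
type Location struct {
	ID uint64
}

// locationIndex resolves ids to pointers. Ids smaller than len(dense)
// are stored in the slice; all other ids fall back to the map.
type locationIndex struct {
	dense  []*Location
	sparse map[uint64]*Location
}

// newLocationIndex sizes the slice for n entities with ids 1..n.
// Slot 0 is allocated too, so an id can be used directly as an index.
func newLocationIndex(n int) *locationIndex {
	return &locationIndex{
		dense:  make([]*Location, n+1),
		sparse: make(map[uint64]*Location),
	}
}

// put records l under its id, preferring the slice when the id fits.
func (x *locationIndex) put(l *Location) {
	if l.ID < uint64(len(x.dense)) {
		x.dense[l.ID] = l
		return
	}
	x.sparse[l.ID] = l
}

// get returns the entity for id, or nil if none was recorded.
func (x *locationIndex) get(id uint64) *Location {
	if id < uint64(len(x.dense)) {
		return x.dense[id]
	}
	return x.sparse[id]
}

func main() {
	idx := newLocationIndex(3)
	for _, id := range []uint64{1, 2, 3, 1 << 40} {
		idx.put(&Location{ID: id})
	}
	fmt.Println(idx.get(2).ID, idx.get(1<<40).ID, idx.get(7) == nil)
}
```

Sequential ids cost only a bounds check and a slice access, and the
map is touched only for ids outside the expected range.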
Add a field that lets profile sources indicate the primary sample
value, which visualizers should display by default.
Previously, the pprof convention was to use the last sample_index
by default; this provides more flexibility.
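
A rough sketch of how a visualizer might pick the default under this
scheme. The function name and parameters are hypothetical, and the
fallback for an unknown name is an assumption, not something this
change specifies.

```go
package main

import "fmt"

// defaultSampleIndex returns the index of the sample type to display.
// sampleTypes and primary are hypothetical inputs: the profile's sample
// type names and the name the profile source marked as primary.
// If primary is empty or not found (assumed fallback), the earlier
// convention applies and the last sample type is used.
// It returns -1 when sampleTypes is empty.
func defaultSampleIndex(sampleTypes []string, primary string) int {
	if primary != "" {
		for i, t := range sampleTypes {
			if t == primary {
				return i
			}
		}
	}
	return len(sampleTypes) - 1
}

func main() {
	types := []string{"alloc_objects", "alloc_space", "inuse_space"}
	fmt.Println(defaultSampleIndex(types, "alloc_space"))
	fmt.Println(defaultSampleIndex(types, ""))
}
```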