NAME
caliper - measure, report, and analyze program performance data

SYNOPSIS
caliper measurement [collect_options] [report_options] program [program_args]
caliper measurement [collect_options] [report_options] pid [pid]...
caliper measurement [collect_options] [report_options] -w
caliper {report|merge|diff} [report_options] [database]...
caliper advise [advise_options] [database]...
caliper info [info_options]
caliper -g [--jre path_to_java]
caliper { -h | -H }
caliper -v

Remarks
This command requires installation of the optional
HP Caliper software, which is not included in the
standard operating system.

DESCRIPTION
The
caliper
command measures, reports on, and analyzes the performance of native
Itanium(R)
and
Itanium 2 programs.

Quick Start
This Quick Start section provides examples of common command invocations
and frequently used options. Additional quick start help is available from
HP Caliper's Quick Start Reference card, located at
caliper_root/doc/quick_start.pdf and also downloadable from
http://www.hp.com/go/caliper.
To make a sampled call-graph measurement and report results:
To measure and report sampled data cache misses:
To find a process's (for example, pid 1234) hottest functions over a 60-second period:
To find the hottest functions across the entire system (for both kernel
and processes) over a 120-second period:
caliper fprof -o output_file -w -e120
To re-report the last measurement run with source and assembly code:
To analyze the latest performance runs and provide advice:
To see the difference between two collection runs of the same application:
caliper diff database1 database2
To start the caliper GUI to interactively measure and explore performance data:

Common Measurements
- cpu
CPU execution statistics
- cstack
sampled call stack profile
- dcache/icache
cache miss profiles
- ecount
total CPU event counts
- fprof
flat execution profile
- scgprof
sampled call graph profile

Common Options
Save measurement results to named database:
Set length of measurement (in seconds):
Specify which CPU metric sets to measure:
-m EVENT_SET[:all|:user|:kernel][,EVENT_SET]...
overview|cpi|fp|l1dcache|l1icache|l2cache|l3cache|stall|threadswitch
Specify which CPU events to measure:
-m CPU_EVENT[:EVENT_PARAM]...[,CPU_EVENT]...
Save textual performance report to file:
Selectively measure application processes:
-p {all|root|root-forks|PATTERN[,PATTERN]...}
Control reporting details:
Control the sampling specification for profiles:
-s PERIOD[,VARIATION[,CPU_EVENT]]
Measure all threads individually:
Perform measurement system wide (not on specific processes):
Get help:
-h for short help
-H for full help

General Description
The
caliper
command measures, reports on, and analyzes the performance of native
Itanium programs. To obtain performance data,
caliper
measures some
metrics using the Itanium's Performance Monitoring Unit (PMU) and, on some
platforms, measures other metrics by inserting probe code into the running
program. No special preparation of the measured program is necessary. caliper
is available on HP-UX 11i v2 and later and Linux/Itanium (2.6.5 kernel and
later). Note that not all features are available on all platforms; see
the
PLATFORM-SPECIFIC ADDENDA
section below for details. caliper
operates in one of two broad collection modes: in per-process mode (using
--scope=process)
or in system-wide mode (using
--scope=system). In per-process mode,
caliper
tracks and measures individual processes, optionally following
fork
and
exec
calls. In system-wide mode,
caliper
measures data across all CPUs on the system, and then attributes samples
to individual processes. Using system-wide mode is a good way of understanding
the broad performance picture on a system, before "drilling down" using
caliper
in per-process mode. caliper
can measure programs which are built 32-bit or 64-bit, shared bound
or minshared bound, optimized or debug. Applications can be written in C,
C++, Fortran 9x, assembly (if standard runtime conventions are followed) or
a mixture of these languages. You can specify the program to be measured using an absolute path, a relative
path or a simple file name.
caliper
searches
$PATH
when you specify a simple
file name and looks in the current directory only if
$PATH
includes it. Alternatively, you can specify one or more already running processes to measure
by listing their ID(s) on the command line.
caliper
will measure processes until they terminate, unless you use the
--duration
option, or if you stop
caliper
by sending it a SIGINT (e.g., using
Ctrl-C
in a terminal window), which
will also generate a performance report or write performance data to a
database. Note that stopping
caliper
with a SIGINT
(Ctrl-C)
when dynamic instrumentation is being used
(measurements
acount,
cgprof,
fcount,
and
fcover)
will cause
caliper
to immediately and forcibly terminate all processes being measured before
writing data. Using SIGINT with any other measurement will stop
caliper
while allowing measured processes to continue normally. caliper
can both collect data and optionally generate reports (in ASCII, CSV, HTML,
or any combination) and/or create a
caliper
database in a single run (using the syntax
"caliper measurement ...").
caliper
can also:
- Generate a report from a previously created database (caliper report),
- Merge or diff the data from multiple databases (caliper merge and caliper diff),
- Analyze performance metrics from one or more previously created databases (caliper advise), or
- Generate descriptions of reports and CPU events (caliper info).
The
measurement
argument to the
caliper
command is really the name of a measurement configuration file which determines
what measurement to make and how to make it. You can use the standard
measurement configuration files supplied with
caliper
or you can create your own. You can also override measurement configuration
file settings on the command line or in a
caliper
initialization file
(.caliperinit). One example of a measurement is
fprof,
which tells
caliper
to collect the data needed to produce a flat profile report based
on CPU cycles. Another example of a measurement is
dcache,
which produces a report detailing where data cache misses have occurred
during execution of the program. See the
EXAMPLES
section for examples of
using
fprof,
dcache,
and other measurements. If an executable has been stripped of local symbols,
caliper
can only
report names for global functions. If no symbol table (or debug
information) exists at all,
caliper
will only report address information. Like all performance measurement tools,
caliper
can affect the runtime
performance characteristics of the program being measured. Some measurements,
such as
ecount,
have negligible impact while dynamic instrumentation-based measurements
(acount,
cgprof,
fcount,
fcover)
can have a large effect. Take this
"Heisenberg" effect into consideration when interpreting any performance data. When making measurements, performance data is always saved to a
caliper
database. You can use the
--database
option
to specify the name and location of the database
caliper
creates. If you do not use
--database,
then the database (named for
the type of measurement made) will be saved in the
./.hp_caliper_databases
directory (this default can be changed with the
CALIPER_DATABASES
environment
variable). The most recently created measurement database is pointed to by an
automatic "latest" symlink in the
CALIPER_DATABASES
directory. If a simple
database name is given on a
caliper
report
or
advise
run, then the
CALIPER_DATABASES
directory is searched after
the current directory for the database.
So,
caliper report fprof
will report from the most recent
fprof
measurement run,
caliper report latest
will
report the latest measurement run of any type. In addition to measuring and reporting performance data,
caliper
can also analyze collected data and make suggestions for improving your
program's performance. This analysis is driven by a set of rules which
look for specific data metrics indicating typical performance problems. You can also write your own rules customized to your application. This is
an iterative process where you first make one or more
caliper
performance runs saving the results in databases, run
caliper
advise on those
databases, review the suggestions, make changes to your program and/or how
it is built, and repeat. Note that because every program is unique, not all
of the suggestions will apply. There are several defaults assumed when portions of the
caliper
command-line are omitted. If
measurement
is omitted, but either a
program
name or
process ID
list is given, then the
scgprof
measurement is assumed. If
report
is omitted, but a
database
is given, then
report
is assumed. If
report
is given, but no
databases
are listed, then
latest
is assumed. Finally, if
advise
is given, but no
databases
are listed, then all of the databases in the
CALIPER_DATABASES
directory are
analyzed. A full Graphical User Interface (GUI) is also available as an
alternative to the command-line user interface. The GUI can be used to set up
and make measurements as well as graphically explore collected performance data.
There are two ways to invoke the GUI:
The first is to run it on the local
machine with the command
caliper -g. The second is to install a remote GUI
client on a Windows or Linux desktop system, run that program, and remotely
connect to an Integrity server on which the measurements will be made.

Measurement Configuration Files
The
measurement
argument to the
caliper
command is actually the name of a measurement configuration file. Measurement
configuration files determine which measurements
caliper
makes. You can use any of
caliper's
standard measurement configuration files or you can create and use your own. The
measurement
parameter in the command line can be a simple, relative,
or absolute file name. If a simple file name is given, then
caliper
first looks in the current working directory for the file. If not found,
then
caliper
looks in the
caliper_root/config
directory. By default,
caliper
is installed in
/opt/caliper
on HP-UX, and
/opt/hp-caliper
on Linux. Many of the settings in the measurement configuration file can be overridden by
their corresponding options on the command line. Note that when making your own measurement configuration file, the first line
of the file must be a comment which begins with
to pass a
caliper
validity check. HP Caliper provides the following standard measurement configuration files.
Also see the
PLATFORM-SPECIFIC ADDENDA
section for additional platform-specific measurement configuration files
that may be supplied.
- alat
Measures failed ALAT checks.
- branch
Measures branch (mis-)predictions.
- cstack
Measures sampled call stack profile.
- cycles
Measures sampled cycle profiles. Available only on dual-core
processors.
- dcache
Measures data cache misses.
- dtlb
Measures data TLB misses.
- ecount
Measures total CPU event counts.
- fprof
Measures a flat profile of sampled instruction addresses.
- icache
Measures instruction cache misses.
- itlb
Measures instruction TLB misses.
- pmu_trace
Collects per-(kernel)thread traces of sampled cache misses, TLB misses,
ALAT misses, branch mispredictions, instruction addresses, and CPU events.
- scgprof
Creates a call graph profile using sampled branch data.
- traps
Profiles traps, interrupts, and faults.

Options
There are a number of command line options available which alter
caliper's
operation. Option names and literal arguments can be abbreviated to their
shortest, non-ambiguous spelling. Although the command line synopsis above
shows
caliper
options following
measurement,
in reality they can precede and/or follow it. In the option descriptions below, lower-case text (computer/bold font)
is a literal which is typed as shown and upper-case text (italics or underline
font) is descriptive and must be replaced with real values. See
PLATFORM-SPECIFIC ADDENDA
below for additional options that may be available. The following option can be used to supply
caliper
options from a text file:
- --options-file=OPTIONS_FILE or -f=OPTIONS_FILE
Specifies a text file
(OPTIONS_FILE)
containing a list of
caliper
command-line options (separated by spaces or newlines).
You can also use an options file to specify a
caliper
measurement configuration file as well as the application to be profiled
and its arguments.
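
As a rough illustration of how an options file maps onto a command line, the Python sketch below splits file contents on spaces and newlines, as described above. It assumes plain whitespace separation with no quoting or comment handling (the text above does not say how caliper treats those), and the file contents, program name, and function name are hypothetical.

```python
def split_options(text):
    """Split options-file text into tokens on spaces and newlines.

    Assumption: no quoting, escaping, or comment syntax is handled.
    """
    return text.split()


# Hypothetical options file: a measurement, two options, then the
# application to be profiled and its argument.
OPTIONS_TEXT = """\
fprof
--duration=60 --output-file=report.txt
./myapp input.dat
"""

tokens = split_options(OPTIONS_TEXT)

# The command line these tokens would correspond to if typed directly.
equivalent_command = ["caliper"] + tokens
print(" ".join(equivalent_command))
```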

Data Collection Options
The command line options which affect data collection are:
- --branch-sampling-spec=EVT_PERIOD[,VARIATION[,CPU_EVENT[:EVENT_PARAM]...]]
Controls the branch sampling rate for an
scgprof
collection.
See the
--sampling-spec
option below for argument details.
- --database=DATABASE[,unique]
(Can also be specified with the
-d
option.) This saves measurement results to the named database. Performance data is always saved to a
caliper
database, whether one is explicitly specified or not.
If the
--database
option is used to specify the explicit name and location of the database,
then that database is used. If the
--database
option is not given, then a database (named for the type of measurement made)
will be saved in the
./.hp_caliper_databases
directory (this default can be
changed with the
CALIPER_DATABASES
environment variable).
The most
recently created measurement database is pointed to by an automatic
"latest" symlink in the
CALIPER_DATABASES
directory. The optional
unique
qualifier will append the process ID of the HP Caliper process to the name
of the database it is writing to, e.g.,
mydb-21952.
- --data-summary
Valid only for the
dcache
measurement, this option specifies that
caliper
should additionally record (and report) global variables and process
regions (stack, heap, data, etc.) associated with data cache misses.
- --duration=ELAPSED_TIME
(Can also be specified with the
-e
option). Specifies how long the measured application should run (in seconds)
before
caliper
stops measuring it.
When
caliper
stops measuring, the application is resumed and runs freely.
- --event-defaults=EVENT_PARAM[:EVENT_PARAM]...
Specifies the default CPU event parameters. These apply to all
CPU events unless an event-specific setting is provided for
the given parameter.
EVENT_PARAM is defined as:
[{privilege-level-mask|PLM}=]{all|user|kernel} | {threshold|T}=THRESHOLD
PLM determines the privilege level setting for a given metric.
By default, metrics are measured when your application runs in user
space. The privilege levels available are:
user,
kernel,
and
all. THRESHOLD
is an integer value that determines the semantics of the counts
reported by HP Caliper.
When the threshold is zero (the default), the reported
count is the number of occurrences of the event.
When the threshold is not
zero, the reported count is the number of CPU cycles during which the number
of occurrences of the event met or exceeded the threshold.
- --frame-depth=COUNT
Valid only for the
cstack
measurement, this option specifies the maximum number of frames to
unwind while collecting call stack samples. The default depth is 32.
- --metrics=EVENT_SET[:all|:user|:kernel][,EVENT_SET]...
- --metrics=CPU_EVENT[:EVENT_PARAM]...[,CPU_EVENT]...
(Can also be specified with the
-m
option.) Specifies the event set or CPU events to measure.
If no event is specified (or
--metrics=
is specified),
then no metrics will be reported.
(Note: Specifying
--metrics=
is not valid for the
cpu
and
ecount
measurements.) EVENT_SET
is a predefined collection of CPU events you specify
only with the measurement,
cpu.
(See
CPU Metrics EVENT_SET Description
below for more information.) CPU_EVENT
is a CPU event as recorded by the Performance
Monitoring Unit (PMU) of the processor.
You can change the
default CPU events recorded for any of the following
measurements:
alat,
branch,
cycles,
dcache,
dtlb,
ecount,
fprof,
icache,
itlb,
pmu_trace,
traps. You can list CPU events
available (along with descriptions) by using
caliper info
(see
Information Options
below). Defaults specific to
a given measurement can be found in the measurement's
configuration file. EVENT_PARAM
allows you to change the privilege level (default:
user)
and threshold (default: 0) used when counting events.
See
--event-defaults
above for the syntax of
EVENT_PARAM.
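
The CPU_EVENT[:EVENT_PARAM]... syntax and the threshold semantics described under --event-defaults can be sketched in Python as follows. This is only an illustration of the documented grammar, not caliper's own parser: the function names and the event name EXAMPLE_EVENT are hypothetical, and abbreviated spellings of the parameter names are not handled.

```python
def parse_cpu_event(spec, default_plm="user", default_threshold=0):
    """Split 'CPU_EVENT[:EVENT_PARAM]...' into name, privilege level, threshold."""
    name, *params = spec.split(":")
    plm, threshold = default_plm, default_threshold
    for param in params:
        key, sep, value = param.partition("=")
        if not sep:
            # A bare word such as "kernel" is a privilege level (the
            # "privilege-level-mask=" / "PLM=" prefix is optional).
            key, value = "PLM", param
        if key in ("privilege-level-mask", "PLM"):
            if value not in ("all", "user", "kernel"):
                raise ValueError(f"unknown privilege level: {value!r}")
            plm = value
        elif key in ("threshold", "T"):
            threshold = int(value)
        else:
            raise ValueError(f"unknown event parameter: {param!r}")
    return {"event": name, "plm": plm, "threshold": threshold}


def reported_count(occurrences_per_cycle, threshold=0):
    """Model the count reported for one event over a run of CPU cycles.

    Threshold 0: total number of occurrences of the event.
    Threshold N > 0: number of cycles with at least N occurrences.
    """
    if threshold == 0:
        return sum(occurrences_per_cycle)
    return sum(1 for n in occurrences_per_cycle if n >= threshold)


parsed = parse_cpu_event("EXAMPLE_EVENT:kernel:T=2")
print(parsed)
print(reported_count([0, 2, 1, 3]), reported_count([0, 2, 1, 3], threshold=2))
```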
- --module-default=all|none
Specifies the default setting for load module inclusion in the measurement.
- --module-exclude=MODULE[:MODULE]...
Specifies the load modules to be excluded from measurement. Module
names can be given as a simple file name
(libapplib1.so)
which matches libraries of this name in any directory; a full-path file name
(/home/dev/libs/libapplib1.so)
which matches only this one specific library;
or a full-path directory name
(/usr/lib/)
which matches all libraries within
this directory or any lower sub-directories. Note that the trailing
/
is required to distinguish a directory name. For instrumentation-based measurements
(acount,
cgprof,
fcount,
fcover),
the
specified load modules are not instrumented; for all other measurements,
any samples in the specified load modules are simply discarded.
- --module-include=MODULE[:MODULE]...
Specifies the load modules to be included in the measurement.
Module names can be given as a simple file name
(libapplib1.so)
which matches libraries of this name in any directory; a full-path file name
(/home/dev/libs/libapplib1.so)
which matches only this one specific library;
or a full-path directory name
(/usr/lib/)
which matches all libraries within
this directory or any lower sub-directories. Note that the trailing
/
is required to distinguish a directory name.
- --module-search-path=DIRECTORY[:DIRECTORY]...
Specifies a list of directories to be searched when a load module file
(executable or shared library) cannot be found. A load module
may not be found if the measured process uses
chroot(2)
or
chdir(2)
and then loads libraries or executes other binaries using relative paths.
(See also the entry for this option in the section,
Reporting Options.)
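
The three forms of MODULE name accepted by --module-exclude and --module-include (simple file name, full-path file name, and directory name with a trailing /) can be modeled with the short Python sketch below. It is an illustration of the matching rules as worded above, not caliper's implementation; the function name is hypothetical, and treating any name that contains a / but does not end in one as an exact path is an assumption.

```python
import posixpath


def module_matches(pattern, module_path):
    """Return True if a MODULE pattern selects the given load module path."""
    if pattern.endswith("/"):
        # Directory name: matches libraries in this directory or below it.
        return module_path.startswith(pattern)
    if "/" in pattern:
        # Full-path file name: matches only this one specific library.
        return module_path == pattern
    # Simple file name: matches a library of this name in any directory.
    return posixpath.basename(module_path) == pattern


print(module_matches("libapplib1.so", "/home/dev/libs/libapplib1.so"))
print(module_matches("/usr/lib/", "/usr/lib/subdir/libapplib1.so"))
```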
- --process={default|all|root|root-forks|[some:][OPT,...]PATTERN[,PATTERN]...}
(Can also be
specified with the
-p
option). Specifies which processes in an application's process tree should be measured. Use
root
to measure only your application's root process. Use
root-forks
to measure your application's root
process and any processes it forks. Use
all
(the default) to measure
all your application's processes. For more information, see
Process Selection
below.
- --sampling-spec=TIME_PERIOD
- --sampling-spec=EVT_PERIOD[,VARIATION[,CPU_EVENT[:EVENT_PARAM]...]]
(Can also be specified with the
-s
option). Controls the sampling rate and the event that triggers samples. The first form
(--sampling-spec=TIME_PERIOD)
is used only for the
cpu
and
cstack
measurements. The second form is used for all other sample-based measurements
(fprof,
dcache,
icache,
etc.). TIME_PERIOD
is a sampling period in seconds, milliseconds,
or microseconds (specified as
Ns,
Nms,
Nus,
respectively,
where
N
is an integer). For the
cpu
measurement, the default
TIME_PERIOD
is 8ms of CPU time (see also
--cpu-aggregation=COUNT). For the
cstack
measurement, the default
TIME_PERIOD
is 100 milliseconds (wall-clock time).
Note that time is measured
in CPU cycles (for
cpu
measurement) or real time (for
cstack
measurement). EVT_PERIOD
specifies how many sampling events should occur between
samples. VARIATION
specifies how much to vary the number
of events between samples (may be specified as either
an exact count or as a percentage of the sampling rate if
followed by
%). CPU_EVENT
specifies the CPU event to use for sampling. You can
list CPU events available (along with descriptions) by using
caliper info
(see
Information Options
below). EVENT_PARAM
allows you to change the privilege level
(default:
user)
and threshold (default: 0) in effect
for the CPU event counter used to trigger samples.
See
--event-defaults
above for the syntax of
EVENT_PARAM.
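
To make the TIME_PERIOD and VARIATION notations concrete, here is a small Python sketch that converts a TIME_PERIOD string (Ns, Nms, or Nus) to microseconds and resolves a VARIATION given either as an exact count or as a percentage of the sampling period. The function names are hypothetical, and integer-only percentages are an assumption made to keep the example short.

```python
import re

# Microseconds per unit for the three documented TIME_PERIOD suffixes.
_UNIT_TO_US = {"s": 1_000_000, "ms": 1_000, "us": 1}


def time_period_to_us(text):
    """Convert 'Ns', 'Nms', or 'Nus' (N an integer) to microseconds."""
    match = re.fullmatch(r"(\d+)(s|ms|us)", text)
    if match is None:
        raise ValueError(f"not a valid TIME_PERIOD: {text!r}")
    return int(match.group(1)) * _UNIT_TO_US[match.group(2)]


def variation_in_events(evt_period, variation):
    """Resolve VARIATION to an event count.

    A trailing '%' means a percentage of EVT_PERIOD; otherwise the
    value is taken as an exact count.
    """
    if variation.endswith("%"):
        return evt_period * int(variation[:-1]) // 100
    return int(variation)


print(time_period_to_us("8ms"))             # default period for the cpu measurement
print(time_period_to_us("100ms"))           # default period for the cstack measurement
print(variation_in_events(100000, "5%"))
```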
- --scope={process|system|PSET_LIST}[,attr-mod|,attr-proc|,attr-none]
Defines a measurement's scope. PSET_LIST
is defined as:
pset=pset_id[:pset_id]...
Caliper can measure activity on individual processes
(process
scope), on all CPUs in the system
(system
scope) or, on HP-UX systems, on
all CPUs in selected processor sets
(pset
scope). The
system
and
pset
scopes are supported for all PMU-based measurements
(alat,
branch,
cpu,
cycles,
dcache,
dtlb,
ecount,
fprof,
icache,
itlb,
pmu_trace,
traps).
The default scope is
process. With
system
and
pset
scopes, the qualifiers
attr-mod,
attr-proc,
and
attr-none
can be specified. For measurements involving PMU samples,
this determines how such samples are attributed.
attr-mod
is the default, and tells
caliper
to attribute samples to individual processes and their load modules,
whenever possible. attr-proc
causes attribution simply to processes
alone;
attr-none
specifies no process attribution at all. In all three qualifier cases, samples recorded in the kernel will be attributed
to kernel modules, if possible. The
-w
option is a shortcut for
--scope=system,attr-mod. When the scope is
system,
the command-line arguments program and
program_args
should not be provided. For more information, see
System-Wide Measurements
below.
- --thread=sum-all|all
- -t=sum-all|all
Specifies how thread data should be collected and reported. Specify
all
to collect and report data per thread.
(The
-t
option is a shortcut for
--thread=all).
Specify
sum-all
to collect and
report data summed across threads.
Default:
sum-all. Note that this option
is currently only supported for the following measurements:
alat,
branch,
cstack,
cycles,
dcache,
dtlb,
fprof,
icache,
itlb,
traps.

Reporting Options
The command line options which affect reporting are:
- --callpath-cutoff=PERC_CUTOFF[,CUM_PERC_CUTOFF[,MIN_COUNT]]
Valid only for the
cstack
measurement, this option specifies
a cutoff value that limits hot call paths reported in
Hot Call Paths sections. Reporting of call paths stops when, for the given
sort metric, a call path is encountered whose associated metric percentage
is below
PERC_CUTOFF
(default 1.0) or when the
CUM_PERC_CUTOFF
has been
met or exceeded (default 100.0). The
MIN_COUNT
argument sets the
minimum number of call paths to be displayed (default: 5) regardless
of the settings for
PERC_CUTOFF
and
CUM_PERC_CUTOFF.
- --context-lines=COUNT_SOURCE[,COUNT_DISASSEMBLY]
Specifies that function details should show at least
COUNT_SOURCE
source lines (default: 5 for source-only reports or 0 for reports with
disassembly code) before and after reporting a source line entry
with associated performance data. Set
COUNT_SOURCE
to
all
to report all source lines for reported functions. As with
COUNT_SOURCE,
set
COUNT_DISASSEMBLY
to show context disassembly lines (default: 3). Applies to PMU histogram
reports only.
- --csv-file=CSV_FILE[,append|,create][,per-process|,shared][,unique]
Specifies a file in which to write a
caliper
performance report in
Comma-Separated-Values format (CSV) for use in a spreadsheet or for
further processing. The file can be opened in
append
or
create
mode (default:
create).
Multi-process reports can be generated
per-process
(exec name is appended to each file) or to a
single,
shared
file (default:
shared).
Specify
unique
to have
the process ID appended to the report file name.
- --description-details={all|none|[target][:processor][:run][:times][:sampling][:images][:help]}
Controls which subsections are included in the description section
at the top of each report. The default:
target:processor:times:sampling. Specify
all
to include all subsections,
none
to exclude all
subsections, or specify a list of the subsections you want included.
The list can include one or more of the following subsection
identifiers (shown with their associated subsection title):
- target
Target Application
- processor
Processor Information
- run
Run Information
- times
Target Execution Time
- sampling
Sampling Specification
- images
Load Modules Included
- help
Report Help
- --detail-cutoff=PERC_CUTOFF[,CUM_PERC_CUTOFF[,MIN_COUNT]]
Specifies a cutoff value that limits functions reported in function details
sections. Reporting of functions stops when, for the given sort metric,
a function is encountered whose associated metric percentage is below
PERC_CUTOFF
(default 1.0) or when the
CUM_PERC_CUTOFF
has been met or
exceeded (default 100.0). The
MIN_COUNT
argument sets the minimum number
of functions to be displayed (default: 0) regardless of the settings for
PERC_CUTOFF
and
CUM_PERC_CUTOFF. Applies to PMU histogram reports only.
- --group-by=executable|module|none
Specifies how data for
matching
processes and modules (those with matching
basenames) is combined in reports. Specify
executable
to have data for matching processes combined whenever possible. This is the
default. Specify
module
to ignore individual processes, and create a "module-centric" report
for matching modules across all measured processes. Specify
none
to combine no data across any processes.
- --html=[OUTPUT_DIR[,STYLE[,ENTRIES_PER_PAGE]]]
Writes performance data as an HTML-formatted report in directory
OUTPUT_DIR
(default:
./Caliper_HTML). STYLE
specifies the color theme of the report.
STYLE
can be set to
black,
gold,
or
white
(default:
white). ENTRIES_PER_PAGE
specifies the number of entries on each web page (default: 20).
The
--html
option is supported for the following reports:
alat,
branch,
cgprof,
cycles,
dcache,
dtlb,
fprof,
icache,
itlb,
scgprof,
and
traps.
- --info
Causes HP Caliper to append help information to the end of textual reports.
- --kernel-path=PATH
Specifies the path to the kernel file you want HP Caliper to use for symbol
lookup and disassembly. This option only applies when kernel profiling
is involved, typically when the sampling specification for the measurement
has a privilege level mask of
kernel
or
all.
- --latency-buckets=TRUE|FALSE
Specifies whether or not the latency bucket information should appear in
dcache reports
(default:
TRUE). This option is used only with the
dcache
measurement.
- --module-search-path=DIRECTORY[:DIRECTORY]...
Specifies a list of directories to be searched when a load module file
(executable or shared library) cannot be found. A load module file
may not be found if the load module is not available from the location
recorded at data collection time.
- --output-file=OUTFILE[,append|,create][,per-process|,shared][,unique]
(Can also be specified with the
-o
option.) Specifies the file
caliper
writes its report to (default is stdout).
The file can be opened in
append
or
create
mode
(default:
create). Multi-process reports can be generated
per-process
(exec name is appended to each file) or to a
single,
shared
file (default:
shared). Specify
unique
to have
the process ID appended to the report file name.
- --per-module-data=TRUE|FALSE
Specifies whether or not
caliper
should report functions in one list
(default:
--per-module-data=FALSE),
or report functions grouped by
load module
(--per-module-data=TRUE). The following measurements support
--per-module-data=TRUE:
alat,
branch,
cycles,
dcache,
dtlb,
fprof,
icache,
itlb,
and
traps.
- --percent-columns={total|cumulative|total:cumulative}
Specifies what types of percentages are shown in reports. Specify
total
to have a column showing percentage of samples with respect to
total number of samples taken. Specify
cumulative
to have a column showing
cumulative percentage of samples. Applies to PMU histogram reports only.
- --process-cutoff=PERC_CUTOFF[,CUM_PERC_CUTOFF[,MIN_COUNT]]
Specifies a cutoff value that limits processes reported in process summary
section. Reporting of processes stops when, for the given sort metric,
a process is encountered whose associated metric percentage is below
PERC_CUTOFF
(default 2.0) or when the
CUM_PERC_CUTOFF
has been met or exceeded (default 100.0). The
MIN_COUNT
argument sets the minimum number
of processes to be displayed (default: 5) regardless of the settings for
PERC_CUTOFF
and
CUM_PERC_CUTOFF. Applies to PMU histogram reports only.
- --read-init-file=TRUE|FALSE
Specifies whether or not
caliper
should look for and read
.caliperinit,
the
caliper
initialization file (default:
TRUE). Caliper looks for a
.caliperinit
file in the current directory; if not found,
caliper
then looks in your home directory. Settings in an initialization file take
precedence over measurement configuration file settings.
Command-line options take precedence over settings in an initialization file.
- --report-details={all|none|statement|instruction|statement:instruction}
(Can also be specified with the
-r
option).
For PMU histogram reports only. Specifies level of program detail reported
(default:
statement). Specify
statement
to have data aggregated by source statement. Specify
instruction
to obtain reports at the lowest level of granularity
available. Specify
none
to disable detail reports entirely.
Specify
all
to print both source and instruction level details.
- --report-details=[module][:directory][:file][:function][:unknown]
(Can also be
specified with the
-r
option).
For function coverage reports only. Specifies which coverage reports are
produced. Default:
module:directory:file:function:unknown. Specify
module
for the load module summary report,
directory
for the source directory summary report,
file
for the source file summary report,
and/or
function
for the function detail report. Additionally specify
unknown
to include functions from unknown source files in the summary and
detail coverage reports.
- --skip-functions=FUNC[,FUNC]...
Valid only for
cstack;
specifies functions that are of no interest.
Call stacks are not reported if their leaf routine
is one of the specified functions.
- --sort-by=METRIC
Specifies that performance data is to be sorted by values of given metric.
(Default: See
Metrics for Sorts/Cutoffs
below.)
- --source-path-map=PATHMAP[:PATHMAP]...
Specifies the
PATHMAP
used in finding source files (used for reporting source statements). PATHMAP
entries are separated by a colon
(:)
and applied in order until a file match is
found. Simple entries are prepended to file names; comma-separated entries
specify to substitute the path to the left of the comma with the path
to the right of the comma.
Perl regular expressions are allowed in the left
half of a substitution. Applies to PMU histogram reports only.
- --summary-cutoff=PERC_CUTOFF[,CUM_PERC_CUTOFF[,MIN_COUNT]]
Specifies a cutoff value that limits functions reported in function summary
sections. Reporting of functions stops when, for the given sort metric,
a function is encountered whose associated metric percentage is below
PERC_CUTOFF
(default 0.1) or when the
CUM_PERC_CUTOFF
has been met or exceeded (default 100.0). The
MIN_COUNT
argument sets the minimum number
of functions to be displayed (default: 5) regardless of the settings for
PERC_CUTOFF
and
CUM_PERC_CUTOFF. Applies to PMU histogram reports only.
- --system-model=MODEL_NUMBER
Specifies the system model number for reporting latency buckets with the
dcache
measurement. This option is necessary if you want HP Caliper to report
system-specific latency buckets on Linux.
If you do not use this option,
a default set of latency buckets will be used. On HP-UX, HP Caliper automatically obtains the model number using the
model
command (see
model(1)).
- --traps-reported=TRAP_NAME[,TRAP_NAME]...
Specifies which traps, faults, and interrupts
(collectively referred to as
traps)
caliper
should report. The traps measurement collects samples for 34 different
traps, but only 6 trap types can be shown in a single report.
You can re-report from the same
HP Caliper database to see different sets of traps. TRAP_NAME
is an abbreviation for a trap name.
The default
TRAP_NAME
values and their associated traps are:
- ITLB
Instruction TLB fault
- DTLB
Data TLB fault
- UADREF
Unaligned Data Reference fault
- GEXCP
General Exception
- FPFLT
Floating-Point Fault
- FPTRP
Floating-Point Trap
For a list of all available values for
TRAP_NAME,
run the
caliper
command: caliper info -r traps
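The stopping rule described for --summary-cutoff earlier in this list can be sketched in Python as follows. This is only an illustration of the documented rule, not HP Caliper's implementation; the function name apply_summary_cutoff and the sample percentages are hypothetical, and the sketch assumes that MIN_COUNT takes precedence over both cutoffs.

```python
def apply_summary_cutoff(percentages, perc_cutoff=0.1,
                         cum_perc_cutoff=100.0, min_count=5):
    """Return the metric percentages that would be listed in a summary.

    Hypothetical helper: models PERC_CUTOFF, CUM_PERC_CUTOFF and MIN_COUNT
    with the documented defaults (0.1, 100.0 and 5).
    """
    shown = []
    cumulative = 0.0
    # Functions are considered in descending order of the sort metric.
    for pct in sorted(percentages, reverse=True):
        below_cutoff = pct < perc_cutoff
        cumulative_met = cumulative >= cum_perc_cutoff
        # Assumption: MIN_COUNT rows are always shown, whatever the cutoffs say.
        if len(shown) >= min_count and (below_cutoff or cumulative_met):
            break
        shown.append(pct)
        cumulative += pct
    return shown


if __name__ == "__main__":
    sample = [50.0, 30.0, 10.0, 5.0, 3.0, 1.0, 0.5, 0.05, 0.01]
    print(apply_summary_cutoff(sample))
    print(apply_summary_cutoff(sample, perc_cutoff=20.0, min_count=1))
```

With the defaults, the list is cut only once at least five functions have been shown and a function falls below 0.1 percent.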
Advise Options
The command line options which affect advise-only runs are:
- --advice-classes=[all]|[[general][:cpu][:memory][:io][:system]]
Specifies which classes of advice to report (default:
all).
Every piece of
advice is classified by which performance area it applies to. Specify one
or more of
general
for general advice,
cpu
for advice related to basic
instruction execution,
memory
for memory-related information,
io
for input-output performance issues, and/or
system
for advice about items such
as system calls. Only advice in the selected classes will be printed. - --advice-cutoff=MIN_INDEX[,MIN_COUNT[,MAX_COUNT]]
Specifies how much advice to report. Each piece of advice has an index value
reflecting its relative importance. Advice is sorted with the most important items first and the list
stops when the index value of an item is below
MIN_INDEX
(default: 5.0) or
MAX_COUNT
(default: 15) items have been
printed. The
MIN_COUNT
(default: 5) argument sets the minimum amount of
advice to report regardless of the setting for
MIN_INDEX.
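As a rough illustration of how MIN_INDEX, MIN_COUNT, and MAX_COUNT interact, the following Python sketch selects advice items from a list of (index, text) pairs. The function name select_advice and the sample data are hypothetical, and the sketch assumes that MAX_COUNT is a hard upper limit even when it is smaller than MIN_COUNT; the actual behavior of caliper advise in that corner case is not documented here.

```python
def select_advice(items, min_index=5.0, min_count=5, max_count=15):
    """Return the (index, text) pairs that would be reported.

    Hypothetical helper using the documented defaults:
    MIN_INDEX=5.0, MIN_COUNT=5, MAX_COUNT=15.
    """
    # Most important advice (highest index value) comes first.
    ranked = sorted(items, key=lambda item: item[0], reverse=True)
    selected = []
    for index, text in ranked:
        # Assumption: MAX_COUNT is a hard cap on the report length.
        if len(selected) >= max_count:
            break
        # Below MIN_INDEX the list stops, unless MIN_COUNT is not yet reached.
        if index < min_index and len(selected) >= min_count:
            break
        selected.append((index, text))
    return selected


if __name__ == "__main__":
    advice = [(9.0, "a"), (7.0, "b"), (6.0, "c"), (4.0, "d"),
              (3.0, "e"), (2.0, "f"), (1.0, "g")]
    for index, text in select_advice(advice):
        print(index, text)
```

In this sample only three items reach the default MIN_INDEX of 5.0, so two lower-ranked items are added to satisfy MIN_COUNT.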
- --advice-details=[all]|[[description][:improvement][:measurement][:explanation][:rule]]
Specifies how much detail to report for each piece of advice (default:
all). Each piece of advice may contain a brief description of what it is focused on
(description),
a suggestion for improving performance in this area
(improvement),
additional performance measurements which can be made to
further explore this area
(measurement),
a more detailed explanation of what
this performance area is
(explanation),
and the name of the rule generating
this advice
(rule). Most rules will not have all of these components. One
or more of these can be given to limit what gets reported. - --analysis-focus={[executable:]all|[executable:]NAME[,[executable:]NAME]...}
Specifies which executable program(s) to report on. By default, all
(all)
executables which have performance data in the given database(s) will be
analyzed and a separate report produced for each. To analyze only selected
executables, list them by giving only their simple filename (no path
information).
- --rule-files=RULEFILE1[,RULEFILE2]...
Specifies a list of files which contain the analysis rules to use (default:
default).
Only the rule files listed (and any rule files that they include)
will be used. Each
RULEFILE
can be either a relative path, an absolute path,
or a simple filename. Simple file names are first searched for
in the current directory, then (if not found) in the user's personal
rules directory
(~/caliper_advisor),
and finally (if still not found) in the directory
of the running
caliper
(caliper_root/rules).
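The search order for simple rule file names given with --rule-files can be sketched as follows. This is a minimal Python illustration, not caliper's own code; resolve_rule_file and its parameters are hypothetical names, and the handling of relative and absolute paths (used as given, no search) is an assumption.

```python
import os


def resolve_rule_file(rulefile, caliper_root, cwd=None, home=None):
    """Return the path of the rule file that would be used, or None.

    Hypothetical helper: simple names are searched in the current directory,
    then ~/caliper_advisor, then caliper_root/rules.
    """
    cwd = cwd or os.getcwd()
    home = home or os.path.expanduser("~")

    # Assumption: a name with a directory part is used as given, unsearched.
    if os.path.dirname(rulefile):
        path = rulefile if os.path.isabs(rulefile) else os.path.join(cwd, rulefile)
        return path if os.path.isfile(path) else None

    search_dirs = (
        cwd,
        os.path.join(home, "caliper_advisor"),
        os.path.join(caliper_root, "rules"),
    )
    for directory in search_dirs:
        candidate = os.path.join(directory, rulefile)
        if os.path.isfile(candidate):
            return candidate
    return None


if __name__ == "__main__":
    # /opt/caliper is the documented default caliper_root on HP-UX.
    print(resolve_rule_file("default", "/opt/caliper"))
```

A copy of a rule file in the current directory therefore shadows a file of the same name in the personal or shared rules directories.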
GUI Options
The command line options which affect the graphical user
interface are:
- --gui
(Can also be specified with the
-g
option.) Start the caliper GUI to interactively measure and explore
performance data. - --jre
Specify the location of the Java Runtime Environment (JRE) that
will run the GUI.
If
--jre
is not specified,
caliper
will first check if the environment variable
JAVA_HOME
is set to a JRE;
if not,
caliper
will attempt to find a JRE via the
PATH
environment variable.
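The JRE lookup order described for --jre can be sketched as follows. This Python fragment is illustrative only; find_java is a hypothetical name, and testing for a bin/java file under JAVA_HOME is an assumption about what "set to a JRE" means.

```python
import os
import shutil


def find_java(jre_option=None, env=None):
    """Return (source, location) for the JRE that would run the GUI.

    Hypothetical helper: tries --jre, then JAVA_HOME, then PATH.
    """
    env = os.environ if env is None else env

    # 1. An explicit --jre path wins.
    if jre_option:
        return ("--jre", jre_option)

    # 2. JAVA_HOME, if it appears to point at a JRE (assumed check: bin/java).
    java_home = env.get("JAVA_HOME")
    if java_home and os.path.isfile(os.path.join(java_home, "bin", "java")):
        return ("JAVA_HOME", java_home)

    # 3. Otherwise look for a java executable on PATH.
    java = shutil.which("java", path=env.get("PATH", ""))
    if java:
        return ("PATH", java)
    return (None, None)


if __name__ == "__main__":
    print(find_java())
```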
Information Options
The command line options which affect information-only runs are:
- --cpu-counter=PARTIAL_COUNTER_NAME|KEYWORD|all
(Can also be specified with the
-c
option.) Specifies that information about the cpu counters which match the given
(partial) name or keyword be output. The information fields to search are
given with the
--search
option. The information fields to output are given with the
--report
option.
Use
all
to report on all cpu counters. The
--cpu-counter
and
--report
options are mutually exclusive. If neither is given, then
--cpu-counter
is assumed. - --detail={all|[name][:abbreviation][:category][:title][:description]}
(Can also be specified with the
-d
option.) Specifies which information fields to include in cpu counter reports. It can
be any combination of
name,
abbreviation,
category,
title,
or
description;
separated by colons or
all. Default:
name:abbreviation:title. - --output-file=OUT_FILE[,append|,create]
(Can also be specified with the
-o
option.) Specifies the file in which to write the
caliper
information report (default is stdout). The file can be opened in
append
or
create
mode
(default:
create). - --report=all|REPORT_TYPE
(Can also be specified with the
-r
option.) Specifies that information about the report types which match the given
(partial)
REPORT_TYPE
name be output.
This is the same information that is
included in measurement reports when the
--info
option is used. Report types are
alat,
branch,
dcache,
dtlb,
ecount,
fprof,
icache,
itlb,
pmu_trace,
or
scgprof. Use
all
to report on all report types. The
--cpu-counter
and
--report
options are mutually exclusive. - --search={all|[name][:abbreviation][:category][:title][:description]}
(Can also be specified with the
-s
option.) Specifies which information fields are to be searched for cpu counter reports.
It can be any combination of
name,
abbreviation,
category,
title,
or
description;
separated by colons or
all. Default:
name:abbreviation.
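The colon-separated field lists accepted by --search and --detail can be modeled with a small parser. The Python sketch below is an assumption-laden illustration: parse_field_list and counter_matches are hypothetical names, the counter record is made-up sample data built around the CPU_CYCLES event name, and case-sensitive substring matching is assumed rather than documented.

```python
INFO_FIELDS = ("name", "abbreviation", "category", "title", "description")


def parse_field_list(value, default):
    """Expand a --search or --detail argument into a list of field names."""
    if not value:
        value = default
    if value == "all":
        return list(INFO_FIELDS)
    fields = value.split(":")
    unknown = [field for field in fields if field not in INFO_FIELDS]
    if unknown:
        raise ValueError("unknown field(s): " + ", ".join(unknown))
    return fields


def counter_matches(counter, text, search_fields):
    """True if text occurs in any searched field of a counter record.

    Assumption: plain, case-sensitive substring matching.
    """
    return any(text in counter.get(field, "") for field in search_fields)


if __name__ == "__main__":
    # Hypothetical record; only the name CPU_CYCLES comes from this manpage.
    counter = {"name": "CPU_CYCLES", "title": "CPU cycles"}
    search = parse_field_list(None, "name:abbreviation")
    detail = parse_field_list(None, "name:abbreviation:title")
    if counter_matches(counter, "CYC", search):
        print({field: counter.get(field, "") for field in detail})
```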
Help and Version Options
The following general options can be used to get syntax help or print the
caliper
version:
- -h or -?
Prints the quick help text.
When used, must be used alone on the
caliper
command line. - --help or -H
Prints the long help text.
When used, must be used alone on the
caliper
command line. - --version or -v
Prints the
caliper
version identification.
When used, must be used alone on the
caliper
command line.
Specifying Settings with an Initialization File
You can save settings in a file, named
.caliperinit,
that HP Caliper
automatically uses at start-up.
Putting the options in an initialization
file simplifies the command line you use to launch HP Caliper. For example, you can specify
global settings for all of your reports, such as system
libraries to exclude and output file locations.
With your preferences in the
initialization file, you can then simply type:
The resulting report would use your predefined preferences.
Using this approach you could, for example, change your preferences
without having to change the HP Caliper command line in a
Makefile. Note:
Any option specified on the command line overrides
the corresponding
setting in an initialization file. There are a number of reporting options not available from the
command line that you can set in an option file.
These are:
- disasm_mark_branch_targets=TRUE|FALSE
Determines if targets of branch instructions are preceded by a colon
(:)
in disassembly. Default: False - disasm_target_name_limit=LIMIT
Specifies the maximum number of characters to print for branch target
symbols in disassembly. Default: 30 - use_parens_for_statement_data=TRUE|FALSE
If
TRUE,
statement-level data in reports is placed in parentheses. Default: True - suppress_statement_data=TRUE|FALSE
If
TRUE,
no statement-level data will be reported. Default: False - suppress_init_warnings=TRUE|FALSE
If
TRUE,
no warnings will be issued if unrecognized variables are detected
in the initialization file or in measurement configuration files. Default: False
The initialization file can be in the current working
directory, your home directory, or in both locations.
When you start HP Caliper, it searches for
an initialization file in this order: - 1.
.caliperinit
in the current working directory - 2.
.caliperinit
in your home directory
When
caliper
finds one, it executes based on that initialization file. An initialization file is a Python script, similar to a Caliper measurement
configuration file.
Here is a sample
.caliperinit
file:

    application="ls"
    if caliper_config_file == 'branch':
        process='all'
    elif caliper_config_file == 'my_count':
        application="/opt/mpi/bin/mpirun"
        arguments = "-np 2 /proj/dynopt/test_fe/mpi_hello_world"
    elif caliper_config_file == 'dcache':
        application="/opt/mpi/bin/mpirun"
        arguments = "-np 2 /proj/dynopt/test_fe/mpi_hello_world"
    elif caliper_config_file == 'itlb':
        application="/opt/mpi/bin/mpirun"
        arguments = "-np 2 /proj/dynopt/test_fe/mpi_hello_world"
        module_exclude="/usr/bin/sh"

The syntax inside the initialization file is the same as in measurement
configuration files.
In particular, most long options that can be specified on the
command line can also be specified in the initialization file by replacing each
dash
(-)
in the long option name with an underscore
(_)
to form the variable name (for example, --user-regions becomes user_regions).
Process Selection
When dealing with multi-process applications, it is important to be able to
select processes to be measured in a process tree.
This section explains how to
do this selection by using the
--process
option. Caliper has a choice between three behaviors when considering what to do
with a process:
- measure
The process is measured.
Caliper is informed of new processes generated
via
fork,
vfork,
or
exec. - track
The process is not measured.
Caliper is informed of new processes
generated via
fork,
vfork,
or
exec. - ignore
The process is not measured.
Caliper is not informed of new processes
generated via
fork,
vfork,
or
exec.
Caliper chooses among these behaviors according to the
--process
option. See
PLATFORM-SPECIFIC ADDENDA
below for additional information on
process selection. This section uses the term
root
process.
The
root
process is the process at the root of the process tree.
It is
either the process started by Caliper or the process to which Caliper attaches. The simple options are:
- --process=root
Only the root process is measured. - --process=root-forks
Only the root process and processes forked from the root process
are measured. - --process=all
Every process in the process tree is measured. This is the default for all
measurements. - --process=default
This option lets you explicitly request the default
behavior. This is equivalent to specifying
all
for all measurements.
The complex options are:
- --process=[some:][OPT1,...]PATTERN
Each
--process=some:...
argument provided to Caliper is interpreted as an additional
filter.
Those filters are applied in order to each new process.
If a filter matches the process, the behavior
(measure,
track,
ignore)
associated with the process is memorized.
Caliper will use the behavior
of the last matching filter to determine what to do with the process. When no
OPT1,...
component is provided, the default interpretation is
as follows: the
PATTERN
component is interpreted as a list of
glob
patterns separated by colons
(:);
if the basename of the executable
matches any of those patterns, the process is measured, otherwise
it is tracked. The presence of keywords in
OPT1,...
modifies those semantics as follows:
- measure
Processes matching this filter will be measured. - track
Processes matching this filter will be tracked. - ignore
Processes matching this filter will be ignored. - glob
The
PATTERN
is interpreted as a list of colon-separated
glob
patterns. - regexp
The
PATTERN
is interpreted as a Python/Perl regular expression that
is tested using the
search()
function (i.e., any non-empty match will
be considered a positive match). - file
The string used against the
PATTERN
is the basename of the main executable of the process. - arg0|argv0
The string used against the
PATTERN
is argument 0 of the process. - arg1|argv1
The string used against the
PATTERN
is argument 1 of the process. - root
The filter only matches the root process. - fork
The filter only matches processes created via
fork(). - exec
The filter only matches processes created by
exec().
For keyword families
measure|track|ignore,
glob|regexp
and
file|arg0|arg1,
only the last keyword used in each family is
considered.
For the keyword family
root|fork|exec,
multiple keywords will be considered as specifying a logical OR operation
between the keywords. The prefix
some:
is only necessary when no option is provided and the
PATTERN
could be mistaken for one of the simple options
(root,
root-forks,
all,
default). - --process=custom:FUNCTION
Allows you to specify a Python function to be used as a filter for
processes.
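The filter semantics described in this section (filters applied in order, the last matching filter deciding the behavior) can be sketched in Python. This is an illustrative model only, not caliper's implementation and not the interface expected by --process=custom:FUNCTION. The names ProcessFilter and choose_behavior are hypothetical, a process is represented as a plain dictionary, the root|fork|exec keywords are left out, and the fallback of "track" for processes that match no filter is an assumption taken from the no-keyword case.

```python
import fnmatch
import re
from dataclasses import dataclass


@dataclass
class ProcessFilter:
    pattern: str
    behavior: str = "measure"   # measure | track | ignore
    syntax: str = "glob"        # glob | regexp
    subject: str = "file"       # file | arg0 | arg1

    def matches(self, process):
        text = process.get(self.subject, "")
        if self.syntax == "regexp":
            # search() semantics: any non-empty match counts as positive.
            found = re.search(self.pattern, text)
            return found is not None and found.group(0) != ""
        # glob: PATTERN is a colon-separated list of glob patterns.
        return any(fnmatch.fnmatchcase(text, glob)
                   for glob in self.pattern.split(":"))


def choose_behavior(filters, process, fallback="track"):
    """Apply every filter in order; the last matching filter wins."""
    behavior = fallback
    for process_filter in filters:
        if process_filter.matches(process):
            behavior = process_filter.behavior
    return behavior


if __name__ == "__main__":
    ls = {"file": "ls", "arg0": "/usr/bin/ls", "arg1": "/var/tmp"}
    sh = {"file": "sh", "arg0": "sh", "arg1": "-c"}
    by_name = [ProcessFilter("ls:echo")]
    by_arg = [ProcessFilter("tmp$", syntax="regexp", subject="arg1")]
    print(choose_behavior(by_name, ls), choose_behavior(by_name, sh))
    print(choose_behavior(by_arg, ls), choose_behavior(by_arg, sh))
```

The two filter lists mirror the --process=ls:echo and arg1/regexp examples in the EXAMPLES section: ls is measured in both cases, while the sh process is only tracked.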
System-Wide Measurements
Only PMU-based measurements are available in system-wide mode
(across all CPUs in the system, instead of selected processes). Measurements involving dynamic instrumentation (see
Measurement Categories)
and
cstack
are not supported in system-wide mode.
The measurement can occur at any privilege level; the default privilege
level for system-wide mode is
all:
user and kernel space. By default, samples are attributed to both a process and a load module,
whenever possible.
Alternatively, you can specify (via
--scope)
that samples be attributed to processes only or neither processes
nor load modules.
Both alternatives reduce the overhead of collecting
and reporting performance data. (Note that attribution settings given with
--scope
do not affect the attribution of samples in the kernel.) In
pmu_trace,
addresses referring to user-space modules will not get
resolved regardless of the sample attribution requested. HP Caliper cannot locate an executable or a shared library on HP-UX if it is
invoked using a relative path. In addition, at certain times, executables and
shared libraries cannot be located even if they are specified with complete
paths. If this problem occurs, the result can be a large number of samples
reported as "unattributed". The workaround is to use the
--module-search-path
option to specify a list of directories where the executables and shared
libraries are located. Usage model:
- Replace the executable invocation with
--scope=system
(or
-w).
- Define a measurement duration
(--duration=seconds or -eseconds)
or measure until SIGINT (Ctrl-C) is received.
On HP-UX, the
--scope pset=pset_id[:pset_id]...
option can be used to measure activity on all CPUs belonging to the specified
processor sets.
For example,
measures activity on all CPUs in processor sets 1 and 2.
You can use the
psrset
command (see
psrset(1))
with the
-i
option to find the processor assignments for all processor sets in the system.
Metrics for Sorts/Cutoffs
The following report types support the use of the following
metrics
for sorting and applying cutoffs, where the default
metric
for sorting is shown enclosed in [ ]:
- alat
[sampled-misses] - branch
target,
branch-ways,
[mispredict],
back-end-only-mispredict - cstack
[samples],
samples-running
(HP-UX only),
samples-blocked
(HP-UX only) - cycles
[samples] - dcache
sampled-misses,
[latency],
avg-latency - dtlb
[sampled-misses],
l2-fills,
hpw-fills,
soft-fills - fprof
[samples] - icache
sampled-misses,
[latency],
avg-latency - itlb
[sampled-misses],
hpw-fills,
soft-fills - scgprof
[samples],
call-count,
msecs-per-call - traps
[samples] By default, traps are always sorted on the first trap
specified with
--traps-reported
(or ITLB if
--traps-reported
is not used).
Specifying
--sort-by=samples
sorts based on values
in the "Trap Samples" column.
Cutoff settings
are based on the same metric as the sort, by default.
Use
--summary-cutoff
or
--detail-cutoff
to
override the default behavior.
EXTERNAL INFLUENCES
Environment Variables
caliper
recognizes the following environment variables:
- CALIPER_DATABASES
Specifies the databases directory where implicit
caliper
databases (those not specified with a
--database
option) are stored. The
default databases directory is
./.hp_caliper_databases. - CALIPER_OPTS
Specifies a set of
caliper
options which are used for every measurement run. The contents of
CALIPER_OPTS
are prepended to the command line before it is processed.
It is possible to specify all
caliper
arguments and options via
CALIPER_OPTS.
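The effect of these two environment variables can be sketched as follows. This Python fragment is illustrative only; effective_command_line and databases_directory are hypothetical names, and splitting CALIPER_OPTS with shell-style quoting rules is an assumption, since the manpage only states that its contents are prepended to the command line.

```python
import os
import shlex

# Documented default location for implicit databases.
DEFAULT_DATABASES_DIR = "./.hp_caliper_databases"


def effective_command_line(argv, env=None):
    """Return the argument list after CALIPER_OPTS has been prepended."""
    env = os.environ if env is None else env
    # Assumption: the variable is split like a shell command line.
    extra = shlex.split(env.get("CALIPER_OPTS", ""))
    return extra + list(argv)


def databases_directory(env=None):
    """Return the directory used for databases not named with --database."""
    env = os.environ if env is None else env
    return env.get("CALIPER_DATABASES", DEFAULT_DATABASES_DIR)


if __name__ == "__main__":
    env = {"CALIPER_OPTS": "--module-exclude=/usr/lib/"}
    print(effective_command_line(["fprof", "program"], env))
    print(databases_directory(env))
```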
PLATFORM-SPECIFIC ADDENDA
HP-UX
There are some situations where
caliper
cannot insert probe code in a program or portions
of a program due to non-standard or unusual conditions, such as when an
assembly routine violates standard runtime conventions.
In such situations,
caliper
will issue warning messages and proceed to measure as much of the
program as it can. The following additional measurements are supplied with
caliper
on HP-UX:
- acount
Measures basic block arc counts. - cgprof
Measures call graph profile. (This is an enhanced version of the
gprof
command.) - cpu
Measures per-process metrics based on sampled CPU events. - fcount
Measures function counts. - fcover
Measures function coverage.
The following additional options, option arguments, or option features are
available on HP-UX:
- --bus-speed=MHZ
Specifies the bus speed in MHz for the
sysbus
event set.
If you specify the sysbus event set, you must use the
--bus-speed
option to provide bus speed. - --inlines|--noinlines
The
--inlines
option specifies that
caliper
should record and report inline functions, if data is available in the binary
to discover such inlines. - --cpu-aggregation=COUNT
This option is valid only when using the cpu measurement. Specifies how many
samples (sampling period specified using
--sampling-spec=TIME_PERIOD
option) will be aggregated into one aggregated sample. If
COUNT
is 0 or 1, the samples will not be aggregated.
By default 125 low-level samples
will be aggregated into one user-reported sample. - --cpu-details=[statistics|means][:samples]
This option is valid only when using the cpu measurement.
Specifies whether to print all 7 statistical metrics
(statistics)
or just the mean and standard deviation
(means).
samples
controls the printing of each sample in addition to reporting the
summary statistics. The default is
means
(report only the mean and standard deviation values). - --duration
This option is not supported for
cgprof,
fcount,
fcover,
and
acount
runs. - --exclude-caliper=TRUE|FALSE
The option is only valid when
-w
(--scope=system)
is specified.
This option is used to exclude/include the Caliper process activity
as part of the measurement. The default is
TRUE
(exclude). Note that this option is not available on Linux and the behavior
there is equivalent to a setting of
FALSE
(include Caliper). - --exclude-idle=TRUE|FALSE
The option is only valid when
-w
(--scope=system)
is specified.
This option is used to exclude/include the idle task as part of the measurement. The default is
TRUE
(exclude). Note that this option is not available on Linux and the behavior
there is equivalent to a setting of
FALSE
(include the idle task). - --html
This option is also supported for
cgprof
reports. - --kernel-path
The default kernel path used is:
/stand/current/vmunix
(HP-UX 11i v2 and
later). - --measure-on-interrupts=on|off|only
This option controls whether the Itanium PMU is enabled while
processing interrupts/traps.
- on
means that the PMU is enabled all
the time (during regular processing as well as interrupt processing). - off
means that the PMU is only enabled during regular processing
and disabled during interrupt processing. - only
means that the PMU
is enabled during interrupt processing and disabled during regular
processing.
The default is
on
for
--scope=system
and
cpu measurements. The default is
off
for the following
measurements when the measurement scope is
process
(--scope=process):
alat,
branch,
cycles,
dcache,
dtlb,
ecount,
fprof,
icache,
itlb,
pmu_trace,
scgprof,
traps. Note that this
option is not available on Linux and the behavior there is equivalent
to a setting of
on
(PMU is enabled during regular processing
as well as interrupt processing). - --memory-usage={all|[begin][:timed][:end][:PERIOD[s|m|h]]}
Controls the collection and reporting of memory usage data. Current memory
use can be measured at any or all
(all)
of:
the beginning
(begin)
of the run, periodically
(timed)
throughout the run, or at the end
(end)
of the run.
When making timed measurements, current memory use is sampled every
PERIOD
number of seconds (default if no qualifier is given), minutes, or hours. Only samples which show a difference in memory utilization from the previous
sample are saved and reported to reduce the volume of data.
No memory usage data is collected or reported unless the option is used. This measurement can be made in conjunction with any Caliper measurement.
This option is only available on HP-UX 11i v2 or later and with
--scope=process
measurement runs. - --pbo-data-type=arc-stride|dcache
Controls the type of data collected when you use the HP compilers'
+Oprofile=collect
option. Choose
arc-stride
to collect arc counts, stride data, or both (depending on which have been
enabled at compile time). Choose
dcache
to collect data cache miss data. Default:
arc-stride. Alternatively, you can choose the type of data collected
by assigning
arc-stride
or
dcache
to the environment variable,
PBO_DATA_TYPE. - --report
This option also supports these report types:
acount,
cgprof,
fcount,
and
fcover. - --system-usage={all|runstatus|syscalls|runstatus:syscalls}
Controls the collection and reporting of system usage data. Two types of system
usage data can be collected:
runstatus
(how much time each process spent
running, eligible to run but not running, and waiting),
syscalls
(the count and
time spent in every syscall called by a process), or
all
(the default). No system usage data is collected or reported unless the option is used. This measurement can be made in conjunction with any Caliper measurement.
This option is only available on HP-UX 11i v2 or later and with
--scope=process
measurement runs. - --user-regions=default|rum-sum
For runs involving the Itanium PMU, specifies whether the data should be
collected for the entire run
(default),
or only in regions delimited by
the PMU enable/disable instructions
(rum-sum).
For more information,
see
Limiting PMU Measurements
below.
When attaching to a process to perform
acount,
cgprof,
fcount,
or
fcover
measurements, the dependent shared libraries of the program
must be mapped as private before you can attach to the process. You can
enable private mapping of the shared libraries by using the
chatr
command with the
+dbg
enable
option on the program file (see
chatr(1)).
When attaching to a process for
acount,
cgprof,
fcount,
or
fcover
runs,
caliper
will remain attached to the target process until it exits (see also
--duration=SECONDS). Stopping
caliper
with the use of SIGINT (e.g., Ctrl-C in a terminal window) for
cgprof,
fcount,
fcover,
and
acount
runs will result in all
processes being forcibly terminated after Caliper generates a performance
report or writes data to a database.
CPU Metrics EVENT_SET Description
The CPU Metrics measurement type requires HP-UX 11i v2 September
2004 OE (B.11.23.0409) or later.
You can specify the event sets
and sampling period with the
--metrics
and
--sampling-spec
options, respectively. The
--cpu-aggregation=COUNT
option specifies how many samples will be aggregated into one
aggregated sample.
You can measure multiple event sets in the same run.
By default, the
overview
metrics consisting of the following
8 event sets will be measured:
cpi,
stall,
dispersal,
l1icache,
l1dcache,
l2cache,
tlb,
fp. Example: caliper cpu -o cpu.txt program This will run
program,
measuring and reporting the following metrics by taking one
sample every 8 milliseconds:
cpi,
stall,
dispersal,
l1icache,
l1dcache,
l2cache,
tlb,
fp.
By default 125 low-level samples will be
aggregated into one user-reported sample resulting in one aggregated
sample collected per second.
The result is saved in the text file,
cpu.txt. You can specify a comma-separated list of one or more predefined event
sets.
The following event sets are available.
overview
is the default. - brpath
Provides information on the dynamic mix of branch types,
branch path distribution, branch per instruction, etc. - brpred
Provides metrics that are useful in assessing the
effectiveness of branch prediction. - c2c
Provides metrics related to cache coherence activity. - cpi
Provides metrics related to Cycles Per Instruction (CPI) - cpubus
Provides information on the demand that a specific
CPU presents to the CEC chip set, and the demand the
CPU experiences due to the CEC traffic initiated by other
CPUs or I/O components in the system. - cspec
Provides metrics on the effectiveness of control speculation. - dispersal
Provides a qualitative view of the parallelism that is
available as seen at instruction dispersal. - dspec
Provides metrics on the effectiveness of data speculation. - l1dcache
Provides miss rate information for the L1 data cache. - l1icache
Provides miss and prefetch usage information for the
L1 instruction cache. - l2cache
Provides miss rate information for the L2 unified cache.
Not available on dual-core processors. - l2dcache
Provides miss rate information for the L2 data cache.
Only available on dual-core processors. - l2icache
Provides miss rate information for the L2 instruction cache.
Only available on dual-core processors. - l3cache
Provides miss rate information for the L3 unified cache. - overview
Provides an overview of processor activity by collecting multiple
event sets. On non-dual-core processors, the event sets used are:
cpi,
stall,
dispersal,
l1dcache,
l1icache,
l2cache,
tlb,
fp. On dual-core processors, the event sets used are:
cpi,
stall,
dispersal,
l1dcache,
l1icache,
l2dcache,
l2icache,
tlb,
fp,
threadswitch. Note that, on dual-core processors, specifying
overview
is equivalent to specifying: --metrics=cpi,stall,dispersal,l1dcache,l1icache,l2dcache,l2icache,tlb,fp,threadswitch - queues
Provides BRQ (Bus Request Queue) metrics that may give
some insight into possible system bus related
performance problems. - stall
Provides metrics on primary CPU performance limiters
by breaking the CPI into seven components. - sysbus
Provides metrics on system bus utilization.
If you specify the sysbus
event set, you must use the
--bus-speed
option to provide bus speed in MHz.
For example,
--bus-speed=200. - tlb
Provides metrics related to TLB misses. - threadswitch
Provides metrics on hyperthreading thread switch behavior.
Only available on dual-core processors.
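The expansion of the overview event set and the sample aggregation arithmetic described in this section can be sketched as follows. This Python fragment is a model built from the lists above, not caliper code; expand_metrics and reported_sample_interval_ms are hypothetical names, and dropping duplicate event sets during expansion is an assumption.

```python
OVERVIEW = ["cpi", "stall", "dispersal", "l1dcache", "l1icache",
            "l2cache", "tlb", "fp"]
OVERVIEW_DUAL_CORE = ["cpi", "stall", "dispersal", "l1dcache", "l1icache",
                      "l2dcache", "l2icache", "tlb", "fp", "threadswitch"]


def expand_metrics(metrics, dual_core):
    """Expand a --metrics value, replacing 'overview' by its event sets."""
    expanded = []
    for name in metrics.split(","):
        if name == "overview":
            names = OVERVIEW_DUAL_CORE if dual_core else OVERVIEW
        else:
            names = [name]
        for event_set in names:
            # Assumption: an event set listed twice is measured once.
            if event_set not in expanded:
                expanded.append(event_set)
    return expanded


def reported_sample_interval_ms(sampling_period_ms=8, cpu_aggregation=125):
    """Time covered by one user-reported sample, in milliseconds.

    A COUNT of 0 or 1 means samples are not aggregated.
    """
    return sampling_period_ms * max(cpu_aggregation, 1)


if __name__ == "__main__":
    print(expand_metrics("overview,sysbus", dual_core=False))
    print(reported_sample_interval_ms())
```

With the defaults, 125 samples taken 8 milliseconds apart cover 1000 milliseconds, which matches the one aggregated sample per second stated above.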
Limiting PMU Measurements
For measurements that involve the Itanium PMU, you can restrict measurements to
specific parts of your application.
The supported measurements are:
alat,
branch,
cycles,
dcache,
dtlb,
ecount,
fprof,
icache,
itlb,
pmu_trace,
scgprof,
traps.
By default, HP Caliper measures PMU events for your entire program.
However, using
--user-regions=rum-sum
allows you to restrict measurements to
performance-sensitive regions of code. To use this feature: Modify the application source code to use the header file provided
with HP Caliper.
The default location of the header file is
caliper_root/include/caliper_control.h. In your source code, add the HP Caliper macros to enable and disable
the Itanium PMU.
To enable the PMU, insert:
CALIPER_PMU_ENABLE(); Using
CALIPER_PMU_ENABLE()
enables the PMU for the current thread until
the next
CALIPER_PMU_DISABLE().
When the PMU is already enabled,
CALIPER_PMU_ENABLE()
does not have any effect. To disable the PMU, insert:
CALIPER_PMU_DISABLE(); When the PMU is already disabled,
CALIPER_PMU_DISABLE()
does not have any effect.
Use the command-line option
--user-regions=rum-sum
or place
user_regions="rum-sum"
in a measurement configuration file. This option causes HP Caliper to allow the measured applications to control
the PMU. When specified, the PMU is initially
disabled and HP Caliper will not measure
the application until the first
CALIPER_PMU_ENABLE()
is executed.
If you do not specify the
--user-regions=rum-sum
option,
CALIPER_PMU_ENABLE()
and
CALIPER_PMU_DISABLE()
do not have any effect
and the instructions behave as no-ops.
Metrics for Sorts/Cutoffs Specific to HP-UX
Here is additional information on "Metrics for Sorts/Cutoffs" specific to
HP-UX. The following additional report types support the use of the following
metrics for sorting and applying cutoffs, where the default metric for
sorting is enclosed in [ ]:
- acount
arc-count,
[taken-count] - cgprof
[samples],
seconds,
call-count,
msecs-per-call - fcount
[call-count] - fcover
address,
name,
reached-count,
reached-percent,
unreached-count,
[unreached-percent]
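The metric lists above, together with those in the general "Metrics for Sorts/Cutoffs" section, can be captured in a small lookup table, for example to check a --sort-by value before starting a run. The Python sketch below is illustrative only; SORT_METRICS and resolve_sort_metric are hypothetical names. The traps report type is omitted because its default sort follows the first trap given with --traps-reported, and the HP-UX-only restrictions on samples-running and samples-blocked are not modeled.

```python
# report type -> (default sort metric, all supported metrics)
SORT_METRICS = {
    "alat": ("sampled-misses", ["sampled-misses"]),
    "branch": ("mispredict", ["target", "branch-ways", "mispredict",
                              "back-end-only-mispredict"]),
    "cstack": ("samples", ["samples", "samples-running", "samples-blocked"]),
    "cycles": ("samples", ["samples"]),
    "dcache": ("latency", ["sampled-misses", "latency", "avg-latency"]),
    "dtlb": ("sampled-misses", ["sampled-misses", "l2-fills", "hpw-fills",
                                "soft-fills"]),
    "fprof": ("samples", ["samples"]),
    "icache": ("latency", ["sampled-misses", "latency", "avg-latency"]),
    "itlb": ("sampled-misses", ["sampled-misses", "hpw-fills", "soft-fills"]),
    "scgprof": ("samples", ["samples", "call-count", "msecs-per-call"]),
    # HP-UX only report types
    "acount": ("taken-count", ["arc-count", "taken-count"]),
    "cgprof": ("samples", ["samples", "seconds", "call-count",
                           "msecs-per-call"]),
    "fcount": ("call-count", ["call-count"]),
    "fcover": ("unreached-percent", ["address", "name", "reached-count",
                                     "reached-percent", "unreached-count",
                                     "unreached-percent"]),
}


def resolve_sort_metric(report_type, sort_by=None):
    """Return the metric a report would be sorted on."""
    default, supported = SORT_METRICS[report_type]
    if sort_by is None:
        return default
    if sort_by not in supported:
        raise ValueError(f"{report_type} does not support metric {sort_by!r}")
    return sort_by


if __name__ == "__main__":
    print(resolve_sort_metric("dcache"))
    print(resolve_sort_metric("cgprof", "call-count"))
```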
Additional Environmental Variable on HP-UX
The following additional environment variable is available on HP-UX:
- CALIPER_HOME
Specifies the (non-default)
caliper_root
location when
caliper
is automatically invoked. This is only needed when
caliper
is not installed in its default location
(/opt/caliper)
and a program compiled
with the
+Oprofile=collect
option (profile based optimization) is run.
Limitations
The current HP-UX version of
caliper
has the following limitations:
Only aggregated results can be produced for multi-threaded programs by the
acount,
cgprof,
ecount,
fcount,
and
fcover
measurements. Handwritten assembly functions which do not follow the standard language
runtime conventions may not be properly measured for instrumentation-based
reports:
acount,
cgprof,
fcount,
and
fcover. Only native Itanium programs produced by the HP C, C++ and Fortran 9x compilers
can be measured. PA-RISC programs, although they can run on Itanium systems,
cannot be measured. The option
--scope=system
is only supported on HP-UX B.11.23.0409 or later. The option
--scope=system
cannot be used while any other
PMU measurement is running on the system. The option
--scope=system
can only be used by privileged users,
unless this security measure is disabled by setting the kernel
tunable
perfmon_allow_user_per_cpu
to the value 1. DLKM components are only listed if the user is privileged.
Function-level information is not available for
those modules.
LINUX
- --kernel-path
There is no default kernel path used when sampling is done while
in kernel mode.
By default, only kernel module and function information is produced
for samples; this option must be used to display disassembled instructions
for kernel modules.
Limitations
The current Linux version of
caliper
has the following limitations:
Correlating sample data to source files created with GNU compilers requires
debug information created with the
-g
compiler option.
EXAMPLES
Here are some examples of common uses of
caliper:
caliper ecount program
This will run
program,
measuring and reporting the total number of Itanium instructions
executed (IA64_INST_RETIRED),
the total number of nops executed (NOPS_RETIRED)
and the total number of CPU cycles expended (CPU_CYCLES).
caliper pmu_trace --sampling-spec=10000,0,CPU_CYCLES program
This will run
program,
measuring and reporting the number of Itanium instructions executed
(IA64_INST_RETIRED), the number of nops executed (NOPS_RETIRED) and the number
of CPU cycles (CPU_CYCLES) every 10,000 cpu cycles (with no sampling variation).
The
pmu_trace
measurement default is to sample every 50,000,000
cpu cycles.
CALIPER_OPTS="--module-exclude=/usr/lib/" caliper fprof program
This will run
program,
measuring and reporting a flat profile of sampled instruction addresses,
excluding all system libraries.
caliper cstack program
This will run
program,
measuring and reporting a call stack profile by periodically sampling the
application program counter and each of its threads' call stacks.
caliper report --detail=0 fprof
This will re-report the last
fprof
measurement run with all functions included
in the report (no matter how little CPU time they used).
caliper info --detail=all L2
This will produce an information report including all details on all cpu events
with "L2" in their names.
caliper info -r itlb
This will produce an information report on the itlb measurement report.
caliper scgprof -w -e10
This will collect all activity across all CPUs in the
system for a duration of 10 seconds, producing a sample-based call graph
report.
caliper ecount -pall \
sh -c '/usr/bin/ls; /usr/bin/echo done'
This will measure
/usr/bin/sh,
/usr/bin/ls,
and
/usr/bin/echo
executions.
caliper ecount --process=ls:echo \
sh -c '/usr/bin/ls; /usr/bin/echo done'
This will measure both /usr/bin/ls and /usr/bin/echo processes.
caliper ecount --process='*[ho]' \
sh -c '/usr/bin/ls; /usr/bin/echo done'
This will measure both
/usr/bin/sh
and
/usr/bin/echo
processes.
caliper ecount --process='(arg1,regexp)tmp$' \
sh -c '/usr/bin/ls /var/tmp; /usr/bin/echo tmp listed'
This will measure both the
/usr/bin/ls
and
/usr/bin/echo
processes, since the regular expression
tmp$
matches their argument 1.
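The selection above can be checked by hand. The sketch below tests the first argument of each process in the example against the expression tmp$, using grep -E as a stand-in for caliper's regular expression matching (an assumption; caliper's exact regular expression dialect is not shown here).

```shell
# First arguments of the three processes in the example:
#   sh -c ...        -> "-c"
#   ls /var/tmp      -> "/var/tmp"
#   echo tmp listed  -> "tmp"
matched=""
for arg1 in -c /var/tmp tmp; do
    # Keep the argument only if it ends in "tmp".
    if printf '%s\n' "$arg1" | grep -Eq 'tmp$'; then
        matched="$matched $arg1"
    fi
done
echo "matched:$matched"
```

The first argument of sh is -c, which does not end in tmp, so under this reading the shell process itself is not selected.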
caliper fprof -d fprof.db1 cc -g himom.c ;
caliper fprof -d fprof.db2 cc -O himom.c ;
caliper diff -o output fprof.db1 fprof.db2
This will create a report with the difference between the data collected
in the two collection runs.
caliper advise DB1 DB2
This will analyze the data in HP Caliper databases DB1 and DB2,
and make suggestions for performance improvements.
HP-UX ONLY EXAMPLES
caliper cgprof program
This will run
program,
measuring and reporting an extended
gprof-like
call graph profile.
caliper cgprof --html=HTML program
This performs the same cgprof performance measurement as the previous example
but produces an HTML-formatted report in directory
HTML
for browsing.
AUTHOR
HP Caliper was developed by the Hewlett-Packard Company.
FILES
- caliper_root
Anchor location of the caliper installation; the default is /opt/caliper (HP-UX) or /opt/hp-caliper (Linux).
- caliper_root/LICENSE
The caliper license terms.
- caliper_root/THIRDPARTYLICENSEREADME.txt
Contains the license terms for third-party software used by caliper.
- caliper_root/bin/caliper
caliper executable.
- caliper_root/config/
Directory containing standard measurement configuration files.
- caliper_root/contrib/
Holds useful contributed files.
- caliper_root/doc/
Online documentation directory.
- caliper_root/examples/
Example files.
- caliper_root/gui/
Contains the local GUI client files.
- caliper_root/gui_clients/
Remote GUI client installation files.
- caliper_root/html/
Contains support files for generated HTML measurement reports.
- caliper_root/lib/python2.3/
Python support directory.
- caliper_root/man/
Manpage directory.
- caliper_root/rules/
Shared analysis rules directory.
- ~/caliper_advisor/
User personal analysis rules directory.
- ./.hp_caliper_databases/
Implicit databases storage directory.
SEE ALSO
aCC(1), cc(1), chatr(1), f90(1), ld(1).
Online help is available at caliper_root/doc/index.html (HTML format).
The online HP Caliper User Guide is located at caliper_root/doc/caliperug.pdf (PDF version) and caliper_root/doc/html/caliper/C/caliperug.html (HTML version).
The online HP Caliper Rule Writer Guide is located at caliper_root/doc/rule_writer_guide.pdf (PDF version).
There are detailed information files describing each HP Caliper measurement report at caliper_root/doc/text/*.help.
There are complete lists of cpu_event values with full descriptions in caliper_root/doc/text/itanium_cpu_counters.txt and caliper_root/doc/text/itanium2_cpu_counters.txt.
cpu_event values (PMU events) are also described in the Intel(R) Itanium(tm) Processor Reference Manual for Software Development document.
The caliper website is at http://www.hp.com/go/caliper and contains additional technical information and updates.
Reference and context-sensitive help for the HP Caliper GUI is available by selecting the "Help Contents" or "Context-sensitive Help" items in the "Help" menu.