How to Use BCC Kernel Tracing Tools for Linux Performance Analysis

February 15, 2017 | TROUBLESHOOTING

Greetings! BCC is now available for Ubuntu, and in this article we will show you how to use its command-line tools to trace Linux systems for performance monitoring. The BPF Compiler Collection (BCC) is a toolkit for building kernel tracing tools that leverage the functionality provided by the Linux extended Berkeley Packet Filter (eBPF). BPF guarantees that programs loaded into the kernel cannot crash and cannot run forever, yet it is general purpose enough to perform many arbitrary types of computation. BCC makes eBPF programs easier to write, with kernel instrumentation in C and a front-end in Python. It is suited for many tasks, including performance analysis and network traffic control. eBPF enhances Linux tracing by allowing mini programs to be executed on tracing events; you can tag events with custom timestamps, store histograms, filter events, and emit summarized information to user level. These capabilities provide the information you need at the lowest possible overhead.

Prerequisites:

In order to install BCC, a Linux kernel of version 4.1 or newer is required, and the kernel must have been compiled with the following flags (you can check them in your kernel's config file, for example '/boot/config-4.4.0-21-generic'). A quick way to verify these flags is shown right after the list.

CONFIG_BPF=y
CONFIG_BPF_SYSCALL=y
# [optional, for tc filters]
CONFIG_NET_CLS_BPF=m
# [optional, for tc actions]
CONFIG_NET_ACT_BPF=m
CONFIG_BPF_JIT=y
CONFIG_HAVE_BPF_JIT=y
# [optional, for kprobes]
CONFIG_BPF_EVENTS=y
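
To verify these flags on your running kernel, a few lines of Python will do. This is a minimal sketch; it assumes the kernel's config file is available under /boot, as it is on stock Ubuntu kernels.

#!/usr/bin/env python
# Check the BPF-related kernel config flags listed above.
# Assumes the running kernel's config is at /boot/config-<release>.
import platform

config_file = "/boot/config-%s" % platform.uname()[2]   # e.g. /boot/config-4.4.0-21-generic
required = ["CONFIG_BPF", "CONFIG_BPF_SYSCALL", "CONFIG_BPF_JIT",
            "CONFIG_HAVE_BPF_JIT", "CONFIG_BPF_EVENTS"]

with open(config_file) as f:
    lines = f.read().splitlines()

for flag in required:
    setting = [l for l in lines if l.startswith(flag + "=")]
    print("%-22s %s" % (flag, setting[0].split("=", 1)[1] if setting else "not set"))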

Installing BCC (BPF Compiler Collection) on Ubuntu 16.04:

Ubuntu 16.04 ships with kernel 4.4, so there is no need to upgrade the kernel or change kernel parameters. Just run the following commands to add the IO Visor package repository and then update the package index.

# echo "deb [trusted=yes] https://repo.iovisor.org/apt/xenial xenial-nightly main" | sudo tee /etc/apt/sources.list.d/iovisor.list

# apt-get update

Once the package index is updated, run the command below to install BCC along with some related packages.

# apt-get install binutils bcc bcc-tools libbcc-examples python-bcc

[Screenshot: bcc installation]

BCC tools are also available as a snap and can be quickly and easily installed by just using the following command on your command line terminal.

# snap install --devmode bcc

After the installation is complete, we can test it by running the bundled Python example programs.

# python /usr/share/bcc/examples/hello_world.py

You will see output like the following.

-806 [001] d... 6685.522352: : Hello, World!
-36641 [000] d... 6685.554148: : Hello, World!
-36641 [000] d... 6695.999935: : Hello, World!
-36644 [000] d... 6696.004081: : Hello, World!
-36645 [000] d... 6696.007041: : Hello, World!
-36646 [000] d... 6696.010295: : Hello, World!
-36646 [000] d... 6696.011882: : Hello, World!
-36646 [000] d... 6696.013417: : Hello, World!
-36645 [000] d... 6696.014687: : Hello, World!
-36650 [001] d... 6696.015892: : Hello, World!
-36650 [001] d... 6696.016025: : Hello, World!
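
For reference, hello_world.py is only a couple of lines of Python. A slightly expanded sketch of what it does (attaching a kprobe to the clone() syscall and printing a message on every call; the exact example shipped with your BCC version may differ) looks like this:

from bcc import BPF

# BPF program in C: a function named kprobe__<kernel_function> is
# automatically attached as a kprobe to that kernel function.
prog = """
int kprobe__sys_clone(void *ctx) {
    bpf_trace_printk("Hello, World!\\n");
    return 0;
}
"""

# Compile/load the program and stream the kernel trace pipe to stdout.
BPF(text=prog).trace_print()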

# python /usr/share/bcc/examples/tracing/task_switch.py

[Screenshot: bcc task_switch.py test]

Using BCC Tools in Ubuntu:

Around 70 BCC tools are available after installation. Next we will show you how to use some of the most important ones for tracing.

1) argdist

argdist displays function parameter values as a histogram or frequency count. To get detailed help for this tool, run the command below.

# argdist -h

[Screenshot: argdist help output]

For example, to get a histogram of buffer sizes passed to the write() function across the system, run the following command:

# ./argdist -c -H 'p:c:write(int fd, void *buf, size_t len):size_t:len'

[17:14:42]
len : count distribution
0 -> 1 : 0 | |
2 -> 3 : 0 | |
4 -> 7 : 0 | |
8 -> 15 : 1 |****************************************|
16 -> 31 : 0 | |
32 -> 63 : 0 | |
64 -> 127 : 1 |****************************************|
[17:14:43]
len : count distribution
0 -> 1 : 0 | |
2 -> 3 : 0 | |
4 -> 7 : 0 | |
8 -> 15 : 2 |******** |
16 -> 31 : 0 | |
32 -> 63 : 1 |**** |
64 -> 127 : 10 |****************************************|
128 -> 255 : 3 |************ |
256 -> 511 : 1 |**** |
[17:14:44]

The output shows that most writes fall into the 64-127 byte bucket, with smaller counts of writes in the 8-15, 32-63, 128-255 and 256-511 byte buckets.
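
Under the hood, argdist generates and loads a small BPF program built from that probe specifier. For comparison, here is a simplified, hand-written sketch of the same measurement using the BCC Python API directly (a rough equivalent, not argdist's actual generated code):

from bcc import BPF
from time import sleep

# Histogram of the 'len' argument passed to libc's write().
prog = """
#include <uapi/linux/ptrace.h>
BPF_HISTOGRAM(dist);

int trace_write(struct pt_regs *ctx, int fd, const void *buf, size_t len) {
    dist.increment(bpf_log2l(len));
    return 0;
}
"""

b = BPF(text=prog)
# User-level probe on the write() function in libc ("c" resolves to libc).
b.attach_uprobe(name="c", sym="write", fn_name="trace_write")

print("Tracing write() buffer sizes... Hit Ctrl-C to end.")
try:
    sleep(99999999)
except KeyboardInterrupt:
    pass
b["dist"].print_log2_hist("len")

Save it under any name (for example writedist.py, a hypothetical filename), run it as root with python, and press Ctrl-C to print the histogram.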

2) bashreadline

The bashreadline tool prints bash commands from all running bash shells on the system. Simply run the command shown below and you will see commands listed as they are entered in any bash shell.

# ./bashreadline

TIME PID COMMAND
11:28:35 21176 echo hello world
11:28:43 21176 foo this command failed
11:28:45 21176 df -h
11:29:04 3059 echo another shell
11:29:13 21176 echo first shell again

3) biolatency

biolatency traces block device I/O (disk I/O) and records the distribution of I/O latency (time), printing it as a histogram when Ctrl-C is hit.

# ./biolatency

Tracing block device I/O... Hit Ctrl-C to end.
^C
usecs : count distribution
0 -> 1 : 0 | |
2 -> 3 : 0 | |
4 -> 7 : 0 | |
8 -> 15 : 0 | |
16 -> 31 : 0 | |
32 -> 63 : 0 | |
64 -> 127 : 0 | |
128 -> 255 : 0 | |
256 -> 511 : 2 |****************************************|

For efficiency, biolatency uses an in-kernel eBPF map to store timestamps with requests, and another in-kernel map to store the histogram (the "count" column), which is copied to user space only when output is printed. This approach keeps the performance overhead of tracing low.
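
To illustrate that pattern, here is a heavily simplified sketch of how such a tool can be written with the BCC Python API: one hash map keyed by the request holds the issue timestamp, and an in-kernel histogram accumulates latencies. This is not the actual biolatency source, and the kernel functions probed here (blk_start_request, blk_mq_start_request, blk_account_io_completion) are those present on 4.x kernels; names may differ on other versions.

from bcc import BPF
from time import sleep

prog = """
#include <uapi/linux/ptrace.h>
#include <linux/blkdev.h>

BPF_HASH(start, struct request *);   // request -> issue timestamp
BPF_HISTOGRAM(dist);                 // log2 histogram of latency in usecs

int trace_start(struct pt_regs *ctx, struct request *req) {
    u64 ts = bpf_ktime_get_ns();
    start.update(&req, &ts);
    return 0;
}

int trace_done(struct pt_regs *ctx, struct request *req) {
    u64 *tsp = start.lookup(&req);
    if (tsp != 0) {
        u64 delta = bpf_ktime_get_ns() - *tsp;
        dist.increment(bpf_log2l(delta / 1000));
        start.delete(&req);
    }
    return 0;
}
"""

b = BPF(text=prog)
b.attach_kprobe(event="blk_start_request", fn_name="trace_start")
b.attach_kprobe(event="blk_mq_start_request", fn_name="trace_start")
b.attach_kprobe(event="blk_account_io_completion", fn_name="trace_done")

print("Tracing block device I/O... Hit Ctrl-C to end.")
try:
    sleep(99999999)
except KeyboardInterrupt:
    pass
b["dist"].print_log2_hist("usecs")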

4) biotop

Short for block device I/O top, biotop summarizes which processes are performing disk I/O. It is like top for disks, and shows its results in the following format.

# ./biotop

Tracing... Output every 1 secs. Hit Ctrl-C to end

18:33:46 loadavg: 0.00 0.03 0.05 1/303 38817

PID COMM D MAJ MIN DISK I/O Kbytes AVGms
24301 cksum R 202 1 sda 361 28832 3.39
7961 dd R 202 1 sda 1628 13024 0.59

5) biosnoop

biosnoop traces block device I/O (disk I/O) and prints a line of output per I/O. This includes the PID and comm (process name) that were on-CPU at the time of issue (which usually means the process responsible). The latency of the disk I/O, measured from when it was issued to the device to its completion, is included as the last column, as shown below.

# ./biosnoop

TIME(s) COMM PID DISK T SECTOR BYTES LAT(ms)
0.000000000 ? 0 R -1 8 0.26
1.771303000 jbd2/sda1-8 192 sda W 17273608 8192 27.53
1.772083000 jbd2/sda1-8 192 sda W 17273624 4096 0.67
2.047987000 ? 0 R -1 8 0.26
4.095943000 ? 0 R -1 8 0.24
6.144020000 ? 0 R -1 8 0.26
8.191950000 ? 0 R -1 8 0.25
10.239991000 ? 0 R -1 8 0.27
12.288032000 ? 0 R -1 8 0.27
14.336013000 ? 0 R -1 8 0.25
16.384132000 ? 0 R -1 8 0.40
18.431956000 ? 0 R -1 8 0.24
20.479998000 ? 0 R -1 8 0.26
22.527986000 ? 0 R -1 8 0.28
24.576031000 ? 0 R -1 8 0.29
26.623940000 ? 0 R -1 8 0.24
28.671949000 ? 0 R -1 8 0.24
30.719936000 ? 0 R -1 8 0.25
32.767940000 ? 0 R -1 8 0.25

This example output is from what should be an idle system; however, some activity is still visible in the iostat output.

# iostat -x 1

Linux 4.4.0-21-generic (k-vm) 02/01/2017 _x86_64_ (2 CPU)

avg-cpu: %user %nice %system %iowait %steal %idle
0.82 1.20 0.37 0.52 0.00 97.10

Device: rrqm/s wrqm/s r/s w/s rkB/s wkB/s avgrq-sz avgqu-sz await r_await w_await svctm %util
loop0 0.00 0.00 0.01 0.00 0.10 0.00 31.32 0.00 0.14 0.14 0.00 0.06 0.00
loop1 0.00 0.00 0.01 0.00 0.03 0.00 11.23 0.00 0.03 0.03 0.00 0.03 0.00
sda 0.02 6.54 1.74 4.65 45.87 214.31 81.50 0.19 30.48 9.25 38.42 2.70 1.72

avg-cpu: %user %nice %system %iowait %steal %idle
0.00 0.00 0.00 0.00 0.00 100.00

Device: rrqm/s wrqm/s r/s w/s rkB/s wkB/s avgrq-sz avgqu-sz await r_await w_await svctm %util
loop0 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
loop1 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
sda 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00

6) bitesize

The aim of this tool is to show I/O distribution for requested block sizes, by process name.

# ./bitesize

Tracing... Hit Ctrl-C to end.
^C
Process Name = 'jbd2/sda1-8'
Kbytes : count distribution
0 -> 1 : 1 |****************************************|
2 -> 3 : 0 | |
4 -> 7 : 1 |****************************************|
8 -> 15 : 1 |****************************************|

7) capable

Capable traces calls to the kernel cap_capable() function, which does security capability checks, and prints details for each call.

# ./capable

TIME UID PID COMM CAP NAME AUDIT
19:20:34 0 39820 capable 21 CAP_SYS_ADMIN 1
19:20:34 0 39820 capable 21 CAP_SYS_ADMIN 1
19:21:09 100 6022 systemd-timesyn 12 CAP_NET_ADMIN 1
19:22:09 0 647 systemd-logind 12 CAP_NET_ADMIN 1
19:22:09 0 17302 systemd-udevd 12 CAP_NET_ADMIN 1

The output above shows various capability checks: the capable tool itself checking CAP_SYS_ADMIN, and systemd daemons checking CAP_NET_ADMIN. This can be useful for general debugging, and also for security enforcement, such as determining a whitelist of capabilities an application needs.

8) cachestat

cachestat displays one line of summarized statistics from the system page cache every second. It helps with system tuning by pointing out a low cache hit ratio or a high rate of misses.

# ./cachestat

HITS MISSES DIRTIES READ_HIT% WRITE_HIT% BUFFERS_MB CACHED_MB
0 0 0 0.0% 0.0% 170 2156
3 3 1 33.3% 33.3% 170 2156
0 0 0 0.0% 0.0% 170 2156
0 0 0 0.0% 0.0% 170 2156
0 0 0 0.0% 0.0% 170 2156
0 0 0 0.0% 0.0% 170 2156
0 0 0 0.0% 0.0% 170 2156
^C 0 0 0 0.0% 0.0% 170 2156

9) cachetop

cachetop shows Linux page cache hit/miss statistics, including read and write hit percentages per process, in a top-like UI.

# ./cachetop

20:07:24 Buffers MB: 171 / Cached MB: 2156 / Sort: HITS / Order: ascending
PID UID CMD HITS MISSES DIRTIES READ_HIT% WRITE_HIT%
39844 root python 1 0 0 100.0% 0.0%

10) cpudist

This program summarizes task on-CPU time as a histogram, showing how long tasks spent on the CPU before being descheduled. This provides valuable information that can indicate oversubscription (too many tasks for too few processors), overhead due to excessive context switching (e.g. a common shared lock for multiple threads), uneven workload distribution, too-granular tasks, and more.

# ./cpudist

Tracing on-CPU time... Hit Ctrl-C to end.
^C
usecs : count distribution
0 -> 1 : 0 | |
2 -> 3 : 1 | |
4 -> 7 : 3 | |
8 -> 15 : 1 | |
16 -> 31 : 4 | |
32 -> 63 : 2 | |
64 -> 127 : 4 | |
128 -> 255 : 5 | |
256 -> 511 : 9 | |
512 -> 1023 : 24 | |
1024 -> 2047 : 40 | |
2048 -> 4095 : 1615 |****************************************|
4096 -> 8191 : 9

Following are some other ways you can use cpudist to monitor your system's CPU usage.
To summarize off-CPU time as a histogram

# ./cpudist -O

To print 1 second summaries, 10 times

# ./cpudist 1 10

1s summaries, milliseconds, and timestamps

# ./cpudist -mT 1

To show each PID separately.

# ./cpudist -P

To trace PID 185 only

# ./cpudist -p 185

11) dcsnoop

dcsnoop traces directory entry cache (dcache) lookups, and can be used for further investigation beyond dcstat(8). The output is likely verbose, as dcache lookups are likely frequent. By default, only failed lookups are shown.

# ./dcsnoop

TIME(s) PID COMM T FILE
1.777623 39844 python M meminfo
6.787286 39844 python M meminfo
8.912472 812 irqbalance M interrupts
8.912550 812 irqbalance M stat
11.795628 39844 python M meminfo
16.803827 39844 python M meminfo
18.912481 812 irqbalance M interrupts
18.912570 812 irqbalance M stat
21.812077 39844 python M meminfo
26.820298 39844 python M meminfo
28.912554 812 irqbalance M interrupts
28.912629 812 irqbalance M stat
31.828839 39844 python M meminfo

The output shows the processes, the type of event ("T" column: M == miss, R == reference), and the filename for the dcache lookup.

12) execsnoop

execsnoop traces new processes. For example, it can trace the commands invoked when running "man rm".

# ./execsnoop

[Screenshot: execsnoop output]

The output shows the parent process/command name (PCOMM), the PID, the return value of the exec() (RET), and the filename with arguments (ARGS).

13) ext4dist

ext4dist traces ext4 reads, writes, opens, and fsyncs, and summarizes their latency as a power-of-2 histogram.

# ./ext4dist

Tracing ext4 operation latency... Hit Ctrl-C to end.
^C
operation = 'read'
usecs : count distribution
0 -> 1 : 44 |****************************************|
2 -> 3 : 20 |****************** |
4 -> 7 : 1 | |

operation = 'write'
usecs : count distribution
0 -> 1 : 0 | |
2 -> 3 : 0 | |
4 -> 7 : 0 | |
8 -> 15 : 2 |****************************************|
16 -> 31 : 1 |******************** |

operation = 'open'
usecs : count distribution
0 -> 1 : 46 |****************************************|
2 -> 3 : 0 | |
4 -> 7 : 2 |* |

The output shows the distribution of read latency; here all reads completed within a few microseconds, which most likely means they were served from the in-memory file system cache rather than from the storage device. The count column shows how many events fell into each latency range. This "latency" is measured from when the operation was issued from the VFS interface to the file system, to when it completed.
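
These latencies are collected with the classic kprobe/kretprobe pairing: record a timestamp at function entry, look it up on return, and add the delta to an in-kernel histogram. Below is a simplified, hedged sketch of that pattern, timing vfs_read() as a generic example; ext4dist actually probes the ext4-specific file operations.

from bcc import BPF
from time import sleep

prog = """
#include <uapi/linux/ptrace.h>

BPF_HASH(start, u32, u64);    // thread id -> entry timestamp
BPF_HISTOGRAM(dist);          // log2 histogram of latency in usecs

int trace_entry(struct pt_regs *ctx) {
    u32 tid = bpf_get_current_pid_tgid();   // lower 32 bits: thread id
    u64 ts = bpf_ktime_get_ns();
    start.update(&tid, &ts);
    return 0;
}

int trace_return(struct pt_regs *ctx) {
    u32 tid = bpf_get_current_pid_tgid();
    u64 *tsp = start.lookup(&tid);
    if (tsp != 0) {
        dist.increment(bpf_log2l((bpf_ktime_get_ns() - *tsp) / 1000));
        start.delete(&tid);
    }
    return 0;
}
"""

b = BPF(text=prog)
b.attach_kprobe(event="vfs_read", fn_name="trace_entry")
b.attach_kretprobe(event="vfs_read", fn_name="trace_return")

print("Tracing vfs_read() latency... Hit Ctrl-C to end.")
try:
    sleep(99999999)
except KeyboardInterrupt:
    pass
b["dist"].print_log2_hist("usecs")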

14) ext4slower

ext4slower traces common ext4 file operations (reads, writes, opens, and syncs), measures the time spent in each operation, and prints details for those that exceed a threshold.
By default, a minimum threshold of 10 milliseconds is used; if a threshold of 0 is used, all events are printed. To trace synchronous file reads and writes slower than 10 ms, run:

# ./ext4slower

To trace slower than 1 ms:

# ./ext4slower 1

To trace operations slower than 1 ms and output just the fields in a parsable (CSV) format, use the command below.

# ./ext4slower -j 1

To trace all file reads and writes, run:

# ./ext4slower 0

Trace slower than 1 ms, for PID 1182 only:

# ./ext4slower -p 1182 1

[Screenshot: ext4slower output]

15) filelife

filelife traces short-lived files, those that have been created and then deleted while tracing.

# ./filelife

TIME PID COMM AGE(s) FILE
14:35:01 41399 cron 0.00 tmpfnq6U0D
14:45:01 41403 cron 0.00 tmpf4FoYPW

This has caught short-lived temporary files created by cron jobs. The PID column shows the process ID that finally deleted the file, and COMM is its process name. The AGE(s) column shows the age of the file, in seconds, when it was deleted. These files were all short-lived, existing for less than one tenth of a second.

16) filetop

filetop shows reads and writes by file, with process details.

# ./filetop

14:47:25 loadavg: 0.00 0.01 0.05 1/302 41425

TID COMM READS WRITES R_Kb W_Kb T FILE
41425 clear 2 0 8 0 R xterm
812 irqbalance 4 0 4 0 R interrupts
812 irqbalance 4 0 4 0 R stat
41409 filetop 2 0 2 0 R loadavg
41425 clear 1 0 0 0 R libc-2.23.so
41425 clear 1 0 0 0 R libtinfo.so.5.9
41425 filetop 3 0 0 0 R clear
41425 filetop 2 0 0 0 R ld-2.23.so

This shows various files being read and written while tracing. The output is sorted by the total read size in Kbytes (R_Kb). filetop instruments at the VFS interface, so these are reads and writes that may be returned entirely from the file system cache (page cache).

17) funccount

This program traces functions, tracepoints, or USDT probes that match a specified pattern, and prints a summary of their call counts when Ctrl-C is hit. For example, to trace all kernel functions that begin with "vfs_":

# ./funccount 'vfs_*'

Tracing 46 functions for "vfs_*"... Hit Ctrl-C to end.
^C
FUNC COUNT
vfs_statfs 2
vfs_unlink 2
vfs_create 2
vfs_lock_file 4
vfs_writev 90
vfs_fstatat 156
vfs_write 396
vfs_fstat 409
vfs_open 456
vfs_getattr 565
vfs_getattr_nosec 565
vfs_read 1600

This is useful for exploring kernel code, to figure out which functions are in use and which are not. This can narrow down an investigation to just a few functions, whose counts are similar to the workload investigated.

# ./funccount 'tcp_*'

Tracing 276 functions for "tcp_*"... Hit Ctrl-C to end.
^C
FUNC COUNT
tcp_delack_timer_handler 1
tcp_update_pacing_rate 1
tcp_send_delayed_ack 1
tcp_delack_timer 1
tcp_rack_advance 1
tcp_cwnd_restart 2
tcp_queue_rcv 2
tcp_check_space 2
tcp_event_data_recv 2
tcp_push 3
tcp_sendmsg 3
tcp_init_tso_segs 3
tcp_rearm_rto.part.56 4
tcp_release_cb 5
tcp_options_write 5
tcp_v4_md5_lookup 8
tcp_established_options 8
tcp_md5_do_lookup 11
tcp_poll 14
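
The counting itself happens entirely in the kernel. A minimal hand-rolled sketch of the same idea, counting calls to a single kernel function (vfs_read is used here purely as an example, and this is not funccount's actual implementation), might look like this:

from bcc import BPF
from time import sleep

prog = """
#include <uapi/linux/ptrace.h>

BPF_HASH(counts, u64, u64);   // key 0 -> number of calls seen

int do_count(struct pt_regs *ctx) {
    u64 key = 0;
    counts.increment(key);
    return 0;
}
"""

b = BPF(text=prog)
b.attach_kprobe(event="vfs_read", fn_name="do_count")

print("Counting vfs_read() calls... Hit Ctrl-C to end.")
try:
    sleep(99999999)
except KeyboardInterrupt:
    pass
for k, v in b["counts"].items():
    print("vfs_read called %d times" % v.value)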

18) opensnoop

opensnoop traces the open() syscall system-wide, and prints various details.

# ./opensnoop

PID COMM FD ERR PATH
812 irqbalance 3 0 /proc/interrupts
812 irqbalance 3 0 /proc/stat
812 irqbalance 3 0 /proc/interrupts
812 irqbalance 3 0 /proc/stat
812 irqbalance 3 0 /proc/interrupts
812 irqbalance 3 0 /proc/stat
1 systemd 21 0 /proc/6022/cgroup
1 systemd 21 0 /proc/647/cgroup
1 systemd 21 0 /proc/17302/cgroup
1 systemd 21 0 /proc/225/cgroup
812 irqbalance 3 0 /proc/interrupts
812 irqbalance 3 0 /proc/stat
812 irqbalance 3 0 /proc/interrupts
812 irqbalance 3 0 /proc/stat

While tracing, irqbalance repeatedly opened /proc/interrupts and /proc/stat, and systemd opened various per-process cgroup files under /proc. opensnoop can be useful for discovering configuration and log files, if used during application startup.

19) pidpersec

This shows the number of new processes created per second, measured by tracing the kernel fork() routine.

# ./pidpersec

Tracing... Ctrl-C to end.
16:27:27: PIDs/sec: 0
16:27:28: PIDs/sec: 0
16:27:29: PIDs/sec: 0
16:27:30: PIDs/sec: 0
16:27:31: PIDs/sec: 0
16:27:32: PIDs/sec: 0
16:27:33: PIDs/sec: 0
16:27:34: PIDs/sec: 0
16:27:35: PIDs/sec: 0
16:27:36: PIDs/sec: 0
16:27:37: PIDs/sec: 0

During this trace no new processes were created each second. If you run a command in another session, you will see the per-second PID count increase.

20) statsnoop

statsnoop traces the different stat() syscalls system-wide, and prints various details.

# ./statsnoop

PID COMM FD ERR PATH
640 cron 0 0 /etc/localtime
640 cron 0 0 crontabs
640 cron 0 0 /etc/crontab
640 cron 0 0 /etc/cron.d
640 cron 0 0 /etc/cron.d/anacron
640 cron 0 0 /etc/cron.d/popularity-contest
640 cron 0 0 /etc/cron.d/sysstat
640 cron 0 0 /etc/localtime
640 cron 0 0 crontabs
640 cron 0 0 /etc/crontab
640 cron 0 0 /etc/cron.d
640 cron 0 0 /etc/cron.d/anacron
640 cron 0 0 /etc/cron.d/popularity-contest
640 cron 0 0 /etc/cron.d/sysstat

statsnoop can be used for general debugging, to see what file information has been requested, and whether those files exist. It can be used as a companion to opensnoop, which shows what files were actually opened.

21) syncsnoop

This program traces calls to the kernel sync() routine, with basic timestamps.

# ./syncsnoop

TIME(s) CALL
134548.6152 sync()
134151.5339 sync()
^C

While tracing, the "sync" command was executed in another terminal session. This can be useful to confirm that sync() is being called, and how frequently.
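
syncsnoop itself is a very small BCC program. Here is a hedged sketch of the basic idea: attach a kprobe to the sync syscall (the platform-specific symbol is resolved with BPF.get_syscall_fnname) and print a line from the kernel trace pipe for each call. It is not the tool's actual source.

from bcc import BPF

prog = """
#include <uapi/linux/ptrace.h>

int do_trace(struct pt_regs *ctx) {
    bpf_trace_printk("sync() called\\n");
    return 0;
}
"""

b = BPF(text=prog)
# Resolve the platform-specific symbol for the sync syscall (e.g. sys_sync).
b.attach_kprobe(event=b.get_syscall_fnname("sync"), fn_name="do_trace")

print("%-18s %s" % ("TIME(s)", "CALL"))
while True:
    # trace_fields() parses one line from the shared kernel trace pipe.
    (task, pid, cpu, flags, ts, msg) = b.trace_fields()
    print("%-18.9f %s" % (ts, msg))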

22) tcptop

tcptop summarizes throughput by host and port.

# ./tcptop

16:53:21 loadavg: 0.00 0.01 0.05 1/301 41994

PID COMM LADDR RADDR RX_KB TX_KB
36670 sshd 192.168.10.14:22 192.168.9.7:62379 0 0

tcptop prints separate listings of TCP connections for IPv4 and IPv6; if there is only traffic for one of these, only that group is shown, as with the single IPv4 connection in this example.

23) trace

trace probes arbitrary functions that you specify, with filters, and displays trace messages when a particular condition is met. You can control the message format to display function arguments and return values.

For example, suppose you want to trace all commands being exec'd across the system.

# ./trace 'sys_execve "%s", arg1'

PID TID COMM FUNC -
42200 42200 bash sys_execve /bin/ls
42201 42201 bash sys_execve /usr/bin/man
42208 42208 man sys_execve /usr/local/sbin/preconv
42208 42208 man sys_execve /usr/local/bin/preconv
42208 42208 man sys_execve /usr/sbin/preconv
42208 42208 man sys_execve /usr/bin/preconv
42212 42212 man sys_execve /usr/local/sbin/pager
42212 42212 man sys_execve /usr/local/bin/pager
42212 42212 man sys_execve /usr/sbin/pager
42212 42212 man sys_execve /usr/bin/pager
42210 42210 man sys_execve /usr/local/sbin/nroff
42210 42210 man sys_execve /usr/local/bin/nroff

The ::sys_execve syntax specifies that you want an entry probe (which is the default), in a kernel function (which is the default) called sys_execve. Next, the format string to print is simply "%s", which prints a string. Finally, the value to print is the first argument to the sys_execve function, which happens to be the command that is exec'd. The above trace was generated by executing "man ls" in a separate shell. As you see, man executes a number of additional programs to finally display the man page.

24) tcpconnect

tcpconnect traces active TCP connections (eg, those initiated via a connect() syscall) and prints details such as the process and the source and destination addresses. It works by kernel dynamic tracing of the TCP/IP connect functions, and will need updating to match any changes to those functions; it should eventually use static tracepoints, once they are available.

Run the command below to get more help about its usage.

# ./tcpconnect -h

[Screenshot: tcpconnect help output]

Conclusion

BCC is a nice toolkit for tracing and monitoring Linux system performance. It provides around 70 system and network monitoring tools that can be used with different parameters to trace the required information on a system. BCC is easy to install on a Linux system, and plenty of reference material is available to help you learn its tools and techniques. I hope you found this article helpful; start using BCC on your own systems and share your experience.

Filed Under : LINUX HOWTO, TROUBLESHOOTING
