Why does `perf stat` show 0 context switches?
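I ran a shell pipeline under `perf stat`, using `taskset 0x1` to pin the whole pipeline to a single CPU. I know `taskset 0x1` has an effect, because it more than doubled the throughput of the pipeline. However, `perf stat` shows 0 context switches between the different processes of the pipeline.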
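So what exactly does `perf stat` mean by context switches? I think I am interested in the number of context switches into and out of the individual tasks in the pipeline. Is there a better way to measure that?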
This came up while comparing `dd bs=1M </dev/zero` to `dd bs=1M </dev/zero | dd bs=1M >/dev/null`. If I could measure context switches the way I want, I think it would help quantify why the first version is several times more "efficient" than the second.

```
$ rpm -q perf
perf-4.15.0-300.fc27.x86_64
$ uname -r
4.15.17-300.fc27.x86_64

$ perf stat taskset 0x1 sh -c 'dd bs=1M </dev/zero | dd bs=1M >/dev/null'
^C18366+0 records in
18366+0 records out
19258146816 bytes (19 GB, 18 GiB) copied, 5.0566 s, 3.8 GB/s

 Performance counter stats for 'taskset 0x1 sh -c dd if=/dev/zero bs=1M | dd bs=1M of=/dev/null':

       5059.273255      task-clock:u (msec)       #    1.000 CPUs utilized
                 0      context-switches:u        #    0.000 K/sec
                 0      cpu-migrations:u          #    0.000 K/sec
               414      page-faults:u             #    0.082 K/sec
        36,915,934      cycles:u                  #    0.007 GHz
         9,511,905      instructions:u            #    0.26  insn per cycle
         2,480,746      branches:u                #    0.490 M/sec
           188,295      branch-misses:u           #    7.59% of all branches

       5.061473119 seconds time elapsed

$ perf stat sh -c 'dd bs=1M </dev/zero | dd bs=1M >/dev/null'
^C6637+0 records in
6636+0 records out
6958350336 bytes (7.0 GB, 6.5 GiB) copied, 4.04907 s, 1.7 GB/s
6636+0 records in
6636+0 records out
6958350336 bytes (7.0 GB, 6.5 GiB) copied, 4.0492 s, 1.7 GB/s
sh: Interrupt

 Performance counter stats for 'sh -c dd if=/dev/zero bs=1M | dd bs=1M of=/dev/null':

       3560.269345      task-clock:u (msec)       #    0.878 CPUs utilized
                 0      context-switches:u        #    0.000 K/sec
                 0      cpu-migrations:u          #    0.000 K/sec
               355      page-faults:u             #    0.100 K/sec
        32,302,387      cycles:u                  #    0.009 GHz
         4,823,855      instructions:u            #    0.15  insn per cycle
         1,167,126      branches:u                #    0.328 M/sec
            88,982      branch-misses:u           #    7.62% of all branches

       4.052844128 seconds time elapsed
```
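One way to get per-task numbers without root is to ask the kernel for each process's resource usage when you reap it: `wait4()` returns a `struct rusage` whose `ru_nvcsw` and `ru_nivcsw` fields count voluntary and involuntary context switches for that one process. Below is a minimal sketch for Linux and Python 3.9+, assuming `dd` and `/dev/zero` are available. The helper names `spawn` and `measure_pipeline` are hypothetical, `os.sched_setaffinity` stands in for `taskset 0x1`, and the bounded `count=` is an assumption so the pipeline ends on its own instead of needing `^C`.

```python
import os


def spawn(argv, stdin_fd, stdout_fd, cpu=None):
    """Fork and exec argv with stdin/stdout redirected; optionally pin it to one CPU."""
    pid = os.fork()
    if pid == 0:  # child
        try:
            if cpu is not None:
                os.sched_setaffinity(0, {cpu})  # same effect as `taskset`, per process
            os.dup2(stdin_fd, 0)
            os.dup2(stdout_fd, 1)
            os.dup2(os.open(os.devnull, os.O_WRONLY), 2)  # hide dd's transfer summary
            # Other fds from os.pipe()/os.open() are close-on-exec by default,
            # so the reader does not keep the pipe's write end open after exec.
            os.execvp(argv[0], argv)
        finally:
            os._exit(127)  # reached only if something above failed
    return pid


def measure_pipeline(mib=256, cpu=None):
    """Run `dd bs=1M count=MIB </dev/zero | dd bs=1M >/dev/null` and return
    each dd's exit code and context-switch counts as reported by wait4()."""
    r, w = os.pipe()
    zero = os.open("/dev/zero", os.O_RDONLY)
    null = os.open(os.devnull, os.O_WRONLY)
    pids = {
        "writer": spawn(["dd", "bs=1M", f"count={mib}"], zero, w, cpu),
        "reader": spawn(["dd", "bs=1M"], r, null, cpu),
    }
    # The parent must drop its copies, otherwise the reader never sees EOF.
    for fd in (r, w, zero, null):
        os.close(fd)
    stats = {}
    for name, pid in pids.items():
        _, status, usage = os.wait4(pid, 0)
        stats[name] = {
            "exit": os.waitstatus_to_exitcode(status),
            "voluntary": usage.ru_nvcsw,     # gave up the CPU, e.g. blocked on the pipe
            "involuntary": usage.ru_nivcsw,  # preempted by the scheduler
        }
    return stats


if __name__ == "__main__":
    one_cpu = min(os.sched_getaffinity(0))  # first CPU we are allowed to use
    print("pinned:  ", measure_pipeline(cpu=one_cpu))
    print("unpinned:", measure_pipeline())
```

These counters come from the scheduler's own per-task bookkeeping rather than from perf events, so they are readable by an unprivileged user. For a process that is still running, the same two numbers appear as `voluntary_ctxt_switches` and `nonvoluntary_ctxt_switches` in `/proc/<pid>/status`.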
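Since you are not root, perf is silently failing to count context switches. Note the `:u` suffix on every event name in your output: with `kernel.perf_event_paranoid` at 2 or higher, unprivileged users may not count kernel-side activity, so `perf stat` fell back to counting user space only. Context switches happen inside the kernel, so they show up as 0 (the same restriction explains the tiny `cycles:u` figures of about 0.007 GHz). Running the same commands under `sudo` gives real counts.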
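To check in advance whether a given user will hit this, you can read the `kernel.perf_event_paranoid` sysctl. A minimal sketch follows; the function name `can_count_kernel_side` is hypothetical, the threshold of 2 follows the kernel's sysctl documentation for per-task events, and treating euid 0 as "privileged" is a simplifying assumption (the kernel really checks `CAP_SYS_ADMIN`, or `CAP_PERFMON` on newer kernels).

```python
import os
from pathlib import Path

PARANOID = Path("/proc/sys/kernel/perf_event_paranoid")


def can_count_kernel_side(level: int, privileged: bool) -> bool:
    """Whether a per-task `perf stat` may include kernel-side events.

    Levels 2 and above forbid kernel profiling for unprivileged users;
    perf stat then falls back to user-space-only counting and tags each
    event with ":u".
    """
    return privileged or level < 2


if __name__ == "__main__":
    if PARANOID.exists():
        level = int(PARANOID.read_text())
        # Assumption: euid 0 stands in for the real capability check.
        privileged = os.geteuid() == 0
        if can_count_kernel_side(level, privileged):
            print(f"perf_event_paranoid={level}: kernel-side events are countable")
        else:
            print(f"perf_event_paranoid={level}: expect ':u' (user-space only) events")
    else:
        print("no perf_event_paranoid sysctl found")
```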
(Linux has 64k pipe buffers. In both cases you can see very nearly 2 context switches per 64k transferred, i.e. roughly one switch per 32k. I'm not exactly sure how that works, but I suspect it is only counting switches from one `dd` to the other `dd`, or to the idle task on that CPU.)

```
$ sudo perf stat taskset 0x1 sh -c 'dd bs=1M </dev/zero|dd bs=1M >/dev/null'
^C14508+0 records in
14507+0 records out
15211692032 bytes (15 GB, 14 GiB) copied, 3.87098 s, 3.9 GB/s
14508+0 records in
14508+0 records out
15212740608 bytes (15 GB, 14 GiB) copied, 3.87044 s, 3.9 GB/s
taskset: Interrupt

 Performance counter stats for 'taskset 0x1 sh -c dd bs=1M </dev/zero|dd bs=1M >/dev/null':

       3872.597645      task-clock (msec)         #    1.000 CPUs utilized
           464,325      context-switches          #    0.120 M/sec
                 0      cpu-migrations            #    0.000 K/sec
               928      page-faults               #    0.240 K/sec
    11,099,016,844      cycles                    #    2.866 GHz
    13,765,220,898      instructions              #    1.24  insn per cycle
     3,053,464,009      branches                  #  788.480 M/sec
        15,462,959      branch-misses             #    0.51% of all branches

       3.874121023 seconds time elapsed

$ echo $((15212740608 / 464325))
32763

$ sudo perf stat sh -c 'dd bs=1M </dev/zero|dd bs=1M >/dev/null'
^C7031+0 records in
7031+0 records out
7032+0 records in
7031+0 records out
7372537856 bytes (7.4 GB, 6.9 GiB) copied, 4.27436 s, 1.7 GB/s7372537856 bytes (7.4 GB, 6.9 GiB) copied, 4.27414 s, 1.7 GB/s
sh: Interrupt

 Performance counter stats for 'sh -c dd bs=1M </dev/zero|dd bs=1M >/dev/null':

       3736.056509      task-clock (msec)         #    0.873 CPUs utilized
           218,047      context-switches          #    0.058 M/sec
               206      cpu-migrations            #    0.055 K/sec
               877      page-faults               #    0.235 K/sec
     8,328,413,541      cycles                    #    2.229 GHz
     7,617,859,285      instructions              #    0.91  insn per cycle
     1,671,904,009      branches                  #  447.505 M/sec
        13,827,669      branch-misses             #    0.83% of all branches

       4.277591869 seconds time elapsed

$ echo $((7372537856 / 218047))
33811
```
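The "about 2 switches per 64k" estimate can be reproduced from the numbers in the two `sudo perf stat` runs. A small sketch; `PIPE_CAPACITY`, `RUNS` and `bytes_per_switch` are illustrative names, and 64 KiB is assumed to be the default Linux pipe capacity (it can be changed per pipe with `fcntl(F_SETPIPE_SZ)`).

```python
PIPE_CAPACITY = 64 * 1024  # assumed default Linux pipe buffer size, in bytes

# (bytes copied, context-switches) taken from the two `sudo perf stat` runs above
RUNS = {
    "taskset 0x1": (15212740608, 464325),
    "unpinned": (7372537856, 218047),
}


def bytes_per_switch(copied: int, switches: int) -> int:
    """Integer division, matching the shell's $((a / b))."""
    return copied // switches


if __name__ == "__main__":
    for name, (copied, switches) in RUNS.items():
        per = bytes_per_switch(copied, switches)
        print(f"{name}: {per} bytes/switch, {PIPE_CAPACITY / per:.2f} switches per 64 KiB")
    # taskset 0x1: 32763 bytes/switch, 2.00 switches per 64 KiB
    # unpinned: 33811 bytes/switch, 1.94 switches per 64 KiB
```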