Performance

Why does perf stat show 0 context switches?

  • April 22, 2018

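I ran a shell pipeline under perf stat, using taskset 0x1 to pin the whole pipeline to a single CPU. I know taskset 0x1 had an effect, because it more than doubled the throughput of the pipeline. However, perf stat shows 0 context switches between the different processes of the pipeline.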

So what exactly does perf stat mean by context switches?

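I think I am interested in the number of context switches to and from the individual tasks in the pipeline. Is there a better way to measure that?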

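This came up while comparing dd bs=1M </dev/zero with dd bs=1M </dev/zero | dd bs=1M >/dev/null. If I could measure the context switches as desired, I assume it would help quantify why the first version is several times more "efficient" than the second.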

$ rpm -q perf
perf-4.15.0-300.fc27.x86_64
$ uname -r
4.15.17-300.fc27.x86_64

$ perf stat taskset 0x1 sh -c 'dd bs=1M </dev/zero | dd bs=1M >/dev/null'
^C18366+0 records in
18366+0 records out
19258146816 bytes (19 GB, 18 GiB) copied, 5.0566 s, 3.8 GB/s

Performance counter stats for 'taskset 0x1 sh -c dd if=/dev/zero bs=1M | dd bs=1M of=/dev/null':

      5059.273255      task-clock:u (msec)       #    1.000 CPUs utilized          
                0      context-switches:u        #    0.000 K/sec                  
                0      cpu-migrations:u          #    0.000 K/sec                  
              414      page-faults:u             #    0.082 K/sec                  
       36,915,934      cycles:u                  #    0.007 GHz                    
        9,511,905      instructions:u            #    0.26  insn per cycle         
        2,480,746      branches:u                #    0.490 M/sec                  
          188,295      branch-misses:u           #    7.59% of all branches        

      5.061473119 seconds time elapsed

$ perf stat sh -c 'dd bs=1M </dev/zero | dd bs=1M >/dev/null'
^C6637+0 records in
6636+0 records out
6958350336 bytes (7.0 GB, 6.5 GiB) copied, 4.04907 s, 1.7 GB/s
6636+0 records in
6636+0 records out
6958350336 bytes (7.0 GB, 6.5 GiB) copied, 4.0492 s, 1.7 GB/s
sh: Interrupt

Performance counter stats for 'sh -c dd if=/dev/zero bs=1M | dd bs=1M of=/dev/null':

      3560.269345      task-clock:u (msec)       #    0.878 CPUs utilized          
                0      context-switches:u        #    0.000 K/sec                  
                0      cpu-migrations:u          #    0.000 K/sec                  
              355      page-faults:u             #    0.100 K/sec                  
       32,302,387      cycles:u                  #    0.009 GHz                    
        4,823,855      instructions:u            #    0.15  insn per cycle         
        1,167,126      branches:u                #    0.328 M/sec                  
           88,982      branch-misses:u           #    7.62% of all branches        

      4.052844128 seconds time elapsed

Because you are not root, perf is silently failing to count context switches.
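The :u suffix on every event name in the unprivileged runs above is the visible sign of this: perf fell back to counting user space only, and context switches happen in the kernel. Whether an unprivileged user may count kernel-side events is governed by the kernel.perf_event_paranoid sysctl, where a value of 2 or higher disallows it. The following is a minimal sketch of that rule, not perf's own logic. The helper name kernel_counting_allowed is hypothetical, and treating UID 0 as "privileged" is a simplifying assumption (the kernel actually checks capabilities).

```shell
# Sketch: would perf be allowed to count kernel-side events?
# Assumption: uid 0 stands in for "privileged"; the kernel really checks capabilities.
kernel_counting_allowed() {
    paranoid=$1   # value of /proc/sys/kernel/perf_event_paranoid
    uid=$2        # effective user ID
    if [ "$uid" -eq 0 ] || [ "$paranoid" -lt 2 ]; then
        echo yes
    else
        echo no
    fi
}

# Apply the rule to the current system, if the sysctl is readable.
f=/proc/sys/kernel/perf_event_paranoid
if [ -r "$f" ]; then
    kernel_counting_allowed "$(cat "$f")" "$(id -u)"
fi
```

If this prints "no", running perf stat under sudo, as in the transcripts below, is the simplest way to get kernel-side counts.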

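(Linux has 64 KiB pipe buffers. In both cases below, you can see very close to 2 context switches per 64 KiB transferred. I am not entirely sure how that works, but I suspect it is only counting context switches away from one dd, either to the other dd or to that CPU's idle task.)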
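For the per-task view asked about in the question, one option that needs no perf privileges is the kernel's own per-process counters: /proc/PID/status reports voluntary_ctxt_switches and nonvoluntary_ctxt_switches for each process. A minimal sketch follows. The helper name ctxt_switches is hypothetical, and this is an alternative technique, not what perf stat does.

```shell
# Print the context-switch counters the kernel keeps for one process.
# $1: PID to inspect
ctxt_switches() {
    grep '_ctxt_switches' "/proc/$1/status"
}

# Example: counters for the current shell.
ctxt_switches "$$"
```

To apply this to the pipeline, the helper could be run once for each dd PID (for example those listed by pgrep -x dd) while the pipeline is still running, since the counters disappear when the process exits.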

$ sudo perf stat taskset 0x1 sh -c 'dd bs=1M </dev/zero|dd bs=1M >/dev/null'
^C14508+0 records in
14507+0 records out
15211692032 bytes (15 GB, 14 GiB) copied, 3.87098 s, 3.9 GB/s
14508+0 records in
14508+0 records out
15212740608 bytes (15 GB, 14 GiB) copied, 3.87044 s, 3.9 GB/s
taskset: Interrupt

Performance counter stats for 'taskset 0x1 sh -c dd bs=1M </dev/zero|dd bs=1M >/dev/null':

      3872.597645      task-clock (msec)         #    1.000 CPUs utilized          
          464,325      context-switches          #    0.120 M/sec                  
                0      cpu-migrations            #    0.000 K/sec                  
              928      page-faults               #    0.240 K/sec                  
   11,099,016,844      cycles                    #    2.866 GHz                    
   13,765,220,898      instructions              #    1.24  insn per cycle         
    3,053,464,009      branches                  #  788.480 M/sec                  
       15,462,959      branch-misses             #    0.51% of all branches        

      3.874121023 seconds time elapsed

$ echo $((15212740608 / 464325))
32763

$ sudo perf stat sh -c 'dd bs=1M </dev/zero|dd bs=1M >/dev/null'
^C7031+0 records in
7031+0 records out
7032+0 records in
7031+0 records out
7372537856 bytes (7.4 GB, 6.9 GiB) copied, 4.27436 s, 1.7 GB/s7372537856 bytes (7.4 GB, 6.9 GiB) copied, 4.27414 s, 1.7 GB/s

sh: Interrupt

Performance counter stats for 'sh -c dd bs=1M </dev/zero|dd bs=1M >/dev/null':

      3736.056509      task-clock (msec)         #    0.873 CPUs utilized          
          218,047      context-switches          #    0.058 M/sec                  
              206      cpu-migrations            #    0.055 K/sec                  
              877      page-faults               #    0.235 K/sec                  
    8,328,413,541      cycles                    #    2.229 GHz                    
    7,617,859,285      instructions              #    0.91  insn per cycle         
    1,671,904,009      branches                  #  447.505 M/sec                  
       13,827,669      branch-misses             #    0.83% of all branches        

      4.277591869 seconds time elapsed

$ echo $((7372537856 / 218047))
33811
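Both divisions above give roughly 32 KiB transferred per context switch, which is where the "2 context switches per 64 KiB" figure comes from. The same arithmetic can be expressed directly as switches per pipe buffer. This is a small sketch using the numbers from the two sudo runs; the helper name switches_per_pipe_buffer is hypothetical, and 65536 bytes is the assumed pipe buffer size.

```shell
# Context switches per 64 KiB pipe buffer transferred.
# $1: context-switch count from perf stat, $2: bytes copied as reported by dd
switches_per_pipe_buffer() {
    awk -v s="$1" -v b="$2" 'BEGIN { printf "%.3f\n", s / (b / 65536) }'
}

switches_per_pipe_buffer 464325 15212740608   # pinned with taskset 0x1: prints 2.000
switches_per_pipe_buffer 218047 7372537856    # not pinned: prints 1.938
```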

Quoted from: https://unix.stackexchange.com/questions/439260