Linux
Huge performance difference of the find command with and without the %M option to show permissions
On my CentOS 7.6, I created a folder (called many_files) containing 3,000,000 files by running the following command:
for i in {1..3000000}; do echo $i>$i; done;
I am using the find command to write information about the files in this directory to a file. This works very fast:

    $ time find many_files -printf '%i %y %p\n'>info_file
    real 0m6.970s
    user 0m3.812s
    sys 0m0.904s
Now, if I add %M to get the permissions:

    $ time find many_files -printf '%i %y %M %p\n'>info_file
    real 2m30.677s
    user 0m5.148s
    sys 0m37.338s
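The command takes much longer. This surprises me, because in a C program we can use struct stat to get both the inode and the permission information of a file, and in the kernel struct inode holds both pieces of information. My questions: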
- What causes this behavior?
- Is there a faster way to get the file permissions for this many files?
The first version only needs to readdir(3)/getdents(2) the directory, provided it runs on a filesystem that supports this feature (ext4: the filetype feature, shown by tune2fs -l /dev/xxx; xfs: ftype=1, shown by xfs_info /mount/point; …). The second version additionally has to stat(2) each file. That requires an extra inode lookup per file, and therefore more seeks on the filesystem and device, which can be considerably slower on a rotating disk when the cache has not been kept. This stat is not needed when looking only for the name, inode and file type, because the directory entry is enough:

The linux_dirent structure is declared as follows:

    struct linux_dirent {
        unsigned long  d_ino;     /* Inode number */
        unsigned long  d_off;     /* Offset to next linux_dirent */
        unsigned short d_reclen;  /* Length of this linux_dirent */
        char           d_name[];  /* Filename (null-terminated) */
                          /* length is actually (d_reclen - 2 -
                             offsetof(struct linux_dirent, d_name)) */
        /*
        char           pad;       // Zero padding byte
        char           d_type;    // File type (only since Linux
                                  // 2.6.4); offset is (d_reclen - 1)
        */
    }
The same information is available through readdir(3):

    struct dirent {
        ino_t          d_ino;       /* Inode number */
        off_t          d_off;       /* Not an offset; see below */
        unsigned short d_reclen;    /* Length of this record */
        unsigned char  d_type;      /* Type of file; not supported
                                       by all filesystem types */
        char           d_name[256]; /* Null-terminated filename */
    };
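As a rough illustration of why the first command is cheap, here is a minimal C sketch that prints the inode number, a type letter and the path for each entry of a single directory using only readdir(3), with no stat call per file. The names type_letter and list_dir_no_stat are made up for this example, it does not recurse the way find does, and it assumes a glibc-style struct dirent that exposes d_type. On filesystems that report DT_UNKNOWN, a stat would still be needed to learn the type.

```c
#define _DEFAULT_SOURCE   /* exposes d_type and the DT_* constants in glibc */
#include <assert.h>
#include <dirent.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: map a dirent d_type value to a one-letter code
 * in the spirit of find's %y. Returns '?' for DT_UNKNOWN and anything
 * else not listed, where a stat call would be needed instead. */
char type_letter(unsigned char d_type)
{
    switch (d_type) {
    case DT_REG:  return 'f';
    case DT_DIR:  return 'd';
    case DT_LNK:  return 'l';
    case DT_CHR:  return 'c';
    case DT_BLK:  return 'b';
    case DT_FIFO: return 'p';
    case DT_SOCK: return 's';
    default:      return '?';
    }
}

/* Hypothetical helper: print "inode type path/name" for every entry of
 * one directory (not recursive), using only what readdir returns.
 * Returns the number of entries printed, or -1 if the directory cannot
 * be opened. */
long list_dir_no_stat(const char *path, FILE *out)
{
    assert(path != NULL && out != NULL);

    DIR *dir = opendir(path);
    if (dir == NULL)
        return -1;

    long count = 0;
    struct dirent *entry;
    while ((entry = readdir(dir)) != NULL) {
        /* Skip the self and parent entries. */
        if (strcmp(entry->d_name, ".") == 0 || strcmp(entry->d_name, "..") == 0)
            continue;
        fprintf(out, "%llu %c %s/%s\n",
                (unsigned long long)entry->d_ino,
                type_letter(entry->d_type), path, entry->d_name);
        count++;
    }
    closedir(dir);
    return count;
}
```

Calling list_dir_no_stat("many_files", stdout) from a main function would print one line per entry, similar in spirit to the '%i %y %p\n' format used above.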
I suspected this, and confirmed it by comparing (on a smaller sample…) the following two outputs:

    strace -o v1 find many_files -printf '%i %y %p\n'>info_file
    strace -o v2 find many_files -printf '%i %y %M %p\n'>info_file
On my Linux amd64 kernel 5.0.x, the main difference shows up as:
    [...]
     getdents(4, /* 0 entries */, 32768) = 0
     close(4) = 0
     fcntl(5, F_DUPFD_CLOEXEC, 0) = 4
    -write(1, "25499894 d many_files\n25502410 f"..., 4096) = 4096
    -write(1, "iles/844\n25502253 f many_files/8"..., 4096) = 4096
    -write(1, "096 f many_files/686\n25502095 f "..., 4096) = 4096
    -write(1, "es/529\n25501938 f many_files/528"..., 4096) = 4096
    -write(1, "1 f many_files/371\n25501780 f ma"..., 4096) = 4096
    -write(1, "/214\n25497527 f many_files/213\n2"..., 4096) = 4096
    -brk(0x55b29a933000) = 0x55b29a933000
    +newfstatat(5, "1000", {st_mode=S_IFREG|0644, st_size=5, ...}, AT_SYMLINK_NOFOLLOW) = 0
    +newfstatat(5, "999", {st_mode=S_IFREG|0644, st_size=4, ...}, AT_SYMLINK_NOFOLLOW) = 0
    +newfstatat(5, "998", {st_mode=S_IFREG|0644, st_size=4, ...}, AT_SYMLINK_NOFOLLOW) = 0
    +newfstatat(5, "997", {st_mode=S_IFREG|0644, st_size=4, ...}, AT_SYMLINK_NOFOLLOW) = 0
    +newfstatat(5, "996", {st_mode=S_IFREG|0644, st_size=4, ...}, AT_SYMLINK_NOFOLLOW) = 0
    +newfstatat(5, "995", {st_mode=S_IFREG|0644, st_size=4, ...}, AT_SYMLINK_NOFOLLOW) = 0
    +newfstatat(5, "994", {st_mode=S_IFREG|0644, st_size=4, ...}, AT_SYMLINK_NOFOLLOW) = 0
    +newfstatat(5, "993", {st_mode=S_IFREG|0644, st_size=4, ...}, AT_SYMLINK_NOFOLLOW) = 0
    +newfstatat(5, "992", {st_mode=S_IFREG|0644, st_size=4, ...}, AT_SYMLINK_NOFOLLOW) = 0
    +newfstatat(5, "991", {st_mode=S_IFREG|0644, st_size=4, ...}, AT_SYMLINK_NOFOLLOW) = 0
    +newfstatat(5, "990", {st_mode=S_IFREG|0644, st_size=4, ...}, AT_SYMLINK_NOFOLLOW) = 0
    [...]
    +newfstatat(5, "891", {st_mode=S_IFREG|0644, st_size=4, ...}, AT_SYMLINK_NOFOLLOW) = 0
    +write(1, "25499894 d drwxr-xr-x many_files"..., 4096) = 4096
    +newfstatat(5, "890", {st_mode=S_IFREG|0644, st_size=4, ...}, AT_SYMLINK_NOFOLLOW) = 0
    [...]
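For comparison, the extra per-file work that %M forces can be sketched in C as follows. entry_mode issues one fstatat(2) with AT_SYMLINK_NOFOLLOW relative to an open directory, which is the call strace reports as newfstatat on amd64, and mode_string turns st_mode into an ls-style string. Both function names are hypothetical, and the formatter is a simplification: it ignores the setuid, setgid and sticky bits that find's %M would show.

```c
#define _DEFAULT_SOURCE   /* exposes S_IF* macros and AT_* constants in glibc */
#include <assert.h>
#include <fcntl.h>        /* AT_FDCWD, AT_SYMLINK_NOFOLLOW */
#include <string.h>
#include <sys/stat.h>
#include <sys/types.h>

/* Hypothetical helper: format st_mode as a 10-character, NUL-terminated
 * string such as "-rw-r--r--". Simplified: setuid, setgid and sticky
 * bits are not rendered. buf must hold at least 11 bytes. */
void mode_string(mode_t mode, char buf[11])
{
    assert(buf != NULL);
    static const char rwx[] = "rwxrwxrwx";

    buf[0] = S_ISDIR(mode)  ? 'd' :
             S_ISLNK(mode)  ? 'l' :
             S_ISCHR(mode)  ? 'c' :
             S_ISBLK(mode)  ? 'b' :
             S_ISFIFO(mode) ? 'p' :
             S_ISSOCK(mode) ? 's' : '-';

    /* Start with nine dashes plus the terminating NUL, then fill in
     * each permission bit, from user-read (0400) down to other-execute. */
    memcpy(buf + 1, "---------", 10);
    for (int i = 0; i < 9; i++)
        if (mode & (0400 >> i))
            buf[i + 1] = rwx[i];
}

/* Hypothetical helper: fetch the mode of `name` relative to the open
 * directory `dirfd` with a single fstatat call, without following
 * symlinks. Returns 0 on success, -1 on error (errno is set). */
int entry_mode(int dirfd, const char *name, mode_t *mode_out)
{
    assert(name != NULL && mode_out != NULL);

    struct stat st;
    if (fstatat(dirfd, name, &st, AT_SYMLINK_NOFOLLOW) != 0)
        return -1;
    *mode_out = st.st_mode;
    return 0;
}
```

One such call is needed for every file, which is where the three million additional inode lookups in the slow run come from.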