
How do "Inodes per group" and the "lazy_itable_init" flag relate to the "Inode count" value in an ext4 filesystem?

  • May 18, 2022

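I'm trying to root-cause a customer case in which two identical drives, formatted with the same command, differ by about 55 GiB in total disk space because of extra inode overhead.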

I'd like to understand:

  1. The math behind how 2x Inodes per group translates into 2x Inode count
  2. How Inodes per group is set when the lazy_itable_init flag is used

Environment:

The two drives sit in two identical hardware servers running the same OS. Details of both drives follow (sensitive information redacted):

Drive A:

=== START OF INFORMATION SECTION ===
Vendor:               HPE
Product:              <strip>
Revision:             HPD4
Compliance:           SPC-5
User Capacity:        7,681,501,126,656 bytes [7.68 TB]
Logical block size:   512 bytes
Physical block size:  4096 bytes
LU is resource provisioned, LBPRZ=1
Rotation Rate:        Solid State Device
Form Factor:          2.5 inches
Logical Unit id:      <strip>
Serial number:        <strip>
Device type:          disk
Transport protocol:   SAS (SPL-3)
Local Time is:        Mon Apr 25 07:39:27 2022 GMT
SMART support is:     Available - device has SMART capability.

Drive B:

=== START OF INFORMATION SECTION ===
Vendor:               HPE
Product:              <strip>
Revision:             HPD4
Compliance:           SPC-5
User Capacity:        7,681,501,126,656 bytes [7.68 TB]
Logical block size:   512 bytes
Physical block size:  4096 bytes
LU is resource provisioned, LBPRZ=1
Rotation Rate:        Solid State Device
Form Factor:          2.5 inches
Logical Unit id:      <strip>
Serial number:        <strip>
Device type:          disk
Transport protocol:   SAS (SPL-3)
Local Time is:        Mon Apr 25 07:39:23 2022 GMT
SMART support is:     Available - device has SMART capability.

The command used to format the drives was:

sudo mke2fs -F -m 1 -t ext4 -E lazy_itable_init,nodiscard /dev/sdc1

Problem:

The df -h output shows a size of 6.9T for drive A and 7.0T for drive B:

/dev/sdc1       6.9T   89M  6.9T   1% /home/<strip>/data/<serial>
...
/dev/sdc1       7.0T  3.0G  6.9T   1% /home/<strip>/data/<serial>

Observations:

  • The fdisk output shows identical partitions on both drives.

Drive A:

Disk /dev/sdc: 7681.5 GB, 7681501126656 bytes, 15002931888 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 4096 bytes
I/O size (minimum/optimal): 8192 bytes / 8192 bytes
Disk label type: gpt
Disk identifier: 70627C8E-9F97-468E-8EE6-54E960492318


#         Start          End    Size  Type            Name
1         2048  15002929151      7T  Microsoft basic primary

Drive B:

Disk /dev/sdc: 7681.5 GB, 7681501126656 bytes, 15002931888 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 4096 bytes
I/O size (minimum/optimal): 8192 bytes / 8192 bytes
Disk label type: gpt
Disk identifier: 702A42FA-9A20-4CE4-B938-83D3AB3DCC49


#         Start          End    Size  Type            Name
1         2048  15002929151      7T  Microsoft basic primary
  • The contents of /etc/mke2fs.conf are identical on both systems, so there is nothing interesting here:
================== DriveA =================
[defaults]
       base_features = sparse_super,filetype,resize_inode,dir_index,ext_attr
       enable_periodic_fsck = 1
       blocksize = 4096
       inode_size = 256
       inode_ratio = 16384

[fs_types]
       ext3 = {
               features = has_journal
       }
       ext4 = {
               features = has_journal,extent,huge_file,flex_bg,uninit_bg,dir_nlink,extra_isize,64bit
               inode_size = 256
       }
...
================== DriveB =================
[defaults]
       base_features = sparse_super,filetype,resize_inode,dir_index,ext_attr
       enable_periodic_fsck = 1
       blocksize = 4096
       inode_size = 256
       inode_ratio = 16384

[fs_types]
       ext3 = {
               features = has_journal
       }
       ext4 = {
               features = has_journal,extent,huge_file,flex_bg,uninit_bg,dir_nlink,extra_isize,64bit
               inode_size = 256
       }
  • Comparing the tune2fs -l output of the two drives shows that Inodes per group on DriveA is 2x that of DriveB
  • Inode count on DriveA is also 2x that of DriveB (Full diff HERE)
DriveA:
   Inode count:              468844544
   Block count:              1875365888
   Reserved block count:     18753658
   Free blocks:              1845578463
   Free inodes:              468843793
   ...
   Fragments per group:      32768
   Inodes per group:         8192
   Inode blocks per group:   512
   Flex block group size:    16
   
DriveB:
   Inode count:              234422272 <----- Half of A
   Block count:              1875365888
   Reserved block count:     18753658
   Free blocks:              1860525018
   Free inodes:              234422261
   ...
   Fragments per group:      32768
   Inodes per group:         4096 <---------- Half of A
   Inode blocks per group:   256  <---------- Half of A
   Flex block group size:    16
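  • In the mke2fs source, lazy_itable_init is passed to write_inode_tables(), where it is used together with s_inodes_per_group:
write_inode_tables(fs, lazy_itable_init, itable_zeroed);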
...
static void write_inode_tables(ext2_filsys fs, int lazy_flag, int itable_zeroed)
...
   if (lazy_flag)
       num = ext2fs_div_ceil((fs->super->s_inodes_per_group - <--------- here
                      ext2fs_bg_itable_unused(fs, i)) *
                     EXT2_INODE_SIZE(fs->super),
                     EXT2_BLOCK_SIZE(fs->super));

If we multiply the difference in inode count by the constant inode size (256), we get (468844544-234422272)*256 = 60012101632 bytes, or about 55.9 GiB of extra inode overhead.

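  1. Can anyone help me with the math of how Inode count doubles when Inodes per group doubles?
  2. Does lazy_itable_init affect the value of Inodes per group at runtime? If so, how can we tell what value it will set? (The code path for this flag is the only reference to s_inodes_per_group that I found.)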

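Update: I found that the difference between the two cases is the e2fsprogs version, 1.42.9 vs 1.45.4. I hadn't thought to check that and had relied only on the mke2fs.conf file. Apologies for the obvious oversight, and thanks to @lustreone for the suggestion.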

I'm still curious about the math relating Inodes per group and Inode count.

Source: https://unix.stackexchange.com/questions/702147