Why does initramfs need to overmount rootfs with the new root?

May 3, 2020

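I read the Linux documentation about initramfs and the source code of switch_root.
The documentation says:
When switching to another root device, initrd would pivot_root and then umount the ramdisk. But initramfs is rootfs: you can neither pivot_root rootfs, nor unmount it. Instead delete everything out of rootfs to free up the space (find -xdev / -exec rm '{}' ';'), overmount rootfs with the new root (cd /newmount; mount --move . /; chroot .), attach stdin/stdout/stderr to the new /dev/console and exec the new init.
And switch_root does exactly that: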
if (chdir(newroot)) {
   warn(_("failed to change directory to %s"), newroot);
   return -1;
}

...

if (mount(newroot, "/", NULL, MS_MOVE, NULL) < 0) {
   close(cfd);
   warn(_("failed to mount moving %s to /"), newroot);
   return -1;
}

...

if (chroot(".")) {
   close(cfd);
   warn(_("failed to change root"));
   return -1;
}
Why do we need to move the mount point onto /?
Why is chrooting into new_root not enough?

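EDIT: Thanks to @timothy-baldwin for the edit.
Mounting new_root over / changes the root directory of the mount namespace. Chrooting without overmounting / leaves the system in a chroot environment: the process root directory does not match the root directory of the mount namespace.
This causes some problems, such as:
1. Creating user namespaces is not allowed inside a chroot.
According to man 2 unshare, unshare-ing a user namespace fails with EPERM in a chroot environment: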
EPERM (since Linux 3.9)
      CLONE_NEWUSER was specified in flags and the caller is in a  chroot  environment
      (i.e., the  caller's root directory does not match the root directory of the
      mount namespace in which it resides).
$ unshare -U
unshare: unshare failed: Operation not permitted
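The same check can be made from C. This is a rough sketch under a few assumptions: the function name `probe_user_namespace` is hypothetical, and it calls the raw `unshare` system call through `syscall()` so that it does not depend on feature-test macros. A child process attempts `unshare(CLONE_NEWUSER)` and reports the resulting errno through its exit status, so the caller's own namespaces are never changed.

```c
#include <errno.h>
#include <linux/sched.h>  /* CLONE_NEWUSER */
#include <sys/syscall.h>  /* SYS_unshare */
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Fork a child that tries to create a user namespace.
 * Returns 0 if unshare succeeded, the child's errno (for example EPERM)
 * if it failed, or -1 if the probe itself could not be run. */
int probe_user_namespace(void)
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;

    if (pid == 0) {
        /* Child: pass the outcome back through the exit status. */
        if (syscall(SYS_unshare, CLONE_NEWUSER) == 0)
            _exit(0);
        _exit(errno);
    }

    int status;
    if (waitpid(pid, &status, 0) < 0 || !WIFEXITED(status))
        return -1;
    return WEXITSTATUS(status);
}
```

In a chroot that does not match the mount namespace root, the man page excerpt above suggests the result should be EPERM. EPERM can have other causes as well, so a failure on its own does not prove that the caller is chrooted.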
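2. Entering a mount namespace sets the root directory to the namespace's root
Entering a mount namespace sets the process's root directory to the root directory of that mount namespace, so calling setns on our own mount namespace resets our root directory to the rootfs directory.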
$ nsenter -m/proc/self/ns/mnt /bin/sh
$ ls -ld /new_root
new_root
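I can see the new_root directory, which lies outside of my chroot.
Overmounting / does not really prevent escaping the chroot
The root user can unmount /, re-enter its mount namespace (setns) and see rootfs: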
#define _GNU_SOURCE

#include <sys/types.h>
#include <sys/stat.h>
#include <sys/mount.h>
#include <unistd.h>
#include <fcntl.h>
#include <sched.h>
#include <stdio.h>

int main() {
   int ns = open("/proc/self/ns/mnt", O_RDONLY);
   if (ns == -1) {
       perror("open");
       goto out;
   }

   if (umount2("/", MNT_DETACH)) {
       perror("umount2");
       goto out;
   }

   if (setns(ns, CLONE_NEWNS)) {
       perror("setns");
       goto out;
   }

   char *a[] = { "/bin/sh", NULL };
   char *e[] = { NULL };
   execve(a[0], a, e);

   perror("execve");

out:
   return 1;
}
$ gcc -o main main.c
$ unshare -m ./main
/ # ls -d new_root
new_root
/ # mount -t proc proc /proc
/ # cat /proc/mounts
none / rootfs rw 0 0
proc /proc proc rw,relatime 0 0
~~To prevent escaping the chroot, new_root must be mounted over /.~~
I created a minimal initramfs and replaced the switch_root binary with this shell script to get a shell:
#!/bin/sh

exec /bin/sh
I also changed /bin/sh inside the initramfs to a statically linked busybox.
I compiled the following code and linked it statically:
#include <sys/stat.h>
#include <fcntl.h>
#include <unistd.h>
#include <stdio.h>

int main() {
   int fd = open(".", O_RDONLY | O_CLOEXEC);
   if (fd < 0) {
       perror("open");
       goto out0;
   }

   if (chroot("tmp")) {
       perror("chroot");
       goto out1;
   }

   if (fchdir(fd)) {
       perror("fchdir");
       goto out1;
   }

   if (chdir("..")) {
       perror("chdir");
       goto out1;
   }

   char *const argvp[] = { "sh", NULL };
   char *const envp[] = { NULL };
   execve("bin/sh", argvp, envp);

   perror("execve");

out1:
   close(fd);

out0:
   return  1;

}
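I placed the binary in the root directory of my real root file system as /escape.
I rebooted and got a shell right before switch_root would have run.
Without overmounting root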
$ mount --move proc new_root/proc
$ mount --move dev new_root/dev
$ mount --move sys new_root/sys
$ mount --move run new_root/run
$ exec chroot new_root
$ ./escape
$ ls -d new_root
new_root
I escaped the chroot.
With overmounting root
$ mount --move proc new_root/proc
$ mount --move dev new_root/dev
$ mount --move sys new_root/sys
$ mount --move run new_root/run
$ cd new_root
$ mount --move . /
$ exec chroot .
$ ./escape
$ ls -d new_root
ls: cannot access 'new_root': No such file or directory
I could not escape the chroot.
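The shell session with overmounting can be sketched as one C function. This is an illustration only, not the util-linux implementation: the names `move_mount_under` and `overmount_root` are hypothetical, and treating a failed move of /proc, /dev, /sys or /run as a warning rather than a fatal error is an assumption of this sketch. It needs root privileges to do anything useful.

```c
#include <stdio.h>
#include <sys/mount.h>
#include <unistd.h>

/* Move one mount (for example /proc) to the same path under newroot. */
static int move_mount_under(const char *src, const char *newroot)
{
    char target[4096];
    int n = snprintf(target, sizeof target, "%s%s", newroot, src);
    if (n < 0 || (size_t)n >= sizeof target)
        return -1;
    return mount(src, target, NULL, MS_MOVE, NULL);
}

/* Hypothetical helper: overmount / with newroot, then chroot into it.
 * Returns 0 on success, -1 on failure. */
int overmount_root(const char *newroot)
{
    static const char *const api_mounts[] = { "/proc", "/dev", "/sys", "/run" };

    /* Validate newroot first; nothing is touched if this fails. */
    if (chdir(newroot) != 0) {
        perror("chdir");
        return -1;
    }

    /* mount --move proc new_root/proc, and so on. */
    for (size_t i = 0; i < sizeof api_mounts / sizeof api_mounts[0]; i++) {
        if (move_mount_under(api_mounts[i], newroot) != 0)
            perror(api_mounts[i]); /* assumption: warn and continue */
    }

    /* mount --move . /  : this is the overmount step. */
    if (mount(".", "/", NULL, MS_MOVE, NULL) != 0) {
        perror("mount");
        return -1;
    }

    /* chroot .  : the process root now matches the namespace root. */
    if (chroot(".") != 0) {
        perror("chroot");
        return -1;
    }
    if (chdir("/") != 0) {
        perror("chdir");
        return -1;
    }
    return 0;
}
```

A caller would typically exec the new init right after `overmount_root("/new_root")` returns 0.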

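Not overmounting rootfs breaks user and mount namespaces:
The setns system call sets the caller's root directory to the root directory of the mount namespace, undoing the chroot.
A process whose root directory is not the root directory of its mount namespace is not allowed to create a user namespace (unshare fails with EPERM).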

Source: https://unix.stackexchange.com/questions/583138
