Linux

How can I safely shut down every running VM during reboot/shutdown in Qubes OS 4.0 without stalls/delays caused by timeouts? (systemd issue)

  • September 13, 2018

Due to some issues affecting Qubes 4.0, rebooting or shutting down the computer from dom0 is delayed (stalls) before it completes, unless all running VMs are shut down first.

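Before choosing Restart/Shutdown from xfce's Logout menu, I have to manually run a script that shuts down all VMs. Otherwise I can expect a stall of at least 30 seconds (and that is only if I lower `DefaultTimeoutStopSec` from its default of `90s` to `30s`).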
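For reference, lowering that timeout could look roughly like the sketch below. It assumes the setting lives in `/etc/systemd/system.conf` (the usual place for systemd manager defaults); the helper name `set_stop_timeout` is made up for illustration.

```shell
# Set DefaultTimeoutStopSec in a systemd manager config file.
# $1: path to the config file (assumed to be /etc/systemd/system.conf in practice)
# $2: new value, e.g. 30s
set_stop_timeout() {
    local conf="$1" value="$2"
    if grep -q -E '^#?DefaultTimeoutStopSec=' "$conf"; then
        # Replace the existing (possibly commented-out) line in place.
        sed -i -E "s/^#?DefaultTimeoutStopSec=.*/DefaultTimeoutStopSec=$value/" "$conf"
    else
        # No such line yet: append it (assumes [Manager] is the only section).
        printf 'DefaultTimeoutStopSec=%s\n' "$value" >> "$conf"
    fi
}
```

Run as root, for example `set_stop_timeout /etc/systemd/system.conf 30s`; the new value should apply after a reboot or `systemctl daemon-reexec`. This only shortens the stall, it does not remove it.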

Here is the script, along with sample output from running it:

[ctor@dom0 ~]$ cat preshutdown 
#!/bin/bash

xl list
time qvm-shutdown --verbose --all --wait; ec="$?"
echo "exitcode: '$ec'"
time while xl list|grep -q -F '(null)'; do xl list;sleep 1; done
exit $ec

$ ./preshutdown 
Name                                        ID   Mem VCPUs  State   Time(s)
Domain-0                                     0  4080     6     r-----     108.6
sys-net                                      1   384     2     -b----       7.0
sys-net-dm                                   2   144     1     -b----      16.5
sys-firewall                                 3  2917     2     -b----       9.7
gmail-basedon-w-s-f-fdr28                    4  3247     2     -b----      28.6
stackexchangelogins-w-s-f-fdr28              5  3241     2     -b----      24.3
dev01-w-s-f-fdr28                            7  8481     6     -b----      32.6
2018-09-06 09:37:08,187 [MainProcess selector_events.__init__:65] asyncio: Using selector: EpollSelector

real    0m14.959s
user    0m0.065s
sys     0m0.017s
exitcode: '0'
Name                                        ID   Mem VCPUs  State   Time(s)
Domain-0                                     0  4095     6     r-----     123.0
(null)                                       1     0     1     --ps-d       7.8
(null)                                       3     0     0     --ps-d      11.0
Name                                        ID   Mem VCPUs  State   Time(s)
Domain-0                                     0  4095     6     r-----     123.1
(null)                                       1     0     1     --ps-d       7.8
(null)                                       3     0     0     --ps-d      11.0
Name                                        ID   Mem VCPUs  State   Time(s)
Domain-0                                     0  4095     6     r-----     123.4
(null)                                       1     0     1     --ps-d       7.8
(null)                                       3     0     0     --ps-d      11.0
Name                                        ID   Mem VCPUs  State   Time(s)
Domain-0                                     0  4095     6     r-----     123.7
(null)                                       1     0     1     --ps-d       7.8
Name                                        ID   Mem VCPUs  State   Time(s)
Domain-0                                     0  4095     6     r-----     123.8
(null)                                       1     0     1     --ps-d       7.8
Name                                        ID   Mem VCPUs  State   Time(s)
Domain-0                                     0  4095     6     r-----     123.9
(null)                                       1     0     1     --ps-d       7.8
Name                                        ID   Mem VCPUs  State   Time(s)
Domain-0                                     0  4095     6     r-----     124.0
(null)                                       1     0     1     --ps-d       7.8

real    0m7.093s
user    0m0.024s
sys     0m0.085s
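The last loop in the script above has no upper bound, so it would spin forever if a `(null)` domain never went away. A bounded variant might look like the sketch below; `count_null_domains`, `wait_for_null_domains`, and the `LIST_CMD` override are hypothetical names introduced here, and the 30-second default is arbitrary.

```shell
# Count "(null)" rows in `xl list`-style output read from stdin.
count_null_domains() {
    awk '$1 == "(null)" { n++ } END { print n + 0 }'
}

# Poll once per second until no (null) domains remain or the timeout passes.
# $1: timeout in seconds (hypothetical default: 30)
# The listing comes from `xl list`; set LIST_CMD to substitute another command.
# Returns 0 when the list is clean, 1 when the timeout was reached.
wait_for_null_domains() {
    local timeout="${1:-30}" waited=0
    while [ "$(${LIST_CMD:-xl list} | count_null_domains)" -gt 0 ]; do
        if [ "$waited" -ge "$timeout" ]; then
            return 1
        fi
        sleep 1
        waited=$((waited + 1))
    done
    return 0
}
```

In the script, `wait_for_null_domains 30 || echo "domains still dying"` could then replace the unbounded `while` loop.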

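However, dom0 is stuck on Fedora 25 (Fedora 28 is available only for VMs), so systemd can't easily be updated (or I don't yet know how). It is at version 231, whereas 240 is the latest on GitHub. I'm not sure whether this is a systemd issue, or whether I just don't know how to properly modify `qubes-core.service` so that it is stopped before systemd tries to stop certain DM devices.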

I did try this and this answer, but the result didn't change.

Here is sample output from systemd when it stalls:

[ 443.660340] systemd[1]: qubes-core.service: Installed new job qubes-core.service/stop as 797
[ 443.660426] systemd[1]: dev-block-253:0.device: Installed new job dev-block-253:0.device/stop as 867
[ 533.755109] systemd[1]: dev-block-253:0.device: Job dev-block-253:0.device/stop timed out.
[ 534.047847] systemd[1]: qubes-core.service: About to execute: /usr/bin/pkill qubes-guid
[ 534.048939] systemd[1]: Stopping Qubes Dom0 startup setup...
[ 542.648718] systemd[1]: Stopped Qubes Dom0 startup setup.
[ 547.940019] systemd[1]: dev-block-253:0.device: Failed to send unit remove signal for dev-block-253:0.device: Transport endpoint is not connected

Compared to when it doesn't stall:

[ 67.643774] systemd[1]: dev-block-253:0.device: Installed new job dev-block-253:0.device/stop as 777
[ 67.643982] systemd[1]: qubes-core.service: Installed new job qubes-core.service/stop as 860
[   68.032308] systemd[1]: qubes-core.service: About to execute: /usr/bin/pkill qubes-guid
[ 68.033396] systemd[1]: Stopping Qubes Dom0 startup setup...
[ 76.932065] systemd[1]: Stopped Qubes Dom0 startup setup.
[ 76.985423] systemd[1]: dev-block-253:0.device: Redirecting stop request from dev-block-253:0.device to sys-devices-virtual-block-dm\x2d0.device.
[ 82.205556] systemd[1]: dev-block-253:0.device: Failed to send unit remove signal for dev-block-253:0.device: Transport endpoint is not connected

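Oddly, the no-stall run and then the stall above happened without my changing anything in systemd: the first 2 reboots had no stall, and the 3rd one stalled. (Full details are here.)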
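The length of the stall can be read off the timestamps in the stalled log. The sketch below does that arithmetic for lines in the format shown above; the function name `stop_job_timeouts` is hypothetical.

```shell
# For each stop job that timed out, print the unit name and how many seconds
# passed between "Installed new job <unit>/stop" and "Job <unit>/stop timed out".
# Reads log lines of the form "[ 443.660426] systemd[1]: ..." on stdin.
stop_job_timeouts() {
    awk '
        {
            # Extract the timestamp between "[" and the first "]".
            ts = $0
            sub(/^\[ */, "", ts)
            sub(/\].*$/, "", ts)
        }
        /Installed new job .*\/stop as/ {
            # Remember when the stop job for this unit was queued.
            unit = $0
            sub(/.*Installed new job /, "", unit)
            sub(/\/stop as.*/, "", unit)
            start[unit] = ts
        }
        /\/stop timed out/ {
            # Report the elapsed time once the same job times out.
            unit = $0
            sub(/.*: Job /, "", unit)
            sub(/\/stop timed out.*/, "", unit)
            if (unit in start) printf "%s %.1f\n", unit, ts - start[unit]
        }
    '
}
```

Fed the stalled log above, this prints `dev-block-253:0.device 90.1`, which lines up with the default `DefaultTimeoutStopSec` of 90 seconds.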

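How can I **safely shut down every running VM during a restart/shutdown** of Qubes OS 4.0? That is, without having to manually run a script before choosing Restart/Shutdown from the xfce menu.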

Possible ideas:

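What if all those devices that time out are stopped when the user logs out (`session-2.scope`?). That is, does the fact that they are listed by `systemctl --user status *.device` mean they might take priority? If so, they would always be stopped before `qubes-core.service` is stopped, because the latter is a `--system` unit. What do you think? Here is what happens when running `systemctl --user` (logged in, with VMs running): https://gist.github.com/constantoverride/a7dbad2146645387209b25e4c07de8ad#gistcomment-2701867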

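**EDIT:** I tried using a `--user` service, but it seems everything gets stopped at once (i.e. concurrently), so my script and the devices above time out at the same time.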

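**EDIT:** I found that either I don't know how, or there is no way, to tell systemd to stop (and finish stopping) my `--system` service before it tries to stop certain `.device` units. As a result, both my service and those `.device` units fail and time out at the same time (after 90 seconds). See the logs here.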

A code change (the commit shipped in `qubes-gui-dom0-4.0.8-1.29.fc25`) fixed the problem, so redsparrow's workaround is no longer needed.

The commit is reproduced here:

From 612cfe5925d32d8af0269163ee3ad627de4a8226 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Marek=20Marczykowski-G=C3=B3recki?=
<marmarek@invisiblethingslab.com>
Date: Thu, 13 Sep 2018 12:22:19 +0200
Subject: [PATCH] xside: avoid making X11 calls in signal handler

This is very simlar fix to QubesOS/qubes-issues#1406
2148a00 "Do not make X11 requests in X11 error handler"

Since signals can be sent asynchronously at any time, it could also hit
processing another X11 message. For this reason, avoid making X11 calls
if exit() is called from signal handler.

Fixes QubesOS/qubes-issues#1581
---
gui-daemon/xside.c | 7 +++++++
1 file changed, 7 insertions(+)

diff --git a/gui-daemon/xside.c b/gui-daemon/xside.c
index cca28da..3e12012 100644
--- a/gui-daemon/xside.c
+++ b/gui-daemon/xside.c
@@ -2455,6 +2455,13 @@ static void handle_message(Ghandles * g)
/* signal handler - connected to SIGTERM */
static void dummy_signal_handler(int UNUSED(x))
{
+    /* The exit(0) below will call release_all_mapped_mfns (registerd with
+     * atexit(3)), which would try to release window images with XShmDetach. We
+     * can't send X11 requests if one is currently being handled. Since signals
+     * are asynchronous, we don't know that. Clean window images
+     * without calling to X11. And hope that X server will call XShmDetach
+     * internally when cleaning windows of disconnected client */
+    release_all_shm_no_x11_calls();
    exit(0);
}

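What this does is allow `qubes-guid` to terminate safely (e.g. on SIGTERM), so it no longer needs the SIGKILL from redsparrow's workaround. For the rest of the details, see redsparrow's answer.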

Source: https://unix.stackexchange.com/questions/467232