How can I safely shut down every running VM during restart/shutdown of Qubes OS 4.0, without a timeout-induced stall/delay? (system issue)
Due to some issues affecting Qubes 4.0, when restarting or shutting down the computer from dom0, there is a noticeable delay (a stall) before the operation completes, unless all running VMs are shut down first.
Before choosing Restart/Shutdown from xfce's Logout menu, I have to manually run a script that shuts down all VMs; otherwise I can expect a stall of at least 30 seconds (that is, if I have lowered `DefaultTimeoutStopSec` from the default `90s` to `30s`). Here is the script and a sample run:
```
[ctor@dom0 ~]$ cat preshutdown
#!/bin/bash
xl list
time qvm-shutdown --verbose --all --wait; ec="$?"
echo "exitcode: '$ec'"
time while xl list|grep -q -F '(null)'; do xl list;sleep 1; done
exit $ec

$ ./preshutdown
Name                                 ID   Mem VCPUs      State   Time(s)
Domain-0                              0  4080     6     r-----     108.6
sys-net                               1   384     2     -b----       7.0
sys-net-dm                            2   144     1     -b----      16.5
sys-firewall                          3  2917     2     -b----       9.7
gmail-basedon-w-s-f-fdr28             4  3247     2     -b----      28.6
stackexchangelogins-w-s-f-fdr28       5  3241     2     -b----      24.3
dev01-w-s-f-fdr28                     7  8481     6     -b----      32.6
2018-09-06 09:37:08,187 [MainProcess selector_events.__init__:65] asyncio: Using selector: EpollSelector

real    0m14.959s
user    0m0.065s
sys     0m0.017s
exitcode: '0'
Name                                 ID   Mem VCPUs      State   Time(s)
Domain-0                              0  4095     6     r-----     123.0
(null)                                1     0     1     --ps-d       7.8
(null)                                3     0     0     --ps-d      11.0
Name                                 ID   Mem VCPUs      State   Time(s)
Domain-0                              0  4095     6     r-----     123.1
(null)                                1     0     1     --ps-d       7.8
(null)                                3     0     0     --ps-d      11.0
Name                                 ID   Mem VCPUs      State   Time(s)
Domain-0                              0  4095     6     r-----     123.4
(null)                                1     0     1     --ps-d       7.8
(null)                                3     0     0     --ps-d      11.0
Name                                 ID   Mem VCPUs      State   Time(s)
Domain-0                              0  4095     6     r-----     123.7
(null)                                1     0     1     --ps-d       7.8
Name                                 ID   Mem VCPUs      State   Time(s)
Domain-0                              0  4095     6     r-----     123.8
(null)                                1     0     1     --ps-d       7.8
Name                                 ID   Mem VCPUs      State   Time(s)
Domain-0                              0  4095     6     r-----     123.9
(null)                                1     0     1     --ps-d       7.8
Name                                 ID   Mem VCPUs      State   Time(s)
Domain-0                              0  4095     6     r-----     124.0
(null)                                1     0     1     --ps-d       7.8

real    0m7.093s
user    0m0.024s
sys     0m0.085s
```
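The core of the script is the polling loop at the end: keep listing domains until no dying `(null)` entries remain. Below is a self-contained sketch of that idea so the logic can be exercised anywhere: `list_domains` is a stand-in for `xl list` (it simulates the dying domain disappearing after two polls), the `sleep` is dropped, and all names other than the loop structure are hypothetical.

```shell
#!/bin/bash
# Sketch of the polling idea from `preshutdown`: poll a listing command
# until no dying '(null)' domains remain, with an upper bound on polls.

list_domains() {
    # Stand-in for `xl list`: the '(null)' domain vanishes after 2 polls.
    if [ "${POLLS:-0}" -lt 2 ]; then
        printf 'Domain-0  0  4095  6  r-----  123.0\n'
        printf '(null)    1     0  1  --ps-d    7.8\n'
    else
        printf 'Domain-0  0  4095  6  r-----  124.0\n'
    fi
}

wait_for_domains() {
    local max=${1:-10}
    POLLS=0
    while list_domains | grep -q -F '(null)'; do
        POLLS=$((POLLS + 1))
        [ "$POLLS" -ge "$max" ] && return 1   # give up: treat as timeout
    done
    return 0
}

wait_for_domains 10 && echo "all domains gone after $POLLS polls"
# prints: all domains gone after 2 polls
```

In the real script the return value of `qvm-shutdown` is preserved in `ec` and re-used as the exit code, so a failed shutdown is still visible to the caller.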
However, dom0 is stuck on Fedora 25 (Fedora 28 is only available for VMs), so `systemd` cannot easily be updated (or I don't yet know how): it is at version 231, while 240 is the latest on GitHub. I am not sure whether this is a systemd issue, or whether I simply don't know how to properly modify `qubes-core.service` so that it is stopped before systemd tries to stop certain DM (device-mapper) devices.
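For reference, such an ordering would normally be expressed as a drop-in for `qubes-core.service`. This is only a sketch: the device unit name is taken from the logs below, and systemd may not reliably honor ordering against transient `.device` units, which could be exactly the problem here.

```ini
# /etc/systemd/system/qubes-core.service.d/50-stop-before-dm.conf (hypothetical)
[Unit]
# Stop order is the reverse of start order: declaring that this service
# starts After= the device should make systemd stop it before the device.
After=dev-block-253:0.device
```

After adding a drop-in, `systemctl daemon-reload` is needed for it to take effect.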
Here is sample `systemd` output when it stalls on stop:

```
[  443.660340] systemd[1]: qubes-core.service: Installed new job qubes-core.service/stop as 797
[  443.660426] systemd[1]: dev-block-253:0.device: Installed new job dev-block-253:0.device/stop as 867
[  533.755109] systemd[1]: dev-block-253:0.device: Job dev-block-253:0.device/stop timed out.
[  534.047847] systemd[1]: qubes-core.service: About to execute: /usr/bin/pkill qubes-guid
[  534.048939] systemd[1]: Stopping Qubes Dom0 startup setup...
[  542.648718] systemd[1]: Stopped Qubes Dom0 startup setup.
[  547.940019] systemd[1]: dev-block-253:0.device: Failed to send unit remove signal for dev-block-253:0.device: Transport endpoint is not connected
```
Compare with when it does not stall:

```
[   67.643774] systemd[1]: dev-block-253:0.device: Installed new job dev-block-253:0.device/stop as 777
[   67.643982] systemd[1]: qubes-core.service: Installed new job qubes-core.service/stop as 860
[   68.032308] systemd[1]: qubes-core.service: About to execute: /usr/bin/pkill qubes-guid
[   68.033396] systemd[1]: Stopping Qubes Dom0 startup setup...
[   76.932065] systemd[1]: Stopped Qubes Dom0 startup setup.
[   76.985423] systemd[1]: dev-block-253:0.device: Redirecting stop request from dev-block-253:0.device to sys-devices-virtual-block-dm\x2d0.device.
[   82.205556] systemd[1]: dev-block-253:0.device: Failed to send unit remove signal for dev-block-253:0.device: Transport endpoint is not connected
```
Strangely, the no-stall case and then the stall above both happened without me changing anything in `systemd`: the first two reboots did not stall, the third one did. (Full details here.) **How can I safely shut down every running VM during restart/shutdown of Qubes OS 4.0?** That is, without having to manually run a script before choosing Restart/Shutdown from the xfce menu.
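One way to avoid running the script by hand would be to wrap it in a `--system` unit whose `ExecStop=` fires at shutdown. This is only a sketch, assuming the script is installed as `/usr/local/bin/preshutdown`; as the edits further down note, an approach along these lines did not help in practice, because systemd runs the stop jobs concurrently with the `.device` ones.

```ini
# /etc/systemd/system/preshutdown.service (hypothetical unit)
[Unit]
Description=Shut down all Qubes VMs before reboot/poweroff
Before=shutdown.target

[Service]
Type=oneshot
RemainAfterExit=yes
ExecStart=/bin/true
# Stop actions run at shutdown, in reverse of start order:
ExecStop=/usr/local/bin/preshutdown

[Install]
WantedBy=multi-user.target
```

Enable it once with `systemctl enable --now preshutdown.service`.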
A possible idea:
What if all those timing-out devices were stopped at user logout (as part of `session-2.scope`?) — that is, if they were listed by `systemctl --user status '*.device'` — would that give them priority? They would then always be stopped before `qubes-core.service`, since the latter is a `--system` unit. What do you think? Here is what happens when running `systemctl --user` (while logged in with VMs running): https://gist.github.com/constantoverride/a7dbad2146645387209b25e4c07de8ad#gistcomment-2701867

**EDIT:** I tried using a `--user` service, but everything seems to get stopped at once (i.e. concurrently), so my script and the units above timed out simultaneously.

**EDIT:** I found that either I don't know how, or there is no way, to tell systemd to stop (and finish stopping) my `--system` service before it tries to stop certain `.device` units; so my service and those `.device` units both failed and timed out at the same time (after 90 seconds). See the logs here.
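That concurrent behavior matches how systemd runs independent stop jobs: absent an ordering dependency, jobs with the same timeout hit their deadline at about the same wall-clock moment. A toy shell model (not systemd, just parallel background jobs) illustrates why both time out together rather than back to back:

```shell
#!/bin/bash
# Toy model: two independent "stop jobs" started together.
# Each takes 2 seconds; run in parallel they finish in ~2s total,
# whereas sequential (ordered) execution would take ~4s.
stop_job() {            # $1 = unit name, $2 = seconds the stop takes
    sleep "$2"
    echo "$1: stopped"
}

start=$SECONDS
stop_job "my.service"       2 &
stop_job "dev-block.device" 2 &
wait                    # both deadlines pass concurrently
elapsed=$((SECONDS - start))
echo "elapsed: ~${elapsed}s"    # roughly 2, not 4
```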
This code change (this commit, shipped in `qubes-gui-dom0-4.0.8-1.29.fc25`) fixed the problem, so the redsparrow workaround is no longer needed. Reproducing the commit here:
```
From 612cfe5925d32d8af0269163ee3ad627de4a8226 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Marek=20Marczykowski-G=C3=B3recki?= <marmarek@invisiblethingslab.com>
Date: Thu, 13 Sep 2018 12:22:19 +0200
Subject: [PATCH] xside: avoid making X11 calls in signal handler

This is very simlar fix to QubesOS/qubes-issues#1406
2148a00 "Do not make X11 requests in X11 error handler"

Since signals can be sent asynchronously at any time, it could also hit
processing another X11 message. For this reason, avoid making X11 calls
if exit() is called from signal handler.

Fixes QubesOS/qubes-issues#1581
---
 gui-daemon/xside.c | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/gui-daemon/xside.c b/gui-daemon/xside.c
index cca28da..3e12012 100644
--- a/gui-daemon/xside.c
+++ b/gui-daemon/xside.c
@@ -2455,6 +2455,13 @@ static void handle_message(Ghandles * g)
 /* signal handler - connected to SIGTERM */
 static void dummy_signal_handler(int UNUSED(x))
 {
+    /* The exit(0) below will call release_all_mapped_mfns (registerd with
+     * atexit(3)), which would try to release window images with XShmDetach. We
+     * can't send X11 requests if one is currently being handled. Since signals
+     * are asynchronous, we don't know that. Clean window images
+     * without calling to X11. And hope that X server will call XShmDetach
+     * internally when cleaning windows of disconnected client */
+    release_all_shm_no_x11_calls();
     exit(0);
 }
```
What this does is allow `qubes-guid` to terminate cleanly (e.g. on SIGTERM), so redsparrow's SIGKILL is no longer needed. For the rest of the details, see the redsparrow answer.