Gcc

How do I build the latest HandBrake on Linux with FDO (PGO) + LTO?

  • August 2, 2021

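Passing CFLAGS and CXXFLAGS to a build of the latest HandBrake (v1.3.3 at the time of writing) works until you add -flto, which makes the whole build fail.

How can I build HandBrake with the LTO option, -flto, and, as a stretch goal, with FDO (feedback-directed optimization, aka PGO) as well?

Most of the codecs in HandBrake are developed with hand-written assembly, so many people assert that compiler optimizations will not gain much.

I want to test and challenge that assertion!

I have retried with the latest tagged release, **HandBrake v1.4.0**, using GCC-11 and CLANG-12. Some changes were needed to build the desired configuration successfully. For example, the GCC-11 build failed for some modules because, after training, it could not resolve the path to the profile files (the gcda files stored under absolute paths).

Below are the training and FDO configurations for v1.4.0 with GCC-11 and CLANG-12. The process differs from my earlier answer for HandBrake v1.3.3.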

GCC-11:

Configure and build command for GCC-11:

./configure --harden --optimize=speed --enable-fdk-aac --disable-nvenc --build=build-v1.4.0 && cd ./build-v1.4.0 && time make -j$(( $(nproc) + 1 ));

Training/profiling stage, HandBrake v1.4.0 –> GCC-11 custom.defs file:

GCC.args.O.speed = -march=native -msse2avx -O3 -pipe -fprofile-generate -fprofile-update=atomic
GCC.args.extra = -mfpmath=sse -mavx -fstack-protector-strong -D_FORTIFY_SOURCE=2 -march=native -msse2avx -O3 -pipe -fprofile-generate -fprofile-update=atomic
X265.GCC.args.O.speed = -march=native -msse2avx -O3 -pipe -fprofile-generate -fprofile-update=atomic -flto
X265.GCC.args.extra = -mfpmath=sse -mavx -fstack-protector-strong -D_FORTIFY_SOURCE=2 -march=native -msse2avx -O3 -pipe -fprofile-generate -fprofile-update=atomic -flto
X265_8.GCC.args.O.speed = -march=native -msse2avx -O3 -pipe -fprofile-generate -fprofile-update=atomic -flto
X265_8.GCC.args.extra = -mfpmath=sse -mavx -fstack-protector-strong -D_FORTIFY_SOURCE=2 -march=native -msse2avx -O3 -pipe -fprofile-generate -fprofile-update=atomic -flto
X265_10.GCC.args.O.speed = -march=native -msse2avx -O3 -pipe -fprofile-generate -fprofile-update=atomic -flto
X265_10.GCC.args.extra = -mfpmath=sse -mavx -fstack-protector-strong -D_FORTIFY_SOURCE=2 -march=native -msse2avx -O3 -pipe -fprofile-generate -fprofile-update=atomic -flto
X265_12.GCC.args.O.speed = -march=native -msse2avx -O3 -pipe -fprofile-generate -fprofile-update=atomic -flto
X265_12.GCC.args.extra = -mfpmath=sse -mavx -fstack-protector-strong -D_FORTIFY_SOURCE=2 -march=native -msse2avx -O3 -pipe -fprofile-generate -fprofile-update=atomic -flto
LIBHB.GCC.args.O.speed = -march=native -msse2avx -O3 -pipe -fprofile-generate -fprofile-update=atomic -flto
LIBHB.GCC.args.extra = -mfpmath=sse -mavx -fstack-protector-strong -D_FORTIFY_SOURCE=2 -march=native -msse2avx -O3 -pipe -fprofile-generate -fprofile-update=atomic -flto
LIBDAV1D.GCC.args.O.speed = -march=native -msse2avx -O3 -pipe -fprofile-generate -fprofile-update=atomic -flto
LIBDAV1D.GCC.args.extra = -mfpmath=sse -mavx -fstack-protector-strong -D_FORTIFY_SOURCE=2 -march=native -msse2avx -O3 -pipe -fprofile-generate -fprofile-update=atomic -flto
GTK.GCC.args.O.speed = -march=native -msse2avx -O3 -pipe -fprofile-generate -fprofile-update=atomic -flto
GTK.GCC.args.extra = -mfpmath=sse -mavx -fstack-protector-strong -D_FORTIFY_SOURCE=2 -march=native -msse2avx -O3 -pipe -fprofile-generate -fprofile-update=atomic -flto
LIBDVDREAD.GCC.args.O.speed = -march=native -msse2avx -O3 -pipe -fprofile-generate -fprofile-update=atomic -flto
LIBDVDREAD.GCC.args.extra = -mfpmath=sse -mavx -fstack-protector-strong -D_FORTIFY_SOURCE=2 -march=native -msse2avx -O3 -pipe -fprofile-generate -fprofile-update=atomic -flto
LIBDVDNAV.GCC.args.O.speed = -march=native -msse2avx -O3 -pipe -fprofile-generate -fprofile-update=atomic -flto
LIBDVDNAV.GCC.args.extra = -mfpmath=sse -mavx -fstack-protector-strong -D_FORTIFY_SOURCE=2 -march=native -msse2avx -O3 -pipe -fprofile-generate -fprofile-update=atomic -flto
LIBBLURAY.GCC.args.O.speed = -march=native -msse2avx -O3 -pipe -fprofile-generate -fprofile-update=atomic -flto
LIBBLURAY.GCC.args.extra = -mfpmath=sse -mavx -fstack-protector-strong -D_FORTIFY_SOURCE=2 -march=native -msse2avx -O3 -pipe -fprofile-generate -fprofile-update=atomic -flto
TEST.GCC.args.O.speed = -march=native -msse2avx -O3 -pipe -fprofile-generate -fprofile-update=atomic -flto
TEST.GCC.args.extra = -mfpmath=sse -mavx -fstack-protector-strong -D_FORTIFY_SOURCE=2 -march=native -msse2avx -O3 -pipe -fprofile-generate -fprofile-update=atomic -flto
FDKAAC.GCC.args.O.speed = -march=native -msse2avx -O3 -pipe -fprofile-generate -fprofile-update=atomic -flto
FDKAAC.GCC.args.extra = -mfpmath=sse -mavx -fstack-protector-strong -D_FORTIFY_SOURCE=2 -march=native -msse2avx -O3 -pipe -fprofile-generate -fprofile-update=atomic -flto
ZIMG.GCC.args.O.speed = -march=native -msse2avx -O3 -pipe -fprofile-generate -fprofile-update=atomic -flto
ZIMG.GCC.args.extra = -mfpmath=sse -mavx -fstack-protector-strong -D_FORTIFY_SOURCE=2 -march=native -msse2avx -O3 -pipe -fprofile-generate -fprofile-update=atomic -flto
LIBDAV1D.GCC.args.O.speed = -march=native -msse2avx -O3 -pipe -fprofile-generate -fprofile-update=atomic -flto
LIBDAV1D.GCC.args.extra = -mfpmath=sse -mavx -fstack-protector-strong -D_FORTIFY_SOURCE=2 -march=native -msse2avx -O3 -pipe -fprofile-generate -fprofile-update=atomic -flto
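The file above repeats the same two flag sets for every module, so it can be generated rather than typed. The following is a minimal sketch, assuming the module list and flags exactly as listed above; the function name gen_custom_defs and the variable names are illustrative and not part of HandBrake.

```shell
# Sketch: print the GCC-11 training-stage custom.defs to stdout.
# BASE/EXTRA/MODULES are copied from the listing above; the names are illustrative.
BASE="-march=native -msse2avx -O3 -pipe -fprofile-generate -fprofile-update=atomic"
EXTRA="-mfpmath=sse -mavx -fstack-protector-strong -D_FORTIFY_SOURCE=2"
# Modules that get -flto. The global GCC.args.* keys deliberately do not.
MODULES="X265 X265_8 X265_10 X265_12 LIBHB LIBDAV1D GTK LIBDVDREAD LIBDVDNAV LIBBLURAY TEST FDKAAC ZIMG"

gen_custom_defs() {
    # Global defaults, inherited by modules that are not overridden below.
    printf 'GCC.args.O.speed = %s\n' "$BASE"
    printf 'GCC.args.extra = %s %s\n' "$EXTRA" "$BASE"
    # Per-module overrides: same flags plus -flto.
    for m in $MODULES; do
        printf '%s.GCC.args.O.speed = %s -flto\n' "$m" "$BASE"
        printf '%s.GCC.args.extra = %s %s -flto\n' "$m" "$EXTRA" "$BASE"
    done
}

gen_custom_defs
```

Redirecting the output (gen_custom_defs > custom.defs) in the top-level source folder would produce the file. Swapping BASE for the -fprofile-use flags gives the FDO-stage variant.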

FDO stage, HandBrake v1.4.0 –> GCC-11 custom.defs file:

GCC.args.O.speed = -march=native -msse2avx -O3 -pipe -fprofile-use -fprofile-correction -fprofile-partial-training
GCC.args.extra = -mfpmath=sse -mavx -fstack-protector-strong -D_FORTIFY_SOURCE=2 -march=native -msse2avx -O3 -pipe -fprofile-use -fprofile-correction -fprofile-partial-training
X265.GCC.args.O.speed = -march=native -msse2avx -O3 -pipe -fprofile-use -fprofile-correction -fprofile-partial-training -flto
X265.GCC.args.extra = -mfpmath=sse -mavx -fstack-protector-strong -D_FORTIFY_SOURCE=2 -march=native -msse2avx -O3 -pipe -fprofile-use -fprofile-correction -fprofile-partial-training -flto
X265_8.GCC.args.O.speed = -march=native -msse2avx -O3 -pipe -fprofile-use -fprofile-correction -fprofile-partial-training -flto
X265_8.GCC.args.extra = -mfpmath=sse -mavx -fstack-protector-strong -D_FORTIFY_SOURCE=2 -march=native -msse2avx -O3 -pipe -fprofile-use -fprofile-correction -fprofile-partial-training -flto
X265_10.GCC.args.O.speed = -march=native -msse2avx -O3 -pipe -fprofile-use -fprofile-correction -fprofile-partial-training -flto
X265_10.GCC.args.extra = -mfpmath=sse -mavx -fstack-protector-strong -D_FORTIFY_SOURCE=2 -march=native -msse2avx -O3 -pipe -fprofile-use -fprofile-correction -fprofile-partial-training -flto
X265_12.GCC.args.O.speed = -march=native -msse2avx -O3 -pipe -fprofile-use -fprofile-correction -fprofile-partial-training -flto
X265_12.GCC.args.extra = -mfpmath=sse -mavx -fstack-protector-strong -D_FORTIFY_SOURCE=2 -march=native -msse2avx -O3 -pipe -fprofile-use -fprofile-correction -fprofile-partial-training -flto
LIBHB.GCC.args.O.speed = -march=native -msse2avx -O3 -pipe -fprofile-use -fprofile-correction -fprofile-partial-training -flto
LIBHB.GCC.args.extra = -mfpmath=sse -mavx -fstack-protector-strong -D_FORTIFY_SOURCE=2 -march=native -msse2avx -O3 -pipe -fprofile-use -fprofile-correction -fprofile-partial-training -flto
LIBDAV1D.GCC.args.O.speed = -march=native -msse2avx -O3 -pipe -fprofile-use -fprofile-correction -fprofile-partial-training -flto
LIBDAV1D.GCC.args.extra = -mfpmath=sse -mavx -fstack-protector-strong -D_FORTIFY_SOURCE=2 -march=native -msse2avx -O3 -pipe -fprofile-use -fprofile-correction -fprofile-partial-training -flto
GTK.GCC.args.O.speed = -march=native -msse2avx -O3 -pipe -fprofile-use -fprofile-correction -fprofile-partial-training -flto
GTK.GCC.args.extra = -mfpmath=sse -mavx -fstack-protector-strong -D_FORTIFY_SOURCE=2 -march=native -msse2avx -O3 -pipe -fprofile-use -fprofile-correction -fprofile-partial-training -flto
LIBDVDREAD.GCC.args.O.speed = -march=native -msse2avx -O3 -pipe -fprofile-use -fprofile-correction -fprofile-partial-training -flto
LIBDVDREAD.GCC.args.extra = -mfpmath=sse -mavx -fstack-protector-strong -D_FORTIFY_SOURCE=2 -march=native -msse2avx -O3 -pipe -fprofile-use -fprofile-correction -fprofile-partial-training -flto
LIBDVDNAV.GCC.args.O.speed = -march=native -msse2avx -O3 -pipe -fprofile-use -fprofile-correction -fprofile-partial-training -flto
LIBDVDNAV.GCC.args.extra = -mfpmath=sse -mavx -fstack-protector-strong -D_FORTIFY_SOURCE=2 -march=native -msse2avx -O3 -pipe -fprofile-use -fprofile-correction -fprofile-partial-training -flto
LIBBLURAY.GCC.args.O.speed = -march=native -msse2avx -O3 -pipe -fprofile-use -fprofile-correction -fprofile-partial-training -flto
LIBBLURAY.GCC.args.extra = -mfpmath=sse -mavx -fstack-protector-strong -D_FORTIFY_SOURCE=2 -march=native -msse2avx -O3 -pipe -fprofile-use -fprofile-correction -fprofile-partial-training -flto
TEST.GCC.args.O.speed = -march=native -msse2avx -O3 -pipe -fprofile-use -fprofile-correction -fprofile-partial-training -flto
TEST.GCC.args.extra = -mfpmath=sse -mavx -fstack-protector-strong -D_FORTIFY_SOURCE=2 -march=native -msse2avx -O3 -pipe -fprofile-use -fprofile-correction -fprofile-partial-training -flto
FDKAAC.GCC.args.O.speed = -march=native -msse2avx -O3 -pipe -fprofile-use -fprofile-correction -fprofile-partial-training -flto
FDKAAC.GCC.args.extra = -mfpmath=sse -mavx -fstack-protector-strong -D_FORTIFY_SOURCE=2 -march=native -msse2avx -O3 -pipe -fprofile-use -fprofile-correction -fprofile-partial-training -flto
ZIMG.GCC.args.O.speed = -march=native -msse2avx -O3 -pipe -fprofile-use -fprofile-correction -fprofile-partial-training -flto
ZIMG.GCC.args.extra = -mfpmath=sse -mavx -fstack-protector-strong -D_FORTIFY_SOURCE=2 -march=native -msse2avx -O3 -pipe -fprofile-use -fprofile-correction -fprofile-partial-training -flto
LIBDAV1D.GCC.args.O.speed = -march=native -msse2avx -O3 -pipe -fprofile-use -fprofile-correction -fprofile-partial-training -flto
LIBDAV1D.GCC.args.extra = -mfpmath=sse -mavx -fstack-protector-strong -D_FORTIFY_SOURCE=2 -march=native -msse2avx -O3 -pipe -fprofile-use -fprofile-correction -fprofile-partial-training -flto

LLVM-12/CLANG-12/LLD-12:

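PGO with Clang differs slightly from GCC. Most noticeably, with Clang/LLVM/LLD there is no problem resolving the modules' absolute paths, unlike with GCC and its default tools. However, Clang needs an extra merge step to combine the raw profiles that FDO requires.

So there are 3 steps:

  1. Training/profiling stage
  2. Merge the raw profile data
  3. FDO stage

The detailed commands for each step follow. The custom.defs files for steps 1 and 3 are listed after the three steps. This section only shows the commands each step requires, not custom.defs, so make sure the right custom.defs is in place before you run the configure and build commands:

  1. Configure and build command for LLVM-12/CLANG-12/LLD-12: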
LDFLAGS="-fuse-ld=lld" ./configure --ar /usr/bin/llvm-ar --ranlib /usr/bin/llvm-ranlib --strip /usr/bin/llvm-strip --cc /usr/bin/clang --optimize=speed --enable-fdk-aac --disable-nvenc --build=build-v1.4.0-CLANG && cd ./build-v1.4.0-CLANG && time LDFLAGS="-fuse-ld=lld" make -j$(( $(nproc) + 1 ));

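After the build, train/profile as usual, just as with GCC or as in the earlier instructions for v1.3.3 if you tried them.

  2. After training/profiling, merge the raw profile data. Replace the <Absolute-Path> and <Absolute-Path-To-Profile-Files> paths with the correct locations for your build.
llvm-profdata merge -output=<Absolute-Path>/handbrake.profdata <Absolute-Path-To-Profile-Files>/default_*.profraw
  3. FDO build. This is exactly the same one-line command as in step 1; the difference is in the custom.defs file.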
LDFLAGS="-fuse-ld=lld" ./configure --ar /usr/bin/llvm-ar --ranlib /usr/bin/llvm-ranlib --strip /usr/bin/llvm-strip --cc /usr/bin/clang --optimize=speed --enable-fdk-aac --disable-nvenc --build=build-v1.4.0-CLANG && cd ./build-v1.4.0-CLANG && time LDFLAGS="-fuse-ld=lld" make -j$(( $(nproc) + 1 ));
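The merge step is the easiest one to get wrong, for example by pointing it at a folder where no raw profiles were written. Below is a minimal sketch of a wrapper, assuming the default_*.profraw naming used above; the function name merge_profiles and the PROFDATA variable are illustrative.

```shell
# Sketch of step 2: merge raw profiles, refusing to run when none exist.
# PROFDATA lets you pick a versioned binary (for example llvm-profdata-12).
PROFDATA="${PROFDATA:-llvm-profdata}"

merge_profiles() {
    raw_dir=$1   # folder holding the default_*.profraw files
    out=$2       # path of the merged .profdata file to write
    # If nothing matches, the glob stays literal, so test the first expansion.
    set -- "$raw_dir"/default_*.profraw
    if [ ! -e "$1" ]; then
        echo "no default_*.profraw files in $raw_dir; run the training encodes first" >&2
        return 1
    fi
    "$PROFDATA" merge -output="$out" "$@"
}

# Usage (paths are placeholders, as in the steps above):
#   merge_profiles <Absolute-Path-To-Profile-Files> <Absolute-Path>/handbrake.profdata
```

The function only builds the same llvm-profdata merge command shown in step 2; the existence check is the one addition.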

Training/profiling stage, HandBrake v1.4.0 –> LLVM-12/CLANG-12/LLD-12 custom.defs file:

Remember to replace **<Absolute-Path-To-Profile-Files>** with the correct absolute path.

GCC.args.O.speed = -march=native -O3 -pipe -fprofile-instr-generate=<Absolute-Path-To-Profile-Files>/default_%m.profraw -fprofile-update=atomic
GCC.args.extra = -mfpmath=sse -mavx -fstack-protector-strong -D_FORTIFY_SOURCE=2 -march=native -O3 -pipe -fprofile-instr-generate=<Absolute-Path-To-Profile-Files>/default_%m.profraw -fprofile-update=atomic
X265.GCC.args.O.speed = -march=native -O3 -pipe -fprofile-instr-generate=<Absolute-Path-To-Profile-Files>/default_%m.profraw -fprofile-update=atomic -flto=thin
X265.GCC.args.extra = -mfpmath=sse -mavx -fstack-protector-strong -D_FORTIFY_SOURCE=2 -march=native -O3 -pipe -fprofile-instr-generate=<Absolute-Path-To-Profile-Files>/default_%m.profraw -fprofile-update=atomic -flto=thin
X265_8.GCC.args.O.speed = -march=native -O3 -pipe -fprofile-instr-generate=<Absolute-Path-To-Profile-Files>/default_%m.profraw -fprofile-update=atomic -flto=thin
X265_8.GCC.args.extra = -mfpmath=sse -mavx -fstack-protector-strong -D_FORTIFY_SOURCE=2 -march=native -O3 -pipe -fprofile-instr-generate=<Absolute-Path-To-Profile-Files>/default_%m.profraw -fprofile-update=atomic -flto=thin
X265_10.GCC.args.O.speed = -march=native -O3 -pipe -fprofile-instr-generate=<Absolute-Path-To-Profile-Files>/default_%m.profraw -fprofile-update=atomic -flto=thin
X265_10.GCC.args.extra = -mfpmath=sse -mavx -fstack-protector-strong -D_FORTIFY_SOURCE=2 -march=native -O3 -pipe -fprofile-instr-generate=<Absolute-Path-To-Profile-Files>/default_%m.profraw -fprofile-update=atomic -flto=thin
X265_12.GCC.args.O.speed = -march=native -O3 -pipe -fprofile-instr-generate=<Absolute-Path-To-Profile-Files>/default_%m.profraw -fprofile-update=atomic -flto=thin
X265_12.GCC.args.extra = -mfpmath=sse -mavx -fstack-protector-strong -D_FORTIFY_SOURCE=2 -march=native -O3 -pipe -fprofile-instr-generate=<Absolute-Path-To-Profile-Files>/default_%m.profraw -fprofile-update=atomic -flto=thin
LIBHB.GCC.args.O.speed = -march=native -O3 -pipe -fprofile-instr-generate=<Absolute-Path-To-Profile-Files>/default_%m.profraw -fprofile-update=atomic -flto=thin
LIBHB.GCC.args.extra = -mfpmath=sse -mavx -fstack-protector-strong -D_FORTIFY_SOURCE=2 -march=native -O3 -pipe -fprofile-instr-generate=<Absolute-Path-To-Profile-Files>/default_%m.profraw -fprofile-update=atomic -flto=thin
LIBDAV1D.GCC.args.O.speed = -march=native -O3 -pipe -fprofile-instr-generate=<Absolute-Path-To-Profile-Files>/default_%m.profraw -fprofile-update=atomic -flto=thin
LIBDAV1D.GCC.args.extra = -mfpmath=sse -mavx -fstack-protector-strong -D_FORTIFY_SOURCE=2 -march=native -O3 -pipe -fprofile-instr-generate=<Absolute-Path-To-Profile-Files>/default_%m.profraw -fprofile-update=atomic -flto=thin
GTK.GCC.args.O.speed = -march=native -O3 -pipe -fprofile-instr-generate=<Absolute-Path-To-Profile-Files>/default_%m.profraw -fprofile-update=atomic -flto=thin
GTK.GCC.args.extra = -mfpmath=sse -mavx -fstack-protector-strong -D_FORTIFY_SOURCE=2 -march=native -O3 -pipe -fprofile-instr-generate=<Absolute-Path-To-Profile-Files>/default_%m.profraw -fprofile-update=atomic -flto=thin
LIBDVDREAD.GCC.args.O.speed = -march=native -O3 -pipe -fprofile-instr-generate=<Absolute-Path-To-Profile-Files>/default_%m.profraw -fprofile-update=atomic -flto=thin
LIBDVDREAD.GCC.args.extra = -mfpmath=sse -mavx -fstack-protector-strong -D_FORTIFY_SOURCE=2 -march=native -O3 -pipe -fprofile-instr-generate=<Absolute-Path-To-Profile-Files>/default_%m.profraw -fprofile-update=atomic -flto=thin
LIBDVDNAV.GCC.args.O.speed = -march=native -O3 -pipe -fprofile-instr-generate=<Absolute-Path-To-Profile-Files>/default_%m.profraw -fprofile-update=atomic -flto=thin
LIBDVDNAV.GCC.args.extra = -mfpmath=sse -mavx -fstack-protector-strong -D_FORTIFY_SOURCE=2 -march=native -O3 -pipe -fprofile-instr-generate=<Absolute-Path-To-Profile-Files>/default_%m.profraw -fprofile-update=atomic -flto=thin
LIBBLURAY.GCC.args.O.speed = -march=native -O3 -pipe -fprofile-instr-generate=<Absolute-Path-To-Profile-Files>/default_%m.profraw -fprofile-update=atomic -flto=thin
LIBBLURAY.GCC.args.extra = -mfpmath=sse -mavx -fstack-protector-strong -D_FORTIFY_SOURCE=2 -march=native -O3 -pipe -fprofile-instr-generate=<Absolute-Path-To-Profile-Files>/default_%m.profraw -fprofile-update=atomic -flto=thin
TEST.GCC.args.O.speed = -march=native -O3 -pipe -fprofile-instr-generate=<Absolute-Path-To-Profile-Files>/default_%m.profraw -fprofile-update=atomic -flto=thin
TEST.GCC.args.extra = -mfpmath=sse -mavx -fstack-protector-strong -D_FORTIFY_SOURCE=2 -march=native -O3 -pipe -fprofile-instr-generate=<Absolute-Path-To-Profile-Files>/default_%m.profraw -fprofile-update=atomic -flto=thin
FDKAAC.GCC.args.O.speed = -march=native -O3 -pipe -fprofile-instr-generate=<Absolute-Path-To-Profile-Files>/default_%m.profraw -fprofile-update=atomic -flto=thin
FDKAAC.GCC.args.extra = -mfpmath=sse -mavx -fstack-protector-strong -D_FORTIFY_SOURCE=2 -march=native -O3 -pipe -fprofile-instr-generate=<Absolute-Path-To-Profile-Files>/default_%m.profraw -fprofile-update=atomic -flto=thin
ZIMG.GCC.args.O.speed = -march=native -O3 -pipe -fprofile-instr-generate=<Absolute-Path-To-Profile-Files>/default_%m.profraw -fprofile-update=atomic -flto=thin
ZIMG.GCC.args.extra = -mfpmath=sse -mavx -fstack-protector-strong -D_FORTIFY_SOURCE=2 -march=native -O3 -pipe -fprofile-instr-generate=<Absolute-Path-To-Profile-Files>/default_%m.profraw -fprofile-update=atomic -flto=thin
LIBDAV1D.GCC.args.O.speed = -march=native -O3 -pipe -fprofile-instr-generate=<Absolute-Path-To-Profile-Files>/default_%m.profraw -fprofile-update=atomic -flto=thin
LIBDAV1D.GCC.args.extra = -mfpmath=sse -mavx -fstack-protector-strong -D_FORTIFY_SOURCE=2 -march=native -O3 -pipe -fprofile-instr-generate=<Absolute-Path-To-Profile-Files>/default_%m.profraw -fprofile-update=atomic -flto=thin

FDO stage, HandBrake v1.4.0 –> LLVM-12/CLANG-12/LLD-12 custom.defs file:

Remember to replace **<Absolute-Path-To-Merged-Profile>** with the correct absolute path.

GCC.args.O.speed = -march=native -O3 -pipe -fprofile-instr-use=<Absolute-Path-To-Merged-Profile>/handbrake.profdata
GCC.args.extra = -mfpmath=sse -mavx -fstack-protector-strong -D_FORTIFY_SOURCE=2 -march=native -O3 -pipe -fprofile-instr-use=<Absolute-Path-To-Merged-Profile>/handbrake.profdata
X265.GCC.args.O.speed = -march=native -O3 -pipe -fprofile-instr-use=<Absolute-Path-To-Merged-Profile>/handbrake.profdata -flto=thin
X265.GCC.args.extra = -mfpmath=sse -mavx -fstack-protector-strong -D_FORTIFY_SOURCE=2 -march=native -O3 -pipe -fprofile-instr-use=<Absolute-Path-To-Merged-Profile>/handbrake.profdata -flto=thin
X265_8.GCC.args.O.speed = -march=native -O3 -pipe -fprofile-instr-use=<Absolute-Path-To-Merged-Profile>/handbrake.profdata -flto=thin
X265_8.GCC.args.extra = -mfpmath=sse -mavx -fstack-protector-strong -D_FORTIFY_SOURCE=2 -march=native -O3 -pipe -fprofile-instr-use=<Absolute-Path-To-Merged-Profile>/handbrake.profdata -flto=thin
X265_10.GCC.args.O.speed = -march=native -O3 -pipe -fprofile-instr-use=<Absolute-Path-To-Merged-Profile>/handbrake.profdata -flto=thin
X265_10.GCC.args.extra = -mfpmath=sse -mavx -fstack-protector-strong -D_FORTIFY_SOURCE=2 -march=native -O3 -pipe -fprofile-instr-use=<Absolute-Path-To-Merged-Profile>/handbrake.profdata -flto=thin
X265_12.GCC.args.O.speed = -march=native -O3 -pipe -fprofile-instr-use=<Absolute-Path-To-Merged-Profile>/handbrake.profdata -flto=thin
X265_12.GCC.args.extra = -mfpmath=sse -mavx -fstack-protector-strong -D_FORTIFY_SOURCE=2 -march=native -O3 -pipe -fprofile-instr-use=<Absolute-Path-To-Merged-Profile>/handbrake.profdata -flto=thin
LIBHB.GCC.args.O.speed = -march=native -O3 -pipe -fprofile-instr-use=<Absolute-Path-To-Merged-Profile>/handbrake.profdata -flto=thin
LIBHB.GCC.args.extra = -mfpmath=sse -mavx -fstack-protector-strong -D_FORTIFY_SOURCE=2 -march=native -O3 -pipe -fprofile-instr-use=<Absolute-Path-To-Merged-Profile>/handbrake.profdata -flto=thin
LIBDAV1D.GCC.args.O.speed = -march=native -O3 -pipe -fprofile-instr-use=<Absolute-Path-To-Merged-Profile>/handbrake.profdata -flto=thin
LIBDAV1D.GCC.args.extra = -mfpmath=sse -mavx -fstack-protector-strong -D_FORTIFY_SOURCE=2 -march=native -O3 -pipe -fprofile-instr-use=<Absolute-Path-To-Merged-Profile>/handbrake.profdata -flto=thin
GTK.GCC.args.O.speed = -march=native -O3 -pipe -fprofile-instr-use=<Absolute-Path-To-Merged-Profile>/handbrake.profdata -flto=thin
GTK.GCC.args.extra = -mfpmath=sse -mavx -fstack-protector-strong -D_FORTIFY_SOURCE=2 -march=native -O3 -pipe -fprofile-instr-use=<Absolute-Path-To-Merged-Profile>/handbrake.profdata -flto=thin
LIBDVDREAD.GCC.args.O.speed = -march=native -O3 -pipe -fprofile-instr-use=<Absolute-Path-To-Merged-Profile>/handbrake.profdata -flto=thin
LIBDVDREAD.GCC.args.extra = -mfpmath=sse -mavx -fstack-protector-strong -D_FORTIFY_SOURCE=2 -march=native -O3 -pipe -fprofile-instr-use=<Absolute-Path-To-Merged-Profile>/handbrake.profdata -flto=thin
LIBDVDNAV.GCC.args.O.speed = -march=native -O3 -pipe -fprofile-instr-use=<Absolute-Path-To-Merged-Profile>/handbrake.profdata -flto=thin
LIBDVDNAV.GCC.args.extra = -mfpmath=sse -mavx -fstack-protector-strong -D_FORTIFY_SOURCE=2 -march=native -O3 -pipe -fprofile-instr-use=<Absolute-Path-To-Merged-Profile>/handbrake.profdata -flto=thin
LIBBLURAY.GCC.args.O.speed = -march=native -O3 -pipe -fprofile-instr-use=<Absolute-Path-To-Merged-Profile>/handbrake.profdata -flto=thin
LIBBLURAY.GCC.args.extra = -mfpmath=sse -mavx -fstack-protector-strong -D_FORTIFY_SOURCE=2 -march=native -O3 -pipe -fprofile-instr-use=<Absolute-Path-To-Merged-Profile>/handbrake.profdata -flto=thin
TEST.GCC.args.O.speed = -march=native -O3 -pipe -fprofile-instr-use=<Absolute-Path-To-Merged-Profile>/handbrake.profdata -flto=thin
TEST.GCC.args.extra = -mfpmath=sse -mavx -fstack-protector-strong -D_FORTIFY_SOURCE=2 -march=native -O3 -pipe -fprofile-instr-use=<Absolute-Path-To-Merged-Profile>/handbrake.profdata -flto=thin
FDKAAC.GCC.args.O.speed = -march=native -O3 -pipe -fprofile-instr-use=<Absolute-Path-To-Merged-Profile>/handbrake.profdata -flto=thin
FDKAAC.GCC.args.extra = -mfpmath=sse -mavx -fstack-protector-strong -D_FORTIFY_SOURCE=2 -march=native -O3 -pipe -fprofile-instr-use=<Absolute-Path-To-Merged-Profile>/handbrake.profdata -flto=thin
ZIMG.GCC.args.O.speed = -march=native -O3 -pipe -fprofile-instr-use=<Absolute-Path-To-Merged-Profile>/handbrake.profdata -flto=thin
ZIMG.GCC.args.extra = -mfpmath=sse -mavx -fstack-protector-strong -D_FORTIFY_SOURCE=2 -march=native -O3 -pipe -fprofile-instr-use=<Absolute-Path-To-Merged-Profile>/handbrake.profdata -flto=thin
LIBDAV1D.GCC.args.O.speed = -march=native -O3 -pipe -fprofile-instr-use=<Absolute-Path-To-Merged-Profile>/handbrake.profdata -flto=thin
LIBDAV1D.GCC.args.extra = -mfpmath=sse -mavx -fstack-protector-strong -D_FORTIFY_SOURCE=2 -march=native -O3 -pipe -fprofile-instr-use=<Absolute-Path-To-Merged-Profile>/handbrake.profdata -flto=thin

There you go: you can now build HandBrake v1.4.0 with PGO + LTO using either GCC or LLVM/CLANG/LLD. Pick whichever of the two you prefer, or benchmark to your heart's content! :-)

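Edit 01/08/2021... Everything below was done for HandBrake v1.3.3. See my updated answer for HandBrake v1.4.0.

I answered a question on GitHub similar to the one I asked here, and I think the answer serves the public better on Stack Exchange, where it addresses similar questions, than buried in a GitHub issue... https://github.com/HandBrake/HandBrake/issues/1072#issuecomment-865630524

The observed benefits will also serve those willing to put in the effort, saving them a lot of encoding/transcoding time. They can benchmark their work afterwards to test the assertion.

Most of the procedure was derived from the notes described here, followed by experimentation... https://github.com/griff/HandBrake/blob/master/doc/BUILD-Linux

As stated in the link above, using CFLAGS/CXXFLAGS to steer the compilation or build is not recommended. Use the built-in configuration mechanism to set the gcc flags instead.

How?

HandBrake is just a front end to many "contrib" modules. To see how each contrib module will be built, you can use the "make" reports for each contrib in the build (target) directory before building them.

To get a build directory, you need to do an initial configure via...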

$  ./configure --build=build --optimize=speed

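if you have not done so already.

Make reports

For example, assuming you are building HandBrake in a folder named "build" (the value given to the configure command above), then: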

$  cd ./build
$  make report.help
 AVAILABLE MAKEFILE VARS REPORTS
 ----------------------------------------------------------------
 report.main            global general vars
 report.gcc             global gcc vars (inherited by module GCC)
 report.var             usage: make report.var name=VARNAME
 x265.report            X265-scoped vars
 x265_8.report          X265_8-scoped vars
 x265_10.report         X265_10-scoped vars
 x265_12.report         X265_12-scoped vars
 libdav1d.report        LIBDAV1D-scoped vars
 ffmpeg.report          FFMPEG-scoped vars
 libdvdread.report      LIBDVDREAD-scoped vars
 libdvdnav.report       LIBDVDNAV-scoped vars
 libbluray.report       LIBBLURAY-scoped vars
 nvenc.report           NVENC-scoped vars
 libhb.report           LIBHB-scoped vars
 test.report            TEST-scoped vars
 gtk.report             GTK-scoped vars
 pkg.report             PKG-scoped vars

The first column of each line above names a report. You can then view a report with

$  make <report_name>

replacing <report_name> with the report you want.
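To inspect every report in one go, the report names can be pulled out of the report.help listing. This is a minimal sketch, assuming the two-column layout shown above; list_reports is an illustrative helper name, and report.var is skipped because it needs a name=VARNAME argument.

```shell
# list_reports: read `make report.help` output on stdin and print the
# report target names (first column), skipping report.var.
list_reports() {
    awk '$1 ~ /^report\.|\.report$/ && $1 != "report.var" { print $1 }'
}

# Usage, from inside the build directory (writes one text file per report):
#   mkdir -p reports
#   make report.help | list_reports | while read -r r; do
#       make "$r" > "reports/$r.txt"
#   done
```

Diffing the saved files before and after adding a custom.defs shows which modules actually picked up the overridden flags.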

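It is important to note that there is a hierarchy, with inheritance, among the reports above.

report.gcc

can be taken as the root for the gcc flags.

In my case, I had earlier chosen to build with the "speed" configuration...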

$  ./configure --build=build --optimize=speed

which maps to

GCC.args.O.speed

inside report.gcc.

Another important key in that report is

GCC.args.extra

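This basically "may" append extra compiler option flags after the former. As you know with gcc, if options conflict, the last one is used. Since we cannot easily tell whether a given module uses one, the other, or both, I tend to make sure that whatever is in the former is also in the latter. The latter can contain more, though! You can see the defaults by inspecting the reports.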
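The "last one wins" rule is easy to check for optimization levels, where GCC documents that the last -O option given is the effective one. The sketch below only models that one case on a flag string; effective_opt_level is an illustrative helper, and the two flag values are hypothetical, chosen to conflict on purpose.

```shell
# effective_opt_level: print the last -O* flag in a flag string,
# which is the level gcc would actually use.
effective_opt_level() {
    level=""
    for flag in $1; do
        case $flag in
            -O*) level=$flag ;;
        esac
    done
    printf '%s\n' "$level"
}

# Hypothetical values: O.speed asks for -O3 but extra still carries -O2.
O_SPEED="-march=native -O3 -pipe"
EXTRA="-mfpmath=sse -O2"
effective_opt_level "$O_SPEED $EXTRA"   # prints -O2
```

This is why keeping the same flags in both keys is a safe default: whichever order or subset a module ends up using, the result is the same.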

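You can override the above by creating a text configuration file named "custom.defs" in the root of the HandBrake source folder (if you git-cloned it, that is the top-level HandBrake folder where you would run git pull).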

/HandBrake$ ls -h
AUTHORS.markdown  CODE_OF_CONDUCT.md  CONTRIBUTING.md  download  gtk      macosx         pkg              scripts      THANKS.markdown
build             configure           COPYING          gccFDO    libhb    make           preset           SECURITY.md  TRANSLATION.markdown
build2            contrib             custom.defs    graphics  LICENSE  NEWS.markdown  README.markdown  test         win

FDO (aka PGO)

I use FDO (feedback-directed optimization, aka PGO, profile-guided optimization) in my builds, so I usually build first with custom.defs defined as

$ cat custom.defs 
GCC.args.O.speed = -march=native -O3 -pipe -fprofile-generate=../gccFDO -fprofile-update=atomic
GCC.args.extra = -mfpmath=sse -march=native -O3 -pipe -fprofile-generate=../gccFDO -fprofile-update=atomic

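and then run HandBrake to transcode several videos with different codecs, filters, and settings for a few days to generate the profiles. Then I use the generated profiles via...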

$ cat custom.defs 
GCC.args.O.speed = -march=native -O3 -pipe -fprofile-use=../gccFDO -fprofile-correction -fprofile-partial-training
GCC.args.extra = -mfpmath=sse -march=native -O3 -pipe -fprofile-use=../gccFDO -fprofile-correction -fprofile-partial-training

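in a brand-new build directory. Good profiling subjects are the source types you typically feed into your typical target encode. My typical target is x265_10bit with AAC audio:

  1. x264 to x265_10bit
  2. x265 to x265_10bit
  3. The various forms of AC3 to the AAC you typically use
  4. The various forms of DTS to the AAC you typically use
  5. Any preprocessing, filtering, denoising, etc. that you typically use.

As you can imagine, depending on your hardware, this can take a while! My profiling took a week!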
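The training runs can be scripted so that the instrumented binary is exercised the same way each time. What follows is only a sketch under stated assumptions: the folder names, the train_all function, and the RUN/HB_CLI variables are hypothetical, the encoder choices mirror the x265_10bit + AAC target described above, and by default the script only prints the commands instead of encoding.

```shell
# Sketch: run (or just print) one training encode per sample video.
TRAIN_DIR="${TRAIN_DIR:-./training-sources}"   # hypothetical folder of sample videos
OUT_DIR="${OUT_DIR:-./training-out}"           # must exist before a real run
HB_CLI="${HB_CLI:-HandBrakeCLI}"               # point this at the instrumented build
RUN="${RUN-echo}"                              # dry run by default; set RUN= to encode

train_all() {
    for src in "$TRAIN_DIR"/*; do
        [ -f "$src" ] || continue
        name=$(basename "$src")
        # -e selects the video encoder, -E the audio encoder.
        $RUN "$HB_CLI" -i "$src" -o "$OUT_DIR/${name%.*}.mkv" -e x265_10bit -E av_aac
    done
}

train_all
```

Any filters or presets you normally use belong on that command line too, since only the code paths exercised during training end up in the profile.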

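You can fine-tune the compiler flags and optimizations of each module by using the report process described above and overriding its keys with the values you want in the custom.defs file, just like the GCC.args.* default example above.

For all of the above to work, remember not to export CFLAGS or CXXFLAGS. You can check which flags are set in your bash session with: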

$  export -p | grep FLAGS
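The same check can be turned into a guard that runs before ./configure. This is a minimal sketch; flags_exported is an illustrative name, and it only looks at CFLAGS and CXXFLAGS.

```shell
# flags_exported: print any exported CFLAGS/CXXFLAGS and succeed if one exists.
# env lists only exported variables, which is what the build would inherit.
flags_exported() {
    env | grep -E '^(CFLAGS|CXXFLAGS)='
}

if flags_exported >/dev/null; then
    echo "CFLAGS/CXXFLAGS are exported; run 'unset CFLAGS CXXFLAGS' before ./configure" >&2
fi
```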

LTO + FDO:

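Link-time optimization (LTO) combines very well with FDO, as the many write-ups and benchmarks that are easy to find via Google show.

Unfortunately, making LTO the default by using -flto in GCC.args.*, or setting LTO for the FFMPEG module, makes the whole build **fail**. That is a Boolean "or": it fails with one, the other, or both!

However, LTO can be added to all the other modules!

Here is my custom.defs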

$ cat custom.defs
GCC.args.O.speed = -march=native -O3 -pipe -fprofile-use=../gccFDO -fprofile-correction -fprofile-partial-training
GCC.args.extra = -mfpmath=sse -march=native -O3 -pipe -fprofile-use=../gccFDO -fprofile-correction -fprofile-partial-training
X265.GCC.args.O.speed = -march=native -O3 -pipe -flto -fprofile-use=../gccFDO -fprofile-correction -fprofile-partial-training
X265.GCC.args.extra = -mfpmath=sse -march=native -O3 -pipe -flto -fprofile-use=../gccFDO -fprofile-correction -fprofile-partial-training
X265_8.GCC.args.O.speed = -march=native -O3 -pipe -flto -fprofile-use=../gccFDO -fprofile-correction -fprofile-partial-training
X265_8.GCC.args.extra = -mfpmath=sse -march=native -O3 -pipe -flto -fprofile-use=../gccFDO -fprofile-correction -fprofile-partial-training
X265_10.GCC.args.O.speed = -march=native -O3 -pipe -flto -fprofile-use=../gccFDO -fprofile-correction -fprofile-partial-training
X265_10.GCC.args.extra = -mfpmath=sse -march=native -O3 -pipe -flto -fprofile-use=../gccFDO -fprofile-correction -fprofile-partial-training
X265_12.GCC.args.O.speed = -march=native -O3 -pipe -flto -fprofile-use=../gccFDO -fprofile-correction -fprofile-partial-training
X265_12.GCC.args.extra = -mfpmath=sse -march=native -O3 -pipe -flto -fprofile-use=../gccFDO -fprofile-correction -fprofile-partial-training
LIBHB.GCC.args.O.speed = -march=native -O3 -pipe -flto -fprofile-use=../gccFDO -fprofile-correction -fprofile-partial-training
LIBHB.GCC.args.extra = -mfpmath=sse -march=native -O3 -pipe -flto -fprofile-use=../gccFDO -fprofile-correction -fprofile-partial-training
LIBDAV1D.GCC.args.O.speed = -march=native -O3 -pipe -flto -fprofile-use=../gccFDO -fprofile-correction -fprofile-partial-training
LIBDAV1D.GCC.args.extra = -mfpmath=sse -march=native -O3 -pipe -flto -fprofile-use=../gccFDO -fprofile-correction -fprofile-partial-training
GTK.GCC.args.O.speed = -march=native -O3 -pipe -flto -fprofile-use=../gccFDO -fprofile-correction -fprofile-partial-training
GTK.GCC.args.extra = -mfpmath=sse -march=native -O3 -pipe -flto -fprofile-use=../gccFDO -fprofile-correction -fprofile-partial-training
LIBDVDREAD.GCC.args.O.speed = -march=native -O3 -pipe -flto -fprofile-use=../gccFDO -fprofile-correction -fprofile-partial-training
LIBDVDREAD.GCC.args.extra = -mfpmath=sse -march=native -O3 -pipe -flto -fprofile-use=../gccFDO -fprofile-correction -fprofile-partial-training
LIBDVDNAV.GCC.args.O.speed = -march=native -O3 -pipe -flto -fprofile-use=../gccFDO -fprofile-correction -fprofile-partial-training
LIBDVDNAV.GCC.args.extra = -mfpmath=sse -march=native -O3 -pipe -flto -fprofile-use=../gccFDO -fprofile-correction -fprofile-partial-training
LIBBLURAY.GCC.args.O.speed = -march=native -O3 -pipe -flto -fprofile-use=../gccFDO -fprofile-correction -fprofile-partial-training
LIBBLURAY.GCC.args.extra = -mfpmath=sse -march=native -O3 -pipe -flto -fprofile-use=../gccFDO -fprofile-correction -fprofile-partial-training
TEST.GCC.args.O.speed = -march=native -O3 -pipe -flto -fprofile-use=../gccFDO -fprofile-correction -fprofile-partial-training
TEST.GCC.args.extra = -mfpmath=sse -march=native -O3 -pipe -flto -fprofile-use=../gccFDO -fprofile-correction -fprofile-partial-training
NVENC.GCC.args.O.speed = -march=native -O3 -pipe -flto -fprofile-use=../gccFDO -fprofile-correction -fprofile-partial-training
NVENC.GCC.args.extra = -mfpmath=sse -march=native -O3 -pipe -flto -fprofile-use=../gccFDO -fprofile-correction -fprofile-partial-training

Edit 01/08/2021... Everything above was done for HandBrake v1.3.3.

For v1.4.0, the above process failed for me. See my other answer for v1.4.0.

Quoted from: https://unix.stackexchange.com/questions/655384