How do I build the latest HandBrake on Linux with FDO (PGO) + LTO?
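Passing CFLAGS and CXXFLAGS to a build of the latest HandBrake release (v1.3.3 at the time of writing) works until you add `-flto`, at which point the whole build fails. How can I build HandBrake with the LTO option `-flto` and, as a stretch goal, with FDO (feedback-directed optimization, also known as PGO)? Most of the codecs in HandBrake are developed with hand-written assembly, so many people assert that compiler optimizations will not gain much.
I want to test and challenge that assertion!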
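I have retried with the latest tagged release, **HandBrake v1.4.0**, using GCC-11 and CLANG-12. Some changes were needed to build the desired configuration successfully. For example, the GCC-11 build failed for some modules because it could not resolve the path of the profile files after training (gcda files under an absolute path).
Below are the training and FDO configurations for GCC-11 and CLANG-12 with v1.4.0. The procedure differs from my earlier answer for HandBrake v1.3.3.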
GCC-11:
Configure and build command for GCC-11:
./configure --harden --optimize=speed --enable-fdk-aac --disable-nvenc --build=build-v1.4.0 && cd ./build-v1.4.0 && time make -j$(( $(nproc) + 1 ));
Training/profiling stage, HandBrake v1.4.0 -> GCC-11 custom.defs file:
GCC.args.O.speed = -march=native -msse2avx -O3 -pipe -fprofile-generate -fprofile-update=atomic
GCC.args.extra = -mfpmath=sse -mavx -fstack-protector-strong -D_FORTIFY_SOURCE=2 -march=native -msse2avx -O3 -pipe -fprofile-generate -fprofile-update=atomic
X265.GCC.args.O.speed = -march=native -msse2avx -O3 -pipe -fprofile-generate -fprofile-update=atomic -flto
X265.GCC.args.extra = -mfpmath=sse -mavx -fstack-protector-strong -D_FORTIFY_SOURCE=2 -march=native -msse2avx -O3 -pipe -fprofile-generate -fprofile-update=atomic -flto
X265_8.GCC.args.O.speed = -march=native -msse2avx -O3 -pipe -fprofile-generate -fprofile-update=atomic -flto
X265_8.GCC.args.extra = -mfpmath=sse -mavx -fstack-protector-strong -D_FORTIFY_SOURCE=2 -march=native -msse2avx -O3 -pipe -fprofile-generate -fprofile-update=atomic -flto
X265_10.GCC.args.O.speed = -march=native -msse2avx -O3 -pipe -fprofile-generate -fprofile-update=atomic -flto
X265_10.GCC.args.extra = -mfpmath=sse -mavx -fstack-protector-strong -D_FORTIFY_SOURCE=2 -march=native -msse2avx -O3 -pipe -fprofile-generate -fprofile-update=atomic -flto
X265_12.GCC.args.O.speed = -march=native -msse2avx -O3 -pipe -fprofile-generate -fprofile-update=atomic -flto
X265_12.GCC.args.extra = -mfpmath=sse -mavx -fstack-protector-strong -D_FORTIFY_SOURCE=2 -march=native -msse2avx -O3 -pipe -fprofile-generate -fprofile-update=atomic -flto
LIBHB.GCC.args.O.speed = -march=native -msse2avx -O3 -pipe -fprofile-generate -fprofile-update=atomic -flto
LIBHB.GCC.args.extra = -mfpmath=sse -mavx -fstack-protector-strong -D_FORTIFY_SOURCE=2 -march=native -msse2avx -O3 -pipe -fprofile-generate -fprofile-update=atomic -flto
LIBDAV1D.GCC.args.O.speed = -march=native -msse2avx -O3 -pipe -fprofile-generate -fprofile-update=atomic -flto
LIBDAV1D.GCC.args.extra = -mfpmath=sse -mavx -fstack-protector-strong -D_FORTIFY_SOURCE=2 -march=native -msse2avx -O3 -pipe -fprofile-generate -fprofile-update=atomic -flto
GTK.GCC.args.O.speed = -march=native -msse2avx -O3 -pipe -fprofile-generate -fprofile-update=atomic -flto
GTK.GCC.args.extra = -mfpmath=sse -mavx -fstack-protector-strong -D_FORTIFY_SOURCE=2 -march=native -msse2avx -O3 -pipe -fprofile-generate -fprofile-update=atomic -flto
LIBDVDREAD.GCC.args.O.speed = -march=native -msse2avx -O3 -pipe -fprofile-generate -fprofile-update=atomic -flto
LIBDVDREAD.GCC.args.extra = -mfpmath=sse -mavx -fstack-protector-strong -D_FORTIFY_SOURCE=2 -march=native -msse2avx -O3 -pipe -fprofile-generate -fprofile-update=atomic -flto
LIBDVDNAV.GCC.args.O.speed = -march=native -msse2avx -O3 -pipe -fprofile-generate -fprofile-update=atomic -flto
LIBDVDNAV.GCC.args.extra = -mfpmath=sse -mavx -fstack-protector-strong -D_FORTIFY_SOURCE=2 -march=native -msse2avx -O3 -pipe -fprofile-generate -fprofile-update=atomic -flto
LIBBLURAY.GCC.args.O.speed = -march=native -msse2avx -O3 -pipe -fprofile-generate -fprofile-update=atomic -flto
LIBBLURAY.GCC.args.extra = -mfpmath=sse -mavx -fstack-protector-strong -D_FORTIFY_SOURCE=2 -march=native -msse2avx -O3 -pipe -fprofile-generate -fprofile-update=atomic -flto
TEST.GCC.args.O.speed = -march=native -msse2avx -O3 -pipe -fprofile-generate -fprofile-update=atomic -flto
TEST.GCC.args.extra = -mfpmath=sse -mavx -fstack-protector-strong -D_FORTIFY_SOURCE=2 -march=native -msse2avx -O3 -pipe -fprofile-generate -fprofile-update=atomic -flto
FDKAAC.GCC.args.O.speed = -march=native -msse2avx -O3 -pipe -fprofile-generate -fprofile-update=atomic -flto
FDKAAC.GCC.args.extra = -mfpmath=sse -mavx -fstack-protector-strong -D_FORTIFY_SOURCE=2 -march=native -msse2avx -O3 -pipe -fprofile-generate -fprofile-update=atomic -flto
ZIMG.GCC.args.O.speed = -march=native -msse2avx -O3 -pipe -fprofile-generate -fprofile-update=atomic -flto
ZIMG.GCC.args.extra = -mfpmath=sse -mavx -fstack-protector-strong -D_FORTIFY_SOURCE=2 -march=native -msse2avx -O3 -pipe -fprofile-generate -fprofile-update=atomic -flto
FDO stage, HandBrake v1.4.0 -> GCC-11 custom.defs file:
GCC.args.O.speed = -march=native -msse2avx -O3 -pipe -fprofile-use -fprofile-correction -fprofile-partial-training
GCC.args.extra = -mfpmath=sse -mavx -fstack-protector-strong -D_FORTIFY_SOURCE=2 -march=native -msse2avx -O3 -pipe -fprofile-use -fprofile-correction -fprofile-partial-training
X265.GCC.args.O.speed = -march=native -msse2avx -O3 -pipe -fprofile-use -fprofile-correction -fprofile-partial-training -flto
X265.GCC.args.extra = -mfpmath=sse -mavx -fstack-protector-strong -D_FORTIFY_SOURCE=2 -march=native -msse2avx -O3 -pipe -fprofile-use -fprofile-correction -fprofile-partial-training -flto
X265_8.GCC.args.O.speed = -march=native -msse2avx -O3 -pipe -fprofile-use -fprofile-correction -fprofile-partial-training -flto
X265_8.GCC.args.extra = -mfpmath=sse -mavx -fstack-protector-strong -D_FORTIFY_SOURCE=2 -march=native -msse2avx -O3 -pipe -fprofile-use -fprofile-correction -fprofile-partial-training -flto
X265_10.GCC.args.O.speed = -march=native -msse2avx -O3 -pipe -fprofile-use -fprofile-correction -fprofile-partial-training -flto
X265_10.GCC.args.extra = -mfpmath=sse -mavx -fstack-protector-strong -D_FORTIFY_SOURCE=2 -march=native -msse2avx -O3 -pipe -fprofile-use -fprofile-correction -fprofile-partial-training -flto
X265_12.GCC.args.O.speed = -march=native -msse2avx -O3 -pipe -fprofile-use -fprofile-correction -fprofile-partial-training -flto
X265_12.GCC.args.extra = -mfpmath=sse -mavx -fstack-protector-strong -D_FORTIFY_SOURCE=2 -march=native -msse2avx -O3 -pipe -fprofile-use -fprofile-correction -fprofile-partial-training -flto
LIBHB.GCC.args.O.speed = -march=native -msse2avx -O3 -pipe -fprofile-use -fprofile-correction -fprofile-partial-training -flto
LIBHB.GCC.args.extra = -mfpmath=sse -mavx -fstack-protector-strong -D_FORTIFY_SOURCE=2 -march=native -msse2avx -O3 -pipe -fprofile-use -fprofile-correction -fprofile-partial-training -flto
LIBDAV1D.GCC.args.O.speed = -march=native -msse2avx -O3 -pipe -fprofile-use -fprofile-correction -fprofile-partial-training -flto
LIBDAV1D.GCC.args.extra = -mfpmath=sse -mavx -fstack-protector-strong -D_FORTIFY_SOURCE=2 -march=native -msse2avx -O3 -pipe -fprofile-use -fprofile-correction -fprofile-partial-training -flto
GTK.GCC.args.O.speed = -march=native -msse2avx -O3 -pipe -fprofile-use -fprofile-correction -fprofile-partial-training -flto
GTK.GCC.args.extra = -mfpmath=sse -mavx -fstack-protector-strong -D_FORTIFY_SOURCE=2 -march=native -msse2avx -O3 -pipe -fprofile-use -fprofile-correction -fprofile-partial-training -flto
LIBDVDREAD.GCC.args.O.speed = -march=native -msse2avx -O3 -pipe -fprofile-use -fprofile-correction -fprofile-partial-training -flto
LIBDVDREAD.GCC.args.extra = -mfpmath=sse -mavx -fstack-protector-strong -D_FORTIFY_SOURCE=2 -march=native -msse2avx -O3 -pipe -fprofile-use -fprofile-correction -fprofile-partial-training -flto
LIBDVDNAV.GCC.args.O.speed = -march=native -msse2avx -O3 -pipe -fprofile-use -fprofile-correction -fprofile-partial-training -flto
LIBDVDNAV.GCC.args.extra = -mfpmath=sse -mavx -fstack-protector-strong -D_FORTIFY_SOURCE=2 -march=native -msse2avx -O3 -pipe -fprofile-use -fprofile-correction -fprofile-partial-training -flto
LIBBLURAY.GCC.args.O.speed = -march=native -msse2avx -O3 -pipe -fprofile-use -fprofile-correction -fprofile-partial-training -flto
LIBBLURAY.GCC.args.extra = -mfpmath=sse -mavx -fstack-protector-strong -D_FORTIFY_SOURCE=2 -march=native -msse2avx -O3 -pipe -fprofile-use -fprofile-correction -fprofile-partial-training -flto
TEST.GCC.args.O.speed = -march=native -msse2avx -O3 -pipe -fprofile-use -fprofile-correction -fprofile-partial-training -flto
TEST.GCC.args.extra = -mfpmath=sse -mavx -fstack-protector-strong -D_FORTIFY_SOURCE=2 -march=native -msse2avx -O3 -pipe -fprofile-use -fprofile-correction -fprofile-partial-training -flto
FDKAAC.GCC.args.O.speed = -march=native -msse2avx -O3 -pipe -fprofile-use -fprofile-correction -fprofile-partial-training -flto
FDKAAC.GCC.args.extra = -mfpmath=sse -mavx -fstack-protector-strong -D_FORTIFY_SOURCE=2 -march=native -msse2avx -O3 -pipe -fprofile-use -fprofile-correction -fprofile-partial-training -flto
ZIMG.GCC.args.O.speed = -march=native -msse2avx -O3 -pipe -fprofile-use -fprofile-correction -fprofile-partial-training -flto
ZIMG.GCC.args.extra = -mfpmath=sse -mavx -fstack-protector-strong -D_FORTIFY_SOURCE=2 -march=native -msse2avx -O3 -pipe -fprofile-use -fprofile-correction -fprofile-partial-training -flto
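Because the two GCC listings differ only in their profiling flags, they can be generated instead of edited by hand. The following is a minimal sketch, assuming the module list and flags shown above; `gen_gcc_defs` is a hypothetical helper name, not something HandBrake ships.

```shell
#!/bin/sh
# Sketch: print a custom.defs for the GCC training or FDO stage described above.
# gen_gcc_defs is a hypothetical helper, not part of HandBrake.
gen_gcc_defs() {
    stage="$1"   # "train" or "use"
    base="-march=native -msse2avx -O3 -pipe"
    hard="-mfpmath=sse -mavx -fstack-protector-strong -D_FORTIFY_SOURCE=2"
    case "$stage" in
        train) prof="-fprofile-generate -fprofile-update=atomic" ;;
        use)   prof="-fprofile-use -fprofile-correction -fprofile-partial-training" ;;
        *)     echo "usage: gen_gcc_defs train|use" >&2; return 1 ;;
    esac

    # Global defaults carry no -flto, matching the listings above.
    echo "GCC.args.O.speed = $base $prof"
    echo "GCC.args.extra = $hard $base $prof"

    # Per-module overrides add -flto; FFMPEG is left out, as in the listings.
    for m in X265 X265_8 X265_10 X265_12 LIBHB LIBDAV1D GTK LIBDVDREAD \
             LIBDVDNAV LIBBLURAY TEST FDKAAC ZIMG; do
        echo "$m.GCC.args.O.speed = $base $prof -flto"
        echo "$m.GCC.args.extra = $hard $base $prof -flto"
    done
}

# Example (run in the HandBrake source root):
#   gen_gcc_defs train > custom.defs   # before the training build
#   gen_gcc_defs use   > custom.defs   # before the FDO build
```

Keeping the flags in one place makes it harder for the `O.speed` and `extra` values of a module to drift apart between the two stages.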
LLVM-12/CLANG-12/LLD-12:
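Clang's PGO differs slightly from GCC's. Most obviously, resolving the absolute paths of modules is not a problem with Clang/LLVM/LLD, unlike with GCC and its default tools. However, Clang needs an extra merge step to combine the raw profiles that FDO requires.
So there are 3 steps:
- Training/profiling stage
- Merging the raw profile data
- FDO stage
The commands for each step follow. The custom.defs files for steps 1 and 3 are listed after the three steps below. This section only illustrates the commands each step needs, not custom.defs, so make sure the right custom.defs is in place before you run the configure and build commands:
- Configure and build command for LLVM-12/CLANG-12/LLD-12: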
LDFLAGS="-fuse-ld=lld" ./configure --ar /usr/bin/llvm-ar --ranlib /usr/bin/llvm-ranlib --strip /usr/bin/llvm-strip --cc /usr/bin/clang --optimize=speed --enable-fdk-aac --disable-nvenc --build=build-v1.4.0-CLANG && cd ./build-v1.4.0-CLANG && time LDFLAGS="-fuse-ld=lld" make -j$(( $(nproc) + 1 ));
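After building, train/profile as usual, just as you would with GCC or as you did if you tried the earlier instructions for v1.3.3.
- After training/profiling, merge the raw profile data. Replace the `<Absolute-Path>` and `<Absolute-Path-To-Profile-Files>` paths with the correct locations for your build.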
llvm-profdata merge -output=<Absolute-Path>/handbrake.profdata <Absolute-Path-To-Profile-Files>/default_*.profraw
- FDO build: this is exactly the same one-line command as in step 1. The difference is in the custom.defs file.
LDFLAGS="-fuse-ld=lld" ./configure --ar /usr/bin/llvm-ar --ranlib /usr/bin/llvm-ranlib --strip /usr/bin/llvm-strip --cc /usr/bin/clang --optimize=speed --enable-fdk-aac --disable-nvenc --build=build-v1.4.0-CLANG && cd ./build-v1.4.0-CLANG && time LDFLAGS="-fuse-ld=lld" make -j$(( $(nproc) + 1 ));
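The merge in step 2 fails unhelpfully when the training run produced no raw profiles, so it can be worth guarding. Below is a minimal sketch that wraps the `llvm-profdata merge` command from the answer; `merge_profiles` and the `LLVM_PROFDATA` override variable are hypothetical names introduced only for illustration.

```shell
#!/bin/sh
# Sketch of step 2: merge raw Clang profiles, refusing to run when none exist.
# merge_profiles and the LLVM_PROFDATA override are hypothetical helpers.
merge_profiles() {
    dir="$1"   # absolute path holding default_*.profraw
    out="$2"   # absolute path of the merged profile, e.g. .../handbrake.profdata
    set -- "$dir"/default_*.profraw
    if [ ! -e "$1" ]; then
        echo "no default_*.profraw in $dir; run the training encodes first" >&2
        return 1
    fi
    "${LLVM_PROFDATA:-llvm-profdata}" merge -output="$out" "$@"
}

# Example, keeping the placeholders from the answer:
#   merge_profiles "<Absolute-Path-To-Profile-Files>" "<Absolute-Path>/handbrake.profdata"
```

The merged file is the one that the step 3 custom.defs points `-fprofile-instr-use=` at, so both paths should stay absolute.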
Training/profiling stage, HandBrake v1.4.0 -> LLVM-12/CLANG-12/LLD-12 custom.defs file:
Remember to replace **<Absolute-Path-To-Profile-Files>** with the correct absolute path.
GCC.args.O.speed = -march=native -O3 -pipe -fprofile-instr-generate=<Absolute-Path-To-Profile-Files>/default_%m.profraw -fprofile-update=atomic
GCC.args.extra = -mfpmath=sse -mavx -fstack-protector-strong -D_FORTIFY_SOURCE=2 -march=native -O3 -pipe -fprofile-instr-generate=<Absolute-Path-To-Profile-Files>/default_%m.profraw -fprofile-update=atomic
X265.GCC.args.O.speed = -march=native -O3 -pipe -fprofile-instr-generate=<Absolute-Path-To-Profile-Files>/default_%m.profraw -fprofile-update=atomic -flto=thin
X265.GCC.args.extra = -mfpmath=sse -mavx -fstack-protector-strong -D_FORTIFY_SOURCE=2 -march=native -O3 -pipe -fprofile-instr-generate=<Absolute-Path-To-Profile-Files>/default_%m.profraw -fprofile-update=atomic -flto=thin
X265_8.GCC.args.O.speed = -march=native -O3 -pipe -fprofile-instr-generate=<Absolute-Path-To-Profile-Files>/default_%m.profraw -fprofile-update=atomic -flto=thin
X265_8.GCC.args.extra = -mfpmath=sse -mavx -fstack-protector-strong -D_FORTIFY_SOURCE=2 -march=native -O3 -pipe -fprofile-instr-generate=<Absolute-Path-To-Profile-Files>/default_%m.profraw -fprofile-update=atomic -flto=thin
X265_10.GCC.args.O.speed = -march=native -O3 -pipe -fprofile-instr-generate=<Absolute-Path-To-Profile-Files>/default_%m.profraw -fprofile-update=atomic -flto=thin
X265_10.GCC.args.extra = -mfpmath=sse -mavx -fstack-protector-strong -D_FORTIFY_SOURCE=2 -march=native -O3 -pipe -fprofile-instr-generate=<Absolute-Path-To-Profile-Files>/default_%m.profraw -fprofile-update=atomic -flto=thin
X265_12.GCC.args.O.speed = -march=native -O3 -pipe -fprofile-instr-generate=<Absolute-Path-To-Profile-Files>/default_%m.profraw -fprofile-update=atomic -flto=thin
X265_12.GCC.args.extra = -mfpmath=sse -mavx -fstack-protector-strong -D_FORTIFY_SOURCE=2 -march=native -O3 -pipe -fprofile-instr-generate=<Absolute-Path-To-Profile-Files>/default_%m.profraw -fprofile-update=atomic -flto=thin
LIBHB.GCC.args.O.speed = -march=native -O3 -pipe -fprofile-instr-generate=<Absolute-Path-To-Profile-Files>/default_%m.profraw -fprofile-update=atomic -flto=thin
LIBHB.GCC.args.extra = -mfpmath=sse -mavx -fstack-protector-strong -D_FORTIFY_SOURCE=2 -march=native -O3 -pipe -fprofile-instr-generate=<Absolute-Path-To-Profile-Files>/default_%m.profraw -fprofile-update=atomic -flto=thin
LIBDAV1D.GCC.args.O.speed = -march=native -O3 -pipe -fprofile-instr-generate=<Absolute-Path-To-Profile-Files>/default_%m.profraw -fprofile-update=atomic -flto=thin
LIBDAV1D.GCC.args.extra = -mfpmath=sse -mavx -fstack-protector-strong -D_FORTIFY_SOURCE=2 -march=native -O3 -pipe -fprofile-instr-generate=<Absolute-Path-To-Profile-Files>/default_%m.profraw -fprofile-update=atomic -flto=thin
GTK.GCC.args.O.speed = -march=native -O3 -pipe -fprofile-instr-generate=<Absolute-Path-To-Profile-Files>/default_%m.profraw -fprofile-update=atomic -flto=thin
GTK.GCC.args.extra = -mfpmath=sse -mavx -fstack-protector-strong -D_FORTIFY_SOURCE=2 -march=native -O3 -pipe -fprofile-instr-generate=<Absolute-Path-To-Profile-Files>/default_%m.profraw -fprofile-update=atomic -flto=thin
LIBDVDREAD.GCC.args.O.speed = -march=native -O3 -pipe -fprofile-instr-generate=<Absolute-Path-To-Profile-Files>/default_%m.profraw -fprofile-update=atomic -flto=thin
LIBDVDREAD.GCC.args.extra = -mfpmath=sse -mavx -fstack-protector-strong -D_FORTIFY_SOURCE=2 -march=native -O3 -pipe -fprofile-instr-generate=<Absolute-Path-To-Profile-Files>/default_%m.profraw -fprofile-update=atomic -flto=thin
LIBDVDNAV.GCC.args.O.speed = -march=native -O3 -pipe -fprofile-instr-generate=<Absolute-Path-To-Profile-Files>/default_%m.profraw -fprofile-update=atomic -flto=thin
LIBDVDNAV.GCC.args.extra = -mfpmath=sse -mavx -fstack-protector-strong -D_FORTIFY_SOURCE=2 -march=native -O3 -pipe -fprofile-instr-generate=<Absolute-Path-To-Profile-Files>/default_%m.profraw -fprofile-update=atomic -flto=thin
LIBBLURAY.GCC.args.O.speed = -march=native -O3 -pipe -fprofile-instr-generate=<Absolute-Path-To-Profile-Files>/default_%m.profraw -fprofile-update=atomic -flto=thin
LIBBLURAY.GCC.args.extra = -mfpmath=sse -mavx -fstack-protector-strong -D_FORTIFY_SOURCE=2 -march=native -O3 -pipe -fprofile-instr-generate=<Absolute-Path-To-Profile-Files>/default_%m.profraw -fprofile-update=atomic -flto=thin
TEST.GCC.args.O.speed = -march=native -O3 -pipe -fprofile-instr-generate=<Absolute-Path-To-Profile-Files>/default_%m.profraw -fprofile-update=atomic -flto=thin
TEST.GCC.args.extra = -mfpmath=sse -mavx -fstack-protector-strong -D_FORTIFY_SOURCE=2 -march=native -O3 -pipe -fprofile-instr-generate=<Absolute-Path-To-Profile-Files>/default_%m.profraw -fprofile-update=atomic -flto=thin
FDKAAC.GCC.args.O.speed = -march=native -O3 -pipe -fprofile-instr-generate=<Absolute-Path-To-Profile-Files>/default_%m.profraw -fprofile-update=atomic -flto=thin
FDKAAC.GCC.args.extra = -mfpmath=sse -mavx -fstack-protector-strong -D_FORTIFY_SOURCE=2 -march=native -O3 -pipe -fprofile-instr-generate=<Absolute-Path-To-Profile-Files>/default_%m.profraw -fprofile-update=atomic -flto=thin
ZIMG.GCC.args.O.speed = -march=native -O3 -pipe -fprofile-instr-generate=<Absolute-Path-To-Profile-Files>/default_%m.profraw -fprofile-update=atomic -flto=thin
ZIMG.GCC.args.extra = -mfpmath=sse -mavx -fstack-protector-strong -D_FORTIFY_SOURCE=2 -march=native -O3 -pipe -fprofile-instr-generate=<Absolute-Path-To-Profile-Files>/default_%m.profraw -fprofile-update=atomic -flto=thin
FDO stage, HandBrake v1.4.0 -> LLVM-12/CLANG-12/LLD-12 custom.defs file:
Remember to replace **<Absolute-Path-To-Merged-Profile>** with the correct absolute path.
GCC.args.O.speed = -march=native -O3 -pipe -fprofile-instr-use=<Absolute-Path-To-Merged-Profile>/handbrake.profdata
GCC.args.extra = -mfpmath=sse -mavx -fstack-protector-strong -D_FORTIFY_SOURCE=2 -march=native -O3 -pipe -fprofile-instr-use=<Absolute-Path-To-Merged-Profile>/handbrake.profdata
X265.GCC.args.O.speed = -march=native -O3 -pipe -fprofile-instr-use=<Absolute-Path-To-Merged-Profile>/handbrake.profdata -flto=thin
X265.GCC.args.extra = -mfpmath=sse -mavx -fstack-protector-strong -D_FORTIFY_SOURCE=2 -march=native -O3 -pipe -fprofile-instr-use=<Absolute-Path-To-Merged-Profile>/handbrake.profdata -flto=thin
X265_8.GCC.args.O.speed = -march=native -O3 -pipe -fprofile-instr-use=<Absolute-Path-To-Merged-Profile>/handbrake.profdata -flto=thin
X265_8.GCC.args.extra = -mfpmath=sse -mavx -fstack-protector-strong -D_FORTIFY_SOURCE=2 -march=native -O3 -pipe -fprofile-instr-use=<Absolute-Path-To-Merged-Profile>/handbrake.profdata -flto=thin
X265_10.GCC.args.O.speed = -march=native -O3 -pipe -fprofile-instr-use=<Absolute-Path-To-Merged-Profile>/handbrake.profdata -flto=thin
X265_10.GCC.args.extra = -mfpmath=sse -mavx -fstack-protector-strong -D_FORTIFY_SOURCE=2 -march=native -O3 -pipe -fprofile-instr-use=<Absolute-Path-To-Merged-Profile>/handbrake.profdata -flto=thin
X265_12.GCC.args.O.speed = -march=native -O3 -pipe -fprofile-instr-use=<Absolute-Path-To-Merged-Profile>/handbrake.profdata -flto=thin
X265_12.GCC.args.extra = -mfpmath=sse -mavx -fstack-protector-strong -D_FORTIFY_SOURCE=2 -march=native -O3 -pipe -fprofile-instr-use=<Absolute-Path-To-Merged-Profile>/handbrake.profdata -flto=thin
LIBHB.GCC.args.O.speed = -march=native -O3 -pipe -fprofile-instr-use=<Absolute-Path-To-Merged-Profile>/handbrake.profdata -flto=thin
LIBHB.GCC.args.extra = -mfpmath=sse -mavx -fstack-protector-strong -D_FORTIFY_SOURCE=2 -march=native -O3 -pipe -fprofile-instr-use=<Absolute-Path-To-Merged-Profile>/handbrake.profdata -flto=thin
LIBDAV1D.GCC.args.O.speed = -march=native -O3 -pipe -fprofile-instr-use=<Absolute-Path-To-Merged-Profile>/handbrake.profdata -flto=thin
LIBDAV1D.GCC.args.extra = -mfpmath=sse -mavx -fstack-protector-strong -D_FORTIFY_SOURCE=2 -march=native -O3 -pipe -fprofile-instr-use=<Absolute-Path-To-Merged-Profile>/handbrake.profdata -flto=thin
GTK.GCC.args.O.speed = -march=native -O3 -pipe -fprofile-instr-use=<Absolute-Path-To-Merged-Profile>/handbrake.profdata -flto=thin
GTK.GCC.args.extra = -mfpmath=sse -mavx -fstack-protector-strong -D_FORTIFY_SOURCE=2 -march=native -O3 -pipe -fprofile-instr-use=<Absolute-Path-To-Merged-Profile>/handbrake.profdata -flto=thin
LIBDVDREAD.GCC.args.O.speed = -march=native -O3 -pipe -fprofile-instr-use=<Absolute-Path-To-Merged-Profile>/handbrake.profdata -flto=thin
LIBDVDREAD.GCC.args.extra = -mfpmath=sse -mavx -fstack-protector-strong -D_FORTIFY_SOURCE=2 -march=native -O3 -pipe -fprofile-instr-use=<Absolute-Path-To-Merged-Profile>/handbrake.profdata -flto=thin
LIBDVDNAV.GCC.args.O.speed = -march=native -O3 -pipe -fprofile-instr-use=<Absolute-Path-To-Merged-Profile>/handbrake.profdata -flto=thin
LIBDVDNAV.GCC.args.extra = -mfpmath=sse -mavx -fstack-protector-strong -D_FORTIFY_SOURCE=2 -march=native -O3 -pipe -fprofile-instr-use=<Absolute-Path-To-Merged-Profile>/handbrake.profdata -flto=thin
LIBBLURAY.GCC.args.O.speed = -march=native -O3 -pipe -fprofile-instr-use=<Absolute-Path-To-Merged-Profile>/handbrake.profdata -flto=thin
LIBBLURAY.GCC.args.extra = -mfpmath=sse -mavx -fstack-protector-strong -D_FORTIFY_SOURCE=2 -march=native -O3 -pipe -fprofile-instr-use=<Absolute-Path-To-Merged-Profile>/handbrake.profdata -flto=thin
TEST.GCC.args.O.speed = -march=native -O3 -pipe -fprofile-instr-use=<Absolute-Path-To-Merged-Profile>/handbrake.profdata -flto=thin
TEST.GCC.args.extra = -mfpmath=sse -mavx -fstack-protector-strong -D_FORTIFY_SOURCE=2 -march=native -O3 -pipe -fprofile-instr-use=<Absolute-Path-To-Merged-Profile>/handbrake.profdata -flto=thin
FDKAAC.GCC.args.O.speed = -march=native -O3 -pipe -fprofile-instr-use=<Absolute-Path-To-Merged-Profile>/handbrake.profdata -flto=thin
FDKAAC.GCC.args.extra = -mfpmath=sse -mavx -fstack-protector-strong -D_FORTIFY_SOURCE=2 -march=native -O3 -pipe -fprofile-instr-use=<Absolute-Path-To-Merged-Profile>/handbrake.profdata -flto=thin
ZIMG.GCC.args.O.speed = -march=native -O3 -pipe -fprofile-instr-use=<Absolute-Path-To-Merged-Profile>/handbrake.profdata -flto=thin
ZIMG.GCC.args.extra = -mfpmath=sse -mavx -fstack-protector-strong -D_FORTIFY_SOURCE=2 -march=native -O3 -pipe -fprofile-instr-use=<Absolute-Path-To-Merged-Profile>/handbrake.profdata -flto=thin
There you go: you can now build HandBrake v1.4.0 with PGO+LTO using either GCC or LLVM/CLANG/LLD. Pick whichever of the two you prefer, or benchmark to your heart's content! :-)
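EDIT 01/08/2021... Everything below was done against HandBrake v1.3.3. **See my updated answer for HandBrake v1.4.0.**
I answered a question on GitHub similar to the one I asked here, and thought the answer would serve the public better by addressing a similar question on Stack Exchange than by being buried in a GitHub issue ticket... https://github.com/HandBrake/HandBrake/issues/1072#issuecomment-865630524
In addition, the observed benefits will serve those who are willing to put in the effort, and save them a lot of encoding/transcoding time. They can benchmark their work afterwards to test the assertion.
Most of the procedure was derived from the notes described here, plus experimentation... https://github.com/griff/HandBrake/blob/master/doc/BUILD-Linux
As the link above explains, using CFLAGS/CXXFLAGS to steer the compile or build is not recommended. The recommendation is to use the built-in configuration mechanism to set the gcc flags.
How?
HandBrake is just a front end to a lot of "contrib" modules. To see how each contrib module is built, you can use the "make" reports for each contrib in the build or target directory, before you build them.
To get a build directory, you need to do an initial configure with...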
$ ./configure --build=build --optimize=speed
if you haven't already.
Make reports
For example, assuming you are building HandBrake in a folder called "build" (the value in the configure command above), then:
$ cd ./build
$ make report.help
AVAILABLE MAKEFILE VARS REPORTS
----------------------------------------------------------------
report.main         global general vars
report.gcc          global gcc vars (inherited by module GCC)
report.var          usage: make report.var name=VARNAME
x265.report         X265-scoped vars
x265_8.report       X265_8-scoped vars
x265_10.report      X265_10-scoped vars
x265_12.report      X265_12-scoped vars
libdav1d.report     LIBDAV1D-scoped vars
ffmpeg.report       FFMPEG-scoped vars
libdvdread.report   LIBDVDREAD-scoped vars
libdvdnav.report    LIBDVDNAV-scoped vars
libbluray.report    LIBBLURAY-scoped vars
nvenc.report        NVENC-scoped vars
libhb.report        LIBHB-scoped vars
test.report         TEST-scoped vars
gtk.report          GTK-scoped vars
pkg.report          PKG-scoped vars
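The first column of each line above names a report. You can then view a report with
$ make <report_name>
replacing `<report_name>` with the report you want. It is important to note that there is a hierarchy and inheritance among the above, even within each report. `report.gcc` can be taken as the root for the gcc flags.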
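To compare defaults before and after adding overrides, the reports can be saved to files. This is a minimal sketch, assuming it runs inside the build directory; `dump_reports` and the `HB_MAKE` override variable are hypothetical names, while the report targets and the `report.var name=VARNAME` form are the ones listed by `make report.help`.

```shell
#!/bin/sh
# Sketch: save selected "make" reports to text files for later comparison.
# dump_reports and the HB_MAKE override are hypothetical helpers.
dump_reports() {
    outdir="$1"; shift
    mkdir -p "$outdir"
    for r in "$@"; do
        # One file per report target, e.g. report.gcc.txt
        "${HB_MAKE:-make}" "$r" > "$outdir/$r.txt"
    done
}

# Example:
#   cd ./build
#   dump_reports ../reports-default report.gcc x265.report libhb.report
#   make report.var name=GCC.args.O.speed   # query a single variable
```

Running it once before and once after creating the override file gives two directories that can be compared with `diff -r`.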
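In my case, I chose earlier to build with the "speed" configuration...
$ ./configure --build=build --optimize=speed
which maps to `GCC.args.O.speed` in `report.gcc`. Another important key in that report is `GCC.args.extra`, which basically "may" append extra compiler flags after the former. As you know with gcc, when options conflict, the last one wins. Since we cannot easily tell whether a given module uses one, the other, or both, I tend to make sure that whatever is in the former is also in the latter. But the latter can contain more! You can see the defaults by checking the reports.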
You can override the above by creating a text file named "custom.defs" in the root of the HandBrake source folder (if you git cloned it, that is the top-level HandBrake folder where you run git pull).
/HandBrake$ ls -h
AUTHORS.markdown  CODE_OF_CONDUCT.md  CONTRIBUTING.md  download  gtk      macosx         pkg              scripts      THANKS.markdown
build             configure           COPYING          gccFDO    libhb    make           preset           SECURITY.md  TRANSLATION.markdown
build2            contrib             custom.defs      graphics  LICENSE  NEWS.markdown  README.markdown  test         win
FDO (aka PGO)
I do FDO (feedback-directed optimization, aka PGO, profile-guided optimization) on my builds, so I usually build first with `custom.defs` defined as
$ cat custom.defs
GCC.args.O.speed = -march=native -O3 -pipe -fprofile-generate=../gccFDO -fprofile-update=atomic
GCC.args.extra = -mfpmath=sse -march=native -O3 -pipe -fprofile-generate=../gccFDO -fprofile-update=atomic
I then run HandBrake to transcode a number of videos with different codecs, filters, and settings, generating profiles over several days. Then I rebuild using the generated profiles...
$ cat custom.defs
GCC.args.O.speed = -march=native -O3 -pipe -fprofile-use=../gccFDO -fprofile-correction -fprofile-partial-training
GCC.args.extra = -mfpmath=sse -march=native -O3 -pipe -fprofile-use=../gccFDO -fprofile-correction -fprofile-partial-training
in a brand new build directory. Good profiling subjects are the typical source types for your typical target encodes. My typical target is x265_10bit with AAC audio:
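- From x264 to x265_10bit
- From x265 to x265_10bit
- From the various forms of AC3 to the typical AAC you use
- From the various forms of DTS to the typical AAC you use
- Any typical preprocessing, filtering, denoising, etc. that you use.
As you can imagine, depending on your hardware, this can take a while! My profiling took a week!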
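The training pass can be scripted so that every sample file goes through the instrumented build. The following is only a sketch under assumptions: `train_encodes` and the `HANDBRAKE_CLI` override variable are hypothetical names, the quality value 22 is an arbitrary illustration, and the encoder names `x265_10bit` and `av_aac` are assumed to be available in your build (check `HandBrakeCLI --help`).

```shell
#!/bin/sh
# Sketch: transcode every file in a sample directory with the instrumented build.
# train_encodes and HANDBRAKE_CLI are hypothetical; point HANDBRAKE_CLI at the
# HandBrakeCLI binary produced by the training build.
train_encodes() {
    src="$1"   # directory holding typical source videos
    dst="$2"   # directory for the throwaway outputs
    mkdir -p "$dst"
    for f in "$src"/*; do
        [ -f "$f" ] || continue
        name="$(basename "$f")"
        # -e: video encoder, -q: constant quality, -E: audio encoder
        "${HANDBRAKE_CLI:-HandBrakeCLI}" -i "$f" -o "$dst/${name%.*}.mkv" \
            -e x265_10bit -q 22 -E av_aac
    done
}

# Example:
#   HANDBRAKE_CLI=./build/HandBrakeCLI train_encodes ~/samples /tmp/train-out
```

A loop like this covers only one encoder combination. To match the list above, repeat it with the filters and audio sources you normally use.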
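You can fine-tune the compiler flags and optimizations for each module by using the report process described above and overriding the keys with your desired values in the `custom.defs` file, just like the `GCC.args.*` default examples above. For all of the above to work, remember NOT to export CFLAGS or CXXFLAGS. You can check which flags are set in your bash session with: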
$ export -p | grep FLAGS
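The same check can be turned into a guard that runs before `./configure`. This is a minimal sketch; `check_no_flags` is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch: fail early when CFLAGS or CXXFLAGS is exported in the environment.
# check_no_flags is a hypothetical helper, not part of HandBrake.
check_no_flags() {
    status=0
    for v in CFLAGS CXXFLAGS; do
        # env lists only exported variables, one NAME=value per line
        if env | grep -q "^$v="; then
            echo "$v is exported; run: unset $v" >&2
            status=1
        fi
    done
    return "$status"
}

# Example:
#   check_no_flags && ./configure --build=build --optimize=speed
```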
LTO + FDO:
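Link-time optimization (LTO) combines very well with FDO, as the many write-ups and benchmarks that are easy to find on Google show.
Unfortunately, setting LTO as the default by using `-flto` in `GCC.args.*`, or setting LTO for the FFMPEG module, makes the whole build **fail**. That is a boolean "or": it fails with one, the other, or both! However, LTO can be added to all the other modules!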
Here is my `custom.defs`...
$ cat custom.defs
GCC.args.O.speed = -march=native -O3 -pipe -fprofile-use=../gccFDO -fprofile-correction -fprofile-partial-training
GCC.args.extra = -mfpmath=sse -march=native -O3 -pipe -fprofile-use=../gccFDO -fprofile-correction -fprofile-partial-training
X265.GCC.args.O.speed = -march=native -O3 -pipe -flto -fprofile-use=../gccFDO -fprofile-correction -fprofile-partial-training
X265.GCC.args.extra = -mfpmath=sse -march=native -O3 -pipe -flto -fprofile-use=../gccFDO -fprofile-correction -fprofile-partial-training
X265_8.GCC.args.O.speed = -march=native -O3 -pipe -flto -fprofile-use=../gccFDO -fprofile-correction -fprofile-partial-training
X265_8.GCC.args.extra = -mfpmath=sse -march=native -O3 -pipe -flto -fprofile-use=../gccFDO -fprofile-correction -fprofile-partial-training
X265_10.GCC.args.O.speed = -march=native -O3 -pipe -flto -fprofile-use=../gccFDO -fprofile-correction -fprofile-partial-training
X265_10.GCC.args.extra = -mfpmath=sse -march=native -O3 -pipe -flto -fprofile-use=../gccFDO -fprofile-correction -fprofile-partial-training
X265_12.GCC.args.O.speed = -march=native -O3 -pipe -flto -fprofile-use=../gccFDO -fprofile-correction -fprofile-partial-training
X265_12.GCC.args.extra = -mfpmath=sse -march=native -O3 -pipe -flto -fprofile-use=../gccFDO -fprofile-correction -fprofile-partial-training
LIBHB.GCC.args.O.speed = -march=native -O3 -pipe -flto -fprofile-use=../gccFDO -fprofile-correction -fprofile-partial-training
LIBHB.GCC.args.extra = -mfpmath=sse -march=native -O3 -pipe -flto -fprofile-use=../gccFDO -fprofile-correction -fprofile-partial-training
LIBDAV1D.GCC.args.O.speed = -march=native -O3 -pipe -flto -fprofile-use=../gccFDO -fprofile-correction -fprofile-partial-training
LIBDAV1D.GCC.args.extra = -mfpmath=sse -march=native -O3 -pipe -flto -fprofile-use=../gccFDO -fprofile-correction -fprofile-partial-training
GTK.GCC.args.O.speed = -march=native -O3 -pipe -flto -fprofile-use=../gccFDO -fprofile-correction -fprofile-partial-training
GTK.GCC.args.extra = -mfpmath=sse -march=native -O3 -pipe -flto -fprofile-use=../gccFDO -fprofile-correction -fprofile-partial-training
LIBDVDREAD.GCC.args.O.speed = -march=native -O3 -pipe -flto -fprofile-use=../gccFDO -fprofile-correction -fprofile-partial-training
LIBDVDREAD.GCC.args.extra = -mfpmath=sse -march=native -O3 -pipe -flto -fprofile-use=../gccFDO -fprofile-correction -fprofile-partial-training
LIBDVDNAV.GCC.args.O.speed = -march=native -O3 -pipe -flto -fprofile-use=../gccFDO -fprofile-correction -fprofile-partial-training
LIBDVDNAV.GCC.args.extra = -mfpmath=sse -march=native -O3 -pipe -flto -fprofile-use=../gccFDO -fprofile-correction -fprofile-partial-training
LIBBLURAY.GCC.args.O.speed = -march=native -O3 -pipe -flto -fprofile-use=../gccFDO -fprofile-correction -fprofile-partial-training
LIBBLURAY.GCC.args.extra = -mfpmath=sse -march=native -O3 -pipe -flto -fprofile-use=../gccFDO -fprofile-correction -fprofile-partial-training
TEST.GCC.args.O.speed = -march=native -O3 -pipe -flto -fprofile-use=../gccFDO -fprofile-correction -fprofile-partial-training
TEST.GCC.args.extra = -mfpmath=sse -march=native -O3 -pipe -flto -fprofile-use=../gccFDO -fprofile-correction -fprofile-partial-training
NVENC.GCC.args.O.speed = -march=native -O3 -pipe -flto -fprofile-use=../gccFDO -fprofile-correction -fprofile-partial-training
NVENC.GCC.args.extra = -mfpmath=sse -march=native -O3 -pipe -flto -fprofile-use=../gccFDO -fprofile-correction -fprofile-partial-training
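Since one misplaced `-flto` breaks the whole build, it may help to lint the file before configuring. The sketch below is an assumption-laden illustration: `check_lto_placement` is a hypothetical helper, and the `FFMPEG.GCC.args.*` key form is inferred by analogy with the other module keys rather than taken from the listing.

```shell
#!/bin/sh
# Sketch: reject a custom.defs that puts -flto on the global GCC.args.* keys
# or on FFMPEG keys (key form FFMPEG.GCC.args.* is assumed by analogy).
check_lto_placement() {
    defs="${1:-custom.defs}"
    # grep prints any offending line; a match means the file is unsafe.
    if grep -E '^(GCC|FFMPEG\.GCC)\.args\.[A-Za-z.]+ *=.*-flto' "$defs"; then
        echo "remove -flto from the keys listed above" >&2
        return 1
    fi
}

# Example (run in the HandBrake source root):
#   check_lto_placement custom.defs && ./configure --build=build --optimize=speed
```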
EDIT 01/08/2021... Everything above was done against HandBrake v1.3.3.
For v1.4.0, the above procedure failed for me. See my other answer for v1.4.0.