Extracting the table of contents from an epub file

February 1, 2018

I recently came across a command that prints the table of contents of a pdf file:
mutool show file.pdf outline
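I would like a command for the epub format that is as simple to use, and gives as good a result, as the pdf command above.
Is there something similar?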

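An .epub file is a .zip file containing XHTML, CSS and some other files (including images, various metadata files, and possibly an XML file called toc.ncx that holds the table of contents).
The following script uses unzip -p to extract toc.ncx to stdout, pipes it through the xml2 command, and then uses sed to extract just the text of each chapter heading.
It takes one or more filename arguments on the command line.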
#! /bin/sh

# This script needs InfoZIP's unzip program
# and the xml2 tool from http://ofb.net/~egnor/xml2/
# and sed, of course.

for f in "$@" ; do
   echo "$f:"
   unzip -p "$f" toc.ncx | 
       xml2 | 
       sed -n -e 's:^/ncx/navMap/navPoint/navLabel/text=:  :p'
   echo
done
It prints the epub's filename followed by a :, then each chapter title on the following lines, indented by two spaces. For example:
book.epub:
  Chapter One
  Chapter Two
  Chapter Three
  Chapter Four
  Chapter Five

book2.epub:
  Chapter One
  Chapter Two
  Chapter Three
  Chapter Four
  Chapter Five
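The sed pattern in the script matches only top-level navPoint entries, so a book whose toc.ncx nests sections inside chapters would list just the outer level. The following is a rough sketch of a depth-aware filter, not part of the original answer. It assumes xml2 reports a nested entry with a repeated path component (for example /ncx/navMap/navPoint/navPoint/navLabel/text=...), and the function name nested_titles is made up for illustration.

```shell
#!/bin/sh
# nested_titles (hypothetical helper): reads xml2-style lines on stdin and
# prints every navMap title, indented two spaces per navPoint nesting level.
nested_titles() {
    awk '/^\/ncx\/navMap\/(navPoint\/)+navLabel\/text=/ {
        path = $0
        sub(/=.*/, "", path)                 # keep only the part before the first "="
        depth = gsub(/\/navPoint/, "", path) # count navPoint levels in the path
        title = $0
        sub(/^[^=]*=/, "", title)            # keep only the text after the first "="
        indent = ""
        for (i = 0; i < depth; i++) indent = indent "  "
        print indent title
    }'
}
```

It would replace the sed stage of the pipeline, as in unzip -p "$f" toc.ncx | xml2 | nested_titles.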
If an epub file does not contain a toc.ncx, you will see output like this for that particular book:
book3.epub:
caution: filename not matched:  toc.ncx
error: Extra content at the end of the document
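The first error line comes from unzip, the second from xml2. xml2 will also warn about other errors it finds, for example an improperly formatted toc.ncx file.
Note that the error messages go to stderr, while the book's filename still goes to stdout.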
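One way to avoid the two error lines is to test for toc.ncx before starting the pipeline. The following is a minimal sketch, not part of the original answer. It assumes that unzip -l returns a non-zero exit status when the named member is missing (Info-ZIP documents status 11 for "no matching files were found"). The function name toc_or_note and the placeholder message are invented for illustration.

```shell
#!/bin/sh
# toc_or_note (hypothetical helper): same loop as the script above, but it
# checks for toc.ncx first and prints a short note when it is absent.
toc_or_note() {
    for f in "$@"; do
        echo "$f:"
        # Assumption: unzip -l fails when toc.ncx is not in the archive.
        if unzip -l "$f" toc.ncx >/dev/null 2>&1; then
            unzip -p "$f" toc.ncx |
                xml2 |
                sed -n -e 's:^/ncx/navMap/navPoint/navLabel/text=:  :p'
        else
            echo "  (no toc.ncx found)"
        fi
        echo
    done
}

toc_or_note "$@"
```

With this variant the note goes to stdout along with the filename, so redirecting stderr no longer hides the fact that a book was skipped.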
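xml2 is available pre-packaged for Debian, Ubuntu and other Debian derivatives, and probably for most other Linux distributions too.
For simple tasks like this (i.e. where you just want to convert XML into a line-oriented format for use with sed, awk, cut, grep, etc.), xml2 is simpler and easier to use than xmlstarlet.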
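To make the line-oriented format concrete, here is a small sketch that feeds the script's sed filter with a hand-written sample instead of real xml2 output. The sample lines are hypothetical: they follow xml2's path=value convention, but the ids and file names are invented, and a real toc.ncx produces more lines than shown.

```shell
#!/bin/sh
# Hypothetical sample of the flattened lines xml2 might print for a tiny
# toc.ncx. Each line is an element path, optionally followed by "=" and text;
# attributes appear as path components starting with "@".
sample_xml2_output() {
    cat <<'EOF'
/ncx/docTitle/text=Example Book
/ncx/navMap/navPoint/@id=ch1
/ncx/navMap/navPoint/navLabel/text=Chapter One
/ncx/navMap/navPoint/content/@src=ch1.xhtml
/ncx/navMap/navPoint
/ncx/navMap/navPoint/@id=ch2
/ncx/navMap/navPoint/navLabel/text=Chapter Two
/ncx/navMap/navPoint/content/@src=ch2.xhtml
EOF
}

# Same filter as in the script: keep only the chapter-title lines and
# replace the path prefix with two spaces.
sample_xml2_output |
    sed -n -e 's:^/ncx/navMap/navPoint/navLabel/text=:  :p'
```

Because every value sits on its own line with its full path, a single anchored substitution is enough to select the titles.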
By the way, if you also want to print the epub's title, change the sed script to:
sed -n -e 's:^/ncx/navMap/navPoint/navLabel/text=:  :p
          s!^/ncx/docTitle/text=!  Title: !p'
or replace it with an awk script:
awk -F= '/(navLabel|docTitle)\/text/ {print $2}'
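Because the awk version splits on = and prints only the second field, a title that itself contains an = sign would be cut short. The following is a hedged sketch of a variant that instead strips everything up to the first = and prints the rest. The function name print_titles is made up for illustration and is not from the original answer.

```shell
#!/bin/sh
# print_titles (hypothetical helper): reads xml2-style lines on stdin and
# prints the text of navLabel and docTitle entries, keeping any "=" that
# appears inside the title itself.
print_titles() {
    awk '/(navLabel|docTitle)\/text=/ {
        sub(/^[^=]*=/, "")   # drop the path and the first "=" only
        print
    }'
}
```

It would be used in place of the awk stage, as in unzip -p "$f" toc.ncx | xml2 | print_titles.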

Source: https://unix.stackexchange.com/questions/284283
