How to compare two files line by line?
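I have two files, A and B, that are almost identical: some lines differ and some lines are out of order. Since both are SystemVerilog files, the lines also contain special characters such as ; , = + and so on. I want to go through each line of fileA and check whether it has a matching line in fileB. The comparison should follow these rules: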
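- Leading and trailing spaces on a line can be ignored.
- Multiple spaces/tabs between words can be treated as a single space.
- Empty lines can be ignored.
The result should show the lines that exist in file A but not in file B.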
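As a minimal sketch of the three rules (assuming bash and a POSIX sed; normalize_line is a hypothetical helper name, not part of any tool), each line can be reduced to a canonical form before it is compared: trim both ends, squeeze runs of spaces and tabs into one space, and skip whatever ends up empty.

```shell
#!/usr/bin/env bash
# Sketch only: normalize_line is a hypothetical helper name.
normalize_line() {
    # rule 1: strip leading/trailing whitespace
    # rule 2: squeeze runs of spaces/tabs into a single space
    printf '%s\n' "$1" |
        sed 's/^[[:space:]]*//; s/[[:space:]]*$//; s/[[:space:]][[:space:]]*/ /g'
}

# rule 3: a line that normalizes to the empty string is skipped
for raw in $'  assign a =\tb + c ;  ' '   ' 'logic [7:0]   data;'; do
    clean=$(normalize_line "$raw")
    [[ -z $clean ]] && continue
    printf '[%s]\n' "$clean"
done
```

With this canonical form, two lines count as a match exactly when their normalized strings are equal.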
I tried tkdiff, but because some lines are out of order, it reports a lot of differences.
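tkdiff compares the files in order, so a moved line shows up as both a deletion and an insertion. An order-insensitive comparison instead treats each file as a set of normalized lines. This sketch assumes bash plus the standard sort and comm utilities; the file names and contents are made-up sample data.

```shell
#!/usr/bin/env bash
# Sketch: order-insensitive "lines in A but not in B" using sort + comm.
# A.sv / B.sv and their contents are made-up sample data.
set -euo pipefail

workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT

printf '%s\n' 'module top;' '  logic [7:0]  a;' '' 'wire   x;' 'assign a = b + c;' 'endmodule' > "$workdir/A.sv"
printf '%s\n' 'endmodule' 'assign a = b + c;' 'logic [7:0] a;' 'module top;' > "$workdir/B.sv"

# Apply rules 1-3 (trim, squeeze whitespace, drop empty lines),
# then sort so comm can compare the files as sets.
normalize_sorted() {
    sed 's/^[[:space:]]*//; s/[[:space:]]*$//; s/[[:space:]][[:space:]]*/ /g; /^$/d' "$1" |
        LC_ALL=C sort -u
}

# comm -23 suppresses column 2 (only in B) and column 3 (in both),
# leaving the normalized lines that appear in A but not in B.
missing=$(LC_ALL=C comm -23 <(normalize_sorted "$workdir/A.sv") <(normalize_sorted "$workdir/B.sv"))
printf '%s\n' "$missing"
```

comm expects both inputs sorted with the same collation, hence LC_ALL=C on both sort and comm. The trade-off is that the output shows normalized lines, with duplicates collapsed and the original order lost; the script in the answer reports the original lines instead.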
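I can't vouch for its portability, but I've tried to cover all the bases, and I did my best to reproduce your two files in my tests from the information you gave. If you run into problems with special characters in sed, you can escape them in the second line of the cleanLine function.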
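The escaping matters because the script looks each cleaned line up as a sed regular expression, where SystemVerilog text such as [7:0] or a * b has special meaning. An alternative, sketched here, is to search for the line as a fixed string that must match a whole line, which needs no escaping at all. It assumes grep's -F, -x and -q options (all POSIX); the sample content is made up, and a grep BRE stands in for the script's sed lookup since both use basic regular expression syntax.

```shell
#!/usr/bin/env bash
# Sketch: fixed-string, whole-line lookup vs. an unescaped regex lookup.
# The reference content below is made-up sample data.
set -u

ref=$(mktemp)
trap 'rm -f "$ref"' EXIT
printf '%s\n' 'logic [7:0] data;' 'assign y = a * b;' > "$ref"

line='logic [7:0] data;'

# Unescaped regex: [7:0] is a bracket expression matching ONE of 7, : or 0,
# so the pattern cannot match the literal text "[7:0]".
if grep -q "^${line}\$" "$ref"; then regex_hit=yes; else regex_hit=no; fi

# -F: fixed string, -x: whole line must match, -q: exit status only
if grep -Fxq -- "$line" "$ref"; then fixed_hit=yes; else fixed_hit=no; fi

printf 'regex: %s, fixed: %s\n' "$regex_hit" "$fixed_hit"
```

If the script's lookup is switched to grep -Fxq -- "$CLN_LINE" "$TMP_FILE", the escaping sed in cleanLine should be dropped as well, since the added backslashes would otherwise be matched literally.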
#!/bin/bash
# compare two files and return lines in
# first file that are missing in second file

ProgName=${0##*/}
Pid=$$
CHK_FILE="$1"
REF_FILE="$2"
D_BUG="$3"
TMP_FILE="/tmp/REF_${Pid}.tmp"
declare -a MISSING=()
m=0

scriptUsage() {
cat <<ENDUSE

 $ProgName <file_to_check> <reference_file> [-d|--debug]

 Lines in 'file_to_check' not present in 'reference_file'
 are printed to standard output.

 file_to_check:  File being checked
 reference_file: File to be checked against
 -d|--debug:     Run script in debug mode (Optional)
 -h|--help:      Print this help message

ENDUSE
}

# delete temp file on any exit
trap 'rm -f "$TMP_FILE" > /dev/null 2>&1' EXIT

#-- check args
[[ $CHK_FILE == "-h" || $CHK_FILE == "--help" ]] && { scriptUsage; exit 0; }
[[ -n $CHK_FILE && -n $REF_FILE ]] || { >&2 echo "Not enough arguments!"; scriptUsage; exit 1; }
[[ $D_BUG == "-d" || $D_BUG == "--debug" ]] && set -x
[[ -s $CHK_FILE ]] || { >&2 echo "File $CHK_FILE not found or empty"; exit 1; }
[[ -s $REF_FILE ]] || { >&2 echo "File $REF_FILE not found or empty"; exit 1; }
#--

#== edit temp file to match the 3 comparison rules
# copy ref file to temp for editing
cp "$REF_FILE" "$TMP_FILE" || { >&2 echo "Unable to create temporary file"; exit 1; }
# rule 3 - ignore empty lines
sed -i '/^[[:space:]]*$/d' "$TMP_FILE"
# rule 1 - ignore begin/end of line spaces
sed -i 's/^[[:space:]][[:space:]]*//;s/[[:space:]][[:space:]]*$//' "$TMP_FILE"
# rule 2 - multi space/tab as single space
sed -i 's/[[:space:]][[:space:]]*/ /g' "$TMP_FILE"
#==

# function to clean LINE to match the 3 rules
# & escape BRE special characters ( / . * [ ] ^ $ \ )
# for the later sed command
cleanLine() {
    var=$(printf '%s\n' "$1" | sed 's/^[[:space:]][[:space:]]*//;s/[[:space:]][[:space:]]*$//;s/[[:space:]][[:space:]]*/ /g')
    # quoted printf: an unquoted 'echo $var' would glob-expand '*' in the line
    printf '%s\n' "$var" | sed 's/[][\.*^$]/\\&/g;s/\//\\\//g'
}

### parse check file
while IFS='' read -r LINE || [[ -n $LINE ]]
do
    CLN_LINE=$(cleanLine "$LINE")
    # rule 3 - skip empty and whitespace-only lines
    [[ -z $CLN_LINE ]] && continue
    # anchor with ^...$ so only a whole-line match counts
    FOUND=$(sed -n "/^${CLN_LINE}\$/{p;q}" "$TMP_FILE")
    [[ -z $FOUND ]] && { MISSING[m]="$LINE"; m=$((m + 1)); }
done < "$CHK_FILE"
###

#++ print missing line(s) (if any)
if (( m > 0 ))
then
    printf "\n Missing line(s) found:\n"
    #*SEE BELOW ON THIS
    for (( p=0; p<m; p++ ))
    do
        printf " %s\n" "${MISSING[$p]}"
    done
    echo
else
    printf "\n **No missing lines found**\n\n"
fi
#* using 'for p in ${MISSING[@]}' causes:
#*   "SPACED LINES" to become:
#*   "SPACED"
#*   "LINES"
#* when printed to stdout!
#++
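The loop above starts a sed process for every line of the checked file, which can get slow on large sources. A possible single-pass alternative, sketched here with awk's two-file idiom (assuming any POSIX awk; the sample files are made up), loads the normalized reference lines into an associative array and then prints the original, unmodified lines of the checked file whose normalized form is missing:

```shell
#!/usr/bin/env bash
# Sketch: single-pass alternative using awk's two-file idiom.
# Prints ORIGINAL lines of the checked file whose normalized form is
# absent from the reference file. Sample files are made up.
set -euo pipefail

workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT
printf '%s\n' 'module top;' '  logic [7:0]  a;' '' 'wire   x;' 'endmodule' > "$workdir/A.sv"
printf '%s\n' 'endmodule' 'logic [7:0] a;' 'module top;' > "$workdir/B.sv"

missing=$(awk '
    # norm(): rule 1 (trim both ends) and rule 2 (squeeze spaces/tabs)
    function norm(s) {
        gsub(/^[ \t]+|[ \t]+$/, "", s)
        gsub(/[ \t]+/, " ", s)
        return s
    }
    # first argument (reference file): remember every normalized line
    FILENAME == ARGV[1] { seen[norm($0)]; next }
    # second argument (file to check): rule 3 skips blank lines
    {
        key = norm($0)
        if (key != "" && !(key in seen)) print
    }
' "$workdir/B.sv" "$workdir/A.sv")

printf '%s\n' "$missing"
```

Note the argument order: the reference file comes first so it is fully loaded before the file being checked is read. FILENAME == ARGV[1] is used instead of the common NR == FNR test so an empty reference file is still handled correctly, and because lines are compared as array keys rather than regular expressions, no escaping of special characters is needed.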