AWK: Insert target terms after source terms from a dictionary in randomly selected lines

February 1, 2022

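Note: I already asked a similar question, AWK: Quick way to insert target words after an source term. I am a beginner with AWK.
This question is about inserting several target words after their source words in randomly selected lines.
With this AWK code snippet: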
awk '(NR==FNR){a[$1];next}
   FNR in a { gsub(/\<source term\>/,"& target term") }
   1
   ' <(shuf -n 5 -i 1-$(wc -l < file)) file
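I can insert target term after source term in five randomly selected lines of file.
For example, I have a bilingual dictionary, dict, with the source terms on the left and the target terms on the right: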
apple     : Apfel
banana    : Banane
raspberry : Himbeere
My file consists of the following lines:
I love the Raspberry Pi.
The monkey loves eating a banana.
Who wants an apple pi?
Apple pen... pineapple pen... pen-pineapple-apple-pen!
The banana is tasty and healthy.
An apple a day keeps the doctor away.
Which fruit is tastes better: raspberry or strawberry?
Suppose lines 1, 3, 5, 4 and 7 are randomly selected for the first word, apple. The output for apple would then look like this:
I love the Raspberry Pi.
The monkey loves eating a banana.
Who wants an apple Apfel pi?
Apple Apfel pen... pineapple pen... pen-pineapple-apple-pen!
The banana is tasty and healthy.
An apple a day keeps the doctor away.
Which fruit is tastes better: raspberry or strawberry?
Then another 5 random lines, 3, 3, 5, 6 and 7, are selected for the word banana:
I love the Raspberry Pi.
The monkey loves eating a banana.
Who wants an apple Apfel pi?
Apple Apfel pen... pineapple pen... pen-pineapple-apple-pen!
The banana Banane is tasty and healthy.
An apple a day keeps the doctor away.
Which fruit is tastes better: raspberry or strawberry?
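The same is done for every other entry in dict, up to and including the last one.
I want to select 5 random lines per entry. In those lines only complete source terms should match: for apple I only want to match the whole word (terms such as "pineapple" are ignored). If a line contains the source term twice, e.g. apple twice, the target term Apfel should be inserted after each occurrence. Matching should be case-insensitive, so that both apple and Apple match.
My question: how can I rewrite the snippet above so that it uses the dictionary dict, selects random lines of file, and inserts each target term after its source term?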

Here is how to use awk to pick 5 random line numbers from an input file (using wc first to count its lines):
$ awk -v numLines="$(wc -l < file)" 'BEGIN{srand(); for (i=1; i<=5; i++) print int(1+rand()*numLines)}'
7
2
88
13
18
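The rand() loop above can return the same line number more than once (the question's own banana example lists line 3 twice, so that may be acceptable). If distinct numbers are wanted, as the question's shuf -n 5 -i produces, a minimal sketch is to redraw on repeats. The function name random_line_numbers and its optional seed argument are hypothetical additions for illustration, not part of the original answer:

```shell
#!/usr/bin/env bash
# Hypothetical helper: print N distinct random integers in 1..NUMLINES,
# similar to `shuf -n N -i 1-NUMLINES`. An optional SEED makes runs repeatable.
# Usage: random_line_numbers NUMLINES N [SEED]
random_line_numbers() {
    awk -v numLines="$1" -v n="$2" -v seed="${3-}" 'BEGIN {
        if (seed != "") srand(seed); else srand()
        numLines += 0; n += 0
        if (n > numLines) n = numLines      # cannot draw more distinct numbers than exist
        while (count < n) {
            lineNr = int(1 + rand() * numLines)
            if (lineNr in seen) continue    # repeat: draw again
            seen[lineNr] = 1
            print lineNr
            count++
        }
    }'
}
```

For example, random_line_numbers "$(wc -l < file)" 5 prints five different line numbers of file. Redrawing is cheap as long as N is small compared to the line count, which is the case here.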
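Now all you have to do is take my previous answer and, for each "old" string read in the ARGIND==1 block, generate 5 line numbers as shown above. Populate an array that maps each generated line number to the "old" strings associated with it. Then, while reading the final input file, check whether the current line number is in the array; if it is, loop over the "old" strings stored for that line number and run the gsub() from my previous answer on each of them.
Using GNU awk for ARGIND, IGNORECASE, word boundaries, arrays of arrays, and \s as shorthand for [[:space:]]: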
$ cat tst.sh
#!/usr/bin/env bash

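awk -v numLines="$(wc -l < file)" '
   BEGIN {
       FS = "\\s*:\\s*"            # dict lines look like "source : target"
       IGNORECASE = 1              # case-insensitive matching (gawk only)
       srand()
   }
   ARGIND == 1 {                   # first file argument: dict
       old = "\\<" $1 "\\>"        # whole-word regex for the source term
       new = "& " $2               # "&" is the matched text, followed by the target term
       for (i=1; i<=5; i++) {      # pick 5 random line numbers for this term
           lineNr = int(1+rand()*numLines)
           map[lineNr][old] = new
       }
       next
   }
   FNR in map {                    # second file: is this one of the picked lines?
       for ( old in map[FNR] ) {
           new = map[FNR][old]
           gsub(old,new)
       }
   }
   { print }
' dict file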
$ ./tst.sh
I love the Raspberry Pi.
The monkey loves eating a banana Banane.
Who wants an apple Apfel pi?
Apple Apfel pen... pineapple pen... pen-pineapple-apple Apfel-pen!
The banana Banane is tasty and healthy.
An apple a day keeps the doctor away.
Which fruit is tastes better: raspberry Himbeere or strawberry?
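For awks without the GNU extensions (for example mawk or BWK awk), the same approach can be sketched with POSIX features only: FILENAME == ARGV[1] instead of ARGIND == 1, tolower() comparisons instead of IGNORECASE, an explicit word-character check instead of \< and \>, and SUBSEP keys instead of arrays of arrays. The function names insert_random_targets and insert_after, the optional N argument, and the choice of distinct lines per term (like the question's shuf -n 5 -i, unlike the answer's rand() loop) are assumptions of this sketch; word characters are limited to ASCII letters, digits and underscore.

```shell
#!/usr/bin/env bash
# Hypothetical portable variant of tst.sh for awks without GNU extensions.
# insert_random_targets DICT FILE [N]: for each dict entry, pick N distinct
# random lines of FILE (default 5) and insert the target after the source term.
insert_random_targets() {
    local dict=$1 file=$2 n=${3:-5}
    awk -v numLines="$(wc -l < "$file")" -v n="$n" '
    # Word characters for the boundary test (ASCII only in this sketch).
    function isword(c) { return c ~ /[A-Za-z0-9_]/ }

    # Insert " tgt" after every case-insensitive, whole-word occurrence of src in s.
    function insert_after(s, src, tgt,    low, lsrc, len, i, out, start, before, after) {
        low = tolower(s); lsrc = tolower(src); len = length(src)
        out = ""; start = 1                 # start: first char of s not yet copied
        for (i = 1; i + len - 1 <= length(s); i++) {
            if (substr(low, i, len) != lsrc) continue
            before = (i > 1) ? substr(s, i - 1, 1) : ""
            after = substr(s, i + len, 1)
            if (isword(before) || isword(after)) continue   # e.g. inside "pineapple"
            out = out substr(s, start, i + len - start) " " tgt
            start = i + len
            i = start - 1                   # resume scanning after the match
        }
        return out substr(s, start)
    }

    BEGIN {
        FS = "[ \t]*:[ \t]*"                # replaces the gawk \s*:\s* separator
        numLines += 0; n += 0
        if (n > numLines) n = numLines      # cannot pick more distinct lines than exist
        srand()
    }

    FILENAME == ARGV[1] {                   # dict; replaces ARGIND == 1
        if ($1 == "") next                  # skip blank dict lines
        terms++
        src[terms] = $1; tgt[terms] = $2
        split("", picked)                   # portable way to empty an array
        k = 0
        while (k < n) {
            lineNr = int(1 + rand() * numLines)
            if (lineNr in picked) continue  # redraw so the N lines are distinct
            picked[lineNr] = 1
            hit[lineNr, terms] = 1          # SUBSEP key replaces map[lineNr][old]
            k++
        }
        next
    }

    {
        line = $0
        for (t = 1; t <= terms; t++)
            if ((FNR, t) in hit) line = insert_after(line, src[t], tgt[t])
        print line
    }' "$dict" "$file"
}
```

For example, insert_random_targets dict file 5 prints file with each target term inserted in 5 random lines. Passing an N at least as large as the number of lines selects every line, which makes the output deterministic and convenient for checking the matching rules.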

Source: https://unix.stackexchange.com/questions/688689
