AWK: Insert target terms after source terms from a dictionary in randomly selected lines
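Note: I have already asked a similar question about AWK, "Quick way to insert target words after an source term". I am a beginner in AWK.
This question is about inserting multiple target words after source words in randomly selected lines.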
With this AWK snippet

    awk '(NR==FNR){a[$1];next} FNR in a { gsub(/\<source term\>/,"& target term") } 1 ' <(shuf -n 5 -i 1-$(wc -l < file)) file

I want to insert `target term` after `source term` in randomly selected lines of `file`.
For example, I have a bilingual dictionary `dict` that contains source terms on the left and target terms on the right:

    apple : Apfel
    banana : Banane
    raspberry : Himbeere

My `file` consists of the following lines:

    I love the Raspberry Pi.
    The monkey loves eating a banana.
    Who wants an apple pi?
    Apple pen... pineapple pen... pen-pineapple-apple-pen!
    The banana is tasty and healthy.
    An apple a day keeps the doctor away.
    Which fruit is tastes better: raspberry or strawberry?

Suppose that for the first word, `apple`, lines 1, 3, 5, 4, 7 are randomly selected. The output for the word `apple` would look like this:

    I love the Raspberry Pi.
    The monkey loves eating a banana.
    Who wants an apple Apfel pi?
    Apple Apfel pen... pineapple pen... pen-pineapple-apple-pen!
    The banana is tasty and healthy.
    An apple a day keeps the doctor away.
    Which fruit is tastes better: raspberry or strawberry?

Then another 5 random lines, 3, 3, 5, 6, 7, are selected for the word `banana`:

    I love the Raspberry Pi.
    The monkey loves eating a banana.
    Who wants an apple Apfel pi?
    Apple Apfel pen... pineapple pen... pen-pineapple-apple-pen!
    The banana Banane is tasty and healthy.
    An apple a day keeps the doctor away.
    Which fruit is tastes better: raspberry or strawberry?

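The same goes for every other entry in `dict`, until the last entry has been matched. For each entry I want to select 5 random lines. If one of these lines contains a complete source term such as `apple`, I want to match the whole word only (a term like "pineapple" should be ignored) and insert the target term `Apfel` after it. If a line contains the source term `apple` twice, I want to insert the target term after each occurrence. Matching should be case-insensitive, so that source terms such as `apple` and `Apple` both match.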
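My question: how can I rewrite the snippet above so that it uses the dictionary `dict`, selects random lines in `file`, and inserts the target terms after the source terms?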
Here is how to use awk to randomly pick 5 line numbers from the input file (calling wc first to count its lines):

    $ awk -v numLines="$(wc -l < file)" 'BEGIN{srand(); for (i=1; i<=5; i++) print int(1+rand()*numLines)}'
    7
    2
    88
    13
    18

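
The command above can draw the same line number more than once (the question's own example picks line 3 twice). If distinct line numbers are wanted, one possible sketch is to keep drawing until enough unique values have been seen. The helper name `pick_lines` and its two arguments are assumptions made for this illustration and are not part of the original answer:

```shell
#!/usr/bin/env bash
# pick_lines N TOTAL: print N distinct random line numbers between 1 and TOTAL.
# (pick_lines is a hypothetical helper name used only for this sketch.)
pick_lines() {
    awk -v n="$1" -v numLines="$2" '
    BEGIN {
        srand()
        if (n > numLines) n = numLines      # cannot pick more distinct lines than exist
        while (count < n) {
            lineNr = int(1 + rand() * numLines)
            if (!(lineNr in seen)) {        # skip numbers that were already drawn
                seen[lineNr]                # referencing the element creates it
                count++
                print lineNr
            }
        }
    }'
}

# Example: 5 distinct line numbers for a 7-line file
pick_lines 5 7
```

In many awk implementations `srand()` without an argument seeds from the current time in seconds, so two calls of this helper within the same second may print the same numbers. Calling `srand()` once per awk run and drawing all numbers in that run, as the script in this answer does, avoids that.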
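Now all you have to do is take my previous answer and, for each "old" string read in the `ARGIND==1` block, generate 5 line numbers as shown above and populate an array that maps each generated line number to the old strings associated with it. Then, while reading the final input file, check whether the current line number is in the array and, if it is, loop over the "old" strings stored for that line number and run `gsub()` as in my previous answer. Using GNU awk for `ARGIND`, `IGNORECASE`, word boundaries, arrays of arrays, and `\s` as shorthand for `[[:space:]]`: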
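
    $ cat tst.sh
    #!/usr/bin/env bash

    awk -v numLines="$(wc -l < file)" '
        BEGIN {
            FS = "\\s*:\\s*"        # split dict lines on ":" with optional spaces
            IGNORECASE = 1          # case-insensitive regexp matching
            srand()
        }
        ARGIND == 1 {               # first file: dict
            old = "\\<" $1 "\\>"    # source term as a whole word
            new = "& " $2           # matched text followed by the target term
            for (i=1; i<=5; i++) {
                lineNr = int(1+rand()*numLines)
                map[lineNr][old] = new
            }
            next
        }
        FNR in map {                # second file: a line that was selected
            for ( old in map[FNR] ) {
                new = map[FNR][old]
                gsub(old,new)
            }
        }
        { print }
    ' dict file
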
    $ ./tst.sh
    I love the Raspberry Pi.
    The monkey loves eating a banana Banane.
    Who wants an apple Apfel pi?
    Apple Apfel pen... pineapple pen... pen-pineapple-apple Apfel-pen!
    The banana Banane is tasty and healthy.
    An apple a day keeps the doctor away.
    Which fruit is tastes better: raspberry Himbeere or strawberry?