Inexact text search

October 14, 2016

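Is there a utility similar to grep, or even uniq, for inexact searching, or should I write one myself?
I mean something that looks for a 90% match (the number may vary), or something along those lines. For example, I have a file with several strings: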
abc123
abd123
abc223
qwe938
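In this case, such a utility should return the first three strings, or report that they are similar. Of course, I don't know any pattern in the file contents ahead of time, as I would for grep or uniq.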

agrep or tre-agrep will do what you are asking; they are "approximate" regular-expression matchers (approximate grep). See also the Wikipedia article for details.

% tre-agrep --help | head
Usage: tre-agrep [OPTION]... PATTERN [FILE]...
Searches for approximate matches of PATTERN in each FILE or standard input.
Example: `tre-agrep -2 optimize foo.txt' outputs all lines in file `foo.txt' that
match "optimize" within two errors.  E.g. lines which contain "optimise",
"optmise", and "opitmize" all match.

Regexp selection and interpretation:
 -e, --regexp=PATTERN      use PATTERN as a regular expression
 -i, --ignore-case         ignore case distinctions
 -k, --literal             PATTERN is a literal string
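
To make "within two errors" concrete, here is a minimal Python sketch of approximate substring matching by edit distance (insertions, deletions and substitutions each count as one error). It is not how agrep or tre-agrep is implemented, and it handles only literal patterns, not regular expressions; the names min_errors and approx_grep are hypothetical, chosen for illustration.

```python
def min_errors(pattern: str, line: str) -> int:
    """Smallest edit distance between pattern and any substring of line."""
    # prev[i] = cost of matching pattern[:i] against a substring of the
    # text read so far that ends at the current position.
    prev = list(range(len(pattern) + 1))
    best = prev[-1]
    for ch in line:
        cur = [0]  # a match may start anywhere in the line at no cost
        for i, pch in enumerate(pattern, start=1):
            cur.append(min(
                prev[i] + 1,                # extra character in the line
                cur[i - 1] + 1,             # pattern character missing
                prev[i - 1] + (pch != ch),  # exact match or substitution
            ))
        prev = cur
        best = min(best, prev[-1])
    return best


def approx_grep(lines, pattern, max_errors):
    """Return the lines that contain pattern within max_errors edits."""
    return [ln for ln in lines if min_errors(pattern, ln) <= max_errors]


if __name__ == "__main__":
    sample = ["we optimise code", "optmise", "opitmize", "qwe938"]
    for ln in approx_grep(sample, "optimize", 2):
        print(ln)
```

With a limit of two errors, the three misspellings quoted in the help text are accepted and the unrelated line is not. Note that this still needs a pattern up front, which the question says is not available.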


% agrep | head
usage: agrep [-@#abcdehiklnoprstvwxyBDGIMSV] [-f patternfile] [-H dir] pattern [files]

summary of frequently used options:
(For a more detailed listing see 'man agrep'.)
-#: find matches with at most # errors
-c: output the number of matched records
-d: define record delimiter
-h: do not output file names
-i: case-insensitive search, e.g., 'a' = 'A'
-l: output the names of files that contain a match
-n: output record prefixed by record number
-v: output those records that have no matches
-w: pattern has to match as a word, e.g., 'win' will not match 'wind'
-B: best match mode. find the closest matches to the pattern
-G: output the files that contain a match
-H 'dir': the cast-dictionary is located in directory 'dir'
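
Both tools need a pattern to search for, while the question has none. If you do end up writing something yourself, a minimal Python sketch using the standard library's difflib could group lines by pairwise similarity instead. The function name group_similar, the greedy strategy (each line is compared only with the first member of each existing group) and the 0.8 threshold are all assumptions made for illustration, not part of agrep or tre-agrep.

```python
from difflib import SequenceMatcher


def group_similar(lines, threshold=0.8):
    """Greedily group lines whose similarity to a group's first member
    is at least threshold (a ratio between 0.0 and 1.0)."""
    groups = []
    for line in lines:
        for group in groups:
            # ratio() is 2 * matched characters / total characters.
            if SequenceMatcher(None, group[0], line).ratio() >= threshold:
                group.append(line)
                break
        else:
            # No existing group was close enough, so start a new one.
            groups.append([line])
    return groups


if __name__ == "__main__":
    for group in group_similar(["abc123", "abd123", "abc223", "qwe938"]):
        print(group)
```

For six-character strings, a single differing character gives a ratio of about 0.83, so a literal 90% threshold would not group the sample lines; 0.8 does, putting the first three together and leaving qwe938 on its own. The result depends on input order, because only the first member of each group is used for comparison.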

Source: https://unix.stackexchange.com/questions/39240
