Removing characters with sed

September 25, 2018

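I am on AIX and trying to remove non-printable characters from a file. When I view the file in Notepad++ with UTF-8 encoding, the data looks like this: Caucasian male lives in Arizona w/ fiancÃÂÃÂÃÂÃÂÃÂ. When I view the file on Unix, I get ^▒▒^▒▒^▒▒^▒▒^▒▒^▒▒ instead of the special characters.
I want to replace all of these special characters with spaces.
I tried sed 's/[^[:print:]]/ /g' file, but it did not remove the characters. The locales available to me, as listed by locale -a, are: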
C
POSIX
en_US.8859-15
en_US.ISO8859-1
en_US
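I even tried sed -e 's/[^ -~]/ /g' file, and it did not remove the characters either.
I have seen other Stack Overflow answers that use a UTF-8 locale with GNU sed, and that works, but I do not have that locale.
I am also using ksh.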

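If the current locale already uses UTF-8 as its character set (and the file was written in that character set):
<file LC_ALL=C sed 's/[^ -~]//g'
Or, to keep the tab and carriage-return control characters with AIX sed (printf expands \t and \r before sed sees the script):
<file LC_ALL=C sed "$(printf "s/[^[:print:]\t\r]//g")"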

You can use tr:

tr -cd '[:print:]\t\r\n'

Explanation:

-c -- complement the set, so that everything NOT listed is selected
-d -- delete the selected characters
`[:print:]'
Any character in the `[:graph:]' class, plus the space character
\r -- return
\t -- horizontal tab
\n -- newline
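If spaces are wanted instead of deletion, tr can translate the complemented set rather than delete it. The sketch below is one way to do that. The function name blank_nonprintable is hypothetical, and `[ *]` is the POSIX notation for repeating the space as often as needed to cover the first set. Running under LC_ALL=C is an assumption made here so that only ASCII counts as printable.

```shell
# Hypothetical helper: replace every byte that is not printable ASCII,
# tab, carriage return, or newline with a space, reading stdin.
blank_nonprintable() {
    LC_ALL=C tr -c '[:print:]\t\r\n' '[ *]'
}

# Example: one control byte and one two-byte sequence are blanked out.
printf 'a\001b\303\203c\n' | blank_nonprintable
```

Unlike the sed variant, this keeps tabs and carriage returns intact because they are listed in the first set.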

Example, based on CentOS 7 (GNU tools, UTF-8 encoding):

$ echo "fiancÃÂÃÂÃÂÃÂÃÂ" | tr -cd '[:print:]\t\r\n'
fianc

$ echo "get ^▒▒^▒▒^▒▒^▒▒^▒▒^▒▒ " | tr -cd '[:print:]\t\r\n'
get ^^^^^^

$ echo " Caucasian male lives in Arizona w/ fianc▒^▒▒^▒▒^▒▒^▒▒^▒▒^▒^▒▒^▒▒^▒▒^▒▒^▒▒^▒"  | tr -cd '[:print:]\t\r\n'
Caucasian male lives in Arizona w/ fianc^^^^^^^^^^^^
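To check a result like the ones above, one option is to count the bytes outside the ASCII range before and after cleaning. This is only a sketch: count_non_ascii is an illustrative name, and the octal range assumes a tr that accepts \000-\177, as GNU tr does.

```shell
# Hypothetical helper: print the number of bytes above 0x7F on stdin.
# tr deletes all ASCII bytes (octal 000-177); wc counts what is left.
count_non_ascii() {
    LC_ALL=C tr -d '\000-\177' | wc -c | tr -d ' '
}

# Before cleaning: the sample has four non-ASCII bytes.
printf 'fianc\303\203\302\202\n' | count_non_ascii

# After cleaning with the tr command from the answer: none remain.
printf 'fianc\303\203\302\202\n' | LC_ALL=C tr -cd '[:print:]\t\r\n' | count_non_ascii
```

A count of zero after cleaning indicates that only ASCII bytes are left in the stream.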

Source: https://unix.stackexchange.com/questions/471405
