Trying to remove all ids from an HTML file using grep

June 10, 2013

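I am trying to remove all id=" " attributes from an .html file, but I am not sure where I am going wrong. I tried using a regular expression, but all I get is the .html file printed in my Ubuntu terminal.
Code:
grep -Ev '^$id\="[a-zA-Z][0-9]"' *.html
I am executing it with bash ex.sh.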

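Though it goes against my better judgement, I'll post it (the sed part).
That said: if this is for a quick and dirty fix, go ahead. If it is something more serious, or something you will be doing frequently, use something else, such as Python or Perl, where you rely on modules built for handling HTML documents rather than on regular expressions.
A simpler way would be to use, for example, sed: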
sed 's/\(<[^>]*\) \+id="[^"]*"\([^>]*>\)/\1\2/' sample.html > noid.html
Explanation:
           +--------------------------------- Match group 1
           |                      +---------- Match group 2
        ___|___                ___|___
       |       |              |       |  
sed 's/\(<[^>]*\) \+id="[^"]*"\([^>]*>\)/\1\2/' sample.html > noid.html
     |   | |  |  |  |    |  ||   |   |    |
     |   | |  |  |  |    |  ||   |   |    +-- \1\2  Subst. with group 1 and 2
     |   | |  |  |  |    |  ||   |   +------- >     Closing bracket
     |   | |  |  |  |    |  ||   +----------- [^>]* Same as below
     |   | |  |  |  |    |  |+--------------- "     Followed by "
     |   | |  |  |  |    |  +---------------- *     Zero or more times
     |   | |  |  |  |    +------------------- [^"]  Not double-quote
     |   | |  |  |  +------------------------ id="  Literal string
     |   | |  |  +---------------------------  \+   Space 1 or more times
     |   | |  +------------------------------ *     Zero or more times
     |   | +--------------------------------- [^>]  Not closing bracket
     |   +----------------------------------- <     Opening bracket
     +--------------------------------------- s     Substitute
Use sed -i to edit the file in place. (You may regret it, but you cannot undo it.)
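As a rough sketch of how the in-place edit could be made a little less risky: the script below runs the same expression on a throwaway file and passes a suffix to -i so that sed keeps a backup copy. The sample markup and the temporary directory are made up for illustration, the g flag is my addition so that more than one tag per line is handled, and \+ assumes GNU sed.

```shell
#!/bin/sh
# Hypothetical sample file, created only for this demonstration.
workdir=$(mktemp -d)
printf '%s\n' '<div id="main" class="box"><p id="intro">Hi</p></div>' > "$workdir/sample.html"

# Same expression as in the answer, plus the g flag so that every tag
# on a line is processed, not just the first one.
# -i.bak edits in place and keeps the original as sample.html.bak.
sed -i.bak 's/\(<[^>]*\) \+id="[^"]*"\([^>]*>\)/\1\2/g' "$workdir/sample.html"

result=$(cat "$workdir/sample.html")
backup=$(cat "$workdir/sample.html.bak")

printf 'edited: %s\n' "$result"
printf 'backup: %s\n' "$backup"

# Remove the temporary directory again.
rm -rf "$workdir"
```

If the result is not what you wanted, the .bak file still holds the original. This remains a regex working on markup, so unusual attribute quoting or tags split across lines will slip through.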
Better: an example using Perl:
#!/usr/bin/perl

use strict;
use warnings;

use HTML::TokeParser::Simple;
use HTML::Entities;
use utf8;

die "$0 [file]\n" unless defined $ARGV[0];

my $parser = HTML::TokeParser::Simple->new(file => $ARGV[0]);

if (!$parser) {
   die "No HTML file found.\n";
}

while (my $token = $parser->get_token) {
   $token->delete_attr('id');
   print $token->as_is;
}
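Your grep command will not match anything. But because you use the invert option -v, it prints everything that does not match, and therefore the entire file.
grep does not modify files in place; it is a tool for finding things in files. Try, for example: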
grep -o '\(<[^>]*\)id="[^"]*"[^>]*>' sample.html
-o means print only the matched part of the line (not the entire line).
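To see what -o does in practice, here is a small sketch that feeds three made-up lines of HTML to the same pattern. The sample text is an assumption for illustration only.

```shell
#!/bin/sh
# Hypothetical input: two tags with an id, one without.
sample='<h1 id="top">Title</h1>
<p class="note">no id here</p>
<p id="intro" class="lead">Text</p>'

# -o prints only the part of each line that matched, one match per output line.
matches=$(printf '%s\n' "$sample" | grep -o '\(<[^>]*\)id="[^"]*"[^>]*>')

printf '%s\n' "$matches"
```

Only the two opening tags that carry an id are printed; the line without an id produces nothing. Note that this pattern does not require a space before id, so it would also hit attributes whose names merely end in id.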
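sed, awk and similar tools are what you normally use for editing streams or files, as in the sed example earlier in this answer.
There are some misconceptions in your grep pattern: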
id\="[a-zA-Z][0-9]"
will match exactly:
id=
One character in the range a-z or A-Z
Followed by one digit
In other words, it will match:
id="a0"
id="a1"
id="a2"
...
id="Z9"
It will not match anything like id="foo99" or id="blah-gah".
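A quick way to check this yourself is to run the character-class part of the pattern against a few candidate values. The list below is made up for illustration, and the looser pattern id="[^"]*" is the one used in the sed expression, not something grep requires.

```shell
#!/bin/sh
# Hypothetical candidate attributes, one per line.
candidates='id="a0"
id="Z9"
id="foo99"
id="blah-gah"'

# One letter followed by one digit between the quotes: only the first two lines qualify.
strict=$(printf '%s\n' "$candidates" | grep -E 'id="[a-zA-Z][0-9]"')

# Any run of non-quote characters between the quotes: all four lines qualify.
loose=$(printf '%s\n' "$candidates" | grep -E 'id="[^"]*"')

printf 'strict:\n%s\n\nloose:\n%s\n' "$strict" "$loose"
```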
In addition, the ^$ at the start of the pattern will match:
^ <-- start of line (As it is first in pattern or group)
$ <-- end of line   (As you use the `-E` option)
# Else it would be:
^ <-- start of line (As it is first in pattern or group)
$ <-- dollar sign   (Does not mean end of line unless it is at end of
                     pattern or group)
Hence it matches nothing.
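Putting the pieces together, this sketch shows the effect on a few made-up lines: the pattern from the question selects nothing, -v therefore prints everything, and a pattern without the stray anchors finds the ids. The sample input is an assumption, and the backslash before = is left out because = is not special in a regular expression.

```shell
#!/bin/sh
# Hypothetical input lines.
sample='<p id="a1">one</p>
id="a1"
<p>two</p>'

# With -E, $ is always an end-of-line anchor, so nothing can follow it
# on the same line and no line can match. -c counts matching lines.
hits=$(printf '%s\n' "$sample" | grep -Ec '^$id="[a-zA-Z][0-9]"')

# -v inverts the selection: every non-matching line, i.e. the whole input.
inverted=$(printf '%s\n' "$sample" | grep -Ev '^$id="[a-zA-Z][0-9]"')

# Without the anchors and with a looser value pattern, the ids are found.
ids=$(printf '%s\n' "$sample" | grep -Eo 'id="[^"]*"')

printf 'hits=%s\n' "$hits"
printf '%s\n' "$ids"
```

This only lists the ids; removing them is still a job for sed or an HTML parser.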

Source: https://unix.stackexchange.com/questions/72917
