Text-Processing

Delete lines older than X days from a web server log file?

  • February 11, 2018

I'm running Nginx on Ubuntu with the default "main" log format, which produces output like this:

95.108.181.102 - - [11/Feb/2018:11:43:10 +0000] "GET /blog/ HTTP/1.1" 200 4438 "-" "Mozilla/5.0 (compatible; YandexBot/3.0; +http://yandex.com/bots)" "-"

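I have a single master log file that is never rotated, which I use with GoAccess (log parsing/reporting software). I'd like to delete the lines in that file whose log entries are more than about 30 days old. Can this be done, preferably with a bash one-liner?

I plan to add this to an existing daily cronjob to produce a rolling 30-day report. I was hoping to use something like the following, but I can't quite get the log parsing right: sed -i '/<magical-invocation-goes-here> --date="-30 days"/d' example.log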

GNU awk solution:

Sample test.log:

95.108.181.102 - - [11/Feb/2018:11:43:10 +0000] "GET /blog/ HTTP/1.1" 200 4438 "-" "Mozilla/5.0 (compatible; YandexBot/3.0; +http://yandex.com/bots)" "-"
95.108.181.102 - - [11/Aug/2017:11:43:10 +0000] "GET /blog/ HTTP/1.1" 200 4438 "-" "Mozilla/5.0 (compatible; YandexBot/3.0; +http://yandex.com/bots)" "-"
95.108.181.102 - - [01/Jan/2018:11:43:10 +0000] "GET /blog/ HTTP/1.1" 200 4438 "-" "Mozilla/5.0 (compatible; YandexBot/3.0; +http://yandex.com/bots)" "-"
95.108.181.102 - - [11/Feb/2018:11:43:10 +0000] "GET /blog/ HTTP/1.1" 200 4438 "-" "Mozilla/5.0 (compatible; YandexBot/3.0; +http://yandex.com/bots)" "-"

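awk -v m1_ago="$(date -d '-1 month' +%s)" \
'BEGIN {
    # Map month abbreviations to numbers: m_nums["Jan"] = 1 ... m_nums["Dec"] = 12
    split("Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec", month)
    for (i in month) m_nums[month[i]] = i
}
# $4 looks like "[11/Feb/2018:11:43:10": drop the "[" and split into day, month, year, h, m, s
{ split(substr($4, 2), a, "[/:]") }
# Print the line only if its timestamp is newer than one month ago
mktime(sprintf("%d %d %d %d %d %d", a[3], m_nums[a[2]], a[1], a[4], a[5], a[6])) > m1_ago
' test.log > tmp_log && mv tmp_log test.log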
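
To see what the split step in the command above extracts, here is a minimal sketch that prints the pieces of the timestamp field for one sample line. It assumes awk's default whitespace field splitting, so the timestamp is field 4 with a leading bracket; parse_ts is a hypothetical helper name used only for this illustration.

```shell
# Print the components that split() extracts from the timestamp field.
# parse_ts is a hypothetical helper, not part of the original answer.
parse_ts() {
    awk '{
        n = split(substr($4, 2), a, "[/:]")   # drop "[" then split on "/" and ":"
        print n, a[1], a[2], a[3], a[4], a[5], a[6]
    }'
}

printf '%s\n' '95.108.181.102 - - [11/Feb/2018:11:43:10 +0000] "GET /blog/ HTTP/1.1" 200 4438' | parse_ts
# prints: 6 11 Feb 2018 11 43 10
```

Those six values, with the month name mapped to its number, are what get passed to mktime in "YYYY MM DD HH MM SS" order.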

Final test.log contents:

95.108.181.102 - - [11/Feb/2018:11:43:10 +0000] "GET /blog/ HTTP/1.1" 200 4438 "-" "Mozilla/5.0 (compatible; YandexBot/3.0; +http://yandex.com/bots)" "-"
95.108.181.102 - - [11/Feb/2018:11:43:10 +0000] "GET /blog/ HTTP/1.1" 200 4438 "-" "Mozilla/5.0 (compatible; YandexBot/3.0; +http://yandex.com/bots)" "-"
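
The question asks for a 30-day window rather than one month, so here is a minimal sketch of the same filter with the age given in days. It assumes GNU date (for -d with a relative offset) and an awk that provides mktime, such as GNU awk; the function name prune_log and its arguments are hypothetical. Note that mktime interprets the fields in the local timezone and the offset in the log line (+0000 here) is ignored, so the cutoff is exact only when the log's timezone matches the one the command runs in.

```shell
# Hypothetical helper: print only the entries newer than N days (default 30).
# Usage: prune_log FILE [DAYS]
prune_log() {
    awk -v cutoff="$(date -d "-${2:-30} days" +%s)" '
    BEGIN {
        # Map month abbreviations to their numbers
        split("Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec", month)
        for (i in month) m_nums[month[i]] = i
    }
    {
        # Field 4 is "[DD/Mon/YYYY:HH:MM:SS"; the timezone offset in field 5 is ignored
        split(substr($4, 2), a, "[/:]")
        ts = mktime(sprintf("%d %d %d %d %d %d", a[3], m_nums[a[2]], a[1], a[4], a[5], a[6]))
    }
    ts > cutoff   # keep the line only if it is newer than the cutoff
    ' "$1"
}
```

In a daily cronjob this could be run as prune_log example.log 30 > tmp_log && mv tmp_log example.log, writing to a temporary file first as the command above does, since redirecting straight back into example.log would truncate it before awk reads it.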

Source: https://unix.stackexchange.com/questions/423462