Shell

Shell script to extract data from a list of files and save it as CSV

  • May 12, 2019

I'm on CentOS. I have a list of files that I want to read, extract data from, and organize into a CSV file.

The log files have the following text format:

...
{"name":"test-api","hostname":"ci47","pid":3202,"level":30,"msg":"File: dsiManager, Method: getContract, End { userId: 'AFC5EH5PIHHLO4XS7SG',\n  clientId: '5003700557',\n  intent: 'YesIntent',\n }","time":"2019-01-21T12:23:10.323Z","v":0}
...

The output format must be:

clientId;intent;time;userId
5003700557;YesIntent;2019-01-21T12:23:10.323Z;AFC5EH5PIHHLO4XS7SG

What is the simplest way to accomplish this? (awk, grep, ...)

To parse JSON-encoded data robustly, you need a JSON codec. That pretty much means Perl or Python (or Ruby, ...). Since I'm a Perl person, here is a Perl solution.

First, as a one-liner:

$ perl -MJSON -ne 'BEGIN { print("clientId;intent;time;userId\n"); } eval { my $obj = from_json($_); my $msg = $obj->{msg}; $msg =~ s/^.*\{\s*|\s*,\s*}.*$//g; my %m = map { m/^([^:]*):\s*\x27?([^\x27]*)/; ($1, $2) } split(/,\s+/, $msg); print("$m{clientId};$m{intent};$obj->{time};$m{userId}\n"); }; warn($@) if ($@);' <x
clientId;intent;time;userId
5003700557;YesIntent;2019-01-21T12:23:10.323Z;AFC5EH5PIHHLO4XS7SG

Since that's a bit much, even for Perl, here is a readable script:

#!/usr/bin/perl

use strict;
use warnings;
use JSON;

print("clientId;intent;time;userId\n");
while (<>) {
   # Don't choke on malformed lines
   eval {
       my $obj = from_json($_);
       my $msg = $obj->{msg};
       $msg =~
           s/^.*\{\s*   # Trim up to and including the leading '{'
           |
           \s*,\s*}.*$  # Trim trailing ',}'
           //gx;
       # Split $msg into key-value pairs
       my %m = map {
           m/^([^:]*)   # Stuff that isn't ':'
           :\s*'?       # Field separator and optional opening quote
           ([^']*)      # The value, up to the closing quote
           /x;
           ($1, $2)
       } split(/,\s+/, $msg);
       print("$m{clientId};$m{intent};$obj->{time};$m{userId}\n");
   };
   warn($@) if ($@);
}
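
For comparison, here is a rough Python sketch of the same two-step idea: decode each line as JSON, then pick the key/value pairs out of the msg text. It is not part of the Perl answer above. The names parse_line, convert, PAIR_RE and SAMPLE are made up for illustration, the sketch assumes every value inside msg is wrapped in single quotes as in the sample log line, and unlike the Perl script it skips unusable lines silently instead of warning.

```python
import csv
import json
import re
import sys

FIELDS = ["clientId", "intent", "time", "userId"]

# Assumption: values inside msg look like  key: 'value'  (single-quoted).
PAIR_RE = re.compile(r"(\w+):\s*'([^']*)'")


def parse_line(line):
    """Return a dict with the wanted fields, or None if the line is unusable."""
    try:
        obj = json.loads(line)
    except ValueError:  # not valid JSON, e.g. a "..." filler line
        return None
    if not isinstance(obj, dict) or not isinstance(obj.get("msg"), str):
        return None
    pairs = dict(PAIR_RE.findall(obj["msg"]))
    if not all(key in pairs for key in ("clientId", "intent", "userId")):
        return None
    pairs["time"] = obj.get("time", "")
    return {key: pairs[key] for key in FIELDS}


def convert(lines, out):
    """Write a header plus one ';'-separated row per usable input line."""
    writer = csv.writer(out, delimiter=";", lineterminator="\n")
    writer.writerow(FIELDS)
    for line in lines:
        row = parse_line(line)
        if row is not None:
            writer.writerow([row[key] for key in FIELDS])


# The sample log line from the question; the raw string keeps \n as a JSON escape.
SAMPLE = r'''{"name":"test-api","hostname":"ci47","pid":3202,"level":30,"msg":"File: dsiManager, Method: getContract, End { userId: 'AFC5EH5PIHHLO4XS7SG',\n  clientId: '5003700557',\n  intent: 'YesIntent',\n }","time":"2019-01-21T12:23:10.323Z","v":0}'''

convert(["...", SAMPLE, "..."], sys.stdout)
```

To run it over real log files instead of the embedded sample, something like convert(fileinput.input(), sys.stdout) from the standard fileinput module should do; it reads the files named on the command line, or standard input if none are given.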

Try this:

awk -F "['\"]" 'NF>=26{print $19";"$21";"$26";"$17}' file.csv

5003700557;YesIntent;2019-01-21T12:23:10.323Z;AFC5EH5PIHHLO4XS7SG
  • ['\"] treats both single and double quotes as field separators.
  • NF>=26 simply checks that the line has at least 26 fields.
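
To see where those field numbers come from, here is a small Python sketch that splits the sample line the same way, on every single or double quote, and numbers the pieces from 1 as awk does. The names SAMPLE, fields, nf and row are only illustrative. Keep in mind that the positions depend entirely on the exact layout of the line: an extra quoted key anywhere before msg would shift every number.

```python
import re

# The sample log line from the question; the raw string keeps \n as two characters.
SAMPLE = r'''{"name":"test-api","hostname":"ci47","pid":3202,"level":30,"msg":"File: dsiManager, Method: getContract, End { userId: 'AFC5EH5PIHHLO4XS7SG',\n  clientId: '5003700557',\n  intent: 'YesIntent',\n }","time":"2019-01-21T12:23:10.323Z","v":0}'''

# awk numbers fields from 1, so pad index 0 to keep the numbers aligned.
fields = [None] + re.split("['\"]", SAMPLE)

nf = len(fields) - 1  # what awk calls NF
print("NF =", nf)

# The four positions used by the awk command: userId, clientId, intent, time.
for n in (17, 19, 21, 26):
    print(n, fields[n])

# Joined with ';' in the order the question asks for.
row = ";".join(fields[n] for n in (19, 21, 26, 17))
print(row)
```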

Source: https://unix.stackexchange.com/questions/509822