How do I extract XML from a log entry?

February 20, 2018

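We log the XML messages we send to downstream systems.
I am trying to use sed to extract the XML from a log entry, but I am not sure how to go about it.
This is a typical log entry: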
2018-02-20T10:02:51.395Z|hostname1|user1||Application Name||10062|DEBUG|o.s.i.channel.DirectChannel||postSend (sent=true) on channel 'logger', message: GenericMessage [payload=<?xml version="1.0" encoding="UTF-8" standalone="yes"?><canonMessage xmlns="somenamespace">...the message body...</canonMessage>, headers={quote_format=FpML, id=f572ea65-91dd-a610-7976-5a1e97c16524, quote_message_id=b640bd90-1624-11e8-a904-bd3c0f5af83b_1519120971176, quote_data=Quote Rep, quote_transaction_originator=user1, timestamp=1519120971394}]
How can I strip the leading and trailing parts of the log entry away from the XML?
For the line above, the sed output should be:
<?xml version="1.0" encoding="UTF-8" standalone="yes"?><canonMessage xmlns="somenamespace">...the message body...</canonMessage>

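grep -o '<?xml.*</canonMessage>' /path/to/log should do the trick.
The -o option tells grep to print only the part of each line that matches the supplied regular expression. Happily, you are only talking about extracting (part of) the XML here, not parsing it.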
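
To try this on a single line before running it against a real log, something like the following minimal sketch could be used. The shortened log line and the variable names are made up for illustration and are not from the original log; it also assumes a grep that supports -o (GNU and BSD grep both do).

```shell
#!/bin/sh
# Hypothetical, shortened log line modelled on the entry in the question.
line='2018-02-20T10:02:51.395Z|hostname1|DEBUG|message: GenericMessage [payload=<?xml version="1.0" encoding="UTF-8" standalone="yes"?><canonMessage xmlns="somenamespace">body</canonMessage>, headers={id=f572ea65, timestamp=1519120971394}]'

# In a basic regular expression the ? is an ordinary character,
# so '<?xml' matches the literal text "<?xml".
xml=$(printf '%s\n' "$line" | grep -o '<?xml.*</canonMessage>')

printf '%s\n' "$xml"
```

Because .* is greedy, a line that happened to contain more than one </canonMessage> would be matched up to the last one.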

I got the output shown above by using the following sed command:

sed  "s/.*payload=//g" input.xml | sed "s/,.*//g"

Output:

<?xml version="1.0" encoding="UTF-8" standalone="yes"?><canonMessage xmlns="somenamespace">...the message body...</canonMessage>
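
The two-step pipeline above cuts at the first comma after payload=, so it would truncate a message whose body itself contains a comma. A possible variant, sketched here under the assumption that every payload ends with </canonMessage>, does the extraction in a single sed call with a capture group. The sample line and variable names are hypothetical.

```shell
#!/bin/sh
# Hypothetical, shortened log line; note the comma inside the message body.
line='2018-02-20T10:02:51.395Z|hostname1|DEBUG|message: GenericMessage [payload=<?xml version="1.0"?><canonMessage xmlns="somenamespace">a, b</canonMessage>, headers={id=f572ea65}]'

# -n suppresses the default output and the p flag prints only lines where
# the substitution succeeded, so log lines without a payload are skipped.
# \( ... \) captures the XML and \1 replaces the whole line with it.
xml=$(printf '%s\n' "$line" |
  sed -n 's/.*payload=\(<?xml.*<\/canonMessage>\).*/\1/p')

printf '%s\n' "$xml"
```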

Source: https://unix.stackexchange.com/questions/425458
