Shell

Extracting multiple strings matching a pattern in a bash script

  • September 8, 2019

I am writing a shell script to generate a table of contents.

As input, it receives a long HTML string:

https://www.mycompany.com/posts/aureliaflore_china-seoul-startup-activity-6571925510337728512-acAw","$type":"com.traver.voyager.feed.actions.Action"},
link to post","url":"https://www.mycompany.com/posts/aureliaflore_reuters-top-news-on-twitter-activity-6571392661482233856-T3dO","$type":
article","$type":"com.traver.voyager.feed.actions.Action"},{"actionType":"SHARE_VIA","text":"Copy link to post","url":"https://www.mycompany.com/posts/aureliaflore_are-you-thinking-to-the-benefits-of-digitalization-activity-6570119712154451968-927T","$type":"com.traver.voyager

To keep the output easy to customize, the script should print only a table of URLs:

https://www.mycompany.com/posts/aureliaflore_china-seoul-startup-activity-6571925510337728512-acAw
https://www.mycompany.com/posts/aureliaflore_reuters-top-news-on-twitter-activity-6571392661482233856-T3dO
https://www.mycompany.com/posts/aureliaflore_are-you-thinking-to-the-benefits-of-digitalization-activity-6570119712154451968-927T

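The pattern to search for is: it starts with "https://www", is followed by XXXXX characters (variable length), and ends with " (the quote itself should not be extracted).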

My current solution is based on cut -f, but the total input size varies, so it cannot find the pattern.

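Your sample data looks like a broken fragment of JSON, so you should really use jq to extract what you need from the original input, before whatever made it look like this was applied to it.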

However, to extract the URLs that start with https://www and contain no double-quote characters from what you have, you can use grep:

$ grep -o 'https://www[^"]*' input.txt 
https://www.mycompany.com/posts/aureliaflore_china-seoul-startup-activity-6571925510337728512-acAw
https://www.mycompany.com/posts/aureliaflore_reuters-top-news-on-twitter-activity-6571392661482233856-T3dO
https://www.mycompany.com/posts/aureliaflore_are-you-thinking-to-the-benefits-of-digitalization-activity-6570119712154451968-927T
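
Since the question is about a script that receives the HTML as a string rather than as a file, the same grep call can be wrapped in a small function that reads standard input. This is only a minimal sketch: the function name extract_urls, the variable names, and the shortened sample input are made up for illustration, and it assumes a grep that supports -o (GNU and BSD grep do; it is not required by POSIX).

```shell
#!/bin/bash

# extract_urls (hypothetical helper): print every match of the pattern
# found on standard input, one per line.
# 'https://www' is matched literally; [^"]* then consumes characters up to,
# but not including, the next double quote.
extract_urls() {
    grep -o 'https://www[^"]*'
}

# Shortened, made-up input in the same shape as the sample data.
# Single quotes keep the shell from expanding "$type".
html='link to post","url":"https://www.mycompany.com/posts/a-1","$type":"x"},{"url":"https://www.mycompany.com/posts/b-2","$type'

# Feed the string to the function and collect the result.
urls=$(printf '%s\n' "$html" | extract_urls)
printf '%s\n' "$urls"
```

Reading from standard input means the length of the input no longer matters, which is the limitation the cut -f approach ran into.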

Source: https://unix.stackexchange.com/questions/539566