Shell
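Extracting multiple strings matching a pattern in a bash script
I am writing a shell script to generate a table of contents.
As input, it receives one long HTML string: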
https://www.mycompany.com/posts/aureliaflore_china-seoul-startup-activity-6571925510337728512-acAw","$type":"com.traver.voyager.feed.actions.Action"}, link to post","url":"https://www.mycompany.com/posts/aureliaflore_reuters-top-news-on-twitter-activity-6571392661482233856-T3dO","$type": article","$type":"com.traver.voyager.feed.actions.Action"},{"actionType":"SHARE_VIA","text":"Copy link to post","url":"https://www.mycompany.com/posts/aureliaflore_are-you-thinking-to-the-benefits-of-digitalization-activity-6570119712154451968-927T","$type":"com.traver.voyager
To keep the output easy to customize, the script should print only a table of URLs:
https://www.mycompany.com/posts/aureliaflore_china-seoul-startup-activity-6571925510337728512-acAw
https://www.mycompany.com/posts/aureliaflore_reuters-top-news-on-twitter-activity-6571392661482233856-T3dO
https://www.mycompany.com/posts/aureliaflore_are-you-thinking-to-the-benefits-of-digitalization-activity-6570119712154451968-927T
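The pattern to search for is: it starts with "https://www", is followed by a variable number of characters, and ends at a " (the quote itself must not be extracted).
My current solution is based on cut -f, but the total input size is dynamic, so it cannot locate the pattern.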
Your sample data looks like a broken fragment of JSON, so you really should use jq to extract what you need from it, before whatever is being done to the original input that makes it look like this. However, to extract from what you have the URLs that start with https://www and contain no double-quote characters, you can use grep:
$ grep -o 'https://www[^"]*' input.txt
https://www.mycompany.com/posts/aureliaflore_china-seoul-startup-activity-6571925510337728512-acAw
https://www.mycompany.com/posts/aureliaflore_reuters-top-news-on-twitter-activity-6571392661482233856-T3dO
https://www.mycompany.com/posts/aureliaflore_are-you-thinking-to-the-benefits-of-digitalization-activity-6570119712154451968-927T
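If the extraction has to live inside a script rather than be run by hand, the same grep call can be wrapped in a small function. The following is only a sketch: the function name extract_urls and the example.com sample string are made up for illustration and are not part of the original post.

```shell
#!/usr/bin/env bash
# Sketch: print every URL that starts with https://www, one per line.
# extract_urls is a hypothetical helper name, not from the original post.
extract_urls() {
    # -o prints only the matched text instead of the whole line.
    # [^"]* matches any run of characters that are not a double quote,
    # so each match ends just before the closing quote.
    # With no file arguments, grep reads standard input.
    grep -o 'https://www[^"]*' "$@"
}

# Usage with an inline sample (the example.com URLs are invented).
sample='{"url":"https://www.example.com/posts/a-1","$type":"x"},{"url":"https://www.example.com/posts/b-2"'
printf '%s\n' "$sample" | extract_urls
```

Because the function passes its arguments through to grep, it can read either from a pipe or from one or more files, for example extract_urls input.txt. If the same URL can appear more than once in the input, piping the result through sort -u would remove the duplicates, at the cost of losing the original order.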