Need to extract a number from HTML

December 23, 2017

Given this:
<p>Currencies fluctuate every day. The rate shown is effective for transactions submitted to Visa on <strong>February 5, 2017</strong>, with a bank foreign transaction fee of <st <span><strong>1</strong> Euro = <strong>1.079992</strong> United States Dolla <p>The 'currency calculator' below gives you an indication of the cost of purchas <p>February 5, 2017</p><div class="clear-both"></div> <!-- removed clearboth- <p><strong>1 EUR = 1.079992 USD</strong></p> <div class="clear-both"></di <table width="290" border="0" cellspacing="0" cellpadding="3"> <a href="/content/VISA/US/en_us/home/support/consumer/travel-support/exchange e-calculator.html"> <button class="btn btn-default btn-xs"><span class="retur <p><p>This converter uses a single rate per day with respect to any two currencies. Rates displayed may not precisely reflect actual rate applied to transaction amount due to rounding differences, Rates apply to the date the transaction was processed by Visa; this may differ from the actual date of the transaction. Banks may or may not assess foreign transaction fees on cross-border transactions. Fees are applied at banks’ discretion. Please contact your bank for more information.</p>
I need to extract **1.079992**
I'm using:
sed -E 's:.*(1\.[0-9\.]+).*:\1:g'
...which works, but is there a more elegant way?
Or is there a way to get that value directly from curl?
(My full command is curl 'https://usa.visa.com/support/consumer/travel-support/exchange-rate-calculator.html/?fromCurr=USD&toCurr=EUR&fee=0&exchangedate=02/05/2017' | grep '<p><strong>1' | sed -E 's:.*(1\.[0-9\.]+).*:\1:g' )

Using curl to fetch, lynx to parse, and awk to extract
Please don't parse XML/HTML with sed, grep, etc. HTML is context-free, but sed and friends are only regular. [1]
url='https://usa.visa.com/support/consumer/travel-support/exchange-rate-calculator.html/?fromCurr=USD&toCurr=EUR&fee=0&exchangedate=02/05/2017'
user_agent='Mozilla/5.0 (X11; Linux x86_64; rv:57.0) Gecko/20100101 Firefox/57.0'

curl -sA "${user_agent}" "${url}"  \
| lynx -stdin -dump                \
| awk '/1 EUR/{ print $4 }'
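You need some kind of HTML parser to extract content reliably. Here I use lynx (a text-based web browser), but lighter alternatives exist.
In this pipeline, curl retrieves the page, then lynx parses it and dumps a text representation. The /1 EUR/ pattern makes awk search for the string 1 EUR, which matches only this line: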
  1 EUR = 1.079992 USD
Then { print $4 } makes it print the fourth whitespace-separated field, 1.079992.
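To see the field splitting without touching the network, here is a minimal sketch that feeds awk a hand-written excerpt. The dump_sample function and its three lines are assumptions modelled on the single dump line shown above, not real lynx output.

```shell
# Hypothetical stand-in for `lynx -dump` output; the layout is an assumption
# based on the "1 EUR = 1.079992 USD" line quoted in the answer.
dump_sample() {
  printf '%s\n' \
    '   February 5, 2017' \
    '   1 EUR = 1.079992 USD' \
    '   This converter uses a single rate per day.'
}

# awk splits on runs of whitespace and ignores leading blanks, so on the
# matching line: $1="1"  $2="EUR"  $3="="  $4="1.079992"  $5="USD"
# `exit` stops after the first match in case the text appears more than once.
rate=$(dump_sample | awk '/1 EUR/ { print $4; exit }')
printf '%s\n' "$rate"
```

The added exit is a small deviation from the one-liner in the answer: it only matters if the pattern ever matches more than one line.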
Alternative solution without curl
Since my HTML parser of choice is lynx, curl is not necessary:
url='https://usa.visa.com/support/consumer/travel-support/exchange-rate-calculator.html/?fromCurr=USD&toCurr=EUR&fee=0&exchangedate=02/05/2017'
user_agent='Mozilla/5.0 (X11; Linux x86_64; rv:57.0) Gecko/20100101 Firefox/57.0'

lynx -useragent="${user_agent}" -dump "${url}"  \
| awk '/1 EUR/{ print $4 }'
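[1] A PCRE (grep -P in some implementations) can describe some context-free or even context-sensitive sets of strings, but not all of them.
Edited on December 23, 2017 to add a user agent string (pretending to be Firefox), because the site currently blocks curl and lynx.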
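Because the site may reject a request, the pipeline can silently print nothing or print text that is not a rate. The following is one possible guard, not part of the original answer: extract_rate is a hypothetical helper name, and the error message is only illustrative.

```shell
# Hypothetical helper: reads a lynx-style text dump on stdin, prints the rate,
# and returns non-zero if the extracted field is empty or not numeric.
extract_rate() {
  rate=$(awk '/1 EUR/ { print $4; exit }')
  case $rate in
    # Empty result, or any character other than a digit or a dot.
    ''|*[!0-9.]*)
      echo 'no rate found (request blocked or page layout changed?)' >&2
      return 1
      ;;
  esac
  printf '%s\n' "$rate"
}
```

It would be used as the last stage in place of the bare awk call, for example: lynx -useragent="${user_agent}" -dump "${url}" | extract_rate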

Quoted from: https://unix.stackexchange.com/questions/348893
