Linux
wget does not retrieve all of the page content
I am trying to extract the artists from this page. I have tried many variants of
wget https://northside.dk/artister/
and
wget --random-wait -r -p -e robots=off -U mozilla https://northside.dk/artister/
but all I get is
<head> <meta charset="UTF-8"> <meta name="google-site-verification" content="clAYDF67yhmgMMhQ8tcJTXpuo4TGpmHSbo4RyIMu6vY" /> <meta name="viewport" content="width=device-width, initial-scale=1.0, maximum-scale=1.0, user-scalable=0"/> <meta name="apple-mobile-web-app-capable" content="yes"> <meta name="apple-mobile-web-app-status-bar-style" content="black-translucent"> <link rel="shortcut icon" href="/img/favicon_43f15e.png"> <title>NorthSide - 4. - 6. juni 2020</title> <script id="CookieConsent" src="https://policy.app.cookieinformation.com/uc.js" type="text/javascript"></script> <!-- Google Tag Manager --> <script>(function (w, d, s, l, i) { w[l] = w[l] || []; w[l].push({ 'gtm.start': new Date().getTime(), event: 'gtm.js' }); var f = d.getElementsByTagName(s)[0], j = d.createElement(s), dl = l != 'dataLayer' ? '&l=' + l : ''; j.async = true; j.src = 'https://www.googletagmanager.com/gtm.js?id=' + i + dl; f.parentNode.insertBefore(j, f); })(window, document, 'script', 'dataLayer', 'GTM-K4RZFD'); if ( typeof "ga" === "function" ) { ga('require', 'linker'); } if ( typeof "gtag" === "function" ) { gtag('config', 'UA-22269830-1', { 'linker': { 'domains': ['northside.dk', 'ticketmaster.dk', 'tmmikrobetaling.dk'] } }); } </script> <!-- End Google Tag Manager --> <!-- Google Tag Manager (noscript) --> <noscript><iframe src="https://www.googletagmanager.com/ns.html?id=GTM-K4RZFD" height="0" width="0" style="display:none;visibility:hidden"></iframe></noscript> <!-- End Google Tag Manager (noscript) --> <link rel="shortcut icon" href="/favicon.png"><script type="text/javascript" src="/main_611e89.js"></script></head> <body> <div id="app-mount"> <div class="marble-loader" id="loader"></div> <h1 id="loading-status">Booking music</h1> </div> </body>
instead of the expected output. In the Firefox inspector I can see the following block:
<a class="archive-grid-item-shell" href="/artister/3447" data-reactid=".0.1.4.1.0.0.0.$3447.1"> <div class="archive-grid-item-content" data-reactid=".0.1.4.1.0.0.0.$3447.1.0"> <div class="grid-item-label" data-reactid=".0.1.4.1.0.0.0.$3447.1.0.0"> w/</div> <div class="grid-item-header" data-reactid=".0.1.4.1.0.0.0.$3447.1.0.1"> <span class="text-with-background" data-reactid=".0.1.4.1.0.0.0.$3447.1.0.1.0">Clara</span> </div> </div> </a>
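That is where all the artist entries are. I even tried the headless (text-mode) browser lynx, but the result was the same as with wget.
What am I doing wrong, or is the page built in such a way that I cannot get the content with wget?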
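The page is rendered client-side by JavaScript (note the data-reactid attributes and the empty app-mount div), which neither wget nor lynx executes, so they only receive the loading shell. The artist data comes from a separate JSON API, which you can query directly. Try it this way: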
wget -q -O - "https://api.northside.dk/wp-json/wp/v2/cpt-artist/?orderby=menu_order&order=asc&per_page=100" | grep -oP '"raw":.*?[^\\]"'
"raw":"Clara"
"raw":"Folkeklubben"
"raw":"Franc Moody"
"raw":"Green Day"
"raw":"Hans Philip"
"raw":"Johnny Marr"
"raw":"Jung"
"raw":"Kashmir"
"raw":"Lukas Graham"
"raw":"Mags"
"raw":"Mekdes"
"raw":"Mew"
"raw":"Robyn"
"raw":"Spleen United"
"raw":"Weezer"
"raw":"White Lies"
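If you only want the names without the "raw": prefix, something along these lines may help. It is a minimal sketch, assuming GNU grep with PCRE support (the -P flag already used above); the function name extract_raw_values is made up for illustration, and the endpoint may have changed since this answer was written.

```shell
# Hypothetical helper (name invented for this example):
# reads JSON on stdin and prints the string value of every "raw" field,
# one per line, without the key or the surrounding quotes.
extract_raw_values() {
    # \K discards the part matched so far ("raw":"), so -o prints only the value.
    # (?:[^"\\]|\\.)* consumes ordinary characters and backslash-escaped ones,
    # so an escaped quote inside a name does not end the match early.
    grep -oP '"raw":\s*"\K(?:[^"\\]|\\.)*'
}
```

Usage would be the same pipeline as above with the grep replaced: wget -q -O - "https://api.northside.dk/wp-json/wp/v2/cpt-artist/?orderby=menu_order&order=asc&per_page=100" | extract_raw_values. Escape sequences in a name (such as \" or \u00e9) are printed as they appear in the JSON; a real JSON parser would be the more robust choice if one is available.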
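Update #1
Open the developer tools in your browser (in Chrome, press F12), select the Network tab, and then load the site's address.
At this point you should see every request the browser sends to the server, along with the responses.
If you follow the network traffic carefully, you will find the one request that is responsible for downloading the data you need.
You can select each request to inspect its query parameters, response, status, and so on.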
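Once you have a candidate request from the Network tab, you can check from the command line whether its response really carries the data by searching the body for a value you know is on the rendered page. This is only a sketch; the helper name body_contains is invented for illustration.

```shell
# Hypothetical helper (name invented for this example):
# succeeds (exit status 0) if the body on stdin contains the literal text in $1.
body_contains() {
    # -q: no output, only the exit status; -F: treat the pattern as a fixed string.
    grep -qF -- "$1"
}
```

For example, wget -q -O - "https://northside.dk/artister/" | body_contains "Clara" would be expected to fail, since the static HTML is only the loading shell, whereas piping the API URL from the answer through the same check should succeed as long as that endpoint still serves the artist list.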