Wget

wget does not retrieve a file of the correct size (file corrupt or incomplete?)

  • September 15, 2019

I don't understand...

The actual download link doesn't appear to be an http link, but some JavaScript call?

javascript:SendFileDownloadCall('PRODIMAGES.CIF.zip','PRODIMAGES.CIF.zip');

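So after downloading the file manually, I went to the browser's download history and copied the direct link: https://au.ingrammicro.com/_layouts/CommerceServer/IM/FileDownload.aspx?DisplayName=STD_FULL_FILEFEED.TXT&FileName=STDPRICE_FULL.TXT.zip

I passed the URL to wget along with my site credentials: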

wget -q --user=XXXX --password=XXXX "https://au.ingrammicro.com/_layouts/CommerceServer/IM/FileDownload.aspx?DisplayName=STD_FULL_FILEFEED.TXT&FileName=STDPRICE_FULL.TXT.zip" -o STDPRICE.zip

Later I found that adding --user and --password made no difference, so I left them out:

[root@server datafiles]# wget "https://au.ingrammicro.com/_layouts/CommerceServer/IM/FileDownload.aspx?DisplayName=STD_FULL_FILEFEED.TXT&FileName=STDPRICE_FULL" -O STDPRICE.zip
--2019-09-15 19:53:29--  https://au.ingrammicro.com/_layouts/CommerceServer/IM/FileDownload.aspx?DisplayName=STD_FULL_FILEFEED.TXT&FileName=STDPRICE_FULL
Resolving au.ingrammicro.com (au.ingrammicro.com)... 104.98.45.15
Connecting to au.ingrammicro.com (au.ingrammicro.com)|104.98.45.15|:443... connected.
HTTP request sent, awaiting response... 302 Moved Temporarily
Location: /_layouts/CommerceServer/IM/Login.aspx?ReturnUrl=%2f_layouts%2fCommerceServer%2fIM%2fFileDownload.aspx%3fDisplayName%3dSTD_FULL_FILEFEED.TXT%26FileName%3dSTDPRICE_FULL [following]
--2019-09-15 19:53:29--  https://au.ingrammicro.com/_layouts/CommerceServer/IM/Login.aspx?ReturnUrl=%2f_layouts%2fCommerceServer%2fIM%2fFileDownload.aspx%3fDisplayName%3dSTD_FULL_FILEFEED.TXT%26FileName%3dSTDPRICE_FULL
Reusing existing connection to au.ingrammicro.com:443.
HTTP request sent, awaiting response... 200 OK
Length: 85341 (83K) [text/html]
Saving to: ‘STDPRICE.zip’

100%[===================================================================================================================================================================================================>] 85,341       405KB/s   in 0.2s

2019-09-15 19:53:30 (405 KB/s) - ‘STDPRICE.zip’ saved [85341/85341]

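Either way, I don't get the same file that I get by clicking and downloading from the website by hand; instead I get an implausibly small file.

Confirming my fears, when I try to unzip it I get: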

$ [root@server datafiles]# unzip STDPRICE.zip
Archive:  STDPRICE.zip
 End-of-central-directory signature not found.  Either this file is not
 a zipfile, or it constitutes one disk of a multi-part archive.  In the
 latter case the central directory and zipfile comment will be found on
 the last disk(s) of this archive.
unzip:  cannot find zipfile directory in one of STDPRICE.zip or
       STDPRICE.zip.zip, and cannot find STDPRICE.zip.ZIP, period.

File check:

$ [root@server datafiles]# file STDPRICE.zip
STDPRICE.zip: HTML document, UTF-8 Unicode text, with very long lines, with CRLF line terminators

So wget has actually downloaded an HTML file that is being presented as a .txt.zip file? Can someone enlighten me?

The site is redirecting you to the login page:

HTTP request sent, awaiting response... 302 Moved Temporarily
Location: /_layouts/CommerceServer/IM/Login.aspx?ReturnUrl=%2f_layouts%2fCommerceServer%2fIM%2fFileDownload.aspx%3fDisplayName%3dSTD_FULL_FILEFEED.TXT%26FileName%3dSTDPRICE_FULL [following]
--2019-09-15 19:53:29--  https://au.ingrammicro.com/_layouts/CommerceServer/IM/Login.aspx?ReturnUrl=%2f_layouts%2fCommerceServer%2fIM%2fFileDownload.aspx%3fDisplayName%3dSTD_FULL_FILEFEED.TXT%26FileName%3dSTDPRICE_FULL

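It probably doesn't accept the credentials you supply as basic authentication (which is what wget sends) and uses a session cookie instead. You can try extracting the cookies from your browser (while logged in) and sending them with wget (--load-cookies). The site may also check other aspects of the request, which you may need to modify (for example the user agent).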
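A minimal sketch of the cookie approach, under a few assumptions: the browser's cookies have been exported to a Netscape-format file named cookies.txt (a hypothetical name), the User-Agent value is only a placeholder, and the function names fetch_feed and is_zip are made up for illustration. The is_zip helper checks the first four bytes for the ZIP signature, so a login page saved under a .zip name is caught right away instead of at unzip time.

```shell
#!/bin/sh
# Return 0 if the file starts with the ZIP local-file-header
# signature "PK\003\004" (hex 50 4b 03 04), non-zero otherwise.
is_zip() {
    [ "$(head -c 4 "$1" | od -An -tx1 | tr -d ' \n')" = "504b0304" ]
}

# Fetch the feed using cookies exported from a logged-in browser session.
# cookies.txt and the User-Agent string are assumptions, not site facts.
fetch_feed() {
    url='https://au.ingrammicro.com/_layouts/CommerceServer/IM/FileDownload.aspx?DisplayName=STD_FULL_FILEFEED.TXT&FileName=STDPRICE_FULL.TXT.zip'
    wget --load-cookies cookies.txt \
         --user-agent='Mozilla/5.0' \
         -O STDPRICE.zip "$url" || return 1
    if ! is_zip STDPRICE.zip; then
        echo 'STDPRICE.zip is not a ZIP archive (probably the login page)' >&2
        return 1
    fi
}
```

Calling fetch_feed after exporting the cookies either leaves a real archive in STDPRICE.zip or fails with a message, which is easier to act on in a cron job than a silent 83K HTML file.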

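If you can use curl instead, open the inspector (Ctrl+Shift+I), go to the Network tab, download the file, right-click the download entry in the request list, hover over "Copy", and choose "Copy as cURL". The command now on your clipboard will include the cookies.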
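The copied command usually carries many more headers, but stripped down it might look roughly like this sketch. The function name download_feed and the cookie value in the usage line are hypothetical; the real cookie names and values have to come from your own logged-in session.

```shell
#!/bin/sh
# Download the feed with curl, sending the browser's session cookie.
# $1: cookie string in "name=value; name2=value2" form (from the browser)
# $2: output file (defaults to STDPRICE.zip)
download_feed() {
    cookie=$1
    out=${2:-STDPRICE.zip}
    url='https://au.ingrammicro.com/_layouts/CommerceServer/IM/FileDownload.aspx?DisplayName=STD_FULL_FILEFEED.TXT&FileName=STDPRICE_FULL.TXT.zip'
    # --fail: exit non-zero on HTTP errors (4xx/5xx).
    # --location: follow redirects. A bounce to Login.aspx still ends in
    # 200 OK with HTML, so inspect the result with `file` afterwards.
    # --cookie: a value containing "=" is sent as cookie data, not read
    # as a cookie file name.
    curl --fail --silent --show-error --location \
         --cookie "$cookie" \
         --output "$out" "$url"
}
```

Usage would be something like download_feed 'SessionCookieName=value-from-browser' followed by file STDPRICE.zip to confirm that the result is a Zip archive and not an HTML document.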

Source: https://unix.stackexchange.com/questions/541882