Problem Description
I have been trying to troubleshoot an issue that occurs when downloading a file from FTP/FTPS. The file downloads successfully, but no operation is performed after the download completes, and no error is raised that could give more information about the issue. I searched for this on Stack Overflow and found this link, which describes a similar problem; it looks like I am facing the same issue, though I am not sure. I need a little more help resolving it.
I tried setting the FTP connection timeout to 60 minutes, but that was of little help. Before this I was using ftplib's retrbinary(), but the same issue occurred there. I also tried passing different blocksize and windowsize values, but the issue was still reproducible.
I am trying to download a file of size ~3GB from an AWS EMR cluster. Sample code is below.
import logging
import os
import threading
from ftplib import FTP

logger = logging.getLogger(__name__)

def download_ftp(self, ip, port, user_name, password, file_name, target_path):
    try:
        os.chdir(target_path)
        ftp = FTP()
        ftp.connect(host=ip, port=int(port), timeout=3000)
        ftp.login(user=user_name, passwd=password)
        if ftp.nlst(file_name) != []:
            dir = os.path.split(file_name)
            ftp.cwd(dir[0])
            for filename in ftp.nlst(file_name):
                sock = ftp.transfercmd('RETR ' + filename)

                # Download on a background thread so the main thread can
                # keep the control connection alive with NOOPs.
                def background():
                    fhandle = open(filename, 'wb')
                    while True:
                        block = sock.recv(1024 * 1024)
                        if not block:
                            break
                        fhandle.write(block)
                    fhandle.close()
                    sock.close()

                t = threading.Thread(target=background)
                t.start()
                while t.is_alive():
                    t.join(60)
                    ftp.voidcmd('NOOP')  # keepalive every 60 seconds
                logger.info("File " + filename + " fetched successfully")
            return True
        else:
            logger.error("File " + file_name + " is not present in FTP")
    except Exception as e:
        logger.error(e)
        raise
Another option suggested in the above-mentioned link is to close the connection after downloading a small chunk of the file and then reconnect. Can someone suggest how this can be achieved? I am not sure how to resume the download from the point where it stopped before the connection was closed. Will this method be foolproof for downloading the entire file?
I don't know much about FTP server-level timeout settings, so I don't know what needs to be altered or how. I basically want to write a generic FTP downloader which can help in downloading files from FTP/FTPS.
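For the FTP/FTPS part, here is a minimal sketch of choosing between plain FTP and explicit FTPS using ftplib's FTP_TLS; the helper name and the use_tls flag are mine, not from the question:

import ftplib

def connect_ftp(host, port, user, password, use_tls=False, timeout=60):
    # FTP_TLS speaks explicit FTPS (AUTH TLS); plain FTP otherwise.
    ftp = ftplib.FTP_TLS() if use_tls else ftplib.FTP()
    ftp.connect(host=host, port=int(port), timeout=timeout)
    ftp.login(user=user, passwd=password)
    if use_tls:
        ftp.prot_p()  # switch the data channel to TLS as well
    return ftp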
When I use the retrbinary() method of ftplib and set the debug level to 2:
ftp.set_debuglevel(2)
ftp.retrbinary('RETR ' + filename, fhandle.write)
the following log is printed:
cmd 'TYPE I'
put 'TYPE I '
get '200 Type set to I. '
resp '200 Type set to I.'
cmd 'PASV'
put 'PASV '
get '227 Entering Passive Mode (64,27,160,28,133,251). '
resp '227 Entering Passive Mode (64,27,160,28,133,251).'
cmd 'RETR FFFT_BRA_PM_R_201711.txt'
put 'RETR FFFT_BRA_PM_R_201711.txt '
get '150 Opening BINARY mode data connection for FFFT_BRA_PM_R_201711.txt. '
resp '150 Opening BINARY mode data connection for FFFT_BRA_PM_R_201711.txt.'
Recommended Answer
Before doing anything, note that there is something very wrong with your connection, and diagnosing that and getting it fixed is far better than working around it. But sometimes, you just have to deal with a broken server, and even sending keepalives doesn't help. So, what can you do?
The trick is to download a chunk at a time, then abort the download—or, if the server can't handle aborting, close and reopen the connection.
Note that I'm testing everything below with ftp://speedtest.tele2.net/5MB.zip; hopefully this doesn't cause a million people to start hammering their servers. Of course, you'll want to test it with your actual server.
The entire solution of course relies on the server being able to resume transfers, which not all servers can do, especially when you're dealing with something badly broken. So we'll need to test for that. Note that this test will be very slow and very heavy on the server, so do not test with your 3GB file; find something much smaller. Also, if you can put something readable there, it will help with debugging, because you may be stuck comparing files in a hex editor.
from ftplib import FTP

def downit():
    with open('5MB.zip', 'wb') as f:
        while True:
            # Reconnect for every chunk, resuming from the current file position.
            ftp = FTP(host='speedtest.tele2.net', user='anonymous', passwd='test@example.com')
            pos = f.tell()
            print(pos)
            ftp.sendcmd('TYPE I')  # REST is only defined for binary mode
            sock = ftp.transfercmd('RETR 5MB.zip', rest=pos)
            buf = sock.recv(1024 * 1024)
            if not buf:
                return
            f.write(buf)
You will probably not get 1MB at a time, but instead something under 8KB. Let's assume you're seeing 1448, then 2896, 4344, etc.
- If you get an exception from the REST, the server does not handle resuming; give up, you're hosed.
- If the file goes on past the actual file size, hit ^C, and check it in a hex editor.
- If you see the same 1448 bytes or whatever (the amount you saw it printing out) over and over again, again, you're hosed.
- If you have the right data, but with extra bytes between each chunk of 1448 bytes, that's actually fixable. If you run into this and can't figure out how to fix it by using f.seek (a sketch follows this list), I can explain, but you probably won't run into it.
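For what it's worth, here is a minimal sketch of the f.seek idea, under the assumption that the junk is resent data at the start of each resumed chunk; the helper and its skip parameter are illustrative, not the answer's tested code:

def write_resumed_chunk(f, expected_pos, buf, skip=0):
    # Seek to where this chunk belongs so resent bytes overwrite
    # what's already on disk instead of being appended twice.
    f.seek(expected_pos)
    f.write(buf[skip:])  # drop any measured extra leading bytes
    return f.tell()      # use this as the next REST position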
One thing we can do is try to abort the download and not reconnect.
def downit():
    with open('5MB.zip', 'wb') as f:
        ftp = FTP(host='speedtest.tele2.net', user='anonymous', passwd='test@example.com')
        while True:
            pos = f.tell()
            print(pos)
            ftp.sendcmd('TYPE I')
            sock = ftp.transfercmd('RETR 5MB.zip', rest=pos)
            buf = sock.recv(1024 * 1024)
            if not buf:
                return
            f.write(buf)
            sock.close()
            ftp.abort()
You're going to want to try a number of variations:
- No sock.close.
- No ftp.abort.
- With sock.close after ftp.abort.
- With ftp.abort after sock.close.
- All four of the above repeated with TYPE I moved to before the loop instead of each time.
Some will raise exceptions. Others will just appear to hang forever. If that's true for all 8 of them, we need to give up on aborting. But if any of them works, great!
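As one concrete example of the eight, here is a sketch with TYPE I hoisted out of the loop and sock.close() before ftp.abort(); the timeout argument is my addition, so a variation that hangs raises socket.timeout instead of blocking forever:

from ftplib import FTP

def downit():
    with open('5MB.zip', 'wb') as f:
        # timeout makes a hanging variation fail fast with socket.timeout
        ftp = FTP(host='speedtest.tele2.net', user='anonymous',
                  passwd='test@example.com', timeout=60)
        ftp.sendcmd('TYPE I')  # hoisted out of the loop (the fifth variation)
        while True:
            pos = f.tell()
            print(pos)
            sock = ftp.transfercmd('RETR 5MB.zip', rest=pos)
            buf = sock.recv(1024 * 1024)
            if not buf:
                return
            f.write(buf)
            sock.close()  # close the data socket first...
            ftp.abort()   # ...then abort on the control connection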
The other way to speed things up is to download 1MB (or more) at a time before aborting or reconnecting. Just replace this code:
buf = sock.recv(1024 * 1024)
if buf:
    f.write(buf)
with this:
chunklen = 1024 * 1024
while chunklen:
    print('   ', f.tell())
    buf = sock.recv(chunklen)
    if not buf:
        break
    f.write(buf)
    chunklen -= len(buf)
Now, instead of reading 1448 or 8192 bytes for each transfer, you're reading up to 1MB for each transfer. Try pushing it farther.
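Folded back into the reconnect version, the whole loop might look like the following sketch; the got_any flag is my addition to detect end-of-file, now that the inner loop breaks instead of returning:

from ftplib import FTP

def downit():
    with open('5MB.zip', 'wb') as f:
        while True:
            ftp = FTP(host='speedtest.tele2.net', user='anonymous', passwd='test@example.com')
            pos = f.tell()
            print(pos)
            ftp.sendcmd('TYPE I')
            sock = ftp.transfercmd('RETR 5MB.zip', rest=pos)
            chunklen = 1024 * 1024
            got_any = False
            while chunklen:
                buf = sock.recv(chunklen)
                if not buf:
                    break
                f.write(buf)
                got_any = True
                chunklen -= len(buf)
            sock.close()
            ftp.close()
            if not got_any:  # EOF with nothing read: the file is complete
                return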
If, say, your downloads were failing at 10MB, and the keepalive code in your question got things up to 512MB, but it just wasn't enough for 3GB—you can combine the two. Use keepalives to read 512MB at a time, then abort or reconnect and read the next 512MB, until you're done.
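As a sketch of that combination, keep the reconnect-and-resume loop from above, raise the per-connection budget, and send the question's NOOP keepalive between reads. Replace the inner loop with something like this; the 512MB figure is a placeholder, and whether the server accepts NOOP mid-transfer is an assumption:

chunklen = 512 * 1024 * 1024  # placeholder: bytes to fetch per connection
while chunklen:
    buf = sock.recv(min(chunklen, 1024 * 1024))
    if not buf:
        break
    f.write(buf)
    chunklen -= len(buf)
    ftp.voidcmd('NOOP')  # keepalive on the control connection, as in the question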