
        Is it possible to run another spider from a Scrapy spider?
                  This article explains how to handle the question "Is it possible to run another spider from a Scrapy spider?". It should be a useful reference for anyone facing the same problem; follow along below to learn more.

                  Problem Description


                  For now I have 2 spiders, and what I would like to do is:

                  1. Spider 1 goes to url1, and if url2 appears, it calls Spider 2 with url2. It also saves the content of url1 using a pipeline.
                  2. Spider 2 goes to url2 and does something.

                  Due to the complexity of both spiders, I would like to keep them separated.

                  My attempt using scrapy crawl:

                  def parse(self, response):
                      p = multiprocessing.Process(
                          target=self.testfunc())  # note: the () calls testfunc right here and passes its return value as target
                      p.join()                     # join() is called before start()
                      p.start()
                  
                  def testfunc(self):
                      settings = get_project_settings()
                      crawler = CrawlerRunner(settings)
                      crawler.crawl(<spidername>, <arguments>)  # only schedules the crawl; nothing runs without a running reactor
                  

                  It does load the settings, but it doesn't crawl:

                  2015-08-24 14:13:32 [scrapy] INFO: Enabled extensions: CloseSpider, LogStats, CoreStats, SpiderState
                  2015-08-24 14:13:32 [scrapy] INFO: Enabled downloader middlewares: DownloadTimeoutMiddleware, UserAgentMiddleware, RetryMiddleware, HttpAuthMiddleware, DefaultHeadersMiddleware, MetaRefreshMiddleware, HttpCompressionMiddleware, RedirectMiddleware, CookiesMiddleware, ChunkedTransferMiddleware, DownloaderStats
                  2015-08-24 14:13:32 [scrapy] INFO: Enabled spider middlewares: HttpErrorMiddleware, OffsiteMiddleware, RefererMiddleware, UrlLengthMiddleware, DepthMiddleware
                  2015-08-24 14:13:32 [scrapy] INFO: Spider opened
                  2015-08-24 14:13:32 [scrapy] INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min)
                  

                  The documentation has an example of launching a crawl from a script, but what I'm trying to do is launch another spider while using the scrapy crawl command.
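
                  For reference, that from-script pattern from the documentation looks roughly like the following standalone script (a minimal sketch only; TestSpider2 is just a stand-in for the spider being launched, mirroring the full code below):

                  import scrapy
                  from scrapy.crawler import CrawlerRunner
                  from scrapy.utils.project import get_project_settings
                  from twisted.internet import reactor
                  
                  
                  class TestSpider2(scrapy.Spider):
                      name = "test2"
                      start_urls = ['http://www.google.com']
                  
                      def parse(self, response):
                          return
                  
                  
                  runner = CrawlerRunner(get_project_settings())
                  d = runner.crawl(TestSpider2)         # only schedules the crawl and returns a Deferred
                  d.addBoth(lambda _: reactor.stop())   # stop the reactor once the crawl finishes
                  reactor.run()                         # nothing is fetched until the reactor is running
                  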

                  Full code

                  from scrapy.crawler import CrawlerRunner
                  from scrapy.utils.project import get_project_settings
                  from twisted.internet import reactor
                  from multiprocessing import Process
                  import scrapy
                  import os
                  
                  
                  def info(title):
                      print(title)
                      print('module name:', __name__)
                      if hasattr(os, 'getppid'):  # only available on Unix
                          print('parent process:', os.getppid())
                      print('process id:', os.getpid())
                  
                  
                  class TestSpider1(scrapy.Spider):
                      name = "test1"
                      start_urls = ['http://www.google.com']
                  
                      def parse(self, response):
                          info('parse')
                          a = MyClass()
                          a.start_work()
                  
                  
                  class MyClass(object):
                  
                      def start_work(self):
                          info('start_work')
                          p = Process(target=self.do_work)
                          p.start()
                          p.join()
                  
                      def do_work(self):
                  
                          info('do_work')
                          settings = get_project_settings()
                          runner = CrawlerRunner(settings)
                          runner.crawl(TestSpider2)
                          d = runner.join()
                          d.addBoth(lambda _: reactor.stop())
                          reactor.run()
                          return
                  
                  class TestSpider2(scrapy.Spider):
                  
                      name = "test2"
                      start_urls = ['http://www.google.com']
                  
                      def parse(self, response):
                          info('testspider2')
                          return
                  

                  What I am hoping for is something like this:

                  1. scrapy crawl test1 (for example, when response.status_code is 200:)
                  2. Inside test1, call scrapy crawl test2

                  Recommended Answer

                  I won't go into depth since this question is really old, but I'll go ahead and drop this snippet from the official Scrapy docs... You are very close!

                  import scrapy
                  from scrapy.crawler import CrawlerProcess
                  
                  class MySpider1(scrapy.Spider):
                      # Your first spider definition
                      ...
                  
                  class MySpider2(scrapy.Spider):
                      # Your second spider definition
                      ...
                  
                  process = CrawlerProcess()
                  process.crawl(MySpider1)
                  process.crawl(MySpider2)
                  process.start() # the script will block here until all crawling jobs are finished
                  

                  https://doc.scrapy.org/en/latest/topics/practices.html

                  And then, using callbacks, you can pass items between your spiders to implement whatever logic you're talking about.
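
                  As a rough illustration of that idea (a sketch only, not taken from the asker's project: the example URLs, spider names, and the shared found_urls list are all made up here), the docs' "running spiders sequentially" pattern with CrawlerRunner lets the first spider finish before the second one starts, so the second spider can consume whatever the first one collected:

                  import scrapy
                  from scrapy.crawler import CrawlerRunner
                  from scrapy.utils.log import configure_logging
                  from twisted.internet import defer, reactor
                  
                  found_urls = []  # filled by Spider1, read by Spider2 (stands in for the url2 hand-off)
                  
                  
                  class Spider1(scrapy.Spider):
                      name = "spider1"
                      start_urls = ['http://example.com/url1']  # placeholder for url1
                  
                      def parse(self, response):
                          # treat every link on the page as a candidate "url2"
                          for href in response.css('a::attr(href)').getall():
                              found_urls.append(response.urljoin(href))
                          yield {'url': response.url}  # still goes through the normal item pipelines
                  
                  
                  class Spider2(scrapy.Spider):
                      name = "spider2"
                  
                      def start_requests(self):
                          # consume whatever Spider1 collected
                          for url in found_urls:
                              yield scrapy.Request(url, callback=self.parse)
                  
                      def parse(self, response):
                          yield {'visited': response.url}
                  
                  
                  configure_logging()
                  runner = CrawlerRunner()
                  
                  
                  @defer.inlineCallbacks
                  def crawl():
                      yield runner.crawl(Spider1)   # wait for Spider1 to finish
                      yield runner.crawl(Spider2)   # then run Spider2 on the urls it found
                      reactor.stop()
                  
                  
                  crawl()
                  reactor.run()

                  If the two spiders really must stay in separate processes or projects, another common approach is to have Spider 1's pipeline write the url2 values somewhere durable (a file or a queue) and feed them to Spider 2 in a later run.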

                  That's all for this article on whether you can run another spider from a Scrapy spider. We hope the recommended answer helps, and thank you for supporting html5模板網!

