How to run a Scrapy crawler multiple times in a loop

2020/01/07 03:44 AM
Python Web Crawler Forum
簡崇哲
Views: 0
Answers: 6
Bookmarks: 6
pycrawler
pycrawler-d29

Day029 crawler: in main.py I set up a list containing two PTT boards, then use a for loop to crawl the two boards one after another, but after the first board finishes, the second board fails.

The code and error message are below. Could someone please advise how to solve this? Thanks.


1. main.py

from scrapy.crawler import CrawlerProcess
from scrapy.utils.project import get_project_settings


def main():
    target_board = ['Tech_Job', 'Stock']
    process = CrawlerProcess(get_project_settings())

    for board in target_board:
        print("board : ", board)
        process.crawl('PTTCrawler', board=board)
        process.start()


if __name__ == '__main__':
    main()


2. Error Msg:

['myproject.pipelines.JSONPipeline']
2020-01-07 11:39:30 [scrapy.core.engine] INFO: Spider opened
2020-01-07 11:39:30 [PTTCrawler] DEBUG: Create temp file for store JSON - C:\Users\chong\Desktop\Python\Python Marathon\Day029_Scrapy_PTT\Day029_Scrapy_PTT\myproject\crawled_data\.tmp.json.swp
2020-01-07 11:39:30 [scrapy.extensions.logstats] INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min)
2020-01-07 11:39:30 [scrapy.extensions.telnet] INFO: Telnet console listening on 127.0.0.1:6023
Traceback (most recent call last):
  File "main.py", line 33, in <module>
    main()
  File "main.py", line 30, in main
    process.start()
  File "C:\Users\chong\Anaconda3\lib\site-packages\scrapy\crawler.py", line 309, in start
    reactor.run(installSignalHandlers=False)  # blocking call
  File "C:\Users\chong\Anaconda3\lib\site-packages\twisted\internet\base.py", line 1282, in run
    self.startRunning(installSignalHandlers=installSignalHandlers)
  File "C:\Users\chong\Anaconda3\lib\site-packages\twisted\internet\base.py", line 1262, in startRunning
    ReactorBase.startRunning(self)
  File "C:\Users\chong\Anaconda3\lib\site-packages\twisted\internet\base.py", line 765, in startRunning
    raise error.ReactorNotRestartable()
twisted.internet.error.ReactorNotRestartable
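
3. Possible fix:

The traceback ends in twisted.internet.error.ReactorNotRestartable: CrawlerProcess.start() runs the Twisted reactor and shuts it down once all queued spiders finish, and the reactor cannot be restarted within the same process. Calling process.start() inside the for loop therefore works for the first board but fails on the second. A minimal sketch of the usual pattern, assuming the spider is registered under the name 'PTTCrawler' as in the original code: queue every crawl first, then start the reactor once.

from scrapy.crawler import CrawlerProcess
from scrapy.utils.project import get_project_settings


def main():
    target_board = ['Tech_Job', 'Stock']
    process = CrawlerProcess(get_project_settings())

    for board in target_board:
        # crawl() only schedules the spider; it does not block.
        process.crawl('PTTCrawler', board=board)

    # start() runs the reactor once and returns after all queued
    # spiders have finished.
    process.start()


if __name__ == '__main__':
    main()

With this approach the two spiders run concurrently in the same reactor, so if JSONPipeline writes to a single fixed temp file (as the .tmp.json.swp log line suggests), the two boards' output may clash and per-board filenames would be needed. If the boards must be crawled strictly one after another, the Scrapy documentation describes an alternative based on CrawlerRunner and Twisted deferreds; a sketch under the same assumptions:

from twisted.internet import defer, reactor
from scrapy.crawler import CrawlerRunner
from scrapy.utils.log import configure_logging
from scrapy.utils.project import get_project_settings


def main():
    configure_logging()
    runner = CrawlerRunner(get_project_settings())

    @defer.inlineCallbacks
    def crawl_sequentially():
        # Each yield waits for the previous spider to finish before
        # starting the next one, all inside a single reactor run.
        for board in ['Tech_Job', 'Stock']:
            yield runner.crawl('PTTCrawler', board=board)
        reactor.stop()

    crawl_sequentially()
    reactor.run()


if __name__ == '__main__':
    main()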