Why do I get the error message below when scraping Scoot (flyscoot.com)?
Source code:
import requests
# Form fields for the flight search: 1 adult, return trip TPE -> HKG on 09/25/2020, back on 10/01/2020
payload = {
    'revAvailabilitySearch.SearchInfo.AdultCount': '1',
    'revAvailabilitySearch.SearchInfo.ChildrenCount': '0',
    'revAvailabilitySearch.SearchInfo.InfantCount': '0',
    'revAvailabilitySearch.SearchInfo.Direction': 'Return',
    'revAvailabilitySearch.SearchInfo.PromoCode': '',
    'revAvailabilitySearch.SearchInfo.SalesCode': '',
    'revAvailabilitySearch.SearchInfo.SearchStations[0].DepartureStationCode': 'TPE',
    'revAvailabilitySearch.SearchInfo.SearchStations[0].ArrivalStationCode': 'HKG',
    'revAvailabilitySearch.SearchInfo.SearchStations[0].DepartureDate': '09/25/2020',
    'revAvailabilitySearch.SearchInfo.SearchStations[1].DepartureStationCode': 'HKG',
    'revAvailabilitySearch.SearchInfo.SearchStations[1].ArrivalStationCode': 'TPE',
    'revAvailabilitySearch.SearchInfo.SearchStations[1].DepartureDate': '10/01/2020',
    'revAvailabilitySearch.DeepLink.OrganisationCode': '',
    'revAvailabilitySearch.DeepLink.Locale': '',
    'revAvailabilitySearch.SearchInfo.OrganisationToken': '',
    'revAvailabilitySearch.DeepLink.OrganisationToken': '',
    'revAvailabilitySearch.SearchInfo.MultiCurrencyCode': '',
}
# payload2 holds HTTP request headers (not form data), copied from the browser
payload2 = {
    'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9',
    # 'br' omitted: requests only decodes Brotli responses if a Brotli package is installed
    'Accept-Encoding': 'gzip, deflate',
    'Accept-Language': 'zh-TW,zh;q=0.9,en-US;q=0.8,en;q=0.7',
    'Cache-Control': 'max-age=0',
    'Connection': 'keep-alive',
    # Cookies copied from one browser session; they will expire
    'Cookie': '_gcl_au=1.1.2145652438.1599192149; country=TW; DG_ZID=372F7D22-C7E0-341A-884D-98BB19443833; DG_ZUID=E027F75D-66B0-3134-ABF0-E969F982F01C; DG_HID=E162BC9B-DF7B-30CB-B9B0-BD601285E1DD; DG_SID=101.137.183.63:6geoJhklOMZ1YL2amaYpAByUJsEJdfmjkqYU66RKxhI; jumpseat_uid=k52oi8k6cuciq0--GKxKEK; _gid=GA1.2.2018774162.1599192161; _fbp=fb.1.1599192161849.1033470205; __qca=P0-248672092-1599192161613; DG_IID=5053A727-4AE4-380F-9F6F-AD3A6F244562; DG_UID=2A1422A2-1F33-3093-A7CB-3A45F5A7AD52; cookieconsent_status=dismiss; _ga=GA1.3.2118670273.1599192154; _gid=GA1.3.2018774162.1599192161; curr=SGD; ins-storage-version=11; AMP_TOKEN=%24NOT_FOUND; ins-storage-version=11; _dc_gtm_UA-26211105-1=1; _ga=GA1.1.2118670273.1599192154; _ga_GFV545L5B3=GS1.1.1599224384.3.1.1599224412.0; QSI_HistorySession=https%3A%2F%2Fwww.flyscoot.com%2Fzhtw~1599224414164; acw_tc=0bc19b0d15992244300764179e51177508c552f847ae50446b89fcef043aa4; ASP.NET_SessionId=0ayemwnro1vhfwomqp1f2nrm; dtCookie=v_4_srv_1_sn_B97DFF178BE60EAC91C7A3056B913907_perc_100000_ol_0_mul_1; dotrez=3708871690.20480.0000',
    'Host': 'makeabooking.flyscoot.com',
    'Referer': 'https://www.flyscoot.com/zhtw',
    'Sec-Fetch-Dest': 'document',
    'Sec-Fetch-Mode': 'navigate',
    'Sec-Fetch-Site': 'same-site',
    'Sec-Fetch-User': '?1',
    'Upgrade-Insecure-Requests': '1',
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/84.0.4147.135 Safari/537.36',
}
rs = requests.Session()
# Send these headers on every request made through the session
# (passing them as data= would put them in the request body instead)
rs.headers.update(payload2)
res = rs.post('https://makeabooking.flyscoot.com/Book/?culture=zh-tw', data=payload)
res2 = rs.get('https://makeabooking.flyscoot.com/Book/Flight')
print(res2.text)
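If form fields are written as `'revAvailabilitySearch.SearchInfo.AdultCount'':'' 1'`, Python joins the adjacent string literals into one string such as `'revAvailabilitySearch.SearchInfo.AdultCount: 1'`, and a brace literal holding only strings is a `set`, not a `dict`, which `requests` cannot form-encode. A minimal sketch that shows this and builds the dict from copied `name: value` lines instead; it assumes the values were copied from the browser's developer tools in that format, and `parse_name_value_lines` is a hypothetical helper, not part of `requests`.

```python
# Sketch: what a 'key'':'' value' entry really builds, and a safer way to turn
# copied "name: value" lines into a dict. parse_name_value_lines is a
# hypothetical helper written for this example, not a requests API.

# Adjacent string literals are joined into ONE string per entry, and a brace
# literal that contains only strings (no key: value pairs) is a set.
broken = {
    'revAvailabilitySearch.SearchInfo.AdultCount'':'' 1',
    'revAvailabilitySearch.SearchInfo.ChildrenCount'':'' 0',
}
print(type(broken).__name__)  # set


def parse_name_value_lines(text):
    """Build a dict from "name: value" lines, splitting on the FIRST colon.

    Splitting only once keeps colons inside values (e.g. the DG_SID cookie
    value contains one). Blank lines are skipped; surrounding spaces are
    stripped from both names and values.
    """
    result = {}
    for line in text.splitlines():
        if not line.strip():
            continue
        name, sep, value = line.partition(':')
        if not sep:
            raise ValueError(f'no colon in line: {line!r}')
        result[name.strip()] = value.strip()
    return result


payload = parse_name_value_lines("""
revAvailabilitySearch.SearchInfo.AdultCount: 1
revAvailabilitySearch.SearchInfo.Direction: Return
revAvailabilitySearch.SearchInfo.PromoCode:
""")
print(payload)
```

Splitting on the first colon only is the design choice that matters here: header values such as `Cookie` can themselves contain colons.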
The response is the following page:
<div>
<h2 style="margin-top: 0;">Pardon Our Interruption<br />十分抱歉中断了您的访问</h2>
<p>As you were browsing, something about your browser
made us think you were a bot. There are a few reasons why this might happen:<br />
当您浏览网页时,您的浏览器某些信息让我们认为您是一个机器人程序,有以下几种可能的原因导致这种情况发生:
</p>
<ul>
<li>You're a power user moving through this website with super-human speed</li>
<li>You've disabled JavaScript in your web browser</li>
<li>A third-party browser plugin, such as Ghostery or NoScript, is preventing JavaScript from running.
Additional information is available in this <a href="http://ds.tl/help-third-party-plugins"
title="Third party browser plugins that block javascript"
target="_blank">support article.</a></li>
<li>您正以超乎寻常的速度浏览该网站</li>
<li>您在您的浏览器中禁用了JavaScript</li>
<li>第三方浏览器插件(如Ghostery或NoScript)正阻止JavaScript运行。此<a href="http://ds.tl/help-third-party-plugins"
title="Third party browser plugins that block javascript"
target="_blank">支持文档</a>中提供更多信息。</li>
</ul>
<p>After completing the CAPTCHA below, you will immediately regain access to the site.<br />
完成以下验证码查验之后,您将可以马上重新访问网站。</p>
</div>
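The page above is a bot-challenge interstitial returned by the site, not a Python exception. A minimal sketch for detecting it before trying to parse dates and prices, assuming the marker strings copied from that page stay the same (the site can change them at any time); `looks_like_bot_challenge` is a hypothetical helper name.

```python
# Sketch: flag responses that look like the bot-challenge page instead of real
# flight results. The marker strings are copied from the page quoted above and
# are an assumption; they may change whenever the site updates that page.

CHALLENGE_MARKERS = (
    'Pardon Our Interruption',
    'made us think you were a bot',
)


def looks_like_bot_challenge(html):
    """Return True if any known marker of the challenge page appears in html."""
    return any(marker in html for marker in CHALLENGE_MARKERS)
```

In the question's code it could be called as `looks_like_bot_challenge(res2.text)`; when it returns `True`, the response holds no flight data to parse.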
How should I change the code so that I can scrape the dates and prices?
Answers
2020/09/05 00:12, 張維元 (WeiYuan)
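Hi, it's because the site detected that you are a crawler >"<
I'd suggest sending headers with your requests (in requests, through the headers= argument) and seeing whether that helps.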
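Passing a header dict as `data=` instead of `headers=` is an easy mistake, and it silently leaves the default `python-requests` User-Agent in place. A minimal sketch that builds the request without sending it, using `requests.Session.prepare_request`, and compares the two; the two header values are copied from the question, and nothing goes over the network.

```python
# Sketch: compare what requests would send when a header dict is passed as
# data= versus headers=. Session.prepare_request only builds the request
# (merging in the session's default headers); it does not send anything.
import requests

URL = 'https://makeabooking.flyscoot.com/Book/Flight'
browser_headers = {
    'User-Agent': ('Mozilla/5.0 (Windows NT 10.0; Win64; x64) '
                   'AppleWebKit/537.36 (KHTML, like Gecko) '
                   'Chrome/84.0.4147.135 Safari/537.36'),
    'Accept-Language': 'zh-TW,zh;q=0.9,en-US;q=0.8,en;q=0.7',
}

session = requests.Session()

# data= form-encodes the dict into the request BODY; the headers still carry
# the session's default python-requests User-Agent.
as_data = session.prepare_request(
    requests.Request('GET', URL, data=browser_headers))

# headers= actually sets (and overrides) the request headers.
as_headers = session.prepare_request(
    requests.Request('GET', URL, headers=browser_headers))

print(as_data.headers['User-Agent'])     # python-requests/<version>
print(as_data.body[:40])                 # form-encoded header values
print(as_headers.headers['User-Agent'])  # the Chrome string above
```

Even with correct headers, a plain `requests` call does not run JavaScript, and the challenge page lists disabled JavaScript as one reason for blocking, so headers alone may not be enough.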
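If this answer helps you, please click the "Helpful" or "Best Answer" button, and feel free to follow my GitHub account. If you still have questions, you're welcome to open a new question, or to post a summary of what you've understood so far; I'll give you a review and feedback 😃😃😃 Also, I'm currently running a course, 【資料科學家的 12 堂心法課】 ("12 Mindset Lessons for Data Scientists"), come join us! (By the way, I'm not on the organizer's staff, so if you think my answer is bad, please don't complain to them XD)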