Garbled output after scraping
import requests

url = 'https://www.zhihu.com/explore'
headers = {
'accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3',
'accept-encoding': 'gzip, deflate, br',
'accept-language': 'zh-TW,zh;q=0.9,en-US;q=0.8,en;q=0.7,zh-CN;q=0.6,ja;q=0.5',
'cache-control': 'max-age=0',
'cookie': '_zap=a5634a47-a806-4828-8ab4-2cbc03734bda; d_c0="AFBoGjc1fg6PTus2_H76YgMM3xvztVHRnCs=|1541767835"; __gads=ID=18e1196642fc5994:T=1544975081:S=ALNI_MZ2lQjiLHPlLxjVqGGH6o-EiL2luQ; z_c0="2|1:0|10:1551204934|4:z_c0|92:Mi4xM1FkMERnQUFBQUFBVUdnYU56Vi1EaVlBQUFCZ0FsVk5SdEJpWFFDUXZ6REVHVWw0QWtXblh6WGt4T183ekdxNFhn|fa542d11758c34207cdd6a1edf85de768ac02624c8cb53aa687017af49accd7b"; tst=r; q_c1=aed50b9b158344d6ac78a230c8970d83|1560948516000|1543510922000; __utmv=51854390.100--|2=registration_date=20190226=1^3=entry_date=20181130=1; _xsrf=13cd498d-6139-4c50-99a6-df5c749dd64c; tgw_l7_route=4860b599c6644634a0abcd4d10d37251; __utma=51854390.567487929.1560948518.1560950333.1562469680.3; __utmb=51854390.0.10.1562469680; __utmc=51854390; __utmz=51854390.1562469680.3.3.utmcsr=localhost:8888|utmccn=(referral)|utmcmd=referral|utmcct=/notebooks/day2-example.ipynb',
'referer': 'http://localhost:8888/notebooks/day2-example.ipynb',
'upgrade-insecure-requests': '1',
'user-agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_13_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/75.0.3770.100 Safari/537.36'
}
r = requests.get(url, headers=headers)
r.encoding = 'utf-8'
print(r.text[0:600])
Output
0�@�<Ve�����G<J�]�WvM��a��ʑ`H:6{-5@1�����aT��:#u���.�r����O�(T� �k�ڮ�cH���5(ux��'#*�s�g�|���t���v1!�~7���bR�i��Ւ��O��Oj�H�g�$,�� aZ- �f�|��vk��Sg�U�q�����?Ǣ�6��Dr�`�0��`���_Ͻ3��T����a{�[�κ���E�$h9�̩j_r���Ts�e��;�W!��t�U�b�����X�,@Q3Ƃ@��K�������f\ ��ԥD��^"xr��tN��2.��T/��!W�+�s.]^�I�t�Q�eL����u�*!�0D�1��s��%C�=�60dк9{��*7a�-����5�*G]V��Q�LR3(W�{\��:���O�[�*��*_���%ԄM�9�V�
^�C�J5ۮ��{�O&,��g|�HZ�_~��Q�r��l�Y��R�ωO*���\��[�(}�]
How can I convert the output above into readable text?
Answers
-
2019/12/16 4:33 PM, 張維元 (WeiYuan)
Hi, comment out accept-encoding in the headers. That header makes the server return compressed data, and the compressed bytes are what show up as garbled text.
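For reference, here is a minimal sketch of that fix. The helper names `without_accept_encoding` and `fetch` are made up for this illustration and are not part of requests. As far as I know, requests transparently decompresses gzip and deflate, but it only handles `br` (Brotli) when a Brotli package is installed, so advertising `br` by hand is the likely trigger here. The offline part uses gzip from the standard library as a stand-in for Brotli, only to show what compressed bytes look like when they are decoded as UTF-8.

```python
import gzip


def without_accept_encoding(headers):
    """Return a copy of headers with any Accept-Encoding entry removed (case-insensitive)."""
    return {k: v for k, v in headers.items() if k.lower() != "accept-encoding"}


def fetch(url, headers):
    """Fetch url without a hand-written Accept-Encoding header (hypothetical helper)."""
    # Imported here so the offline demo below runs even without requests installed.
    import requests

    # With the header removed, requests sends its own default Accept-Encoding,
    # which only lists encodings it is able to decompress.
    r = requests.get(url, headers=without_accept_encoding(headers), timeout=10)
    r.encoding = "utf-8"
    return r.text


# Offline illustration: compressed bytes decoded as text look like the garbled output.
html = "<html><title>知乎</title></html>"
compressed = gzip.compress(html.encode("utf-8"))

# Decoding the compressed bytes directly produces replacement characters.
garbled = compressed.decode("utf-8", errors="replace")

# Decompressing first, then decoding, recovers the original text.
restored = gzip.decompress(compressed).decode("utf-8")
```

With the `url` and `headers` from the question, `print(fetch(url, headers)[:600])` should then show HTML instead of binary noise, assuming the site still serves the page to that request. Keeping the header but setting it to `'gzip, deflate'` should work too.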
-
2019/12/17 12:14 PM, 張維元 (WeiYuan)
Hi, I wrote up an article on this Headers issue for everyone's reference: https://www.cupoy.com/club/ai_tw/0000016E62FB84E4000000026375706F795F72656C656173654B5741535354434C5542/content/home