About the Day 4 assignment: why does r_name in my recipient section print as a list with three elements?
2021/02/02 03:27 PM
NLP Classic Machine Learning Marathon
Tommy
import re
import email

emails_list = []  # create an empty list to store all email info
for mail in emails[:20]:  # only take the first 20 records (faster to process)
    ### Get sender name and address ###
    emails_dict = dict()  # create an empty dict to store the info
    # Step 1: get sender info (hint: From:)
    sender = re.findall(r"From\: .*", mail)
    # Step 2: get name and address (hint: note that sometimes there is no match)
    s_name, s_add = [], []  # defaults in case no "From:" line was found
    if len(sender) == 1:
        s_name = re.findall(r"(?<=From\: )\"?([^\"]*)\"?(?= \<)", sender[0])
        s_add = re.findall(r"([^A-Z\<\: ]*\@[^A-Z\>]*)\>?", sender[0])
    # Step 3: store the name and address in the dict
    if len(s_name) == 1:
        emails_dict["sender_name"] = s_name[0]
    else:
        emails_dict["sender_name"] = None
    if len(s_add) == 1:
        emails_dict["sender_add"] = s_add[0]
    else:
        emails_dict["sender_add"] = None
    ### Get recipient name and address ###
    # Step 1: get recipient info (hint: To:)
    recipient = re.findall(r"\nTo\: .*", mail)
    print(recipient)
    # Step 2: get name and address (hint: note that sometimes there is no match)
    r_name, r_add = [], []  # defaults in case no "To:" line was found
    if len(recipient) == 1:
        r_name = re.findall(r".*", recipient[0])
        print(r_name)
        r_add = re.findall(r"\w+\-*\w*\@\w+\.*\w*\.*\w*", recipient[0])
    # Step 3: store the name and address in the dict
    if len(r_name) == 1:
        emails_dict["recipient_name"] = r_name[0]
    else:
        emails_dict["recipient_name"] = None
    if len(r_add) == 1:
        emails_dict["recipient_add"] = r_add[0]
    else:
        emails_dict["recipient_add"] = None
    ### Get email date ###
    # Step 1: get date info (hint: Date:)
    date = re.findall(r"Date\: .*", mail)
    # Step 2: get the detailed date (only DD MMM YYYY is needed)
    r_date = []  # default in case no "Date:" line was found
    if len(date) == 1:
        r_date = re.findall(r"\d{1,2} \w{3} \d{4}", date[0])
    # Step 3: store the date in the dict
    if len(r_date) == 1:
        emails_dict["recieve_date"] = r_date[0]
    else:
        emails_dict["recieve_date"] = None
    ### Get email subject ###
    # Step 1: get subject info (hint: Subject:)
    subject = re.findall(r"Subject\: .*", mail)
    # Step 2: remove unnecessary text (hint: Subject: )
    if len(subject) == 1:
        sub = re.findall(r"(?<=Subject\: ).*", subject[0])
    else:
        sub = []  # empty list so that len(sub) below does not fail
    # Step 3: store the subject in the dict
    if len(sub) == 1:
        emails_dict["subject"] = sub[0]
    else:
        emails_dict["subject"] = None
    ### Get email body ###
    # Here we use the email package to extract the email body
    # (no need to dig into it; this chapter focuses on regular expressions)
    try:
        full_email = email.message_from_string(mail)
        body = full_email.get_payload()
        emails_dict["email_body"] = body
    except Exception:
        emails_dict["email_body"] = None
    ### Append the dict to the list ###
    emails_list.append(emails_dict)
The output of print(recipient) and print(r_name) is shown in the image below:
![1612250780093.jpg](http://kwassistfile.cupoy.com/00000177619E8C4B000001256375706F795F72656C656173655155455354/1611820869682/large)
Why does r_name come out with an empty string at both the beginning and the end?
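
For anyone who wants to reproduce this without the dataset, here is a minimal sketch. It assumes that recipient[0] starts with the newline captured by the `\nTo\: .*` pattern; the address in the sample string is made up for illustration, and the last pattern is just one possible alternative, not the assignment's reference answer.

```python
import re

# Hypothetical header line; the leading "\n" is what r"\nTo\: .*" would capture.
recipient_line = "\nTo: someone@example.com"

# "." does not match "\n" and "*" allows zero characters, so findall reports:
#   1. an empty match at position 0 (just before the newline),
#   2. the rest of the line after the newline,
#   3. an empty match at the very end of the string.
r_name = re.findall(r".*", recipient_line)
print(r_name)  # ['', 'To: someone@example.com', '']

# One possible alternative: require at least one character after "To: ".
r_name_fixed = re.findall(r"(?<=To\: ).+", recipient_line)
print(r_name_fixed)  # ['someone@example.com']
```

Since `.+` needs at least one character, it cannot produce the empty matches, and the lookbehind keeps the "To: " prefix out of the result.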