
Filtering Mean-Encoded Feature Columns - Cupoy


2019/05/13 03:10 AM
Machine Learning Beginners Forum
Angus Tu
Views: 0
Answers: 1
Bookmarks: 0
ml100-2
ml100-2-d23

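In the code below, if line 8 (the `data = data.drop(...)` call right after the loop) drops only the 'Survived' column, the prediction score is 1.0, which does not seem right.

If, as in the sample solution, the 'Name_mean' and 'Ticket_mean' columns are dropped as well, the prediction score is 0.835. I don't understand why these feature columns also have to be removed.

Could you help explain? Thank you.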


Day23 - Homework 2:

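# Mean encoding + logistic regression
# df, train_num and train_Y (a Series named 'Survived') come from earlier notebook cells.
import time
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

data = pd.concat([df[:train_num], train_Y], axis=1)
for c in df.columns:
    # Replace each column with the mean of 'Survived' over the rows sharing its value
    mean_df = data.groupby([c])['Survived'].mean().reset_index()
    mean_df.columns = [c, f'{c}_mean']
    data = pd.merge(data, mean_df, on=c, how='left')
    data = data.drop([c], axis=1)
# "Line 8" in the question refers to the next statement
data = data.drop(['Survived', 'Name_mean', 'Ticket_mean'], axis=1)
estimator = LogisticRegression()
start = time.time()
print(f'shape : {data.shape}')
print(f'score : {cross_val_score(estimator, data, train_Y, cv=5).mean()}')
print(f'time : {time.time() - start} sec')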
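
One likely reason for the 1.0 score, offered as a sketch and not as the course's official answer: when a column has (almost) one distinct value per row, as passenger names do, the mean of 'Survived' within each group is simply that row's own label. The encoded column then leaks the target into the features. The toy example below uses made-up data and only the standard library. `mean_encode` is a hypothetical helper that mirrors the `groupby(c)['Survived'].mean()` plus `merge` step above.

```python
from collections import defaultdict


def mean_encode(values, targets):
    """Replace each value with the mean target of all rows sharing that value.

    Hypothetical helper mirroring groupby(c)['Survived'].mean() followed by merge.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for v, t in zip(values, targets):
        sums[v] += t
        counts[v] += 1
    return [sums[v] / counts[v] for v in values]


# Made-up toy data, not the Titanic dataset.
survived = [1, 0, 1, 0, 0, 1]
name = ["Ann", "Bob", "Cid", "Dee", "Eve", "Fay"]  # unique per row, like 'Name'
sex = ["f", "m", "f", "m", "f", "m"]               # few distinct values, like 'Sex'

name_mean = mean_encode(name, survived)
sex_mean = mean_encode(sex, survived)

print("Survived :", survived)
print("Name_mean:", name_mean)  # each group has one row, so this copies the label
print("Sex_mean :", sex_mean)   # group averages only, no per-row label
```

In this toy data `name_mean` reproduces `survived` exactly, so any classifier can score perfectly on it, while `sex_mean` carries only group-level averages. On real data, comparing `df[c].nunique()` with the number of rows is a quick way to spot columns that behave this way.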