
Model Predictions from GBDT & Random Forest in Supervised Learning - Cupoy


2020/05/15 01:15 PM

Machine Learning Co-learning Discussion Board

zero0827z

Views: 18

Answers: 4

Favorites: 0

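I have some questions about the predictions from Random Forest and GBDT models. The examples in the co-learning camp all use cross-sectional data, but my data is time-series data, and every time I retrain, a few of the predicted values come out different.

When this happens and I want the predictions to be identical on every run, is there a better way to handle it other than changing the properties of the data?

I want the sample size to stay fixed, so I switched to a rolling window.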

import numpy as np
from sklearn.preprocessing import MinMaxScaler
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier

# Min-max scale the features (US_train_x, UK_train_x, JP_train_x are loaded elsewhere)
MME_US_train_x = MinMaxScaler().fit_transform(US_train_x)
MME_UK_train_x = MinMaxScaler().fit_transform(UK_train_x)
MME_JP_train_x = MinMaxScaler().fit_transform(JP_train_x)

# Models: gradient boosting machine / random forest (LogisticRegression is imported as well);
# parameters found with Random Search
gdbt = GradientBoostingClassifier(tol=100, subsample=0.75, n_estimators=250, max_features=20, max_depth=6, learning_rate=0.03)
rf = RandomForestClassifier(n_estimators=100, min_samples_split=2, min_samples_leaf=1, max_features='sqrt', max_depth=6, bootstrap=True)

US_gdbt_pred = np.zeros(12)

for i in range(12):
    # Rolling window: train on rows i..i+99 (100 rows), then predict row i+100
    gdbt.fit(MME_US_train_x[i:i + 100], US_train_Y.iloc[i:i + 100])
    US_gdbt_pred_value = gdbt.predict(MME_US_train_x[[i + 100]])
    print(US_gdbt_pred_value)
    US_gdbt_pred[i] = US_gdbt_pred_value[0]


[Screenshots not preserved: predicted values from the first, second, and third runs]

Answers

2020/05/15 01:23 PM

徐正憲

Upvotes: 0

Downvotes: 0

Comments: 2

How about adding random_state to the model parameters?
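
A minimal sketch of that suggestion, assuming synthetic data from `make_classification` as a stand-in for the asker's scaled features (the data, the helper name `rolling_predictions`, and the reduced `n_estimators` are illustrative assumptions, not the original setup). It refits a `GradientBoostingClassifier` on a 100-row rolling window with a fixed `random_state` and checks that two full runs give the same 12 predictions.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier

# Synthetic stand-in for the scaled feature matrix and labels (not the asker's data).
X, y = make_classification(n_samples=112, n_features=20, random_state=0)

def rolling_predictions(seed, window=100, steps=12):
    """Fit on rows i..i+window-1 and predict row i+window, for each step."""
    preds = np.zeros(steps, dtype=int)
    for i in range(steps):
        # A fresh model per window; random_state fixes the subsample draws.
        model = GradientBoostingClassifier(
            subsample=0.75, n_estimators=50, max_depth=3,
            learning_rate=0.03, random_state=seed,
        )
        model.fit(X[i:i + window], y[i:i + window])
        preds[i] = model.predict(X[i + window:i + window + 1])[0]
    return preds

run_1 = rolling_predictions(seed=42)
run_2 = rolling_predictions(seed=42)
print(np.array_equal(run_1, run_2))  # same seed, same data: identical predictions
```

The seed value itself is arbitrary; what matters is that it is the same on every run.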
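
2020/05/16 10:31 AM

Jeffrey

Upvotes: 0

Downvotes: 0

Comments: 2

Even with a fixed sample size, the data content drawn in each split is not necessarily the same from run to run, so the predicted values vary slightly.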
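
To illustrate the point, here is a rough sketch using plain NumPy rather than scikit-learn's internal sampling code (the helper `draw_rows` is hypothetical). It draws 75 of 100 rows, mirroring `subsample=0.75` on a 100-row window: the number of rows is always the same, but which rows are drawn changes unless the generator is seeded.

```python
import numpy as np

n_rows = 100   # fixed window size, as in the question
n_drawn = 75   # 75 rows, mirroring subsample=0.75

def draw_rows(seed=None):
    """Return the sorted row indices one random 75% subsample would use."""
    rng = np.random.default_rng(seed)
    return np.sort(rng.choice(n_rows, size=n_drawn, replace=False))

unseeded_a = draw_rows()
unseeded_b = draw_rows()
seeded_a = draw_rows(seed=0)
seeded_b = draw_rows(seed=0)

# Unseeded draws will almost certainly differ; seeded draws always match.
print("unseeded draws identical:", np.array_equal(unseeded_a, unseeded_b))
print("seeded draws identical:  ", np.array_equal(seeded_a, seeded_b))
```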
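
2020/05/17 01:53 AM

張維元 (WeiYuan)

Upvotes: 0

Downvotes: 0

Comments: 3

"Thank you for the suggestion. I preprocessed the data outside the loop beforehand, so I can confirm that the data in every iteration is the same on every run."

=> Even with identical data, Random Forest can still give different results, because the algorithm itself contains random mechanisms (try setting random_state).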
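
As a sketch of what that looks like in practice, assuming synthetic data and a hypothetical helper `forest_proba`: with `bootstrap=True` and `max_features="sqrt"`, each tree sees a random resample of the rows and a random subset of features at each split, so the seed decides what the forest looks like. The same seed reproduces the predicted probabilities exactly, while a different seed may shift them.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Synthetic stand-in data: 100 training rows plus one row to predict.
X, y = make_classification(n_samples=101, n_features=20, random_state=0)
X_train, y_train, X_new = X[:100], y[:100], X[100:]

def forest_proba(seed):
    """Fit a forest on identical data and return class probabilities for the new row."""
    rf = RandomForestClassifier(
        n_estimators=100, max_features="sqrt", max_depth=6,
        bootstrap=True, random_state=seed,
    )
    rf.fit(X_train, y_train)
    return rf.predict_proba(X_new)[0]

same_seed_a = forest_proba(0)
same_seed_b = forest_proba(0)
other_seed = forest_proba(1)

print(np.array_equal(same_seed_a, same_seed_b))  # identical with the same seed
print(same_seed_a, other_seed)                   # may differ across seeds
```

When a predicted probability sits close to 0.5, a small shift like this can be enough to flip the predicted class, which would be consistent with only a few predictions changing between runs.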

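If this answer helped, please click the "Helpful" button; you can also follow my GitHub account. If you still have questions, feel free to keep asking or post a write-up of what you have understood, and I will give you a review and feedback 😃😃😃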
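
2020/05/23 01:55 AM

張維元 (WeiYuan)

Upvotes: 0

Downvotes: 0

Comments: 0

"Hello. After reading the titles of the 100-day course, I have had one question: I often hear about the Support vector machine model, but it does not seem to be covered in these 100 days. Since I often see people compare SVM with models such as Random Forest, I am a bit curious about how this model works."

=> Support vector machine is another commonly used model, and plenty of resources on it can be found online.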
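
For a first feel of the model, here is a minimal sketch using scikit-learn's `SVC` on toy data invented for this illustration (it is not from the course). A linear SVM looks for the separating boundary with the widest margin between the two classes, and the training points that pin down that boundary are exposed as `support_vectors_`.

```python
import numpy as np
from sklearn.svm import SVC

# Toy data invented for illustration: two well-separated clusters.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1],
              [4, 4], [4, 5], [5, 4], [5, 5]], dtype=float)
y = np.array([0, 0, 0, 0, 1, 1, 1, 1])

# Linear kernel: the decision boundary is a straight line between the clusters.
svm = SVC(kernel="linear", C=1.0)
svm.fit(X, y)

new_points = np.array([[0.5, 0.5], [4.5, 4.5]])
svm_pred = svm.predict(new_points)
print(svm_pred)              # one point from each cluster
print(svm.support_vectors_)  # training points that define the margin
```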
