Random Search question

2020/05/17 10:41 PM

Machine Learning Co-learning Discussion Board

陳志堅


Tag: Random Search

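Good evening, coaches. I have a few questions about Random Search:

1. The D47 handout says:

Random Search: specify a "range" for each hyperparameter, sample parameter values from a uniform distribution over that range, train with the sampled values, and then pick the best parameters according to the validation-set results.

2. The D50 example says "parameters found with Random Search", but at first glance no hyperparameter ranges are specified anywhere. Which part of the code shows that this is Random Search?

3. The D47 example does grid search with the GridSearchCV function.

Why does the D50 random search not use the RandomizedSearchCV function, and is instead written as shown below? Where can I find a template for how random search is normally written?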
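To make the definition quoted in item 1 concrete, here is a minimal sketch of random search written by hand with only the standard library. The function `validation_score` is a hypothetical stand-in for "train the model and score it on the validation set", and the two ranges are made-up illustrations, not values taken from the course.

```python
import random


def validation_score(learning_rate, subsample):
    """Hypothetical stand-in for training a model and scoring it on a validation set."""
    # Toy surface whose best point is learning_rate=0.03, subsample=0.4
    return -100 * (learning_rate - 0.03) ** 2 - (subsample - 0.4) ** 2


def random_search(n_iter, seed=0):
    rng = random.Random(seed)
    best_params, best_score = None, float("-inf")
    for _ in range(n_iter):
        # Draw every hyperparameter uniformly from its (assumed) range
        params = {
            "learning_rate": rng.uniform(0.01, 0.1),
            "subsample": rng.uniform(0.2, 1.0),
        }
        score = validation_score(**params)
        # Keep the combination with the best validation score so far
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score


best_params, best_score = random_search(n_iter=200)
print(best_params, best_score)
```

The only difference from grid search is the sampling step: values are drawn at random from a range rather than enumerated from a fixed list.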

ps:

D50 example:

# Uses three models: linear regression / gradient boosting machine / random forest; parameters found with Random Search
from sklearn.linear_model import LinearRegression
from sklearn.ensemble import GradientBoostingRegressor, RandomForestRegressor

linear = LinearRegression(normalize=False, fit_intercept=True, copy_X=True)
gdbt = GradientBoostingRegressor(tol=0.1, subsample=0.37, n_estimators=200, max_features=20,
                                 max_depth=6, learning_rate=0.03)
rf = RandomForestRegressor(n_estimators=300, min_samples_split=9, min_samples_leaf=10,
                           max_features='sqrt', max_depth=8, bootstrap=False)

Answers

2020/05/21 05:48 PM

Jeffrey


Here is some reference code. It uses k-fold CV, and it randomly searches over subsets of feature columns:

import numpy as np
from sklearn.model_selection import cross_val_score

# Assumes `lr` (a classifier), `data` (a 2-D NumPy array) and `target` are already defined
result = []

# Number of random-search iterations
N_search = 300

# Fix the random seed so the search is reproducible
np.random.seed(1)

for i in range(N_search):
    # Draw a random number of features, between 1 and data.shape[1]
    n_columns = np.random.randint(1, data.shape[1] + 1)

    # Sample that many feature indices without replacement
    columns = list(np.random.choice(data.shape[1], n_columns, replace=False))

    # Evaluate this feature subset with 5-fold cross-validation
    scores = cross_val_score(lr, data[:, columns], target, cv=5, scoring="accuracy")

    # Store the result
    result.append({'columns': columns, 'performance': np.mean(scores)})

# Sort the results by mean CV score, best first
result.sort(key=lambda x: -x['performance'])

2020/05/21 06:01 PM

Jeffrey


I suggest looking at the official scikit-learn site, at the link below:

https://scikit-learn.org/stable/auto_examples/index.html?highlight=random%20search

The examples there show how the random search mentioned in D50 is applied.
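For a general template of the kind asked about in question 3, a minimal sketch with RandomizedSearchCV could look like this. The synthetic dataset and every range in `param_distributions` are assumptions chosen only so the snippet runs on its own; they are not the ranges behind the D50 values.

```python
from scipy.stats import randint, uniform
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import RandomizedSearchCV

# Synthetic data, only so the sketch is self-contained
X, y = make_regression(n_samples=200, n_features=20, noise=10.0, random_state=0)

# Illustrative ranges (assumptions, not the course's)
param_distributions = {
    "n_estimators": randint(20, 60),       # integers in [20, 60)
    "max_depth": randint(2, 7),            # integers in [2, 7)
    "learning_rate": uniform(0.01, 0.19),  # floats in [0.01, 0.20]
    "subsample": uniform(0.3, 0.7),        # floats in [0.3, 1.0]
}

search = RandomizedSearchCV(
    GradientBoostingRegressor(random_state=0),
    param_distributions=param_distributions,
    n_iter=5,                              # number of random combinations to try
    cv=3,
    scoring="neg_mean_squared_error",
    random_state=0,
)
search.fit(X, y)

print(search.best_params_)
print(search.best_score_)
```

Note that `scipy.stats.uniform(loc, scale)` samples from [loc, loc + scale], so the second argument is the width of the range, not its upper bound.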

2020/05/23 01:51 AM

張維元 (WeiYuan)


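Hi, here is a brief reply to your question:

2. The D50 example says "parameters found with Random Search", but at first glance no hyperparameter ranges are specified anywhere. Which part of the code shows that this is Random Search?

=> I think the sentence probably means that the numbers in "tol=0.1, subsample=0.37, n_estimators=200, max_features=20, max_depth=6, learning_rate=0.03" were found with GridSearchCV (possibly as the result of an earlier run), and the search itself simply is not included in that day's code.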
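As a rough sketch of that workflow (search once, then write the winning values straight into the constructor), something like the following would produce a model definition with no visible ranges. The data, the ranges and the loop are all assumptions for illustration; I do not know how the D50 values were actually obtained.

```python
from scipy.stats import randint, uniform
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import ParameterSampler, cross_val_score

# Synthetic data, only so the sketch is self-contained
X, y = make_regression(n_samples=150, n_features=10, noise=5.0, random_state=0)

# Illustrative ranges (assumptions)
distributions = {
    "max_depth": randint(2, 7),
    "learning_rate": uniform(0.01, 0.19),
    "subsample": uniform(0.3, 0.7),
}

# Step 1: the search, which may live in an earlier notebook
best_params, best_score = None, float("-inf")
for params in ParameterSampler(distributions, n_iter=4, random_state=0):
    model = GradientBoostingRegressor(n_estimators=30, random_state=0, **params)
    score = cross_val_score(model, X, y, cv=3, scoring="r2").mean()
    if score > best_score:
        best_params, best_score = params, score
print(best_params)

# Step 2: later code only contains the winning values, with no trace of the search
gdbt = GradientBoostingRegressor(n_estimators=30, random_state=0, **best_params)
gdbt.fit(X, y)
```

Once the printed values are pasted in as literals, the second step looks just like the D50 snippet: fixed numbers and nothing that reveals which search produced them.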

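If this answer helped, please click the "Helpful" button; you can also follow my GitHub account. If you still have questions, feel free to keep asking or to post a summary of what you have understood so far, and I will give you a review and feedback 😃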