Accuracy comes out negative, even though the default hyperparameters already give 0.95?
2019/06/11 04:19 PM
Machine Learning Study Group discussion board
侯懿桐
day47
With the default hyperparameters the accuracy is already 0.95,
but the grid search surprisingly gives a pretty bad result....
Weird.
https://github.com/qwewsxz0/2nd-ML100Days/blob/master/Day_047_HW.ipynb
In [38]:
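from sklearn import datasets, metrics
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import GridSearchCV, train_test_split
wine = datasets.load_wine()
boston = datasets.load_boston()  # note: load_boston was removed in scikit-learn 1.2
breast_cancer = datasets.load_breast_cancer()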
In [39]:
# Split into training / test sets
x_train, x_test, y_train, y_test = train_test_split(breast_cancer.data, breast_cancer.target, test_size=0.25, random_state=42)
In [40]:
# Build the model
clf_ADA = AdaBoostClassifier()
# Train the model
clf_ADA.fit(x_train, y_train)
# Predict on the test set
y_pred = clf_ADA.predict(x_test)
In [41]:
acc = metrics.accuracy_score(y_test, y_pred)
print("Accuracy: ", acc)
Accuracy: 0.951048951048951
In [42]:
# Hyperparameter combinations to search over
n_estimators = [50, 100, 200, 300, 1000]
learning_rate = [0.1, 0.15, 1.5, 1, 10]
param_grid = dict(n_estimators=n_estimators, learning_rate=learning_rate)
## Create the search object from the model and the parameter-grid dict (n_jobs=-1 uses all CPU cores in parallel)
grid_search = GridSearchCV(clf_ADA, param_grid, scoring="neg_mean_squared_error", n_jobs=-1, verbose=1)
# Search for the best parameters
grid_result = grid_search.fit(x_train, y_train)
# In this scikit-learn version the default is 3-fold cross-validation; 5 * 5 = 25 parameter combinations, so 75 models are trained in total
C:\Users\iris168\.conda\envs\tensorflow_gpu\lib\site-packages\sklearn\model_selection\_split.py:1978: FutureWarning: The default value of cv will change from 3 to 5 in version 0.22. Specify it explicitly to silence this warning.
warnings.warn(CV_WARNING, FutureWarning)
[Parallel(n_jobs=-1)]: Using backend LokyBackend with 8 concurrent workers.
Fitting 3 folds for each of 25 candidates, totalling 75 fits
[Parallel(n_jobs=-1)]: Done 60 out of 75 | elapsed: 5.3s remaining: 1.3s
[Parallel(n_jobs=-1)]: Done 75 out of 75 | elapsed: 7.8s finished
C:\Users\iris168\.conda\envs\tensorflow_gpu\lib\site-packages\sklearn\model_selection\_search.py:813: DeprecationWarning: The default of the `iid` parameter will change from True to False in version 0.22 and will be removed in 0.24. This will change numeric results when test-set sizes are unequal.
DeprecationWarning)
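The number of fits can be checked with scikit-learn's ParameterGrid, which enumerates the same combinations GridSearchCV iterates over. A small sketch, assuming the 3-fold default reported in the warning above (newer scikit-learn versions default to 5 folds):

```python
from sklearn.model_selection import ParameterGrid

# Same grid as in the question
n_estimators = [50, 100, 200, 300, 1000]
learning_rate = [0.1, 0.15, 1.5, 1, 10]
param_grid = dict(n_estimators=n_estimators, learning_rate=learning_rate)

# Every combination of the two lists: 5 * 5 candidates
n_candidates = len(ParameterGrid(param_grid))
# Assumed 3-fold CV, the default in the scikit-learn version used here
n_folds = 3

print("candidates:", n_candidates, "fits:", n_candidates * n_folds)
```

This matches the log line "Fitting 3 folds for each of 25 candidates, totalling 75 fits".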
In [43]:
# Print the best score and the best parameters
print("Best Accuracy: %f using %s" % (grid_result.best_score_, grid_result.best_params_))
Best Accuracy: -0.025822 using {'learning_rate': 1.5, 'n_estimators': 1000}
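The negative number is expected: scikit-learn scorers follow a "greater is better" convention, so error metrics such as mean squared error are reported negated (that is what the `neg_` prefix means). For 0/1 class labels each squared error is either 0 or 1, so the MSE is simply the fraction of misclassified samples. A small sketch with hypothetical, made-up labels:

```python
import numpy as np
from sklearn.metrics import accuracy_score, mean_squared_error

# Hypothetical 0/1 labels and predictions, purely for illustration
y_true = np.array([0, 1, 1, 0, 1, 0, 1, 1])
y_pred = np.array([0, 1, 0, 0, 1, 0, 1, 0])

acc = accuracy_score(y_true, y_pred)      # 6 of 8 correct -> 0.75
mse = mean_squared_error(y_true, y_pred)  # 2 of 8 wrong   -> 0.25
neg_mse = -mse  # what scoring="neg_mean_squared_error" reports

print(acc, mse, neg_mse)
# With 0/1 labels, accuracy == 1 - MSE
print(np.isclose(acc, 1 - mse))
```

By the same reasoning, a best_score_ of -0.025822 corresponds to a mean cross-validated error rate of about 2.6%, i.e. roughly 0.974 accuracy on the training folds. That is not directly comparable to the 0.951 test-set accuracy, but it suggests the search itself was not doing badly; it was just reporting a regression metric.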
Answers
2019/06/11 05:23 PM, Jimmy
Hi 懿桐!
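Since this is a classification problem, you need to change the scoring in your GridSearchCV! The one you used (neg_mean_squared_error) is meant for evaluating regression problems. Once you change it, the results should look normal :)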
grid_search = GridSearchCV(clf_ADA, param_grid, scoring="???", n_jobs=-1, verbose=1)
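For reference, a minimal sketch of the search with a classification metric. It assumes scoring="accuracy", one of scikit-learn's built-in classifier scorers (others such as "f1" or "roc_auc" would also suit this binary task). The grid is smaller than in the question only to keep the sketch fast, and cv=3 is set explicitly, as the FutureWarning in the question suggests.

```python
from sklearn import datasets
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import GridSearchCV, train_test_split

breast_cancer = datasets.load_breast_cancer()
x_train, x_test, y_train, y_test = train_test_split(
    breast_cancer.data, breast_cancer.target, test_size=0.25, random_state=42)

# Reduced grid (an assumption for speed), not the one from the question
param_grid = dict(n_estimators=[50, 100], learning_rate=[0.1, 1.0])

# A classification scorer instead of neg_mean_squared_error; folds set explicitly
grid_search = GridSearchCV(AdaBoostClassifier(), param_grid,
                           scoring="accuracy", cv=3)
grid_result = grid_search.fit(x_train, y_train)

print("Best Accuracy: %f using %s" % (grid_result.best_score_, grid_result.best_params_))
# best_estimator_ is refit on the whole training set (refit=True by default)
print("Test Accuracy:", grid_result.best_estimator_.score(x_test, y_test))
```

Because refit=True is the default, best_estimator_ can be scored on the held-out test set, which gives a like-for-like comparison with the 0.951 of the default model.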