
Difference between the binary_crossentropy and categorical_crossentropy loss functions

2019/07/10 02:30 AM
Machine Learning Beginners Forum
JS
Views: 0
Answers: 3
Bookmarks: 0
keras
loss
binary_crossentropy
categorical_crossentropy
ml100-2-d71
ml100-2

While compiling the model in this assignment, I tried both losses and found that binary_crossentropy gives much better accuracy than categorical_crossentropy:


- Using categorical_crossentropy


- Using binary_crossentropy



Looking into the differences between these two losses:


1. In terms of input format, both appear to take one-hot encoded labels, so there is no difference there.


### [Keras binary_crossentropy vs categorical_crossentropy performance?](https://stackoverflow.com/questions/42081257/keras-binary-crossentropy-vs-categorical-crossentropy-performance)


I'm trying to train a CNN to categorize text by topic. When I use binary_crossentropy I get ~80% acc, with categorical_crossentropy I get ~50% acc. I don't understand why this is. It's a multiclass problem, does that mean I have to use categorical and the binary results are meaningless?


> If it is a multiclass problem, you have to use categorical_crossentropy. Also, labels need to be converted into the categorical format.


> See [to_categorical](https://keras.io/utils/) to do this. Also see the definitions of categorical and binary cross-entropies [here](http://deeplearning.net/software/theano/library/tensor/nnet/nnet.html#theano.tensor.nnet.nnet.binary_crossentropy).

         

```python
# Consider an array of 5 labels out of a set of 3 classes {0, 1, 2}:
> labels
array([0, 2, 1, 2, 0])
# `to_categorical` converts this into a matrix with as many
# columns as there are classes. The number of rows
# stays the same.
> to_categorical(labels)
array([[ 1.,  0.,  0.],
       [ 0.,  0.,  1.],
       [ 0.,  1.,  0.],
       [ 0.,  0.,  1.],
       [ 1.,  0.,  0.]], dtype=float32)
```


2. From the following two responses, the two losses are meant to be paired with different activation functions.


(1)  https://www.cupoy.com/qa/kwassist/ai_tw/0000016A0CE5806C000000306375706F795F72656C656173655155455354


The main difference between binary cross-entropy and categorical cross-entropy is the activation function used in the output layer: the former uses sigmoid, the latter softmax.

Binary cross-entropy with sigmoid can also be used on multi-class problems; the idea is to treat the multi-class problem as many binary problems. The main difference is that solving a multi-class problem with softmax assumes each sample belongs to exactly one class, so the computation is stricter, whereas sigmoid makes no such assumption and the computation is less strict.



(2) https://stackoverflow.com/questions/42081257/keras-binary-crossentropy-vs-categorical-crossentropy-performance  


I came across an "inverted" issue — I was getting good results with categorical_crossentropy (with 2 classes) and poor with binary_crossentropy. It seems the problem was with the wrong activation function. The correct settings were:

- for binary_crossentropy: sigmoid activation, scalar target 

- for categorical_crossentropy: softmax activation, one-hot encoded target
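The "strict vs. not strict" point from the two responses can be sketched numerically (my own illustration, assuming plain logits from the network's last layer): softmax forces the class scores to compete for one unit of probability, while sigmoid scores each class independently, so the outputs need not sum to 1.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def softmax(z):
    e = np.exp(z - z.max())  # subtract max for numerical stability
    return e / e.sum()

# The same logits from the last layer (3 classes).
logits = np.array([2.0, 1.0, 0.1])

sig = sigmoid(logits)   # each class scored independently
soft = softmax(logits)  # classes compete for one unit of probability

print("sigmoid :", sig.round(3), "sum =", round(sig.sum(), 3))   # sum > 1
print("softmax :", soft.round(3), "sum =", round(soft.sum(), 3)) # sum = 1.0
```

Both pick the same argmax here, but only softmax produces a proper distribution over mutually exclusive classes.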


Running with MSE gives the following results:


-------------- 


My questions:

1. From the answers above, categorical_crossentropy sounds like the stricter computation, so on a dataset like CIFAR-10 (which should be a multi-class problem), why does binary_crossentropy give so much better results than categorical_crossentropy? Is this related to the activation function, properties of the dataset, or something else?


2. From the MSE run, the validation loss has already reached about 0.06, much smaller than the other two, yet the accuracy is still poor.

(1) Generally speaking, doesn't a small loss value mean the model is good?

(2) This dataset should be a classification problem, so why does MSE (a regression metric) give such a small loss value?
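For reference while thinking about question 2, here is a quick numeric sketch (my own, with a hypothetical prediction) showing that MSE on probabilities bounded in [0, 1] lives on a much smaller numeric scale than cross-entropy, so loss values from different loss functions are not directly comparable:

```python
import numpy as np

# Hypothetical one-hot target and a fairly wrong prediction.
y_true = np.array([1.0, 0.0, 0.0])
y_pred = np.array([0.4, 0.3, 0.3])
eps = 1e-7
p = np.clip(y_pred, eps, 1 - eps)

# MSE averages squared differences of values that are all in [0, 1],
# so it is bounded and numerically small even for bad predictions.
mse = np.mean((y_true - y_pred) ** 2)

# Cross-entropy is unbounded as the true-class probability -> 0.
cce = -np.sum(y_true * np.log(p))

print(f"MSE: {mse:.4f}")  # 0.1800
print(f"CCE: {cce:.4f}")  # 0.9163
```

The same prediction looks much "better" under MSE than under cross-entropy, even though the model is wrong.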


Thanks!