[QA] Classic CNN model: LeNet-5? - Cupoy

This post gives a brief introduction to LeNet-5, the classic CNN model for handwritten character recognition.

2021/10/12 10:34 PM

Machine Learning Co-learning Discussion Board

Ray

Answers

2021/10/12 10:41 PM

Ray

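LeNet-5, a handwritten character recognition model, is one of the earliest convolutional neural networks. It was proposed by Yann LeCun's team in 1994 and described in full in their 1998 paper (linked at the end of this post). LeNet-5 extracts features with convolution, weight sharing, and pooling, which avoids a large computational cost, and then classifies with fully connected layers. The network structure is shown below:

![15214284381699o337s5p98.jpeg](http://kwassistfile.cupoy.com/0000017C74ED3E42000000096375706F795F72656C65617365414E53/1632821348576/large)[Image source](https://my.oschina.net/u/876354/blog/1632862)

As the figure shows, LeNet-5 has seven layers (not counting the input layer): convolution layer (Convolutions, C1), pooling layer (Subsampling, S2), convolution layer (C3), pooling layer (S4), fully connected convolution layer (C5), fully connected layer (F6), and Gaussian connection layer (output).

The input is a 32x32 image, and every filter is 5x5. The first convolution layer (C1) has 6 output channels (equivalently, six convolution kernels) and uses Sigmoid as its activation function, so it produces six feature maps of size (32-5+1)×(32-5+1) = 28×28, which implies stride=1 and padding=0. This layer has (5×5 (filter_size) + 1 (bias))×6 = 156 parameters and (5×5+1)×6×28×28 = 122304 connections. The pooling layer (S2) uses a 2x2 window with stride 2 and average pooling, so its feature maps are (28/2)×(28/2) = 14×14. The computation is illustrated below:

![012957_l7Oh_876354.png](http://kwassistfile.cupoy.com/0000017C74ED3E42000000096375706F795F72656C65617365414E53/1632821348577/large)[Image source](https://my.oschina.net/u/876354/blog/1632862)

The second convolution layer (C3) has 16 output channels and also uses Sigmoid, producing 16 feature maps of size (14-5+1)×(14-5+1) = 10×10. C3 is only partially connected to the preceding pooling layer, according to the following rule:

![013017_pIe9_876354.png](http://kwassistfile.cupoy.com/0000017C74ED3E42000000096375706F795F72656C65617365414E53/1632821348578/large)[Image source](https://my.oschina.net/u/876354/blog/1632862)

Six of the C3 feature maps read from 3 of the S2 maps, nine read from 4, and one reads from all 6, so the layer has (5×5×3+1)×6 + (5×5×4+1)×9 + (5×5×6+1) = 1516 parameters and 1516×10×10 = 151600 connections. The next pooling layer (S4) again uses a 2x2 window with stride 2 and average pooling, giving 5×5 feature maps. The last convolution layer (C5) then produces feature maps of size (5-5+1)×(5-5+1) = 1×1, so it is effectively a fully connected layer. The fully connected layer that follows (F6) has 84 neurons. That number comes from the design of the output layer, where each class corresponds to a 7×12 bitmap.

![013047_ApKN_876354.png](http://kwassistfile.cupoy.com/0000017C74ED3E42000000096375706F795F72656C65617365414E53/1632821348579/large)[Image source](https://my.oschina.net/u/876354/blog/1632862)

The final output layer is a Gaussian connection layer. It uses RBF (Euclidean radial basis function) units, which compute the Euclidean distance between the input vector and a parameter vector. Because LeNet is applied to handwritten digits 0~9, the output layer has 10 neurons.

------------

LeNet-5 code:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class LeNet(nn.Module):
    def __init__(self):
        super().__init__()
        # padding=2 assumes a 28x28 input (such as MNIST), so C1 still outputs 28x28.
        # For the 32x32 input described above, use padding=0 instead.
        self.conv1 = nn.Conv2d(in_channels=1, out_channels=6, kernel_size=5, padding=2, stride=1)
        # Fully connected to all 6 input maps (no partial-connection table).
        self.conv2 = nn.Conv2d(in_channels=6, out_channels=16, kernel_size=5)
        self.fc1 = nn.Linear(in_features=16*5*5, out_features=120)  # plays the role of C5
        self.fc2 = nn.Linear(in_features=120, out_features=84)      # F6
        self.fc3 = nn.Linear(in_features=84, out_features=10)       # linear output instead of RBF units

    def forward(self, x):
        x = torch.sigmoid(self.conv1(x))              # C1: 6 x 28 x 28
        x = F.avg_pool2d(x, kernel_size=2, stride=2)  # S2: 6 x 14 x 14
        x = torch.sigmoid(self.conv2(x))              # C3: 16 x 10 x 10
        x = F.avg_pool2d(x, kernel_size=2, stride=2)  # S4: 16 x 5 x 5
        # x = x.view(-1, 16*5*5)
        x = torch.flatten(x, 1)                       # keep the batch dimension
        x = torch.sigmoid(self.fc1(x))
        x = torch.sigmoid(self.fc2(x))
        x = self.fc3(x)                               # raw class scores (logits)
        return x
```

-----------------

For further reading, see the following links:

* [Convolutional neural networks: classic CNN models LeNet, AlexNet, VGG, NiN with Pytorch code](https://medium.com/ching-i/%E5%8D%B7%E7%A9%8D%E7%A5%9E%E7%B6%93%E7%B6%B2%E7%B5%A1-cnn-%E7%B6%93%E5%85%B8%E6%A8%A1%E5%9E%8B-lenet-alexnet-vgg-nin-with-pytorch-code-84462d6cf60c)
* [A detailed look at LeNet, the most classic deep learning model](https://kknews.cc/zh-tw/tech/mqmombz.html)
* [Deep learning: convolutional neural networks (LeNet-5 explained in detail)](https://www.itread01.com/content/1544868842.html)
* [Introduction to LeNet-5, an entry-level CNN algorithm (detailed reading of the paper)](https://www.datalearner.com/blog/1051558664111790)
* [Gradient-based learning applied to document recognition](https://ieeexplore.ieee.org/document/726791)
* [Classic CNN models explained: LeNet](https://my.oschina.net/u/876354/blog/1632862)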
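
As a quick check on the arithmetic in the answer above, here is a minimal sketch that recomputes the feature-map sizes layer by layer, along with the C1 and C3 parameter and connection counts. The helper names `conv_out` and `pool_out` are made up for illustration and are not part of any library. The 6/9/1 grouping of C3 feature maps is taken directly from the parameter formula quoted in the answer.

```python
def conv_out(size, kernel=5, stride=1, padding=0):
    """Side length of the output of a square convolution (hypothetical helper)."""
    return (size + 2 * padding - kernel) // stride + 1


def pool_out(size, window=2, stride=2):
    """Side length of the output of a square pooling window (hypothetical helper)."""
    return (size - window) // stride + 1


# Feature-map side lengths for a 32x32 input with 5x5 filters, stride 1, no padding.
c1 = conv_out(32)  # 28
s2 = pool_out(c1)  # 14
c3 = conv_out(s2)  # 10
s4 = pool_out(c3)  # 5
c5 = conv_out(s4)  # 1

# C1: each of the 6 kernels has 5*5 weights plus 1 bias.
c1_params = (5 * 5 + 1) * 6
c1_connections = c1_params * c1 * c1

# C3: 6 maps read 3 input maps, 9 maps read 4, and 1 map reads all 6.
c3_params = (5 * 5 * 3 + 1) * 6 + (5 * 5 * 4 + 1) * 9 + (5 * 5 * 6 + 1) * 1
c3_connections = c3_params * c3 * c3

# With padding=2 (as in the PyTorch code), a 28x28 input also gives a 28x28 C1 output.
c1_padded_28 = conv_out(28, padding=2)

print("sizes:", c1, s2, c3, s4, c5)
print("C1 params/connections:", c1_params, c1_connections)
print("C3 params/connections:", c3_params, c3_connections)
print("C1 output for 28x28 input with padding=2:", c1_padded_28)
```

The last line shows why the `padding=2` in `conv1` and the `16*5*5` input size of `fc1` are consistent for a 28×28 input: from C1 onward the sizes are the same as for the unpadded 32×32 case.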