Preface
I have read two research papers, and in this article I will summarise both of them, in the following order:
- Gradient-Based Learning Applied to Document Recognition
  Authors: Yann LeCun, Léon Bottou, Yoshua Bengio, and Patrick Haffner
- Recent Advances in Convolutional Neural Networks: https://qiita.com/Rowing0914/items/4627b4a15b22ae53760b
  Authors: Jiuxiang Gu, Zhenhua Wang, Jason Kuen, Lianyang Ma, Amir Shahroudy, Bing Shuai, Ting Liu, Xingxing Wang, Li Wang, Gang Wang, Jianfei Cai, Tsuhan Chen

In addition, I will cover:

- Implementation of CNN in Python with NumPy
- Math in CNN
Brief summary of Gradient-Based Learning Applied to Document Recognition
Abstract
In this paper, the authors propose a novel approach that combines Convolutional Neural Networks with Graph Transformer Networks (GTNs), and they verify that the technique outperforms all the other machine learning techniques they compare against. As a real-world application, they built a system that recognises handwritten characters.
Unfortunately, they do not elaborate on the mathematical details in this paper, so I will leave that part to my other articles listed above.
Nomenclature
GT: Graph Transformer
GTN: Graph Transformer Network
HMM: Hidden Markov Model
HOS: Heuristic Oversegmentation
K-NN: K-nearest Neighbour
NN: Neural Network
OCR: Optical Character Recognition
PCA: Principal Component Analysis
RBF: Radial Basis Function
RS-SVM: Reduced-set Support Vector Machine
SDNN: Space Displacement Neural Network
SVM: Support Vector Machine
TDNN: Time Delay Neural Network
V-SVM: Virtual Support Vector Machine
Contents
- Introduction
- Learning from Data
- Gradient-Based Learning
- Gradient Backpropagation
- Learning in Real Handwriting Recognition System
- Globally Trainable Systems
- Convolutional Neural Network for Isolated Character Recognition
- Convolutional Networks
- LeNet-5
- Loss Function
- Results and Comparison with Other Methods
- Database: the Modified NIST set
- Results
- Comparison with Other Classifiers
- Discussion
- Invariance and Noise Resistance
- Multi-Module Systems and Graph Transformation Networks
- An Object-Oriented Approach
- Special Modules
- Graph Transformer Networks
- Multiple Object Recognition: Heuristic Over-Segmentation
- Segmentation Graph
- Recognition Transformer and Viterbi Transformer
- Global Training for Graph Transformer Networks
- Viterbi Training
- Discriminative Viterbi Training
- Forward Scoring, and Forward Training
- Discriminative Forward Training
- Remarks on Discriminative Training
- Multiple Object Recognition: Space Displacement Neural Network
- Interpreting the Output of an SDNN with a GTN
- Experiments with SDNN
- Global Training of SDNN
- Object Detection and Spotting with SDNN
- Graph Transformer Networks and Transducers
- Previous Work
- Standard Transduction
- Generalised Transduction
- Notes on the Graph Structures
- GTN and Hidden Markov Models
- An On-Line Handwriting Recognition System
- Preprocessing
- Network Architecture
- Network Training
- Experimental Results
- Check Reading System
- GTN for Check Amount Recognition
- Gradient-Based Learning
- Rejecting Low Confidence Checks
- Results
- Conclusions
1. Introduction
Historically, the basic architecture for handwriting recognition has been separated into two modules: a hand-crafted feature extractor followed by a trainable classifier.
In practice, people mostly relied on commercial products such as OCR software.
By applying GTNs, the entire system can be unified and trained globally from end to end.
1. Learning from Data
Basic Notations
Input: $Z$
Desired output: $D$
Dataset: $\{(Z^1, D^1), (Z^2, D^2), \ldots , (Z^P, D^P)\}$
Parameters: $W$
Learned function: $F$
Prediction: $Y^p = F(Z^p, W)$
Loss function: $E^p = D(D^p, F(Z^p, W))$, where $D(\cdot, \cdot)$ measures the discrepancy between the desired output $D^p$ and the prediction
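To make the notation concrete, here is a minimal NumPy sketch. The linear model $F(Z, W) = WZ$ and the squared-error discrepancy are my own choices for illustration; the paper does not prescribe them.

```python
import numpy as np

# Toy instantiation of the notation above (assumed, not from the paper):
# a linear model F(Z, W) = W @ Z and a squared-error discrepancy D.
def F(Z, W):
    return W @ Z  # prediction Y^p = F(Z^p, W)

def discrepancy(D_p, Y_p):
    return 0.5 * np.sum((D_p - Y_p) ** 2)  # E^p = D(D^p, Y^p)

def E_train(pairs, W):
    # average loss over the dataset {(Z^1, D^1), ..., (Z^P, D^P)}
    return np.mean([discrepancy(D_p, F(Z_p, W)) for Z_p, D_p in pairs])
```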
2. Gradient-Based Learning
The general procedure for minimising the loss function is to iteratively adjust the parameters $W$ in the direction of the negative gradient (gradient-based learning):
$W_k = W_{k-1} - \epsilon \frac{\partial E(W)}{\partial W}$
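Here is a minimal sketch of this update rule, reusing the linear / squared-error setup assumed above, for which $\frac{\partial E^p}{\partial W} = (Y^p - D^p)(Z^p)^T$:

```python
import numpy as np

# One gradient descent step W_k = W_{k-1} - eps * dE/dW, with the gradient
# averaged over the dataset (linear / squared-error setup assumed above).
def gradient_step(pairs, W, eps=0.01):
    grad = np.zeros_like(W)
    for Z_p, D_p in pairs:
        Y_p = W @ Z_p                     # prediction
        grad += np.outer(Y_p - D_p, Z_p)  # dE^p/dW for the squared error
    return W - eps * grad / len(pairs)

# Usage with random data (hypothetical shapes):
rng = np.random.default_rng(0)
pairs = [(rng.standard_normal(4), rng.standard_normal(2)) for _ in range(100)]
W = rng.standard_normal((2, 4))
for k in range(200):
    W = gradient_step(pairs, W)
```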
3. Gradient Backpropagation
To compute the gradient of the loss with respect to every parameter in a multi-layer system efficiently, we use backpropagation: the error is propagated backwards from the output, layer by layer, via the chain rule.
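As a toy illustration (my own example, not the paper's architecture), here is backpropagation through a two-layer network with a tanh hidden layer and squared-error loss:

```python
import numpy as np

# Minimal two-layer network: h = tanh(W1 @ Z), Y = W2 @ h,
# with loss E = 0.5 * ||D - Y||^2.
def forward(Z, W1, W2):
    h = np.tanh(W1 @ Z)  # hidden activations
    Y = W2 @ h           # network output
    return h, Y

def backward(Z, D, W1, W2):
    h, Y = forward(Z, W1, W2)
    dY = Y - D                            # dE/dY at the output
    dW2 = np.outer(dY, h)                 # gradient of the output layer
    dh = W2.T @ dY                        # error propagated back through W2
    dW1 = np.outer(dh * (1 - h ** 2), Z)  # chain rule through tanh
    return dW1, dW2
```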
4. Learning in Real Handwriting Recognition System
One of the hardest problems in handwriting recognition is segmentation: each character must be separated from its neighbours within a word or sentence, which is difficult when characters touch or overlap.
5. Globally Trainable Systems
A Graph Transformer (GT) takes one or more graphs as input and produces a graph as output. Networks built from such modules are called Graph Transformer Networks (GTNs), and gradient-based learning is used to train all of their parameters globally, so that the whole system learns to recognise character patterns in images end to end.
2. Convolutional Neural Network for Isolated Character Recognition
As for trainable feature extraction, fully-connected neural networks have certainly achieved good performance in the past, but they have some problems:
- Images are large, typically several hundred variables (pixels), so a fully-connected first layer buries us in tons of parameters. For example, connecting a 32×32 image to just 100 hidden units already requires $32 \cdot 32 \cdot 100 = 102{,}400$ weights.
- A deficiency of the fully-connected design is that the topology of the input is entirely ignored: nearby pixels are strongly correlated, and this local 2D structure is thrown away.
1. Convolutional Networks
Three key features of CNNs (a minimal NumPy sketch follows this list):

- Local Receptive Fields: each neuron sees only a small patch of the image and can extract elementary visual features such as oriented edges, end-points, and corners. Once a feature has been detected, its exact position matters less, so the network can still spot a character even if it shifts or is slightly distorted. This is largely what gives convolutional layers their robustness.
- Shared Weights: as a local receptive field slides across the image to produce a feature map, the same set of weights is reused at every position. This drastically reduces the number of free parameters and makes the detected features shift-invariant.
- Sub-Sampling: locally averaging the feature map and sub-sampling it (in other words, average/max pooling) reduces its resolution and its sensitivity to small shifts and distortions.
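Here is a rough NumPy sketch of all three ideas together; the 3×3 kernel, the 28×28 input, and all names are my own illustrative choices:

```python
import numpy as np

# One shared 3x3 kernel slid over the image (local receptive fields +
# shared weights), followed by 2x2 average pooling (sub-sampling).
def conv2d(image, kernel):
    H, W = image.shape
    k = kernel.shape[0]
    out = np.zeros((H - k + 1, W - k + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            # every position reuses the SAME kernel weights
            out[i, j] = np.sum(image[i:i + k, j:j + k] * kernel)
    return out

def avg_pool(fmap, s=2):
    H, W = fmap.shape
    return fmap[:H // s * s, :W // s * s].reshape(H // s, s, W // s, s).mean(axis=(1, 3))

image = np.random.rand(28, 28)       # toy input image
kernel = np.random.randn(3, 3)       # one shared weight set: only 9 parameters
feature_map = conv2d(image, kernel)  # 26x26 feature map
pooled = avg_pool(feature_map)       # 13x13 after sub-sampling
```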