The perfect learning exists. We mean a learning model that can be generalized, and moreover, that can always fit perfectly the test data, as well as the training data. We have performed in this thesis many experiments that validate this concept in many ways. The tools are given through the chapters that contain our developments. The classical Multilayer Feedforward model has been re-considered and a novel $N_k$-architecture is proposed to fit any multivariate regression task. This model can easily be augmented to thousands of possible layers without loss of predictive power, and has the potential to overcome our difficulties simultaneously in building a model that has a good fit on the test data, and don't overfit. His hyper-parameters, the learning rate, the batch size, the number of training times (epochs), the size of each layer, the number of hidden layers, all can be chosen experimentally with cross-validation methods. There is a great advantage to build a more powerful model using mixture models properties. They can self-classify many high dimensional data in a few numbers of mixture components. This is also the case of the Shallow Gibbs Network model that we built as a Random Gibbs Network Forest to reach the performance of the Multilayer feedforward Neural Network in a few numbers of parameters, and fewer backpropagation iterations. To make it happens, we propose a novel optimization framework for our Bayesian Shallow Network, called the {Double Backpropagation Scheme} (DBS) that can also fit perfectly the data with appropriate learning rate, and which is convergent and universally applicable to any Bayesian neural network problem. The contribution of this model is broad. First, it integrates all the advantages of the Potts Model, which is a very rich random partitions model, that we have also modified to propose its Complete Shrinkage version using agglomerative clustering techniques. The model takes also an advantage of Gibbs Fields for its weights precision matrix structure, mainly through Markov Random Fields, and even has five (5) variants structures at the end: the Full-Gibbs, the Sparse-Gibbs, the Between layer Sparse Gibbs which is the B-Sparse Gibbs in a short, the Compound Symmetry Gibbs (CS-Gibbs in short), and the Sparse Compound Symmetry Gibbs (Sparse-CS-Gibbs) model. The Full-Gibbs is mainly to remind fully-connected models, and the other structures are useful to show how the model can be reduced in terms of complexity with sparsity and parsimony. All those models have been experimented, and the results arouse interest in those structures, in a sense that different structures help to reach different results in terms of Mean Squared Error (MSE) and Relative Root Mean Squared Error (RRMSE). For the Shallow Gibbs Network model, we have found the perfect learning framework : it is the $(l_1, \boldsymbol{\zeta}, \epsilon_{dbs})-\textbf{DBS}$ configuration, which is a combination of the \emph{Universal Approximation Theorem}, and the DBS optimization, coupled with the (\emph{dist})-Nearest Neighbor-(h)-Taylor Series-Perfect Multivariate Interpolation (\emph{dist}-NN-(h)-TS-PMI) model [which in turn is a combination of the research of the Nearest Neighborhood for a good Train-Test association, the Taylor Approximation Theorem, and finally the Multivariate Interpolation Method]. It indicates that, with an appropriate number $l_1$ of neurons on the hidden layer, an optimal number $\zeta$ of DBS updates, an optimal DBS learnnig rate $\epsilon_{dbs}$, an optimal distance \emph{dist}$_{opt}$ in the research of the nearest neighbor in the training dataset for each test data $x_i^{\mbox{test}}$, an optimal order $h_{opt}$ of the Taylor approximation for the Perfect Multivariate Interpolation (\emph{dist}-NN-(h)-TS-PMI) model once the {\bfseries DBS} has overfitted the training dataset, the train and the test error converge to zero (0). As the Potts Models and many random Partitions are based on a similarity measure, we open the door to find \emph{sufficient} invariants descriptors in any recognition problem for complex objects such as image; using \emph{metric} learning and invariance descriptor tools, to always reach 100\% accuracy. This is also possible with invariant networks that are also universal approximators. Our work closes the gap between the theory and the practice in artificial intelligence, in a sense that it confirms that it is possible to learn with very small error allowed.
More..