Hinton Explains Neural Networks (neural_network), Belief Nets (belief_net), and Boltzmann Machines (RBM): Overview 1.ppt

Published: 2017-06-26, about 27,500 words, 70 pages
Motions that I show you next will use 2 layers

The replicated softmax model: How to modify an RBM to model word count vectors
Modification 1: Keep the binary hidden units but use "softmax" visible units that represent 1-of-N.
Modification 2: Make each hidden unit use the same weights for all the visible softmax units.
Modification 3: Use as many softmax visible units as there are non-stop words in the document. So it's actually a family of different-sized RBMs that share weights. It is not a single generative model.
Modification 4: Multiply each hidden bias by the number of words in the document (not done in our earlier work).
The replicated softmax model is much better at modeling bags of words than LDA topic models (in NIPS 2009).

The replicated softmax model
All the models in this family have 5 hidden units.
This model is for 8-word documents.

Time series models
Inference is difficult in directed models of time series if we use non-linear distributed representations in the hidden units.
It is hard to fit Dynamic Bayes Nets to high-dimensional sequences (e.g. motion capture data).
So people tend to avoid distributed representations and use much weaker methods (e.g. HMMs).

Time series models
If we really need distributed representations (which we nearly always do), we can make inference much simpler by using three tricks:
Use an RBM for the interactions between hidden and visible variables. This ensures that the main source of information wants the posterior to be factorial.
Model short-range temporal information by allowing several previous frames to provide input to the hidden units and to the visible units.
This leads to a temporal module that can be stacked, so we can use greedy learning to learn deep models of temporal structure.

The conditional RBM model (a partially observed CRF)
Start with a generic RBM.
Add two types of conditioning connections.
Given the data, the hidd
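
The four modifications listed for the replicated softmax model can be sketched in a few lines of plain Python. This is only an illustrative sketch, not code from the slides: the function names `hidden_probs` and `word_probs`, the argument layout, and the toy numbers are all assumptions, and the formulas are the standard RBM conditionals with the hidden bias multiplied by the document length as Modification 4 describes.

```python
import math


def hidden_probs(counts, W, hidden_bias):
    """Return p(h_j = 1 | document) for each binary hidden unit.

    counts[k]      : how often vocabulary word k occurs in the document
    W[k][j]        : weight between word k and hidden unit j; every softmax
                     visible unit shares this matrix (Modification 2)
    hidden_bias[j] : bias of hidden unit j before scaling
    (All names here are hypothetical, chosen for this sketch.)
    """
    doc_len = sum(counts)  # number of softmax visible units (Modification 3)
    probs = []
    for j, bias in enumerate(hidden_bias):
        # Modification 4: the hidden bias is multiplied by the document length.
        total = doc_len * bias
        # Shared weights mean the input depends only on the word counts.
        total += sum(c * W[k][j] for k, c in enumerate(counts))
        probs.append(1.0 / (1.0 + math.exp(-total)))
    return probs


def word_probs(h, W, visible_bias):
    """Return the softmax distribution over the vocabulary given hidden states.

    Each of the document's softmax units draws one word from this same
    distribution (Modification 1: 1-of-N visible units).
    """
    logits = [
        a + sum(h[j] * W[k][j] for j in range(len(h)))
        for k, a in enumerate(visible_bias)
    ]
    top = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(x - top) for x in logits]
    z = sum(exps)
    return [e / z for e in exps]


# Toy setup (made-up numbers): 3-word vocabulary, 2 hidden units.
W = [[0.5, -0.2], [0.1, 0.3], [-0.4, 0.2]]
counts = [2, 0, 1]  # a 3-word document
p_h = hidden_probs(counts, W, hidden_bias=[0.1, -0.1])
p_w = word_probs([1, 0], W, visible_bias=[0.0, 0.0, 0.0])
```

Because the weights are shared and only the bias term scales with length, the same `W` can be applied to a document of any length, which is one way to read the remark that the model is a family of different-sized RBMs that share weights.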