Harmonic mixtures: combining mixture models and graph-based methods for
inductive and scalable semi-supervised learning
Xiaojin Zhu ZHUXJ@CS.CMU.EDU
John Lafferty LAFFERTY@CS.CMU.EDU
School of Computer Science, Carnegie Mellon University, Pittsburgh, PA 15213 USA
Abstract
Graph-based methods for semi-supervised learning have recently been shown to be promising for combining labeled and unlabeled data in classification problems. However, inference for graph-based methods often does not scale well to very large data sets, since it requires inversion of a large matrix or solution of a large linear program. Moreover, such approaches are inherently transductive, giving predictions for only those points in the unlabeled set, and not for an arbitrary test point. In this paper a new approach is presented that preserves the strengths of graph-based semi-supervised learning while overcoming the limitations of scalability and non-inductive inference, through a combination of generative mixture models and discriminative regularization using the graph Laplacian. Experimental results show that this approach preserves the accuracy of purely graph-based transductive methods when the data has “manifold structure,” and at the same time achieves inductive learning with significantly reduced computational cost.
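To make the scalability and transduction limitations concrete, the following is a minimal sketch of one well-known graph-based method, the harmonic function solution. It is an illustration under stated assumptions, not the method proposed in this paper. It assumes a symmetric weight matrix W with graph Laplacian L = D - W and solves L_uu f_u = W_ul f_l for the unlabeled nodes. The five-node chain graph, the function names, and the pure-Python solver are hypothetical choices made for the example; a practical implementation would use a linear algebra library.

```python
def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Build an augmented matrix copy so the inputs are left unmodified.
    m = [[float(v) for v in row] + [float(rhs)] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular system")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back-substitution on the upper-triangular system.
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def harmonic_solution(weights, labels):
    """Return {node: value} for the unlabeled nodes of a weighted graph.

    weights: symmetric n x n list of edge weights (assumed input format).
    labels:  {node index: label value} for the labeled nodes.
    """
    n = len(weights)
    unlabeled = [i for i in range(n) if i not in labels]
    degree = [sum(row) for row in weights]
    # Unlabeled-unlabeled block of the graph Laplacian L = D - W.
    l_uu = [[(degree[i] if i == j else 0.0) - weights[i][j] for j in unlabeled]
            for i in unlabeled]
    # Right-hand side W_ul f_l: weighted sum of neighbouring known labels.
    rhs = [sum(weights[i][j] * y for j, y in labels.items()) for i in unlabeled]
    return dict(zip(unlabeled, solve_linear_system(l_uu, rhs)))


# Chain graph 0 - 1 - 2 - 3 - 4 with unit edge weights (illustrative data).
W = [[0, 1, 0, 0, 0],
     [1, 0, 1, 0, 0],
     [0, 1, 0, 1, 0],
     [0, 0, 1, 0, 1],
     [0, 0, 0, 1, 0]]
values = harmonic_solution(W, {0: 0.0, 4: 1.0})
print(values)  # values interpolate between the two labeled endpoints
```

On this chain the unlabeled nodes 1, 2, 3 receive values of approximately 0.25, 0.5, and 0.75. Two points from the abstract are visible in the sketch: the dense solve grows cubically with the number of unlabeled points, and the result covers only nodes already in the graph, so a new test point cannot be classified without rebuilding and re-solving the system.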
1. Introduction
The availability of large data collections, with only limited human annotation, has turned the attention of a growing community of machine learning researchers to the problem of semi-supervised learning. The broad research agenda of semi-supervised learning is to develop methods that can leverage a large amount of unlabeled data to build more accurate classification algorithms than can be achieved using purely supervised learning. An attractive new family of semi-supervised methods is based on the use of a graphical representation of the unlabeled data; examples of this
Appearing in Proceedings of the 22nd International Conference on Machine