Computing Gaussian Mixture Models with EM using
Side-Information
Noam Shental fenoam@cs.huji.ac.il
Aharon Bar-Hillel aharonbh@cs.huji.ac.il
Tomer Hertz tomboy@cs.huji.ac.il
Daphna Weinshall daphna@cs.huji.ac.il
School of Computer Science and Engineering and the Center for Neural Computation, The Hebrew University
of Jerusalem, 91904 Jerusalem, ISRAEL
Abstract
Estimation of Gaussian mixture models is an efficient and popular technique for
clustering and density estimation. An EM procedure is widely used to estimate the
model parameters. In this paper we show how side information in the form of
equivalence constraints can be incorporated into this procedure, leading to
improved clustering results. Equivalence constraints are prior knowledge concerning
pairs of data points, indicating whether the points arise from the same source
(positive constraint) or from different sources (negative constraint). Such
constraints can be gathered automatically in some learning problems, and are a
natural form of supervision in others. We present a closed-form EM procedure for
handling positive constraints, and a Generalized EM procedure using a Markov net
for the incorporation of negative constraints. Using publicly available data sets,
we demonstrate that such side information may lead to considerable improvement in
clustering tasks, and that our algorithm is preferable to another suggested method
using this type of side information.
Keywords: Learning from partial knowledge, semi-supervised learning, Gaussian
mixture models, clustering.
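As an illustration of the equivalence constraints described in the abstract, the
following is a minimal sketch of how pairwise constraints might be represented and
pre-processed. It assumes that positive constraints are transitive, so that points
linked by a chain of positive pairs can be merged into one group; this grouping
step and the function names group_positive_constraints and
violated_negative_constraints are hypothetical choices made for the example, not
taken from the text above.

```python
from collections import defaultdict


def group_positive_constraints(n_points, positive_pairs):
    """Merge points linked by positive constraints (assumed transitive)."""
    parent = list(range(n_points))

    def find(i):
        # Follow parent links to the root, shortening the path as we go.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for a, b in positive_pairs:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    groups = defaultdict(list)
    for i in range(n_points):
        groups[find(i)].append(i)
    # Sort so the result does not depend on the order of the constraints.
    return sorted(groups.values())


def violated_negative_constraints(groups, negative_pairs):
    """Return negative pairs whose two points ended up in the same group."""
    group_of = {i: g for g, members in enumerate(groups) for i in members}
    return [(a, b) for a, b in negative_pairs if group_of[a] == group_of[b]]


# Toy data: six points, indexed 0..5 (made-up constraints for illustration).
positive = [(0, 1), (1, 2), (4, 5)]   # pairs known to share a source
negative = [(0, 4), (2, 3)]           # pairs known to come from different sources

groups = group_positive_constraints(6, positive)
conflicts = violated_negative_constraints(groups, negative)
print(groups)
print(conflicts)
```

A non-empty conflict list would indicate that the positive and negative constraints
contradict each other under the transitivity assumption, which is worth checking
before any mixture model is fitted.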
1. Introduction
We are used to thinking about learning from labels as supervised learning, and
learning without labels as unsupervised learning, where 'supervised' implies a need
for human intervention. However, in unsupervised learning we are not limited to
using data statistics only. Similarly, supervised learning is not limited to using
labels. In this work we focus on semi-supervised learning using side-information,
which is not given