Sustainability 2010, 2, 533-550; doi:10.3390/su2020533
OPEN ACCESS
sustainability
ISSN 2071-1050
/journal/sustainability
Article
Statistics for Categorical Surveys—A New Strategy for
Multivariate Classification and Determining Variable
Importance
Alexander Herr
CSIRO Sustainable Ecosystems, Gungahlin Homestead, Bellenden Street, GPO Box 284, Crace,
ACT 2601, Canberra, Australia; E-Mail: alexander.herr@csiro.au; Tel.: +61-2-6242-1542;
Fax: +61-2-6242-1705.
Received: 20 December 2009 / Accepted: 9 February 2010 / Published: 10 February 2010
Abstract: Surveys can be a rich source of information. However, the extraction of
underlying variables from the analysis of mixed categorical and numeric survey data is
fraught with complications when using grouping techniques such as clustering or
ordination. Here I present a new strategy to deal with classification of households into
clusters, and identification of cluster membership for new households. The strategy relies
on probabilistic methods for identifying variables underlying the clusters. It incorporates
existing methods that (i) help determine the optimal cluster number, (ii) directly identify
variables underlying clusters, and (iii) identify the variables important for classifying new
cases into existing clusters. The strategy uses the R statistical software, which is freely
available.
Keywords: nominal; cluster; typology; statistics; data analysis; deci