Abstract: Multivariate statistical clustering methods are widely used in the natural and social sciences. In practice, however, cluster analysis of multivariate data cannot be carried out without the support of statistical software. R is free and open source and offers powerful statistical analysis and graphics capabilities, and it has attracted growing attention as its applications have matured. This paper uses an example to describe the application of R to hierarchical (system) cluster analysis.

Introduction

Multivariate statistical analysis, also called multivariate analysis, is an important branch of statistics. Phenomena that are shaped jointly by many indicators abound in real life, and multivariate statistical analysis is the discipline that studies the interdependence among several random variables and the statistical regularities underlying them. Cluster analysis is among its most commonly used methods. Like other multivariate methods, it involves fairly complex mathematical theory and generally cannot be carried out by hand; a computer and statistical software are required.

Cluster analysis covers a rich family of methods: hierarchical (system) clustering, ordered-sample clustering, dynamic clustering, fuzzy clustering, graph-theoretic clustering, and cluster-based prediction. Hierarchical clustering is the most commonly used and the most successful of these. Its basic idea is as follows. First, treat each of the n samples as a class of its own. Then define a 'distance' between samples and a distance between classes. Merge the two closest classes into a new class, and compute the distance between the new class and each of the other classes. Repeat, merging the two closest classes at every step so that the number of classes falls by one each time, until all samples belong to a single class.
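The merging procedure described above can be sketched directly in R. This is a minimal illustration written for readability, not the implementation R itself uses; the function name `agglomerate`, the `linkage` argument, and the `toy` data are hypothetical names introduced only for this sketch. Passing `max` as the linkage gives the longest distance rule and `min` the shortest distance rule.

```r
# Naive agglomerative clustering, written for clarity rather than speed.
# `agglomerate` and `toy` are hypothetical names used only in this sketch.
agglomerate <- function(x, linkage = max) {
  d <- as.matrix(dist(x))                # Euclidean distances between samples
  classes <- as.list(seq_len(nrow(x)))   # step 1: every sample is its own class
  merges <- list()
  while (length(classes) > 1) {
    best <- c(1, 2)
    best_d <- Inf
    # find the pair of classes with the smallest between-class distance
    for (i in seq_len(length(classes) - 1)) {
      for (j in (i + 1):length(classes)) {
        dij <- linkage(d[classes[[i]], classes[[j]]])
        if (dij < best_d) {
          best_d <- dij
          best <- c(i, j)
        }
      }
    }
    merged <- c(classes[[best[1]]], classes[[best[2]]])
    merges[[length(merges) + 1]] <- list(members = merged, height = best_d)
    classes <- c(classes[-best], list(merged))  # one class fewer after each pass
  }
  merges
}

# Five made-up points in the plane (illustrative only)
toy <- rbind(c(0, 0), c(0, 1), c(4, 4), c(4, 6), c(11, 0))
steps <- agglomerate(toy, linkage = max)
heights <- sapply(steps, function(s) s$height)

# Compare the merge heights with R's built-in complete-linkage clustering
print(all.equal(heights, hclust(dist(toy), method = "complete")$height))
```

In practice the built-in `hclust` function would be used instead; the loop is shown only to make each merge step visible.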

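Hierarchical clustering methods differ in how the distance between classes is defined: (1) the shortest distance (single linkage) method, (2) the longest distance (complete linkage) method, (3) the median (middle) distance method, (4) the centroid (center of gravity) method, (5) the class average (average linkage) method, and (6) the sum of squared deviations method (Ward's method).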
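As a rough sketch of how the six methods can be run in R, the example below uses the `dist`, `hclust`, and `cutree` functions from the base `stats` package. The data are random stand-ins, not the data analyzed in this paper. The mapping of the six method names onto `hclust` method strings is an assumption made here; in particular, `"ward.D2"` is taken as the Ward method, although `"ward.D"` is another option.

```r
# Hypothetical stand-in data: 8 samples, 3 indicators (not the paper's data).
set.seed(1)
x <- scale(matrix(rnorm(24), nrow = 8,
                  dimnames = list(paste0("s", 1:8), c("v1", "v2", "v3"))))
d <- dist(x)   # Euclidean distances between standardized samples

# Assumed mapping from the six methods in the text to hclust() method names
methods <- c(shortest = "single", longest = "complete", median = "median",
             centroid = "centroid", average = "average", ward = "ward.D2")

fits <- lapply(methods, function(m) {
  # the hclust documentation suggests squared distances for centroid/median
  hclust(if (m %in% c("median", "centroid")) d^2 else d, method = m)
})

# Cut every tree into 3 classes and compare memberships side by side
groups <- sapply(fits, cutree, k = 3)
print(groups)

# One dendrogram per method, arranged in a 2 x 3 grid
op <- par(mfrow = c(2, 3))
for (nm in names(fits)) plot(fits[[nm]], main = nm, xlab = "", sub = "")
par(op)
```

Each column of `groups` holds the class labels that one method assigns, so the rows show where the methods agree and where they differ.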

As Figure 1 shows, the different methods give broadly similar classifications. Judged against the actual circumstances of Shandong Province, the longest distance method gives the better classification.

For cluster analysis, R is particularly convenient, simple, and easy to learn. Programs written by others can be modified to suit the case at hand, and cluster analysis of multivariate data is handled readily, so R has a clear advantage for this kind of work.

