Etude des règles d'arrêt en classification numérique [Study of stopping rules in numerical classification].
1994
Baamal L.
Stopping rules are methods for determining the optimal number of clusters in a cluster analysis. Several rules have been suggested, but they are not efficient in all cases. Using simulations, a new rule was established for clustering data drawn from multinormal populations with methods that minimize the within-cluster variation. Two series of critical values were determined, corresponding to type I error risks (alpha) of 5 per cent and 1 per cent. The performance of the proposed rule, of two inferential rules (Beale's and Duda-Hart's) and of three non-inferential rules (the Cubic Clustering Criterion or CCC, pseudo-F and gamma-coefficient rules) was evaluated on simulated and real data. Determining the number of clusters is easier when the sample size and the number of variables are large; for low values of these factors, the proportion of underestimation increases. The results on real data confirm those obtained on simulated data with a similar configuration. Under the conditions cited above and for sample sizes (n) higher than 200, we recommend our rule at alpha = 1 per cent if the number of variables (p) is less than 16, and the gamma rule if it is larger than 16. If n is lower than 200, we advise the CCC rule if p is less than 4, our rule at alpha = 5 per cent if p is between 5 and 16, and the gamma rule for p higher than 16. Finally, to permit better use of real data, we give elements for interpreting the clusters obtained.
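The recommendations at the end of the abstract amount to a small decision table over n and p. The sketch below is one possible encoding of it, not something taken from the thesis: the function name `recommend_stopping_rule` and the returned labels are hypothetical, "between 5 and 16" is assumed to be inclusive, and the boundary cases the abstract leaves unspecified (n equal to 200, p equal to 16 when n exceeds 200, and p equal to 4 when n is below 200) return `None` rather than a guess.

```python
from typing import Optional


def recommend_stopping_rule(n: int, p: int) -> Optional[str]:
    """Return the stopping rule the abstract recommends for n observations
    and p variables, or None where the abstract gives no recommendation.

    Hypothetical helper: the name and the returned labels are illustrative.
    """
    if n <= 0 or p <= 0:
        raise ValueError("n and p must be positive")

    if n > 200:
        if p < 16:
            return "proposed rule, alpha = 1%"
        if p > 16:
            return "gamma rule"
        return None  # p == 16 is not covered by the abstract

    if n < 200:
        if p < 4:
            return "CCC rule"
        if 5 <= p <= 16:  # assumption: "between 5 and 16" is inclusive
            return "proposed rule, alpha = 5%"
        if p > 16:
            return "gamma rule"
        return None  # p == 4 is not covered by the abstract

    return None  # n == 200 is not covered by the abstract


if __name__ == "__main__":
    for n, p in [(500, 8), (500, 20), (100, 3), (100, 10), (100, 4)]:
        print(n, p, recommend_stopping_rule(n, p))
```

Returning `None` for the uncovered boundaries keeps the gaps in the stated recommendations visible instead of silently assigning them to a neighbouring case.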
Bibliographic information
This record was provided by Wolters Kluwer