Accounting for outliers and heteroskedasticity in multibreed genetic evaluations of postweaning gain of Nelore-Hereford cattle
Cardoso, F.F. | Rosa, G.J.M. | Tempelman, R.J.
The objectives of this study were to demonstrate the utility of hierarchical Bayesian models combining residual heteroskedasticity with robustness for outlier detection and muting and to evaluate the effects of such joint modeling in multibreed genetic evaluations. A 3 × 2 factorial specification of 6 residual variance models based on 3 distributional (Gaussian, Student's t, or Slash) and 2 variability (homoskedastic or heteroskedastic) assumptions was used to analyze 22,717 postweaning gain records from a Nelore-Hereford population (40,082 animals in the pedigree). To illustrate the utility of the 2 robust distributional specifications (Student's t and Slash) for outlier detection and muting, 3 records from the same contemporary group (an extreme residual outlier, a mild residual outlier, and a near-zero residual) were chosen for further study. The posterior densities of the corresponding weighting variables of these records were used to assess their degree of Gaussian outlyingness and the ability of the robust models to mute the effects of deviant records. The heteroskedastic Student's t model provided the best fit among the 6 specifications and was preferred for genetic merit inference. Kendall rank correlations of the posterior means of the additive genetic effects of the animals, used to compare the selection order of the Student's t and Gaussian models, were reasonably high across all animals within the most frequent genotypes, ranging from 0.83 to 0.91 and from 0.89 to 0.95 for the homoskedastic and the heteroskedastic versions, respectively. However, when considering only animals ranked in the top 10% by the customary Gaussian homoskedastic model, these rank correlations were reduced considerably, ranging from 0.29 to 0.57 and from 0.72 to 0.85 between the 2 residual densities within the homoskedastic and heteroskedastic versions, respectively.
Rank correlations between the homoskedastic and heteroskedastic versions within each of the Gaussian and Student's t error models tended to be smaller, with a range from 0.68 to 0.90 across all animals and from 0.28 to 0.67 for animals ranked in the top 10%. These results support the implementation of robust models accounting for sources of heteroskedasticity to increase the precision and stability of multibreed genetic evaluations with proper statistical treatment of deviant records.
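
The role of the weighting variables can be illustrated with a minimal sketch. It relies on the standard scale-mixture representation of the Student's t distribution, in which a residual e_i is Gaussian with variance sigma2 / w_i and the weight w_i has a Gamma(nu/2, nu/2) prior, so that, conditional on sigma2 and nu, w_i follows a Gamma((nu + 1)/2, (nu + e_i^2/sigma2)/2) distribution. The function name and all numeric values below are hypothetical and are not taken from the study, which inferred these quantities jointly by Bayesian methods rather than fixing them.

```python
def t_weight_conditional_mean(residual, sigma2, nu):
    """Conditional posterior mean of the Student's t mixing weight w_i.

    Under the scale-mixture form of the t distribution,
    w_i | e_i, sigma2, nu ~ Gamma(shape=(nu + 1)/2, rate=(nu + e_i^2/sigma2)/2),
    whose mean is shape / rate = (nu + 1) / (nu + e_i^2/sigma2).
    """
    squared_std_residual = residual ** 2 / sigma2
    return (nu + 1.0) / (nu + squared_std_residual)


# Hypothetical values for illustration only (not from the study).
sigma2 = 400.0  # assumed residual variance, kg^2
nu = 4.0        # assumed degrees of freedom

# Three hypothetical residuals (kg) mimicking the three kinds of records.
residuals = {
    "extreme outlier": 100.0,
    "mild outlier": 50.0,
    "near zero": 1.0,
}

weights = {
    label: t_weight_conditional_mean(e, sigma2, nu)
    for label, e in residuals.items()
}

for label, w in weights.items():
    print(f"{label}: conditional mean weight = {w:.3f}")
```

Under these assumptions, a weight well below 1 flags a record as outlying relative to the Gaussian model (where every weight equals 1) and shrinks its influence on the estimates, whereas a record with a near-zero residual keeps a weight slightly above 1.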