Application of Hybrid Model Based on LASSO-SMOTE-BO-SVM to Lithology Identification During Drilling
2025
Hui Yao | Manyu Liang | Shangxian Yin | Qing Zhang | Yunlei Tian | Guoan Wang | Enke Hou | Huiqing Lian | Jinfu Zhang | Chuanshi Wu
Lithology identification during drilling plays a vital role in geological and geotechnical exploration, as it facilitates the early detection of formation-related hazards and supports the development of optimized mining strategies. Traditional lithology identification research suffers from problems such as fuzzy indicator characteristics and imbalanced sample quantities, which reduce the accuracy and interpretability of model identification. To address these problems, the Shanxi Guoqiang Coal Mine was taken as the research object, and a combined machine learning model was used to study lithology identification during drilling. First, the least absolute shrinkage and selection operator (LASSO) algorithm was used to screen the independent variables and retain the parameters that contributed the most to lithology identification. Second, the synthetic minority oversampling technique (SMOTE) was used to expand the data samples by increasing the amount of minority-class data, keeping the ratios of the various lithology classes at 1:1. Third, the Bayesian optimization (BO) algorithm was used to optimize the penalty factor C and the kernel function hyperparameter γ, two important parameters of the support vector machine (SVM) model, and the BO-SVM lithology identification model was established. Finally, the data samples were processed, and the results were compared with those of single models and of models trained on the unbalanced samples to evaluate the effect. The results showed the following: during the drilling process, the four indicators of drilling speed, mud pressure, slurry flow rate, and torque are strongly correlated with the lithology and can be used for lithology identification and classification research.
After the data set was oversampled using the SMOTE algorithm, each model showed better robustness and generalization ability; the classification evaluation indicators also improved greatly, especially for the random forest model, which had performed poorly on the original data. The BO algorithm was used to optimize the parameters of the SVM model and establish a combined model that correctly identified 95 of the 96 groups of test samples, an identification accuracy of about 99%, which was better than that of the traditional machine learning models. The evaluation results were compared with measured data, which confirmed the reliability of the combined model classification method and its potential to be extended to lithology identification and classification work.
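The oversampling step described in the abstract can be illustrated with a minimal sketch of the basic SMOTE idea: each synthetic sample is placed at a random position on the line segment between a minority sample and one of its nearest minority neighbours. This is not the authors' implementation. The function names `smote` and `balance`, the class labels `class_a` and `class_b`, and the four-feature toy values are all hypothetical and chosen only for illustration.

```python
import math
import random


def smote(minority, n_new, k=5, seed=0):
    """Create n_new synthetic samples by interpolating between minority samples."""
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)  # cannot have more neighbours than other samples
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Rank the other minority samples by Euclidean distance to the base sample.
        others = [p for j, p in enumerate(minority) if j != i]
        others.sort(key=lambda p: math.dist(base, p))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # interpolation weight in [0, 1)
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


def balance(samples_by_class, k=5, seed=0):
    """Oversample every smaller class up to the size of the largest class (1:1 ratio)."""
    target = max(len(points) for points in samples_by_class.values())
    balanced = {}
    for offset, (label, points) in enumerate(samples_by_class.items()):
        missing = target - len(points)
        extra = smote(points, missing, k=k, seed=seed + offset) if missing > 0 else []
        balanced[label] = list(points) + extra
    return balanced


# Hypothetical toy data: four features per sample, values invented for illustration only.
toy = {
    "class_a": [[1.0, 2.0, 3.0, 4.0], [1.2, 2.1, 3.3, 4.2],
                [0.9, 1.8, 2.9, 3.9], [1.1, 2.2, 3.1, 4.1]],
    "class_b": [[2.0, 1.0, 4.0, 3.0], [2.2, 1.1, 4.2, 3.1]],
}
balanced = balance(toy, k=1)
print({label: len(points) for label, points in balanced.items()})
```

In this toy case the minority class has only two samples, so every synthetic point lies on the segment between them. In practice a maintained library implementation would normally be preferred over a hand-written one.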
Bibliographic information
This bibliographic record has been provided by Multidisciplinary Digital Publishing Institute