Multi-Grained Temporal Clip Transformer for Skeleton-Based Human Activity Recognition
2025
Peiwang Zhu | Chengwu Liang | Yalong Liu | Songqi Jiang
Skeleton-based human activity recognition is a key research topic in the fields of deep learning and computer vision. However, existing approaches are less effective at capturing short-term sub-action information at different granularity levels and long-term motion correlations, which limits recognition accuracy. To overcome these challenges, an innovative multi-grained temporal clip transformer (MTC-Former) is proposed. Firstly, based on the transformer backbone, a multi-grained temporal clip attention (MTCA) module with a multi-branch architecture is proposed to capture short-term sub-action features at multiple granularities. Secondly, an innovative multi-scale spatial–temporal feature interaction module is proposed to jointly learn sub-action dependencies and facilitate skeletal motion interactions, where long-range motion patterns are embedded to enhance correlation modeling. Experiments were conducted on three datasets, NTU RGB+D, NTU RGB+D 120, and InHARD, and achieved state-of-the-art Top-1 recognition accuracy, demonstrating the superiority of the proposed MTC-Former.
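The abstract does not give implementation details, but the core idea of multi-grained temporal clip attention can be illustrated with a minimal sketch: a skeleton feature sequence is split into non-overlapping temporal clips at several clip lengths (granularities), self-attention is applied within each clip, and the branches are fused. All names, clip lengths, and the averaging fusion below are hypothetical illustrations, not the authors' actual MTCA design.

```python
import numpy as np

def softmax(x, axis=-1):
    # numerically stable softmax
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def clip_attention(x, clip_len):
    """Self-attention applied independently within non-overlapping
    temporal clips of length clip_len. x has shape (T, d)."""
    T, d = x.shape
    assert T % clip_len == 0, "sequence length must divide into clips"
    out = np.empty_like(x)
    for s in range(0, T, clip_len):
        clip = x[s:s + clip_len]                      # (clip_len, d)
        scores = clip @ clip.T / np.sqrt(d)           # frame-to-frame similarity
        out[s:s + clip_len] = softmax(scores) @ clip  # attention-weighted sum
    return out

def multi_grained_clip_attention(x, clip_lens=(2, 4, 8)):
    """Hypothetical multi-branch fusion: one clip-attention branch per
    granularity, averaged (the paper's fusion rule is not specified)."""
    return np.mean([clip_attention(x, c) for c in clip_lens], axis=0)

# toy skeleton feature sequence: 8 frames, 16-dim features per frame
rng = np.random.default_rng(0)
x = rng.standard_normal((8, 16))
y = multi_grained_clip_attention(x)
print(y.shape)  # (8, 16): per-frame features enriched with clip context
```

Shorter clips emphasize fine-grained sub-actions while longer clips capture coarser motion context; the real model would additionally need the spatial–temporal interaction module for long-range correlations.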
Bibliographic information
This record was provided by the Directory of Open Access Journals.