Transforming spatio-temporal self-attention using action embedding for skeleton-based action recognition

Ahmad, Tasweer; Rizvi, Syed Tahir Hussain; Kanwal, Neel

dc.contributor.author	Ahmad, Tasweer
dc.contributor.author	Rizvi, Syed Tahir Hussain
dc.contributor.author	Kanwal, Neel
dc.date.accessioned	2023-08-21T13:46:57Z
dc.date.available	2023-08-21T13:46:57Z
dc.date.created	2023-07-27T14:32:39Z
dc.date.issued	2023-09
dc.identifier.citation	Ahmad, T., Rizvi, S.T.H., Kanwal, N. (2023) Transforming spatio-temporal self-attention using action embedding for skeleton-based action recognition. Journal of Visual Communication and Image Representation, 95 (103892)	en_US
dc.identifier.issn	1047-3203
dc.identifier.uri	https://hdl.handle.net/11250/3085109
dc.description.abstract	Over the past few years, skeleton-based action recognition has achieved great success because skeleton data is immune to illumination variation, viewpoint variation, background clutter, scaling, and camera motion. However, effectively modeling the latent information in skeleton data remains a challenging problem. Therefore, in this paper, we propose a novel idea of action embedding with a self-attention Transformer network for skeleton-based action recognition. Our proposed method mainly comprises two modules: i) action embedding and ii) a self-attention Transformer. The action embedding encodes the relationship between corresponding body joints (e.g., the joints of both hands move together when performing a clapping action) and thus captures the spatial features of joints. Meanwhile, temporal features and dependencies of body joints are modeled using the Transformer architecture. Our method works in a single-stream (end-to-end) fashion, where an MLP is used for classification. We carry out an ablation study and evaluate the performance of our model on the small-scale SYSU-3D dataset and the large-scale NTU-RGB+D and NTU-RGB+D 120 datasets, where the results establish that our method performs better than other state-of-the-art architectures.	en_US
dc.language.iso	eng	en_US
dc.publisher	Elsevier Ltd.	en_US
dc.rights	Attribution 4.0 International	*
dc.rights.uri	http://creativecommons.org/licenses/by/4.0/deed.no	*
dc.title	Transforming spatio-temporal self-attention using action embedding for skeleton-based action recognition	en_US
dc.type	Peer reviewed	en_US
dc.type	Journal article	en_US
dc.description.version	publishedVersion	en_US
dc.rights.holder	© 2023 The Author(s).	en_US
dc.subject.nsi	VDP::Technology: 500::Information and communication technology: 550	en_US
dc.source.pagenumber	13	en_US
dc.source.volume	95	en_US
dc.source.journal	Journal of Visual Communication and Image Representation	en_US
dc.identifier.doi	10.1016/j.jvcir.2023.103892
dc.identifier.cristin	2163804
dc.source.articlenumber	103892	en_US
cristin.ispublished	true
cristin.fulltext	original
cristin.qualitycode	2

Associated file(s)

File name: 1-s2.0-S1047320323001426-main.pdf
Size: 2.086 MB
Format: PDF

This item appears in the following collection(s)

Publications from CRIStin [4377]
Scientific publications (TN-IDE) [251]

Unless otherwise stated, this item is licensed under Attribution 4.0 International