Please use this identifier to cite or link to this item: https://hdl.handle.net/123456789/996
Type: Journal article
Τίτλος: DanceConv: dance motion generation with convolutional networks
Authors: Kritsis, Kosmas
Gkiokas, Aggelos
Pikrakis, Aggelos
Katsouros, Vassilis
Date: 22/04/2022
Abstract: Automatically synthesizing dance motion sequences is an increasingly popular research task in the broader field of human motion analysis. Recent approaches have mostly used recurrent neural networks (RNNs), which are known to suffer from prediction error accumulation, usually limiting models to synthesizing short choreographies of fewer than 100 poses. In this paper we present a multimodal convolutional autoencoder that combines 2D skeletal and audio information through an attention-based feature fusion mechanism and is capable of generating novel dance motion sequences of arbitrary length. We first validate the ability of our system to capture the temporal context of dancing in a unimodal setting, considering only skeletal features as input. According to 1440 rating answers provided by 24 participants in our initial user study, the best performance was achieved by the model trained with input sequences of 500 poses. Based on this outcome, we train the proposed multimodal architecture with two different approaches, namely teacher forcing and self-supervised curriculum learning, to deal with the autoregressive error accumulation phenomenon. In our evaluation campaign, we generate 1800 sequences and compare our method against two state-of-the-art approaches. Through qualitative and quantitative experiments we demonstrate the improvements introduced by the proposed multimodal architecture in terms of realism, motion diversity and multimodality, reducing the Fréchet Inception Distance (FID) by 0.39. Subjective results confirm the effectiveness of our approach in synthesizing diverse dance motion sequences, with a 6% increase in style consistency preference according to 1800 answers provided by 45 evaluators.
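To make the abstract's two key mechanisms concrete, the sketches below illustrate them in PyTorch. First, attention-based fusion of skeletal and audio feature streams: this is a minimal reconstruction of the general technique, not the authors' published implementation; all layer sizes, the names (AttentionFusion, skel_proj, audio_proj) and the scalar per-modality attention scores are assumptions.

import torch
import torch.nn as nn

class AttentionFusion(nn.Module):
    # Minimal sketch: attention-weighted fusion of two modality streams.
    # Layer sizes and names are illustrative assumptions, not details
    # taken from the paper.
    def __init__(self, skel_dim, audio_dim, fused_dim):
        super().__init__()
        self.skel_proj = nn.Linear(skel_dim, fused_dim)    # project skeleton features
        self.audio_proj = nn.Linear(audio_dim, fused_dim)  # project audio features
        self.score = nn.Linear(fused_dim, 1)               # scalar score per modality

    def forward(self, skel, audio):
        # skel: (batch, time, skel_dim); audio: (batch, time, audio_dim)
        h = torch.stack([torch.tanh(self.skel_proj(skel)),
                         torch.tanh(self.audio_proj(audio))], dim=2)
        # h: (batch, time, 2, fused_dim); softmax over the two modalities
        weights = torch.softmax(self.score(h), dim=2)      # (batch, time, 2, 1)
        return (weights * h).sum(dim=2)                    # (batch, time, fused_dim)

fusion = AttentionFusion(skel_dim=34, audio_dim=128, fused_dim=256)
skeleton = torch.randn(4, 500, 34)   # e.g. 17 2D joints -> 34 values per pose
audio = torch.randn(4, 500, 128)     # e.g. per-frame audio features
fused = fusion(skeleton, audio)      # shape: (4, 500, 256)

Second, self-supervised curriculum learning against autoregressive error accumulation is commonly realized as scheduled sampling: the model is gradually weaned off ground-truth inputs onto its own predictions. The per-step pose-to-next-pose model interface and the linear decay schedule below are illustrative assumptions, not the paper's exact procedure.

import random
import torch

def curriculum_step(model, targets, epoch, total_epochs):
    # One training step of scheduled sampling. Ground-truth poses are fed
    # with a probability that decays over training; otherwise the model's
    # own (detached) prediction is fed back, so it learns to recover from
    # its own errors. The linear schedule is an assumption.
    teacher_prob = max(0.0, 1.0 - epoch / total_epochs)
    pose = targets[:, 0]                     # (batch, pose_dim) seed pose
    loss = torch.zeros(())
    for t in range(1, targets.size(1)):
        pred = model(pose)                   # predict the next pose
        loss = loss + torch.mean((pred - targets[:, t]) ** 2)
        if random.random() < teacher_prob:
            pose = targets[:, t]             # teacher forcing
        else:
            pose = pred.detach()             # self-supervised feedback
    return loss / (targets.size(1) - 1)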
Language: English
Pages: 19
DOI: 10.1109/ACCESS.2022.3169782
EISSN: 2169-3536
Subject category: Computer science, theory and methods
Keywords: Autoregressive model; CNN; Curriculum learning; Highway network; Sequence generation
Copyright holder: © The Author(s) 2022
Rights terms and conditions: Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License
Item URL at the publisher: https://ieeexplore.ieee.org/document/9762306
Journal URL: https://ieeexplore.ieee.org/xpl/RecentIssue.jsp?punumber=6287639
Publication source title: IEEE Access
Volume: 10
Item pages (in source): 44982-45000
Notes: Funding: co-financed by Greece and the European Union (European Social Fund, ESF) through the Operational Program "Human Resources Development, Education and Lifelong Learning 2014-2020", in the context of the project "Analysis and Processing of Motion and Sound Data for Real-Time Music Creation" (Grant Number: MIS 5047232)
Appears in collections: Research groups

Files in this item:
The full text of this item is not currently available from the repository.