In this study, we investigate low-level predictors from the audio and writing modalities for the separation and identification of socially dominant leaders and experts within a study group. We use a multimodal dataset of situated, computer-assisted group learning tasks: groups of three high-school students solve a number of mathematical problems in two separate sessions. To automatically identify the socially dominant student and the expert in each group, we analyze a number of prosodic and voice-quality features as well as writing-based features. In this preliminary study we identify several promising acoustic and writing predictors for the disambiguation of leaders, experts, and other students. We believe that this exploratory study reveals key opportunities for future multimodal learning analytics based on a combination of audio and writing signals.
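To make this kind of analysis concrete, the sketch below is a minimal illustration (not the pipeline used in this paper) of how frame-level pitch and energy statistics could be pooled per speaker and passed to a simple classifier distinguishing leaders, experts, and other students; the specific feature set, the pyin pitch tracker, and the logistic-regression model are assumptions made for illustration only.

    import numpy as np
    import librosa
    from sklearn.linear_model import LogisticRegression

    def prosodic_features(wav_path):
        # Pool pitch (F0) and energy statistics over one speaker's audio.
        y, sr = librosa.load(wav_path, sr=16000)
        f0, _, _ = librosa.pyin(y, fmin=75.0, fmax=400.0, sr=sr)
        f0 = f0[np.isfinite(f0)]               # keep voiced frames only
        rms = librosa.feature.rms(y=y)[0]      # frame-level energy
        return np.array([
            np.mean(f0) if f0.size else 0.0,   # mean pitch
            np.std(f0) if f0.size else 0.0,    # pitch variability
            np.mean(rms),                      # mean energy
            np.std(rms),                       # energy variability
        ])

    # Hypothetical usage: one feature vector per student per session,
    # with labels 0 = other, 1 = leader, 2 = expert.
    # X = np.vstack([prosodic_features(p) for p in wav_paths])
    # clf = LogisticRegression(max_iter=1000).fit(X, labels)

Writing-based predictors could be pooled per student in the same way and concatenated with the acoustic vector before classification.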