Biblio

Filters: Author is Schuller, Björn
2017-10-18
Valstar, Michel, Baur, Tobias, Cafaro, Angelo, Ghitulescu, Alexandru, Potard, Blaise, Wagner, Johannes, André, Elisabeth, Durieu, Laurent, Aylett, Matthew, Dermouche, Soumia et al.  2016.  Ask Alice: An Artificial Retrieval of Information Agent. Proceedings of the 18th ACM International Conference on Multimodal Interaction. :419–420.

We present a demonstration of the ARIA framework, a modular approach for rapid development of virtual humans for information retrieval that have linguistic, emotional, and social skills and a strong personality. We demonstrate the framework's capabilities in a scenario where 'Alice in Wonderland', a popular English literature book, is embodied by a virtual human representing Alice. The user can engage in an information exchange dialogue, where Alice acts as the expert on the book, and the user as an interested novice. Besides speech recognition, sophisticated audio-visual behaviour analysis is used to inform the core agent dialogue module about the user's state and intentions, so that it can go beyond simple chat-bot dialogue. The behaviour generation module features a unique new capability of being able to deal gracefully with interruptions of the agent.

2017-03-07
Pohjalainen, Jouni, Ringeval, Fabien, Zhang, Zixing, Schuller, Björn.  2016.  Spectral and Cepstral Audio Noise Reduction Techniques in Speech Emotion Recognition. Proceedings of the 2016 ACM on Multimedia Conference. :670–674.

Signal noise reduction can improve the performance of machine learning systems dealing with time signals such as audio. Real-life applicability of these recognition technologies requires the system to uphold its performance level in variable, challenging conditions such as noisy environments. In this contribution, we investigate audio signal denoising methods in cepstral and log-spectral domains and compare them with common implementations of standard techniques. The different approaches are first compared generally using averaged acoustic distance metrics. They are then applied to automatic recognition of spontaneous and natural emotions under simulated smartphone-recorded noisy conditions. Emotion recognition is implemented as support vector regression for continuous-valued prediction of arousal and valence on a realistic multimodal database. In the experiments, the proposed methods are found to generally outperform standard noise reduction algorithms.