Biblio
We present Diana, an embodied agent who is aware of her own virtual space and the physical space around her. Using video and depth sensors, Diana attends to the user's gestures, body language, gaze and (soon) facial expressions as well as their words. Diana also gestures and emotes in addition to speaking, and exists in a 3D virtual world that the user can see. This produces symmetric and shared perception, in the sense that Diana can see the user, the user can see Diana, and both can see the virtual world. The result is an embodied agent that begins to develop the conceit that the user is interacting with a peer rather than a program.
Conversational systems are computer programs that interact with users using natural language. Considering the complexity and interaction of the different components involved in building intelligent conversational systems that can perform diverse tasks, a promising approach to facilitate their development is by using multiagent systems (MAS). This paper reviews the main concepts and history of conversational systems, and introduces an architecture based on MAS. This architecture was designed to support the development of conversational systems in the domain chosen by the developer while also providing a reusable built-in dialogue control. We present a practical application in the healthcare domain. We observed that it can help developers to create conversational systems in different domains while providing a reusable and centralized dialogue control. We also present derived lessons learned that can be helpful to steer future research on engineering domain-specific conversational systems.
This paper presents a computational model for managing an Embodied Conversational Agent's first impressions of warmth and competence towards the user. These impressions are important to manage because they can impact users' perception of the agent and their willingness to continue the interaction with the agent. The model aims at detecting user's impression of the agent and producing appropriate agent's verbal and nonverbal behaviours in order to maintain a positive impression of warmth and competence. User's impressions are recognized using a machine learning approach with facial expressions (action units) which are important indicators of users' affective states and intentions. The agent adapts in real-time its verbal and nonverbal behaviour, with a reinforcement learning algorithm that takes user's impressions as reward to select the most appropriate combination of verbal and non-verbal behaviour to perform. A user study to test the model in a contextualized interaction with users is also presented. Our hypotheses are that users' ratings differs when the agents adapts its behaviour according to our reinforcement learning algorithm, compared to when the agent does not adapt its behaviour to user's reactions (i.e., when it randomly selects its behaviours). The study shows a general tendency for the agent to perform better when using our model than in the random condition. Significant results shows that user's ratings about agent's warmth are influenced by their a-priori about virtual characters, as well as that users' judged the agent as more competent when it adapted its behaviour compared to random condition.
Conversational agents assist traditional teaching-learning instruments in proposing new designs for knowledge creation and learning analysis, across organizational environments. Means of building common educative background in both industry and academic fields become of interest for ensuring educational effectiveness and consistency. Such a context requires transferable practices and becomes the basis for the Agile adoption into Higher Education, at both curriculum and operational levels. The current work proposes a model for delivering Agile Scrum training through an assistive web-based conversational service, where analytics are collected to provide an overview on learners' knowledge path. Besides its specific applicability into Software Engineering (SE) industry, the model is to assist the academic SE curriculum. A user-acceptance test has been carried out among 200 undergraduate students and patterns of interaction have been depicted for 2 conversational strategies.
Engineering a successful conversational AI agent is a tough process, and requires the consideration of achieving an effective communication between its various endpoints. In this paper, we present our perspective for designing an efficient conversational agent according to our belief that the existence of a centralized learning module that is capable of analyzing and understanding humans' behaviour from day one, and acting upon this behaviour is a must.
The advances in natural language processing and the wide use of social networks have boosted the proliferation of chatbots. These are software services typically embedded within a social network, and which can be addressed using conversation through natural language. Many chatbots exist with different purposes, e.g., to book all kind of services, to automate software engineering tasks, or for customer support. In previous work, we proposed the use of chatbots for domain-specific modelling within social networks. In this short paper, we report on the needs for flexible modelling required by modelling using conversation. In particular, we propose a process of meta-model relaxation to make modelling more flexible, followed by correction steps to make the model conforming to its meta-model. The paper shows how this process is integrated within our conversational modelling framework, and illustrates the approach with an example.
A conversational agent to detect anomalous traffic in consumer IoT networks is presented. The agent accepts two inputs in the form of user speech received by Amazon Alexa enabled devices, and classified IDS logs stored in a DynamoDB Table. Aural analysis is used to query the database of network traffic, and respond accordingly. In doing so, this paper presents a solution to the problem of making consumers situationally aware when their IoT devices are infected, and anomalous traffic has been detected. The proposed conversational agent addresses the issue of how to present network information to non-technical users, for better comprehension, and improves awareness of threats derived from the mirai botnet malware.
Building natural and conversational virtual humans is a task of formidable complexity. We believe that, especially when building agents that affectively interact with biological humans in real-time, a cognitive science-based, multilayered sensing and artificial intelligence (AI) systems approach is needed. For this demo, we show a working version (through human interaction with it) our modular system of natural, conversation 3D virtual human using AI or sensing layers. These including sensing the human user via facial emotion recognition, voice stress, semantic meaning of the words, eye gaze, heart rate, and galvanic skin response. These inputs are combined with AI sensing and recognition of the environment using deep learning natural language captioning or dense captioning. These are all processed by our AI avatar system allowing for an affective and empathetic conversation using an NLP topic-based dialogue capable of using facial expressions, gestures, breath, eye gaze and voice language-based two-way back and forth conversations with a sensed human. Our lab has been building these systems in stages over the years.
There are many challenges when it comes to deploying robots remotely including lack of operator situation awareness and decreased trust. Here, we present a conversational agent embodied in a Furhat robot that can help with the deployment of such remote robots by facilitating teaming with varying levels of operator control.
This one-day workshop intends to bring together both academics and industry practitioners to explore collaborative challenges in speech interaction. Recent improvements in speech recognition and computing power has led to conversational interfaces being introduced to many of the devices we use every day, such as smartphones, watches, and even televisions. These interfaces allow us to get things done, often by just speaking commands, relying on a reasonably well understood single-user model. While research on speech recognition is well established, the social implications of these interfaces remain underexplored, such as how we socialise, work, and play around such technologies, and how these might be better designed to support collaborative collocated talk-in-action. Moreover, the advent of new products such as the Amazon Echo and Google Home, which are positioned as supporting multi-user interaction in collocated environments such as the home, makes exploring the social and collaborative challenges around these products, a timely topic. In the workshop, we will review current practices and reflect upon prior work on studying talk-in-action and collocated interaction. We wish to begin a dialogue that takes on the renewed interest in research on spoken interaction with devices, grounded in the existing practices of the CSCW community.
The purpose of this paper is to examine the influence of lexical entrainment while communicating with a conversational agent. We consider two types of cognitive information processing:top-down processing, which depends on prior knowledge, and bottom-up processing, which depends on one's partners' behavior. Each works mutually complementarily in interpersonal cognition. It was hypothesized that we will separate each method of processing because of the agent's behavior. We designed a word choice task where participants and the agent described pictures and selected them alternately and held two factors constant:First, the expectation about the agent's intelligence by the experimenter's instruction as top-down processing; second, the agent's behavior, manipulating the degree of intellectual impression, as bottom-up processing. The results show that people select words differently because of the diversity of expressed behavior and thus supported our hypothesis. The findings obtained in this study could bring about new guidelines for a human-to-agent language interface.
Our operational context is a task-oriented dialog system where no single module satisfactorily addresses the range of conversational queries from humans. Such systems must be equipped with a range of technologies to address semantic, factual, task-oriented, open domain conversations using rule-based, semantic-web, traditional machine learning and deep learning. This raises two key challenges. First, the modules need to be managed and selected appropriately. Second, the complexity of troubleshooting on such systems is high. We address these challenges with a mixed-initiative model that controls conversational logic through hierarchical classification. We also developed an interface to increase interpretability for operators and to aggregate module performance.
Recent developments in robotics and virtual reality (VR) are making embodied agents familiar, and social behaviors of embodied conversational agents are essential to create mindful daily lives with conversational agents. Especially, natural nonverbal behaviors are required, such as gaze and gesture movement. We propose a novel method to create an agent with human-like gaze as a listener in multi-party conversation, using Hidden Markov Model (HMM) to learn the behavior from real conversation examples. The model can generate gaze reaction according to users' gaze and utterance. We implemented an agent with proposed method, and created VR environment to interact with the agent. The proposed agent reproduced several features of gaze behavior in example conversations. Impression survey result showed that there is at least a group who felt the proposed agent is similar to human and better than conventional methods.
We present a system for identifying interesting social media posts on Twitter and delivering them to users' mobile devices in real time as push notifications. In our problem formulation, users are interested in broad topics such as politics, sports, and entertainment: our system processes tweets in real time to identify relevant, novel, and salient content. There are three interesting aspects to our work: First, instead of attempting to tame the cacophony of unfiltered tweets, we exploit a smaller, but still sizeable, collection of curated tweet streams corresponding to the Twitter accounts of different media outlets. Second, we apply distant supervision to extract topic labels from curated streams that have a specific focus, which can then be leveraged to build high-quality topic classifiers essentially "for free". Finally, our system delivers content via Twitter direct messages, supporting in situ interactions modeled after conversations with intelligent agents. These ideas are demonstrated in an end-to-end working prototype.
Companies want to offer chat bots to their customers and employees which can answer questions, enable self-service, and showcase their products and services. Implementing and maintaining chat bots by hand costs time and money. Companies typically have web APIs for their services, which are often documented with an API specification. This paper presents a compiler that takes a web API specification written in Swagger and automatically generates a chat bot that helps the user make API calls. The generated bot is self-documenting, using descriptions from the API specification to answer help requests. Unfortunately, Swagger specifications are not always good enough to generate high-quality chat bots. This paper addresses this problem via a novel in-dialogue curation approach: the power user can improve the generated chat bot by interacting with it. The result is then saved back as an API specification. This paper reports on the design and implementation of the chat bot compiler, the in-dialogue curation, and working case studies.