Biblio

Filters: Author is Shubham, Kumar
Shubham, Kumar, Venkatesan, Laxmi Narayen Nagarajan, Jayagopi, Dinesh Babu, Tumuluri, Raj.  2022.  Multimodal Embodied Conversational Agents: A discussion of architectures, frameworks and modules for commercial applications. 2022 IEEE International Conference on Artificial Intelligence and Virtual Reality (AIVR). :36–45.
With recent advancements in automated communication technology, many traditional businesses that rely on face-to-face communication have shifted to online portals. However, these online platforms often lack the personal touch essential for customer service. Research has shown that face-to-face communication is essential for building trust and empathy with customers. A multimodal embodied conversational agent (ECA) can fill this void in commercial applications. Such a platform provides tools to understand the user's mental state by analyzing their verbal and non-verbal behaviour, and allows a human-like avatar to take the necessary action based on the context of the conversation and in keeping with social norms. However, the literature on the impact of ECAs in commercial applications is limited because of issues related to platform infrastructure and scalability. In our work, we discuss existing work that addresses these scalability and infrastructure issues. We also provide an overview of the components required for developing ECAs and for deploying them in various applications.
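
As a rough illustration of the modular architecture the abstract alludes to, the sketch below wires together separate perception, dialogue-management, and avatar-rendering modules for a turn of conversation. The module names, signatures, and behaviours are illustrative assumptions for this listing, not the framework described in the paper.

# Minimal sketch of a modular ECA pipeline (all names and signatures are
# illustrative assumptions, not the paper's actual framework).
from dataclasses import dataclass

@dataclass
class UserSignal:
    transcript: str         # verbal channel (e.g. ASR output)
    facial_expression: str  # non-verbal channel (e.g. from a vision model)

def perceive(audio_text: str, expression: str) -> UserSignal:
    """Multimodal perception: bundle verbal and non-verbal cues."""
    return UserSignal(transcript=audio_text, facial_expression=expression)

def dialogue_manager(signal: UserSignal) -> str:
    """Decide the agent's next utterance from the conversational context."""
    if signal.facial_expression == "confused":
        return "Let me explain that differently."
    return "How can I help you today?"

def render_avatar(utterance: str) -> dict:
    """Drive the embodied avatar: speech plus matching non-verbal behaviour."""
    return {"speech": utterance, "gesture": "nod", "gaze": "user"}

# One example turn through the pipeline.
signal = perceive("I don't understand this bill", "confused")
print(render_avatar(dialogue_manager(signal)))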
ISSN: 2771-7453
Shubham, Kumar, Venkatesh, Gopalakrishnan, Sachdev, Reijul, Akshi, Jayagopi, Dinesh Babu, Srinivasaraghavan, G..  2021.  Learning a Deep Reinforcement Learning Policy Over the Latent Space of a Pre-trained GAN for Semantic Age Manipulation. 2021 International Joint Conference on Neural Networks (IJCNN). :1–8.
Learning a disentangled representation of the latent space has become one of the most fundamental problems studied in computer vision. Recently, many Generative Adversarial Networks (GANs) have shown promising results in generating high-fidelity images. However, studies of the semantic layout of the latent space of pre-trained models are still limited. Several works train conditional GANs to generate faces with the required semantic attributes. Unfortunately, in these attempts, the generated output is often not as photo-realistic as that of unconditional state-of-the-art models. In addition, they require large computational resources and specific datasets to generate high-fidelity images. In our work, we have formulated a Markov Decision Process (MDP) over the latent space of a pre-trained GAN model to learn a conditional policy for semantic manipulation along specific attributes under defined identity bounds. Further, we have defined a semantic age manipulation scheme using a locally linear approximation over the latent space. Results show that our learned policy samples high-fidelity images with the required age alterations, while preserving the identity of the person.
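
As a rough illustration of the idea in the abstract, the sketch below treats editing a pre-trained GAN's latent code as an MDP: each action takes a small, locally linear step along an assumed "age" direction, and an identity bound caps the total drift from the original code. The generator, attribute direction, and identity metric here are placeholders, not the authors' implementation.

# Illustrative sketch (not the authors' code): an MDP over a pre-trained GAN's
# latent space, with locally linear steps along an assumed age direction and an
# identity bound limiting how far the edited code may drift.
import numpy as np

rng = np.random.default_rng(0)
LATENT_DIM = 512

age_direction = rng.standard_normal(LATENT_DIM)   # assumed attribute direction
age_direction /= np.linalg.norm(age_direction)

def generator(z):
    """Stand-in for a pre-trained GAN generator G(z) -> image."""
    return np.tanh(z)                               # placeholder output

def identity_distance(z, z0):
    """Proxy for an identity-preservation metric between edited and original codes."""
    return np.linalg.norm(z - z0)

def step(z, action, step_size=0.1):
    """One MDP transition: nudge the latent code along the age direction."""
    return z + action * step_size * age_direction   # locally linear edit

z0 = rng.standard_normal(LATENT_DIM)                # starting latent code
z, identity_bound = z0.copy(), 5.0

for t in range(20):
    action = +1.0                                   # e.g. the policy chooses "older"
    z_next = step(z, action)
    if identity_distance(z_next, z0) > identity_bound:
        break                                       # stop before identity drifts too far
    z = z_next

aged_image = generator(z)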