Biblio
Filters: Author is Sefati, Shahin
Conversational Content Discovery via Comcast X1 Voice Interface. 2018. Proceedings of the 12th ACM Conference on Recommender Systems, 489.
The global market for intelligent voice-enabled devices is expanding at a fast pace. Comcast, one of the largest cable providers in the US with about 30 million users, has recently reinvented the way customers discover and access content by introducing a voice remote control for its Xfinity X1 entertainment platform. Spoken language input allows customers to express what they are interested in on their own terms, which has made finding a favorite TV channel or movie significantly more convenient than the traditional limits of a screen menu navigated with the keys of a TV remote. This more natural user experience produces voice queries that are considerably more complex to handle than channel numbers typed in or movie titles selected on screen, and this poses a challenge for the platform: understanding user intent and finding the appropriate action for the millions of voice queries we receive every day. It also makes it necessary to adapt the underlying content recommendation algorithms to incorporate the richer intent context from users. We describe some of the key components of our voice-powered content discovery platform that address these issues specifically. We discuss how we leverage multimodal data, including voice queries and a large database of metadata, to enable a more natural search experience via voice queries for finding relevant movies, TV shows, or even a specific episode of a series. We describe the models that encode semantic similarities between content and its metadata, allowing users to search for places, people, or topics using keywords or phrases that do not explicitly appear in the movie or show titles, as is traditionally the case. We describe how this category of voice search queries can be framed as a recommendation problem. Even though voice input is extremely powerful for capturing the intent of our customers, the freedom to say anything also makes it harder for a voice remote user to know the range of queries the system supports. We show how we can leverage the millions of voice queries we receive every day to build and train a deep learning-based recommender system that produces different types of recommendations, such as educational suggestions and tips for voice commands that the platform supports. Finally, it is important to consider that the true potential of the voice-powered entertainment experience comes from fusing intents expressed in language with navigation of content on the screen via the remote's navigation buttons. For all the applications and features discussed in this talk, our recommendation systems are adapted to provide the most relevant suggestions, whether the voice interface is initiating the action, navigating through the results rendered on the TV screen, or narrowing down the set of results by allowing the user to ask follow-up queries or select buttons.
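The abstract does not spell out the retrieval model behind searching by places, people, or topics that never appear in a title, but the general idea of scoring an entire catalog against a free-form voice query can be illustrated with a small embedding-similarity sketch. Everything below is an assumption for illustration only: the off-the-shelf encoder, the toy catalog, and the recommend helper are hypothetical stand-ins, not the production X1 system.

```python
# Minimal sketch: rank catalog items for a keyword-style voice query by
# embedding the query and each item's metadata into a shared vector space
# and scoring with cosine similarity.
from sentence_transformers import SentenceTransformer, util

# Assumed off-the-shelf sentence encoder, not the model used in the talk.
model = SentenceTransformer("all-MiniLM-L6-v2")

# Toy catalog: title plus free-text metadata (synopsis, people, places, topics).
catalog = [
    {"title": "Chef's Table",
     "metadata": "documentary series about world-renowned chefs and their restaurants"},
    {"title": "Planet Earth",
     "metadata": "nature documentary covering wildlife, oceans, and remote habitats"},
    {"title": "The Crown",
     "metadata": "drama about Queen Elizabeth II and the British royal family"},
]

item_texts = [f'{c["title"]}. {c["metadata"]}' for c in catalog]
item_vecs = model.encode(item_texts, convert_to_tensor=True)

def recommend(query: str, top_k: int = 2):
    """Return the top-k catalog items for a query that need not mention any title."""
    query_vec = model.encode(query, convert_to_tensor=True)
    scores = util.cos_sim(query_vec, item_vecs)[0]          # one score per catalog item
    ranked = sorted(zip(catalog, scores.tolist()), key=lambda p: p[1], reverse=True)
    return [(c["title"], round(s, 3)) for c, s in ranked[:top_k]]

# The query mentions a topic, not a title, yet the topical match ranks first.
print(recommend("shows about cooking"))
print(recommend("something with the royal family"))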