Speakers
Alane Suhr / UC Berkeley
The increasing capability of LLMs makes them appealing for adoption in labor-intensive human tasks. For example, significant efforts have recently focused on developing agents -- systems that map observations and instructions to executable actions -- and their benchmarks in real-world tasks like web navigation. In this talk, I will discuss recent work in training and improving such models through interactions with human users, and developing better evaluations for these agents, which in turn can be used to automatically improve agent performance without requiring any demonstration data or human annotation. However, in developing systems like this, and in applying LLMs and other large pre-trained models to real-world problems, we should be aware of their fundamental limitations; for example, their sensitivity to design considerations like prompt formatting. I will detail recent work where we find that LLMs can be incredibly sensitive to arbitrary design decisions, like choices of separators or multiple choice labels.
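The sensitivity described above can be made concrete with a small sketch: the same multiple-choice question rendered under different separator and answer-label conventions. Every detail here (the question, the separator set, the label styles) is an illustrative assumption, not from the talk; the point is that each variant carries identical content, yet LLM accuracy has been shown to vary substantially across such renderings.

```python
# Hypothetical illustration of prompt-format sensitivity: one question,
# many formats. All names and choices below are invented for the sketch.
from itertools import product

QUESTION = "Which planet is largest?"
CHOICES = ["Mercury", "Jupiter", "Mars"]

SEPARATORS = ["\n", " | ", "; "]
LABEL_STYLES = [
    lambda i: f"({chr(ord('A') + i)})",   # (A) (B) (C)
    lambda i: f"{i + 1}.",                # 1. 2. 3.
    lambda i: f"{chr(ord('a') + i)})",    # a) b) c)
]

def render(question, choices, sep, label):
    """Render the same content under one separator/label convention."""
    body = sep.join(f"{label(i)} {c}" for i, c in enumerate(choices))
    return f"{question}{sep}{body}"

# Nine semantically identical prompts that an LLM may treat differently.
variants = [render(QUESTION, CHOICES, s, l)
            for s, l in product(SEPARATORS, LABEL_STYLES)]
```

Evaluating a model across a grid of such variants, rather than a single hand-picked format, is one way to surface this sensitivity.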
Speaker Bio: Alane Suhr recently joined EECS and BAIR at UC Berkeley as an Assistant Professor. Alane's work focuses on building language-using systems that communicate with and learn from human users in collaborative, situated interactions. Prior to joining Berkeley, Alane completed a PhD in Computer Science at Cornell University / Cornell Tech and spent a year afterwards as a Young Investigator at the Allen Institute for AI.
Anne Cocos / Netflix
Quantifying similarity between entities to be ranked (e.g. movie titles) is a core capability of recommendation systems. The meaning of similarity can take several forms. Many systems rely on an engagement-based measure of similarity calculated via collaborative filtering -- if users co-engage with a pair of entities frequently enough, their embeddings should be similar. But leaning on co-engagement similarity alone can degrade system performance in recommending new or unpopular entities for which engagement data is scarce. In this talk I will present our work on SemanticGNN, a graph-based model which integrates semantic factors like genre, maturity level, and themes to enhance similarity measurements. SemanticGNN addresses challenges of relation imbalance and scaling that are common in training graph-based models for real-world use cases. Our approach uses a relation-aware attention graph neural network (GNN) to balance diverse relation types in our graph. Additionally, we develop a task-specific graph partitioning scheme to scale training to millions of nodes and billions of edges on multiple GPUs. Deployed within Netflix, this model has shown up to 35% improved performance on similarity judgment tasks.
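The relation-aware attention idea can be sketched in a few lines. This is a minimal illustration, not Netflix's implementation: the relation names, dimensions, and the equal-weight averaging across relations are all assumptions. The key move is that each relation type gets its own transform and attention parameters, so sparse semantic relations (genre, themes) are not drowned out by dense co-engagement edges.

```python
# Minimal sketch of relation-aware attention aggregation (assumed design,
# not the SemanticGNN production model).
import numpy as np

rng = np.random.default_rng(0)
DIM = 8
RELATIONS = ["co_engagement", "same_genre", "shared_theme"]  # assumed names

W = {r: rng.normal(size=(DIM, DIM)) for r in RELATIONS}  # per-relation transform
a = {r: rng.normal(size=DIM) for r in RELATIONS}         # per-relation attention vector

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def update_node(h_self, neighbors_by_relation):
    """Attend over neighbors within each relation, then average across
    relations so each relation type contributes equally regardless of
    how many edges it has."""
    per_relation = []
    for rel, neighbors in neighbors_by_relation.items():
        H = np.stack(neighbors) @ W[rel]       # transform neighbor embeddings
        scores = softmax(H @ a[rel])           # attention within this relation
        per_relation.append(scores @ H)        # attention-weighted sum
    return h_self + np.mean(per_relation, axis=0)
```

In a production setting the partitioning scheme mentioned above would shard this computation across GPUs; here a single node update suffices to show the balancing mechanism.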
Speaker Bio: Anne Cocos is an applied machine learning research scientist who has built ML-powered products and teams across multiple domains. She is currently a Senior Research Scientist at Netflix where she develops algorithms for Netflix search and recommendation systems. In previous roles she has led machine learning at an early-stage geospatial data startup, overseen the development of GlaxoSmithKline's biomedical knowledge graph for drug discovery, and run pediatric biomedical research projects at The Children's Hospital of Philadelphia. Anne completed her PhD in computer science at the University of Pennsylvania, where she was supported by the Google Ph.D. Fellowship and the AI2 Key Scientific Challenges award. Before focusing on artificial intelligence, Anne served as an intelligence officer in the U.S. Navy.
Michael Bendersky / Google DeepMind
Generating content that is adapted to personal style and context is an emerging research area that has been steadily gaining attention from the research community. In this talk, I will review some of our ongoing work on improving the capabilities of large language models (LLMs) to generate personalized content. I will first introduce the general framework for personalized generation, and then delve deeper into how LLMs can be more effectively aligned with personal context. Finally, I will discuss the evaluation challenges inherent in personalized generation systems, and propose potential solutions to these challenges.
Speaker Bio: Michael Bendersky is an Engineering Director at Google DeepMind. He is currently managing a team whose mission is improving algorithms, models, and metrics for information discovery across Google products. Michael holds a Ph.D. from the University of Massachusetts Amherst, and is a Distinguished Member of the ACM. Michael has co-authored over 100 publications in the areas of information retrieval, natural language processing, machine learning, and artificial intelligence. He co-authored two books in the 'Foundations and Trends in Information Retrieval' series: 'Information Retrieval with Verbose Queries' and 'Search and Discovery in Personal Email Collections'.
Negin Rahimi / University of Massachusetts Amherst
Information access systems supply information to a wide range of human activities and decisions, with varying levels of sensitivity. Existing information seeking systems, including those that employ large language models (LLMs), face challenges in providing users with unbiased, diverse, and well-explained responses for exploring the answer space.
In this talk, I will present a summary of our work on interpretable information retrieval. I will provide a more detailed description of our efforts to enhance the intrinsic interpretability of models for providing users with diverse and unbiased results. Through enhanced interpretability, we have achieved comparable or even higher effectiveness. I will conclude by highlighting the mutual benefits between interpretable information retrieval and generative AI systems.
Speaker Bio: Negin Rahimi is an Assistant Professor in UMass Amherst's Manning College of Information and Computer Sciences, where she is part of the Center for Intelligent Information Retrieval. Her research aims to build interactive and unbiased intelligent systems for information access through learning, user interaction, and enhanced interpretability. Her research is supported by Google Research Scholar, Adobe, and NSF awards, including an NSF CAREER award.
Surya Kallumadi / Lowe's
Talk Abstract TBD
Speaker Bio: Surya Kallumadi is a director of applied science at Lowe's, where he leads the machine learning initiatives in core search, recommendations, and personalization. Prior to that, he was with the Search Science team at The Home Depot, leading the search ranking initiative. In the past he worked with the data science teams at Flipkart and eBay in the fields of search and query understanding.
Tejaswi Tenneti / Instacart
In this presentation, we delve into two innovative approaches designed to improve grocery search results by enhancing both relevance and discoverability. Our focus is on the development and application of a new product relevance classification model, alongside the strategic integration of LLMs to improve discoverability of novel products.
Enhancing Search Relevance via ESCI Classification model:
We use the ESCI classification model, which classifies the relationship between user queries and catalog products into seven clear groups: Exact Match, Strong Substitute, Weak Substitute, Close Complement, Remote Complement, Irrelevant, and Offensive. This model improves how we handle search results by enabling more precise retrieval and smarter ranking strategies. It simplifies the way we present search results, clearly highlighting exact matches and strategically displaying substitutes and complements to create a better shopping experience for users.
Improving Product Discoverability with LLMs:
To tackle the limitations of our previous Query Understanding (QU) models, particularly the challenges with sparse data for tail and non-grocery queries, we integrated Large Language Models into our search framework. LLMs have significantly advanced our capability to interpret queries and content, generating inspirational content that not only enriches the search results page but also drives conversions through enhanced relevance and user engagement.
Conclusion:
By leveraging the precise categorization capabilities of the ESCI model and the contextual understanding provided by LLMs, we were able to anticipate and meet consumer needs more effectively. This has led to improved engagement and incremental revenue.
Speaker Bio: Tejaswi Tenneti is currently a Director of Machine Learning at Instacart, the North American leader in online grocery. Prior to Instacart, Tejaswi was a tech lead on machine learning teams at Apple and Oracle, where he worked on various applications related to search and recommendations for local maps data and enterprise products. Tejaswi holds a BS from IIIT Allahabad and an MS from Stanford University, specializing in AI.