Towards a Visual AI Assistant

Thursday, March 17, 2022 - 10:20 AM PST  
Visual Question Answering (VQA) is a field in Computer Vision in which an AI aims to answer questions about an image. Just as you search Google with text or ask Siri questions by voice, VQA lets you do the same with images. It combines OCR, object detection, image captioning, and other major AI tasks into one multi-modal AI. This highly complex problem has quietly made big leaps in the past couple of years, coming close to beating human performance. Advances in this field, along with those being made in OCR, object detection, and image captioning, will come together to make a Visual AI Assistant possible in the near future. Such an assistant can further enhance the independence of blind and visually impaired people everywhere. This session will also attempt to showcase a small demo/video so participants can get a sense of the possibilities.

The overall structure of the presentation will roughly be as follows:

- Provide a general idea of what VQA is, with some examples.
- Give a simple, high-level overview of the different AIs involved in VQA and how they "talk" to each other.
- Talk about the cutting edge today, with an example of how far things have come.
- Dive into the concept of a visual assistant and why it might not be just a sci-fi idea.
- Showcase an example or demo of a VQA-based visual assistant.
- Discuss some concerns with such a visual assistant, including privacy and accuracy, and how these issues are being tackled.

The entire presentation will be made accessible with alt-text for images on the slides, closed captions, etc.  
Imagine a future where you could ask questions about your surroundings and a Visual AI Assistant answers them effortlessly for you. It's not sci-fi. This session will introduce you to the AI advances happening today that will make visual assistants a reality much sooner than you think.  
General Track  
