A research team from the University of Surrey and Stanford University has developed a new method that teaches Artificial Intelligence (AI) to understand human line sketches, even those drawn by non-artists. The model recognizes scene sketches nearly as well as humans do.
Dr. Yulia Gryaditskaya, a lecturer at the Centre for Vision, Speech and Signal Processing (CVSSP) and the Surrey Institute for People-Centred Artificial Intelligence (PAI) at the University of Surrey, said: "Sketches are a powerful visual communication language. They can sometimes be more expressive and flexible than verbal language. Developing tools that understand sketches is a step towards more powerful human-computer interaction and more efficient design workflows — for example, searching for or creating images by sketching."

People of all ages and backgrounds draw to explore new ideas and communicate, yet AI systems have long struggled to understand sketches. Teaching an AI to understand images typically requires a time-consuming, labor-intensive process of collecting a label for every pixel in an image, from which the AI then learns.
Instead, the research team taught the AI by pairing sketches with written descriptions of the scenes. The model learned to group pixels into objects and match those groups with the categories mentioned in the descriptions. As a result, it demonstrated a richer, more human-like understanding of sketches than previous systems: it correctly identified and labeled objects such as kites, trees, and giraffes with 85% accuracy, outperforming models that rely on per-pixel labels. Beyond identifying objects in complex scenes, it could also determine which object each individual stroke was meant to depict. Notably, the method works on informal sketches drawn by non-artists, including sketch styles it was never explicitly trained on.
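The core idea of matching grouped strokes to caption categories can be sketched in a few lines. The snippet below is an illustrative toy example, not the authors' implementation: the embeddings are hand-made vectors, whereas a real system would obtain them from a trained vision-language model, and the function and variable names are hypothetical.

```python
# Illustrative toy sketch (not the paper's code): assign each stroke the
# caption category whose text embedding is most similar, by cosine similarity.
import numpy as np

def cosine(a, b):
    # Cosine similarity between two vectors.
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def label_strokes(stroke_embs, text_embs):
    """Assign each stroke the category whose text embedding is most similar."""
    return {
        stroke_id: max(text_embs, key=lambda cat: cosine(s, text_embs[cat]))
        for stroke_id, s in stroke_embs.items()
    }

# Toy embeddings for three categories drawn from a hypothetical caption,
# "a giraffe under a tree with a kite".
text_embs = {
    "giraffe": np.array([1.0, 0.1, 0.0]),
    "tree":    np.array([0.0, 1.0, 0.1]),
    "kite":    np.array([0.1, 0.0, 1.0]),
}
stroke_embs = {
    "stroke_1": np.array([0.9, 0.2, 0.1]),   # most similar to "giraffe"
    "stroke_2": np.array([0.1, 0.1, 0.95]),  # most similar to "kite"
}

print(label_strokes(stroke_embs, text_embs))
# → {'stroke_1': 'giraffe', 'stroke_2': 'kite'}
```

Because the labels come from free-text descriptions rather than a fixed pixel-label set, this style of matching is what lets such a model recognize categories it was never explicitly annotated with.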
Dr. Judith Fan, an assistant professor of psychology at Stanford University, said: "Drawing and writing are among the most quintessentially human activities, long used to capture people's observations and ideas. This work represents exciting progress towards AI systems that understand the essence of what people are trying to convey, whether through pictures or words." The research is part of the Surrey Institute for People-Centred Artificial Intelligence, in particular its SketchX programme, which uses AI to understand how we see the world through the way we draw.
Professor Yi-Zhe Song, co-director of the Surrey Institute for People-Centred Artificial Intelligence and head of SketchX, said: "This research is a prime example of how AI can enhance fundamental human activities like sketching. By understanding rough sketches with near-human accuracy, this technology has great potential to amplify people's natural creativity, regardless of artistic ability."
Paper link: https://arxiv.org/abs/2312.12463