If computers could tackle the difficult tasks of processing and understanding images, they could revolutionize how people shop, make movies, and drive (or rather, not drive) cars. With artificial intelligence (AI), computers can actually ‘understand’ images. Within AI, one of the most promising methods of teaching computers to ‘see’ is deep learning, which gets its name from the many successive layers of processing that an image passes through before the computer reaches a decision.
At McGill, deep learning has been applied to computer vision and digital image processing, among other applications. James Clark is a professor in the Department of Electrical and Computer Engineering and general chair of the biennial International Conference on Computer Vision (ICCV) that will take place in Montreal in 2021. Clark outlined cutting-edge deep learning research at McGill and explained what computer vision and image processing entail.
“Image processing is trying to improve the image or make it look different,” Clark said in an interview with The McGill Tribune. “Whereas in computer vision, we are trying to understand what is in the scene.”
Most people are likely more familiar with image processing, which is widely used in the film industry. Contemporary blockbuster movies often employ green screens, allowing objects in the foreground to be superimposed on backgrounds that are filled in during post-production. Recent research at McGill’s Centre for Intelligent Machines has developed methods to move foreground objects without a green screen. For example, a person in front of a waterfall could be pasted onto an office background. The process involves making a rough cutout of a foreground object and then cleaning up the edges of the object to blend it into the new scene. Clark noted that pasting objects can be challenging but that AI can play a role in alleviating such difficulties.
“Even with the green screen, it is a difficult problem,” Clark said. “Often, you would have artists go in and fix up the edges [...] Now, you can have these AI techniques that do recognition and cutting out and then refining.”
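The cutting-out and refining that Clark describes can be illustrated with alpha compositing, a standard way to blend a foreground into a new background. The sketch below is not the McGill method. It assumes a rough binary cutout mask already exists and softens its edges with a simple box blur before blending. The function names `feather` and `composite`, and the tiny grayscale images, are hypothetical and chosen only for illustration.

```python
def feather(mask, radius=1):
    """Soften a binary mask by averaging each pixel with its neighbours (box blur)."""
    h, w = len(mask), len(mask[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            total, count = 0.0, 0
            for dy in range(-radius, radius + 1):
                for dx in range(-radius, radius + 1):
                    ny, nx = y + dy, x + dx
                    if 0 <= ny < h and 0 <= nx < w:
                        total += mask[ny][nx]
                        count += 1
            out[y][x] = total / count
    return out


def composite(foreground, background, alpha):
    """Blend per pixel: alpha * foreground + (1 - alpha) * background."""
    return [
        [a * f + (1.0 - a) * b for f, b, a in zip(f_row, b_row, a_row)]
        for f_row, b_row, a_row in zip(foreground, background, alpha)
    ]


# Toy 6x6 grayscale images (assumed data): a bright subject, a dark new background.
size = 6
foreground = [[200.0] * size for _ in range(size)]
background = [[50.0] * size for _ in range(size)]

# Rough cutout: 1 inside a 4x4 square, 0 elsewhere.
rough_mask = [
    [1 if 1 <= y <= 4 and 1 <= x <= 4 else 0 for x in range(size)]
    for y in range(size)
]

soft_mask = feather(rough_mask, radius=1)   # edges now fade from 1 down to 0
result = composite(foreground, background, soft_mask)
```

In this toy example the interior of the subject keeps its original brightness, while pixels near the boundary become a mix of subject and background, which is what hides the hard edge of the rough cutout.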
The recognition stage, a computer vision task that has become dramatically easier in recent years, is a key step in allowing computers to see. A common way to measure progress on this task is to ask a computer to identify the main object in an image, drawing on a large set of images that it has already seen. In 2010, humans could still recognize and categorize images much better than computers. Just two years later, deep learning techniques were applied to a well-known image recognition challenge, and within a few years computers had overtaken humans. Clark sees this as a turning point for computer vision.
“Basically after three years of [deep learning techniques], the machines were better than humans at five per cent [error rate],” Clark said. “If it’s useful for this very difficult problem, it’s useful for a lot of problems.”
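The error rate Clark mentions is a measure of how often a system names the wrong object. A minimal sketch of that measurement follows, under stated assumptions: the labels and guesses are invented, and the function name `top_k_error` is hypothetical. Recognition benchmarks often count an answer as correct if the true label appears among the system's top few guesses, which is what the `k` parameter models here.

```python
def top_k_error(true_labels, ranked_guesses, k=5):
    """Fraction of images whose true label is missing from the top-k guesses."""
    if len(true_labels) != len(ranked_guesses):
        raise ValueError("need one ranked guess list per label")
    misses = sum(
        1
        for label, guesses in zip(true_labels, ranked_guesses)
        if label not in guesses[:k]
    )
    return misses / len(true_labels)


# Hypothetical data: four images, guesses ordered from most to least likely.
true_labels = ["cat", "dog", "car", "tree"]
ranked_guesses = [
    ["cat", "lynx", "fox"],
    ["wolf", "dog", "fox"],
    ["truck", "bus", "van"],
    ["tree", "bush", "fern"],
]

top1 = top_k_error(true_labels, ranked_guesses, k=1)  # only the first guess counts
top3 = top_k_error(true_labels, ranked_guesses, k=3)  # any of three guesses counts
```

Allowing more guesses per image can only lower the error rate or leave it unchanged, so any reported figure needs to be read alongside the value of `k` used.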
Researchers are using deep learning as a starting point to develop new techniques for niche applications. For example, Clark recently developed a deep learning network that uses the social cues of eye movement to detect the important objects in an image.
As the field grows, McGill students may soon get a firsthand taste of these technologies. Clark is working with the Faculty of Management to create a lab for testing AI in retail. While an official announcement has not yet been made, Clark is excited about what could come out of the old Bronfman Cafeteria space.
“There will be an element of the Amazon Go stores,” Clark said. “We’ll do a lot of things [...] Stay tuned.”