Exploring Multimedia Analytics: An Interview with Prof. Marcel Worring
Every minute, more than 200,000 new images are posted to Instagram, an average of 72 hours of video footage are uploaded to YouTube and more than 300,000 photos are shared through WhatsApp. Sharing photos, watching videos and interpreting visual elements like memes have become second nature to any regular internet user. Most of this visual data can be extremely interesting from a scientific and commercial perspective. Police officers, for instance, rely on visual information to solve crimes. Scientists can leverage images to understand our cultural practices or events that occur in the natural world. But the amount of visual data we produce today is staggering. Processing and analyzing this data is extremely challenging. We spoke to Prof. Marcel Worring, one of the leading experts worldwide in video and multimedia analytics, to learn more about the emerging field of visual data analysis, how it works, and how sectors like law enforcement are leveraging it to solve critical problems.
Marcel, could you tell us more about yourself and the institutions you currently work for?
I’m currently a Full Professor at Amsterdam Business School, where I offer courses on Data Science for Business Analytics, and I’m an Associate Professor at UvA’s Informatics Institute. I’m also part of the management team at Amsterdam Data Science (also one of the partners in The Analytics Academy), which is an initiative of several universities that brings leading data science experts together to promote data science research, innovation and education.
How did you end up in the field of visual data analysis?
I’ve been studying images for a long time. During my PhD, for instance, I focused on analyzing biological and medical images. I first analyzed these on an image-by-image basis. Then I started looking at collections of images, and eventually at video data. Currently, my research extends to multimedia analytics.
What is multimedia analytics?
Multimedia analytics is a rather new term that integrates multimedia analysis, multimedia mining, information visualization, and multimedia interaction. When you’re carrying out multimedia analysis, you look at what’s in the video or image, what it looks like, where it was filmed or taken, and so on. But you can’t rely on a computer to do this alone. In most cases, you need an expert if you really want to understand the data. You have to combine the expertise of a human with the processing capabilities of a machine. Bringing the two together is important for gaining real insight into what you’re actually observing.
How does it work?
We use computer models to carry out basic analyses. Most of these models learn from a set of examples (for instance, this image represents a dog, that one a car, etc.). You can process a lot of videos and images and extract objective observations using these systems, which is quite useful if you have to sift through thousands of hours of video. Once you’ve processed the data, you leave the subtleties, such as what is actually happening and what kind of patterns you can identify, to an expert.
For instance, if a surveillance camera captures somebody trying to get into a car, the computer will tell you that there is a car and a person involved but only a human can tell you if that person is trying to steal something or simply looking for his/her keys.
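The division of labor described above can be sketched in a few lines of Python. This is only an illustration, not a system mentioned in the interview: the `Detection` class, the `triage` function, the clip IDs and the labels are all hypothetical, and the labels are assumed to come from some object-detection model run beforehand. The machine narrows thousands of clips down to those containing the objects of interest, and everything it flags goes to a human expert for interpretation.

```python
from dataclasses import dataclass


@dataclass
class Detection:
    """Hypothetical model output: the objects reported for one video clip."""
    clip_id: str
    labels: frozenset


def triage(detections, labels_of_interest):
    """Split clips into an expert review queue and a filtered-out list.

    A clip is queued for review only if the model reported every label
    of interest in it. The model makes no judgment about intent.
    """
    wanted = frozenset(labels_of_interest)
    review, skipped = [], []
    for det in detections:
        if wanted <= det.labels:
            review.append(det.clip_id)
        else:
            skipped.append(det.clip_id)
    return review, skipped


# Made-up detections for three surveillance clips.
detections = [
    Detection("cam1_0001", frozenset({"car"})),
    Detection("cam1_0002", frozenset({"car", "person"})),
    Detection("cam1_0003", frozenset({"dog"})),
]

review_queue, skipped = triage(detections, {"car", "person"})
print(review_queue)  # clips an expert should look at
```

Whether the person in a queued clip is stealing the car or looking for keys is left entirely to the expert who reviews it.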
What kind of tools do you need in order to do it?
You’ll need deep learning tools to perform multimedia analysis. Most of these tools are freely available and are already equipped with training examples. However, if you’re looking at highly specialized data or want to improve the quality of the system, you will need to tune the models and gather lots of new examples.
Which sectors are already using multimedia analytics?
It’s being used increasingly in industries like marketing, cultural heritage, and law enforcement. For example, a new project aims to help art historians understand what kinds of patterns you can observe over time in artistic expression: how the use of color has evolved, what things have remained constant, and how different painting styles have developed.
Are sites like YouTube able to recognize video content using these techniques?
Yes, Google is doing video content analysis, just like other people are working on algorithms to analyze surveillance data or social media data. In all these cases, research is bringing multimedia analytics to the next level and improving accuracy.
What about law enforcement? Can you tell us more about the projects you’ve worked on with the police?
I’ve worked on several projects with the police, mainly on surveillance, fighting child abuse and identifying people who radicalize on the internet.
To fight child abuse or pornography, for instance, you need to analyze visuals to see whether there is something illegal going on. But if you have a lot of video material to look at, you need to speed up the process by working with a computer model to identify critical data sets and filter out material that is not relevant. Then you must bring in an expert to look at the subtleties, like identifying if there is a person below 18 in the footage – which can be very difficult.
In other cases, key patterns can be more easily identified by a computer. For instance, if the police have a picture of the room where the suspect was arrested, you can run an analysis to identify whether the same room shows up in the video footage. A system can do this in less than an hour, but it would take days or weeks for a human to do it.
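One common way to approach this kind of matching, offered here as a minimal sketch rather than the method used in the projects described, is to represent the reference photo and each video frame as a feature vector and compare them with cosine similarity. The short vectors below are made-up placeholders; in practice they would come from an image model. The function names and the 0.9 threshold are assumptions chosen for illustration.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def find_matching_frames(reference, frames, threshold=0.9):
    """Return indices of frames whose features resemble the reference photo."""
    return [
        index
        for index, features in enumerate(frames)
        if cosine_similarity(reference, features) >= threshold
    ]


# Placeholder feature vectors (assumed to be produced by an image model).
reference_room = [0.9, 0.1, 0.4]
video_frames = [
    [0.1, 0.9, 0.2],      # a different scene
    [0.88, 0.12, 0.41],   # very close to the reference room
    [0.0, 0.0, 0.0],      # empty features, never matches
]

matches = find_matching_frames(reference_room, video_frames)
print(matches)  # frame indices worth showing to an investigator
```

Frames above the threshold are only candidates, so an investigator would still confirm that the room is actually the same.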
Likewise, when investigating potential terrorists or the computer of a suspect, you can use models to analyze additional material, such as social media posts, WhatsApp messages, and similar data, in order to find evidence for prosecution, build a profile of the subject and get a better understanding of whom to follow more closely.
Will we be able to perform these types of analyses in real time with surveillance cameras soon? How far has the field come?
Yes. In fact, you could say we can already do that. However, the algorithms often work well only when the footage was captured during the day, and the problem is that many crimes happen at night. If the video was filmed at night, there is poor visibility and it’s harder for a computer to detect what’s going on. The real-time aspect is also difficult because, as I mentioned earlier, you still need a person to understand the subtleties. Cameras can detect if something is going on, but an expert needs to be involved to understand exactly what.
In terms of how far we’ve come, five years ago I would’ve said we can’t do much. Results were not really usable. Yet, in the last five years, deep learning has enabled us to do much more and we are now at a point where multimedia analytics is actually feasible and useful. The field holds a lot of promise and with technology advancing at such a rapid pace, it may come of age sooner than we think.
The post Exploring Multimedia Analytics: An Interview with Prof. Marcel Worring appeared first on This Complex World.