In The Mind of Algorithms: A Conversation with UiPath’s Machine Learning Team
It’s everywhere. It’s all around you. It’s in your smartphone, in your e-mail, in your Amazon and your Netflix, in your car and in your favorite supermarket. It’s in Google’s CAPTCHA, in the stock market and probably behind the recent presidential vote. It’s in genomic sequencing, in particle physics and astronomy. What is it?
It’s Machine Learning. And it’s changing the world as we know it.
You give me data, I give you (instant) gratification
Here’s a question: why would someone ever want to keep in their house a machine that collects information about them 24/7 for purposes that are arguably beyond their knowledge and control?
It’s a trade. You entrust me with your data, and in return I give you answers to your questions, product recommendations and dating suggestions tailored to your interests, optimized driving routes, spam filters, or a new credit card.
As our digital footprint deepens, most of the data we continuously generate is being collected, processed and transformed into useful products or services. Just as Google’s algorithms determine to a great extent what information you find, Amazon can largely influence what products you buy.
Machine Learning (ML) algorithms have an extraordinary capacity to process vast amounts of data and find patterns in it. And the more data there is, the more they learn. In many applications, from vision to speech to robotics, and across areas of business from retail to finance to manufacturing, Machine Learning is becoming the new driving force.
To give you a rough understanding of this technology, and a conceptual model for navigating the field that is currently taking our own industry, automation, to new heights, today we’ll introduce you to UiPath’s team of Machine Learning developers. Stefan, Virgil, and Dragos are leading the research and development of Machine Learning here at UiPath.
Guys, what is Machine Learning?
Virgil: Machine Learning is a subfield of Artificial Intelligence (AI) that enables systems to learn from data. Today it has Deep Neural Networks at its core, as does most current state-of-the-art AI.
Dragos: Deep Learning, the part of ML that we are using, focuses exclusively on multi-layer Neural Networks. Basically, it involves a network that takes pieces of knowledge, combines them in various ways, and finally builds them up towards sensible, high-level meaning.
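To make the idea of combining pieces of knowledge layer by layer a little more concrete, here is a minimal Python sketch of a forward pass through a small multi-layer network, assuming NumPy. The layer sizes (64 inputs, hidden layers of 32 and 16 units, 3 outputs), the ReLU activation and the random weights are illustrative assumptions; the network is untrained and is not UiPath’s model.

```python
import numpy as np

def relu(x):
    # Keep positive activations, zero out the rest
    return np.maximum(0.0, x)

def forward(x, layers):
    """Pass input x through a stack of (weights, bias) layers.

    Each hidden layer combines the previous layer's outputs into new,
    more abstract features; the last layer produces raw class scores.
    """
    h = x
    for i, (w, b) in enumerate(layers):
        h = h @ w + b
        if i < len(layers) - 1:  # no activation on the output layer
            h = relu(h)
    return h

rng = np.random.default_rng(0)
# Hypothetical sizes: 64 input pixels -> 32 -> 16 -> 3 class scores
sizes = [64, 32, 16, 3]
layers = [(rng.normal(0.0, 0.1, (m, n)), np.zeros(n))
          for m, n in zip(sizes[:-1], sizes[1:])]

x = rng.random((5, 64))  # a batch of 5 flattened 8x8 "images"
scores = forward(x, layers)
print(scores.shape)  # (5, 3)
```

In a trained network, early layers would tend to respond to simple patterns such as edges and later layers to combinations of them; training is what turns these random weights into useful features.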
In the past, AI was composed of lots of very specific algorithms invented for all sorts of problems, from finding contours in pictures to very specialized things like detecting faces. A big part of the job was engineering all this domain knowledge into the algorithms.
Now, thanks to recent advances in Neural Network research and hardware computing power, it has become feasible to leave this reverse-engineering task to a Neural Network and to assist its learning process in various ways. The biggest advantage is that Neural Network training, like pedagogy if you will, is almost universally transferable across domains. For example, teaching maths is not that different from teaching chemistry (same teaching method, different curriculum). Similarly, here at UiPath we can use the same state-of-the-art methods that others use for OCR engines, speech recognition, self-driving cars, etc.
Gartner predicts that Machine Learning will reach mainstream adoption within two to five years:
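The “same teaching method, different curriculum” idea can be sketched with one generic gradient-descent training loop applied unchanged to two unrelated toy datasets. This assumes NumPy; logistic regression stands in for a neural network to keep the example short, and both datasets (and the names `train` and `accuracy`) are synthetic and hypothetical.

```python
import numpy as np

def train(X, y, lr=1.0, epochs=1000):
    """One generic training method: gradient descent on the mean
    cross-entropy of a logistic-regression model. Nothing here is
    specific to the data it is given."""
    Xb = np.hstack([X, np.ones((len(X), 1))])  # append a bias column
    w = np.zeros(Xb.shape[1])
    for _ in range(epochs):
        p = 1.0 / (1.0 + np.exp(-(Xb @ w)))  # predicted probabilities
        grad = Xb.T @ (p - y) / len(y)       # gradient of mean cross-entropy
        w -= lr * grad
    return w

def accuracy(w, X, y):
    Xb = np.hstack([X, np.ones((len(X), 1))])
    return float(np.mean((Xb @ w > 0) == (y == 1)))

rng = np.random.default_rng(42)
X = rng.uniform(-1, 1, size=(200, 2))

# Two different "curricula" (hypothetical toy tasks) for the same method
y_math = (X[:, 0] + X[:, 1] > 0).astype(float)
y_chem = (X[:, 0] > 0.3).astype(float)

for name, y in [("math", y_math), ("chemistry", y_chem)]:
    w = train(X, y)
    print(name, accuracy(w, X, y))
```

Only the labels change between the two runs; the training code is reused as-is, which is the sense in which a training method transfers across domains.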
“Machine Learning is one of the hottest concepts in technology at the moment, given its extensive range of effects on business. A sub-branch of Machine Learning, called Deep Learning, which involves Deep Neural Nets, is receiving additional attention because it harnesses cognitive domains that were previously the exclusive territory of humans: image recognition, text understanding and audio recognition.”
So what is currently embedded in our Platform in terms of Machine Learning?
Stefan: In the product, we have actually integrated different OCR components. We are using OpenCV to process images, and we also support text analysis based on Microsoft, Google and IBM components. Our image recognition engine uses powerful algorithms that are optimized to find images on screen in under 100 milliseconds. This makes it possible to automate even the most complex applications, including those accessed through Citrix and other virtual environments. In fact, it takes almost the same amount of time to build an integration that involves Citrix as it takes to automate a regular desktop application.
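UiPath’s own engine is not described here, so the following is only a generic illustration of finding an image on screen: a brute-force template search using zero-mean normalized cross-correlation, assuming NumPy and a grayscale screenshot. The function name `find_template` and the random “screenshot” are hypothetical. In practice, OpenCV’s `cv2.matchTemplate` with the `cv2.TM_CCOEFF_NORMED` method computes this kind of score for every position in optimized native code.

```python
import numpy as np

def find_template(screen, template):
    """Return (row, col, score) of the best match of `template` in `screen`,
    scored with zero-mean normalized cross-correlation (brute force)."""
    th, tw = template.shape
    t = template - template.mean()
    t_norm = np.sqrt((t * t).sum())
    best = (-1, -1, -np.inf)
    for r in range(screen.shape[0] - th + 1):
        for c in range(screen.shape[1] - tw + 1):
            patch = screen[r:r + th, c:c + tw]
            p = patch - patch.mean()
            denom = np.sqrt((p * p).sum()) * t_norm
            if denom == 0:
                continue  # flat patch: correlation is undefined
            score = (p * t).sum() / denom
            if score > best[2]:
                best = (r, c, score)
    return best

rng = np.random.default_rng(1)
screen = rng.random((60, 80))           # stand-in for a grayscale screenshot
template = screen[20:30, 45:60].copy()  # the "button" we want to locate
r, c, score = find_template(screen, template)
print(r, c, round(float(score), 3))  # 20 45 1.0
```

A score of 1.0 means a perfect match; because the score is normalized, a real matcher can also accept matches above a threshold when the on-screen image differs slightly in brightness or contrast.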
And what are we planning to develop going forward?
Stefan: There are three main directions. The first one is related to the way UiPath interacts with the target application—the application which we are trying to automate.
The current detection engine is based on different accessibility APIs. That’s why our screen scraping engine is strongly tied to the execution environment. We plan to incorporate ML, especially Deep Learning, into our product so that the system will be able to understand any screen, similar to the way humans understand it. In this way, our core detection engine will become independent of the execution platform. This will also make it possible to continuously train our engine by assisting a human user.
The second direction is to offer more Cognitive activities related to natural language parsing and image processing.
And the third direction is to also offer businesses the possibility to build, train and customize different Machine Learning models for performing different tasks, mainly classification and detection.
So with all these enhancements, automation will gradually come closer to emulating and augmenting the power of the human brain.
Stefan: Yes. Reading text, listening to speech and recognizing images have always been the specialty of humans. But with the advent of Machine Learning, Natural Language Processing, Neural Networks, Deep Learning and so forth, being able to read text, understand voice and recognize images is also becoming the domain of machines. And the application of these technologies in business will open up possibilities that were previously unimagined.
One of the first, most effective outcomes of applying Machine Learning to RPA will be a newly gained ability of robots to handle complex processing exceptions autonomously. By learning from historical data, they could predict exceptions and prevent anomalies, eliminating the time, effort and cost needed to handle them. All of this will greatly extend the scope of automation to include many activities that involve human judgement.
Using learning algorithms, an RPA robot could make processing decisions contextually, while considering millions of data points from past experience and delivering more accurate predictions. In a claims processing scenario, for example, the robot would automatically review the claim file, eliminate duplicate entries, assess eligibility and then deliver adjudication decisions with human-level precision.
What sparked your interest in this domain?
Virgil: The raw power of Deep Neural Networks, the fact that the field employs a lot of math, and the fact that it’s a mandatory skill for any future computer scientist. In most computer science subfields, a scientist or a programmer without basic AI knowledge will be like a blind painter.
Dragos: I guess the thing that awed me the most was this framework for representing knowledge. As a layman, I found seemingly trivial concepts like “what is a pen” very fuzzy and ungraspable when I tried visualizing them. There’s no obvious way of representing a pen quantitatively, so that you could say, “look, this picture is of a pen because of this or that.”
Trying to make sense of things when your pen is just an array of numbers is even more mind-boggling. You start imagining various rules that get very complicated very quickly; you get scared and wonder how anybody could ever do this. But that’s what people actually did for a long time (and those approaches are interesting in their own right). So naturally, I got very excited to discover systematic methods for attacking these sorts of problems. Also, it’s amazing that we live in a time when we can put them to good use!
The hottest job in Silicon Valley
According to Tim O’Reilly, data scientist is the sexiest job today. Machine Learning experts are rare, forming an elite category that is being frenetically hunted by big players such as Google, Facebook and Amazon. The McKinsey Global Institute estimates that:
“There will be a shortage of talent necessary for organizations to take advantage of big data. By 2018, the United States alone could face a shortage of 140,000 to 190,000 people with deep analytical skills as well as 1.5 million managers and analysts with the know-how to use the analysis of big data to make effective decisions.”
They say Machine Learning is the ideal occupation, because learning algorithms do all the work but let you take all the credit. What have you got to say in your defense?
Dragos: A 7-year-old knows how to read, right? Imagine giving him a full fridge and a cookbook. His beef Wellington will be similar to the results of an off-the-shelf Neural Network that somebody threw data at. There are at least three important things that a Machine Learning developer does:
First, he has to know how different algorithms work, and why they work, so that he knows which tool to choose for the job. Second, he has to figure out how to make the best use of domain knowledge and give possible “shortcuts” to the Neural Net; this can make potentially unworkable problems feasible, because it heavily trims down the “number” of bad tries. This is both a problem-understanding challenge and a technical challenge, since you have to write good code for it. Third, once all the pieces are in place, you have to attend to the whole training process, because there are many more ways for it to go down a wrong path than not:
The whole concept of learning needs a function that measures how good or bad a prediction is. If you see a cat, it’s wrong to say it’s a dog. It’s very wrong to say it’s a truck. This function, the cost function, turns out to be very hard to design, as it depends very much on the problem at hand (e.g. the data distribution).
You have to find good learning rates at different stages, so that training doesn’t start out too slowly and can still learn fine details later on.
Make sure the model does not overfit the training data, so that it performs well in real-life situations.
Figure out edge cases, adversarial examples, and understand why they occur and how to protect against them.
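Three of the points above (the cost function, learning rates and overfitting) can be sketched in a few lines of plain Python. The cost matrix, the step-decay learning-rate schedule and the patience-based early-stopping rule are illustrative assumptions, not the team’s actual training setup, and all function names are hypothetical. The comparison shows why designing the cost function matters: standard cross-entropy only looks at the probability given to the true class, so it scores “dog” and “truck” mistakes on a cat image identically, whereas a cost matrix can make the truck mistake more expensive.

```python
import math

CLASSES = ["cat", "dog", "truck"]
# Hypothetical cost matrix: COST[true_class][predicted_class]
COST = {
    "cat":   {"cat": 0.0, "dog": 1.0, "truck": 5.0},
    "dog":   {"cat": 1.0, "dog": 0.0, "truck": 5.0},
    "truck": {"cat": 5.0, "dog": 5.0, "truck": 0.0},
}

def cross_entropy(true_label, probs):
    """Standard log loss: only the probability of the true class matters."""
    return -math.log(probs[CLASSES.index(true_label)])

def expected_cost(true_label, probs):
    """Cost-sensitive loss: each wrong class is weighted by how bad the mistake is."""
    return sum(p * COST[true_label][c] for c, p in zip(CLASSES, probs))

def step_decay_lr(epoch, base_lr=0.1, drop=0.5, every=10):
    """Large steps early on, then halve the rate every `every` epochs for finer details."""
    return base_lr * drop ** (epoch // every)

def early_stop_epoch(val_losses, patience=3):
    """Return the best epoch, stopping once validation loss has not improved
    for `patience` epochs (a common guard against overfitting)."""
    best, best_i = float("inf"), 0
    for i, loss in enumerate(val_losses):
        if loss < best:
            best, best_i = loss, i
        elif i - best_i >= patience:
            break
    return best_i

cat_as_dog = [0.1, 0.8, 0.1]    # probabilities for cat, dog, truck
cat_as_truck = [0.1, 0.1, 0.8]
print(round(cross_entropy("cat", cat_as_dog), 2), round(cross_entropy("cat", cat_as_truck), 2))  # 2.3 2.3
print(round(expected_cost("cat", cat_as_dog), 2), round(expected_cost("cat", cat_as_truck), 2))  # 1.3 4.1
print([step_decay_lr(e) for e in (0, 10, 25)])
print(early_stop_epoch([1.0, 0.7, 0.5, 0.45, 0.47, 0.49, 0.52, 0.6]))  # 3
```

In real training, the validation losses would come from data held out from training; the model from the returned epoch is the one kept, since later epochs fit the training data better but generalize worse.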
As a final note, could you share your favorite resource on ML knowledge for all aficionados out there?
Dragos: This is an emerging field, with advancements being made every few weeks. arXiv.org is the website where most of the research is published, and that’s what you’ll usually find the team sipping their coffee over. For maths and other bottom-up knowledge, I love MIT OpenCourseWare (OCW) and Stanford’s online courses. And of course, like any developer, we all have that Chrome window with 20 tabs of Stack Overflow and GitHub threads.