Simplifying Data Science with AutoML

Blog: NASSCOM Official Blog

A recent NASSCOM survey found that 60 percent of enterprise executives believe that investments in AI are a priority, and 45 percent want to use the technology for strategic decision-making, but only 20 percent believe they have done so successfully. The report also highlights several challenges: a shortage of talent, complex workflows, data quality, unexplainable black-box AI models and a lack of business expertise within data science teams. So, how can we close some of these gaps? What is the role of Automated Machine Learning, or AutoML, in improving efficiency and accelerating business growth?

The current challenges with ML workflows

AutoML is an exciting new development: it provides methods and processes that make Machine Learning (ML) available to non-ML experts and improve the efficiency of ML work, which in turn supports organizational growth and success.

Currently, users have to select and test individual ML models on their data, then fine-tune them tediously in order to select and deploy the best-performing ones. This makes data science difficult for functional experts to understand, test and develop by themselves. ML today involves many steps that rely on human ML experts, such as:

Raw data ingestion
Preprocessing and data cleaning
Feature selection and construction
Model family selection
Parameter optimization and tuning
Postprocessing of machine learning models
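
The steps listed above can be sketched in a few lines of Python. This is only a minimal illustration of what a practitioner does by hand, not a real workflow: the toy CSV data, the column names and every function name below are assumptions made up for this example, and it uses only the Python standard library.

```python
import csv
import io
import statistics

# Hypothetical toy data (not from the article): price is exactly 3 * size,
# and one row is missing its size value.
RAW_CSV = """size,rooms,price
50,2,150
60,3,180
,2,120
80,3,240
100,4,300
120,5,360
"""

def ingest(text):
    """Raw data ingestion: parse CSV text into a list of dict rows."""
    return list(csv.DictReader(io.StringIO(text)))

def clean(rows):
    """Preprocessing and cleaning: drop incomplete rows, cast values to float."""
    complete = [r for r in rows if all(v.strip() for v in r.values())]
    return [{k: float(v) for k, v in r.items()} for r in complete]

def correlation(xs, ys):
    """Pearson correlation, written out by hand to stay within the standard library."""
    mx, my = statistics.mean(xs), statistics.mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    spread = (sum((x - mx) ** 2 for x in xs) * sum((y - my) ** 2 for y in ys)) ** 0.5
    return cov / spread

def select_feature(rows, target):
    """Feature selection: keep the single feature most correlated with the target."""
    ys = [r[target] for r in rows]
    features = [k for k in rows[0] if k != target]
    return max(features, key=lambda f: abs(correlation([r[f] for r in rows], ys)))

def fit_line(xs, ys):
    """Model family (a straight line) plus parameter optimization (least squares)."""
    mx, my = statistics.mean(xs), statistics.mean(ys)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    return slope, my - slope * mx

def postprocess(prediction):
    """Postprocessing: round the raw model output for reporting."""
    return round(prediction, 2)

# Every choice below (which feature, which model, how to report) is made by hand.
rows = clean(ingest(RAW_CSV))
feature = select_feature(rows, target="price")
slope, intercept = fit_line([r[feature] for r in rows], [r["price"] for r in rows])
estimate = postprocess(slope * 90 + intercept)
print(feature, estimate)
```

Even in this toy version, each stage is a separate decision that someone has to code and revisit, which is the effort AutoML aims to reduce.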

Many of these tasks involve manual programming and depend on a large pool of experts. Machine learning analysis can also be extremely complex, so smarter optimization techniques are needed to carry out these tasks. Given these gaps, the need of the hour is ML applications that are easy to use and do not require expert knowledge. AutoML, the research area that targets the progressive automation of machine learning, is the new development that lets companies apply data science in their business workflows by using AI to automate the time-consuming aspects of ML applications.

What is AutoML?

AutoML puts the power of machine learning in the hands of everyone, from CXOs to data experts. Now, everyone within the organisation can run complex data science models. It creates a new class of citizen data scientists who can build advanced ML models with support from automation at each step of the workflow.

AutoML automates as many of these steps as possible without compromising the accuracy of the results. It automates the data workflow by integrating with ML algorithms and systematically comparing different models, while giving the user transparency for predictive decision-making. AutoML takes advantage of the strengths of both humans and computers, and helps with data identification, data preparation, feature engineering, pre-processing, human-friendly insights, easy deployment, model management and monitoring.
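
To make "systematically comparing different models" concrete, here is a minimal sketch of the idea in plain Python. It is not how any particular AutoML product works: the toy dataset, the candidate models and all function names are assumptions chosen for illustration. Each candidate is scored with leave-one-out cross-validation, and the full ranking is returned so the user can see why one model was preferred.

```python
import statistics

# Hypothetical toy dataset (not from the article): y = 2x + 1.
X = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
Y = [2 * x + 1 for x in X]

def fit_mean(xs, ys):
    """Baseline: always predict the mean of the training targets."""
    mean_y = statistics.mean(ys)
    return lambda x: mean_y

def fit_linear(xs, ys):
    """Least-squares straight line through the training points."""
    mx, my = statistics.mean(xs), statistics.mean(ys)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    return lambda x: my + slope * (x - mx)

def make_knn(k):
    """k-nearest-neighbours regression; k is the hyperparameter being searched."""
    def fit(xs, ys):
        def predict(x):
            nearest = sorted(zip(xs, ys), key=lambda p: abs(p[0] - x))[:k]
            return statistics.mean([y for _, y in nearest])
        return predict
    return fit

# The search space: model families and hyperparameter settings to try.
CANDIDATES = {
    "mean baseline": fit_mean,
    "linear": fit_linear,
    "knn (k=1)": make_knn(1),
    "knn (k=3)": make_knn(3),
}

def loo_error(fit, xs, ys):
    """Leave-one-out cross-validation: mean absolute error over held-out points."""
    errors = []
    for i in range(len(xs)):
        train_x, train_y = xs[:i] + xs[i + 1:], ys[:i] + ys[i + 1:]
        errors.append(abs(fit(train_x, train_y)(xs[i]) - ys[i]))
    return statistics.mean(errors)

def leaderboard(candidates, xs, ys):
    """Score every candidate the same way and rank them, best first."""
    scores = [(name, loo_error(fit, xs, ys)) for name, fit in candidates.items()]
    return sorted(scores, key=lambda item: item[1])

board = leaderboard(CANDIDATES, X, Y)
for name, err in board:
    print(f"{name:15s} LOO MAE = {err:.3f}")
best_name = board[0][0]
```

Because the toy data is exactly linear, the straight-line model ranks first here. Real AutoML tools search far larger spaces and add feature engineering, but the loop of "fit every candidate, score it the same way, rank the results" is the core pattern.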

It is a productivity tool: it frees data scientists to focus on the creative aspects of the data science process, such as deciding how to frame a data science problem, how to incorporate domain knowledge, how to interpret results and how to communicate those results to their team.

Popular AutoML tools and platforms

What do AutoML tools look like? There is a host of tools out there, from open source tools and off-the-shelf packages to research prototypes and commercial tools, which can help automate some or all parts of the machine learning pipeline. TPOT, devol and H2O.ai AutoML are examples of open source tools; between them, they largely help with ML pipeline configuration, deep learning architecture search and basic data preparation for the ML algorithms.

Some of the commercial tools have, in comparison, much simpler and more seamless interfaces. Examples include Google AutoML; H2O.ai Driverless AI, which provides better feature construction; and DataRobot, whose web-based interface eliminates the reliance on manual workflows and which also supports external open-source algorithms and 24x7 availability in the cloud, giving users the power of AI to drive better business outcomes.

Future of AutoML

In the future, AutoML will only make data science jobs more accessible. As the demand for analysis increases, the demand for AutoML will increase too, because businesses will become more and more hungry for data. Data scientists will still be needed to represent the problem, interpret results, and apply models effectively and correctly. Having said that, experts will need to be better educated and trained; upskilling will become paramount to stay ahead of changing times.

The era of manual scripting for ML is reaching a critical point, and the field is constantly changing and evolving. In the coming years, we will see AutoML handle even more aspects of the data cleaning process, scale to larger datasets than it does now, and vastly improve deep learning. Going forward, AutoML as a practice will transform data science as we know it, as it will continue to enable data experts to focus on posing the right questions, collecting and curating the right data and thinking like a data scientist.

Author: Chetan Alsisaria – CEO & Co-Founder, Polestar Solutions & Services Pvt Ltd.

The post Simplifying Data Science with AutoML appeared first on NASSCOM Community | The Official Community of Indian IT Industry.