
Why and how should you learn “Productive Data Science”?

Blog: Think Data Analytics Blog

Efficiency in data science workflow

Data science and machine learning can be practiced with varying degrees of efficiency and productivity. Irrespective of the application area or specialization, a data scientist, whether a beginner or a seasoned professional, should strive to improve their efficiency in all aspects of typical data science tasks.


This means performing all of these tasks,

What should you expect to learn in this process?

Let’s imagine somebody is teaching a “Productive Data Science” course or writing a book about it — using Python as the language framework. What should the typical expectations be from such a course or book?

The course/book should be intended for those who wish to leapfrog beyond the standard way of performing data science and machine learning tasks and utilize the full spectrum of the Python data science ecosystem for a much higher level of productivity.

Readers should be taught how to look out for inefficiencies and bottlenecks in the standard process and how to think outside the box.


Automation of repetitive data science tasks is a key mindset that readers should develop from such a book. In many cases, they will also learn how to extend existing coding practices to handle larger datasets efficiently, using advanced software tools that already exist in the Python ecosystem but are not taught in any standard data science course.

This should not be a regular Python cookbook teaching standard libraries like Numpy or Pandas.

Rather, it should focus on useful techniques such as how to measure the memory footprint and execution speed of ML models, quality test a data science pipeline, modularize a data science pipeline for app development, etc. It should also cover Python libraries which come in very handy for automating and speeding up the day-to-day tasks of any data scientist.
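As a rough illustration of what "measuring memory footprint and execution speed" can look like in practice, here is a minimal sketch that uses only the Python standard library (`time.perf_counter` and `tracemalloc`). The helper name `profile_call` and the two toy workloads are hypothetical and chosen purely for illustration; a real ML model would be profiled the same way by passing its training or prediction function.

```python
import time
import tracemalloc


def profile_call(func, *args, **kwargs):
    """Run func once and return (result, elapsed_seconds, peak_bytes)."""
    tracemalloc.start()
    start = time.perf_counter()
    try:
        result = func(*args, **kwargs)
        elapsed = time.perf_counter() - start
        # get_traced_memory() returns (current, peak) in bytes.
        _, peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return result, elapsed, peak


def squares_as_list(n):
    # Builds the full list of squares in memory before summing.
    return sum([i * i for i in range(n)])


def squares_as_generator(n):
    # Streams the squares one at a time, so peak memory stays small.
    return sum(i * i for i in range(n))


if __name__ == "__main__":
    for fn in (squares_as_list, squares_as_generator):
        result, elapsed, peak = profile_call(fn, 200_000)
        print(f"{fn.__name__}: {elapsed:.4f}s, peak {peak / 1024:.0f} KiB")
```

Note that `tracemalloc` adds overhead of its own, so the timings taken while it is active are best read as relative comparisons rather than absolute numbers.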

Furthermore, it should touch upon tools and packages that help a data scientist tackle large and complex datasets far more efficiently than would be possible by following the conventional wisdom of standard Python data science tooling.

Some specific skills to master


To put things in concrete terms, let us summarize some specific skills to master for learning and practicing Productive Data Science. I have also tried to throw in the links to some representative articles to go with each skill as a reference.

  1. How to write fast and efficient code for data science/ML and how to measure its speed and efficiency (see this article)
  2. How to build modularized and expressive data science pipelines to improve productivity
  3. How to write testing modules for data science and ML models
  4. How to handle large and complex datasets efficiently (which would have been difficult with traditional DS tools)
  5. How to fully utilize GPU and multi-core processors for all kinds of data science and analytics tasks, and not just for specialized deep learning modeling
  6. How to whip up quick GUI apps to demo a data science/ML idea or to tune a model, or how to easily (and quickly) deploy ML models and data analysis code at the app level.
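Skills 2 and 3 in the list above (modular pipelines and test modules) can be sketched together. The example below is a minimal, standard-library-only illustration; the step names `drop_missing` and `add_centered`, the `make_pipeline` helper, and the record layout are all hypothetical and not taken from any particular library.

```python
from functools import reduce
from statistics import mean


def drop_missing(rows):
    """Remove records whose 'value' field is missing (None)."""
    return [r for r in rows if r["value"] is not None]


def add_centered(rows):
    """Add a 'centered' field: value minus the mean of all values."""
    avg = mean(r["value"] for r in rows)
    return [{**r, "centered": r["value"] - avg} for r in rows]


def make_pipeline(*steps):
    """Compose steps left to right into a single callable."""
    return lambda rows: reduce(lambda data, step: step(data), steps, rows)


def test_pipeline():
    raw = [{"value": 1.0}, {"value": None}, {"value": 3.0}]
    out = make_pipeline(drop_missing, add_centered)(raw)
    assert [r["centered"] for r in out] == [-1.0, 1.0]
    # The steps build new records, so the input is left unchanged.
    assert raw[1]["value"] is None


if __name__ == "__main__":
    test_pipeline()
    print("pipeline test passed")
```

Because each step is a plain function from records to records, it can be tested in isolation, reordered, or reused in another pipeline. A test runner such as pytest would discover `test_pipeline` by its name.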

An ideal book on this topic will…

  1. Teach how to look out for inefficiencies and bottlenecks in standard data science code and how to think outside the box to solve those problems.
  2. Teach how to write modularized, efficient data analysis and machine learning code to improve productivity in a variety of situations — exploratory data analysis, visualization, deep learning, etc.
  3. Cover a wide range of side topics such as software testing, module development, GUI programming, and ML model deployment as a web app, which are invaluable skills for budding data scientists and which are hard to find collected in any one standard data science book.
  4. Cover parallel computing (e.g., Dask, Ray), scalability (e.g., Vaex, Modin), and the GPU-powered data science stack (RAPIDS) with hands-on examples.
  5. Expose and guide the readers to a larger and ever-expanding Python ecosystem of data science tools that are connected to the broader aspects of software engineering and production-level deployment.

A concrete example: GPU-powered and distributed data science

While the use of GPUs and distributed computing is widely discussed in academic and business circles for core AI/ML tasks, it has received less coverage for regular data science and data engineering tasks. However, using GPUs for day-to-day statistical analyses or other data science tasks can go a long way towards becoming the proverbial “Productive Data Scientist”.

For example, the RAPIDS suite of software libraries and APIs gives you, a regular data scientist (and not necessarily a deep learning practitioner), the option and flexibility to execute end-to-end data science and analytics pipelines entirely on GPUs.

Image source: Author created collage

Even when used with a modest GPU, these libraries show a remarkable improvement in speed over their regular Python counterparts. Naturally, we should embrace them whenever we can for a Productive Data Science workflow.

Similarly, there are excellent open-source options for going beyond the largely single-core execution of standard Python code and embracing the parallel computing paradigm without shifting away from the quintessential data scientist persona.
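To make the parallel computing idea concrete, here is a minimal sketch of the split, map, and combine pattern that such libraries build on. It deliberately uses the standard library's `concurrent.futures.ProcessPoolExecutor` as a stand-in, so it is not the Dask or Ray API; the function names and the sum-of-squares workload are hypothetical.

```python
import math
from concurrent.futures import ProcessPoolExecutor


def split_into_chunks(values, n_chunks):
    """Split a list into at most n_chunks contiguous, near-equal slices."""
    size = max(1, math.ceil(len(values) / n_chunks))
    return [values[i:i + size] for i in range(0, len(values), size)]


def sum_of_squares(chunk):
    """CPU-bound work applied independently to each chunk."""
    return sum(x * x for x in chunk)


def parallel_sum_of_squares(values, workers=4):
    """Fan chunks out to worker processes, then combine partial results."""
    chunks = split_into_chunks(values, workers)
    with ProcessPoolExecutor(max_workers=workers) as pool:
        return sum(pool.map(sum_of_squares, chunks))
```

When run as a script, `parallel_sum_of_squares` should be called from under an `if __name__ == "__main__":` guard, because worker processes may re-import the main module on some platforms. Separate processes are used rather than threads because CPU-bound pure-Python code in threads is generally serialized by the interpreter lock in standard CPython builds.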

Image source: Author created collage

Summary

We discussed the utilities and core components of a Productive Data Science workflow. We imagined what an ideal course or book on this topic would offer to the readers. We touched upon some concrete examples and illustrated the benefits. Some related resources were also provided in the context of skills to master.

The post Why and how should you learn “Productive Data Science”? appeared first on ThinkDataAnalytics.
