
Data Science and ML Projects with Scrum: A Conflict?

Posted at 2021-09-27

Here I share my opinion on managing data science and machine learning projects with Scrum.

Agile project management methods are widely used in the modern software industry to develop products. ML and data science projects, by contrast, are data-driven: the outcome depends on what the data turns out to contain. So Scrum can naturally seem like a poor fit. If you do a Google search, you will find many data scientists and machine learning engineers who don't like Scrum.

Agile - Scrum

Scrum is a product-oriented approach. You start from a basic deliverable product and then improve and update it incrementally. This process has proven to be a suitable method for software development. It involves planning, sprints, scrum meetings, and time estimates.

Machine Learning, Data Science, and Research

There is a significant difference between data science projects and regular software engineering projects. An ML project can succeed or it can fail, and it is difficult to predict at the beginning which way it will go, since we don't know enough about our data, the achievable model accuracy, and so on from the start. The only thing we can do is try.

We can divide a data science project into two major phases.

Research

Reading.
Data gathering.
Understanding the data.
Preprocessing.
Feature engineering.
Building the model.
Model evaluation.
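
To make the research steps above concrete, here is a minimal sketch using only the Python standard library. Everything in it is an assumption for illustration: the data is synthetic, and the "model" is a simple threshold halfway between the two class means, not a method from this article. The point is that the accuracy is only known once the last step has run.

```python
import random
import statistics


def standardize(values, mean, stdev):
    """Preprocessing: rescale values to zero mean and unit variance."""
    return [(v - mean) / stdev for v in values]


def fit_threshold(xs, ys):
    """Model building: use the midpoint between the two class means."""
    pos = [x for x, y in zip(xs, ys) if y == 1]
    neg = [x for x, y in zip(xs, ys) if y == 0]
    return (statistics.mean(pos) + statistics.mean(neg)) / 2


def accuracy(xs, ys, threshold):
    """Model evaluation: fraction of samples classified correctly."""
    correct = sum(int(x > threshold) == y for x, y in zip(xs, ys))
    return correct / len(ys)


# Data gathering: synthetic, made-up samples (not real project data).
rng = random.Random(0)
data = [(rng.gauss(1.0, 1.0), 1) for _ in range(50)]
data += [(rng.gauss(-1.0, 1.0), 0) for _ in range(50)]
rng.shuffle(data)
train, test = data[:70], data[70:]

# Understanding the data: statistics come from the training split only.
train_x = [x for x, _ in train]
mean, stdev = statistics.mean(train_x), statistics.stdev(train_x)

# Preprocessing, model building, and evaluation on held-out data.
train_scaled = standardize(train_x, mean, stdev)
threshold = fit_threshold(train_scaled, [y for _, y in train])
test_scaled = standardize([x for x, _ in test], mean, stdev)
test_accuracy = accuracy(test_scaled, [y for _, y in test], threshold)
print(f"held-out accuracy: {test_accuracy:.2f}")
```

With real data, any of these steps can send you back to an earlier one, which is why the result is hard to estimate up front.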

Delivery

Integration.
Testing.
Deployment.
Monitoring.

In my opinion, nothing in phase 1 (research) fits the Scrum method well. We are not dealing with deliverables, and our next actions are neither transparent nor plannable, so it is difficult, often impossible, to commit to something that is unknown and cannot be assessed. During the research stage we also cannot make progress visible, because there is nothing to show until the model is built. Sprint time frames do not fit this type of work either.
Building a model is not comparable to integrating an external API: the path from point A to point B is unclear and uncertain, so commitments and velocity measurements are not appropriate. I think only the daily meetings are helpful, since we can talk about progress and issues.

But once we move into the second phase, it is a standard software engineering flow. Usually, data scientists build the models and ML engineers deploy them. That split works if the team is large. But what happens in small teams? You built it, so you run it. :D
That doesn't mean data scientists only need to know how to build models, though. These days it is worth learning a few technologies for serving a model via an API.
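
As a rough idea of what serving a model via an API can look like, here is a minimal sketch that uses only Python's standard library. The `predict` function, its weights, the `/predict` path, and the port are all hypothetical placeholders; a real project would load a trained model and would probably use a proper web framework.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer


def predict(features):
    """Hypothetical stand-in for a trained model: a fixed linear score."""
    weights = [0.5, -0.25]  # made-up weights, not from a real model
    score = sum(w * x for w, x in zip(weights, features))
    return {"score": score, "label": int(score > 0)}


class PredictHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        # Only one (hypothetical) endpoint is supported.
        if self.path != "/predict":
            self.send_error(404)
            return
        length = int(self.headers.get("Content-Length", 0))
        try:
            payload = json.loads(self.rfile.read(length))
            result = predict(payload["features"])
        except (ValueError, KeyError, TypeError):
            self.send_error(400, "invalid request body")
            return
        body = json.dumps(result).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(host="127.0.0.1", port=8000):
    """Start the server; blocks until the process is interrupted."""
    HTTPServer((host, port), PredictHandler).serve_forever()
```

After calling `serve()`, a request such as `curl -X POST -d '{"features": [2.0, 1.0]}' http://127.0.0.1:8000/predict` should return `{"score": 0.75, "label": 1}`.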

I'm lucky to be on a Scrum team whose members understand this situation. A few of my friends overseas are having trouble with their teams because of this mismatch between Scrum and data science. So a shared understanding among team members is a critical factor when using Scrum for machine learning projects.

*This article was written by @nuwan, a member of @qualitia_cdev.
