In recent years, the topic of data science has attracted a lot of attention at many organizations. Frequently, however, there is a lack of clarity about how this discipline distinguishes itself from others, what the special features of a data science project are, and what expertise is required to complete such a project.
From April 2019 to February 2020, representatives from theory and practice – including experts from valantic Business Analytics and Prof. Michael Schulz of the Nordakademie – formed an open, virtual working group to develop a procedural model for data science projects, the Data Science Process Model or DASC-PM for short.
The project’s goal was not to develop new procedures, but rather to compile existing knowledge and structure it in a suitable form.
The working group’s results can be downloaded free of charge as an 88-page whitepaper (available in German only).
What is data science?
Based on the working group participants’ contributions, the following definition of data science is recommended:
Data science is an interdisciplinary field in which, using a scientific procedure, insights are extracted from sometimes complex data semi-automatically with existing or to-be-developed analytical processes and made useful taking into account their social effects.
The figure below illustrates the four principal steps for data science projects at a company or organization:
- Data has to be provided.
- Data has to be analyzed.
- Analysis results have to be made useful.
- Analysis results have to be used.
The solid arrows in the model depict the primary path for using the DASC-PM. The dotted arrows indicate potential connections back to previous phases, which may become necessary repeatedly as new insights are gained in the course of the project.
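The four steps and the two kinds of arrows can be read as a small transition model. The following Python sketch is only an illustration and is not taken from the whitepaper: the phase identifiers and function names are hypothetical, and it assumes that a dotted arrow may lead back to any earlier phase, which the figure may restrict further.

```python
from enum import IntEnum


class Phase(IntEnum):
    """The four principal steps in order (identifier names are illustrative)."""
    DATA_PROVISION = 1  # data has to be provided
    ANALYSIS = 2        # data has to be analyzed
    DEPLOYMENT = 3      # analysis results have to be made useful
    APPLICATION = 4     # analysis results are used


def primary_next(phase):
    """Solid arrow: the next phase on the primary path, or None after the last one."""
    if phase == Phase.APPLICATION:
        return None
    return Phase(phase + 1)


def possible_returns(phase):
    """Dotted arrows: earlier phases the project may return to.

    Assumption: every earlier phase is reachable from a later one.
    """
    return [p for p in Phase if p < phase]


def is_allowed(src, dst):
    """True if moving from src to dst follows a solid or a dotted arrow."""
    return dst == primary_next(src) or dst in possible_returns(src)


if __name__ == "__main__":
    # Walk the primary path once, listing the possible returns at each phase.
    current = Phase.DATA_PROVISION
    while current is not None:
        back = [p.name for p in possible_returns(current)]
        print(f"{current.name}: may return to {back}")
        current = primary_next(current)
```

In this reading, skipping a phase (for example, moving from analysis straight to application) is not a valid move, while returning from application to data provision is.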
An analytical undertaking is embedded in the domains. Within them, application cases are identified that justify a data science examination. A project order is formulated from one or several application cases; it is handled in the form of a data science project. While explicit tasks can be derived from the domains in this phase, in the other phases the domains more often act as domain-specific constraints that influence the tasks. This is why the domains must be considered throughout.
Whitepaper addresses experts in the data science and business analytics sectors
The target audience for the whitepaper is people who participate directly or indirectly in data science projects. Basic knowledge of analytical information systems is assumed. The procedural model should help communicate an understanding of the necessary tasks and relationships to all interest groups participating in data science projects. In addition, students can use this model to learn about this topic.
The procedural model is – like all models – a simplified version of reality. It should not be followed slavishly, nor does it claim to depict every variant and eventuality of a procedure or methodology. It also does not offer instructions for the complete handling of each building block. Instead, the model provides a solid basis for carrying out data initiatives because it draws on more than the experience of a single company or research group.
That’s why DASC-PM is more than just a best-practice approach. It’s a structured, sound, actionable presentation of one of the most relevant topics in business and science: data science, the planned, results-oriented use of data.