Cross Industry Standard Process for Data Mining – CRISP-DM
A process model well suited to Data Mining and Analytics projects
A six-step process model that provides a structured approach to handling Data Science projects as well as Artificial Intelligence projects
While all the steps are equally important, let us discuss each step in further detail.
Let us start with the first stage of CRISP-DM:
Step 1: Business Understanding
The Four Key Steps of the Business Understanding Phase of CRISP-DM
a. Define Business Problem
b. Assess and Analyze Scenarios
c. Define Data Mining Problem
d. Project Plan
1.a. Define Business Problem
Understanding the business problem is pivotal: garbage in, garbage out.
If Data Scientists and/or AI experts fail at this step, all subsequent steps will be a futile attempt at solving the business problem.
Recommended: While it is fair to assume that customers understand their business problem well, one has to 'don the hat' of a consultant and perform market research. Research the challenges of the industry in which the customer operates, and ask whether there is a better problem to solve than the one the customer proposed.
Sources that will help you perform a thorough review:
● Economist Intelligence Unit (EIU) for international markets
● CEIC – India Premium Database
Example: an agriculture-sector business problem solved using Artificial Intelligence.
III. Business Problem C: Crop yield is not improving year on year
Business Objective: Maximize yield
Business Constraint: Minimize cost
Example: a digital marketing business problem solved using AI.
V. Business Problem E: Google AdWords strategy is not effective
Business Objective: Maximize Click-Through Rate (CTR)
Business Constraint: Minimize Cost Per Click (CPC)
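The objective/constraint pair in Business Problem E can be made concrete with the standard definitions CTR = clicks / impressions and CPC = ad spend / clicks. The Python sketch below treats "Minimize Cost Per Click" as a hypothetical cap (`max_cpc`) and picks the campaign with the highest CTR among those within that cap. The function names, campaign names and figures are invented for illustration.

```python
from typing import Dict, List, Optional

def click_through_rate(clicks: int, impressions: int) -> float:
    """CTR = clicks / impressions (the business objective to maximize)."""
    if impressions <= 0:
        raise ValueError("impressions must be positive")
    return clicks / impressions

def cost_per_click(cost: float, clicks: int) -> float:
    """CPC = total ad spend / clicks (the business constraint)."""
    if clicks <= 0:
        raise ValueError("clicks must be positive")
    return cost / clicks

def best_campaign(campaigns: List[Dict], max_cpc: float) -> Optional[Dict]:
    """Highest-CTR campaign whose CPC does not exceed the assumed cap max_cpc."""
    feasible = [c for c in campaigns
                if cost_per_click(c["cost"], c["clicks"]) <= max_cpc]
    if not feasible:
        return None
    return max(feasible,
               key=lambda c: click_through_rate(c["clicks"], c["impressions"]))

# Hypothetical campaign data, for illustration only.
campaigns = [
    {"name": "A", "clicks": 120, "impressions": 4_000, "cost": 300.0},  # CTR 3.0%, CPC 2.50
    {"name": "B", "clicks": 200, "impressions": 5_000, "cost": 700.0},  # CTR 4.0%, CPC 3.50
    {"name": "C", "clicks": 90, "impressions": 2_500, "cost": 180.0},   # CTR 3.6%, CPC 2.00
]

print(best_campaign(campaigns, max_cpc=3.0)["name"])  # prints C
```

Raising the cap to 4.0 admits campaign B, which then wins on CTR. This is the objective-versus-constraint trade-off that the business problem statement is meant to capture.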
1.b. Assess and Analyze Scenarios
First, perform an As-Is analysis from the perspective of:
● Data presently available
o Secondary data sources
o Size of the data available
o How much data is generated on a daily basis?
o Various formats in which the data is stored
● Human Resources
o All the cross-functional human resources
o Experience in domain, programming, AI, etc.
● Availability of Human Resources
o Full-time employees
o Contract employees and their tenure
o Employees serving their notice period, etc.
● Risks
o Political
o Social
o Economic
o Technological
Second, analyze what is required in terms of:
● Hardware & Software
o Configuration of computers and servers
o Whether data is stored in the cloud or on premises
o Streaming vs. batch processing of data
● Human Resources
o Chief Data Scientists, Data Security Experts, etc.
o Data Engineers, Data Analysts, Data Scientists, etc.
o Web application, mobile application, UI and UX developers for deployment, etc.
● Record the assumptions & constraints of each requirement
o All assumptions
o All constraints with respect to time, cost, scope, resources, risk and quality
● Verify these assumptions & constraints in light of the data available
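Recording assumptions and constraints, and then verifying them against the data, can be kept lightweight. The sketch below assumes that each entry can be expressed as a check over a few facts gathered during the As-Is analysis; it tags every entry with one of the six dimensions named above and lists the ones that do not hold. The `Entry` class, the `verify` helper and the sample facts are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# The six dimensions named in the text.
DIMENSIONS = {"Time", "Cost", "Scope", "Resources", "Risk", "Quality"}

@dataclass
class Entry:
    kind: str         # "assumption" or "constraint"
    dimension: str    # one of DIMENSIONS
    description: str
    # Hypothetical check: True if the entry holds for the facts about the available data.
    check: Callable[[Dict[str, float]], bool]

    def __post_init__(self) -> None:
        if self.kind not in {"assumption", "constraint"}:
            raise ValueError(f"unknown kind: {self.kind}")
        if self.dimension not in DIMENSIONS:
            raise ValueError(f"unknown dimension: {self.dimension}")

def verify(entries: List[Entry], data_facts: Dict[str, float]) -> List[Entry]:
    """Return the entries that do NOT hold for the data actually available."""
    return [e for e in entries if not e.check(data_facts)]

# Hypothetical facts gathered during the As-Is analysis.
facts = {"rows_available": 50_000, "daily_new_rows": 1_200, "history_months": 18}

register = [
    Entry("assumption", "Scope", "At least 24 months of history exist",
          lambda f: f["history_months"] >= 24),
    Entry("assumption", "Quality", "At least 10,000 rows are available",
          lambda f: f["rows_available"] >= 10_000),
    Entry("constraint", "Resources", "Daily ingest stays under 5,000 rows (batch processing suffices)",
          lambda f: f["daily_new_rows"] < 5_000),
]

for e in verify(register, facts):
    print(f"[{e.kind}/{e.dimension}] does not hold: {e.description}")
```

With these sample facts only the 24-month history assumption fails, which is exactly the kind of finding that should feed back into scope and timelines before the project proceeds.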
Next, perform risk management for:
● Timelines
● Human Resources
● Data
● Hardware
● Software
● Financial Aspects
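A risk register is one common way to carry out this step. The sketch below assumes a 1-to-5 scale for probability and impact and the widely used probability × impact score; the scale, the threshold of 12 and the sample risks are assumptions, not part of CRISP-DM, and all names are hypothetical.

```python
from dataclasses import dataclass
from typing import List

# The risk areas listed in the text.
AREAS = ("Timelines", "Human Resources", "Data", "Hardware", "Software", "Financial Aspects")

@dataclass
class Risk:
    area: str
    description: str
    probability: int  # assumed scale: 1 (rare) .. 5 (almost certain)
    impact: int       # assumed scale: 1 (negligible) .. 5 (severe)
    response: str     # planned risk response

    def __post_init__(self) -> None:
        if self.area not in AREAS:
            raise ValueError(f"unknown risk area: {self.area}")
        if not (1 <= self.probability <= 5 and 1 <= self.impact <= 5):
            raise ValueError("probability and impact must be in 1..5")

    @property
    def score(self) -> int:
        # Probability x impact scoring: an assumed convention, not prescribed by CRISP-DM.
        return self.probability * self.impact

def prioritise(risks: List[Risk], threshold: int = 12) -> List[Risk]:
    """Risks scoring at or above threshold, highest score first."""
    return sorted((r for r in risks if r.score >= threshold),
                  key=lambda r: r.score, reverse=True)

# Hypothetical register entries.
risk_register = [
    Risk("Hardware", "On-premise server has no GPU", 2, 3, "Budget for cloud GPU hours"),
    Risk("Data", "Historical data has many missing values", 3, 5, "Add a data-quality sprint"),
    Risk("Human Resources", "Lead data engineer is serving notice period", 4, 4,
         "Schedule knowledge transfer now"),
    Risk("Financial Aspects", "Cloud spend exceeds budget", 3, 3, "Set billing alerts"),
]

for r in prioritise(risk_register):
    print(f"{r.score:>2}  {r.area}: {r.description} -> {r.response}")
```

Keeping the response next to each risk means the register doubles as the "Risks and Risk Response plans" component of the project plan in 1.d.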
Finally, documenting and defining success criteria, along with ROI, is important for measuring project success. This ensures that every stakeholder is aware of what constitutes success.
Success criteria can be tangible as well as intangible. However, to remove room for ambiguity, they should be stated in measurable terms wherever possible.
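Where ROI is one of the success criteria, writing down the formula and the target up front removes ambiguity. The sketch uses the standard definition ROI = (benefit − cost) / cost; the cost, expected benefit and 30% target are hypothetical figures, not recommendations.

```python
def roi(total_benefit: float, total_cost: float) -> float:
    """Return on investment as a fraction: (benefit - cost) / cost."""
    if total_cost <= 0:
        raise ValueError("total_cost must be positive")
    return (total_benefit - total_cost) / total_cost

# Hypothetical figures agreed with stakeholders before the project starts.
project_cost = 120_000.0
expected_benefit = 180_000.0
target_roi = 0.30  # assumed success criterion: at least 30% ROI

achieved = roi(expected_benefit, project_cost)
print(f"ROI = {achieved:.0%}; success criterion met: {achieved >= target_roi}")
# prints: ROI = 50%; success criterion met: True
```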
Now let us move on to the next step.
Stage 1.c. is explained in the section below:
1.c. Define Data Mining Problem
The Data Mining problem must always be clearly carved out of the business problem.
Although we touched on this in 1.a., we now formally document the Data Mining problem derived from the business problem.
Consider the following points:
● Pre-analysis phase
● Inputs to this phase are the success criteria and the business problem, along with risks, assumptions & constraints
● Technical discussions with Data Scientists, Data Analysts, Data Engineers, Architects, etc.
● Understand which ML and Data Mining techniques and algorithms are suitable for the given business problem
● High-level design of the end-to-end solution architecture, including integration into the existing customer infrastructure
● Success criteria from a Data Science perspective, e.g. no overfitting, with accuracy > 75%. The acceptable threshold depends on the industry (e.g. social sciences vs. medical sciences)
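The Data Science success criterion above can be written as an explicit check. In this sketch, "accuracy > 75%" is measured on held-out data, and "no overfitting" is approximated by an assumed tolerance (`max_gap`) on the difference between training and test accuracy. Both thresholds are parameters because, as noted, they depend on the industry; the function names and sample labels are hypothetical.

```python
from typing import Sequence

def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of predictions that match the true labels."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def meets_ds_criteria(train_acc: float, test_acc: float,
                      min_acc: float = 0.75, max_gap: float = 0.05) -> bool:
    """Pass only if held-out accuracy exceeds min_acc and the train/test gap is small.

    max_gap is an assumed proxy for "no overfitting"; tune both thresholds per industry.
    """
    return test_acc > min_acc and (train_acc - test_acc) <= max_gap

# Hypothetical held-out labels and predictions.
y_test = [1, 0, 1, 1, 0, 1, 0, 0, 1, 1]
y_hat = [1, 0, 1, 0, 0, 1, 0, 1, 1, 1]
test_acc = accuracy(y_test, y_hat)  # 8 of 10 correct

print(meets_ds_criteria(train_acc=0.83, test_acc=test_acc))                # True
print(meets_ds_criteria(train_acc=0.97, test_acc=test_acc))                # False: gap too large
print(meets_ds_criteria(train_acc=0.83, test_acc=test_acc, min_acc=0.90))  # False: stricter domain
```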
Finally, we arrive at the last sub-step of Step 1 of CRISP-DM.
Stage 1.d. is explained in the section below:
1.d. Project Plan
While a project plan may contain multiple components, our focus should be on the following key components:
● High-level timelines
● Allocated human resources
● Allocated hardware and software
● Risks and risk response plans
● High-level deliverables, along with success criteria, for each of the 6 phases of CRISP-DM
● A pictorial view highlighting one-time activities and iterative activities
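A plain-text chart is one lightweight way to show one-time and iterative activities pictorially. The sketch below lays out the six CRISP-DM phases (Business Understanding, Data Understanding, Data Preparation, Modeling, Evaluation, Deployment), drawing `#` for one-time work and `~` for iterative work, with each phase's deliverable and success criterion alongside. The week numbers, deliverables, success criteria and the choice of which phases are iterative are hypothetical placeholders for the real plan.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class Phase:
    name: str
    start_week: int         # 0-based week in which the phase starts (hypothetical)
    weeks: int              # planned duration in weeks (hypothetical)
    iterative: bool         # True if the phase is expected to be revisited
    deliverable: str
    success_criterion: str

def render(plan: List[Phase]) -> str:
    """Draw one row per phase: '#' marks one-time work, '~' marks iterative work."""
    total = max(p.start_week + p.weeks for p in plan)
    width = max(len(p.name) for p in plan)
    rows = []
    for p in plan:
        mark = "~" if p.iterative else "#"
        bar = " " * p.start_week + mark * p.weeks
        kind = "iterative" if p.iterative else "one-time"
        rows.append(f"{p.name:<{width}} |{bar:<{total}}| {kind}: "
                    f"{p.deliverable} ({p.success_criterion})")
    return "\n".join(rows)

plan = [
    Phase("Business Understanding", 0, 2, False, "Project charter", "success criteria signed off"),
    Phase("Data Understanding", 2, 2, True, "Data quality report", "all sources profiled"),
    Phase("Data Preparation", 3, 3, True, "Analysis-ready dataset", "pipeline reruns cleanly"),
    Phase("Modeling", 5, 3, True, "Candidate models", "data science criteria met"),
    Phase("Evaluation", 7, 2, True, "Evaluation report", "business sign-off"),
    Phase("Deployment", 9, 2, False, "Deployed solution", "live with monitoring"),
]

print(render(plan))
```

Overlapping bars (for example Data Understanding and Data Preparation) are intentional: they show where the plan expects to loop between phases rather than hand off once.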
Video: Project Management Methodology for Handling Data Science & AI Projects - Bharani Kumar (added by Bharani Depuru, 05 November 2020).