The Concept
Replace the human data scientist with a robot for
predictive data mining.
The Implementation
Our robot lives on Amazon Web Services. We are
currently using 16 processors for data analysis. There is no charge,
no advertising. Completely FREE, in the interest of disrupting the
marketplace for analytical software!!
Compare our new way of doing things with the current way:
The old way
- Human data scientist obtains data from client
- Data scientist interviews client and studies sources and meaning of
data
- Data scientist prepares data in a form suitable for machine
learning algorithms
- Data scientist selects learning method and sets various parameters
- Data scientist revises parameters or data prep to try to improve
results
- Data scientist repeats steps for other methods
The new way
- Client prepares data in a standard database table format and
uploads to cloud
- Robot completes the analysis and reports results
The human analyst has the potential for major insight, and that can
lead to superior results. But let's postulate that a robotic analyst
can be built. This form of processing data then has considerable
advantages. Among them are the following:
- Data-mining knowledge. Our experiences, those of our colleagues,
and results from the scientific literature are captured in the design
of the robot.
- Computational power. With multiple processors, a comprehensive predictive
analysis is much speedier.
- Dynamic knowledge acquisition. The best way to proceed is not
necessarily known in advance. With multiple processors, experiments
can be performed in parallel, and decisions made based on their
intermediate results.
- Big data. Standard formulations of predictive learning methods may
not be suitable for processing large volumes of data. With multiple
processors in the cloud, data can be "mapped" into multiple
samples and many partial solutions reduced to a unified solution.
- Compatibility with emerging technologies. Technology is moving to
the cloud and society is expecting increasing automation and
productivity. It's hard to imagine anyone other than an expert
analyst sitting at a lone computer, working with software that has all
kinds of esoteric parameters, and running experiments one after another.
- Gestalt perspective. A predictive analysis that tries many methods
and integrates their results is likely to achieve more informative and
superior results.
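The "map" and "reduce" idea from the Big data item above can be sketched in a few lines. This is only an illustration under stated assumptions, not the robot's actual implementation: the function names (split_into_samples, fit_line, reduce_models, map_reduce_fit) are hypothetical, a one-variable least-squares line stands in for a real learning method, threads stand in for cloud processors, and plain averaging is just one possible way to reduce partial solutions to a unified one.

```python
from concurrent.futures import ThreadPoolExecutor
from statistics import mean


def split_into_samples(rows, n_samples):
    """'Map' step: deal rows round-robin into n_samples disjoint samples."""
    return [rows[i::n_samples] for i in range(n_samples)]


def fit_line(sample):
    """Partial solution: least-squares slope and intercept for (x, y) pairs."""
    xs = [x for x, _ in sample]
    ys = [y for _, y in sample]
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in sample)
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar


def reduce_models(models):
    """'Reduce' step: average the partial solutions into one model."""
    slopes, intercepts = zip(*models)
    return mean(slopes), mean(intercepts)


def map_reduce_fit(rows, n_samples=4):
    """Fit one model per sample in parallel, then combine them."""
    samples = split_into_samples(rows, n_samples)
    # Threads are an assumption standing in for separate processors.
    with ThreadPoolExecutor(max_workers=n_samples) as pool:
        models = list(pool.map(fit_line, samples))
    return reduce_models(models)


if __name__ == "__main__":
    # Synthetic data on the line y = 2x + 1, made up for this example.
    data = [(x, 2.0 * x + 1.0) for x in range(100)]
    print(map_reduce_fit(data))
```

Each worker sees only its own sample, so no single processor has to hold the full data set. Averaging is the simplest reduce step; a real system might instead weight or vote among the partial solutions.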
Getting the data ready for predictive analysis
Who does this? The traditional view is that the data scientist gathers
the data, gathers some knowledge of the domain, possibly interviews the
domain experts, and then proceeds to organize the data in a form
suitable for specific predictive methods.
We chose an alternative path. The actual mathematical formulations and
machine learning methods might be obscure to a client, yet the client
is far more familiar with the data. By following a few simple rules, the
data can be prepared for analysis. You, the client, often have insight
into the meaning of the data and can organize it in a form that lends
itself to more favorable predictive results. If we separate the task
of data preparation from the actual analysis, the data owner has the
advantage in preparation.
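The preparation rules themselves are not listed in this section. As a purely hypothetical illustration, suppose the table is a CSV file with one header row of unique column names and one record per row. A client could then run a quick check like the one below before uploading. The function name check_table and both checks are assumptions made for this sketch, not the service's actual requirements.

```python
import csv
import io


def check_table(csv_text):
    """Return a list of problems found in a CSV table; empty means none found."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    if not rows:
        return ["table is empty"]
    header = rows[0]
    problems = []
    # Assumed rule 1: every column has its own name.
    if len(set(header)) != len(header):
        problems.append("duplicate column names in header")
    # Assumed rule 2: every record has one field per column.
    for row_no, row in enumerate(rows[1:], start=2):
        if len(row) != len(header):
            problems.append(
                f"row {row_no}: expected {len(header)} fields, found {len(row)}"
            )
    return problems


if __name__ == "__main__":
    # Made-up example table with one malformed record.
    sample = "age,income,label\n34,52000,yes\n41,61000\n"
    for problem in check_table(sample):
        print(problem)
```

A check of this kind only confirms that the table is well formed. Deciding which columns are meaningful remains the data owner's job, which is the point of separating preparation from analysis.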