
Black Boxes and Unicorns: How Data Science is Automating Itself

Event Date: Nov 23, 2015 | Speaker: Jeremy Achin

Insights from FirstMark’s Data Driven series, a monthly event covering Big Data and data-driven products and startups.

Good news for aspiring data scientists – you can ditch the prerequisite schooling for stats, programming and algorithms.

Or, at least, pick up those skills after you’ve become a practicing data scientist. That’s the vision DataRobot CEO Jeremy Achin has for the future of data science education. It’s a future where technology eats curriculum.

Big Data is now officially a big part of the business world, requiring leaders across all sectors to explore the implications of analyzing large sets of data. This reality is creating an unprecedented and somewhat alarming demand for practitioners with a scarce combination of skills.

McKinsey has produced the most widely cited numbers on the shortage of data scientists, saying that by 2018, the U.S. alone may face up to a 60% gap between supply and requisite demand of deep analytic talent. The study predicts a shortage of 140,000 to 190,000 people with deep analytical skills as well as 1.5 million managers and analysts capable of turning the study of big data into decisions that benefit the business.

The Data Science Curriculum

Achin is the CEO of DataRobot, which provides a predictive analytics platform for rapidly building and deploying predictive models, and his company stands to be affected by the shortage of data scientists. The solution, he says, is a combination of pragmatic education and levels of automation currently not thought possible.

Achin points to the popular definition of data science created by Drew Conway, which buckets the necessary skills into programming, math and statistics, and domain knowledge.

Programming skills include the ability to source, manipulate and explore data, as well as build and implement models. Math and statistics include a foundational understanding of statistics, the internals of algorithms and some practical knowledge and experience. Domain knowledge ensures that the individual understands the business and the data.

A 2013 report by Accenture takes the definition a bit further, stating that individuals must “master advanced statistical and quantitative methods and tools, along with the new computing environments, languages and techniques for managing and integrating large data sets. Data scientists must also possess industry knowledge and business acumen to create models and solve real-world problems. And they need excellent communication and data visualization abilities in order to explain their models and findings to others.”

It’s a tall order.

Swami Chandrasekaran, Executive Architect at IBM Watson, wrote a popular post on the long road to becoming a data scientist, including a graphic that illustrates well just how messy that journey can be.

Today, when aspiring data scientists start down this path, they are required to learn statistics, programming and algorithms before developing any practical knowledge or gaining real-world experience. Some students won’t make it through the stats class. Another group will struggle with programming. More will abandon their plan when they’re tasked with building models.

“By the time they get to the point where they start to actually apply some of what they learn, you’ve lost a lot of the students,” Achin said.

When the Path Starts at the Practical Level

It takes a long time before all of that knowledge can be put toward a real world application, Achin said. But, he believes automation using modern tools and computational power will take care of the statistics, programming and algorithms, enabling students to begin their education at the practical knowledge step.

“It doesn’t mean that statistics and programming and algorithms are not valuable, but it can happen afterwards,” he said. “You can become immediately useful relying on some of the more modern techniques.”

Similar sentiment has echoed from the halls of Cambridge, home of the Automatic Statistician, a project backed by a $750,000 grant from Google that aims to reduce the skills necessary to practice data science. According to a release announcing the gift, the project explores an open-ended space of possible statistical models to discover a good explanation of the data, and then produces a detailed report with figures and natural-language text. The Cambridge group has developed an early version of this system that not only automatically produces a 10-15 page report describing patterns discovered in the data, but returns a statistical model with state-of-the-art extrapolation performance.
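To make the idea of automated model search concrete, here is a minimal sketch in Python. It is not the Automatic Statistician's method or DataRobot's: it assumes a tiny, fixed family of candidate models (polynomials up to degree 3), scores each with the Bayesian information criterion (BIC), and emits a one-sentence plain-language summary. All function names here (fit_polynomial, search_models, report) are hypothetical and chosen for illustration only.

```python
import math


def fit_polynomial(xs, ys, degree):
    """Least-squares polynomial fit using the normal equations."""
    n = degree + 1
    # Normal equations A c = b, with A[i][j] = sum(x^(i+j)), b[i] = sum(y * x^i).
    a = [[sum(x ** (i + j) for x in xs) for j in range(n)] for i in range(n)]
    b = [sum(y * x ** i for x, y in zip(xs, ys)) for i in range(n)]
    # Gaussian elimination with partial pivoting.
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        b[col], b[pivot] = b[pivot], b[col]
        for row in range(col + 1, n):
            factor = a[row][col] / a[col][col]
            for k in range(col, n):
                a[row][k] -= factor * a[col][k]
            b[row] -= factor * b[col]
    # Back substitution; coeffs[i] multiplies x^i.
    coeffs = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(a[i][j] * coeffs[j] for j in range(i + 1, n))
        coeffs[i] = (b[i] - tail) / a[i][i]
    return coeffs


def predict(coeffs, x):
    return sum(c * x ** i for i, c in enumerate(coeffs))


def bic(xs, ys, coeffs):
    """BIC for a Gaussian-error model: lower is better."""
    n = len(xs)
    rss = sum((y - predict(coeffs, x)) ** 2 for x, y in zip(xs, ys))
    rss = max(rss, 1e-12)  # guard against log(0) on a perfect fit
    return n * math.log(rss / n) + len(coeffs) * math.log(n)


def search_models(xs, ys, max_degree=3):
    """Fit every candidate and return (score, degree, coeffs) for the best one."""
    results = []
    for degree in range(max_degree + 1):
        coeffs = fit_polynomial(xs, ys, degree)
        results.append((bic(xs, ys, coeffs), degree, coeffs))
    return min(results)


def report(xs, ys):
    """Turn the winning model into a one-sentence summary."""
    score, degree, _ = search_models(xs, ys)
    names = {0: "constant", 1: "linear", 2: "quadratic", 3: "cubic"}
    return f"The data are best described by a {names[degree]} trend (BIC {score:.1f})."


# Toy data: a straight line with small alternating noise.
xs = list(range(10))
ys = [2 * x + 1 + 0.1 * (-1) ** x for x in xs]
print(report(xs, ys))
```

The BIC penalty is what keeps the search from always choosing the most flexible candidate: an extra coefficient has to reduce the error enough to pay for itself. A real system would search a far larger model space and report much more than a single sentence.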

When Technology Eats Curriculum, We Gain Data Scientists

Just as Achin suggests, the continued advancement of technology that can reduce the rigor of extracting value from data will only make the profession of data science more accessible. The hope is that innovative technology will eradicate the need for those specialized skills, giving a broader set of people in an organization the ability to perform the tasks generally assigned to a data scientist.