Insights from FirstMark’s Data Driven series, a monthly event covering Big Data and data-driven products and startups.
Good news for aspiring data scientists: you can ditch the prerequisite schooling in statistics, programming and algorithms.
Or, at least, pick up those skills after you’ve become a practicing data scientist. That’s the vision DataRobot CEO Jeremy Achin has for the future of data science education. It’s a future where technology eats curriculum.
Big Data is now officially a major part of the business world, requiring leaders across all sectors to explore the implications of analyzing large data sets. This reality is creating an unprecedented, and somewhat alarming, demand for practitioners with a scarce combination of skills.
McKinsey has produced the most widely cited numbers on the shortage of data scientists, estimating that by 2018 the U.S. alone may face a gap of up to 60% between the supply of and demand for deep analytical talent. The study predicts a shortage of 140,000 to 190,000 people with deep analytical skills, as well as 1.5 million managers and analysts capable of turning the analysis of big data into decisions that benefit the business.
The Data Science Curriculum
As CEO of DataRobot, which provides a predictive analytics platform to rapidly build and deploy predictive data models, Achin runs a company that stands to be affected by the shortage of data scientists. The solution, he says, is a combination of pragmatic education and levels of automation not currently thought possible.
Achin points to the popular definition of data science created by Drew Conway, which buckets the necessary skills into programming, math and statistics, and domain knowledge.
Programming skills include the ability to source, manipulate and explore data, as well as to build and implement models. Math and statistics include a foundational understanding of statistics, the internals of algorithms, and some practical knowledge and experience. And domain knowledge ensures that the individual understands the business and the data.
A 2013 report by Accenture takes the definition a bit further, stating that individuals must “master advanced statistical and quantitative methods and tools, along with the new computing environments, languages and techniques for managing and integrating large data sets. Data scientists must also possess industry knowledge and business acumen to create models and solve real-world problems. And they need excellent communication and data visualization abilities in order to explain their models and findings to others.”
It’s a tall order.
Swami Chandrasekaran, Executive Architect at IBM Watson, wrote a popular post on the long road to becoming a data scientist, including a graphic that illustrates just how messy that journey can be.
Today, when aspiring data scientists start down this path, they are required to learn statistics, programming and algorithms before developing any practical knowledge or gaining real-world experience. Some students won’t make it through the stats class. Another group will struggle with programming. More will abandon their plans when they’re tasked with building models.
“By the time they get to the point where they start to actually apply some of what they learn, you’ve lost a lot of the students,” Achin said.
When the Path Starts at the Practical Level
It takes a long time before all of that knowledge can be put toward a real-world application, Achin said. But he believes automation using modern tools and computational power will take care of the statistics, programming and algorithms, enabling students to begin their education at the practical knowledge step.
“It doesn’t mean that statistics and programming and algorithms are not valuable, but it can happen afterwards,” he said. “You can become immediately useful relying on some of the more modern techniques.”
A similar sentiment has echoed from the halls of Cambridge, home of the Automatic Statistician, a project backed by a $750,000 grant from Google that aims to reduce the skills necessary to practice data science. According to a release announcing the gift, the project explores an open-ended space of possible statistical models to discover a good explanation of the data, and then produces a detailed report with figures and natural-language text. The Cambridge group has developed an early version of this system that not only automatically produces a 10- to 15-page report describing patterns discovered in the data, but also returns a statistical model with state-of-the-art extrapolation performance.
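To make "exploring a space of models to find a good explanation of the data" more concrete, here is a toy sketch. It is not the Automatic Statistician's actual method. It assumes a tiny, hypothetical model space (constant, linear and quadratic trends), scores each candidate with the Bayesian information criterion (BIC), which rewards fit but penalizes complexity, and emits a one-line text summary in place of a full report. The names `CANDIDATES`, `search_models` and `report` are illustrative only.

```python
import math

# Hypothetical, deliberately tiny "model space": polynomial trends of rising complexity.
CANDIDATES = {"constant": 0, "linear trend": 1, "quadratic trend": 2}


def fit_polynomial(xs, ys, degree):
    """Least-squares polynomial fit via the normal equations and Gaussian elimination."""
    m = degree + 1
    # Normal equations A c = b, with A[i][j] = sum(x^(i+j)) and b[i] = sum(y * x^i).
    a = [[sum(x ** (i + j) for x in xs) for j in range(m)] for i in range(m)]
    b = [sum(y * x ** i for x, y in zip(xs, ys)) for i in range(m)]
    # Forward elimination with partial pivoting.
    for col in range(m):
        pivot = max(range(col, m), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        b[col], b[pivot] = b[pivot], b[col]
        for row in range(col + 1, m):
            factor = a[row][col] / a[col][col]
            for k in range(col, m):
                a[row][k] -= factor * a[col][k]
            b[row] -= factor * b[col]
    # Back substitution; coeffs[i] multiplies x^i.
    coeffs = [0.0] * m
    for row in range(m - 1, -1, -1):
        s = b[row] - sum(a[row][k] * coeffs[k] for k in range(row + 1, m))
        coeffs[row] = s / a[row][row]
    return coeffs


def predict(coeffs, x):
    return sum(c * x ** i for i, c in enumerate(coeffs))


def bic(xs, ys, coeffs):
    """BIC = n*ln(RSS/n) + k*ln(n); lower is better."""
    n = len(xs)
    rss = sum((y - predict(coeffs, x)) ** 2 for x, y in zip(xs, ys))
    rss = max(rss, 1e-12)  # guard against log(0) when a model fits exactly
    return n * math.log(rss / n) + len(coeffs) * math.log(n)


def search_models(xs, ys):
    """Fit every candidate and return (bic, name, coeffs) tuples, best first."""
    results = []
    for name, degree in CANDIDATES.items():
        coeffs = fit_polynomial(xs, ys, degree)
        results.append((bic(xs, ys, coeffs), name, coeffs))
    results.sort(key=lambda r: r[0])
    return results


def report(results):
    """A one-line stand-in for a natural-language report on the winning model."""
    score, name, coeffs = results[0]
    terms = " + ".join(f"{c:.2f}*x^{i}" for i, c in enumerate(coeffs))
    return f"Best explanation: {name} (y = {terms}), BIC {score:.1f}"


if __name__ == "__main__":
    xs = list(range(20))
    ys = [2 * x + 1 + 0.5 * (-1) ** x for x in xs]  # a line plus small alternating noise
    print(report(search_models(xs, ys)))
```

On this synthetic data the quadratic model barely lowers the residual error, so its extra BIC penalty lets the simpler linear trend win. That trade-off between fit and complexity is the kind of judgment such systems aim to automate.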
When Technology Eats Curriculum, We Gain Data Scientists
Just as Achin suggests, the continued advancement of technology that lowers the effort required to extract value from data will only make the profession of data science more accessible. The hope is that innovative technology will eradicate the need for those specialized skills, giving a broader set of people in an organization the ability to perform the tasks generally assigned to a data scientist.