The Sexiest Job of the 21st Century, "Data Scientist": Myth or Reality?
What comes to mind when you hear the term "data scientist"?
Is it, as the Harvard Business Review suggested, the "sexiest job" of the 21st century? Is it a really smart person with advanced degrees in computer science, applied or pure mathematics, or economics? Or is it anyone who analyzes big data and extracts business value from it?
A data scientist can be all of these things and more. These professionals look for patterns and trends in large data sets, using a wide range of tools, methods and critical thinking to provide practical solutions to data-centric, real-world challenges.
According to Hugo Bowne-Anderson in HBR, data scientists use online experiments, among other methods, to achieve sustainable growth. They also clean, prepare and test structured and unstructured data for machine learning pipelines and personalized data products, so that they can better understand their businesses and customers and make better decisions.
Even if you never took a class in advanced analytics or data science, understanding how a data scientist works can help you understand your own startup better:
1. Data scientists ask good questions
Every data science project has to meet a range of expectations regarding performance, goals, outcomes, duration and so on. And, as Le pointed out in the Medium article "How to Think Like a Data Scientist in 12 Steps," it is useful to ask good questions in order to understand more fully what people expect from the project.
Le wrote that good questions are concrete in their assumptions, and that good answers represent measurable success without too much cost. Getting better at asking good questions is useful in any business situation. It can also help an early-stage startup that is on a journey to become more data-driven. At TowardsDataScience.com, Mark Schindler discussed how "creating the landscape for a question" could be useful for developing a data strategy. He suggested placing your questions into three categories:
What questions could you answer right now?
For example, “How many downloads did you have in the past 30 days?”
What questions could you answer if you did a little digging with your current data?
For example, “What are the age demographics of your most frequent users in the past 30 days?”
What questions can’t you answer because you don’t have the data yet?
For example, “What is the average session length of your top and bottom quartile of users?”
This exercise helps you take stock of your business and the data you hold. It can also lead you down a path (a kind of data roadmap) toward new questions or hypotheses that are not yet confirmed or well established and that you would like to investigate.
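The three-category exercise above can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions: the field names (`download_events`, `user_age`, `user_activity`, `session_length`) and the split between "ready" and "raw" data are hypothetical and do not come from Schindler's article. Replace them with an inventory of your own data.

```python
from enum import Enum


class Category(Enum):
    ANSWERABLE_NOW = "can answer right now"
    NEEDS_DIGGING = "can answer with a little digging"
    MISSING_DATA = "cannot answer yet"


def categorize(required, ready, raw):
    """Place a question in one of the three categories.

    required: data fields the question depends on
    ready:    fields already cleaned and easy to query (hypothetical)
    raw:      fields that are collected but need some digging (hypothetical)
    """
    required = set(required)
    if required <= ready:
        return Category.ANSWERABLE_NOW
    if required <= ready | raw:
        return Category.NEEDS_DIGGING
    return Category.MISSING_DATA


# Hypothetical data inventory for an imaginary startup.
READY_FIELDS = {"download_events"}
RAW_FIELDS = {"user_age", "user_activity"}

# The example questions from the article, mapped to assumed field names.
QUESTIONS = {
    "How many downloads did you have in the past 30 days?":
        {"download_events"},
    "What are the age demographics of your most frequent users in the past 30 days?":
        {"user_age", "user_activity"},
    "What is the average session length of your top and bottom quartile of users?":
        {"session_length", "user_activity"},
}

landscape = {
    question: categorize(fields, READY_FIELDS, RAW_FIELDS)
    for question, fields in QUESTIONS.items()
}

for question, category in landscape.items():
    print(f"[{category.value}] {question}")
```

Questions that land in the third category point at data you may want to start collecting.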
2. Data scientists know how to classify data sources and judge their quality
Bill Schmarzo, CTO of Dell EMC Big Data, created a 28-page white paper and workbook called Think Like a Data Scientist. In it, he walks through the method data scientists follow when they use predictive or prescriptive analytics to reach their goals. I especially appreciated the section on identifying data sources. It explains how, over the course of the eight-step workbook exercise, a reader can uncover all kinds of new data sources that would add value to a specific business initiative (such as increasing sales, web traffic or conversions) and to key business decisions. According to the white paper, the various sources of data are:
Historical data from operational and transactional systems (ERP, financials, HR, supply chain, sales force automation and marketing), where the data is captured but is likely not available on readily accessible platforms.
Internal unstructured data sources like email conversations, consumer comments, clinical studies, research papers and notes from employee and customer interactions.
External data sources, including social media, newsfeeds, weather, traffic, economics, research papers, white papers and public domain data from government and university institutions (like the Think Like a Data Scientist workbook).
Once you have identified a variety of data sources, the next step is to assess the business value that each source brings in support of certain key business decisions. You can set up a spreadsheet that lists the data sources as row headers in the first column and the key business decisions as column headers in the first row. In the workbook's Chipotle example, some of the business decisions were:
Increasing store traffic
Increasing shopping bag revenue
Increasing promotional effectiveness
You can do this exercise yourself by putting in business use case questions relevant to your industry and startup.
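If you prefer code to a spreadsheet, the same grid can be sketched in Python with only the standard library. This is a rough illustration, not the workbook's own method: the 0 to 4 scoring scale, the individual scores and the idea of ranking sources by their total are all assumptions made for this example, and the row labels are shortened from the three source types listed above.

```python
import csv
import io

# Column headers: the business decisions from the example above.
DECISIONS = [
    "Increasing store traffic",
    "Increasing shopping bag revenue",
    "Increasing promotional effectiveness",
]

# Row headers: data sources. The scores are invented placeholders
# (0 = no value, 4 = high value), one per decision, in column order.
SCORES = {
    "Historical operational and transactional data": [3, 4, 3],
    "Internal unstructured data": [2, 2, 3],
    "External data": [4, 1, 1],
}


def rank_sources(scores):
    """Return (source, total score) pairs, highest total first."""
    totals = {source: sum(values) for source, values in scores.items()}
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)


def to_csv(scores, decisions):
    """Render the grid as CSV text that any spreadsheet tool can open."""
    buffer = io.StringIO()
    writer = csv.writer(buffer)
    writer.writerow(["Data source", *decisions])
    for source, values in scores.items():
        writer.writerow([source, *values])
    return buffer.getvalue()


ranking = rank_sources(SCORES)
matrix_csv = to_csv(SCORES, DECISIONS)

print(matrix_csv)
for source, total in ranking:
    print(f"{total:>2}  {source}")
```

Summing the scores is only one way to compare sources. You might instead weight the decisions by how much they matter to your startup.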