Harvard Business Review once called data scientist the sexiest job of the 21st century. Data scientists combine programming, statistical knowledge, and domain expertise to uncover insights that drive revenue, reduce costs, and optimize operations.
Having been in the trenches as a data professional and leader, I can speak candidly about what to expect.

Data Cleansing

By the common rule of thumb, data scientists spend about 80% of their time cleansing data. That means identifying data sources, pulling data from the database du jour, eliminating bad records, excluding extraneous data, creating lookups, and formatting and massaging the data so it can be aggregated and rendered. Then you need to validate your results against another system or with the business.
Why the gauntlet of tasks? Data quality is poor and business metadata is thin, and both usually trace back to weak data governance practices.
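To make that list of chores concrete, here is a minimal sketch in plain Python. Everything in it is made up for illustration: the rows, the field names, the REGION_LOOKUP table, and the CONTROL_TOTAL that stands in for a figure from another system. Real cleansing is messier, but the shape is usually the same.

```python
from collections import defaultdict

# Hypothetical raw rows, as they might come back from a database export.
RAW_ROWS = [
    {"order_id": "1001", "region": " ne ", "amount": "250.00"},
    {"order_id": "1002", "region": "SW", "amount": "99.50"},
    {"order_id": "", "region": "NE", "amount": "10.00"},     # missing key
    {"order_id": "1003", "region": "ne", "amount": "n/a"},   # unparseable amount
    {"order_id": "1002", "region": "SW", "amount": "99.50"}, # duplicate
]

# Hypothetical lookup from region code to a business-friendly name.
REGION_LOOKUP = {"NE": "Northeast", "SW": "Southwest"}

# Hypothetical figure reported by another system, used for validation.
CONTROL_TOTAL = 349.50


def clean_rows(rows):
    """Drop unusable records, normalize formats, and apply the lookup."""
    seen = set()
    cleaned = []
    for row in rows:
        order_id = row["order_id"].strip()
        if not order_id or order_id in seen:
            continue  # eliminate records with no key, and duplicates
        try:
            amount = float(row["amount"])
        except ValueError:
            continue  # exclude rows whose amount cannot be parsed
        code = row["region"].strip().upper()  # normalize formatting
        seen.add(order_id)
        cleaned.append({
            "order_id": order_id,
            "region": REGION_LOOKUP.get(code, "Unknown"),
            "amount": amount,
        })
    return cleaned


def total_by_region(rows):
    """Aggregate cleaned amounts per region."""
    totals = defaultdict(float)
    for row in rows:
        totals[row["region"]] += row["amount"]
    return dict(totals)


cleaned = clean_rows(RAW_ROWS)
totals = total_by_region(cleaned)

# Validate the result against the figure from the other system.
reconciled = abs(sum(totals.values()) - CONTROL_TOTAL) < 0.01
```

Five rows go in and two survive, which is a fair picture of why this step eats so much of the week.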

You end up being a Data Analyst, not a Data Scientist

One of the biggest complaints among data scientists is that they are treated like data analysts. Companies hire overqualified talent and, instead of data science work, hand them low-brow data analysis or spreadsheet tasks. This is the typical life of a data professional:
  1. Get a data request from a stakeholder or business analyst, via emails, meetings and IM. Most likely there is a lot of back and forth on the scope and requirements.
  2. Pull the data from the database.
  3. Cleanse the data and reconcile it to validate results.
  4. Summarize and aggregate the data.
  5. Generate a report, spreadsheet or dashboard.
  6. Create a deck to present findings (optional).
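Steps 2 through 5 often boil down to something like the sketch below. It uses an in-memory SQLite database as a stand-in for whatever database you actually query, and the sales table, its columns, and its rows are invented for the example.

```python
import csv
import io
import sqlite3

# In-memory database standing in for the real source system.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [("East", 120.0), ("East", 80.0), ("West", 50.0), (None, 999.0)],
)

# Steps 2-4: pull the data, exclude rows with no region, and aggregate.
summary = conn.execute(
    """
    SELECT region, COUNT(*) AS orders, SUM(amount) AS revenue
    FROM sales
    WHERE region IS NOT NULL
    GROUP BY region
    ORDER BY region
    """
).fetchall()

# Step 3 (reconcile): the summary should add up to the filtered source total.
source_total = conn.execute(
    "SELECT SUM(amount) FROM sales WHERE region IS NOT NULL"
).fetchone()[0]
reconciles = sum(revenue for _, _, revenue in summary) == source_total

# Step 5: render the summary as CSV text, ready to paste into a spreadsheet.
buffer = io.StringIO()
writer = csv.writer(buffer)
writer.writerow(["region", "orders", "revenue"])
writer.writerows(summary)
report = buffer.getvalue()

conn.close()
```

Nothing here is data science in the glamorous sense, which is rather the point of this section.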

You will most likely not work on AI or Machine Learning models

AI/ML is reserved for specialized situations, unless you work at a technologically forward company or were hired specifically into an AI/ML role.
Keep in mind that you cannot train or build a useful model if you don’t have enough data or variables. Chances are you are going to be using good old exploratory data analysis.
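For a sense of what "good old exploratory data analysis" looks like in practice, here is a small sketch using only Python's statistics module. The daily_orders numbers are invented, and the 1.5 × IQR fence is just one common rule of thumb for flagging outliers, not a prescribed method.

```python
import statistics

# Hypothetical daily order counts; one value looks suspicious.
daily_orders = [12, 15, 11, 14, 98, 13]

mean = statistics.mean(daily_orders)
median = statistics.median(daily_orders)
spread = statistics.stdev(daily_orders)

# Quartiles and the interquartile range (IQR).
q1, _, q3 = statistics.quantiles(daily_orders, n=4)
iqr = q3 - q1

# Flag values outside the usual 1.5 * IQR fences for a closer look.
low_fence = q1 - 1.5 * iqr
high_fence = q3 + 1.5 * iqr
outliers = [x for x in daily_orders if x < low_fence or x > high_fence]
```

The gap between the mean and the median is the first hint that something is off, and the fence check points at the value worth asking the business about.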

Meetings

Expect a lot of meetings for requirements, project management, follow-up, conflict resolution, and team meetings. Most meetings are a waste of time.

Documentation Hell

You will do a lot of documentation: comments in code, requirements documents and technical documentation. These are typically uploaded to knowledge management systems (KMS) like SharePoint or Confluence. You may also have to deal with incident tracking or triaging with tools like Jira. Shared documentation is rarely well organized. It could end up on a shared drive, in an attachment, in a KMS, or just gone (people who left the company did not bother to share it).

Communication

You will need to master the ability to communicate with people concisely over email and verbally when you are providing updates or addressing requirements. Miscommunications happen more often than you think. Be sure to document assumptions.

Many hats

You will have to play many roles besides writing elegant code and coming up with striking insights. You will be a data scientist, business analyst, project manager, and software engineer rolled up into one.

Requestors don’t know what they want

Requestors will ask you to define the business logic rather than doing it themselves.

You will be asked to hand over an extensive dataset so they can play with it in Excel. Your role as a data scientist is to step in, home in on the requestors’ exact requirements, and craft a solution for them. Try to avoid making assumptions and put yourself in the mind of the requestor.

Unrealistic expectations

You will likely be pressured into tight deadlines. A task will sit on the requestor's desk for a week before it hits your inbox, and then you will be asked to turn it around in a day or two.

Be realistic and push back on requests like this. Break down the time it will require and make it clear that there is a tradeoff between expediency and quality.

Mundane Tasks

When you walk into your DS job, you might think you will be working on a cutting-edge project pushing your mind to the limit. The truth is most of the work is mundane: reusing and modifying old scripts, writing routine code and building workarounds.

Your solution will likely be shelved

I have seen most reports and processes shelved for another solution or new technology. Sadly, a lot of the work ends up archived, hardly used, or rebuilt from scratch.

Your skills will atrophy

When you are in school, you have the luxury of learning a wide variety of data science concepts, from data analysis to a Swiss Army knife of models. At work, you will generally be locked into a specific tech stack (a cloud provider, a programming tool or language) and a defined set of duties, particularly at larger companies. After a few years, your skills will start going stale. That is why you must keep learning outside of your job. Most companies will not provide you with extra training.

You will need to sell your ideas

You might think that your job is to come up with the best solution to the problem and move on. In fact, you will need to sell your ideas internally to get buy-in from other groups. This can be a political process, and you might have to present your work multiple times in different ways before it is accepted.

Author Sheeraz