Roadmap To Data Analysis

How to become a data analyst has become a common question. I don't think there is a single path into data analytics, but I'll try to share some things I wish I had known earlier.

Always remember that the grass is greener where it is watered: no matter how rosy or fancy the roadmap, without commitment you will just collect certificates that add no value to you.

With that said, let's skip to the good part.

Where Do I Start?

This is a very common question, especially for people coming from a non-technical background. The data field is very wide, so identify your niche before you get overwhelmed by resources.

Some of the major job titles you'll hear are:

  1. Data Analyst:

    They mainly collect, clean and analyze data. They identify trends and support decision-making processes in an organization.

    Their work mainly involves working with structured and unstructured data, performing analysis, creating visualizations and generating reports.

    Tools normally used include Excel, SQL, Python/R and data visualization software.

    They specialize as Business Data Analysts, Financial Data Analysts, Healthcare Data Analysts, Marketing Data Analysts, Data Quality Analysts, Social Media Data Analysts, Web Data Analysts, Retail Data Analysts, etc.

  2. Data Scientist:

    They are experts in data analysis, machine learning and statistical modeling.

    They usually design and implement machine learning models, conduct data experiments and develop algorithms.

    Tools normally used include Python and R.

  3. Business Intelligence (BI) Analyst:

    They focus on transforming data into actionable, business-friendly insights.

    They create dashboards, reports and data visualizations.

    Tools used: Tableau, Power BI, QlikView.

  4. Data Engineer:

    They build and maintain the infrastructure and pipelines necessary to collect, store and preprocess data.

    Data Engineers develop ETL (Extract, Transform, Load) processes, design databases and manage data warehousing systems.

    Tools used: Apache Hadoop, Apache Spark, SQL Server.

  5. Machine Learning Engineer:

    They focus on deploying machine learning models into production systems.

    They act as the intermediary between data science and software development.

    They build efficient machine learning pipelines, develop APIs for deployment and optimize models.

  6. Database Administrator:

    Responsible for managing and maintaining databases, ensuring their integrity, security and performance.

    They usually work with database management systems (DBMS).

  7. Data Architect:

    They design the overall structure and organization of data within an organization.

    They create data models, design data warehouses and establish data governance policies.

  8. Statistician:

    Statisticians apply statistical methods to design experiments, collect data, and interpret results for various fields, including academia, government, and industry.

    Statisticians design surveys, conduct hypothesis testing, and analyze data to provide insights and make data-driven recommendations.

  9. Quantitative Analyst:

    Quants analyze financial data, build risk models, and develop trading algorithms. They require strong quantitative skills and often use programming languages like Python and R.

  10. Analytical Engineer:

    Combines engineering expertise with analytical skills to solve complex problems, optimize processes, and make data-driven decisions within engineering and technical domains. This role bridges the gap between traditional engineering disciplines and data science.
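To make the Data Analyst's "collect, clean and analyze" work more concrete, here is a minimal sketch in plain Python on a handful of made-up sales records. The data and the function names (`clean`, `monthly_totals`, `month_over_month_change`) are hypothetical and only for illustration; in practice the same steps are often done in Excel, SQL or a library such as pandas.

```python
# A minimal sketch of the collect -> clean -> analyze loop.
# The sales records below are invented sample data, not a real dataset.
from collections import defaultdict

raw_records = [
    {"month": "2024-01", "region": " North ", "sales": "1200"},
    {"month": "2024-01", "region": "south", "sales": "950"},
    {"month": "2024-02", "region": "North", "sales": "1350"},
    {"month": "2024-02", "region": "South", "sales": ""},  # missing value
    {"month": "2024-02", "region": "South", "sales": "1010"},
    {"month": "2024-03", "region": "north", "sales": "1500"},
    {"month": "2024-03", "region": "South", "sales": "1100"},
]


def clean(records):
    """Trim and normalize text, drop rows with missing sales, convert types."""
    cleaned = []
    for row in records:
        if not row["sales"].strip():
            continue  # skip incomplete rows rather than guessing a value
        cleaned.append({
            "month": row["month"],
            "region": row["region"].strip().title(),
            "sales": float(row["sales"]),
        })
    return cleaned


def monthly_totals(records):
    """Sum sales per month, returned in chronological order."""
    totals = defaultdict(float)
    for row in records:
        totals[row["month"]] += row["sales"]
    return dict(sorted(totals.items()))


def month_over_month_change(totals):
    """Percentage change from each month to the next: a simple trend signal."""
    months = list(totals)
    return {
        curr: round((totals[curr] - totals[prev]) / totals[prev] * 100, 1)
        for prev, curr in zip(months, months[1:])
    }


totals = monthly_totals(clean(raw_records))
print(totals)
print(month_over_month_change(totals))
```

Dropping incomplete rows is only one choice; depending on the question, an analyst might instead fill or flag them.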

There's something here for everyone, and you should be able to find at least one path that suits you.

Where Do I Get Resources?

Some of the fundamental skills include:

Excel

Excel provides a very good foundation for understanding data, data cleaning and visualization.

Resources:

DataCamp

Coursera

YouTube

Datakliq

Idris Alugo

Databases

Most commonly relational databases, which you query using SQL (Structured Query Language).

Take time to understand SQL well, as it is basically the backbone of data work.
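As a taste of what everyday SQL looks like, here is a minimal sketch that runs a query through Python's built-in `sqlite3` module, so nothing needs to be installed. The `customers` and `orders` tables and their rows are invented for illustration; the query itself (a `JOIN`, `GROUP BY` and `ORDER BY`) is the kind you would also write in MySQL, PostgreSQL or SQL Server.

```python
import sqlite3

# In-memory database so the example needs no setup; tables and rows are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, country TEXT);
    CREATE TABLE orders (
        id INTEGER PRIMARY KEY,
        customer_id INTEGER REFERENCES customers(id),
        amount REAL
    );
    INSERT INTO customers VALUES (1, 'Amina', 'Kenya'), (2, 'Brian', 'Kenya'), (3, 'Chen', 'Uganda');
    INSERT INTO orders VALUES (1, 1, 120.0), (2, 1, 80.0), (3, 2, 200.0), (4, 3, 50.0);
""")

# Typical analyst query: join two tables, aggregate per group, sort the result.
query = """
    SELECT c.country,
           COUNT(o.id)   AS num_orders,
           SUM(o.amount) AS total_sales
    FROM customers AS c
    JOIN orders AS o ON o.customer_id = c.id
    GROUP BY c.country
    ORDER BY total_sales DESC;
"""
rows = conn.execute(query).fetchall()
for country, num_orders, total_sales in rows:
    print(country, num_orders, total_sales)
conn.close()
```

Running it prints one line per country with its order count and total sales, with Kenya first because it has the larger total.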

Resources

  1. YouTube:

    • Programming with Mosh

    • Alex the Analyst

    • Kevin Stratvert

    • CS50 SQL

  2. Interactive Online Platforms:

    • SQLZoo

    • LeetCode

    • SQLBolt

Programming languages

Programming is not strictly a must for your first role.

The main languages are Python and R.

I'm a bit biased since I'm an R user. R is usually a better fit for anyone interested in statistical analysis, data visualization, academia or research.

Python, on the other hand, has the upper hand as a general-purpose programming language that can be used for much more than analysis. It is also the dominant language for machine learning and AI, and it is common in data engineering for ETL work and building pipelines.
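To show what "ETL in Python" can mean at its smallest, here is a minimal sketch using only the standard library (`csv`, `io`, `sqlite3`). The CSV content, the `orders` table and the `extract`/`transform`/`load` function names are hypothetical; real pipelines usually read from files, APIs or databases and load into a warehouse, often with a scheduler or orchestration tool around them.

```python
import csv
import io
import sqlite3

# Hypothetical raw export, as it might arrive from a source system.
RAW_CSV = """order_id,order_date,amount_usd
1,2024-01-05,19.99
2,2024-01-06,
3,2024-01-06,5.50
"""


def extract(text):
    """Extract: read rows from CSV text (a file or API response in practice)."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Transform: drop rows with a missing amount and convert types."""
    out = []
    for row in rows:
        if not row["amount_usd"]:
            continue
        out.append((int(row["order_id"]), row["order_date"], float(row["amount_usd"])))
    return out


def load(records, conn):
    """Load: write the cleaned records into a database table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS orders "
        "(order_id INTEGER PRIMARY KEY, order_date TEXT, amount_usd REAL)"
    )
    conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", records)
    conn.commit()


conn = sqlite3.connect(":memory:")
load(transform(extract(RAW_CSV)), conn)
result = conn.execute("SELECT COUNT(*), SUM(amount_usd) FROM orders").fetchone()
print(result)
```

Keeping each stage as its own function makes it easy to test them separately and swap the source or destination later.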

Both have large, active communities.

Resources

DataCamp or DataQuest

Johns Hopkins R

R programming 101

R Book

Python for Everybody

Cisco Python

Statistics and Probability

The goal is to create insights, and making sure they are accurate is a huge step toward making sound data-driven decisions.
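One basic way statistics keeps insights honest is to report an average together with its uncertainty. Here is a minimal sketch using Python's built-in `statistics` module on a made-up sample of daily order counts. It assumes a normal approximation for the 95% interval; with a sample this small a t-distribution would be more appropriate, but the standard library does not provide t quantiles.

```python
from statistics import NormalDist, mean, stdev

# Hypothetical sample: daily orders recorded over two weeks.
daily_orders = [42, 38, 45, 50, 39, 41, 47, 44, 40, 52, 43, 46, 38, 49]

n = len(daily_orders)
sample_mean = mean(daily_orders)
sample_sd = stdev(daily_orders)  # sample standard deviation (divides by n - 1)
standard_error = sample_sd / n ** 0.5

# 95% interval using the normal approximation (z is about 1.96).
# Assumption: normal approximation; a t-based interval would be slightly wider.
z = NormalDist().inv_cdf(0.975)
low, high = sample_mean - z * standard_error, sample_mean + z * standard_error
print(f"mean = {sample_mean:.1f}, 95% CI approx. ({low:.1f}, {high:.1f})")
```

Reporting "about 43.9 orders a day, give or take" is more useful to a decision-maker than the average alone.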

Resources

Harvard Stat 110

Statistics

Statistics and Machine Learning in R

Data visualization tools

Learn either Tableau or Power BI.

Resources

Power BI Documentation

Tableau Public

Maven Analytics

Coursera:

Microsoft Power BI Data Analyst

Tableau Business Intelligence Analyst

Where Else Do I Get Resources?

  1. DataCamp or DataQuest

  2. Coursera has several professional certificates you can enroll in

  3. LinkedIn Learning

  4. YouTube

  5. CS50 courses

  6. Udacity

  7. W3Schools

  8. Books

What Next?

Don't get stuck in a tutorial loop. It is normal to feel like you don't know much yet, but the more projects you build, the better you become at your craft.

Remember the law of use and disuse: skills you practice grow, and skills you neglect fade.

Join communities and build your network. Out of 8 billion people, you're bound to find at least one person to walk with.

Be consistent in your journey. It is easier to stay in your comfort zone, but push yourself and surround yourself with people who push you.

Don't be pressured by other people's progress; use it as a guide to where you'd like to get.

Data Sources

Google Dataset Search https://datasetsearch.research.google.com/

Maven Analytics https://www.mavenanalytics.io/data-playground

Tableau Dataset https://public.tableau.com/app/resources/sample-data

Kaggle Dataset https://www.kaggle.com/datasets

Datahub https://datahub.io/collections

data.world https://data.world/datasets/free

WHO https://www.who.int/data/sets

Data.gov https://data.gov/

Makeover Monday https://www.makeovermonday.co.uk/data/

Data DNA https://onyxdata.co.uk/data-dna-dataset-challenge/datadna-dataset-archive/

Other resources

https://dev.to/k_ndrick/data-science-for-beginners-2023-2024-complete-roadmap-23dn

https://dev.to/dkkinyua/data-science-roadmap-2023-2024-for-beginners-3n

https://www.linkedin.com/posts/datasciencereality_harvard-university-is-offering-free-world-activity-7096480758038487040-0UbT?utm_source=share&utm_medium=member_android

https://www.linkedin.com/posts/iamarifalam_freecertification-linkedinforcreators-linkedin-activity-7105041308573954048-Q7Az?utm_source=share&utm_medium=member_android