How to become a data analyst has become a common question. I don't think there is a single path into data analytics, but I'll try to share some things I wish I had known earlier.
Always remember that the grass is greener where it is watered: no matter how rosy or fancy the roadmap, without commitment you just collect certificates that add no value to you.
With that said, let us skip to the good part.
Where Do I Start?
This question is very common, especially for people coming from a non-technical background. The data field is very wide, and it helps to identify your niche before you get overwhelmed by resources.
Some of the major titles you'll hear around are:
Data Analyst:
They mainly collect, clean and analyze data. They identify trends and support decision-making processes in an organization.
Their work mainly entails using structured and unstructured data, performing analysis, creating visualizations and generating reports.
Tools normally used include Excel, SQL, Python/R and data visualization software.
They specialize as Business Data Analysts, Financial Data Analysts, Healthcare Data Analysts, Marketing Data Analysts, Data Quality Analysts, Social Media Data Analysts, Web Data Analysts, Retail Data Analysts, etc.
Data Scientist:
They are experts in data analysis, machine learning and statistical modeling.
They usually design and implement machine learning models, conduct data experiments and develop algorithms.
Tools normally used include Python and R.
Business Intelligence (BI) Analyst:
Focuses on transforming data into actionable insights.
They create dashboards, reports and data visualizations that turn data into business-friendly insights.
Tools used: Tableau, Power BI, QlikView.
Data Engineer:
They build and maintain the infrastructure and pipelines necessary to collect, store and preprocess data.
Data Engineers develop ETL (Extract, Transform, Load) processes, design databases and manage data warehousing systems.
Tools used: Apache Hadoop, Apache Spark, SQL Server.
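To make the ETL idea a bit more concrete, here is a minimal sketch using only Python's standard library. The sample data, the `orders` table and the function names are all made up for illustration; a real pipeline would read from actual files or systems and would likely use dedicated tooling.

```python
import csv
import io
import sqlite3

# Hypothetical raw data standing in for a real source file or export.
RAW_CSV = """order_id,customer,amount
1,alice,120.50
2,BOB,
3,carol,75.00
"""


def extract(text):
    """Extract: read CSV text into a list of dictionaries."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Transform: skip rows with a missing amount and tidy up names and types."""
    cleaned = []
    for row in rows:
        if not row["amount"]:
            continue  # incomplete record, leave it out
        cleaned.append(
            (int(row["order_id"]), row["customer"].strip().title(), float(row["amount"]))
        )
    return cleaned


def load(rows, conn):
    """Load: write the cleaned rows into a database table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS orders "
        "(order_id INTEGER PRIMARY KEY, customer TEXT, amount REAL)"
    )
    conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", rows)
    conn.commit()


def run_pipeline(conn):
    """Run the three steps in order and return (row count, total amount)."""
    load(transform(extract(RAW_CSV)), conn)
    return conn.execute("SELECT COUNT(*), SUM(amount) FROM orders").fetchone()


if __name__ == "__main__":
    connection = sqlite3.connect(":memory:")  # in-memory database for the demo
    print(run_pipeline(connection))
```

The row with the missing amount is dropped in the transform step, so only two orders reach the table.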
Machine Learning Engineer:
Usually focuses on deploying machine learning models into production systems.
They are the intermediary between data science and software development.
They build efficient machine learning pipelines, develop APIs for deployment and optimize models.
Database Administrator:
Responsible for managing and maintaining databases, ensuring integrity, security and performance.
Usually works with database management systems.
Data Architect:
Designs the overall structure and organization of data within an organization.
They create models, design data warehouses and establish data governance policies.
Statistician:
Statisticians apply statistical methods to design experiments, collect data, and interpret results for various fields, including academia, government, and industry.
Statisticians design surveys, conduct hypothesis testing, and analyze data to provide insights and make data-driven recommendations.
Quantitative Analyst:
Quants analyze financial data, build risk models, and develop trading algorithms. They require strong quantitative skills and often use programming languages like Python and R.
Analytical Engineer:
Combines engineering expertise with analytical skills to solve complex problems, optimize processes, and make data-driven decisions within engineering and technical domains. This role bridges the gap between traditional engineering disciplines and data science.
I bet there is something here for each of us, and everyone can find at least one path from this list.
Where Do I Get Resources?
Some of the fundamental skills include:
Excel
Excel usually provides a very good basis for understanding data, data cleaning and visualization.
Resources:
DataCamp
Coursera
YouTube
Databases
Most commonly this means SQL, the query language used to work with relational databases.
Get to understand it well, as it is basically the backbone of data work.
Resources
YouTube
Interactive Online Platforms:
SQLZoo
LeetCode
SQLBolt
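If you have never seen SQL before, the sketch below shows the kind of question it answers: total sales per region, highest first. The `sales` table and its values are invented for the example, and the query is run through Python's built-in `sqlite3` module only so that it can be tried without installing a database.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory database with a small, made-up sales table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
    conn.executemany(
        "INSERT INTO sales VALUES (?, ?)",
        [("East", 100.0), ("West", 250.0), ("East", 50.0), ("North", 80.0)],
    )
    return conn


def sales_by_region(conn):
    """Return (region, total_sales) pairs, largest total first."""
    query = """
        SELECT region, SUM(amount) AS total_sales
        FROM sales
        GROUP BY region
        ORDER BY total_sales DESC
    """
    return conn.execute(query).fetchall()


if __name__ == "__main__":
    for region, total in sales_by_region(build_demo_db()):
        print(region, total)
```

`SELECT`, `GROUP BY` and `ORDER BY` cover a large share of everyday analyst queries, so they are a reasonable place to start on any of the platforms above.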
Programming languages
A programming language is not strictly a must for your first role.
The main options are Python and R.
I may be a bit biased since I am an R user. R is usually a better fit for anyone interested in statistical analysis, data visualization, academia or research work.
Python, on the other hand, has the upper hand as a general-purpose programming language that can be used for much more than analysis. It is also the dominant language for machine learning and AI, and it is common in data engineering for ETL and building pipelines.
Both have broad communities.
Resources
DataCamp or DataQuest
Statistics and Probability
We are aiming to create insights, and making sure they are accurate is a huge step toward making data-driven decisions.
Resources
Statistics and Machine Learning in R
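As a small illustration of why statistics matters for accurate insights, here is a sketch of a permutation test, one simple way to ask whether the difference between two group averages could plausibly be due to chance. It is written in Python with the standard library only; the function name, the default number of permutations and the sample numbers are my own assumptions, not a standard recipe.

```python
import random


def average(values):
    """Arithmetic mean of a non-empty list of numbers."""
    return sum(values) / len(values)


def permutation_test(group_a, group_b, n_permutations=10_000, seed=0):
    """Estimate a two-sided p-value for the difference in group means.

    The labels are shuffled many times; the p-value is the share of shuffles
    whose mean difference is at least as large as the observed one.
    """
    rng = random.Random(seed)  # fixed seed so the result is repeatable
    observed = abs(average(group_a) - average(group_b))
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    at_least_as_extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)
        diff = abs(average(pooled[:n_a]) - average(pooled[n_a:]))
        # Small tolerance guards against floating-point rounding.
        if diff >= observed - 1e-12:
            at_least_as_extreme += 1
    return at_least_as_extreme / n_permutations


if __name__ == "__main__":
    # Made-up measurements for two versions of a web page.
    version_a = [12.1, 11.8, 12.4, 12.0, 12.3, 11.9]
    version_b = [10.2, 10.5, 9.9, 10.1, 10.4, 10.0]
    print(permutation_test(version_a, version_b))
```

A small p-value suggests the gap between the groups is unlikely to be a fluke, which is the kind of check that keeps a dashboard number from being over-interpreted.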
Data visualization tools
Learn either Tableau or Power BI.
Resources
Power BI Documentation
Tableau Public
Maven Analytics
Coursera:
Microsoft Power BI Data Analyst
Tableau Business Intelligence Analyst
Where Do I Get Resources?
DataCamp or DataQuest
Coursera has several professional certificates you can enroll in
LinkedIn Learning
YouTube
CS50 courses
Udacity
W3Schools
Books
Then What Next?
Do not get stuck in a tutorial loop. It is normal to feel like you don't know much yet, but the more projects you build, the better you become at your craft.
Remember the law of use and disuse.
Join communities and build your network. Out of 8 billion people, you will find at least one person to walk with.
Be consistent in your journey. It feels easier to stay in a comfort zone, but push yourself and surround yourself with people who push you.
Do not be pressured by other people's progress; use it as a guide to where you want to get.
Data Sources
Google Dataset Search https://datasetsearch.research.google.com/
Maven Analytics https://www.mavenanalytics.io/data-playground
Tableau Dataset https://public.tableau.com/app/resources/sample-data
Kaggle Dataset https://www.kaggle.com/datasets
Datahub https://datahub.io/collections
data.world https://data.world/datasets/free
WHO https://www.who.int/data/sets
Data Gov
Makeover Monday https://www.makeovermonday.co.uk/data/
Data DNA https://onyxdata.co.uk/data-dna-dataset-challenge/datadna-dataset-archive/
Other resources
https://dev.to/k_ndrick/data-science-for-beginners-2023-2024-complete-roadmap-23dn
https://dev.to/dkkinyua/data-science-roadmap-2023-2024-for-beginners-3n