Introduction
Definition: Machine learning (ML) is the process of allowing computers to learn from data without being explicitly programmed.
It encompasses diverse applications from image recognition to fraud detection.
Applications of ML
Predictive Modelling
Natural language processing
Autonomous vehicles
Healthcare: disease prediction, drug discovery, personalized medical care, medical image interpretation.
Types of Machine Learning
Supervised Learning
Requires training data with independent variables and a dependent variable.
Needs labelled data.
Includes: regression models, classification models
Example: A dataset of animals in which each record carries a label such as cat, dog, or elephant. The model learns from the labelled data and can later predict the label of new, unlabelled data.
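The animal example above can be sketched with scikit-learn. This is only an illustration: the two features (weight and shoulder height) and all the numbers are made up for the sketch, and k-nearest neighbours is just one of several classifiers that would work here.

```python
from sklearn.neighbors import KNeighborsClassifier

# Hypothetical labelled data: [weight in kg, shoulder height in cm] per animal.
X_train = [
    [4, 25], [5, 23], [3, 24],               # cats
    [20, 55], [30, 60], [25, 58],            # dogs
    [4000, 300], [5000, 320], [4500, 310],   # elephants
]
y_train = ["cat", "cat", "cat",
           "dog", "dog", "dog",
           "elephant", "elephant", "elephant"]

# Learn the mapping from features (independent variables) to label (dependent variable).
model = KNeighborsClassifier(n_neighbors=3)
model.fit(X_train, y_train)

# New animals without labels: the model predicts a label for each one.
X_new = [[4.5, 24], [28, 57], [4800, 315]]
predictions = model.predict(X_new)
print(predictions)
```

Each new animal is assigned the majority label among its three closest training examples.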
Unsupervised Learning
Learns from unlabelled data.
Requires training data with independent variables only.
No labels are needed to supervise the algorithm while it learns from the data.
Includes: clustering, outlier detection
Example: The model learns from unlabelled data and groups similar items together, much like sorting objects into piles.
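A minimal clustering sketch, assuming scikit-learn's KMeans and a handful of made-up 2-D points. No labels are passed in; the algorithm only sees the features and decides which points belong together.

```python
import numpy as np
from sklearn.cluster import KMeans

# Hypothetical unlabelled data: two visibly separate groups of points.
X = np.array([
    [1.0, 1.1], [0.9, 1.0], [1.2, 0.8],   # points near (1, 1)
    [8.0, 8.2], [7.9, 8.1], [8.3, 7.7],   # points near (8, 8)
])

# Ask for two clusters; random_state makes the run repeatable.
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0)
cluster_ids = kmeans.fit_predict(X)

# Cluster ids are arbitrary integers (0 or 1), not meaningful labels.
print(cluster_ids)
```

The cluster numbers carry no meaning by themselves; only the grouping matters.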
Semi-supervised Learning
Combines labelled and unlabelled data.
The algorithm learns from a small amount of labelled data and a large amount of unlabelled data.
Example: In a dataset of animals, only some records are labelled, say as cats or dogs. The model uses the labelled records together with the unlabelled ones to classify the remaining animals.
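One way to sketch this idea is scikit-learn's LabelPropagation, which spreads the few known labels to nearby unlabelled points. The data and the encoding (0 = cat, 1 = dog, -1 = unlabelled) are assumptions made for this illustration; -1 is the marker scikit-learn expects for missing labels.

```python
import numpy as np
from sklearn.semi_supervised import LabelPropagation

# Hypothetical features for six animals, forming two tight groups.
X = np.array([
    [0.0, 0.0], [0.2, 0.1], [0.1, 0.3],   # group A
    [3.0, 3.0], [3.2, 2.9], [2.9, 3.3],   # group B
])

# Only one animal per group is labelled: 0 = cat, 1 = dog, -1 = unlabelled.
y = np.array([0, -1, -1, 1, -1, -1])

model = LabelPropagation()
model.fit(X, y)

# Labels inferred for every training point, including the unlabelled ones.
inferred_labels = model.transduction_
print(inferred_labels)

# The fitted model can also label a point it has never seen.
new_label = model.predict([[0.1, 0.1]])[0]
print(new_label)
```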
Reinforcement Learning
An agent learns to make decisions by receiving feedback from its environment.
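A very small sketch of this feedback loop is an epsilon-greedy agent choosing between three actions. The environment, its reward values, and all names below are hypothetical; real reinforcement learning problems usually also involve states and long-term rewards, which this sketch leaves out.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical environment: each action has a fixed average reward.
true_mean_reward = np.array([1.0, 5.0, 2.0])

def step(action):
    """Environment feedback: a slightly noisy reward for the chosen action."""
    return true_mean_reward[action] + rng.normal(0.0, 0.1)

q_values = np.zeros(3)            # agent's running estimate of each action's reward
counts = np.zeros(3, dtype=int)   # how often each action has been tried
epsilon = 0.2                     # fraction of steps spent exploring at random

for _ in range(500):
    if rng.random() < epsilon:
        action = int(rng.integers(3))       # explore: try a random action
    else:
        action = int(np.argmax(q_values))   # exploit: use the best estimate so far
    reward = step(action)
    counts[action] += 1
    # Incremental average: move the estimate toward the observed reward.
    q_values[action] += (reward - q_values[action]) / counts[action]

best_action = int(np.argmax(q_values))
print(best_action, q_values.round(2))
```

The agent is never told which action is best; it works that out from the rewards it receives.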
Core Concepts
Algorithm: step-by-step instructions for computers to learn and make decisions from data
Dataset: a collection of data used by computers to learn and make predictions
Training: teaching a machine learning algorithm by exposing it to data and adjusting its parameters
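To make "adjusting its parameters" concrete, here is a sketch of training a straight-line model with gradient descent in NumPy. Gradient descent is one common training method, chosen here as an assumption; the dataset, learning rate, and iteration count are made up for the illustration.

```python
import numpy as np

# Dataset: inputs x and targets y, generated from y = 2x + 1 (no noise).
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = 2.0 * x + 1.0

# Parameters of the model y_pred = w * x + b, starting from zero.
w, b = 0.0, 0.0
learning_rate = 0.05

for _ in range(2000):
    y_pred = w * x + b
    error = y_pred - y
    # Gradients of the mean squared error with respect to w and b.
    grad_w = 2.0 * np.mean(error * x)
    grad_b = 2.0 * np.mean(error)
    # Adjust the parameters a small step in the direction that reduces the error.
    w -= learning_rate * grad_w
    b -= learning_rate * grad_b

print(round(w, 4), round(b, 4))
```

After enough passes over the data, w and b approach the values 2 and 1 that generated the dataset.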
Supervised Learning
Involves training models on labelled datasets where the model learns to map input features to corresponding outputs.
There are 2 main categories:
Classification: Deals with predicting categorical target variables.
E.g. classifying emails as spam or not spam
Common algorithms: logistic regression, support vector machine, random forest, decision tree, k-nearest neighbours, naive Bayes
Regression: Involves predicting continuous target variables.
E.g. forecasting sales
Common algorithms: linear regression, polynomial regression, ridge regression
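The two categories can be compared side by side in a short scikit-learn sketch. The feature choices (suspicious-word and link counts for emails, advertising spend for sales) and every number below are hypothetical, picked only to show that a classifier returns a category while a regressor returns a number.

```python
from sklearn.linear_model import LinearRegression, LogisticRegression

# --- Classification: the target is a category (1 = spam, 0 = not spam) ---
# Hypothetical features per email: [suspicious words, links].
X_emails = [[0, 0], [1, 0], [0, 1], [1, 1],
            [6, 4], [7, 5], [8, 3], [9, 6]]
y_emails = [0, 0, 0, 0, 1, 1, 1, 1]

classifier = LogisticRegression()
classifier.fit(X_emails, y_emails)
spam_predictions = classifier.predict([[7, 4], [1, 1]])
print(spam_predictions)   # one class label per email

# --- Regression: the target is a continuous number (sales) ---
# Hypothetical feature: advertising spend; sales follow 3 * spend + 10.
X_spend = [[1], [2], [3], [4], [5]]
y_sales = [13, 16, 19, 22, 25]

regressor = LinearRegression()
regressor.fit(X_spend, y_sales)
sales_forecast = regressor.predict([[6]])[0]
print(sales_forecast)     # a numeric forecast
```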
Machine Learning Lifecycle
Import data: Load the data, which consists of the target variable (dependent variable) and the features (inputs).
Clean data: This is the preprocessing phase which involves handling missing data, feature scaling and encoding categorical variables.
Split the data into training/test sets: Either use a separate dataset for testing or hold out a small percentage of the provided data for testing.
Create a model: Choose an appropriate machine learning algorithm based on the problem at hand and the characteristics of the dataset.
Train the model
Make predictions
Evaluate: Measure the performance of the trained model using appropriate metrics. Common metrics include accuracy, precision, recall, and F1-score (for classification), and mean squared error and R-squared (for regression).
Use cross-validation techniques (e.g., k-fold cross-validation) to obtain more reliable estimates of the model's performance and ensure it generalizes well to unseen data.
Continuous improvement
Model deployment: Once the model has been trained and evaluated satisfactorily, it can be deployed to make predictions on new, unseen data.
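The lifecycle steps above can be sketched end to end with pandas and scikit-learn. Everything specific here is an assumption for illustration: the data is synthetic (standing in for a real file load), the column names are invented, logistic regression is just one possible model, and saving the model with joblib is only the first part of a real deployment.

```python
import os
import tempfile

import joblib
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, f1_score
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Import data: synthetic stand-in for loading a file; columns are hypothetical.
rng = np.random.default_rng(42)
n = 200
df = pd.DataFrame({
    "monthly_usage": rng.normal(50.0, 15.0, n),
    "plan": rng.choice(["basic", "premium"], n),
})
df["churned"] = (df["monthly_usage"] < 45).astype(int)   # target variable
df.loc[df.index[::20], "monthly_usage"] = np.nan          # simulate missing values

X = df[["monthly_usage", "plan"]]   # features
y = df["churned"]                   # target

# Clean data: fill missing values, scale numbers, encode the categorical column.
numeric_steps = Pipeline([
    ("impute", SimpleImputer(strategy="median")),
    ("scale", StandardScaler()),
])
preprocess = ColumnTransformer([
    ("num", numeric_steps, ["monthly_usage"]),
    ("cat", OneHotEncoder(handle_unknown="ignore"), ["plan"]),
])

# Split the data: hold out 20% for testing.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0, stratify=y
)

# Create and train the model (preprocessing is fitted on training data only).
model = Pipeline([("preprocess", preprocess), ("classifier", LogisticRegression())])
model.fit(X_train, y_train)

# Make predictions on the held-out test set.
y_pred = model.predict(X_test)

# Evaluate: single-split metrics plus 5-fold cross-validation.
accuracy = accuracy_score(y_test, y_pred)
f1 = f1_score(y_test, y_pred)
cv_scores = cross_val_score(model, X, y, cv=5)
print(accuracy, f1, cv_scores.mean())

# Deployment (first step only): save the trained pipeline and load it back.
with tempfile.TemporaryDirectory() as tmp_dir:
    model_path = os.path.join(tmp_dir, "model.joblib")
    joblib.dump(model, model_path)
    loaded_model = joblib.load(model_path)
    reloaded_pred = loaded_model.predict(X_test)
```

Wrapping the cleaning steps and the model in one Pipeline means the same preprocessing is applied automatically during cross-validation and after the model is reloaded.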
Libraries
NumPy
Pandas
Matplotlib
scikit-learn
Conclusion
Machine learning represents a transformative approach to data analysis and decision-making, with applications spanning diverse domains such as healthcare, finance, and autonomous vehicles.
By allowing computers to learn from data without explicit programming, machine learning enables the extraction of valuable insights, the prediction of future trends, and the automation of complex tasks.
Further Reading
Machine Learning Tutorial Python -1: What is Machine Learning?