Chapter 1 | A Complex Web of Concepts
I’ve been on my Machine Learning & Deep Learning study journey for a while now, and in that time I have used a wide variety of resources to build a strong foundation in the fundamentals, which I hope will pay off in my future learning.
When you undertake study in any novel or unfamiliar discipline, there will be some parts you’re already familiar with, but the vast majority of what you are learning will be new and complex. You may also notice that some things just make sense to you and you pick them up right away, while others you will find really difficult and will need to spend an extensive amount of time on just to understand. While you are on this meandering learning journey, you might realize that every now and again you learn something truly foundational, the consequences of which ripple out, inform your knowledge, and widen your perspective on the surrounding topics you’ve also been learning. I like to call these things ‘breakthroughs’ because they are so significant to the overall learning journey: you really get the feeling that you can view the whole field you’re learning about with a bit more confidence, and from a slightly higher spot on the hill.
This blog post will explore in detail one of the breakthroughs I had, which allowed some of the surrounding and peripheral ideas and concepts I’d been learning about to begin to click. More specifically, this breakthrough occurred after I had internalized and fully understood the hierarchical structure and overlapping landscape of the Data Science, Artificial Intelligence, Machine Learning, and Deep Learning fields. To the layman, a lot of these fields and terms get substituted for one another and misused, and I don’t blame anyone; it certainly isn’t the easiest thing in the world to understand. But I’m hoping that by the end of this blog post you will have experienced the same breakthrough I did, and you can garner an intuition for how everything fits together.
Chapter 2 | Starting at the Beginning
2.1 | Data Science
It’s impossible to even begin to understand the complexities involved in Machine Learning & Deep Learning before we understand what Data Science is and why it has been, and continues to be, such a crucial and prominent discipline. At its core, Data Science rests on one primary assumption: that data, in the most general sense, when manipulated in some particular way, can yield information. Whether that information is easy to understand or incredibly complex, whether it takes years to extract or mere seconds, and whether it’s ultimately useful to us or not are all second-order factors which the human can influence through their expertise, technical skills, theoretical knowledge, and practical implementation.
It’s worth noting that the term “Data Science” was first coined in 1974 by Peter Naur, and it was only in 1996 that the International Federation of Classification Societies featured the term in the title of its conference, helping to establish Data Science as its own distinct discipline. Before 1974, the term “statistician” was primarily used to refer to people who analyzed data to uncover patterns, relationships, and trends, generally using manual methods or simpler computational tools.
Throughout history, people have had the idea of applying what they knew about mathematics to real-world observations. However, for almost all of history, the bottleneck was computation. Isaac Newton began developing Calculus around 1665, and a mind capable of pioneering such an important and consequential area of mathematics was also capable of thinking about the different ways it could be applied to the world. It’s not that data didn’t exist before; rather, the means to process and analyze it were severely limited. The advent of modern computing power, along with the development of software libraries and algorithms, particularly in the realm of Data Science, allowed the people who had the theoretical insight to actually implement these ideas in a practical and scalable manner. We now have the capability to extract insights from data on an unprecedented scale.
We will dive into the specifics in the Machine Learning & Deep Learning sections: the different types of data which can be processed, the most valuable information we can extract from that data, the algorithms responsible for extracting this information, and finally the software and software libraries mainly used in industry to actually achieve this. For now, just internalize the idea that data is all around us, it can literally be found everywhere, and that Data Science is the broadest and most general term for the concept of gathering some data and applying some manipulation to it in order to extract information which is ideally useful to yourself or someone else. This basic idea, in essence, creates the foundations for Machine Learning and Deep Learning, which take this concept to its theoretical limits.
2.2 | Artificial Intelligence
Before we jump into the deep end, we first need to discuss Artificial Intelligence (AI). Artificial Intelligence can be thought of as an umbrella term referring to any computational system designed to complete tasks that were originally within the purview of human intelligence. Artificial Intelligence is generally broken down into four sub-components:
1. Reasoning
Reasoning refers to the logical deductions and inductions made by an AI system. This includes abilities like:
- Making inferences or predictions based on available data
- Determining causality between events
- Mathematical proof and logic, e.g. propositional logic
Reasoning enables AI systems to expand their understanding through logical thinking.
2. Knowledge Representation
Knowledge representation refers to how data and facts are encoded and stored for an AI to utilize. This includes knowledge models like:
- Ontologies that categorize concepts in a domain
- Semantic networks modeling relationships between entities
- Rules that encode expert knowledge or policies
The choice of representation impacts what an AI system can infer and derive from its knowledge.
3. Search & Planning
Search & planning refers to strategies for exploring options and possible decisions to arrive at solutions. Key techniques here include:
- Heuristic search for identifying solutions or paths
- Optimization algorithms that maximize objective functions
- Scheduling and prioritizing actions
Search enables identification of solutions, while planning translates solutions into executable sequences of actions (a toy sketch of heuristic search follows this list).
4. Learning
Learning refers to the ability of AI systems to automatically improve at tasks through data, without being explicitly programmed for that task. Types of learning include:
- Machine Learning algorithms that learn from data
- Deep Learning Neural Networks that learn from large data sets
Learning, whether inductive, deductive, or abductive, fuels advancements in areas like computer vision, language processing, and prediction. We will explore the aforementioned subfields in greater detail in the following section.
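To make the Search & Planning idea a bit more concrete, here is a minimal Python sketch of greedy best-first search, one simple flavor of heuristic search. The toy graph, node names, and heuristic values are all invented for illustration.

```python
import heapq

def greedy_best_first(graph, heuristic, start, goal):
    """Always expand the node that looks closest to the goal per the heuristic."""
    frontier = [(heuristic[start], start, [start])]  # (estimated cost, node, path)
    visited = set()
    while frontier:
        _, node, path = heapq.heappop(frontier)
        if node == goal:
            return path
        if node in visited:
            continue
        visited.add(node)
        for neighbor in graph[node]:
            if neighbor not in visited:
                heapq.heappush(frontier, (heuristic[neighbor], neighbor, path + [neighbor]))
    return None

# A toy graph and hypothetical heuristic values (lower = estimated closer to goal)
graph = {"A": ["B", "C"], "B": ["D"], "C": ["D"], "D": []}
heuristic = {"A": 3, "B": 2, "C": 1, "D": 0}
print(greedy_best_first(graph, heuristic, "A", "D"))  # ['A', 'C', 'D']
```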
2.3 | Machine Learning
2.3.1 Introduction to Machine Learning
Machine Learning, as discussed previously, is a subfield of Learning, which is itself a sub-component of Artificial Intelligence.
So what exactly is Machine Learning? Machine Learning refers to the application of computationally powered algorithms to any type of data in order to obtain some output which contains information about the data, and which ultimately should be useful to either yourself or someone else.
And the good news is that there isn’t an infinite number of “useful” or “helpful” potential outputs; over time we have honed in on some of the most important and consequential ones (a short code sketch of a few of them follows this list). They are as follows:
Classification: Classification refers to categorizing input data points into specific classes or groups based on learned distinguishing characteristics. For example, labeling images with object categories like “cat”, “dog”, etc.
Regression: Regression outputs a continuous numerical value predicting a quantity based on input data. For instance, predicting house prices or weather temperatures using historical data.
Clustering: Clustering groups input data points by similarity, without reliance on pre-existing labels. For example, segmenting customers based on common attributes.
Ranking: Ranking sorts input data points relative to one another, generally based on some relevance score. Ranking is common in search engines and product recommendations.
Anomaly Detection: Anomaly Detection identifies abnormal input data instances that are statistical outliers based on the majority distribution. Useful for catching credit card fraud.
Density Estimation: Density estimation fits a probability distribution to model and summarize properties of the input data for statistical analysis and sampling.
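To make a few of these outputs concrete, here is a minimal sketch using scikit-learn (a library we will meet again shortly). All of the data below is synthetic and purely illustrative.

```python
from sklearn.datasets import make_classification, make_regression
from sklearn.linear_model import LogisticRegression, LinearRegression
from sklearn.ensemble import IsolationForest
import numpy as np

# Classification: map inputs to discrete classes
X, y = make_classification(n_samples=200, n_features=4, random_state=0)
print("Class:", LogisticRegression().fit(X, y).predict(X[:1]))

# Regression: predict a continuous value
X, y = make_regression(n_samples=200, n_features=4, random_state=0)
print("Value:", LinearRegression().fit(X, y).predict(X[:1]))

# Anomaly Detection: flag statistical outliers (-1) vs normal points (1)
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (200, 2)), rng.normal(8, 1, (5, 2))])  # last 5 are outliers
print("Flags for last 5 points:", IsolationForest(random_state=0).fit_predict(X)[-5:])
```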
The algorithms actually used in industry to process and transform raw data into the aforementioned outputs are complex and plentiful. The algorithms themselves are derived from mathematical fields such as Statistics, Calculus, and Linear Algebra, and they are practically implemented through code.
Fortunately, we don’t need to reinvent the wheel every time we want to process some form of data. Over time, the best and most efficient algorithms and data processing solutions have persisted, while inefficient, sub-optimal, and needlessly complicated solutions have been weeded out. Fantastic Python libraries such as scikit-learn and PyTorch provide the best and most common data processing algorithms and solutions in an intuitive, non-verbose, and easy-to-understand way. Although you can undertake additional study into the nitty-gritty details of the foundational mathematics taking place within these algorithms, if you understand what they are doing at a high level, you can effectively utilize them in your own personal projects using real-life data you’ve collected.
Additionally, it’s possible and maybe even probable that some of the aforementioned fields and algorithm types might frighten you. The good news is that unless you intend to become a career academic or a highly esteemed researcher in the field, you will realize pretty quickly that you don’t need to know everything to become competent in whichever subfield you choose. Through practising actually implementing these systems in your code, you will learn the bits which are most important, and only to the extent which is practically required.
2.3.2 Types of Machine Learning Algorithms
This seems like a fantastic time to introduce the different types of Machine Learning algorithms. Unsurprisingly, different algorithms have been designed to be good at different things, as they leverage different types of mathematics which may be really powerful for one particular problem but absolutely useless for another. As a result, we choose which algorithms and algorithmic types to use based on the problem we are attempting to solve.
1. Supervised Learning
A clear and simple example of this would be attempting to predict the price of a house, given features like area, number of bedrooms, and bathrooms. In this scenario, by feeding a Linear Regression algorithm data about millions of houses and their actual prices, it learns the weight, or importance, of each feature in determining the house price. The goal is to minimize the difference between the predicted price and the actual price in the training data. Once trained, the model’s performance is validated using separate test data to ensure it can make accurate predictions on new, unseen data. Care must be taken to avoid overfitting, where the model becomes too adapted to the training data and fails to generalize well to new data.
The example above is an illustration of a Supervised Learning algorithm. Supervised Learning is a type of Machine Learning where an algorithm learns from labeled training data, enabling it to make predictions or decisions without human intervention. Common algorithms used in Supervised Learning include Linear Regression, Logistic Regression, Decision Trees, Gradient Boosting, and Random Forests. The essence of Supervised Learning lies in the algorithm’s ability to adapt based on feedback about how accurate its predictions are. In short, Supervised Learning is mostly used for Regression and Classification purposes.
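As a rough sketch of the house-price example above, here is what a Supervised Learning workflow might look like with scikit-learn. The features, pricing rule, and noise are all invented stand-ins for real sale data.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_absolute_error

# Hypothetical features: [area in sqm, bedrooms, bathrooms] -> price
rng = np.random.default_rng(0)
area = rng.uniform(50, 300, 1000)
beds = rng.integers(1, 6, 1000)
baths = rng.integers(1, 4, 1000)
X = np.column_stack([area, beds, baths])
# A made-up pricing rule plus noise, standing in for actual sale prices
y = 3000 * area + 10000 * beds + 5000 * baths + rng.normal(0, 20000, 1000)

# Hold out test data so we can check generalization (guarding against overfitting)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)
model = LinearRegression().fit(X_train, y_train)
print("Learned feature weights:", model.coef_)
print("Test MAE:", mean_absolute_error(y_test, model.predict(X_test)))
```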
2. Unsupervised Learning
This then brings us to Unsupervised Learning. Unsupervised Learning is a subset of Machine Learning where algorithms delve into unlabeled data to discover hidden patterns or structures. Unlike Supervised Learning, which employs known outcomes to steer the learning process, Unsupervised Learning endeavors to find inherent relationships within the data, often grouping alike data points together. Common applications include Clustering, where data is categorized into distinct groups based on similarities; Dimensionality Reduction, where high-dimensional data is simplified into a more digestible form without significant information loss; and Anomaly Detection, wherein unusual or outlier data points are identified.
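Here is a minimal sketch of the customer-segmentation idea using KMeans clustering. The “customer” attributes are synthetic stand-ins, and the two underlying groups are fabricated so the algorithm has structure to discover.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

# Synthetic stand-ins for customer attributes: [age, annual spend, visits per month]
rng = np.random.default_rng(1)
customers = np.vstack([
    rng.normal([25, 500, 8], [3, 100, 2], (100, 3)),   # hypothetical segment A
    rng.normal([55, 2000, 2], [5, 300, 1], (100, 3)),  # hypothetical segment B
])

# Scale features so no single attribute dominates the distance metric
X = StandardScaler().fit_transform(customers)
segments = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)
print("Customers per segment:", np.bincount(segments))  # note: no labels were used
```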
2.4 | Deep Learning
Alright, so now that we have a bit of an idea about what Machine Learning is, it’s time to discuss Deep Learning and how it differs from Machine Learning. Deep Learning is actually just a subset, or variant, of Machine Learning, but since the approaches it takes to solving problems are of a different, more complex style, and it works in a bit more of a mysterious way, we give it its own special category.
2.4.1 | Shallow Learning vs Deep Learning
It’s here that we need to make the distinction between Shallow Learning algorithms and Deep Learning algorithms. Shallow Learning algorithms, or Shallow Learning approaches to solving problems, use the more traditional, cut-and-dried mathematical algorithms we explored before. We can know beforehand, based on the nature of our initial training data and the algorithm we are employing, what form the final output or solution will take. Shallow Learning algorithms take the training data at face value and apply the mathematical functions we provide to produce an output. In contrast, Deep Learning algorithms have much more complexity under the hood. Deep Neural Networks contain many hidden layers that transform and extract features from the training data through nonlinear activations. We do not explicitly define all the transformations; the Neural Network learns relevant patterns itself through techniques such as backpropagation and gradient descent.
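To illustrate the contrast, here is a minimal PyTorch sketch of a small Deep Neural Network: the hidden layers and nonlinear (ReLU) activations do the feature transformation, while backpropagation and gradient descent adjust the weights. The layer sizes and training data are arbitrary placeholders.

```python
import torch
import torch.nn as nn

# A small deep network: stacked hidden layers with nonlinear activations,
# in contrast to a shallow model that applies one fixed mapping to the data.
model = nn.Sequential(
    nn.Linear(4, 32), nn.ReLU(),   # hidden layer 1
    nn.Linear(32, 32), nn.ReLU(),  # hidden layer 2
    nn.Linear(32, 1),              # output layer
)

# Synthetic data, purely for illustration
X = torch.randn(256, 4)
y = torch.randn(256, 1)

optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
loss_fn = nn.MSELoss()
for step in range(100):
    optimizer.zero_grad()
    loss = loss_fn(model(X), y)    # forward pass
    loss.backward()                # backpropagation computes the gradients
    optimizer.step()               # gradient descent updates the weights
```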
2.4.2 | Are Deep Learning Models Supervised or Unsupervised?
It is important to note that Deep Learning models can be leveraged for both Supervised and Unsupervised Learning tasks. Much of practical Deep Learning uses labeled data in a Supervised Learning approach; for example, classifying images by training a Convolutional Neural Network (CNN) on millions of images with known categories. The error derivatives from comparing predictions to known labels enable the Neural Network to optimize its weights.
Deep Learning can also be implemented in an Unsupervised Learning context, without label information. Autoencoders, for example, provide an Unsupervised approach by learning to encode inputs into a latent space and then decode them back to match the original input. Networks trained this way discover useful feature representations of the data. Generative Adversarial Networks (GANs) are another popular Unsupervised Deep Learning technique in which two neural nets compete with one another: one generates synthetic data points, and the other tries to discriminate between real and fake data. Through this adversarial back-and-forth, the generator learns to produce new data points mimicking the distribution of real samples.
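Here is a minimal sketch of the Autoencoder idea in PyTorch. The input and latent dimensions are arbitrary choices for illustration; the key point is that the input itself serves as the training target, so no labels are required.

```python
import torch
import torch.nn as nn

# Minimal autoencoder: compress 64-dim inputs into an 8-dim latent code,
# then reconstruct them. All dimensions are illustrative choices.
class Autoencoder(nn.Module):
    def __init__(self):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(64, 8), nn.ReLU())
        self.decoder = nn.Linear(8, 64)

    def forward(self, x):
        return self.decoder(self.encoder(x))

model = Autoencoder()
X = torch.randn(128, 64)  # synthetic unlabeled data
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()
for step in range(200):
    optimizer.zero_grad()
    loss = loss_fn(model(X), X)  # the input itself is the target: no labels needed
    loss.backward()
    optimizer.step()
```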
2.4.3 | Most Common Deep Learning Algorithms
Even though Deep Learning generally uses Neural Networks for its algorithms, there are still different types, or variants, of Neural Networks which are better suited to specific use cases. The five most common are as follows (a minimal CNN sketch follows the list):
Convolutional Neural Networks (CNNs): CNNs are specialized Neural Networks for processing grid-like topology data such as images. They utilize convolution mathematical operations to identify visual patterns and features.
Recurrent Neural Networks (RNNs): RNNs are useful for sequential data like text and speech. They have cyclical connections that allow information to persist across sequences.
Autoencoders: Autoencoders compress input data into a lower-dimensional code and then reconstruct the outputs from this code in an Unsupervised manner to reproduce the original input. Useful for Feature Detection and Dimensionality Reduction.
Generative Adversarial Networks (GANs): GANs are an Unsupervised technique consisting of competing generator and discriminator Neural Networks. The generator tries to create synthetic data resembling real samples while the discriminator evaluates real vs fake.
Multilayer Perceptrons (MLPs): MLPs are a classic feedforward Artificial Neural Network (ANN) architecture with multiple layers between input and output, using backpropagation and gradient descent for training with labeled data. MLPs can perform various Supervised Learning tasks.
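To close out, here is a minimal CNN definition in PyTorch for, say, 28×28 grayscale images with 10 classes. Every layer size here is an illustrative choice, not a prescribed architecture.

```python
import torch
import torch.nn as nn

# A minimal CNN for 28x28 grayscale images with 10 output classes.
cnn = nn.Sequential(
    nn.Conv2d(1, 16, kernel_size=3, padding=1),  # convolutions learn local visual patterns
    nn.ReLU(),
    nn.MaxPool2d(2),                             # downsample to 14x14
    nn.Conv2d(16, 32, kernel_size=3, padding=1),
    nn.ReLU(),
    nn.MaxPool2d(2),                             # downsample to 7x7
    nn.Flatten(),
    nn.Linear(32 * 7 * 7, 10),                   # class scores
)

images = torch.randn(8, 1, 28, 28)  # a synthetic batch of 8 images
print(cnn(images).shape)            # torch.Size([8, 10])
```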
Chapter 3 | Conclusion
Wow, that got a bit technical, didn’t it? I’m sorry about that; we will have a lot more time in future posts to explore the nuances, subtleties, and finer details behind all of this. For now, I just wanted to dip your toes into the deep end and go from start to finish from a high-level perspective, to give a comprehensive overview of the entire field.
It has been a pleasure writing this, and I hope you were able to follow along on the journey. I know I didn’t get into the crazy specifics in this post, but I will make sure to write dedicated posts covering that in the future.
Also, this is my first ever blog post, so I am not entirely sure what the overall theme will be of this blog, but I’m hoping to share my learning journey, study notes, what I’ve been working on and any project developments here.
With that, I want to thank you for taking the time to read this post, and I look forward to writing for you again soon.