In part 1 of the ‘What actually is AI’ series, we explored the history of AI – where it started, the issues it faced, and how we got to where we are today. If you haven’t read it yet, you can catch up on part 1 here.
In this part, I am going to explore some of the algorithms behind the AI systems that are taking the world by storm. This article is not intended to be a comprehensive look into AI algorithms, but rather to help you understand the fundamentals of what is happening when you interact with AI.
To understand how these systems work, we are going to narrow our focus from Artificial Intelligence in general to machine learning. Machine learning is a subfield of AI that aims to develop algorithms and models that enable computers to learn from data and make predictions based on it. Prediction is central to machine learning: a model produces the most likely answer or response to a given input, which is vital when examining how AI can transform work environments and society.
Machine Learning
With traditional programming, developers write explicit instructions for the computer to follow. You see this every day when you are interacting with websites. A simple example of this would be logging into social media. A developer will have written a set of instructions to follow when you go to log in…
- Input username and password
- Check these against the usernames and passwords stored in the database
- Return authorisation – user exists and the password is correct
- Log the user in and redirect to the dashboard
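To make the contrast concrete, here is a minimal sketch of that login flow written as explicit rules. The function and variable names (check_credentials, users_db) are hypothetical, and a real system would store hashed passwords in a proper database, but the point is that every step is spelled out by a developer in advance.

```python
# A simplified, rule-based login flow. Names are hypothetical; a real system
# would store hashed passwords in a database rather than plain text in a dict.

users_db = {"alice": "correct-horse-battery-staple"}  # stand-in for a user database

def check_credentials(username: str, password: str) -> bool:
    """Return True only if the username exists and the password matches."""
    return users_db.get(username) == password

def log_in(username: str, password: str) -> str:
    # Every branch here was written out in advance by a developer.
    if check_credentials(username, password):
        return "redirect: /dashboard"            # user exists and password is correct
    return "error: invalid username or password"

print(log_in("alice", "correct-horse-battery-staple"))  # redirect: /dashboard
print(log_in("alice", "wrong-password"))                # error: invalid username or password
```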
Machine learning, on the other hand, employs a range of algorithms that learn and improve from large amounts of data rather than following hand-written rules. These algorithms are often used in online recommendation systems – for example on Netflix, Amazon, and more recently Twitter (with its algorithm-based ‘trending’ system).
These recommendation systems analyse huge amounts of data – your browsing history, past purchases, ratings, interactions with websites, and so on – and cross-reference it against data from other users. By leveraging these techniques, recommendation systems can identify patterns and similarities between users and items. For instance, a machine learning model may predict that a user is likely to purchase snowboarding lift tickets if they have previously viewed snowboarding content on YouTube, purchased a new pair of goggles, or are near a ski resort based on GPS data.
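As a toy illustration of how “similar users” might be found, here is a short sketch that compares invented interaction scores using cosine similarity – one common way of measuring such patterns, though real recommendation systems are far more elaborate. All names and numbers are made up.

```python
# Toy "find similar users" sketch. Interest scores are invented for illustration.
import math

# Each vector holds one user's interest scores for:
# [snowboarding videos, goggles purchases, ski-resort searches, cooking videos]
interactions = {
    "you":    [5, 3, 4, 0],
    "user_a": [4, 4, 5, 0],   # similar tastes to you
    "user_b": [0, 0, 1, 5],   # very different tastes
}

def cosine_similarity(a, b):
    """Similarity between two interaction vectors (1.0 = identical direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

for name, vector in interactions.items():
    if name != "you":
        print(name, round(cosine_similarity(interactions["you"], vector), 2))

# user_a scores close to 1.0, so items user_a enjoyed become good recommendations for you.
```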
Supervised and unsupervised learning
Supervised and unsupervised learning are used extensively in machine learning. Supervised learning takes in data with known input–output pairs – for example, thousands of images (inputs) with their corresponding animal names (outputs). The algorithm learns to recognise similarities and generalise from this data, allowing it to make predictions on new, unseen images.
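Here is a minimal supervised-learning sketch (assuming the scikit-learn library is installed). The two numbers per animal are made-up stand-in features rather than real image pixels, but the shape of the process is the same: learn from known input–output pairs, then predict on something unseen.

```python
# Supervised learning: learn from labelled examples, then predict on new data.
# Features are invented stand-ins (weight in kg, ear length in cm), not real image data.
from sklearn.neighbors import KNeighborsClassifier

X = [[4.0, 7.5], [5.0, 8.0], [30.0, 10.0], [25.0, 9.0]]   # inputs (features)
y = ["cat", "cat", "dog", "dog"]                          # known outputs (labels)

model = KNeighborsClassifier(n_neighbors=1)
model.fit(X, y)                          # learn from the known input-output pairs

print(model.predict([[4.5, 7.8]]))       # unseen example -> ['cat']
```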
Unsupervised learning works similarly but without known outputs. It aims to draw conclusions and make predictions based on patterns it discovers in the data itself. We touched on this in part one, where Google created a neural network that learned to identify cats autonomously.
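And here is the unsupervised counterpart: the same invented features, but with no labels attached. A clustering algorithm (k-means in this sketch) has to group the data purely from the patterns it finds itself.

```python
# Unsupervised learning: no labels are given; k-means groups the data by itself.
from sklearn.cluster import KMeans

X = [[4.0, 7.5], [5.0, 8.0], [30.0, 10.0], [25.0, 9.0]]   # same features, no animal names

model = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
print(model.labels_)   # e.g. [0 0 1 1] -- two groups found without being told "cat" or "dog"
```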
In simple terms, this is how ChatGPT was trained to provide human-like answers. ChatGPT was trained on huge amounts of text and gradually learned to predict the next word by picking up patterns in grammar, syntax and semantic relationships. The model was then fine-tuned with human input (supervised learning).
Reinforcement learning
Reinforcement learning is a machine learning technique that aims to replicate how humans and animals learn through trial and error. For example, learning to ride a bicycle involves initial attempts, feedback, adjustments, and iterative refinement until mastery is achieved.
Reinforcement learning in machine learning works the same way: the computer tries out various approaches to reach a goal until it begins to learn what works and what doesn’t.
Training a robot – Markov Decision Processes
Similar to how humans learn to ride a bicycle, robots can use machine learning techniques to learn how to navigate a maze. Markov Decision Processes (MDPs) provide a mathematical framework for modelling decision-making problems in which an agent (the robot) interacts with an environment (the maze). Below, I will outline the key components of MDPs. Don’t worry if you don’t fully understand the maths; you can still grasp the fundamental concepts by reading through!
An MDP is defined by a tuple (S, A, P, R, 𝛾), where 𝛾 is the Greek letter gamma.
In order to train our robot, we can follow three distinct phases:
Phase 1: define the MDP
- S is a set of states representing possible conditions or situations. In our case this could be making progress through the maze, hitting a wall, or going the wrong way.
- A is a set of actions the agent (our robot) can take in each state: going straight, left, right, or backward.
- P denotes the transition probabilities, determining the likelihood of moving from one state to another after performing an action. For example, the probability of moving to a neighbouring cell when the robot takes a specific action.
- R is the reward function, which provides the immediate reward received after taking an action in a given state. Rewards can be positive or negative: moving in the correct direction yields a positive reward, while hitting a wall yields a negative one. This ties back to reinforcement learning, as described earlier.
- 𝛾 is the discount factor, which controls how much future rewards matter compared to immediate ones. In our case future rewards are weighted heavily, as we want the robot to reach the end of the maze rather than settle for quick wins. As an analogy, imagine you are playing blackjack and hold 20 – the second-best hand. You could reach 21 by drawing an ace, but taking another card carries risk. The discount factor captures this kind of trade-off when deciding whether to continue.
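To make these components concrete, here is a rough sketch of what an MDP for a very small maze might look like in code. The states, transition probabilities, and reward values are all invented for illustration; a real maze would have far more states.

```python
# A toy MDP (S, A, P, R, gamma) for a tiny maze. All values are invented.

S = ["start", "corridor", "dead_end", "goal"]   # states
A = ["forward", "back"]                         # actions

# P[state][action] = list of (next_state, probability) pairs
P = {
    "start":    {"forward": [("corridor", 1.0)],                "back": [("start", 1.0)]},
    "corridor": {"forward": [("goal", 0.8), ("dead_end", 0.2)], "back": [("start", 1.0)]},
    "dead_end": {"forward": [("dead_end", 1.0)],                "back": [("corridor", 1.0)]},
    "goal":     {"forward": [("goal", 1.0)],                    "back": [("goal", 1.0)]},
}

# R[state][action] = immediate reward: progress is rewarded, wrong turns are penalised
R = {
    "start":    {"forward": 0.0,  "back": -1.0},
    "corridor": {"forward": 1.0,  "back": -1.0},
    "dead_end": {"forward": -1.0, "back": 0.0},
    "goal":     {"forward": 0.0,  "back": 0.0},
}

gamma = 0.9   # discount factor: future rewards matter, but slightly less than immediate ones
```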
Phase 2: Value Iteration
Value iteration is a dynamic programming method used to find the optimal value function or policy in an MDP – providing guidance to the robot on how to select actions based on the current state and its estimated cumulative rewards.
The value function V(s) represents the expected cumulative reward starting from a particular state. Through value iteration, the value function is updated repeatedly using the Bellman equation, which in simplified notation reads:

V(s) ← maxₐ [ R(s, a) + 𝛾 Σₛ′ P(s′ | s, a) V(s′) ]

In words: the value of a state is the best you can achieve by choosing the action whose immediate reward, plus the discounted value of the states it is likely to lead to, is highest.

Here’s how it works:
- Start with an initial estimate (it can be a guess) for the value of each state.
- Update the value of each state based on the values of its neighbouring states.
- Repeat this process, updating the values of all states, until they stop changing significantly
When the values of the states stop changing significantly, we have found the optimal values that reflect the expected cumulative rewards for each state. These values will guide the robot when making decisions. By selecting actions that lead to higher rewards, it can navigate the maze more effectively.
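Below is a minimal value-iteration sketch for a toy five-cell corridor “maze”. The layout, rewards, and discount factor are invented, but the loop follows the three steps above: start with a guess, apply the Bellman update to every state, and repeat until the values stop changing.

```python
# Value iteration on a five-cell corridor. Cell 4 is the exit (terminal state).
n_cells = 5
actions = [-1, +1]                       # move left or right by one cell
gamma = 0.9                              # discount factor
reward_goal, reward_step = 10.0, -1.0    # reaching the exit vs. wasting a move

V = [0.0] * n_cells                      # step 1: an initial guess for every state's value

for _ in range(100):                     # step 3: repeat the sweep until values settle
    new_V = V[:]
    for s in range(n_cells - 1):         # the exit is terminal, so its value stays 0
        best = float("-inf")
        for a in actions:
            s_next = min(max(s + a, 0), n_cells - 1)   # bumping into a wall keeps you in place
            r = reward_goal if s_next == n_cells - 1 else reward_step
            best = max(best, r + gamma * V[s_next])    # step 2: the Bellman update
        new_V[s] = best
    if max(abs(x - y) for x, y in zip(new_V, V)) < 1e-6:   # values have stopped changing
        break
    V = new_V

print([round(v, 2) for v in V])   # values rise toward the exit, telling the robot which way to go
```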
Phase 3: Train the robot
After initialising the value function, we can let the robot loose to explore the maze. We could allow it to do this randomly: choosing random actions, observing the resulting states and rewards (positive and negative), and updating the value function based on the Bellman equation.
We would do this until the robot has learned enough about the maze for the value function to settle on the most desirable solution (a point known as convergence).
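As a rough sketch of that training loop, here is one common way to apply the Bellman idea from experience alone: a temporal-difference-style update, where the robot nudges its value estimates after every move rather than needing to know the transition probabilities and rewards up front. The five-cell corridor and all constants are again invented for illustration.

```python
# Phase 3 sketch: random exploration with temporal-difference-style value updates.
import random

n_cells, goal = 5, 4             # a five-cell corridor; cell 4 is the exit
gamma, alpha = 0.9, 0.1          # discount factor and learning rate
V = [0.0] * n_cells              # value estimates, initially all zero

for episode in range(500):       # many runs through the maze
    s = 0                        # start at the entrance
    while s != goal:
        a = random.choice([-1, +1])                   # explore with random actions
        s_next = min(max(s + a, 0), n_cells - 1)      # walls keep the robot in place
        r = 10.0 if s_next == goal else -1.0          # reward for the exit, small penalty otherwise
        # Nudge V(s) toward the observed reward plus the discounted value of where we ended up
        V[s] += alpha * (r + gamma * V[s_next] - V[s])
        s = s_next

print([round(v, 1) for v in V])  # values increase toward the exit, guiding the robot's decisions
```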
Final words
In conclusion, we have looked at some of the key algorithms used in Artificial Intelligence, and hopefully you now feel comfortable with some of the fundamentals behind them. The key point to take away is that these algorithms and techniques empower AI to learn, adapt, and make informed decisions.
As AI advances, it holds the potential to revolutionise various industries, improve efficiency, and reshape our interactions with technology and society as a whole. The impact of this change will be explored in part 3 – Our Future and AI, where I will explore how AI will affect our future, its capabilities, and ethical concerns.