A practical guide to machine learning

Almost every industry now claims to use artificial intelligence (AI), or more specifically machine learning (ML), to solve a range of complex real-world challenges. Many businesses have found that, compared to traditional approaches, machine learning predicts real-world outcomes more reliably and analyzes large data sets with remarkable accuracy and speed, uncovering insights that would otherwise remain hidden.

In addition, machine learning can capture the desired behaviour of experts (humans or other machines) and replay the learned behaviour through software at a much larger scale, usually to provide support for non-experts.

Whether it’s countering the unauthorised use of credit cards, assisting doctors in identifying cancers, or driving cars autonomously for millions of miles, machines are getting incredibly good at mimicking the cognitive functions of learning and problem solving that are typically associated with the human mind.

While machine learning adoption has been recently accelerated by the explosion of big data and cloud computing, choosing an appropriate machine learning technique can still be a daunting task, particularly when there are so many options available.

When designing a machine learning system, data scientists have several options available, depending on whether the computer is given any feedback during the learning phase; here are some of the most popular:

  • Supervised learning: using labelled training data and feedback from humans, the algorithm learns to produce the desired output for a given set of inputs. Supervised learning is applied to object or pattern recognition, spam detection, or ranking.
  • Reinforcement learning: the desired output is generally not known, so feedback is given through rewards (and penalties) in a dynamic environment; this works well in applications such as self-driving vehicles or controlling robotic arms.
  • Unsupervised learning: the algorithm makes predictions based on input data without being given any explicit outputs or feedback. The system is essentially left to its own devices to make sense of the inputs, often uncovering hidden patterns in the data. Unsupervised learning is suited to recommendation systems or feature extraction.
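As a minimal illustration of supervised learning, the toy classifier below (a 1-nearest-neighbour model, with made-up data and labels chosen purely for illustration) learns from a handful of labelled examples and predicts a label for a new input:

```python
# Toy supervised learning: a 1-nearest-neighbour classifier
# predicts the label of the closest labelled training example.

def predict_1nn(train_points, train_labels, query):
    """Return the label of the training point closest to `query`."""
    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    best = min(range(len(train_points)),
               key=lambda i: sq_dist(train_points[i], query))
    return train_labels[best]

# Labelled training data: two small clusters of points.
points = [(0.1, 0.2), (0.2, 0.1), (0.9, 0.8), (0.8, 0.9)]
labels = ["not spam", "not spam", "spam", "spam"]

print(predict_1nn(points, labels, (0.85, 0.85)))  # spam
```

The "feedback from humans" here is simply the labels attached to the training points; the algorithm never sees a rule, only examples.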

Based on the types described above, data scientists must then choose a more specific mechanism for creating and training a machine learning system; the list below includes some approaches currently used in the industry.

Neural networks and deep learning

Among the different approaches to creating ML systems, neural networks (NNs) are quickly gaining ground in areas related to object recognition and classification, speech recognition or natural language processing.

A neural net is inspired loosely by the structure and functional aspects of the human brain: it contains a collection of interconnected software neurons organized in layers. Different layers may perform different kinds of transformations on their inputs. Data travels from the first (input) layer to the last (output), possibly after traversing a series of intermediate (hidden) layers multiple times.

The architecture of an artificial neuron — the building block of neural networks

The network is typically asked to solve a problem, which it attempts to do over and over, each time strengthening the connections that achieve a successful result and diminishing those that are failure-prone.

In the context of neural networks, deep learning is a type of ML technique where a cascade of multiple layers for feature extraction and transformation are used. Each successive layer uses the outputs from the previous layer as inputs.
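The cascade of layers can be sketched in a few lines of Python; the layer sizes and random weights below are purely illustrative (a real network would learn its weights from data):

```python
import numpy as np

def relu(x):
    """A common activation function: zero out negative values."""
    return np.maximum(0, x)

rng = np.random.default_rng(0)
# Three stacked layers; each consumes the previous layer's output.
layer_shapes = [(4, 8), (8, 8), (8, 2)]
weights = [rng.normal(size=shape) for shape in layer_shapes]

x = rng.normal(size=4)   # input vector
for W in weights:        # cascade: output of one layer feeds the next
    x = relu(x @ W)
print(x.shape)           # (2,) — the final (output) layer
```

Each pass through the loop is one layer's "transformation on its inputs"; stacking more layers is what makes the network deep.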

Simple vs deep learning neural networks

One example of deep learning in action is the Horizon 2020-sponsored SecondHands project which aims to design a collaborative robot to assist human technicians in industrial maintenance tasks. Since technicians will likely use their voice to issue commands to the robot, SecondHands uses a Deep Neural Network with 5 layers and 1600 weights per layer for the acoustic modeling in order to automatically recognise human speech.

Convolutional neural networks (CNNs)

Convolutional neural networks are a class of deep neural networks in which information travels in only one direction (which is why they are also called feedforward networks): from the input nodes, through the hidden nodes (if any), to the output nodes. The output values of each convolutional layer are generated by sliding a small filter across the input and, at each position, multiplying the overlapping values together and summing the result; this mathematical operation is known as convolving the input and the filter.
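The convolution operation itself can be sketched directly in NumPy (the image and filter values below are illustrative, and real frameworks use heavily optimized versions of this loop):

```python
import numpy as np

def convolve2d(image, kernel):
    """Valid 2-D convolution: slide `kernel` over `image`,
    taking an elementwise product-and-sum at each position."""
    kh, kw = kernel.shape
    ih, iw = image.shape
    out = np.zeros((ih - kh + 1, iw - kw + 1))
    for r in range(out.shape[0]):
        for c in range(out.shape[1]):
            out[r, c] = np.sum(image[r:r+kh, c:c+kw] * kernel)
    return out

image = np.arange(16.0).reshape(4, 4)     # a tiny 4x4 "image"
edge_filter = np.array([[1.0, -1.0],
                        [1.0, -1.0]])      # responds to vertical edges
print(convolve2d(image, edge_filter))      # a 3x3 map, all -2.0 here
```

Because the toy image increases by a constant step from left to right, the vertical-edge filter produces the same response (-2.0) everywhere; on a real image the map would highlight edge locations.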

Convolutional neural networks at work

Using the SecondHands project for reference, the computer vision pipeline used two deep CNNs for object recognition; the robot thereby gains knowledge of the objects it operates on and their positions in the environment. The CNNs also help the robot deal with and respond to the non-deterministic outcome of its actions in a real environment; for example, if the robot has to place a spray bottle on a table and the bottle falls, it can locate the bottle again and repeat the operation.

Recurrent neural networks (RNNs)

Recurrent neural networks are a class of neural networks where connections between units form a directed cycle. This makes them suited to tasks such as handwriting recognition, speech recognition, robot control or music composition.

In recurrent neural networks, information can be passed back and forth between nodes

One popular type of RNN is the long short-term memory (LSTM) network. The name comes from its ability to model a short-term memory that can persist for a long period of time.
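The recurrence at the heart of these networks can be sketched as follows; the vanilla RNN below is a simplification of an LSTM, with illustrative sizes and random (untrained) weights:

```python
import numpy as np

rng = np.random.default_rng(1)
W_xh = rng.normal(scale=0.1, size=(3, 5))  # input -> hidden
W_hh = rng.normal(scale=0.1, size=(5, 5))  # hidden -> hidden (the cycle)

def rnn(sequence):
    """Process a sequence one step at a time; the hidden state h
    is fed back into the next step, giving the network memory."""
    h = np.zeros(5)
    for x in sequence:
        h = np.tanh(x @ W_xh + h @ W_hh)
    return h

sequence = rng.normal(size=(6, 3))  # 6 time steps, 3 features each
print(rnn(sequence).shape)          # (5,) — the final hidden state
```

The `W_hh` term is what the "directed cycle" means in practice: the output of the hidden layer at one time step becomes part of its input at the next. An LSTM adds gating mechanisms around this same loop to control what is remembered and forgotten.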

For example, when it came time to implement a neural network to categorize emails arriving in the contact center, Ocado Technology’s data scientists compared different CNNs and RNNs. They found recurrent architectures such as GRUs and LSTMs harder to train and very close to CNNs in performance (but not better). Although somewhat surprising, the findings were a reflection of the simplicity of the problem: the categorization was directly linked to the presence (or absence) of particular phrases in the emails, meaning there were no long-term dependencies in the data.

RNNs, on the other hand, can be used to model customer frequency and therefore establish purchasing patterns. Another example of an RNN-based machine learning system is the natural language understanding (NLU) pipeline implemented in the SecondHands project. When the technician communicates a command to the robot, the recurrent network can recognize multiple actions in the verbal request and identify the objects or locations mentioned.

Bayesian networks

Fundamental to Bayesian models is the notion of modularity — a complex machine learning system is built by combining simpler parts glued together by different probabilities. They can be used for a wide range of tasks characterised by uncertainty, such as prediction, anomaly detection, diagnostics, automated insight, or reasoning.

Bayesian networks link different probabilities to make predictions

Bayesian networks are helpful when data scientists know that multiple functions exist to accomplish a desired result but are unsure of the exact makeup of those functions. Bayesian networks are then used to model the expected distribution of the values instead of a single value.

Gaussian processes go a step further, allowing the modelling of several expected distributions. They can be used to recognize handwritten digits or calculate the force and moment of a robot arm based on its position, velocity and acceleration.
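A minimal sketch of Gaussian-process regression in plain NumPy (the data is illustrative, and real GP libraries also compute the predictive variance rather than just the posterior mean):

```python
import numpy as np

def rbf(a, b, length=1.0):
    """Squared-exponential kernel between two sets of 1-D points."""
    d = a[:, None] - b[None, :]
    return np.exp(-0.5 * (d / length) ** 2)

# Observations of an unknown function (here, secretly sin(x)).
X = np.array([-2.0, -1.0, 0.0, 1.0, 2.0])
y = np.sin(X)

# GP posterior mean at a query point: k*^T (K + jitter*I)^(-1) y
X_query = np.array([0.5])
K = rbf(X, X) + 1e-6 * np.eye(len(X))   # tiny jitter for stability
mean = rbf(X_query, X) @ np.linalg.solve(K, y)
print(mean)  # should be close to sin(0.5)
```

Rather than committing to a single curve, the GP defines a distribution over functions consistent with the data; the code above reads off the mean of that distribution at one new input.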

In grocery retail applications, Gaussian processes are very useful to forecast demand for a certain product or category of products.

Ensemble learning

Ensemble learning is an ML technique where multiple learning algorithms are strategically deployed to solve the same problem. In contrast to ordinary machine learning approaches where only one model is used to make all predictions, ensemble methods try to construct a set of hypotheses and combine them to improve the overall predictive performance.

In ensemble learning, the results from multiple classifiers and data sets are combined to establish a final outcome

An advantage of ensemble learning is the ability to correct the errors of its members based on the diversity of the methods that make up the ensemble. However, a downside of using this technique is that it tends to require more computational power since several models are being run simultaneously.
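One common combination strategy is majority voting, sketched below over the hypothetical outputs of three classifiers (the labels and models are made up for illustration):

```python
from collections import Counter

def majority_vote(predictions):
    """Combine several classifiers' outputs for one sample
    by taking the most common label."""
    return Counter(predictions).most_common(1)[0][0]

# Hypothetical outputs of three models for four samples.
model_a = ["cat", "dog", "cat", "dog"]
model_b = ["cat", "cat", "cat", "dog"]
model_c = ["dog", "dog", "cat", "dog"]

ensemble = [majority_vote(votes)
            for votes in zip(model_a, model_b, model_c)]
print(ensemble)  # ['cat', 'dog', 'cat', 'dog']
```

Note how the ensemble corrects each model's isolated mistakes (e.g. model_c's first answer) as long as the other members disagree with the error — this is the diversity advantage described above.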

Ensemble learning has enabled solutions to difficult problems such as face recognition and is now being applied in areas as diverse as object tracking and bioinformatics.


Biclustering

In data analysis, clustering is the grouping of objects in such a way that objects in the same set (called a cluster) are more similar to each other than to those in other sets. The entire data set is represented by a data matrix in which each object corresponds to a column and each of its properties to a row.

This interpretation of clustering leads to the understanding of biclustering as an ML method that simultaneously clusters both rows and columns of a matrix.
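A naive sketch of the idea, using a toy matrix with a planted bicluster (real biclustering algorithms, such as spectral co-clustering, are considerably more sophisticated than this threshold-on-means heuristic):

```python
import numpy as np

# A toy data matrix with a planted bicluster: the first three
# rows and first two columns share consistently high values.
M = np.array([[9, 8, 1, 0],
              [8, 9, 0, 1],
              [9, 9, 1, 1],
              [1, 0, 2, 1],
              [0, 1, 1, 2]])

# Cluster both dimensions at once: keep the rows and columns
# whose mean values stand out from the rest of the matrix.
row_means = M.mean(axis=1)
col_means = M.mean(axis=0)
rows = np.where(row_means > row_means.mean())[0]
cols = np.where(col_means > col_means.mean())[0]
print(rows, cols)  # [0 1 2] [0 1] — the planted bicluster
```

The key point is that rows and columns are selected jointly: the recovered submatrix `M[np.ix_(rows, cols)]` is a block that is coherent in both dimensions, which is exactly what a bicluster is.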

Biclustering is particularly useful for customer segmentation; for example, identifying culinary preferences (vegan/vegetarian, pescetarianism, etc.) or establishing dietary restrictions or intolerances to certain compounds such as gluten or lactose.

Alex Voica, Head of Technology Communications
