Ten useful tips for starting a new machine learning project at your company.

posted by Marcin Druzkowski on 10 Apr 2017

Lessons learned from deploying an NLP project in production for the Ocado contact centre.

A few months ago, the data science team at Ocado Technology embarked on a project to categorize and prioritize customer emails coming into the Ocado contact centre. You can read more about the history of the project in this post: “Building ML model is hard. Deploying into real business is even harder.”

Today we’d like to share with other engineers and scientists some of the lessons we learned while deploying our machine learning (ML) project into production.

Tip 1: Understand the domain of the problem

At the beginning of every machine learning project, you need to sit down with your business stakeholders and understand what they are trying to achieve.

It is very hard to build a solution that works properly without understanding the scope of the problem, and you may even find that ML is not the right tool for the problem they are trying to solve.

Talk with your business colleagues and don’t be afraid to give them honest feedback.

Tip 2: Define your success metrics

Once you are familiar with the problem domain, discuss how to measure the success of the project. A good idea is to define two separate sets of metrics. First, business-related metrics, such as financial gain or improved worker productivity. Second, machine learning metrics that help you build and validate your models properly. Then think about how the two sets relate to each other.
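
One way to make that relationship explicit is to write down how an ML metric translates into a business number. The sketch below assumes a hypothetical “urgent email” category and made-up costs in agent minutes; the class, function names and figures are illustrative only and do not come from our project.

```python
from dataclasses import dataclass

@dataclass
class ClassCounts:
    """Validation-set counts for one category, e.g. a hypothetical 'urgent' class."""
    tp: int  # urgent emails correctly flagged as urgent
    fp: int  # other emails wrongly flagged as urgent
    fn: int  # urgent emails the model missed

def precision(c: ClassCounts) -> float:
    # Of the emails flagged as urgent, how many really were urgent?
    return c.tp / (c.tp + c.fp) if (c.tp + c.fp) else 0.0

def recall(c: ClassCounts) -> float:
    # Of the truly urgent emails, how many did the model flag?
    return c.tp / (c.tp + c.fn) if (c.tp + c.fn) else 0.0

def net_minutes_saved(c: ClassCounts, saved_per_tp: float,
                      lost_per_fp: float, lost_per_fn: float) -> float:
    # Hypothetical business model: each correct flag saves agent time,
    # each false alarm wastes some, and each missed urgent email costs the most.
    return c.tp * saved_per_tp - c.fp * lost_per_fp - c.fn * lost_per_fn

counts = ClassCounts(tp=80, fp=20, fn=10)
print(precision(counts), recall(counts))
print(net_minutes_saved(counts, saved_per_tp=5, lost_per_fp=2, lost_per_fn=15))  # 210
```

Written this way, it becomes clear that a model with higher recall but lower precision can still be the better choice for the business when missed urgent emails are expensive.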

Tip 3: Prepare for change

When you are building your machine learning project, you need to take into account that the business will change in the future. Priorities will change, problems will change. Everything flows.

Try to build a flexible solution and let the business decide what happens next. Be agile.

Tip 4: Don’t forget about security and legal obligations

All machine learning models use data under the hood. It’s your responsibility to store and process this data securely and in line with the law, especially if you handle confidential data such as customer addresses and emails.

Tip 5: Enrich your data

Data quality has a huge impact on the final accuracy of your model. Invest time and money in gathering high-quality data, and think about how you can enrich your dataset.

In our case, the model used the Wikipedia corpus to learn general English and to initialize its embedding layer.
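
Initializing an embedding layer from pre-trained vectors usually means building a weight matrix with one row per vocabulary word. The sketch below assumes the pre-trained vectors (for example, learned from Wikipedia) are already loaded into a plain dict mapping word to vector; the function name and the ±0.05 random range for unknown words are illustrative choices, not the method we used.

```python
import random

def build_embedding_matrix(vocab, pretrained, dim, seed=0):
    """Return (matrix, hits): row i holds the initial vector for vocab[i].

    Words found in `pretrained` reuse their pre-trained vector; words that are
    missing (or have the wrong size) get small random values so the model can
    still learn them during training. `hits` counts the reused vectors.
    """
    rng = random.Random(seed)  # seeded for reproducible initialization
    matrix, hits = [], 0
    for word in vocab:
        vec = pretrained.get(word)
        if vec is not None and len(vec) == dim:
            matrix.append(list(vec))
            hits += 1
        else:
            matrix.append([rng.uniform(-0.05, 0.05) for _ in range(dim)])
    return matrix, hits

# Hypothetical toy vectors; real ones would have hundreds of dimensions.
pretrained = {"delivery": [0.1, 0.2, 0.3], "refund": [0.4, 0.5, 0.6]}
matrix, hits = build_embedding_matrix(["delivery", "refund", "substitution"], pretrained, dim=3)
print(f"{hits} of {len(matrix)} words initialized from pre-trained vectors")
```

The resulting matrix would then be handed to your framework’s embedding layer as its initial weights; checking the hit rate is a cheap way to spot a vocabulary that does not match the pre-training corpus.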

Tip 6: Create a simple model first

Unfortunately, data scientists often overcomplicate their models.

It’s much better to build something simple that works, so always start with the simplest model. Simple models are easier to debug, easier to explain and easier to deploy.

You may find that you don’t need a sophisticated model at all.
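
For an email-categorization task, the “simplest model” can be as plain as predicting the most common category, or a handful of keyword rules. The sketch below is illustrative only: the categories, keywords and example emails are hypothetical, not the ones from our project.

```python
from collections import Counter

def majority_baseline(train_labels):
    """Always predict the most common label seen in training."""
    most_common = Counter(train_labels).most_common(1)[0][0]
    return lambda text: most_common

def keyword_classifier(rules, default):
    """Return the first label whose keywords appear in the lower-cased email."""
    def predict(text):
        lowered = text.lower()
        for label, keywords in rules:
            if any(keyword in lowered for keyword in keywords):
                return label
        return default
    return predict

def accuracy(predict, texts, labels):
    """Fraction of emails whose predicted label matches the true label."""
    correct = sum(predict(text) == label for text, label in zip(texts, labels))
    return correct / len(labels)

# Toy, hypothetical data for illustration only.
emails = ["Where is my delivery?", "I want a refund for damaged eggs",
          "Thanks, great service!", "My delivery is late"]
labels = ["delivery", "refund", "other", "delivery"]
rules = [("refund", ["refund", "damaged"]), ("delivery", ["delivery", "late"])]

print(accuracy(majority_baseline(labels), emails, labels))           # 0.5
print(accuracy(keyword_classifier(rules, "other"), emails, labels))  # 1.0
```

Four toy emails say nothing about real performance. The point is that numbers like these, measured on a proper held-out set, are the bar a more complex model has to clear before it earns its extra complexity.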

Tip 7: People do not trust machine learning, people trust other people

To build trust, be honest about your model’s accuracy. Tell others what the limits of your model are and what could be improved in the future. Transparency is the easiest way to build trust.

Tip 8: Treat your ML project like any other software project

Write tests. Do code reviews. Manage technical debt. Apply the same engineering standards you would to any other code. Your machine learning model is just software. No excuses.
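
“Write tests” applies to the data-processing code around a model as much as to any other code. The sketch below tests a hypothetical preprocessing step, `normalise_email`, that drops quoted reply lines, lower-cases the text and collapses whitespace; the function and its rules are assumptions for illustration. The `test_` functions use plain `assert`, so a runner such as pytest can discover them, or they can be called directly.

```python
import re

def normalise_email(body: str) -> str:
    """Hypothetical preprocessing step for incoming emails."""
    # Drop quoted reply lines (lines starting with '>' after optional spaces).
    kept = [line for line in body.splitlines() if not line.lstrip().startswith(">")]
    # Join the remaining lines, collapse runs of whitespace and lower-case.
    return re.sub(r"\s+", " ", " ".join(kept)).strip().lower()

def test_removes_quoted_reply():
    body = "My order is late.\n> On Monday you wrote:\n> Your order is on its way"
    assert normalise_email(body) == "my order is late."

def test_collapses_whitespace():
    assert normalise_email("  Hello \n\n  World  ") == "hello world"

def test_empty_input():
    assert normalise_email("") == ""

if __name__ == "__main__":
    test_removes_quoted_reply()
    test_collapses_whitespace()
    test_empty_input()
    print("all preprocessing tests passed")
```

Small, fast tests like these catch silent changes in preprocessing, which would otherwise show up only as an unexplained drop in model accuracy.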

Tip 9: Deployment to production is not the end of the project. It’s the beginning of giving value.

Ensure that you can easily answer the questions below:

1) Who will support and maintain your model?
2) What is the procedure in case of an emergency?
3) Which dashboard or monitoring will tell you the current quality of the model?
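
For the third question, one simple signal is accuracy over a rolling window of recent predictions. The sketch below assumes some ground truth arrives after the fact, for example whether a contact centre agent kept or changed the predicted category; that feedback source, the class name, the window size and the threshold are all hypothetical. The value would feed a dashboard, and `needs_attention()` returning True is the kind of event your emergency procedure should cover.

```python
from collections import deque
from typing import Optional

class RollingAccuracyMonitor:
    """Track accuracy over the last `window` predictions with known outcomes."""

    def __init__(self, window: int = 500, threshold: float = 0.8):
        self.results = deque(maxlen=window)  # oldest results drop off automatically
        self.threshold = threshold

    def record(self, predicted: str, actual: str) -> None:
        # Store whether the model's category matched the confirmed category.
        self.results.append(predicted == actual)

    def accuracy(self) -> Optional[float]:
        if not self.results:
            return None  # no feedback yet
        return sum(self.results) / len(self.results)

    def needs_attention(self) -> bool:
        acc = self.accuracy()
        return acc is not None and acc < self.threshold

monitor = RollingAccuracyMonitor(window=4, threshold=0.75)
for predicted, actual in [("delivery", "delivery"), ("refund", "refund"),
                          ("delivery", "refund"), ("other", "other")]:
    monitor.record(predicted, actual)
print(monitor.accuracy(), monitor.needs_attention())  # 0.75 False
```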

Tip 10: If nine tips are not enough...

Take a look at Google’s “Rules of Machine Learning: Best Practices for ML Engineering”, a list of 43 rules.

Marcin Druzkowski
