Artificial Intelligence model deployment, a problem or a challenge?

A short overview of the pain points of moving Deep Learning models into a production environment.

Executive Summary

I held about 50 conversations with AI specialists working in different industries, who shared their views on the obstacles to running AI models in production. Their valuable insights encouraged me to write this article, and I hope you enjoy it.


Two years ago I finished my Master’s degree at Imperial College London. I had the chance to be part of BICI Lab, working on advanced deep learning algorithms for healthcare. Having obtained promising final results, I decided to share my work with business people in the form of a sample app where they could test the solution on their own data rather than just read the paper (which was hard to understand even for people in the field of AI). I realized that this is not an easy task, and that it is an issue even for big companies.

Since then I have pursued my passion for fast and easy AI model deployment. While investigating different solutions and approaches, I tried to reach people who deal with this topic daily.

What is AI deployment?

Some of you might ask what exactly AI model deployment is. To clarify, “AI model deployment” is the process that starts when a model has been trained and ends with the model running in a product, such as a mobile app or a web page. Behind this process there is a lot of iterative and often arduous work.

Google engineers have pointed out in one of their papers that the ML code is only a small fraction of the whole ecosystem. Deployment plays a big role: it consists of many components, such as resource management, system monitoring, versioning, and serving infrastructure.

Only a small fraction of a real-world ML system is composed of ML code. Deployment makes up practically 50% of the system (right side of the figure).

Well, so what is the hardest part of the process of model deployment?

From the architecture point of view, one of the biggest problems AI engineers face today is maintaining models that run in production. Collecting events about a model’s behavior is one thing, but extracting valuable information from them is still the tricky part. It can also be hard to keep models up to date with changing business assumptions.
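
As a minimal sketch of turning collected events into a usable signal, the snippet below compares the mean of recent production scores with a baseline recorded at training time. The names `drift_score` and `has_drifted`, the threshold of 3 standard errors, and the sample numbers are all assumptions made for illustration; they do not come from any particular monitoring tool.

```python
from math import sqrt
from statistics import mean, stdev


def drift_score(baseline, recent):
    """Return how many standard errors the recent mean is from the baseline mean."""
    base_mean = mean(baseline)
    base_std = stdev(baseline)  # sample standard deviation of the baseline
    if base_std == 0:
        # Degenerate baseline: any change at all counts as drift.
        return 0.0 if mean(recent) == base_mean else float("inf")
    standard_error = base_std / sqrt(len(recent))
    return abs(mean(recent) - base_mean) / standard_error


def has_drifted(baseline, recent, threshold=3.0):
    """Flag drift when the recent mean is more than `threshold` standard errors away."""
    return drift_score(baseline, recent) > threshold


# Hypothetical model scores logged at training time and in production.
baseline_scores = [0.48, 0.52, 0.50, 0.47, 0.53, 0.51, 0.49, 0.50]
stable_scores = [0.50, 0.49, 0.51, 0.50]
shifted_scores = [0.70, 0.72, 0.69, 0.71]

print(has_drifted(baseline_scores, stable_scores))
print(has_drifted(baseline_scores, shifted_scores))
```

A check like this only catches a shift in the average score. Real monitoring would likely also look at input distributions and business metrics, which is where the tricky part mentioned above begins.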

Because the Machine Learning workflow is fast-paced, versioning models and datasets takes a lot of effort, especially when we need to restore a previous configuration. Keeping every version clearly and understandably organized is key to acting fast.
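
One way to picture the versioning problem is a small registry that ties a model, its dataset, and its configuration together under one identifier. The sketch below is only an illustration: `fingerprint`, `register_version`, the in-memory `registry` dictionary, and the byte strings standing in for real files are all hypothetical.

```python
import hashlib
import json


def fingerprint(data: bytes) -> str:
    """Short content hash: the same bytes always give the same identifier."""
    return hashlib.sha256(data).hexdigest()[:12]


def register_version(registry: dict, model_bytes: bytes, dataset_bytes: bytes, config: dict) -> str:
    """Record which model, dataset, and config belong together; return the version id."""
    # sort_keys makes the config hash independent of key order.
    config_bytes = json.dumps(config, sort_keys=True).encode("utf-8")
    entry = {
        "model": fingerprint(model_bytes),
        "dataset": fingerprint(dataset_bytes),
        "config": config,
    }
    version_id = fingerprint(
        (entry["model"] + entry["dataset"]).encode("utf-8") + config_bytes
    )
    registry[version_id] = entry
    return version_id


# Hypothetical stand-ins for a weights file and a training set.
registry = {}
v1 = register_version(registry, b"weights-v1", b"rows-2019", {"lr": 0.01, "epochs": 10})
v2 = register_version(registry, b"weights-v2", b"rows-2020", {"lr": 0.001, "epochs": 20})

# Restoring a previous configuration is a dictionary lookup.
previous = registry[v1]
print(v1, previous["config"])
```

Because the identifier is derived from content, registering the same model, data, and configuration twice yields the same version, so nothing has to be named by hand.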

When it comes to algorithms, the next hard part is model optimization. Many popular Machine Learning and Data Science frameworks, like scikit-learn, are simply not designed for the production environment. Engineers need to spend a lot of time understanding the code created by scientists and optimizing each part according to the requirements.

It is also worth mentioning that most of those frameworks are used through Python, where multithreading is not truly parallel: because of the Global Interpreter Lock (GIL), only one thread executes Python bytecode at a time (for heavy tech details please visit the following article). This is a parallelism problem, and it mostly affects CPU-bound code. As a result, models very often end up rewritten in C or some hardware-specific language, with only the weights left the same.
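
The effect of the GIL can be observed with a small experiment. The sketch below runs the same CPU-bound toy task in a thread pool and in a process pool; the names `count_down`, `run_in_pool`, and `compare` are made up for this example, and the behavior described assumes a standard CPython build with the GIL enabled.

```python
import time
from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor


def count_down(n: int) -> int:
    """CPU-bound toy task: a pure-Python loop that holds the GIL while it runs."""
    steps = 0
    while n > 0:
        n -= 1
        steps += 1
    return steps


def run_in_pool(executor_cls, jobs, workers=4):
    """Run count_down over `jobs` with the given executor; return (results, seconds)."""
    start = time.perf_counter()
    with executor_cls(max_workers=workers) as pool:
        results = list(pool.map(count_down, jobs))
    return results, time.perf_counter() - start


def compare(jobs):
    """Print wall-clock time for threads and for processes on the same jobs."""
    for executor_cls in (ThreadPoolExecutor, ProcessPoolExecutor):
        _, seconds = run_in_pool(executor_cls, jobs)
        print(f"{executor_cls.__name__}: {seconds:.2f}s")
```

To try it, call `compare([5_000_000] * 4)` under an `if __name__ == "__main__":` guard, which process pools need on platforms that spawn new interpreters. On a multi-core machine the thread pool typically takes about as long as running the jobs one after another, while the process pool tends to finish noticeably faster; exact timings depend on the machine and the Python version.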

Going further down to the hardware, there is a compatibility problem when we want to convert a model from a specific framework to run on edge or mobile devices. A lack of hardware-specific libraries can slow down product delivery, sometimes by a couple of weeks.

Last but not least, from the business point of view there is a shortage of MLOps specialists: the people who stand between the Machine Learning team and the operations engineers, who are able to manage the production lifecycle and understand distributed systems. Because Machine Learning is emerging in practically all areas of our lives, there will be growing demand for engineers who specialize in taking Machine Learning to production.

What don’t you love about the solutions you’ve tried?

The cost structure of many platforms is not clear or easy to understand, so it is very hard to predict and plan expenditure. Sometimes even a simple estimate is a problem, which makes many platforms lose customers.

The more options a platform has, the better documented it should be, to give a full overview of all its possibilities and advantages over the competition. However, here we also have a gap: weak code documentation. There is a lack of tutorials, examples, and potential use cases with short explanations.

The market lacks platforms able to deploy models for mobile applications, where the hardware specification plays an important role because models run on the device, not in the cloud. Many platforms provide plenty of tools for general model deployment; however, only a few are domain-specific, e.g. for healthcare or the energy sector.

Popular optimization-for-hardware libraries can handle only a small fraction of popular frameworks, which is inconvenient when we deal with big infrastructures and models.

The cost of switching from one platform to another also plays a big role. We always have to keep in mind how long it will take to find a new tool, how long it will take to move the current solution to it, how much time is left before the project deadline, and whether the new tool has all the features we need.

All those remarks play an important role. However, we have to remember that at the end of the day for the customer the most important thing is the fact that it works. Speed and efficiency are no guarantee of success so it’s important to deliver.

Problem or a challenge?

To sum up, we might ask: is the whole AI model deployment process a problem (a matter of doubt and uncertainty) or a challenge (an opportunity for success and growth)?

In my opinion, it is a challenge worth exploring in more detail. My final thoughts are as follows.

Currently it is somewhat unclear who should be responsible for deployments and later maintenance. In most cases ML Engineers and AI developers are not responsible for AI model deployment; backend engineers are the ones who deal with it.

Future platforms have to be transparent about costs and able to version models. There is a need for software that can optimize models for specific targets, e.g. mobile, edge, or the web.

Finally, inspired by the quote attributed to Leonardo da Vinci, “simplicity is the ultimate sophistication”, I have to admit that there is also a place for simple, easy-to-use tools centered on one particular industry or task that allow fast prototyping and testing.