When it comes to machine learning models, there are many differences between creation environments (where the model is built) and production environments (where the model is used, monitored, and has its life cycle managed). The creation environment is oriented toward a specific set of people working on the model, with specific system, data, and output configurations. The production environment may be quite different, with other people, systems, and requirements applied to the model. Understanding these differences allows organizations to be efficient in both environments and to navigate the full life cycle of their critical machine learning assets. Let’s take a deeper look at the differences between the creation and production environments in order to increase the effectiveness of our deployment process.
The first difference between the creation and production environments is the person who is responsible for the model in each. Generally, in the creation environment, data scientists are in control of building and testing the model. Once the model has been created, more often than not it’s handed to a partner organization (perhaps IT, or analytic engineering) and operations to be put into production. In the production environment, it is the responsibility of IT and operations to ensure that the analytic asset is running as accurately and efficiently as possible. The handoff from data science to IT has been known to cause problems in organizations, and care should be taken to ensure the roles and responsibilities at this critical handoff are understood, documented, and enforced. Learn how to increase efficiency in the handoff from data science to IT in our recent blog.
Another key difference between these two environments is the basic purpose of each, and the systems that will be employed. For example, in the creation environment teams are focused on the tools and packages used to build and run the model. The model creators may iterate often on modeling techniques, libraries, and input data features, looking for and then settling on the best outcome for their application. As a result, the tooling is oriented to discovery, trial and error, and prototyping. As the model moves into production, however, the focus changes. Teams now operate with a more rigid outlook, enabling scheduled flows, rigorous testing, monitoring, and continuous integration techniques. The tooling in production tends to be more systematic, process oriented, and scalable. These differences are to be expected, but they are critical to acknowledge and manage over the life of the machine learning asset.
Data, and the sources or locations of data, are a third critical difference between the creation and production environments. The structure of the data, as specified by the data science team, will be the same, but the data is often delivered in different ways. As a simple example, the creation systems may point to flat local files managed by the model creation team, while the production system may take live, streaming data straight from sources in the field. Perhaps even more important to consider are the outputs of models. In production, teams may look for a wide degree of flexibility here to feed various application types, reporting, and dashboards. The creation team generally does not get involved in this “downstream” integration from the model. Rather, teams of people who specialize in managing the connection to live business applications are brought in, and they manage the asset’s impact on the day-to-day business.
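The pattern described above can be illustrated with a minimal sketch. All names here (the toy `score` function, the source helpers) are hypothetical, not part of any particular product: the point is that the scoring logic stays identical while the data source is swapped between a local flat file and a stream-like feed.

```python
import csv
import io

def score(record):
    """Toy model for illustration: flag records over a threshold.
    In practice this is the locked-down asset handed off by data science."""
    return {"id": record["id"], "flag": float(record["amount"]) > 100.0}

def records_from_csv(fileobj):
    """Creation-time source: a flat local file managed by the model team."""
    yield from csv.DictReader(fileobj)

def records_from_stream(messages):
    """Production-time source: records arriving from a live feed
    (stand-in for a message-queue consumer), already parsed into dicts."""
    yield from messages

def run(source):
    """The model runs unchanged; only the input source differs."""
    return [score(r) for r in source]

# Creation environment: a local flat file.
flat_file = io.StringIO("id,amount\n1,50\n2,250\n")
print(run(records_from_csv(flat_file)))

# Production environment: live records, same schema, different delivery.
live_records = [{"id": "3", "amount": "999"}]
print(run(records_from_stream(live_records)))
```

Because both sources yield records with the same schema, the downstream consumers of `run`'s output can also be swapped (dashboards, reports, applications) without touching the model itself.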
Although the model itself and the data schema stay the same, many aspects of the model change from creation to production. This is why it’s so important to have systems which enable core abstractions, allowing for flexibility where required while enabling other aspects to be locked down. At Open Data Group, our Model Development Life Cycle (MDLC) approach provides a comprehensive framework to consider these and other challenges for enabling machine learning to transform the enterprise. Learn more about the MDLC approach here.