According to “How Data Science Teams Leverage Machine Learning and Other Advanced Analytics” by Gartner research, many companies still struggle to get data science initiatives up and running and to drive them forward, often tripped up by common obstacles such as poor data quality or cultural resistance. Today, as data science pipelines continue to expand, companies are investing time, money, and human resources to turn their analytic development life cycle ambitions into reality.
One useful framework for building an analytic development life cycle comes from considering how software projects are generally run today and applying those same principles to data science assets. As an industry leader, Open Data Group has come to understand what it means to design an ideal analytic development life cycle for our customers by applying those principles. We have also identified common pitfalls that can impede the analytic journey. Most issues stem from the fact that each stage has a different focus and owner, often sitting with different people, and from a lack of well-defined processes. Some of the more overlooked considerations include:
- Data compliance – Data will come from different sources across the analytic development life cycle. For example, Dev may work with smaller, curated data sets while production is exposed to messier, real-world data, and each stage may even use different field names for the same data. It’s important to require the model creator (e.g. the data science team) to define the schema for their model; a sketch of what that might look like follows this list.
- Model code is execution ready – Another common issue is that the Dev environment may use different tooling and software than the downstream process steps. The model creation team should be required to prove the model is execution ready in the production execution framework chosen by the organization.
- Model is safe – The organization should provide processes and tooling to ensure, to every extent possible, that the model is safe to execute. Teams should be able to enforce library restrictions, keep registered container images, and provide clear documentation of the underlying assets to prevent unwanted access or operations from impacting mission-critical systems.
- Model is governed – As analytic assets increasingly resemble intellectual property, and as they address business cases that may be regulated, organizations must develop the ability to track and maintain model lineage and metadata. Questions like “what data was used to train this model?” and “which libraries were used in producing these scores?” should be easily and readily answered; a lineage sketch follows this list.
- Model is monitored – Business teams will expect to know if and when an issue occurs with their mission-critical analytics. Data capture, monitoring, and logging systems should be in place to enable intelligent action, alerts, and automated responses when an issue is found; see the monitoring sketch after this list.
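To make the schema requirement concrete, here is a minimal sketch, not Open Data Group’s actual tooling, of how a model creator might publish an input schema and check production records against it. The field names and types are hypothetical.

```python
# Hypothetical input schema a data science team might publish with a model.
# Field names and types are illustrative, not a real production contract.
INPUT_SCHEMA = {
    "customer_id": str,
    "tenure_months": int,
    "monthly_spend": float,
}

def validate_record(record: dict, schema: dict = INPUT_SCHEMA) -> list:
    """Return a list of schema violations for a single input record."""
    errors = []
    for field, expected_type in schema.items():
        if field not in record:
            errors.append(f"missing field: {field}")
        elif not isinstance(record[field], expected_type):
            errors.append(
                f"{field}: expected {expected_type.__name__}, "
                f"got {type(record[field]).__name__}"
            )
    return errors

# Example: a production record whose field names and types drifted from the Dev data.
print(validate_record({"cust_id": "A123", "tenure_months": "14", "monthly_spend": 42.0}))
```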
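For governance, a lightweight lineage record captured at packaging time can answer the questions above. The sketch below is an illustration only; the model name, data URI, and library list are assumptions, and the listed libraries must be installed for the version lookup to succeed.

```python
import json
import platform
from datetime import datetime, timezone
from importlib.metadata import version  # Python 3.8+

def build_lineage_record(model_name: str, training_data_uri: str, libraries: list) -> dict:
    """Capture a hypothetical lineage/metadata record when a model is packaged."""
    return {
        "model_name": model_name,
        "packaged_at": datetime.now(timezone.utc).isoformat(),
        "training_data": training_data_uri,            # "what data was used to train this model?"
        "python_version": platform.python_version(),
        "library_versions": {lib: version(lib) for lib in libraries},  # "which libraries produced these scores?"
    }

# Example usage with illustrative names; the URI and library list are assumptions.
record = build_lineage_record(
    model_name="churn_model",
    training_data_uri="s3://example-bucket/churn/train_2019_q1.csv",
    libraries=["scikit-learn", "pandas"],
)
print(json.dumps(record, indent=2))
```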
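For monitoring, a minimal hook might log every score and flag drift in a rolling mean. The threshold and window size below are placeholders; a real system would route the alert to an on-call team or an automated response.

```python
import logging
from statistics import mean

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger("model_monitor")

# Illustrative threshold: alert if the rolling mean score leaves this band.
EXPECTED_MEAN_RANGE = (0.2, 0.8)
WINDOW = 100

recent_scores = []

def record_score(score: float) -> None:
    """Log each score and emit an alert when the rolling mean drifts."""
    logger.info("score=%f", score)
    recent_scores.append(score)
    if len(recent_scores) > WINDOW:
        recent_scores.pop(0)
    rolling_mean = mean(recent_scores)
    if not (EXPECTED_MEAN_RANGE[0] <= rolling_mean <= EXPECTED_MEAN_RANGE[1]):
        # In a real deployment this would page a team or trigger an automated response.
        logger.warning("rolling mean %.3f outside expected range %s",
                       rolling_mean, EXPECTED_MEAN_RANGE)
```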
In the end, it becomes apparent that a standard, agnostic way to execute, monitor, and manage each analytic asset is required. New approaches, such as combining FastScore by Open Data Group with a well-thought-out analytic development life cycle, allow analytic models to move through that journey and scale organically with the production needs of the organization. The right deployment approach will be agnostic to other choices in the enterprise and will enable the analytic development life cycle required for long-term value creation from the analytics investment.