Best Practices for Developing Governable AI

AI Governance & Assurance

Building and deploying robust artificial intelligence (AI) and machine learning (ML) models is complex, challenging work on its own. If you are like many data science and machine learning leaders I have spoken to lately, you are also having conversations with other teams about governance of your systems.

It's hard to do that while also doing your job of getting models into production. So let's talk about what you can do as a technical organization to make AI governance easier, both for your team and for your business partners, who are key stakeholders in the governance process.

Key Design Principles

At a high level, to ensure that we have models that are governable and can be assured, we want to make sure model artifacts exhibit the following three principles:

  • Context: After the initial exploratory stages of model development, the business reasons, scope, risks, limitations, and data modeling approaches are well-defined and fully documented prior to a model going into production.
  • Verifiability: Every business and technical decision and step in the model development process should be able to be verified and interrogated. An ML model pipeline should never be a fully “black box,” even if a black-box algorithm is used. Understanding where the data came from, how it was processed, and what regulatory considerations exist is paramount for building a verifiable model. Model code should be constructed and documented in a way that is comprehensible to someone who hasn't looked at the code before. The model should be built so that reperforming individual transactions is possible, using containerized architectures, serialization (via pickle or equivalent), and preprocessing techniques that are deterministic (e.g., scikit-learn one-hot encoding with a fixed random seed, serialized alongside the model).
  • Objectivity: The gold standard of governance is when any ML application can be reasonably evaluated and understood by an objective individual or party not involved in the model development. If an ML system is built with the prior two principles of Context and Verifiability, it is far more likely that your business partners can act effectively as that second-line and third-line objective party to evaluate it and greenlight your work to go into production.

Key Capabilities to Incorporate into Your Models

Due to the ever-evolving landscape of open source libraries, vendors, and approaches to building ML models, as well as the shortage of qualified ML Engineers, there is a significant lack of industry best practices for how to create deployable, maintainable, and governable ML models.

When developing ML models with governance in mind, the most important considerations are reperformance, version control, interpretability, and ease of deployment and maintainability. 

Reperformance

Reperformance is the ability to reperform or reproduce a transaction or a model training run and obtain identical results. In science, much has been written about the "reproducibility crisis", and the AI/ML community is not immune from this criticism.

Creating machine learning systems that are reproducible is entirely possible, and putting in the upfront effort pays off down the road: developers get more robust, dependable deployments, and auditors and other reviewers have fewer questions.

Some key best practices to keep in mind are:

  • Pipeline objects should be used to encapsulate the preprocessing functions (e.g., scaling, one-hot encoding), the model itself (e.g., a random forest), and any postprocessing techniques into one object. This pipeline object should be saved in a common serialization format such as pickle or joblib (see the sketch after this list).
  • Pre/post-processing logic that is not in the pipeline object should reside in a single .py file.
  • Use random seeds for model training, fitting, and processing to ensure consistent, repeatable results.
  • Use a version control system, such as Git for all code storage. 
  • Document, document, document.
  • Document your data lineage, provide a data dictionary, and understand exactly where your data came from and what it represents.
  • Document how your model performs and why specific decisions were made on feature selection, engineering, and model training.
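To make this concrete, here is a minimal sketch of an encapsulated, seeded, and serialized pipeline using scikit-learn and joblib. The column names, toy data, and file path are illustrative assumptions, not a prescribed implementation.

```python
# A minimal reperformance sketch: one pipeline object encapsulating
# preprocessing and the model, seeded and serialized with joblib.
import joblib
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import RandomForestClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Toy data standing in for a real, documented dataset.
X = pd.DataFrame({
    "amount": [12.0, 85.5, 40.2, 7.9],
    "channel": ["web", "store", "web", "phone"],
})
y = [0, 1, 0, 0]

# Encapsulate scaling, one-hot encoding, and the model in one object.
preprocess = ColumnTransformer([
    ("scale", StandardScaler(), ["amount"]),
    ("encode", OneHotEncoder(handle_unknown="ignore"), ["channel"]),
])
pipeline = Pipeline([
    ("preprocess", preprocess),
    ("model", RandomForestClassifier(random_state=42)),  # fixed seed
])
pipeline.fit(X, y)

# Serialize the whole pipeline; reloading it should reperform any
# transaction with identical results.
joblib.dump(pipeline, "model_pipeline.joblib")
reloaded = joblib.load("model_pipeline.joblib")
assert (reloaded.predict(X) == pipeline.predict(X)).all()
```

Because the single artifact carries preprocessing, model, and seeds together, an auditor can reload it months later and reperform any individual transaction.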


Interpretability

Making an ML system interpretable, or understandable to non-experts, is a key component of a governable ML system. To create an interpretable model, the following are key considerations and best practices:

  • Simpler is often better, and you should avoid using more complex models without trying a simpler model first. In other words, don't use a deep neural network if a logistic regression model performs almost as well. In cases when a more complex model is chosen, you should document and justify why such a model is required for the business use case.
  • Apply a common explainability technique such as Anchors or SHAP to your model. Ensure that the technique supports local interpretability of individual transactions as well as global interpretability of overall model behavior (see the sketch after this list).
  • Ensure that your model has been evaluated independently for accuracy, business context, and understandability by a non-technical audience, and that it performs as expected when individual inputs are passed in.
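As an illustration, here is a minimal sketch of local and global explanations using the SHAP library with a tree-based model. The synthetic data and feature count are assumptions made for the example.

```python
# A minimal sketch of local and global explainability with SHAP.
import numpy as np
import shap
from sklearn.ensemble import RandomForestRegressor

# Synthetic data standing in for real, documented features.
rng = np.random.default_rng(42)
X = rng.normal(size=(200, 4))
y = 2.0 * X[:, 0] + X[:, 1] + rng.normal(scale=0.1, size=200)

model = RandomForestRegressor(random_state=42).fit(X, y)

# TreeExplainer computes SHAP values efficiently for tree ensembles.
explainer = shap.TreeExplainer(model)

# Local interpretability: per-feature contributions to one prediction.
local_contrib = explainer.shap_values(X[:1])

# Global interpretability: mean absolute contribution across a sample.
global_importance = np.abs(explainer.shap_values(X)).mean(axis=0)
print(global_importance)  # larger values indicate more influential features
```

The same local values can be reviewed transaction by transaction by an independent party, which ties interpretability back to the verifiability principle above.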

Deployment Maturity

As a technical organization, the final dimension of governability lies in maturing your approach to how you deploy models into production. Following standard engineering and API development best practices will go a long way toward governable AI. You will additionally want to focus on deploying a scalable model that is robust against adversarial inputs and spikes in request volume. Here are some specific practices to employ that might be relevant for your team:

  • Deploy and productionize all models using a standard process, such as in a container orchestration system. In particular, you should have a thorough peer review process with a special eye toward ensuring that software engineers have an opportunity to harden code created by data scientists, who may not have the same degree of experience with hardening code for production. 
  • Encapsulate pre- and post-processing code in pipeline objects or single files for reproduction and auditability. Model serving should be separated into a server file that loads the pipeline object (or model and processing pickle files) and a Python file that contains the preprocessing, model prediction, and postprocessing logic (see the serving sketch after this list).
  • Log all model inputs, results, explainability outputs, and relevant metadata in sufficient detail for post mortems and traceability of model transactions.
  • Adhere to a standard REST API deployment paradigm, ideally with a containerized solution with safeguards in place. Avoid dynamic processes in the pre- and post-processing logic. If a call is not deterministic, your model is not reproducible and therefore cannot be governed effectively.
  • Ensure that your application architecture and security are front and center when building a sustainable and trustworthy AI system. Your model, code, artifacts, and systems should adhere to the Principle of Least Privilege, and any other relevant security practices for your organization. You should also have strong access and security controls (IT General Controls) in place to protect the system from tampering.
  • Put monitoring processes in place that are appropriate and sufficient to provide timely identification when the model behaves unexpectedly. Concept drift and feature drift are pervasive problems in deployed machine learning models, and monitoring that detects drift when it begins to occur is essential for a successful long-term ML deployment.
  • Test the model thoroughly and routinely – including periodic manual testing and validation by an independent, non-technical party using standardized controls – to ensure that it performs as expected and is resistant to adversarial inputs.
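Putting several of these practices together, here is a minimal serving sketch, assuming Flask and the serialized pipeline artifact from the reperformance example. The route, field names, and file path are illustrative only, and the code is not production-hardened.

```python
# server.py - a minimal sketch of REST model serving with input/output
# logging for traceability of individual transactions.
import logging

import joblib
import pandas as pd
from flask import Flask, jsonify, request

logging.basicConfig(level=logging.INFO)
app = Flask(__name__)

# Load the single serialized pipeline artifact (hypothetical path).
pipeline = joblib.load("model_pipeline.joblib")

@app.route("/predict", methods=["POST"])
def predict():
    payload = request.get_json()
    # Deterministic preprocessing lives inside the pipeline object.
    features = pd.DataFrame([payload])
    prediction = pipeline.predict(features).tolist()
    # Log inputs, results, and metadata for post mortems and traceability.
    logging.info("input=%s prediction=%s", payload, prediction)
    return jsonify({"prediction": prediction})

if __name__ == "__main__":
    app.run(host="0.0.0.0", port=8080)
```

With this separation, a request such as `{"amount": 20.0, "channel": "web"}` posted to `/predict` produces a logged, reperformable prediction, and the serving layer contains no modeling logic of its own.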

Conclusion

Much of the work and focus over the past decade has gone into pushing the boundaries of data processing and modeling algorithms. In industry, the biggest gap recently has not been the ability to build a random forest model in Python, but the ability to properly govern and deploy such a model, especially higher-risk models in highly regulated environments.

Focusing on these engineering best practices will go a long way toward providing the technical basics needed to build high-quality models that can be governed effectively. Most importantly, objective evaluators inside and outside the organization will be able to implement multiple lines of defense, drive model risk management, and enable audits that place a lower burden on technical teams. The end result of building more governable AI is that your technical teams will be free to focus on moving their models forward, because they have earned the trust of their business partners.

Originally published on December 14, 2021 in InfoWorld.