The ModelOp Model Monitoring & Reporting Module provides comprehensive quality, risk, and process monitoring throughout the entire model lifecycle—enabling automated testing, continuous performance validation, ethical bias detection, and executive-level visibility across all AI/ML initiatives.
Monitoring Architecture
Core Concepts
ModelOp treats monitors as models themselves, enabling reuse, robust governance, and full auditability of the critical systems ensuring your decisioning assets perform optimally within governance thresholds.
Monitors as Associated Models: Monitors are registered as associated models that can be attached to one or more business model implementations, running on scheduled or triggered intervals.
Decision Tables for Thresholds: ModelOp uses industry-standard decision tables (DMN files) to define operational thresholds. This approach accommodates multiple factors—technical, statistical, business, and metadata—to precisely determine when models operate out of bounds.
Flexible Metrics Selection: Choose from out-of-the-box monitors or create custom monitoring functions using the ModelOp monitoring SDK Python package.
MLC Integration: Monitors integrate with model lifecycles, enabling automated comparison against model-specific thresholds and triggering remediation actions when deviations occur.
Learn more about Monitoring & Reporting Key Concepts
Selecting Evaluation Metrics
Metric selection should reflect the underlying business problem and occur early in model development.
Classification Metrics
Binary Classification with Imbalanced Classes: Accuracy is inadequate when class frequencies are severely unbalanced. Use F1 score or Fβ score (β > 1) to reward true positives and penalize false negatives.
Example: Compliance monitoring where violations are rare but critical to detect.
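As a minimal sketch (scikit-learn, synthetic data), accuracy can look strong on such data while the F2 score exposes poor recall on the rare class:

```python
# Minimal sketch: accuracy vs. F2 score on imbalanced labels (assumes scikit-learn).
# The labels/predictions below are synthetic: violations (1) are rare.
import numpy as np
from sklearn.metrics import accuracy_score, fbeta_score

y_true = np.array([0] * 95 + [1] * 5)
y_pred = np.array([0] * 95 + [1, 0, 0, 0, 0])   # model catches only 1 of 5 violations

print(accuracy_score(y_true, y_pred))           # ~0.96 -- deceptively good
print(fbeta_score(y_true, y_pred, beta=2))      # ~0.24 -- recall on the rare class is poor
```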
Multi-Class Classification: Select metrics appropriate for multiple outcome categories, considering class prevalence and misclassification costs.
Regression Metrics
Mean Absolute Error (MAE): Preferred when a few large errors are acceptable as long as most errors remain small.
Max Error: Use when no error should exceed a particular threshold.
Root Mean Squared Error (RMSE): Interpolates between MAE and max error approaches.
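A minimal side-by-side sketch of these three metrics, assuming scikit-learn and synthetic values:

```python
# Minimal sketch of the regression metrics above (assumes scikit-learn and numpy).
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error, max_error

y_true = np.array([10.0, 12.0, 11.5, 9.0, 30.0])
y_pred = np.array([10.5, 11.0, 11.0, 9.5, 20.0])    # one large miss, otherwise small errors

mae  = mean_absolute_error(y_true, y_pred)           # 2.5 -- tolerant of the single large error
mx   = max_error(y_true, y_pred)                     # 10.0 -- flags the worst-case error
rmse = np.sqrt(mean_squared_error(y_true, y_pred))   # ~4.5 -- sits between the two behaviors
```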
Interpretability Metrics
SHAP Values: Per-record explanations justifying individual predictions, integrated into scoring functions.
ROC Curve: Visualization of the true positive rate against the false positive rate across classification thresholds.
AUC (Area Under ROC Curve): Single-number summary of model discrimination ability.
Beyond statistical metrics, consider inference speed, interpretability, and operational constraints when promoting models to production.
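As a concrete illustration of the SHAP item above, a hedged sketch (assuming the shap package, a fitted tree-based model, and a feature frame, none of which are defined here) of generating per-record explanations at scoring time:

```python
# Hedged sketch: per-record SHAP explanations produced alongside model scores.
# Assumes a fitted tree-based model `model`, a feature DataFrame `X`, and the
# shap package -- none of these are defined in this document.
import shap

explainer = shap.TreeExplainer(model)
shap_values = explainer.shap_values(X)   # one row of feature contributions per record

scored = X.copy()
scored["score"] = model.predict(X)
# Row i of shap_values justifies the score for record i; persisting it with the
# score lets downstream interpretability monitors report on individual predictions.
```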
Learn more about Monitoring & Reporting Key Concepts
Out-of-the-Box Monitors
ModelOp ships with comprehensive pre-built monitors covering traditional ML, NLP, and LLM models:
Traditional ML Monitors
- Performance metrics (accuracy, precision, recall, F1, AUC, RMSE)
- Stability monitoring (Characteristic Stability Index)
- Drift detection (data drift, concept drift)
- Normality testing
- Linearity assessment
- Autocorrelation analysis
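As a rough illustration of what a data drift monitor computes (a sketch under assumptions, not the out-of-the-box implementation), a per-feature two-sample Kolmogorov-Smirnov check using scipy:

```python
# Illustrative only -- not the out-of-the-box monitor's implementation.
# Assumes pandas DataFrames `baseline` and `production` with matching numeric columns.
import pandas as pd
from scipy.stats import ks_2samp

def data_drift_report(baseline: pd.DataFrame, production: pd.DataFrame, alpha: float = 0.05) -> dict:
    """Flag features whose production distribution differs from the baseline (KS test)."""
    report = {}
    for col in baseline.columns:
        stat, p_value = ks_2samp(baseline[col].dropna(), production[col].dropna())
        report[col] = {"ks_statistic": stat, "p_value": p_value, "drifted": p_value < alpha}
    return report
```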
NLP Monitors
- PII detection
- Sentiment analysis
- Top words by parts of speech
- SBERT similarity
LLM Monitors
- Prompt file validation
- PII detection in responses
- Sentiment analysis
- Fact checking
- Accuracy assessment
- SBERT similarity
- Rails compliance validation
- Bias detection in responses
All monitors can be customized via the ModelOp monitoring Python package.
Learn more about Out-of-the-Box Tests & Monitors
Access ModelOp Monitoring Package Documentation (subscribed customers only)
Preparing Models for Monitoring
Required Assets
Model-Specific Data Sets: Training/baseline data, test data, holdout data, and production data sets with timestamps enabling trend analysis over time.
Extended Schema: Defines input features and outputs with roles, types, and attributes for each field. Critical for enabling most tests and monitors.
Learn more about Adding/Managing Model Assets
Learn more about Adding/Managing Model Schemas
Adding Monitors to Models
Monitor Addition Wizard
- Import Monitor: Ensure desired monitor is imported and snapshotted in ModelOp Center (viewable in Inventory > Monitors)
- Navigate to Implementation: Select the business model snapshot for monitoring
- Launch Wizard: Click "Add Monitor" in the Monitoring tab
- Choose Monitor: Select specific monitoring model from available options
- Select Version: Choose monitor snapshot (version) for reusability and auditability
- Add Input Assets: Upload files, reference external storage (S3, HDFS, Azure Blob, Google Cloud Storage), specify SQL statements, use REST endpoints, or select existing model assets
- Configure Thresholds (Optional): Add decision table (DMN file) for automated threshold evaluation. Generate new DMN or select existing file for editing via DMN builder
- Set Schedule (Optional): Configure automated monitoring job frequency using wizard or cron expressions. Supports integration with enterprise schedulers (Control-M, AutoSys, SSIS) via API
- Define Parameters (Optional): Pass dynamic input parameters for scheduled monitoring jobs
- Review and Save: Confirm configuration and save monitor
Dynamic Parameters for Monitoring
Use placeholder notation for time-based data retrieval:
REST Assets: Add ${PARAMETER_NAME} to Query Params or Form Data
SQL Assets: Add ${PARAMETER_NAME} to SQL Parameters
URL Assets: Add $(PARAMETER_NAME) to the URL (note the parentheses)
Pre-loaded Variables:
- ${QUARTER_START}, ${QUARTER_END}
- ${MONTH_START}, ${MONTH_END}, ${MONTH} (full word)
- ${WEEK_START}, ${WEEK_END}
- ${DAYS_AGO_30}
- ${TODAY} or ${CURRENT_DAY}
- Custom date format: ASSET_REPLACE_FORMAT variable
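For example, a SQL asset behind a monthly monitor might filter with score_date BETWEEN '${MONTH_START}' AND '${MONTH_END}' (an illustrative fragment, not a documented query), so each scheduled run automatically pulls only the current month's records.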
Learn more about Adding a Monitor
Running Monitors
On-Demand Execution (UI)
Navigate to the business model snapshot page and click the play button next to the desired monitor. The monitoring job starts immediately and redirects you to the job page.
Scheduled Execution
Monitors run automatically based on configured schedules using MLC processes listening for scheduled signals.
Custom Scheduling (Enterprise Schedulers)
Integrate with Control-M, Airflow, AutoSys, or other enterprise schedulers:
- Obtain business model snapshot UUID (from snapshot page URL)
- Obtain monitoring model snapshot UUID (from Inventory > Monitors)
- Configure REST call to ModelOp Center:
- Obtain valid auth token
- POST to <MOC_INSTANCE_URL>/mlc-service/rest/signal
- Include MLC signal name and model UUIDs in request body
Example CURL Command:
```bash
curl 'https://mocinstance.modelop.center/mlc-service/rest/signalResponsive' \
  -H 'Accept: application/json, text/plain, */*' \
  -H 'Content-Type: application/json' \
  -X POST \
  -H 'Authorization: Bearer <token>' \
  --data-raw '{"name":"com.modelop.mlc.definitions.Signals_Run_Associated_Model_Jira","variables":{"DEPLOYABLE_MODEL_ID":{"value":"<business-model-uuid>"},"ASSOCIATED_MODEL_ID":{"value":"<monitor-uuid>"}}}'
```
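For schedulers that integrate more naturally through Python (Airflow, for example), the same signal can be posted with the requests library; a hedged sketch with the URL, token, and UUIDs left as placeholders:

```python
# Sketch of the same signal call from Python (e.g., inside an Airflow task).
# The URL, token, and UUIDs below are placeholders you must supply.
import requests

payload = {
    "name": "com.modelop.mlc.definitions.Signals_Run_Associated_Model_Jira",
    "variables": {
        "DEPLOYABLE_MODEL_ID": {"value": "<business-model-uuid>"},
        "ASSOCIATED_MODEL_ID": {"value": "<monitor-uuid>"},
    },
}
response = requests.post(
    "https://mocinstance.modelop.center/mlc-service/rest/signalResponsive",
    headers={"Authorization": "Bearer <token>", "Content-Type": "application/json"},
    json=payload,
    timeout=60,
)
response.raise_for_status()
```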
Monitor Execution Process
Once triggered, the MLC executes:
- Prepares monitoring job with required artifacts
- Creates and executes monitoring job
- Parses results into viewable test results
- Compares results against decision table thresholds
- Takes action (notifications, JIRA/ServiceNow tickets)
Learn more about Running a Monitor
Monitoring Results and Notifications
Test Results Display
Visual Elements:
- Summary Metrics: Key metrics as key:value pairs for quick reference
- Bias/Disparity Metrics: Group-level fairness measurements
- Group Metrics: Performance by protected class (TPR, TNR, FPR, FNR, precision, etc.)
- Charts and Graphs: Line graphs, bar charts, tables, scatter plots as configured in monitor output
Raw Results: Expandable/collapsible JSON representation accessible via Raw Results tab. Download complete JSON file from Jobs tab > job details > Outputs.
Notifications
Notifications appear under corresponding model test results with severity indicators. For configured ticketing systems (Jira, ServiceNow), ERROR-severity issues automatically create tickets with direct links.
Notification Types:
- Alerts: Test failures, model errors, runtime issues requiring response
- Tasks: User actions like approvals or acknowledgments
- System Notifications: Status updates, runtime information
Learn more about Running a Monitor
Champion/Challenger Model Comparison
Compare performance of different models or versions to determine optimal production candidates.
Comparison Workflows
Compare Distinct Model Implementations:
- Navigate to Inventory > Implementations
- Enable "Champion/Challenger" toggle
- Select models using checkboxes
- Click "Compare Models"
Compare Versions of Same Implementation:
- Open specific model implementation page
- Click "Enable champion challenger" in Snapshots section
- Select versions and click "Compare Models"
Comparison Wizard
- Add Models (Optional): Include additional models for comparison
- Select Snapshots: Choose specific versions using "+" button
- Select Test Results: Choose test results for comparison
- View Side-by-Side: Analyze comparative metrics in comparison screen
This capability supports evaluating different methodologies, approaches, frameworks, or languages against each other or validating that candidate models outperform currently deployed versions.
Learn more about Champion/Challenger Model Comparison
Building Custom Monitors
Metrics Function
Define custom metrics via the Metrics Function in the model source code. Mark the function with the # modelop.metrics smart tag, or select it in Command Center after registration.
The Metrics Function executes against a batch of records and yields JSON of the form {"metric_1": <value_1>, ..., "metric_n": <value_n>}. These values populate the Test Results visuals in the UI.
Python Custom Monitor Example
```python
import sklearn.metrics

# Note: lasso_model_artifacts, generate_predictions, and matrix_to_dicts are
# defined elsewhere in the model source (e.g., in the init function).

# modelop.metrics
def metrics(x):
    # Load model artifacts
    lasso_model = lasso_model_artifacts['lasso_model']

    # Preprocess and generate predictions
    actuals = x.flagged
    predictions = generate_predictions(x)

    # Calculate metrics
    confusion_matrix = sklearn.metrics.confusion_matrix(actuals, predictions)
    fpr, tpr, thres = sklearn.metrics.roc_curve(actuals, predictions)
    auc_val = sklearn.metrics.auc(fpr, tpr)
    f2_score = sklearn.metrics.fbeta_score(actuals, predictions, beta=2)

    # Format results
    roc_curve = [{'fpr': point[0], 'tpr': point[1]} for point in zip(fpr, tpr)]
    labels = ['Compliant', 'Non-Compliant']
    cm = matrix_to_dicts(confusion_matrix, labels)
    test_results = dict(
        roc_curve=roc_curve,
        auc=auc_val,
        f2_score=f2_score,
        confusion_matrix=cm
    )
    yield test_results
```
R Custom Monitor Example
```r
# import libraries
library(tidymodels)
library(readr)

# modelop.init
begin <- function() {
    # Initialization steps
}

# modelop.metrics
metrics <- function(data) {
    df <- data.frame(data)
    get_metrics <- metric_set(f_meas, accuracy, sensitivity, specificity, precision)
    output <- get_metrics(data = df, truth = as.factor(label_value),
                          estimate = as.factor(score))
    mtr <- list(PerformanceMetrics = output)
    emit(mtr)
}
```
Custom Monitor Output Formats
Line Graphs (timeseries, generic, decimal):
```json
{
  "time_line_graph": {
    "title": "Example Line Graph - Timeseries Data",
    "x_axis_label": "X Axis",
    "y_axis_label": "Y Axis",
    "data": {
      "data1": [["2023-02-27T20:10:20", 100], ["2023-03-01T20:10:20", 200]],
      "data2": [["2023-02-28T20:10:20", 350], ["2023-03-02T20:10:20", 250]]
    }
  }
}
```
Bar Charts (vertical, horizontal):
```json
{
  "generic_bar_graph": {
    "title": "Example Bar Chart",
    "x_axis_label": "X Axis",
    "y_axis_label": "Y Axis",
    "rotated": false,
    "data": {
      "data1": [1, 2, 3, 4],
      "data2": [4, 3, 2, 1]
    },
    "categories": ["cat1", "cat2", "cat3", "cat4"]
  }
}
```
Tables:
```json
{
  "generic_table": [
    {"data1": 1, "data2": 2, "data3": 3},
    {"data1": 2, "data2": 3, "data3": 4},
    {"data1": 3, "data2": 4, "data3": 5}
  ]
}
```
Additional Formats: Scatter plots, pie charts, donut charts
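A hedged sketch of a custom Metrics Function emitting two of the structures above; the field names mirror the JSON examples in this section, while the values are purely illustrative:

```python
# modelop.metrics
def metrics(df):
    # Illustrative values only; in practice these come from computed monitor results.
    yield {
        "generic_bar_graph": {
            "title": "Example Bar Chart",
            "x_axis_label": "X Axis",
            "y_axis_label": "Y Axis",
            "rotated": False,
            "data": {"data1": [1, 2, 3, 4], "data2": [4, 3, 2, 1]},
            "categories": ["cat1", "cat2", "cat3", "cat4"],
        },
        "generic_table": [
            {"data1": 1, "data2": 2, "data3": 3},
            {"data1": 2, "data2": 3, "data3": 4},
        ],
    }
```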
Learn more about Building a Custom Test or Monitor
Learn more about Monitor Output Structure
Defining Required Assets
Create a required_assets.json file in your monitor repository to enforce asset acquisition:
```json
[
  {
    "usage": "INPUT_ASSET",
    "assetRole": "BASELINE_DATA",
    "metaData": {"description": "First Training Dataset"}
  },
  {
    "usage": "ADDITIONAL_ASSET",
    "name": ".*\\.pkl",
    "filename": "weights/weights.pkl",
    "metaData": {"description": "Weights File"}
  }
]
```
Usage Types:
- INPUT_ASSET: Passed to metric function as argument
- ADDITIONAL_ASSET: Materialized in runtime, available for model loading
Optional Fields: assetRole, name (regex), filename, description
Ethical Bias Monitoring
Bias Detection Approach
Organizations require visibility into prediction formation, particularly regarding unfair or partial results for protected classes. Bias monitors run routinely against labeled and scored data batches, alerting when production metrics deviate beyond pre-set thresholds.
Example: German Credit Data: Gender (reported as status_sex attribute) is a protected attribute. Simply excluding gender from training doesn't guarantee unbiased models—gender may correlate with other attributes like income.
Aequitas Integration
ModelOp leverages Aequitas for bias and fairness metrics given labeled/scored data sets and protected attributes:
Group Metrics: TPR, TNR, FPR, FNR, FOR, FDR, NPV, precision, PPR, PPREV, prevalence by protected class groups
Disparity Metrics: Ratios comparing groups to reference groups (e.g., FPR disparity = female FPR / male FPR)
Example: If females have FPR=0.32 and males have FPR=0.16, the FPR disparity is 2.0, indicating females are twice as likely to be falsely labeled as high-risk.
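A simplified pandas sketch of that disparity calculation (not the Aequitas implementation; the column and group names are illustrative):

```python
# Simplified illustration of a disparity ratio -- not the Aequitas implementation.
# Assumes a DataFrame with illustrative columns: label_value (ground truth),
# score (binary model output), and a protected attribute column such as "sex".
import pandas as pd

def false_positive_rate(group: pd.DataFrame) -> float:
    negatives = group[group["label_value"] == 0]
    return (negatives["score"] == 1).mean() if len(negatives) else float("nan")

def fpr_disparity(df: pd.DataFrame, protected_col: str, reference_group: str) -> pd.Series:
    fpr_by_group = df.groupby(protected_col).apply(false_positive_rate)
    return fpr_by_group / fpr_by_group[reference_group]

# e.g. fpr_disparity(df, "sex", reference_group="male") yields 2.0 for females
# when the female FPR is 0.32 and the male (reference) FPR is 0.16.
```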
Bias Monitor Assumptions
Requirements:
- Extended schema asset for input data
- Binary classification model type
- Categorical protected class features
- Input data with label (ground truth), score (model output), and protected class columns
Numerical Protected Classes: While not supported out-of-the-box, bias monitors can be modified to handle numerical protected classes (e.g., continuous age) by specifying cutoffs to create categorical buckets.
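For example, a hedged sketch (using pandas; the cutoffs are illustrative) of bucketing a continuous age column into categorical groups before it reaches the bias monitor:

```python
# Illustrative bucketing of a numerical protected attribute into categories.
# The cutoffs below are examples only; choose bins appropriate to your use case.
import pandas as pd

def bucket_age(df: pd.DataFrame, age_col: str = "age") -> pd.DataFrame:
    df = df.copy()
    df["age_group"] = pd.cut(
        df[age_col],
        bins=[0, 25, 40, 60, 120],
        labels=["<=25", "26-40", "41-60", "60+"],
    )
    return df
```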
Bias Monitor Execution
- Init Function: Extracts extended input schema from job JSON
- Parameter Setting: Determines protected classes, label column, score column from schema
- Metrics Calculation: Runs Aequitas Bias and/or Group tests for each protected class with automatic reference group selection
- Result Compilation: Appends bias tests to returned results
Monitor Variants:
- Group Metrics Only: TPR, TNR, FOR, FDR, FPR, FNR, NPV, precision, PPR, PPREV, prevalence
- Disparity Metrics Only: All disparity ratios
- Combined: Both group and disparity metrics
Learn more about Running a Monitor
Documentation Generation
Automated Documentation
ModelOp streamlines compliance documentation by automatically populating templates with model metadata, metrics, and test results—saving time while ensuring enterprise-wide consistency.
Supported Document Types
- Model Cards: Markdown (.MD) format using industry-standard templates (Hugging Face extended)
- Microsoft Word Documents: (.DOCX) format with custom enterprise templates
Available Information
Use Case/Model Information: Complete storedModel object metadata
Snapshot Information: Complete deployableModel object metadata
Model Test Results: All monitoring and test result data
ModelOp Tag Structure
Insert placeholders in templates using format: <<modelop.object.field>>
Base Tag Structures:
- Use Case/Model: <<modelop.storedModel>>
- Snapshot: <<modelop.snapshot>>
- Model Test Result: <<modelop.modelTestResult>>
Access Methods:
- Simple: <<modelop.storedModel.name>>
- Nested: <<modelop.storedModel.modelMetaData.custom.element>>
- Indexed: <<modelop.modelTestResult.metric[0]>>
- Mapped: <<modelop.modelTestResult.metric(key)>>
- Combined: <<modelop.modelTestResult.metric.nested[index](key)>>
Generating Tables
- Create custom monitor output with a generic_table field
- Insert a 1x1 table in the Word template
- Add tag: <<modelop.modelTestResult.testResults.(generic_table)>>
Result: Automatic table generation from test results
Generating Graphs
Line Graphs (timeseries, generic, decimal):
- Configure monitor output with appropriate structure
- Add tag: <<modelopgraph.linegraph.generic_line_graph>>
Bar Charts (vertical, horizontal):
- Configure monitor output with bar chart structure
- Add tag: <<modelopgraph.bargraph.generic_bar_graph>>
Learn more about Documentation Generation
Executive Dashboard
Dashboard Overview
The Executive Dashboard provides enterprise-wide visibility into business value, operational/IT health, risk metrics, and data science KPIs for all production models across diverse platforms, languages, and frameworks.
Key Terminology:
Model in Production: Snapshot in state=DEPLOYED to runtime with inProduction=true flag
Open Priority Issue: Notification with severity=CRITICAL or HIGH having attached ticket (JIRA, ServiceNow) with elevated priority
Dashboard Sections
Summary Section:
- Cumulative Value: Sum of dollar amounts from Business KPI fields across all production models
- Daily Inferences: Sum of daily inferences across all production models
- Models in Production: Count of deployed models meeting production criteria
- Open Priority Issues: Count of high-severity notifications with active tickets
Individual Model Status:
- Deployed Models by Business KPI: Model name and production snapshot
- Business Unit: Model organization metadata
- Business KPI: Cumulative business value calculated by Dashboard Model
- Open Priority Issues: Count and 30-day trend
- Heatmap: Red/yellow/green/gray status for key health indicators:
- Characteristic Stability (CSI)
- Performance Monitor (AUC, RMSE)
- Ethical Fairness (proportional parity)
- Data Drift (Kolmogorov-Smirnov p-values)
- Output Integrity
- Concept Drift
- Daily Inferences: Inference count over period
Issues Section:
- Issues by Business Unit: 30-day timeline grouped by organization
- Issues by Type: Percentage breakdown by issue category
Running the Dashboard
Scheduled Execution: Built-in scheduler triggers Dashboard Model regularly. View/modify schedules via Dashboard gear icon > Scheduler.
On-Demand Execution: Click "Fire a Signal" button in scheduler to trigger immediate Dashboard Model run across all production models.
Execution Process:
- Identify all models "deployed in production"
- For each production model:
- Pass model data and assets to Dashboard Model
- Generate metrics and heatmap
- Apply DMN thresholds (dashboard_model.dmn)
- Create notifications for failures/errors
Dashboard Configuration
Prerequisites:
- Dashboard Model deployed
- Dashboard scheduler configured
- Dashboard threshold file (DMN) present
- Business models configured with:
- Extended schema (drift fields, labeled fields, score fields)
- Business value metadata
- Baseline and production comparator data assets
- Production deployment status
Threshold Configuration: Modify dashboard_model.dmn in the Dashboard Model git repository. The decision table defines input metrics (including MAX/MIN aggregations) and output monitor names with color-coded statuses.
Learn more about Executive Dashboard
Integration with ModelOp Modules
Monitoring and Reporting integrates comprehensively with:
- Inventory Module: Monitors added to use cases, implementations, and snapshots
- Orchestration Module: MLC-triggered automated testing and monitoring
- Operations Module: Job execution and runtime management
- Compliance Module: Automated report generation and audit trails