The ModelOp Model Monitoring & Reporting Module provides comprehensive quality, risk, and process monitoring throughout the entire model lifecycle—enabling automated testing, continuous performance validation, ethical bias detection, and executive-level visibility across all AI/ML initiatives.
Monitoring Architecture
Core Concepts
ModelOp treats monitors as models themselves, enabling reuse, robust governance, and full auditability of the critical systems ensuring your decisioning assets perform optimally within governance thresholds.
Monitors as Associated Models: Monitors are registered as associated models that can be attached to one or more business model implementations, running on scheduled or triggered intervals.
Decision Tables for Thresholds: ModelOp uses industry-standard decision tables (DMN files) to define operational thresholds. This approach accommodates multiple factors—technical, statistical, business, and metadata—to precisely determine when models operate out of bounds.
Flexible Metrics Selection: Choose from out-of-the-box monitors or create custom monitoring functions using the ModelOp monitoring SDK Python package.
MLC Integration: Monitors integrate with model lifecycles, enabling automated comparison against model-specific thresholds and triggering remediation actions when deviations occur.
Learn more about Monitoring & Reporting Key Concepts
Selecting Evaluation Metrics
Metric selection should reflect the underlying business problem and occur early in model development.
Classification Metrics
Binary Classification with Imbalanced Classes: Accuracy is inadequate when class frequencies are severely unbalanced. Use F1 score or Fβ score (β > 1) to reward true positives and penalize false negatives.
Example: Compliance monitoring where violations are rare but critical to detect.
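As a minimal sketch (scikit-learn, synthetic data), accuracy can look strong on such data while the F2 score exposes poor recall on the rare class:

```python
# Minimal sketch: accuracy vs. F2 score on imbalanced labels (assumes scikit-learn).
# The labels/predictions below are synthetic: violations (1) are rare.
import numpy as np
from sklearn.metrics import accuracy_score, fbeta_score

y_true = np.array([0] * 95 + [1] * 5)
y_pred = np.array([0] * 95 + [1, 0, 0, 0, 0])   # model catches only 1 of 5 violations

print(accuracy_score(y_true, y_pred))           # ~0.96 -- deceptively good
print(fbeta_score(y_true, y_pred, beta=2))      # ~0.24 -- recall on the rare class is poor
```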
Multi-Class Classification: Select metrics appropriate for multiple outcome categories, considering class prevalence and misclassification costs.
Regression Metrics
Mean Absolute Error (MAE): Preferred when a few large errors are acceptable as long as most errors remain small.
Max Error: Use when no error should exceed a particular threshold.
Root Mean Squared Error (RMSE): Interpolates between MAE and max error approaches.
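A minimal side-by-side sketch of these three metrics, assuming scikit-learn and synthetic values:

```python
# Minimal sketch of the regression metrics above (assumes scikit-learn and numpy).
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error, max_error

y_true = np.array([10.0, 12.0, 11.5, 9.0, 30.0])
y_pred = np.array([10.5, 11.0, 11.0, 9.5, 20.0])    # one large miss, otherwise small errors

mae  = mean_absolute_error(y_true, y_pred)           # 2.5 -- tolerant of the single large error
mx   = max_error(y_true, y_pred)                     # 10.0 -- flags the worst-case error
rmse = np.sqrt(mean_squared_error(y_true, y_pred))   # ~4.5 -- sits between the two behaviors
```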
Interpretability Metrics
SHAP Values: Per-record explanations justifying individual predictions, integrated into scoring functions.
ROC Curve: Visualization of the true positive rate against the false positive rate across classification thresholds.
AUC (Area Under ROC Curve): Single-number summary of model discrimination ability.
Beyond statistical metrics, consider inference speed, interpretability, and operational constraints when promoting models to production.
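As a concrete illustration of the SHAP item above, a hedged sketch (assuming the shap package, a fitted tree-based model, and a feature frame, none of which are defined here) of generating per-record explanations at scoring time:

```python
# Hedged sketch: per-record SHAP explanations produced alongside model scores.
# Assumes a fitted tree-based model `model`, a feature DataFrame `X`, and the
# shap package -- none of these are defined in this document.
import shap

explainer = shap.TreeExplainer(model)
shap_values = explainer.shap_values(X)   # one row of feature contributions per record

scored = X.copy()
scored["score"] = model.predict(X)
# Row i of shap_values justifies the score for record i; persisting it with the
# score lets downstream interpretability monitors report on individual predictions.
```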
Learn more about Monitoring & Reporting Key Concepts
Out-of-the-Box Monitors
ModelOp ships with comprehensive pre-built monitors covering traditional ML, NLP, and LLM models:
Traditional ML Monitors
- Performance metrics (accuracy, precision, recall, F1, AUC, RMSE)
- Stability monitoring (Characteristic Stability Index)
- Drift detection (data drift, concept drift)
- Normality testing
- Linearity assessment
- Autocorrelation analysis
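As a rough illustration of what a data drift monitor computes (a sketch under assumptions, not the out-of-the-box implementation), a per-feature two-sample Kolmogorov-Smirnov check using scipy:

```python
# Illustrative only -- not the out-of-the-box monitor's implementation.
# Assumes pandas DataFrames `baseline` and `production` with matching numeric columns.
import pandas as pd
from scipy.stats import ks_2samp

def data_drift_report(baseline: pd.DataFrame, production: pd.DataFrame, alpha: float = 0.05) -> dict:
    """Flag features whose production distribution differs from the baseline (KS test)."""
    report = {}
    for col in baseline.columns:
        stat, p_value = ks_2samp(baseline[col].dropna(), production[col].dropna())
        report[col] = {"ks_statistic": stat, "p_value": p_value, "drifted": p_value < alpha}
    return report
```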
NLP Monitors
- PII detection
- Sentiment analysis
- Top words by parts of speech
- SBERT similarity
LLM Monitors
- Prompt file validation
- PII detection in responses
- Sentiment analysis
- Fact checking
- Accuracy assessment
- SBERT similarity
- Rails compliance validation
- Bias detection in responses
All monitors can be customized via the ModelOp monitoring Python package.
Learn more about Out-of-the-Box Tests & Monitors
Access ModelOp Monitoring Package Documentation (subscribed customers only)
Preparing Models for Monitoring
Required Assets
Model-Specific Data Sets: Training/baseline data, test data, holdout data, and production data sets with timestamps enabling trend analysis over time.
Extended Schema: Defines input features and outputs with roles, types, and attributes for each field. Critical for enabling most tests and monitors.
Learn more about Adding/Managing Model Assets
Learn more about Adding/Managing Model Schemas
Adding Monitors to Models
Monitor Addition Wizard
- Import Monitor: Ensure desired monitor is imported and snapshotted in ModelOp Center (viewable in Inventory > Monitors)
- Navigate to Implementation: Select the business model snapshot for monitoring
- Launch Wizard: Click "Add Monitor" in the Monitoring tab
- Choose Monitor: Select specific monitoring model from available options
- Select Version: Choose monitor snapshot (version) for reusability and auditability
- Add Input Assets: Upload files, reference external storage (S3, HDFS, Azure Blob, Google Cloud Storage), specify SQL statements, use REST endpoints, or select existing model assets
- Configure Thresholds (Optional): Add decision table (DMN file) for automated threshold evaluation. Generate new DMN or select existing file for editing via DMN builder
- Set Schedule (Optional): Configure automated monitoring job frequency using wizard or cron expressions. Supports integration with enterprise schedulers (Control-M, AutoSys, SSIS) via API
- Define Parameters (Optional): Pass dynamic input parameters for scheduled monitoring jobs
- Review and Save: Confirm configuration and save monitor
Dynamic Parameters for Monitoring
Use placeholder notation for time-based data retrieval:
REST Assets: Add ${PARAMETER_NAME} to Query Params or Form Data
SQL Assets: Add ${PARAMETER_NAME} to SQL Parameters
URL Assets: Add $(PARAMETER_NAME) to the URL (note the parentheses)
Pre-loaded Variables:
- ${QUARTER_START}, ${QUARTER_END}
- ${MONTH_START}, ${MONTH_END}, ${MONTH} (full word)
- ${WEEK_START}, ${WEEK_END}
- ${DAYS_AGO_30}
- ${TODAY} or ${CURRENT_DAY}
- Custom date format: ASSET_REPLACE_FORMAT variable
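For example, a SQL asset behind a monthly monitor might filter with score_date BETWEEN '${MONTH_START}' AND '${MONTH_END}' (an illustrative fragment, not a documented query), so each scheduled run automatically pulls only the current month's records.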
Learn more about Adding a Monitor
Running Monitors
On-Demand Execution (UI)
Navigate to the business model snapshot page and click the play button next to the desired monitor. The monitoring job starts immediately and redirects you to the job page.
Scheduled Execution
Monitors run automatically based on configured schedules using MLC processes listening for scheduled signals.
Custom Scheduling (Enterprise Schedulers)
Integrate with Control-M, Airflow, AutoSys, or other enterprise schedulers:
- Obtain business model snapshot UUID (from snapshot page URL)
- Obtain monitoring model snapshot UUID (from Inventory > Monitors)
- Configure REST call to ModelOp Center:
- Obtain valid auth token
- POST to <MOC_INSTANCE_URL>/mlc-service/rest/signal
- Include MLC signal name and model UUIDs in request body
Example CURL Command:
```bash
curl 'https://mocinstance.modelop.center/mlc-service/rest/signalResponsive' \
  -H 'Accept: application/json, text/plain, */*' \
  -H 'Content-Type: application/json' \
  -X POST \
  -H 'Authorization: Bearer <token>' \
  --data-raw '{"name":"com.modelop.mlc.definitions.Signals_Run_Associated_Model_Jira","variables":{"DEPLOYABLE_MODEL_ID":{"value":"<business-model-uuid>"},"ASSOCIATED_MODEL_ID":{"value":"<monitor-uuid>"}}}'
```
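For schedulers that integrate more naturally through Python (Airflow, for example), the same signal can be posted with the requests library; a hedged sketch with the URL, token, and UUIDs left as placeholders:

```python
# Sketch of the same signal call from Python (e.g., inside an Airflow task).
# The URL, token, and UUIDs below are placeholders you must supply.
import requests

payload = {
    "name": "com.modelop.mlc.definitions.Signals_Run_Associated_Model_Jira",
    "variables": {
        "DEPLOYABLE_MODEL_ID": {"value": "<business-model-uuid>"},
        "ASSOCIATED_MODEL_ID": {"value": "<monitor-uuid>"},
    },
}
response = requests.post(
    "https://mocinstance.modelop.center/mlc-service/rest/signalResponsive",
    headers={"Authorization": "Bearer <token>", "Content-Type": "application/json"},
    json=payload,
    timeout=60,
)
response.raise_for_status()
```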
Monitor Execution Process
Once triggered, the MLC executes:
- Prepares monitoring job with required artifacts
- Creates and executes monitoring job
- Parses results into viewable test results
- Compares results against decision table thresholds
- Takes action (notifications, JIRA/ServiceNow tickets)
Learn more about Running a Monitor
Monitoring Results and Notifications
Test Results Display
Visual Elements:
- Summary Metrics: Key metrics as key:value pairs for quick reference
- Bias/Disparity Metrics: Group-level fairness measurements
- Group Metrics: Performance by protected class (TPR, TNR, FPR, FNR, precision, etc.)
- Charts and Graphs: Line graphs, bar charts, tables, scatter plots as configured in monitor output
Raw Results: Expandable/collapsible JSON representation accessible via Raw Results tab. Download complete JSON file from Jobs tab > job details > Outputs.
Notifications
Notifications appear under corresponding model test results with severity indicators. For configured ticketing systems (Jira, ServiceNow), ERROR-severity issues automatically create tickets with direct links.
Notification Types:
- Alerts: Test failures, model errors, runtime issues requiring response
- Tasks: User actions like approvals or acknowledgments
- System Notifications: Status updates, runtime information
Learn more about Running a Monitor
Champion/Challenger Model Comparison
Compare performance of different models or versions to determine optimal production candidates.
Comparison Workflows
Compare Distinct Model Implementations:
- Navigate to Inventory > Implementations
- Enable "Champion/Challenger" toggle
- Select models using checkboxes
- Click "Compare Models"
Compare Versions of Same Implementation:
- Open specific model implementation page
- Click "Enable champion challenger" in Snapshots section
- Select versions and click "Compare Models"
Comparison Wizard
- Add Models (Optional): Include additional models for comparison
- Select Snapshots: Choose specific versions using "+" button
- Select Test Results: Choose test results for comparison
- View Side-by-Side: Analyze comparative metrics in comparison screen
This capability supports evaluating different methodologies, approaches, frameworks, or languages against each other or validating that candidate models outperform currently deployed versions.
Learn more about Champion/Challenger Model Comparison
Building Custom Monitors
Metrics Function
Define custom metrics via the Metrics Function in the model source code. Mark the function with the # modelop.metrics smart tag, or select it in Command Center after registration.
The Metrics Function executes against a batch of records and yields JSON of the form {"metric_1": <value_1>, ..., "metric_n": <value_n>}. These values populate the Test Results visuals in the UI.
Python Custom Monitor Example
```python
import sklearn.metrics

# Note: lasso_model_artifacts, generate_predictions, and matrix_to_dicts are
# defined elsewhere in the model source (e.g., in the init function).

# modelop.metrics
def metrics(x):
    # Load model artifacts
    lasso_model = lasso_model_artifacts['lasso_model']

    # Preprocess and generate predictions
    actuals = x.flagged
    predictions = generate_predictions(x)

    # Calculate metrics
    confusion_matrix = sklearn.metrics.confusion_matrix(actuals, predictions)
    fpr, tpr, thres = sklearn.metrics.roc_curve(actuals, predictions)
    auc_val = sklearn.metrics.auc(fpr, tpr)
    f2_score = sklearn.metrics.fbeta_score(actuals, predictions, beta=2)

    # Format results
    roc_curve = [{'fpr': point[0], 'tpr': point[1]} for point in zip(fpr, tpr)]
    labels = ['Compliant', 'Non-Compliant']
    cm = matrix_to_dicts(confusion_matrix, labels)
    test_results = dict(
        roc_curve=roc_curve,
        auc=auc_val,
        f2_score=f2_score,
        confusion_matrix=cm
    )
    yield test_results
```
R Custom Monitor Example
```r
# import libraries
library(tidymodels)
library(readr)

# modelop.init
begin <- function() {
    # Initialization steps
}

# modelop.metrics
metrics <- function(data) {
    df <- data.frame(data)
    get_metrics <- metric_set(f_meas, accuracy, sensitivity, specificity, precision)
    output <- get_metrics(data = df, truth = as.factor(label_value),
                          estimate = as.factor(score))
    mtr <- list(PerformanceMetrics = output)
    emit(mtr)
}
```
Custom Monitor Output Formats
Line Graphs (timeseries, generic, decimal):
```json
{
  "time_line_graph": {
    "title": "Example Line Graph - Timeseries Data",
    "x_axis_label": "X Axis",
    "y_axis_label": "Y Axis",
    "data": {
      "data1": [["2023-02-27T20:10:20", 100], ["2023-03-01T20:10:20", 200]],
      "data2": [["2023-02-28T20:10:20", 350], ["2023-03-02T20:10:20", 250]]
    }
  }
}
```
Bar Charts (vertical, horizontal):
```json
{
  "generic_bar_graph": {
    "title": "Example Bar Chart",
    "x_axis_label": "X Axis",
    "y_axis_label": "Y Axis",
    "rotated": false,
    "data": {
      "data1": [1, 2, 3, 4],
      "data2": [4, 3, 2, 1]
    },
    "categories": ["cat1", "cat2", "cat3", "cat4"]
  }
}
```
Tables:
```json
{
  "generic_table": [
    {"data1": 1, "data2": 2, "data3": 3},
    {"data1": 2, "data2": 3, "data3": 4},
    {"data1": 3, "data2": 4, "data3": 5}
  ]
}
```
Additional Formats: Scatter plots, pie charts, donut charts
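A hedged sketch of a custom Metrics Function emitting two of the structures above; the field names mirror the JSON examples in this section, while the values are purely illustrative:

```python
# modelop.metrics
def metrics(df):
    # Illustrative values only; in practice these come from computed monitor results.
    yield {
        "generic_bar_graph": {
            "title": "Example Bar Chart",
            "x_axis_label": "X Axis",
            "y_axis_label": "Y Axis",
            "rotated": False,
            "data": {"data1": [1, 2, 3, 4], "data2": [4, 3, 2, 1]},
            "categories": ["cat1", "cat2", "cat3", "cat4"],
        },
        "generic_table": [
            {"data1": 1, "data2": 2, "data3": 3},
            {"data1": 2, "data2": 3, "data3": 4},
        ],
    }
```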
Learn more about Building a Custom Test or Monitor
Learn more about Monitor Output Structure
Defining Required Assets
Create a required_assets.json file in your monitor repository to enforce asset acquisition:
```json
[
  {
    "usage": "INPUT_ASSET",
    "assetRole": "BASELINE_DATA",
    "metaData": {"description": "First Training Dataset"}
  },
  {
    "usage": "ADDITIONAL_ASSET",
    "name": ".*\\.pkl",
    "filename": "weights/weights.pkl",
    "metaData": {"description": "Weights File"}
  }
]
```
Usage Types:
- INPUT_ASSET: Passed to metric function as argument
- ADDITIONAL_ASSET: Materialized in runtime, available for model loading
Optional Fields: assetRole, name (regex), filename, description
Ethical Bias Monitoring
Bias Detection Approach
Organizations require visibility into prediction formation, particularly regarding unfair or partial results for protected classes. Bias monitors run routinely against labeled and scored data batches, alerting when production metrics deviate beyond pre-set thresholds.
Example: German Credit Data: Gender (reported as status_sex attribute) is a protected attribute. Simply excluding gender from training doesn't guarantee unbiased models—gender may correlate with other attributes like income.
Aequitas Integration
ModelOp leverages Aequitas for bias and fairness metrics given labeled/scored data sets and protected attributes:
Group Metrics: TPR, TNR, FPR, FNR, FOR, FDR, NPV, precision, PPR, PPREV, prevalence by protected class groups
Disparity Metrics: Ratios comparing groups to reference groups (e.g., FPR disparity = female FPR / male FPR)
Example: If females have FPR=0.32 and males have FPR=0.16, the FPR disparity is 2.0, indicating females are twice as likely to be falsely labeled as high-risk.
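A simplified pandas sketch of that disparity calculation (not the Aequitas implementation; the column and group names are illustrative):

```python
# Simplified illustration of a disparity ratio -- not the Aequitas implementation.
# Assumes a DataFrame with illustrative columns: label_value (ground truth),
# score (binary model output), and a protected attribute column such as "sex".
import pandas as pd

def false_positive_rate(group: pd.DataFrame) -> float:
    negatives = group[group["label_value"] == 0]
    return (negatives["score"] == 1).mean() if len(negatives) else float("nan")

def fpr_disparity(df: pd.DataFrame, protected_col: str, reference_group: str) -> pd.Series:
    fpr_by_group = df.groupby(protected_col).apply(false_positive_rate)
    return fpr_by_group / fpr_by_group[reference_group]

# e.g. fpr_disparity(df, "sex", reference_group="male") yields 2.0 for females
# when the female FPR is 0.32 and the male (reference) FPR is 0.16.
```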
Bias Monitor Assumptions
Requirements:
- Extended schema asset for input data
- Binary classification model type
- Categorical protected class features
- Input data with label (ground truth), score (model output), and protected class columns
Numerical Protected Classes: While not supported out-of-the-box, bias monitors can be modified to handle numerical protected classes (e.g., continuous age) by specifying cutoffs to create categorical buckets.
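For example, a hedged sketch (using pandas; the cutoffs are illustrative) of bucketing a continuous age column into categorical groups before it reaches the bias monitor:

```python
# Illustrative bucketing of a numerical protected attribute into categories.
# The cutoffs below are examples only; choose bins appropriate to your use case.
import pandas as pd

def bucket_age(df: pd.DataFrame, age_col: str = "age") -> pd.DataFrame:
    df = df.copy()
    df["age_group"] = pd.cut(
        df[age_col],
        bins=[0, 25, 40, 60, 120],
        labels=["<=25", "26-40", "41-60", "60+"],
    )
    return df
```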
Bias Monitor Execution
- Init Function: Extracts extended input schema from job JSON
- Parameter Setting: Determines protected classes, label column, score column from schema
- Metrics Calculation: Runs Aequitas Bias and/or Group tests for each protected class with automatic reference group selection
- Result Compilation: Appends bias tests to returned results
Monitor Variants:
- Group Metrics Only: TPR, TNR, FOR, FDR, FPR, FNR, NPV, precision, PPR, PPREV, prevalence
- Disparity Metrics Only: All disparity ratios
- Combined: Both group and disparity metrics
Learn more about Running a Monitor
Documentation Generation
Automated Documentation
ModelOp streamlines compliance documentation by automatically populating templates with model metadata, metrics, and test results—saving time while ensuring enterprise-wide consistency.
Supported Document Types
- Model Cards: Markdown (.MD) format using industry-standard templates (Hugging Face extended)
- Microsoft Word Documents: (.DOCX) format with custom enterprise templates
Available Information
Use Case/Model Information: Complete storedModel object metadata
Snapshot Information: Complete deployableModel object metadata
Model Test Results: All monitoring and test result data
ModelOp Tag Structure
Insert placeholders in templates using format: <<modelop.object.field>>
Base Tag Structures:
- Use Case/Model: <<modelop.storedModel>>
- Snapshot: <<modelop.snapshot>>
- Model Test Result: <<modelop.modelTestResult>>
Access Methods:
- Simple: <<modelop.storedModel.name>>
- Nested: <<modelop.storedModel.modelMetaData.custom.element>>
- Indexed: <<modelop.modelTestResult.metric[0]>>
- Mapped: <<modelop.modelTestResult.metric(key)>>
- Combined: <<modelop.modelTestResult.metric.nested[index](key)>>
Generating Tables
- Create custom monitor output with a generic_table field
- Insert a 1x1 table in the Word template
- Add tag: <<modelop.modelTestResult.testResults.(generic_table)>>
Result: Automatic table generation from test results
Generating Graphs
Line Graphs (timeseries, generic, decimal):
- Configure monitor output with appropriate structure
- Add tag: <<modelopgraph.linegraph.generic_line_graph>>
Bar Charts (vertical, horizontal):
- Configure monitor output with bar chart structure
- Add tag: <<modelopgraph.bargraph.generic_bar_graph>>
Learn more about Documentation Generation
Executive Dashboard
Dashboard Overview
The Executive Dashboard provides enterprise-wide visibility into business value, operational/IT health, risk metrics, and data science KPIs for all production models across diverse platforms, languages, and frameworks.
Key Terminology:
Model in Production: Snapshot in state=DEPLOYED to runtime with inProduction=true flag
Open Priority Issue: Notification with severity=CRITICAL or HIGH having attached ticket (JIRA, ServiceNow) with elevated priority
Dashboard Sections
Summary Section:
- Cumulative Value: Sum of dollar amounts from Business KPI fields across all production models
- Daily Inferences: Sum of daily inferences across all production models
- Models in Production: Count of deployed models meeting production criteria
- Open Priority Issues: Count of high-severity notifications with active tickets
Individual Model Status:
- Deployed Models by Business KPI: Model name and production snapshot
- Business Unit: Model organization metadata
- Business KPI: Cumulative business value calculated by Dashboard Model
- Open Priority Issues: Count and 30-day trend
- Heatmap: Red/yellow/green/gray status for key health indicators:
- Characteristic Stability (CSI)
- Performance Monitor (AUC, RMSE)
- Ethical Fairness (proportional parity)
- Data Drift (Kolmogorov-Smirnov p-values)
- Output Integrity
- Concept Drift
- Daily Inferences: Inference count over period
Issues Section:
- Issues by Business Unit: 30-day timeline grouped by organization
- Issues by Type: Percentage breakdown by issue category
Running the Dashboard
Scheduled Execution: Built-in scheduler triggers Dashboard Model regularly. View/modify schedules via Dashboard gear icon > Scheduler.
On-Demand Execution: Click "Fire a Signal" button in scheduler to trigger immediate Dashboard Model run across all production models.
Execution Process:
- Identify all models "deployed in production"
- For each production model:
- Pass model data and assets to Dashboard Model
- Generate metrics and heatmap
- Apply DMN thresholds (dashboard_model.dmn)
- Create notifications for failures/errors
Dashboard Configuration
Prerequisites:
- Dashboard Model deployed
- Dashboard scheduler configured
- Dashboard threshold file (DMN) present
- Business models configured with:
- Extended schema (drift fields, labeled fields, score fields)
- Business value metadata
- Baseline and production comparator data assets
- Production deployment status
Threshold Configuration: Modify dashboard_model.dmn in the Dashboard Model git repository. The decision table defines input metrics (including MAX/MIN aggregations) and output monitor names with color-coded statuses.
Learn more about Executive Dashboard
Integration with ModelOp Modules
Monitoring and Reporting integrates comprehensively with:
- Inventory Module: Monitors added to use cases, implementations, and snapshots
- Orchestration Module: MLC-triggered automated testing and monitoring
- Operations Module: Job execution and runtime management
- Compliance Module: Automated report generation and audit trails