
Analytic Deployment Stacks and Frameworks (Part 2): Models Abstraction

This is part 2 in a series discussing an approach to effective deployment of analytic models at scale.  You can find part 1 here.  

Our first abstraction, intended to aid the coordination of analytic designers and analytic deployment, is the model.  As an abstract entity, a model has four main components:

  • input (factors)
  • output (scores and other data)
  • state (including initial state, usually trained or fitted to historical data)
  • the math (sometimes called the “scoring function”)

The math can be written or automatically generated in any language: Python, R, Java, etc.  The math is math, which by definition is language neutral.  Generally, the structure of the math (for example, a GBM or a random forest) is templated in a language, and the specific structures and numeric values are filled in by a “fitting tool” like R or Spark, or even by hand by the data scientist.  The math is like a black box: it takes the input (sometimes called factors) and the current state, and generates the output (sometimes called y-hat, score, prediction, outcome, or insight) and potentially some metadata.  This output data may then be passed to another stage in an analytics chain, or used to monitor and report on the process.

We call the application of the input “scoring,” but that’s just technical jargon for taking a given input and a “current state” and applying the “action” of the math to produce an “output” for that particular input and state combination.
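Scoring, in that sense, is just a pure function of the input and the current state.  A hypothetical illustration in Python (the names `score` and `linear` are ours, not any product’s API):

```python
# Scoring as a pure function: the output depends only on the
# input (factors) and the current state.
def score(math, state, factors):
    return math(state, factors)

# Example: a linear model whose state is its fitted coefficients.
state = {"w": 2.0, "b": 1.0}
linear = lambda s, x: s["w"] * x + s["b"]

print(score(linear, state, 3.0))  # 7.0
```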

So let’s look at two implementations of a trained neural net model that fit this abstraction: one articulated in Python and one in R.  First, let’s look at the Python model:

# input: array-double
# output: double

# A Neural Net model
# y = f(W_2 f(W_1 x + b_1) + b_2) where
# x = input vector, y = output vector
# f is the activator function, W_1, W_2 are the weights,
# and b_1, b_2 are the bias vectors
# In this example, the neural net computes XOR of the inputs

import numpy as np
import math

# The initial state

def begin():
    global W_1, W_2, b_1, b_2, f
    f = np.vectorize(activator)
    W_1 = [[-6.0, -8.0], [-25.0, -30.0]]
    b_1 = [4.0, 50.0]
    W_2 = [[-12.0, 30.0]]
    b_2 = -25.0

# The math, with datum as type "input" and output as "yield"
def action(datum):
    x = np.array(datum)
    y = f(np.dot(W_2, f(np.dot(W_1, x) + b_1)) + b_2)
    yield y[0]  # the dot product returns a numpy array with one element

# Supporting functions
# Here we use a sigmoid logistic activator function,
# but you can define your own
def activator(x):
    return 1 / (1 + math.exp(-x))
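A quick harness (ours, not part of the model contract) shows the model computing XOR; the model’s functions are reproduced here so the sketch runs standalone:

```python
import math
import numpy as np

def activator(x):
    return 1 / (1 + math.exp(-x))

def begin():
    global W_1, W_2, b_1, b_2, f
    f = np.vectorize(activator)
    W_1 = [[-6.0, -8.0], [-25.0, -30.0]]
    b_1 = [4.0, 50.0]
    W_2 = [[-12.0, 30.0]]
    b_2 = -25.0

def action(datum):
    x = np.array(datum)
    y = f(np.dot(W_2, f(np.dot(W_1, x) + b_1)) + b_2)
    yield y[0]

# begin() establishes the state; action() scores each input
begin()
for datum in [[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0]]:
    score = next(action(datum))
    print(datum, round(score))  # rounds to XOR of the two inputs
```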

And here is the R model:

# input: array-double
# output: double

# A Neural Net model
# y = f(W_2 f(W_1 x + b_1) + b_2) where
# x = input vector, y = output vector
# f is the activator function, W_1, W_2 are the weights,
# and b_1, b_2 are the bias vectors
# In this example, the neural net computes XOR of the inputs

# The initial state

begin <- function() {
  W_1 <<- matrix(c(-6, -8, -25, -30), nc = 2, byrow = T)
  W_2 <<- matrix(c(-12, 30), nc = 1)
  b_1 <<- c(4, 50)
  b_2 <<- -25
}

# Supporting functions

activator <- function(x) {
  y <- 1 / (1 + exp(-x))
  return(y)
}

# The math, with datum as type "input" and output as "emit"

action <- function(datum) {
  x <- matrix(datum, nc = 1)
  y <- activator(t(W_2) %*% activator((W_1 %*% x) + b_1) + b_2)
  emit(y[[1]])
}

The input “schema” or “type system” can be described with a language-neutral system like Avro Schema:

{"type" : "array", "items" : "double"}

The output typing can be similarly described:

{"type" : "double"}
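As an illustration, a minimal validator for just these two schemas might look like the following (a sketch of ours; a real engine would use a full Avro library):

```python
import json

input_schema = json.loads('{"type": "array", "items": "double"}')
output_schema = json.loads('{"type": "double"}')

def conforms(value, schema):
    # Illustrative check covering only the two schemas above;
    # full Avro supports many more types.
    if schema["type"] == "double":
        return isinstance(value, float)
    if schema["type"] == "array":
        return isinstance(value, list) and all(
            conforms(v, {"type": schema["items"]}) for v in value)
    return False

print(conforms([0.0, 1.0], input_schema))  # True
print(conforms("not an array", input_schema))  # False
```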

In fact, requiring typed inputs and outputs is essential to achieving data science competency at scale.  You can also notice the regularity of structure and code points:

  • begin for the initial state
  • action(datum), where the math is defined over datum, which the “input schema” describes as an array of type double
  • supporting functions for code organization, in this case the activator function of the neural net
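That regularity is what makes a generic runtime possible: any model exposing begin/action can be driven by the same loop.  A minimal sketch (hypothetical, not any particular product’s engine):

```python
# A generic engine loop: it knows nothing about the math, only the
# begin/action contract described above.
def run(begin, action, stream):
    begin()                          # establish the initial state
    scores = []
    for datum in stream:
        scores.extend(action(datum))  # action may yield several outputs
    return scores

# A toy model conforming to the contract: state is a fixed offset.
state = {}

def begin():
    state["offset"] = 10.0

def action(datum):
    yield datum + state["offset"]

print(run(begin, action, [1.0, 2.0]))  # [11.0, 12.0]
```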

It turns out most analytic models can easily be made to conform to this structure.  Moreover, doing so allows them to be executed by an agnostic engine, which abstracts some of the messy DevOps and data engineering details away from this fairly clean math.  We’ll deal with the “streams” abstraction in our next blog.

By Stu Bailey
