
How To Score Big Data At Scale

Modern businesses leverage analytics to gain insights in a multitude of areas, from evaluating business performance to predicting future behaviors. In many industries, these insights are quantified numerically as “scores,” and the process of applying an analytic model to transform a collection of data into scores is called “scoring.”

Often, the more data that’s available to train the model, the more accurate and valuable the model will be. In today’s world, businesses have more data than ever before. While more data typically leads to more valuable models, an increasing quantity of data can strain many traditional end-to-end model scoring processes. In this blog post, we’ll look at how to overcome this problem by building scalable, big data-ready model deployment and scoring architectures.

Before we discuss how to build scalable, big data-ready model deployment and scoring architectures, we need to understand exactly what scoring is. The best example to keep in mind is a credit score. The model takes in an individual’s demographic information and credit history, and then produces a number that represents how safe it is to lend them money. In different industries, scores can represent different things, but the process is more complicated than “multiplying numbers and sending them off to be scored.”

Now that we understand what “scoring” is, let’s go through a few of the challenges that a company often faces when scoring with big data. One of the biggest obstacles in working with big data arises in a model’s training phase. If the training dataset is too large to fit into memory, then the modeler has to get creative about how to train the model. For example, the modeler can train the model on subsets of the data and merge the results. Depending on the type of model, this may not even be possible.
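To make the "train on subsets and merge" idea concrete, here is a minimal sketch for one model type where the merge happens to be exact: a one-feature least-squares fit, which depends only on a handful of running sums. The `Stats` class, the `chunks` helper, and the toy data are all assumptions made for illustration, not part of any particular product or library.

```python
from dataclasses import dataclass


@dataclass
class Stats:
    """Running sums that fully determine a one-feature least-squares fit."""
    n: int = 0
    sx: float = 0.0
    sy: float = 0.0
    sxx: float = 0.0
    sxy: float = 0.0

    def update(self, chunk):
        # Accumulate the sums for one in-memory chunk of (x, y) rows.
        for x, y in chunk:
            self.n += 1
            self.sx += x
            self.sy += y
            self.sxx += x * x
            self.sxy += x * y
        return self

    def merge(self, other):
        # Merging two partial results is just adding their sums.
        return Stats(self.n + other.n, self.sx + other.sx, self.sy + other.sy,
                     self.sxx + other.sxx, self.sxy + other.sxy)

    def fit(self):
        # Closed-form slope and intercept from the accumulated sums.
        denom = self.n * self.sxx - self.sx ** 2
        slope = (self.n * self.sxy - self.sx * self.sy) / denom
        intercept = (self.sy - slope * self.sx) / self.n
        return slope, intercept


def chunks(rows, size):
    # Hypothetical stand-in for reading a large dataset piece by piece.
    for i in range(0, len(rows), size):
        yield rows[i:i + size]


rows = [(float(x), 3.0 * x + 1.0) for x in range(10)]  # toy data: y = 3x + 1

merged = Stats()
for chunk in chunks(rows, 4):
    merged = merged.merge(Stats().update(chunk))

slope, intercept = merged.fit()
full_slope, full_intercept = Stats().update(rows).fit()
```

Here the chunked fit and the all-at-once fit agree because the model depends on the data only through sums. Models without such a decomposition are the ones for which this strategy may not be possible.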

Once the model has been trained and is ready for deployment, the next big data challenge appears. The modeler has to ask themselves, “What if the model needs to score a large quantity of data?”

The best-case scenario at this stage is that the model can be deployed in a “streaming” fashion. That is, mathematically, the model satisfies a couple of properties:

  1. The model produces scores “one record at a time”, which means that the model can be run on arbitrarily small subsets of the input data and the resulting scores are still valid, and
  2. Ideally, the score produced for each individual record is independent of the scores of other records in the set and of the order in which the inputs are received.
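As a rough illustration of these two properties, the sketch below scores records with a tiny logistic-regression-style function. The feature names, weights, and bias are made up for the example; the point is only that scoring records all together, one at a time, or in a different order yields the same score for each record.

```python
import math

# Hypothetical, hand-picked parameters; a real model would learn these.
WEIGHTS = {"income": 0.8, "late_payments": -1.2}
BIAS = -0.5


def score(record):
    """Score a single record; uses nothing but that record and fixed parameters."""
    z = BIAS + sum(w * record[name] for name, w in WEIGHTS.items())
    return 1.0 / (1.0 + math.exp(-z))


def score_stream(records):
    """Yield one score per record, holding only one record at a time."""
    for record in records:
        yield score(record)


records = [
    {"income": 1.0, "late_payments": 0.0},
    {"income": 0.2, "late_payments": 2.0},
    {"income": 0.5, "late_payments": 1.0},
]

all_at_once = list(score_stream(records))
one_by_one = [next(score_stream([r])) for r in records]        # property 1
reversed_scores = list(score_stream(reversed(records)))[::-1]  # property 2
```

A model that keeps state between records, such as a recurrent neural net or a running average, would not pass this kind of check without extra state management.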

Both of these properties hold for many common types of models, such as gradient boosting machines, logistic and linear regression models, and (non-recurrent) neural nets. If a model satisfies both of these properties, then the burden of scaling the model across large input datasets falls squarely on the infrastructure and deployment engine, rather than on the model itself. Multiple instances of these models can then be run concurrently to handle high data throughputs. Because only small subsets of the input data are ever loaded into memory, the models’ memory footprint stays small.
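The sketch below shows one way an infrastructure layer might exploit these properties: split the input into partitions, let each worker read and score its own partition, and stitch the results back together. Everything here is an assumption for illustration. The `score` function is a stand-in model, `load_partition` is a stand-in for lazily reading a slice of a large dataset, and a thread pool is used only to keep the example short; a production system would more likely use separate processes or containers.

```python
from concurrent.futures import ThreadPoolExecutor


def score(x):
    """Hypothetical stand-in model: scores one record independently of all others."""
    return 2.0 * x + 1.0


def load_partition(start, stop):
    """Hypothetical stand-in for lazily reading records start..stop-1 from storage."""
    for i in range(start, stop):
        yield float(i)


def score_partition(bounds):
    # Each worker reads only its own partition, one record at a time.
    start, stop = bounds
    return [score(x) for x in load_partition(start, stop)]


def score_concurrently(total, partition_size, workers=4):
    # Only small (start, stop) descriptors are built up front, not the data.
    partitions = [(s, min(s + partition_size, total))
                  for s in range(0, total, partition_size)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() returns results in the same order as the partitions.
        results = pool.map(score_partition, partitions)
        return [s for part in results for s in part]


concurrent_scores = score_concurrently(total=10, partition_size=3)
sequential_scores = [score(x) for x in load_partition(0, 10)]
```

Because each score depends only on its own record, the concurrent and sequential runs produce identical results, and the model code itself did not have to change to be parallelized.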

At Open Data Group, we understand the importance of scoring big datasets in a way that is both fast and scalable, which is why we created FastScore. FastScore supports running models in high-throughput enterprise streaming data platforms (such as Kafka), and provides push-button concurrency and scaling. The FastScore scoring engine also supports advanced state sharing and state management functionality, which allows even typically single-threaded R models to be easily run concurrently and in a streaming manner. Because FastScore is packaged as a Docker container, the engine (and your models) can be deployed anywhere: on your physical hardware, in the cloud, or any combination thereof, and scaled with demand. To learn more about how to use FastScore to overcome your big data analytic challenges, click here!
