How are Analytic Engines different from “deployment servers”?
Well, did you do last week’s homework? If you used the emerging analytic standards of PFA or PPFA to create an analytic for trajectory planning, please send it to email@example.com. You just might get a job offer! Heck, if you use the open source tools Aurelius or Titus to convert your R or Python models to PFA, we’d love to hear about it and share your work with the growing community of analytic engine users.
This week I want to spend more time on the abstraction I named that executes analytics: analytic engines. Again, if you think of “an analytic” as something separate from the tools, techniques, and people that created it, it naturally leads to the concept of a thing that executes, runs or deploys the analytic. This is similar to how relational databases allow us to “interact with” or “query” information or structured data in a way that is not too tied to a particular language or information philosophy like “object orientation”.
Let’s review: relational databases let you interact with or query information while analytic engines let you run, deploy or execute analytics.
The analogy to relational databases doesn’t end there. As most computer professionals know, relational databases can be run on any type of computer from supercomputers, to laptops, Raspberry Pis, and of course The Cloud. As an aside, I think my morning coffee is migrating to The Cloud along with everything else. Furthermore, relational databases support an almost unlimited number of underlying data storage architectures and implementations: disk, memory, files, raw partitions, and even computer networks.
The analytic engine abstraction provides the same kind of portability, flexibility, and environmental diversity. An ideal analytic engine allows us to run our analytic anywhere with the same safe, repeatable, reliable, and scalable results no matter what the environment. Sounds great, no? But what does an analytic engine allow you to do?
The short answer is to accelerate the “time to value” of analytic projects. Wow, that is some marketing speak! What does that mean? It means that if you spend some time and energy creating analytics that you THINK are valuable to your organization or mission, an analytic engine will help you realize that potential value (or invalidate it) quickly.
I know we talked about pumpkin chunkin’ analytics recently, but now that we are getting into the real value of an analytic engine, let’s discuss more concrete applications like credit card fraud detection or stock market abuse identification. Even with the best fraud-detection “analytic” an organization can create, we still need to “deploy” that analytic into the operational data streams: card transactions and/or market order data. We will also likely need to iterate on the fraud detector quickly, since the bad guys are pretty creative. Deploying easily into data streams and letting deployed analytics change quickly are two of the concrete advantages of using an analytic engine.
Analytic engines are designed to hook into the data streams (like credit card transactions and other environmental data) that the analytic needs to do its best job detecting fraud. Once the engine or engines are hooked up to a data stream, you can deploy hundreds or even thousands of analytics running concurrently for a given application. The power of abstracting the scoring of a model, delivered through an analytic engine, enables you to approach “one click” or even programmatic loading or “deployment” of the analytic and immediately start detecting fraud! And if your data scientists quickly decide to change the analytic in an engine or two, that is easy to do.
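To make the idea of hooking up a stream, deploying several analytics, and swapping one out a little more concrete, here is a minimal sketch in Python. Everything in it is hypothetical and invented for illustration: the ToyEngine class, its deploy/undeploy/run methods, and the two toy fraud rules are not the API of any real analytic engine. It also runs the analytics one after another in a single thread, where a real engine would run them concurrently and at scale.

```python
from typing import Callable, Dict, Iterable, Iterator, Tuple

# An "analytic" here is just any callable that scores one record.
Analytic = Callable[[dict], object]


class ToyEngine:
    """Hypothetical, single-threaded stand-in for an analytic engine."""

    def __init__(self) -> None:
        self._analytics: Dict[str, Analytic] = {}

    def deploy(self, name: str, analytic: Analytic) -> None:
        # Deploying under an existing name replaces the old analytic.
        self._analytics[name] = analytic

    def undeploy(self, name: str) -> None:
        self._analytics.pop(name, None)

    def run(self, stream: Iterable[dict]) -> Iterator[Tuple[str, dict, object]]:
        # Score every record with every deployed analytic.
        for record in stream:
            # Snapshot the registry so a deploy during iteration is safe.
            for name, analytic in list(self._analytics.items()):
                yield name, record, analytic(record)


# Two made-up fraud rules standing in for real models.
def flag_large_amount(tx: dict) -> bool:
    return tx["amount"] > 1000


def flag_foreign(tx: dict) -> bool:
    return tx["country"] != "US"


transactions = [
    {"amount": 50, "country": "US"},
    {"amount": 5000, "country": "FR"},
]

engine = ToyEngine()
engine.deploy("large_amount", flag_large_amount)
engine.deploy("foreign", flag_foreign)
results = list(engine.run(transactions))

# The data scientists change their minds: swap in a new threshold.
engine.deploy("large_amount", lambda tx: tx["amount"] > 10000)
results_after = list(engine.run(transactions))
```

The point of the sketch is that the stream plumbing in run never changes when an analytic does; only the registry entry is replaced.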
Some people ask: “Well, don’t the typical analytic generating tools like SAS, R, and scikit-learn have deployment engines built in?” The answer is yes; most folks who use those packages exclusively have some kind of deployment technique for their models. Typically called “deployment servers”, these systems are tied to a particular tool, as in an “R deployment server”. But there is a dizzying number of analytic creation tools emerging in the market to challenge the big incumbent, SAS. These include offerings from H2O, DataRobot, Nutonian, and GraphLab Create, and more seem to arrive weekly.
Since the details of “deploying” analytics are pretty much the same no matter what tool or technique was used to create the analytic, there is no need to learn the specific but arbitrary nuances of an R-specific or scikit-learn-specific “deployment server”. A general analytic engine is the perfect abstraction to allow maximum flexibility and minimal operational expense while scaling analytic deployments. Analytic engines are uniquely capable of shortening the time to value for all the resources and energy an organization spends creating its special secret sauce analytics!
Are there examples of general analytic engines that support standards to ensure maximum interoperability with all the old and new analytic creation tools and maximum value to the organization? Well, of course there are! My favorite happens to be the one my company created, sells, and supports. But since there is a very clear analytic standard, PFA, you could build one yourself!
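To give a feel for what “build one yourself” might involve, here is a toy sketch in Python. A PFA document is JSON with, among other things, “input”, “output”, and “action” fields, and function calls written as single-key objects such as {"+": ["input", 1]}. The evaluate and make_engine helpers below are hypothetical names invented for this post, and they handle only numeric literals, the input symbol, and four arithmetic functions. They do no type checking and ignore the rest of the standard, so treat this as an illustration of the idea and not a conforming PFA engine.

```python
import json

# A tiny document in the style of PFA: take a double, return 2 * input + 1.
PFA_DOC = json.loads("""
{"input": "double",
 "output": "double",
 "action": [{"+": [{"*": ["input", 2]}, 1]}]}
""")

# Only this handful of functions is supported in the toy.
FUNCTIONS = {
    "+": lambda a, b: a + b,
    "-": lambda a, b: a - b,
    "*": lambda a, b: a * b,
    "/": lambda a, b: a / b,
}


def evaluate(expr, datum):
    """Evaluate one expression from the toy subset against one input datum."""
    if expr == "input":
        return datum
    if isinstance(expr, (int, float)):
        return expr
    if isinstance(expr, dict) and len(expr) == 1:
        (name, args), = expr.items()
        return FUNCTIONS[name](*(evaluate(arg, datum) for arg in args))
    raise ValueError(f"unsupported expression: {expr!r}")


def make_engine(doc):
    """Turn a document into a scoring function (hypothetical helper)."""
    def action(datum):
        result = None
        # Expressions run in order; the last value is the output.
        for expr in doc["action"]:
            result = evaluate(expr, datum)
        return result
    return action


score = make_engine(PFA_DOC)
print(score(3))
```

Because the analytic is just data, the same document could be produced by R, Python, or anything else that can write JSON, which is what makes a general engine possible.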
Next week I’ll introduce some of the things that robust Analytic Engines should focus on and how that in turn leads to thinking hard about the capability we call AnalyticOps.
Written by Stu Bailey