
Histograms and High-Level Languages at StrangeLoop

This year’s StrangeLoop conference is less than a week away and I’m psyched. This oddly named meeting sits at the intersection of an unusual blend of topics, including distributed systems, languages, and data science. It would be a natural place for me to talk about PFA, which covers all three, but instead I decided to talk about something new: a language of histogram aggregation called Histo·grammar.

Histo·grammar arose from trying to fit together two conflicting philosophies of how to aggregate data. Histograms are the bread and butter of my first field of study, high energy physics, and high energy physics software views histograms as objects to be filled, like lists in LISP or dictionaries in Python. Non-physics analysis software views histograms as the statistical abstractions they technically are: approximations of a dataset’s distribution. Physics code scales well because a fixed-size histogram can keep accumulating data in place indefinitely, but it is cumbersome to use in a functional paradigm like Apache Spark. Non-physics histogram APIs are restrictive in how they let you add or access the aggregated data. The key to getting the best of both is to keep the idea of a histogram as a container, but make it a functional container that knows how to fill itself.
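To make the "container that knows how to fill itself" idea concrete, here is a minimal Python sketch. It is not the Histo·grammar API: the `Hist` class, its fields, the `aggregate` helper, and the `pt` field in the sample data are all hypothetical names chosen for illustration. The histogram carries its own fill rule (a function from a datum to a number), and two partial histograms can be added together, which is what lets it slot into a map-and-reduce style of aggregation.

```python
from dataclasses import dataclass, field
from functools import reduce
from typing import Callable, List


@dataclass
class Hist:
    """Hypothetical fixed-width histogram that carries its own fill rule."""
    num: int                           # number of bins
    low: float                         # inclusive lower edge
    high: float                        # exclusive upper edge
    quantity: Callable[[dict], float]  # how to extract a number from a datum
    counts: List[float] = field(default_factory=list)

    def __post_init__(self):
        if not self.counts:
            self.counts = [0.0] * self.num

    def fill(self, datum):
        """Physics-style in-place fill; out-of-range data are ignored."""
        x = self.quantity(datum)
        if self.low <= x < self.high:
            index = int((x - self.low) / (self.high - self.low) * self.num)
            self.counts[min(index, self.num - 1)] += 1.0
        return self

    def __add__(self, other):
        """Functional-style merge of two partial results into a new Hist."""
        summed = [a + b for a, b in zip(self.counts, other.counts)]
        return Hist(self.num, self.low, self.high, self.quantity, summed)


def aggregate(partition):
    """Fill a fresh histogram from one partition of the data."""
    hist = Hist(5, 0.0, 10.0, lambda d: d["pt"])
    for datum in partition:
        hist.fill(datum)
    return hist


# Two partitions stand in for data spread across workers (made-up values).
partitions = [
    [{"pt": 1.0}, {"pt": 3.5}],
    [{"pt": 3.9}, {"pt": 12.0}],
]
total = reduce(lambda a, b: a + b, (aggregate(p) for p in partitions))
print(total.counts)
```

Each partition is filled in place, the way physics code would do it, while the merge step returns a new object, which is the shape a functional framework expects from a reduce operation.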

To non-physicists, my focus on histograms might seem narrow: after all, isn’t a histogram just one type of chart? According to the statistician’s definition, yes, but the ways physicists have used (abused?) histogram-filling software over the past forty years have led to much, much more. Histo·grammar makes this generality explicit by splitting the histogram into its constituent atoms: composable primitives of data aggregation that can be used to build a statistician’s histogram and many other aggregate structures.
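One way to picture these "atoms" is as small aggregators that nest inside one another. The sketch below is an illustration under my own assumptions, not Histo·grammar's actual interface: the class names `Count`, `Average`, and `Bin`, their constructor arguments, and the sample fields `x` and `y` are hypothetical. Binning by `x` and counting in each bin gives the statistician's histogram; swapping the counter for an averager of `y` gives a different aggregate structure from the same pieces.

```python
class Count:
    """Atom: counts the data it is given."""
    def __init__(self):
        self.entries = 0.0

    def fill(self, datum):
        self.entries += 1.0


class Average:
    """Atom: tracks the mean of a quantity extracted from each datum."""
    def __init__(self, quantity):
        self.quantity = quantity
        self.entries = 0.0
        self.total = 0.0

    def fill(self, datum):
        self.entries += 1.0
        self.total += self.quantity(datum)

    @property
    def mean(self):
        return self.total / self.entries if self.entries else float("nan")


class Bin:
    """Atom: routes each datum to one of `num` sub-aggregators by a quantity."""
    def __init__(self, num, low, high, quantity, make_value):
        self.num, self.low, self.high = num, low, high
        self.quantity = quantity
        self.values = [make_value() for _ in range(num)]

    def fill(self, datum):
        x = self.quantity(datum)
        if self.low <= x < self.high:
            index = int((x - self.low) / (self.high - self.low) * self.num)
            self.values[min(index, self.num - 1)].fill(datum)


# Made-up sample data for illustration.
events = [{"x": 0.5, "y": 10.0}, {"x": 0.7, "y": 20.0}, {"x": 2.5, "y": 7.0}]

# Bin + Count: an ordinary histogram of x.
histogram = Bin(3, 0.0, 3.0, lambda d: d["x"], Count)

# Bin + Average: the mean of y in each slice of x, built from the same Bin.
profile = Bin(3, 0.0, 3.0, lambda d: d["x"], lambda: Average(lambda d: d["y"]))

for event in events:
    histogram.fill(event)
    profile.fill(event)

print([v.entries for v in histogram.values])
print([v.mean for v in profile.values])
```

The design choice worth noticing is that `Bin` knows nothing about what it contains; anything with a `fill` method can be nested inside it, including another `Bin`.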

As datasets get larger in all fields, having a way to summarize them with complex aggregations will be increasingly important. I’ll show how the same declarative language can slice and dice data in HDFS, can be JIT-compiled for blazing speed, and can even be parallelized across vector devices like GPUs.

Around the time I was developing PFA, someone asked me if it was a big transition from particle physics to data science. I said no, because particle physics is the most industrial field in academia and data science is the most academic field in industry. Conferences like StrangeLoop prove this point, in that philosophical musings on some esoteric language can be followed by the next big software stack. If you’ll be there, I’m the guy with the long, scraggly beard (non-unique identifier?) and would love to hear your latest great idea.

A link to an overview of my talk can be found here.

Written by Jim Pivarski
