All Hands on Data #60
This week's All Hands on Data takes a step back from generative AI and looks at the data industry at large. We promise to cover our robot overlords again soon!
4 pillars of modern data quality
This piece presents a compelling argument for rethinking our approach to data quality, emphasizing its importance as a business issue rather than solely a technical one. It introduces the pillars of modern data quality, including top-down business KPIs, product thinking, data observability, and overall data governance. By offering practical examples and insights, it helps readers understand how these concepts can be applied to improve the reliability and usability of data within an organization. The shift in perspective it proposes, from data as mere infrastructure to data as a product and an asset, makes it an essential read for anyone interested in optimizing data management practices and strategies. - Jon Davidson
Synthetic data & safety
A little outside of our usual wheelhouse, this article caught my eye given the growing focus on synthetic data as a means of training models while avoiding leaks of personal information. What's interesting here is that the author actually looks at it from a functional perspective: when we generate synthetic data to represent events that haven't actually happened, that's a pretty assumptive leap. - John Forstmeier
What enterprises can learn about data infrastructure from Cruise driverless cars
Amid all the AI hype, this article explains how data is at the core of successfully deploying the models that AI needs. - Angel Catalan
Should we be more data-driven? Sometimes.
I like how Robert talks about a realistic approach to using data. Sometimes you use analytics, and sometimes you go with your instincts. He creates quadrants to help you decide when to use either or both based on two factors: importance and speed needed. - Arynn Martin-Post
Data Modeling 101 - Part 2
This is a great semi-deep-dive into the considerations of any data modeling effort, whether that is in an RDBMS or a Lake House. While the technique and technical approach may differ, depending on the tool or the size of the data set, the general concepts and themes of normalization, data types, constraints, etc., will be the same throughout. - Wes Poulsen
The Right Way to Measure ROI on Data Quality
As the budgets of data teams shrink, it can be easy to overlook the work of data preparation. The ROI on data quality can be harder to calculate than the ROI on ML models and dashboards. Barr does a great job of looking at how you can calculate ROI on data quality in this article. - Steven Johnson