All Hands on Data #73
Heard that data's the new oil? Turns out, it might be the new water—vital yet plentiful. Dive in to uncover AI trends, data debates, Spark's possible adieu, and more.
Anaconda's State of Data Science Report Reveals Surge in AI Upskilling Among Data and IT Professionals
The "State of Data Science" report by Anaconda Inc. highlights the rapid integration of generative AI in the industry, with many companies adopting this technology. However, this swift change has led to job security concerns among IT and data science professionals. Concurrently, while open-source software is popular among data scientists, there's a significant gap in confidently identifying its vulnerabilities, posing potential risks. - Johnathan Rodriguez
Is Data Still a Moat?
While companies previously went all-in on the "data is the new oil" movement, the rise of LLMs poses some challenges to that position. Sure, having that data now may let you train a model, BUT there is empirical evidence that you don't actually need that much data to produce a useful fine-tuned tool. Whoopsie. - John Forstmeier
Goodbye Spark. Hello Polars + Delta Lake.
The article makes the case that Spark has been used over the last decade out of necessity, because the alternative tooling was inadequate. That being said, many use cases of Spark are overkill and just add overhead and complexity to the project. With Polars, the need for performance is met while keeping things relatively simple and not having to spin up a Spark cluster. - Wes Poulsen
Meta Enters The AI Race With An "Open" Approach
I'm encouraged to see that a large company has come out with an open-ish source model that is gaining a lot of traction. In the open vs. closed model debate, I don't think either side will ever win, but I also don't think Meta's version of open-until-it-isn't will be the open source hero that small AI practitioners need it to be. - Eric Elsken
From pipelines to platforms
In a world where data engineers are a hot, limited commodity, Robert shares his data team's setup to automate data engineering efforts and create a "data flywheel". There's some clever team structuring and process setup in here! - Blake Burch
Self-Service Data Not Quite a Reality Yet, Capital One Software Says
It seems I can't go a day without hearing about "self-service data." From a vendor perspective, some claim they can make it a reality today. Most, in my experience, are skeptical. This article interests me as, shocker, it shares reasons why "self-service data" isn't happening for most, and the variety of changes that need to happen for it to be not only made a reality, but also leveraged. - Shawn Fergus