All Hands on Data #30
As we sail deeper into the holiday season, our team wishes you Happy Holidays! Hopefully these articles can help you hold off from opening those presents under the tree.
Goodbye, Data Science
In a refreshingly blunt post, the author digs into some of their issues with the "data scientist" role. Specifically, in their transition to "data engineer", the author chose to focus on data pipelines - a major pain point in the "data scientist" position. Building robust, dependable pipelines is more important to a project than adding another scientist to crunch the numbers. - John Forstmeier
MLOps isn't DevOps for ML!
While the true definition of MLOps (and DataOps, for that matter) might be murky, it's important not to confuse it with the next evolution of DevOps. Instead, take a critical and fresh look at all of the unique components that ultimately result in a flow of Machine Learning models driving the business. Partner with DevOps to understand structure, but don't expect their work to transition 100% to MLOps. - Blake Burch
Check if You've Been Naughty or Nice Based on your Reddit Comments
Ever wonder how Santa can monitor everyone’s actions across the world? And not just actions: he also has to look at text messages and comments on social media. He has to use machine learning, right? Well, even if he doesn’t, a tool now exists that lets you get into the mind of Santa and determine if you’ve been naughty or nice based on your Reddit comment history. - Steven Johnson
How to Build A Data Inventory At Your Organization
I've mentioned this before - but I'm fairly new to the data space. I find it helpful to see overviews of how teams can be structured, what tools they can use, and how they can best make use of their data. If you're a startup looking to make sense of your data - this is a good place to start. - Joseph McDermott
Gambler's Ruin Problem: A Probability Antiquity
I enjoy problems from antiquity and seeing how we can interpret and improve our answers over time. The cool bit here is taking a recursive relationship, applying it to get to a final probability function, and seeing how that relates to memoization in the programmatic solution. - Eric Elsken
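For anyone curious what the recursion-plus-memoization idea can look like, here is a minimal Python sketch. It is not taken from the article. It assumes the classic setup (each round you win $1 with probability p or lose $1 otherwise, and you stop at $0 or at a target amount), and the function names and the max_rounds cap are my own. The raw recurrence P(i) = p*P(i+1) + (1-p)*P(i-1) refers to both neighbors, so a naive recursive call would never terminate. The sketch therefore caps the number of rounds, memoizes on (stake, rounds left), and compares the result with the well-known closed form.

```python
from functools import lru_cache


def win_probability_closed_form(start, target, p):
    """Closed-form chance of reaching `target` before going broke."""
    if start <= 0:
        return 0.0
    if start >= target:
        return 1.0
    if abs(p - 0.5) < 1e-12:
        # Fair game: the probability is linear in the starting stake.
        return start / target
    ratio = (1 - p) / p
    return (1 - ratio ** start) / (1 - ratio ** target)


def win_probability_memoized(start, target, p, max_rounds=300):
    """Approximate the same probability with a memoized recursion.

    Assumption: play stops after `max_rounds` rounds, which makes the
    recursion finite. For small targets the truncation error is tiny.
    """

    @lru_cache(maxsize=None)
    def reach(stake, rounds_left):
        if stake <= 0:
            return 0.0  # ruined
        if stake >= target:
            return 1.0  # hit the target
        if rounds_left == 0:
            return 0.0  # ran out of rounds before reaching the target
        # One-step recurrence: win a dollar with p, lose one with 1 - p.
        return (p * reach(stake + 1, rounds_left - 1)
                + (1 - p) * reach(stake - 1, rounds_left - 1))

    return reach(start, max_rounds)


if __name__ == "__main__":
    for p in (0.5, 0.6):
        exact = win_probability_closed_form(5, 10, p)
        approx = win_probability_memoized(5, 10, p)
        print(f"p={p}: closed form {exact:.6f}, memoized {approx:.6f}")
```

Without the cache, the number of calls roughly doubles with every round. With it, each (stake, rounds left) pair is computed once, so the work grows with target times max_rounds. The cap is kept modest so the recursion stays well under Python's default recursion limit.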
The Role of Data Governance in Data Management
In today's world, where data is so important to a business's success, articles like this are valuable. Data Governance is a key part of managing data, and businesses need to be aware of how best to approach it. - Jon Davidson
4x Faster Pandas Operations with Minimal Code Change
Want a bit more speed? Tang writes a quick and dirty introduction to 'pandarallel' that can help you speed up your data processing in minutes. One quick note: do "not use Pandarallel when the data cannot be fit into memory." - Katt Baum
OpenAI releases Point-E, which is like DALL-E but for 3D modeling
OpenAI has been one of the biggest surprises for me this year. Maybe I was not informed of the space before, but I am now. With DALL-E, ChatGPT, and now Point-E, OpenAI has released some really cool products that have changed the landscape of the AI field, especially for non-technical people. I look forward to playing with Point-E as I did before with DALL-E and ChatGPT. - Steven Johnson