All Hands on Data #48
Welcome into the last AHoD of April! Our team hopes these articles can hold you over until May!
How to Use ChatGPT to Improve Your Data Science Skills
In what is likely becoming a common refrain, the author asks ChatGPT for some advice on how to better navigate the data science field. Honestly, the responses were clear but nothing particularly ground breaking in my humble opinion. And there's one big caveat about asking ChatGPT about recent trends: the model is only trained up to a certain point (2022 I think) meaning it won't actually know any very new information. - John Forstmeier
Stop Managing Infrastructure, Start Managing Data
As we look to work with more enterprise clients and start to see these large companies (like AutoZone) finally moving over into cloud-based architecture. This article definitely ties into the value proposition Shipyard can bring teams - Angel Catalan
Prompt Injection: What's the worst that can happen?
Do you know the security risks of over-relying on LLMs? In this article, Simon explores a lot of ways for prompt injection to occur and lays out what the potential impact could be. Could AI assistants that read your email delete your email if a malicious actor sends you an email whose contents instruct the assistant to do so? You bet. This is a fascinating exploration of things to watch out for and a thought provoking call to action to increase security before we rely too much on these systems. - Blake Burch
Building Better ML Systems - Chapter 1: Every Project Must Start with a Plan
Chernytska lays out how studying for a career in data science is much different than actually having said career. Bottom line... it is easy to get overwhelmed by the chaos out here. So, make a plan! While they are familiar steps for seasoned data scientists, it doesn't hurt to review. One of my favorite steps? Write a design document: we have instituted "no docs, no work" policy here at Shipyard and it has been great. - Katt Baum
The "Brittleness" Problem in Data Pipelines.
This article serves as a good reminder to prioritize data quality and technical debt so that you are not left with "brittle" pipelines that do not run consistently. There are many tools that provide you with robust data quality checks, but even many errors can be eliminated with some basic sanity checks - Wes Poulsen
5 Python Decorators I Use in Almost All My Data Science Projects
When I am working on a Data Science project, I spend the majority of my time just trying to get the model to a point where it does what I want. I barely have time to think about things like decorators. This article shows off 5 decorators that would help save time in your work and help you accomplish tasks more quickly. - Steven Johnson