All Hands on Data #90
Welcome to the 90th edition of All Hands on Data! To celebrate Oppenheimer's Oscar wins, we have an article about nuclear energy for you this week. Check it out and more data articles below.
AWS acquires Talen’s nuclear data center campus in Pennsylvania
Data operations are notoriously energy-hungry, and AWS is turning to nuclear power to address that problem. It's really cool to see giants like AWS acknowledge the need to incorporate clean, consistent energy via nuclear power plants in order to continue providing data-intensive services to their customers. Plus, it's a cool PA win, which I'm always happy to see. : ) - John Forstmeier
Solving a Tennis Refactoring Challenge in Python using SOLID
I completely agree with the first point that these types of exercises are becoming highly beneficial for data scientists as their work grows in popularity. Whether intended or not, the best part for me is the test-driven development that comes out of the box in the repo. No matter what refactoring you do, the Python section of the repo provides tests to increase confidence in your changes. - Eric Elsken
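To illustrate the safety net Eric describes, here is a minimal sketch in the spirit of the tennis kata — the scoring helper and its tests are hypothetical stand-ins, not the repo's actual code:

```python
# Hypothetical tennis-scoring helper, plus plain assertions standing in
# for the repo's test suite.
POINT_NAMES = ["Love", "Fifteen", "Thirty", "Forty"]

def score(p1: int, p2: int) -> str:
    """Return the call for a game still in regular play (no deuce/advantage)."""
    if p1 == p2:
        return f"{POINT_NAMES[p1]}-All"
    return f"{POINT_NAMES[p1]}-{POINT_NAMES[p2]}"

# Tests like these let you refactor score() freely and know it still behaves:
assert score(0, 0) == "Love-All"
assert score(2, 1) == "Thirty-Fifteen"
assert score(3, 0) == "Forty-Love"
```

With a suite like this in place, any restructuring of `score` — extracting methods, applying SOLID principles, renaming — is immediately checked against the expected behavior.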
Exploring the Potential of Transfer Learning in Small Data Scenarios
Many teams lack access to, or the resources to gather, the massive amounts of data needed to train models well. Enter transfer learning: taking pre-trained models and using them on smaller datasets for specialization. Davies talks about some of the hurdles to using pre-trained models, but ultimately concludes that there is huge potential in transfer learning, especially as more effort is put into automating the selection and fine-tuning of pre-trained models. - Katt Baum
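The core idea — freeze a pre-trained backbone, train only a small head on your limited data — can be sketched in a few lines of NumPy. Note the "pre-trained" weights here are a fixed random projection standing in for a real learned backbone; everything else is a toy assumption for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for a pre-trained backbone: in practice these weights come from
# training on a large dataset; here a fixed random projection plays that role.
W_backbone = rng.normal(size=(20, 5))

def extract_features(X):
    # "Frozen" backbone -- its weights are never updated during fine-tuning.
    return np.tanh(X @ W_backbone)

def logistic_loss(X, y, w, b):
    p = 1.0 / (1.0 + np.exp(-(extract_features(X) @ w + b)))
    return -(y * np.log(p) + (1 - y) * np.log(1 - p)).mean()

# A small labeled dataset (40 samples) -- too little to train a full model.
X = rng.normal(size=(40, 20))
y = (X[:, 0] > 0).astype(float)

# Fine-tune only a tiny linear head on top of the frozen features.
w, b = np.zeros(5), 0.0
initial = logistic_loss(X, y, w, b)
for _ in range(500):
    feats = extract_features(X)
    p = 1.0 / (1.0 + np.exp(-(feats @ w + b)))
    grad = p - y                      # gradient of the logistic loss
    w -= 0.1 * feats.T @ grad / len(y)
    b -= 0.1 * grad.mean()
final = logistic_loss(X, y, w, b)
```

Because only the six head parameters are trained, even 40 samples are enough to drive the loss down — the expensive representation learning was (notionally) done elsewhere.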
Best Data Engineering Projects for Beginners in 2024
As we look to help data teams make the most of their engineering resources, we also offer an easy platform for folks to get started and sharpen their data engineering skills. This article is a great starting point for folks looking to pick up a new data skill or put some recently acquired knowledge to good use. - Angel Catalan
Snowflake Lowers Cost from 1.5x Usage to 1.2x
This one is worth highlighting: Snowflake's usage costs are going down while costs for other tools (DBT, DB, etc.) are going up exponentially. - Jack Ryan
Using Databricks Notebooks for Production Data Pipelines
This was an interesting read because I find myself in the "notebooks should never see the light of day in a production pipeline" camp. That being said, the author makes a good case for notebooks, emphasizing their strengths and why data scientists are drawn to them in the first place. Additionally, it seems that Databricks notebooks in particular may have some additional safeguards in place to address some of the common pitfalls (like versioning). - Wes Poulsen