All Hands on Data #65
Machine Learning, container orchestration, DuckDB for Data Engineering, Python skills, and more environmentally friendly office spaces: there's a wide array of topics this week!
Docker Swarm vs Kubernetes: how to choose a container orchestration tool
“For those looking to set up an orchestration tool for their containerized apps, this article provides a good comparison. At a high level, the piece presents Docker Swarm as the easier, beginner-friendly tool and Kubernetes as the more advanced one. Both are solid choices if you're rolling your own containers, but if you just need to get some scripts running together, Shipyard might have some thoughts on an alternative.” - John Forstmeier
To Use or Not to Use Machine Learning
“Machine Learning is not always the best option. While ML is powerful, it's important to ask questions before assuming you need it. The author explores the key factors to think about when determining if you need ML: data quality and quantity, labels, deployments, stakes, ethical considerations, and explainability.” - Arynn Martin-Post
10 killer Python automation scripts
“Our team at Shipyard is all about automating everything we can. This article runs through some Python modules that let you automate video and audio edits, along with a few other fun tasks.” - Steven Johnson
Machine Learning Engineers - What Do They Actually Do?
“Kirmer talks about one of the newer job titles on the market: Machine Learning Engineer (MLE). This article raises some thought-provoking questions: Is the emphasis on the ML or the E? Who is qualified to be an MLE, and how will they be compensated? How do professional ecosystems shift as more underrepresented folks fill positions in the overlapping fields of Data and Engineering?” - Katt Baum
Data-driven solutions to creating a net-zero office space
“Net-zero offices are those that generate emissions equal to or less than what they remove from the atmosphere. With data-driven decision making, offices can identify and address unnecessary energy consumption, increasing energy efficiency and decreasing emissions. Data can be a huge driving force in helping office spaces become greener!” - Reed Cowan
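As a rough illustration of what "identifying unnecessary energy consumption" can look like in practice, here is a minimal sketch that flags hours when an office uses more than an idle baseline while it is closed. Everything here is a hypothetical assumption for illustration: the `Reading` structure, the `flag_after_hours_usage` helper, the 8:00-18:00 occupancy window, the 1.0 kWh baseline, and the sample meter data are not taken from the linked article.

```python
from dataclasses import dataclass


@dataclass
class Reading:
    hour: int   # hour of day, 0-23
    kwh: float  # energy used during that hour


def flag_after_hours_usage(readings, open_hour=8, close_hour=18, baseline_kwh=1.0):
    """Return readings taken outside occupied hours whose usage exceeds the
    expected idle (baseline) load. All thresholds are illustrative assumptions."""
    flagged = []
    for r in readings:
        occupied = open_hour <= r.hour < close_hour
        if not occupied and r.kwh > baseline_kwh:
            flagged.append(r)
    return flagged


# Hypothetical hourly meter data for one day (only a few hours shown).
readings = [Reading(7, 0.8), Reading(9, 5.2), Reading(19, 3.1), Reading(22, 0.9)]
flagged = flag_after_hours_usage(readings)
for r in flagged:
    print(f"{r.hour}:00 used {r.kwh} kWh while the office was closed")
```

In a real building, the baseline and occupancy window would come from the data itself (for example, typical overnight load), and flagged hours would point to equipment or HVAC left running.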
DuckDB for Data Engineering
“This is a very succinct yet comprehensive view of what makes DuckDB so attractive, as well as some of its potential shortfalls. With its simplicity, speed, and common vernacular (SQL), it's no wonder DuckDB is popular in the data space. As an in-process database, DuckDB is a fabulous alternative to data tools like pandas, MySQL, Postgres, etc. But be wary of using it to build full-fledged pipelines: this may lead to complicated SQL scripts that will likely increase the overall complexity of the project. In short, I agree with the closing sentence:
I think DuckDB is the perfect tool to enhance and replace SOME parts of the Data Engineering pipeline tech stack. It’s SQL. Fast. Lightweight.” - Wes Poulsen