All Hands on Data #27
With the group stage coming to a close at the World Cup, we hope you take some time to knock out these articles.
Data-Centric Manifesto
This open manifesto takes an interesting approach: treat data, not applications, as the first-class citizen. In practice, this means data lives open source and independent of any one application, while all applications can read and write to it. That is a massive shift from the current paradigm, where companies hoard data and exploit it to generate returns. Open sourcing the data reduces those returns but considerably lowers costs (construction, architecture, maintenance, etc.) elsewhere in the organization. - John Forstmeier
The Best Methods for One-Hot Encoding Your Data
This is a good refresher on data cleansing for ML, and could probably save you a few lines of code. - Wes Poulsen
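For anyone who wants a quick reminder of what one-hot encoding actually does, here is a minimal sketch in plain Python. It is not taken from the article and does not reproduce its methods; the function name `one_hot_encode` and the sample `colors` list are made up for illustration. Each distinct category gets its own column, and every row has a 1 in exactly one of them.

```python
def one_hot_encode(values):
    """Return (categories, rows); each row is a 0/1 list with one slot per category."""
    # Sort the distinct values so the column order is deterministic.
    categories = sorted(set(values))
    index = {category: i for i, category in enumerate(categories)}

    rows = []
    for value in values:
        row = [0] * len(categories)
        row[index[value]] = 1  # mark the column that matches this value
        rows.append(row)
    return categories, rows


# Hypothetical sample data, for illustration only.
colors = ["red", "green", "blue", "green"]
categories, encoded = one_hot_encode(colors)

print(categories)
for color, row in zip(colors, encoded):
    print(color, row)
```

In real projects a library routine is usually the better choice; this version only shows the idea behind those helpers.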
What Good Data Self-Serve Looks Like
I hate the phrase "democratizing X" - it's super vague. This article uses what is, in my opinion, a much more concrete term: "self-service," specifically around data in a company. The author makes the point that there is no one-size-fits-all approach to creating and maintaining a data self-service platform for internal stakeholders. It takes a lot of work and collaboration to move away from the "ticket in, ticket out" work style currently typical of a data team. - John Forstmeier
Would Biden's Proposed AI 'Bill of Rights' Be Effective - Or Just More Virtue Signaling?
Not to get political here - but with how quickly technology is advancing, the way we govern and oversee tech companies and products needs to keep up as well. While I do think we need to come up with a better way to regulate tech, and especially AI, it's going to be hard to come up with something that all parties involved can agree on. Do you think an AI Bill of Rights is the way to go? - Joseph McDermott
The Complete Data Engineering Study Roadmap
This article presents a good strategy for those who are looking to transition into data engineering and don't know where to start. I feel that the most important thing listed is projects; having a personal portfolio of data pipelines is perhaps the best way to demonstrate an understanding of the concepts. - Wes Poulsen
5 Things to Know Before Using Snowflake's Native Data Classification
It is increasingly important to properly protect PII (Personally Identifiable Information). One of the first steps is to classify the data so it is easy to pinpoint and safeguard. Briki explains how Snowflake's Data Classification can help with this task and how to get started using it. - Katt Baum
Predicting Group B of the World Cup with Data
Although the group just ended, this article still has some interesting observations that can be extended to the knockout phase. Check out how to combine Shipyard and Hex to illustrate findings and help determine outcomes. - Steven Johnson