24 Moments to Capture Metadata - A Guide for Data Engineers, Data Analysts, and Data Scientists
Posted on May 30, 2024 by Zac
Every interaction with data is an opportunity to collect metadata. Let's consider this principle from two perspectives: the platforms and tools (everybody else) versus you and your team.
Metadata Collection by Tools
Data platforms and tools already collect metadata in almost every facet of their operation. When you view a Tableau dashboard, it records the view, who the user is, and how long the dashboard took to load. When you query a database, it logs the query and its execution time. Both are examples of usage metadata. The builders of these tools know that every interaction with data is an opportunity for them to collect metadata. Consuming data through a tool results in the collection of usage metadata; producing data through a tool results in the collection of descriptive or structural metadata. So metadata is being collected all around you whenever you work with data.
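To make the idea of usage metadata concrete, here is a minimal sketch of what a tool might do behind the scenes when you run a query. It is not how Tableau or any particular database does it. The `usage_log` table, the `run_with_usage_log` function, and the `analyst_1` user are all made-up names for illustration, and SQLite is used only because it ships with Python.

```python
import sqlite3
import time
from datetime import datetime, timezone


def run_with_usage_log(conn, user, sql, params=()):
    """Run a query, then record who ran it, what it was, and how long it took."""
    started = time.perf_counter()
    rows = conn.execute(sql, params).fetchall()
    elapsed_ms = (time.perf_counter() - started) * 1000

    # The interaction itself becomes a row of usage metadata.
    conn.execute(
        "INSERT INTO usage_log (ran_at, user_name, query_text, elapsed_ms, row_count) "
        "VALUES (?, ?, ?, ?, ?)",
        (datetime.now(timezone.utc).isoformat(), user, sql, elapsed_ms, len(rows)),
    )
    return rows


# Hypothetical setup: one data table and one table for usage metadata.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, amount REAL)")
conn.execute(
    "CREATE TABLE usage_log ("
    "ran_at TEXT, user_name TEXT, query_text TEXT, elapsed_ms REAL, row_count INTEGER)"
)
conn.executemany("INSERT INTO orders (amount) VALUES (?)", [(10.0,), (25.5,)])

query = "SELECT id, amount FROM orders ORDER BY id"
rows = run_with_usage_log(conn, "analyst_1", query)
log = conn.execute(
    "SELECT user_name, query_text, row_count FROM usage_log"
).fetchall()
```

The consumer only asked for `rows`. The entry in `log` is the by-product: one interaction with data, one record of usage metadata.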
Metadata Collection by You
As a data engineer, data analyst, or data scientist, you may consider yourself to be collecting metadata if you know that the tools you use are already collecting it, that they make it available, and how to access it.
Data engineers know how to query the information_schema tables of their database for slow-running queries, data analysts know where to find the Telemetry tables in their Count.co canvases, and data scientists know how to view the executed code history of their Google Colab notebooks.
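As a rough illustration of reading metadata a database already keeps about itself, the sketch below lists tables and their columns from the database's own catalog. On a server database this would typically be a query against `information_schema.columns`. SQLite has no `information_schema`, so this sketch substitutes `sqlite_master` and `PRAGMA table_info` as a stand-in. The `customers` table and the `describe_tables` function are hypothetical.

```python
import sqlite3


def describe_tables(conn):
    """Return {table_name: [column descriptions]} read from SQLite's catalog."""
    tables = [
        row[0]
        for row in conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
        )
    ]
    catalog = {}
    for table in tables:
        # PRAGMA table_info yields (cid, name, type, notnull, dflt_value, pk).
        columns = conn.execute(f'PRAGMA table_info("{table}")').fetchall()
        catalog[table] = [
            {
                "column": col[1],
                "type": col[2],
                "not_null": bool(col[3]),
                "primary_key": bool(col[5]),
            }
            for col in columns
        ]
    return catalog


# Hypothetical table, used only to have something to describe.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customers ("
    "id INTEGER PRIMARY KEY, email TEXT NOT NULL, signup_date TEXT)"
)

catalog = describe_tables(conn)
```

Nobody wrote this metadata down by hand. The database recorded it when the table was created, and knowing where to look is the whole skill.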
The value proposition of this type of collection can only be personal, private, and experiential. The data engineer is a better data engineer because he knows how to troubleshoot table constraints. Maybe he can be considered more senior and command higher pay, or maybe he can solve problems faster and command the respect of his peers. But the data analyst isn't even aware th
- Every interaction with data is an opportunity to collect metadata.
This list will keep growing as we delve deeper into metadata-first thinking across various industries.