Using a data catalog to find your way through the data jungle

By Daan Caminada

It is estimated that 90% of the world’s data was generated in the last two years alone, and that the total amount of data created, captured, copied and consumed globally will reach a breathtaking 180 zettabytes by 2025 (Statista). Faced with this growth, it is no wonder that organisations find themselves in the midst of a data jungle: a complex ecosystem of vast volumes of information gathered from diverse sources, with little idea of how to navigate it.

Finding your way through the data jungle can be challenging, time-consuming and frustrating, but mapping it needs to be a priority. In data governance – the practice of categorising data and ensuring its accuracy, quality, availability and security – that map is known as a data catalog. It enables you to create a comprehensive map of your data, so that people in your organisation can navigate it easily and find all the data assets relevant to them. It also frees up your data team: when data is easy to navigate, the team can focus on more strategic, higher-value work.

Unlocking the value of your data

A data catalog harvests the metadata from your data sources and systems and stores it centrally, becoming a centralised repository of metadata about your organisation’s data assets. Because the catalog is searchable and has an intuitive UI, it looks and feels a bit like an online store where you can search and filter to find the product you’re looking for. But instead of buying shoes or sports kit, you search and filter data assets and, when you find the right one, request it. Just as a good e-commerce site makes online shopping much easier, a data catalog simplifies the discovery and acquisition of data assets.
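To make the online-store comparison concrete, here is a minimal sketch of searching and filtering harvested metadata. It is an illustration under stated assumptions rather than how any particular catalog product works: the `AssetMetadata` fields, the `search` function and the sample assets are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class AssetMetadata:
    """Hypothetical metadata record harvested from a source system."""
    name: str
    source_system: str
    description: str = ""
    tags: set = field(default_factory=set)


def search(catalog, text="", tag=None):
    """Return assets whose name or description contains `text`,
    optionally filtered to those carrying `tag`."""
    text = text.lower()
    results = []
    for asset in catalog:
        matches_text = (text in asset.name.lower()
                        or text in asset.description.lower())
        matches_tag = tag is None or tag in asset.tags
        if matches_text and matches_tag:
            results.append(asset)
    return results


# Illustrative sample data (assumed, not taken from a real system).
catalog = [
    AssetMetadata("customer_orders", "webshop_db",
                  "One row per order placed", {"sales"}),
    AssetMetadata("customer_profiles", "crm",
                  "Customer contact details", {"sales", "personal-data"}),
    AssetMetadata("warehouse_stock", "erp",
                  "Current stock levels per product", {"logistics"}),
]

hits = search(catalog, text="customer", tag="personal-data")
print([asset.name for asset in hits])  # ['customer_profiles']
```

Only metadata is stored and searched here; the data itself stays in the source systems, which is what lets one catalog cover many of them.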

So when you’re thinking about creating a data catalog to enable data users in your organisation and help you navigate your data jungle, what are the important steps to take?

  • Enriching the data catalog: after harvesting the metadata from your source systems and data products, the next step is to enrich your data catalog by filling the business glossary with terms and definitions, just as you would a dictionary. You can also add data models (logical and conceptual), business rules, policies, standards, data quality results, and so on, and harvest your lineage to understand where data originates and how it is used. By centralising all this, you gather the components of your data jungle, which is the first step in creating the map.
  • Getting a holistic view of the data: the next step is to link the different concepts together. You can relate a business term to a conceptual entity, to a logical data attribute, or to a column. You can specify business rules for data attributes or relate retention policies – those that determine how long you can keep the data – to tables. By establishing these relationships, you start connecting all the components of your data landscape, creating a holistic view of your data assets. Whether you are on the technical side (looking at tables and columns) or on the business side (looking at business terms and data concepts), everyone has the same map to navigate through the data jungle.
  • Giving context to the data: when you have gathered and linked all the information in a central place, you can start enriching the metadata. For example, you can help data users understand the meaning of a column by adding its functional name or a description. You can also specify whether the data is a critical data element, whether it contains personal data and what its security classification is. You can ingest the results of your data quality measures to give users insight into the quality of the data asset they are looking at. All this extra information helps your data users understand what the data is and whether it fits their needs. Adding this information gives an extra layer to your map.
  • Automating your data management/governance processes: as an overlay to this centralised, interlinked repository of your data assets, you can build workflows to automate your data management/governance processes. For example, if a data user has found the table they need for their analysis, they can immediately request access to it via a workflow. The request is routed to the relevant data owners, and all the information about it is stored within your catalog. This helps you see who has access to what data, for how long and for what purpose.
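
The steps above can be sketched as a small in-memory model: a glossary, harvested columns with context flags, links between terms and columns, and an access-request workflow routed to a data owner. This is a simplified illustration, not the design of any specific catalog tool. Every class, method and sample value here (`Catalog`, `link`, `request_access`, `crm.customers` and so on) is hypothetical.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass
class Column:
    """A harvested column plus the context added in the catalog."""
    table: str
    name: str
    description: str = ""
    contains_personal_data: bool = False


@dataclass
class AccessRequest:
    requester: str
    table: str
    purpose: str
    valid_until: date
    status: str = "pending"


@dataclass
class Catalog:
    glossary: dict = field(default_factory=dict)    # term -> definition
    columns: dict = field(default_factory=dict)     # (table, column) -> Column
    term_links: dict = field(default_factory=dict)  # term -> {(table, column)}
    owners: dict = field(default_factory=dict)      # table -> data owner
    requests: list = field(default_factory=list)

    def add_term(self, term, definition):
        self.glossary[term] = definition

    def harvest_column(self, column):
        self.columns[(column.table, column.name)] = column

    def link(self, term, table, column):
        """Relate a business term to a physical column."""
        if term not in self.glossary or (table, column) not in self.columns:
            raise KeyError("term and column must both exist before linking")
        self.term_links.setdefault(term, set()).add((table, column))

    def columns_for_term(self, term):
        return sorted(self.term_links.get(term, set()))

    def request_access(self, requester, table, purpose, valid_until):
        request = AccessRequest(requester, table, purpose, valid_until)
        self.requests.append(request)  # every request is kept in the catalog
        return request

    def approve(self, request, approver):
        """Only the registered owner of the table may approve."""
        if self.owners.get(request.table) != approver:
            raise PermissionError(f"{approver} does not own {request.table}")
        request.status = "approved"

    def who_has_access(self, table):
        return [(r.requester, r.purpose, r.valid_until)
                for r in self.requests
                if r.table == table and r.status == "approved"]


catalog = Catalog()
catalog.add_term("Customer", "A party that has placed at least one order")
catalog.harvest_column(Column("crm.customers", "cust_id", "Customer identifier"))
catalog.harvest_column(Column("crm.customers", "email", "Contact e-mail",
                              contains_personal_data=True))
catalog.link("Customer", "crm.customers", "cust_id")
catalog.owners["crm.customers"] = "sales_data_owner"

request = catalog.request_access("analyst_1", "crm.customers",
                                 "churn analysis", date(2025, 12, 31))
catalog.approve(request, approver="sales_data_owner")
print(catalog.columns_for_term("Customer"))
print(catalog.who_has_access("crm.customers"))
```

Because requests are stored alongside the metadata, the question of who has access to which data, until when and for what purpose can be answered from the same place that users search.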

A data catalog empowers your organisation to work with data more easily, quickly and safely, and with more joy. It means your data is managed and governed centrally, and it becomes easier to visualise data ownership and comply with regulatory standards. Everyone in the organisation can find and get access to data in one intuitive place, understand the data because of the context provided, and trust it because its quality is known.

From a data mapping perspective, data catalogs have a pivotal role to play in good data governance and data management. It’s a data jungle out there – and your data catalog will be one of the most important things in your jungle survival kit.

Want to learn more? If you would like to speak to Valcon about how to enable data users within your organisation and find your way through the data jungle with good data management, governance and a data catalog, get in touch with [email protected].
