Data mesh is the buzzword of the moment and everyone wants one. Coined by data architect Zhamak Dehghani in 2019, the term refers to the way data domains can use data products from other domains and how data from multiple domains can be combined to get a more rounded view. But although demand for data mesh is soaring, organisations don’t necessarily understand what it is or what it is trying to achieve. As a result, a lot of data mesh projects fall by the wayside.
Data mesh is a structural change that affects technical components and infrastructure. You cannot simply buy a data mesh platform the way you can buy a data warehouse or lakehouse; it is an approach that needs to be organised around your data platform. The main aim of data mesh is to let you scale data platforms and share data between domains via self-service data platforms, with rules and governance in place.
Data mesh promotes the interoperability of data between teams, makes data discoverable within the organisation, establishes clear ownership and labels how trustworthy the data is. It does all this while staying secure and defining data products as clearly as possible, which reduces the amount of support the data mesh requires.
So far, so clear cut. But as we said earlier, a lot of data mesh projects go wrong. Why is this? After helping lots of organisations implement data mesh, we kept seeing the same patterns in failed or impeded implementations. What are they?
Lack of self-service: for data mesh to work, data engineers, analysts and scientists need to be able to do their work without having to worry about the infrastructure. This means a separate cloud engineering team needs to provide self-service data platform building blocks that anyone can roll out. Once this is ready and data products are created, other data users need to be able to access data products without (too many) manual steps. Self-service foundational components take time to set up at the beginning of the data mesh journey, and that time needs to be factored in.
Missing data catalog: to create a mesh of data you need to be able to find data products created by other teams. A great tool to discover data within the organisation is a data catalog. This tool is key within data mesh to make data discoverable and must be implemented before you can scale data mesh.
Too many data products: when a data catalog is implemented and data products are registered, it must also be easy to find the correct data. If you search for common terms like ‘revenue’ or ‘customer’ and more than a hundred results show up, it is hard to find the correct data. This makes it important to certify data products, so that data of a certain level shows up higher in the search results and users can distinguish less refined data products from your key, high-quality data products that are used by a lot of people.
No interoperability: if too many technology stacks and data formats are in use, it becomes harder for other teams to use data products. So it is important to be strict about certain storage formats and data conventions (e.g. everything stored as Delta Lake in S3/ADLS).
Stuck in domain-driven architectures: we also see customers struggle with the domain part of data mesh and force themselves to fit into a certain structure just to adopt data mesh. Although it is important to have clear boundaries between domains/departments, defining them upfront can take up a lot of valuable time that could instead be spent on a PoC (proof of concept) to learn how domains interact and where to put boundaries.
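To make the catalog, certification and interoperability points more concrete, here is a minimal sketch in Python of how they could fit together. It is an illustration under assumptions, not a real catalog product: the names `DataProduct`, `Catalog`, `ALLOWED_FORMATS` and `CERTIFICATION_RANK`, the three certification tiers and the example products are all hypothetical. The idea is that registration rejects storage formats outside the agreed convention, and search lists certified data products before less refined ones.

```python
from dataclasses import dataclass

# Hypothetical house rule: the only storage formats the mesh accepts.
ALLOWED_FORMATS = {"delta"}

# Hypothetical certification tiers; a lower rank sorts first in search results.
CERTIFICATION_RANK = {"certified": 0, "reviewed": 1, "draft": 2}


@dataclass(frozen=True)
class DataProduct:
    name: str
    domain: str
    owner: str
    storage_format: str
    certification: str = "draft"
    description: str = ""


class Catalog:
    """In-memory stand-in for a data catalog (illustration only)."""

    def __init__(self):
        self._products = []

    def register(self, product):
        # Interoperability: refuse products that break the storage convention.
        if product.storage_format not in ALLOWED_FORMATS:
            raise ValueError(
                f"{product.name}: format '{product.storage_format}' is not allowed"
            )
        if product.certification not in CERTIFICATION_RANK:
            raise ValueError(
                f"{product.name}: unknown certification '{product.certification}'"
            )
        self._products.append(product)

    def search(self, term):
        # Discoverability: match on name or description, case-insensitively.
        term = term.lower()
        hits = [
            p for p in self._products
            if term in p.name.lower() or term in p.description.lower()
        ]
        # Certified products first, then alphabetical by name within each tier.
        return sorted(hits, key=lambda p: (CERTIFICATION_RANK[p.certification], p.name))


catalog = Catalog()
catalog.register(
    DataProduct("revenue_raw_export", "sales", "sales-data-team", "delta", "draft")
)
catalog.register(
    DataProduct("revenue_monthly", "finance", "finance-data-team", "delta", "certified")
)

results = [p.name for p in catalog.search("revenue")]
print(results)
```

In this sketch a search for "revenue" returns the certified `revenue_monthly` product ahead of the draft export, and an attempt to register a product stored as, say, CSV fails at registration time rather than surprising a consuming team later.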
So should you implement data mesh? It depends. There are quite a lot of considerations – do you already have the infrastructure building blocks, teams and tools in place to support data mesh? Do you have difficulty scaling your data products, both from the data ingestion and data consumption side? If the answer is yes, then data mesh might be the right way to go.
As a starting point, you could select two domains/departments with a good data mesh use case. Next, let one department create a data product based on its data. The next step would be to let the other department consume the data product and create an analytics report. This creates the first mesh of data and will get you started on your data mesh journey.
Want to know more?
To speak to someone at Valcon about defining how data mesh could look for you, helping you get ready for data mesh and how we can support you in your data mesh journey, please get in touch with Tim Ellens at [email protected]
If you want information on Valcon’s data offerings, take a read here, or dive into Valcon’s World of Data.