
The Difference between Data Quality and Data Observability

You Found a Data Error. Now What?

Data-driven organizations have long employed data quality measures within their systems to ensure the veracity, freshness, and completeness of their data. But such measures have been found wanting with increases in the scale of data, the heterogeneity of use cases, and the complexity of data pipelines. These factors have given rise to the paradigm of Data Observability.


But what is Data Observability, exactly? Just another fancy marketing gimmick, or a concept with some substance and relevance for your organization? Our aim with this blog post is to help you decide. We take you through the entire extended lifecycle of a data error and illustrate the difference between data quality and data observability.


Let us start with the premise that your organization already has a data quality system, and that this system has detected an error. Now what?


Gather Information

First, you need to understand the error and gain sufficient context. What is the typical value of the erroneous metric, what is the deviation from the norm, what query was executed to carry out this check, and when was the source data last updated? Typically, such data has to be gathered from various systems, taking up valuable time that could have gone into fixing the problem. Wouldn't it be nice to have all of this available at a glance, in the context of the incident?


This is the first difference between Data Quality and Data Observability: relevant data is surfaced and presented intuitively to enable rapid situational awareness.


ExpertSense information on a metric
All relevant information surfaced in one place
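To make this concrete, the context a data observability tool assembles automatically can be sketched as a single lookup. This is a hypothetical illustration: the metric names, the in-memory stores, and the fields returned are all assumptions, standing in for the separate systems this information would normally be scattered across.

```python
from datetime import datetime, timezone

# Hypothetical in-memory stores standing in for the separate systems
# (metrics history, check definitions, source metadata) an engineer
# would otherwise have to query one by one.
METRIC_HISTORY = {"daily_revenue": [102.0, 98.5, 101.2, 99.8]}
CHECK_QUERIES = {"daily_revenue": "SELECT SUM(amount) FROM sales WHERE sale_date = CURRENT_DATE"}
SOURCE_UPDATED = {"sales": datetime(2024, 5, 1, 6, 30, tzinfo=timezone.utc)}

def incident_context(metric: str, observed: float, source_table: str) -> dict:
    """Collect everything needed in one place: the typical value, the
    deviation from the norm, the check query, and source freshness."""
    history = METRIC_HISTORY[metric]
    typical = sum(history) / len(history)
    return {
        "metric": metric,
        "observed": observed,
        "typical": round(typical, 2),
        "deviation_pct": round((observed - typical) / typical * 100, 1),
        "check_query": CHECK_QUERIES[metric],
        "source_last_updated": SOURCE_UPDATED[source_table].isoformat(),
    }

ctx = incident_context("daily_revenue", observed=62.0, source_table="sales")
print(ctx["deviation_pct"])  # a large negative deviation flags the anomaly
```

The point is not the specific fields but the shape: one call, one record, everything an engineer needs for situational awareness.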

Diagnosis and Corrective Actions

Now you need to determine the fix. What are the procedures involved, and were any of them updated recently? For this, you need to understand the data lineage, so you know where to check for problems. You could do this by painfully digging backwards through the complex transformation code, or you could have the information at your fingertips in the form of a lineage graph.


This is the second difference between Data Quality and Data Observability: dependencies between code and data are illuminated to facilitate quick diagnosis and corrective action.


Tracing lineage visually vs. from code


Discerning lineage from code
...or derive it from pages upon pages of code like the sample shown here?
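The value of a lineage graph can be sketched in a few lines. Here a toy dependency map (all table names are invented for illustration) is traversed upstream to find every source a broken table depends on, which is exactly the set of candidates to inspect during diagnosis.

```python
# Toy lineage: each table maps to the tables it is derived from.
# All table names here are invented for illustration.
LINEAGE = {
    "revenue_dashboard": ["daily_revenue"],
    "daily_revenue": ["orders_clean", "fx_rates"],
    "orders_clean": ["raw_orders"],
    "fx_rates": [],
    "raw_orders": [],
}

def upstream(table: str) -> set:
    """Return every table the given table depends on, directly or
    transitively -- the candidates to check when it breaks."""
    seen = set()
    stack = [table]
    while stack:
        for parent in LINEAGE.get(stack.pop(), []):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen

print(sorted(upstream("revenue_dashboard")))
# ['daily_revenue', 'fx_rates', 'orders_clean', 'raw_orders']
```

A data observability tool derives this graph from your actual transformation code and renders it visually, so you never have to reconstruct it by hand.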

Issue Management

The delay in making the fix has drawn scrutiny from business users and senior management. A detailed retrospective shows a gap of two days between the error being reported by the quality system and its being actioned by the engineering team. It turns out the error had been logged to a table that is rarely checked, because the team is always grappling with other priorities.


This is the third difference between data quality and data observability: a data observability solution alerts engineering teams, so they do not have to spend time monitoring for errors. It can also alert business users, so they can hold the engineering team accountable if needed. The solution also creates an incident, so that the lifecycle of the error can be managed properly.


Logs vs. Alerts

ExpertSense alert on Microsoft Teams
...or be alerted via notifications when there is an issue?
ExpertSense incident monitoring
And how do you ensure that an incident, once detected, does not slip through the cracks? With a data observability solution, each error generates an incident, whose lifecycle can be managed on the tool itself or via integrations with systems such as Jira and ServiceNow.
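The alert-plus-incident pattern described above can be sketched as follows. This is a minimal illustration, not any tool's actual API: the webhook URL, the payload shape, and the incident fields are all invented, and real integrations with Teams, Jira, or ServiceNow use their own schemas.

```python
import json
import itertools
from urllib import request

_incident_ids = itertools.count(1)
INCIDENTS = {}  # incident id -> lifecycle record

def raise_incident(check_name: str, detail: str, notify) -> int:
    """Create a tracked incident and push a notification, instead of
    silently appending a row to an error table that nobody watches."""
    incident_id = next(_incident_ids)
    INCIDENTS[incident_id] = {"check": check_name, "detail": detail, "status": "open"}
    notify({"title": f"Data quality incident #{incident_id}",
            "text": f"{check_name}: {detail}"})
    return incident_id

def webhook_notify(payload: dict, url: str = "https://example.invalid/webhook"):
    """Post the payload as JSON to a webhook endpoint (placeholder URL)."""
    req = request.Request(url, data=json.dumps(payload).encode(),
                          headers={"Content-Type": "application/json"})
    request.urlopen(req)

# For demonstration we inject print() instead of the real webhook,
# so the sketch runs without a network connection.
iid = raise_incident("daily_revenue_freshness", "source not updated in 26h", notify=print)
INCIDENTS[iid]["status"] = "acknowledged"  # lifecycle tracked, not forgotten
```

The key design point is that detection, notification, and lifecycle tracking happen in one motion, so no one has to remember to poll a log table.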

Preventive Action

Okay, you fixed the issue that your quality check caught, but the whole episode was an embarrassment for your data engineering team. You want to prevent the recurrence of similar errors on any of the other myriad data sources in the future. To this end, you want to survey the existing rules in place, plug any gaps, then share the list of rules with your business users to check whether they have any additional rules to suggest based on their domain expertise.


But since the rules are embedded in code, it is difficult to take stock of what is already in place. The only alternative is to manually compile this list in a form amenable to analysis and digestible for non-technical stakeholders. This is an error-prone and time-consuming exercise for time-strapped engineering teams. Would it not be better if these quality rules were all compiled in one place as an easily filterable list?


Rules in code vs. a filterable list

Digging through code to find Quality checks
Would you rather take stock of your quality checks by digging through multiple layers of code...
ExpertSense Quality checks
...or have them presented as an easily filterable list?
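What makes such a list possible is declaring checks as data rather than burying them in transformation code. A hypothetical sketch, where the rule fields, table names, and owners are all invented:

```python
# Quality rules declared as plain records instead of scattered code,
# so they can be listed, filtered, and shared with business users.
RULES = [
    {"table": "orders", "column": "amount", "check": "not_null", "owner": "finance"},
    {"table": "orders", "column": "amount", "check": "min", "threshold": 0, "owner": "finance"},
    {"table": "customers", "column": "email", "check": "unique", "owner": "marketing"},
    {"table": "orders", "column": "order_date", "check": "freshness_hours", "threshold": 24, "owner": "finance"},
]

def find_rules(**criteria):
    """Filter the registry, e.g. find_rules(table='orders') for all
    checks on one table, or find_rules(owner='marketing')."""
    return [r for r in RULES if all(r.get(k) == v for k, v in criteria.items())]

print(len(find_rules(table="orders")))  # every check on one table, instantly
print(find_rules(check="unique"))       # every uniqueness check, anywhere
```

Surveying coverage, spotting gaps, and sharing the list with domain experts all become simple queries rather than a code archaeology project.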

Deployment

You find glaring gaps in rules coverage; these gaps exist largely because no one has undertaken such an exercise in a long time given how tedious it is. You have created new rules to plug these gaps and also have new rules suggested by the data consumers. You want to deploy these rules quickly, but since doing so involves updating your code, you have to follow the production deployment cycle. You know that you will have to follow the same cadence every time the rules need minor updates in the future. If only you could have a separate system where these rules can be deployed to and executed from without disturbing your production code, your life would be a lot easier!


Separation of Concerns between Code and Quality Checks


Quality checks in code workflow
When quality checks are embedded in code, you have to redeploy production code to make even the smallest updates
ExpertSense workflow
However, if they are housed in a separate data observability system, the checks can be updated independently, without disturbing an otherwise stable production environment.
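One way to picture this separation of concerns: the checks live in a configuration document that the observability system reads at run time, so tightening a threshold is a config change, not a code deployment. A sketch using JSON, where the rule schema and names are assumptions:

```python
import json

# In practice this configuration would live outside the production
# codebase, so rules can be edited and rolled out on their own schedule.
RULES_JSON = """
[
  {"name": "amount_non_negative", "column": "amount", "min": 0},
  {"name": "amount_below_cap",    "column": "amount", "max": 100000}
]
"""

def run_checks(rows: list, rules_source: str) -> dict:
    """Evaluate every configured rule against the data and report
    pass/fail -- no production code change to add or tune a rule."""
    results = {}
    for rule in json.loads(rules_source):
        values = [row[rule["column"]] for row in rows]
        ok = all(("min" not in rule or v >= rule["min"]) and
                 ("max" not in rule or v <= rule["max"]) for v in values)
        results[rule["name"]] = ok
    return results

rows = [{"amount": 120.0}, {"amount": -5.0}]
print(run_checks(rows, RULES_JSON))
```

Because the rule definitions are data, updating them never touches the stable production pipeline.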

Reporting

The entire episode has caused a loss of trust in the data engineering team. You want to restore this trust by proactively reporting how your team's consistent efforts are producing marked improvements in aggregate quality outcomes over time. To do this, you have to extract data from logs or error tables, which is time-consuming. Further, since the system is not transparent, the reporting you do create does not engender the kind of trust you were hoping for. What if you could have a system that automatically compiles and presents this data to everyone in a transparent manner?


ExpertSense Quality metrics
Quality metrics automatically compiled for you so you can report on overall performance
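The kind of aggregate reporting described above amounts to a simple roll-up of check outcomes over time. The check names, dates, and pass-rate metric below are illustrative, not any tool's actual schema:

```python
from collections import defaultdict

# Illustrative daily check outcomes: (check name, date, passed?).
RESULTS = [
    ("freshness", "2024-05-01", True),  ("freshness", "2024-05-02", False),
    ("not_null",  "2024-05-01", True),  ("not_null",  "2024-05-02", True),
    ("freshness", "2024-05-03", True),  ("not_null",  "2024-05-03", True),
]

def pass_rate_by_day(results) -> dict:
    """Roll raw check outcomes up into the per-day pass rate that a
    transparent quality dashboard would display."""
    totals = defaultdict(lambda: [0, 0])  # day -> [passed, run]
    for _check, day, passed in results:
        totals[day][0] += passed
        totals[day][1] += 1
    return {day: round(p / n, 2) for day, (p, n) in sorted(totals.items())}

print(pass_rate_by_day(RESULTS))
# e.g. 2024-05-02 shows 0.5: one of the two checks failed that day
```

When this compilation is automatic and visible to everyone, the trend line itself does the trust-building.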

Data Quality and Data Observability

Detecting data issues is only one part of the data quality lifecycle. How quickly you correct the issue, how you manage communication over the incident lifecycle, how you update and maintain preventative measures, and how you deploy these measures are all vital parts of the process that need systemic attention if you want to get a handle on data quality within your organization. This is where data observability tools help. We hope this article has helped you understand the differences between data quality and data observability.


ExpertSense is a data observability solution that solves all of the problems listed above. If you would like to explore how ExpertSense can address the data observability needs of your organization and be integrated with minimal effort, please contact us.

