Enterprise Data Hub – A Reference Architecture
As organizations embark on their digital transformation journey, they want to implement an elastically scalable, resilient architecture. Enterprises are increasingly adopting a centralized Enterprise Data Hub (EDH) that can provide high-quality data to a wide variety of operational applications. Often, this is accompanied by a decision to migrate to the cloud (AWS, Azure, Google); a flexible architecture that can span multiple clouds and is free from vendor lock-in is desirable. Finally, all chosen technologies must provide enterprise-level production support with the required levels of security, availability, and performance.
Data Aces has defined the following Enterprise Data Hub reference architecture to assist our customers in their digital transformation journey.
The Enterprise Data Hub Architecture defines an elastically scalable, resilient, and event-driven data platform that transforms data into actionable insight. It is based on open standards (with available commercial support) and can be implemented on any public or on-premises cloud.
The following sections describe the key components of this EDH architecture.
Apps in the Enterprise Data Hub
Every enterprise has existing applications that automate operational activities throughout the organization. Some of them tend to be SaaS applications. Users interact with these applications through their UIs to drive business processes.
For tasks that can be performed in real time, an enterprise application reads data from and writes data to an operational data store directly.
However, some operational activities are best performed asynchronously, such as analyzing a given user claim for potential fraud. In such cases, the UI informs the user to come back later for the results, or the application sends an email notification when the asynchronous task completes. Enterprise apps provide access to the data in the operational store of the data hub via APIs or web or mobile interfaces.
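The asynchronous pattern described above can be sketched in a few lines. This is only an illustration built on Python's standard `concurrent.futures` module, not a prescribed implementation: `analyze_claim_for_fraud`, `notify_user`, the claim fields, and the fraud rule itself are hypothetical placeholders.

```python
from concurrent.futures import ThreadPoolExecutor

notifications = []  # stand-in for an email service (hypothetical)


def analyze_claim_for_fraud(claim):
    """Placeholder for a long-running fraud check (assumed rule, illustration only)."""
    suspicious = claim["amount"] > 10_000
    return {"claim_id": claim["claim_id"], "suspicious": suspicious}


def notify_user(result):
    """Stand-in for emailing the user once the asynchronous task completes."""
    notifications.append(f"Claim {result['claim_id']}: analysis complete")


def submit_claim(executor, claim):
    """Accept the claim immediately and run the fraud check in the background."""
    future = executor.submit(analyze_claim_for_fraud, claim)
    # The callback fires when the background task finishes.
    future.add_done_callback(lambda f: notify_user(f.result()))
    return future


with ThreadPoolExecutor(max_workers=2) as executor:
    future = submit_claim(executor, {"claim_id": "C-1", "amount": 25_000})
    # The UI would return control to the user here; we block only to show the result.
    result = future.result()
```

The caller gets a handle back right away, which is what allows the UI to tell the user to come back later while the analysis runs.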
Operational Store in the Data Hub
Traditionally, relational databases are used as the operational store. In many enterprises, a single database may serve all the applications. The core requirement for the operational data store is that it be capable of persisting and retrieving high-volume, high-velocity data, irrespective of whether the data is structured, semi-structured, or unstructured.
While some enterprises at the early stages of their digital transformation journey start out with relational databases as their operational data store, we recommend the use of a distributed NoSQL database such as Cassandra. These platforms are much better at meeting the low-latency requirements for the increasing variety of semi-structured and unstructured data that enterprise applications handle. The Operational Store in the Enterprise Data Hub uses an always-on, scalable distributed database.
Workflow tasks (or events) are created by enterprise applications for asynchronous processing and published to a topic in a queuing system such as Kafka. Depending on the scalability and throughput requirements, the design will include one or more topics in an Enterprise Data Hub Architecture.
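To make the publishing step concrete, here is a minimal sketch of an application placing a workflow task on a topic. It assumes an in-memory stand-in rather than a real Kafka client, so it runs without a broker; the `InMemoryBroker` class, the topic name `claims.fraud-check`, and the event fields are all hypothetical.

```python
import json
from collections import defaultdict, deque


class InMemoryBroker:
    """In-memory stand-in for a queuing system such as Kafka (illustration only)."""

    def __init__(self):
        self.topics = defaultdict(deque)  # topic name -> queue of messages

    def publish(self, topic, event):
        # Serialize to JSON bytes, since producers typically send bytes on the wire.
        self.topics[topic].append(json.dumps(event).encode("utf-8"))

    def poll(self, topic):
        """Return the next event on the topic, or None if the topic is empty."""
        queue = self.topics[topic]
        return json.loads(queue.popleft()) if queue else None


broker = InMemoryBroker()
# An enterprise application creates a workflow task for asynchronous processing.
broker.publish("claims.fraud-check", {"claim_id": "C-1", "task": "fraud_check"})
event = broker.poll("claims.fraud-check")
```

In a real deployment the number of topics, and how events are partitioned within them, would follow from the scalability and throughput requirements mentioned above.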
Events in the Enterprise Data Hub
The queued events in the Enterprise Data Hub are processed by elastically scalable microservices. Generally, one microservice is dedicated to each task queue (or Kafka topic). As they process tasks or events, these microservices consume data available in the operational data store and persist their results there. Further, a microservice may do only part of the required processing and publish one or more events on a downstream topic for the remaining work.
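The chaining of microservices through downstream topics can be sketched as follows. This is a simplified, single-process illustration under stated assumptions: the topics are plain in-memory queues, the operational store is a dictionary, and the service names, topic names, and scoring rule are hypothetical.

```python
from collections import deque

# In-memory stand-ins for Kafka topics and the operational data store.
topics = {"claims.received": deque(), "claims.scored": deque()}
operational_store = {"C-1": {"amount": 25_000}}


def scoring_service(event):
    """Does part of the work, persists it, then publishes a downstream event."""
    claim = operational_store[event["claim_id"]]
    claim["risk_score"] = 0.9 if claim["amount"] > 10_000 else 0.1  # placeholder rule
    topics["claims.scored"].append({"claim_id": event["claim_id"]})


def review_service(event):
    """Completes the remaining work using the result persisted upstream."""
    claim = operational_store[event["claim_id"]]
    claim["needs_review"] = claim["risk_score"] >= 0.5


# One microservice dedicated to each topic.
services = {"claims.received": scoring_service, "claims.scored": review_service}


def drain():
    """Process queued events until every topic is empty; return the event count."""
    processed = 0
    while any(topics.values()):
        for topic, handler in services.items():
            while topics[topic]:
                handler(topics[topic].popleft())
                processed += 1
    return processed


topics["claims.received"].append({"claim_id": "C-1"})
processed = drain()
```

Because each service only reads from its own topic and writes results to the shared store, each one could be scaled independently of the others.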
External Data Loading Tasks
Often, data that is purchased from external sources, or otherwise available from them, is needed to enrich (or provide context for) the data available in enterprise data stores. This offline data needs to be ingested in an asynchronous and fail-safe manner.
In an Enterprise Data Hub Architecture, separate external data topics are used to trigger processing of these external data sets whenever they arrive. Each event provides all the required information about the source data and the target location in the enterprise data stores to which the external data needs to be copied.
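One possible shape for such an external data event is sketched below. The field names, the example URI, and the target table are assumptions made for illustration; the actual payload would depend on the sources and data stores involved.

```python
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class ExternalDataEvent:
    """Hypothetical event describing an external data set that has arrived."""

    source_uri: str      # where the external data set can be read from
    source_format: str   # e.g. "csv" or "json"
    target_table: str    # target location in the enterprise data store
    received_at: str     # ISO 8601 timestamp of arrival

    def to_message(self) -> bytes:
        """Serialize the event for publishing to an external data topic."""
        return json.dumps(asdict(self), sort_keys=True).encode("utf-8")

    @classmethod
    def from_message(cls, message: bytes) -> "ExternalDataEvent":
        """Rebuild the event on the consuming side."""
        return cls(**json.loads(message.decode("utf-8")))


event = ExternalDataEvent(
    source_uri="s3://example-bucket/vendor/provider-ratings.csv",  # made-up location
    source_format="csv",
    target_table="enrichment.provider_ratings",                    # made-up target
    received_at="2024-01-15T12:00:00+00:00",
)
round_tripped = ExternalDataEvent.from_message(event.to_message())
```

Carrying both the source and the target in the event keeps the loading task self-describing, so a failed load can be retried from the event alone.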
Analytic Store in the Data Hub
While the Operational Data Store serves as the Enterprise Data Hub to reliably serve mastered, high-quality data to operational applications, there is also a need for quick, on-the-fly operational analytics. For example, questions such as how many new claims came in during the last hour are handled by the Analytic Store. This should not be confused with a data warehouse, which supports aggregated, batch analytics via Business Intelligence tools that generate dashboards and reports. Data from the Operational Store, along with data from various other systems such as CRM and Finance, is replicated to the Analytic Store in real time.
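As an illustration of the kind of on-the-fly question mentioned above, the following sketch counts the claims created within the last hour. It operates on an in-memory list rather than a real analytic store, and the record layout and the `count_new_claims` helper are hypothetical.

```python
from datetime import datetime, timedelta, timezone


def count_new_claims(claims, now, window=timedelta(hours=1)):
    """Count claims whose created_at falls within the window ending at `now`."""
    cutoff = now - window
    return sum(1 for claim in claims if cutoff < claim["created_at"] <= now)


# A fixed reference time keeps the example deterministic.
now = datetime(2024, 1, 15, 12, 0, tzinfo=timezone.utc)
claims = [
    {"claim_id": "C-1", "created_at": now - timedelta(minutes=5)},
    {"claim_id": "C-2", "created_at": now - timedelta(minutes=59)},
    {"claim_id": "C-3", "created_at": now - timedelta(hours=3)},
]
recent = count_new_claims(claims, now)  # C-1 and C-2 fall inside the window
```

In practice this would be a query against the Analytic Store over continuously replicated data, not a scan of application memory.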
For more information on how an Enterprise Data Hub can help your organization, please contact us.