What you need to know about the architecture of a datawarehouse

Share on facebook
Share on twitter
Share on linkedin
Share on telegram
Share on whatsapp

Contents

The architecture of a data warehouse can have different structures in different implementations. Some may have an ODS (operational data warehouse), while others may have multiple data stores. Some may have a small number of data sources, while others may have dozens of data sources. Given this, it is much more reasonable to present the different layers of a data warehouse architecture rather than discuss a specific system.

arquitectura20de20un20datawarehouse-1496408

In general, each architecture of a datawarehouse has the following layers:

  • Data extraction layer
  • Data storage layer
  • Data presentation layer
  • System operations layer

Properties of the architecture of a datawarehouse

The following properties are essential for the architecture of a datawarehouse:

  • Separation– The analytical and transactional procedure should be kept as separate as possible.
  • Scalability: the architecture of a datawarehouse, both hardware and software, should be easy to update as the volume of data to be managed and processed grows, as well as the amount of user requirements that must be satisfied.
  • Extensibility: The architecture must be able to adapt to new applications and technologies without needing to review the entire system.
  • Security: monitoring access is essential due to the strategic data that is stored in the datawarehouse.
  • Administrability: datastore management shouldn't be overly difficult.

Layers of the entire data warehouse architecture

Let's see each of the layers of the architecture of a datawarehouse below:

  • Data source layer: Represents the different data sources that feed the data in the data warehouse. The data source can be in any format: plain text file, relational database, other types of database, Excel file, etc. All of these can act as a data source. At the same time, data types can be very varied:
    • Operations dataas sales data, HR data, product data, inventory data, marketing data and systems data.
    • Logs from a web server, with user browsing data.
    • Internal market research data.
    • Third party data, as census data, demographic data or survey data.
  • Data extraction layer: The data is extracted from the data sources and carried to the data storage system. Some minimal data is likely to be cleaned up on this layer, but no major data transformation is expected.
  • Test area: this is where the data is purified and transformed into a datamart and datawarehouse. Having a common area facilitates the procedure and subsequent integration of the data.
  • ETL layer: This is where data gets its intelligence when logic is applied to transform data from a transactional nature to an analytical nature.. This layer is also where the data is cleaned. The ETL design stage is often the most time-consuming stage of a data warehouse project and an ETL tool is commonly used in this layer.
  • Data storage layer: This is where the clean transformed data is placed. Depending on scope and functionality, three types of entities can be found: data warehouse, data mart and operational data warehouse (ODS). In any system, can find only one of the 3, 2 of the 3 or all three guys together.
  • Logical data layer: This is where the trading rules are stored. These business rules do not impact the data transformation rules, but they do impact what you can see later in the reports.
  • Data presentation layer: It refers to the data that reaches users. This can be in the form of a tabular or graphical report via a browser., a report sent by email that is automatically generated and sent daily, an alert that warns users of exceptions, etc. As usual, in this layer an OLAP tool and a reporting tool are used.
  • Metadata layer: This is where information about the data stored in the data warehouse is stored. A logical data model would be an example of something found in this metadata layer. A metadata tool is often used to manage metadata.
  • System operations layer: This layer includes information about how the data storage system is working., what is the working status of ETL, what is the performance of the system and the access history of the users.

(function(d, s, id) {
var js, fjs = d.getElementsByTagName(s)[0];
if (d.getElementById(id)) return;
js = d.createElement(s); js.id = id;
js.src = “//connect.facebook.net/es_ES/all.js#xfbml=1&status=0”;
fjs.parentNode.insertBefore(js, fjs);
}(document, ‘script’, 'facebook-jssdk'));

Subscribe to our Newsletter

We will not send you SPAM mail. We hate it as much as you.