Essential keys to understanding Hadoop architecture


Hadoop, as is well known, is an open source system based on an architecture that combines a master node and slave nodes to form a cluster, always with a single master node and multiple slaves. Thanks to this structure, Hadoop is capable of storing and analyzing large volumes of data, hundreds of petabytes and even more.

Its core was born as a set of solutions in the Apache environment, christened Hadoop, and its master/slave architecture uses the master node to store the metadata associated with the slave nodes in the rack it belongs to. In addition, the master keeps track of the state of its slave nodes, while the slaves store the data being processed at any given time.
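As a minimal illustration of that division of labor, the following Java sketch asks the master (in HDFS terms, the NameNode) which slave nodes (DataNodes) hold the blocks of a file. The NameNode answers from metadata alone; the actual bytes never leave the slaves. The hostname, port and file path are assumptions made up for the example.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.BlockLocation;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class BlockLocations {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // fs.defaultFS points at the master node; the hostname is an assumption.
        conf.set("fs.defaultFS", "hdfs://master-node:8020");
        try (FileSystem fs = FileSystem.get(conf)) {
            Path file = new Path("/data/example.csv"); // hypothetical file
            FileStatus status = fs.getFileStatus(file);
            // The master resolves this from its metadata: the list of blocks
            // and the slave nodes that store each replica.
            BlockLocation[] blocks =
                    fs.getFileBlockLocations(status, 0, status.getLen());
            for (BlockLocation block : blocks) {
                System.out.printf("block at offset %d -> stored on %s%n",
                        block.getOffset(), String.join(", ", block.getHosts()));
            }
        }
    }
}
```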

Put simply, it is a technology that stores enormous volumes of information and makes it possible to run predictive analytics on structured and unstructured data, executed on a Hadoop cluster of a given number of nodes.

A rich and growing ecosystem

The international open source community keeps refining the core of Hadoop while growing its ecosystem with constant contributions. When the original core does not meet a need, new functionality appears, as happened with Spark, which satisfies real-time requirements that a traditional Hadoop cluster cannot meet on its own. That open source community is in charge of maintaining the code, fixing bugs and contributing new packages that provide new functionality.
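To make the real-time point concrete, here is a minimal sketch of a streaming word count using Spark's Structured Streaming Java API, of the kind that would be submitted to a Hadoop cluster's YARN resource manager. The socket source, host and port are placeholders chosen for the example, not part of any particular deployment.

```java
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;
import org.apache.spark.sql.streaming.StreamingQuery;
import static org.apache.spark.sql.functions.explode;
import static org.apache.spark.sql.functions.split;

public class StreamingWordCount {
    public static void main(String[] args) throws Exception {
        SparkSession spark = SparkSession.builder()
                .appName("StreamingWordCount")
                .getOrCreate(); // typically submitted with --master yarn

        // Read an unbounded stream of text lines from a socket (placeholder source).
        Dataset<Row> lines = spark.readStream()
                .format("socket")
                .option("host", "stream-host") // assumption for the example
                .option("port", 9999)
                .load();

        // Split each line into words and keep a continuously updated count per word.
        Dataset<Row> counts = lines
                .select(explode(split(lines.col("value"), " ")).alias("word"))
                .groupBy("word")
                .count();

        // Emit updated results as data arrives, instead of waiting for a batch job.
        StreamingQuery query = counts.writeStream()
                .outputMode("complete")
                .format("console")
                .start();
        query.awaitTermination();
    }
}
```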

For their part, commercial distributions take the Apache open source code and add new features that satisfy the requirements of the business world, adapting it to that context: open source software has the advantage of being cost-free, but a corporate environment needs capabilities beyond it.

Adapting the Hadoop architecture

When designing a cluster, we have to answer a series of key questions that allow us to adapt the Hadoop architecture to the needs of each specific case. We will have to choose how many nodes to start with, based on aspects such as the amount of data we are going to work with, where it resides, its nature…
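As a back-of-the-envelope illustration of that first question, the following sketch estimates a starting node count from the storage requirement alone. Every figure in it (data volume, replication factor, overhead, disk per node) is an assumption invented for the example; real sizing also weighs CPU, memory and network.

```java
public class ClusterSizing {
    public static void main(String[] args) {
        // All figures below are illustrative assumptions, not recommendations.
        double rawDataTb = 500;             // data we plan to store
        double replicationFactor = 3;       // HDFS default replication
        double intermediateOverhead = 1.25; // scratch space for running jobs
        double usableDiskPerNodeTb = 36;    // e.g. 12 x 4 TB, minus OS reserve

        double totalTb = rawDataTb * replicationFactor * intermediateOverhead;
        int nodes = (int) Math.ceil(totalTb / usableDiskPerNodeTb);
        System.out.printf("~%.0f TB of cluster storage -> about %d slave nodes%n",
                totalTb, nodes);
    }
}
```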

It will also be essential to determine what we want to analyze and where to draw the line so that the procedure stays viable without compromising the objective, which is none other than discovering trends and, ultimately, understanding patterns that allow strategic value to be extracted.

The choice of Hadoop distribution will depend on what each one offers us and how it fits what we are looking for. The free Hadoop distribution is often used for tests that, if successful, lead to an inexpensive business case that then calls for a commercial distribution.

Even so, the open source version is an alternative to the commercial ones. It is true that it will not come with as many corporate applications and will be harder to install and configure, since there is no installation or configuration wizard. Deploying a Hadoop cluster will be more complex, and there will be no vendor assistance for implementing it and correcting possible errors.
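To give an idea of what "no configuration wizard" means in practice, a plain Apache installation is configured by hand-editing XML files on every node. A minimal sketch of two of them is shown below; the hostname is an assumption for the example.

```xml
<!-- core-site.xml: tells clients where the master (NameNode) lives.
     The hostname is an assumption made up for this example. -->
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://master-node:8020</value>
  </property>
</configuration>

<!-- hdfs-site.xml: the replication factor is also set by hand. -->
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>3</value>
  </property>
</configuration>
```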

The cloud can also serve as a complement, providing tools that make analyses more efficient, even though those analyses must still be done within the cluster. An example could be visualization tools that run against the cluster, with no need to move data to the cloud. In general, these kinds of aids can be used to analyze, validate results, make comparisons or help implement a system.

Image source: jscreationzs / FreeDigitalPhotos.net
