If you know 10 people who have been in data science for more than 5 years, Everyone probably knows or has used SQL at some point in some way!! Such is the degree of influence that SQL had on anything to do with structured data.
In this post, we will learn the basics of SQL and focus on SQL for RDBMS. As you will see, SQL is pretty easy to learn and understand.
What is SQL?
SQL stands for Structured Query Language. Es un lenguaje de programación estándar para tener acceso a una databaseA database is an organized set of information that allows you to store, Manage and retrieve data efficiently. Used in various applications, from enterprise systems to online platforms, Databases can be relational or non-relational. Proper design is critical to optimizing performance and ensuring information integrity, thus facilitating informed decision-making in different contexts.... relacional. It has been designed for data management in relational database management systems (RDBMS) como Oracle, MySQL, MS SQL Server, IBM DB2.
SQL is one of the first commercial languages used for Edgar F's relational model.. Codd, further described in his influential post from 1970, “A relational data model for large shared databases. “
Previously, SQL was a de facto language for the generation of information technology professionals. This was due to the fact that the data stores consisted of one or the other RDBMS. The simplicity and beauty of the language enabled data warehousing professionals to query and provide data to business analysts.
Despite this, the problem with RDBMS is that they are often suitable only for structured information. For unstructured information, las bases de datos más nuevas como MongoDB y HBaseHBase is a NoSQL database designed to handle large volumes of data distributed in clusters. Based on the column model, Enables fast, scalable access to information. HBase easily integrates with Hadoop, making it a popular choice for applications that require massive data storage and processing. Its flexibility and ability to grow make it ideal for big data projects.... (the Hadoop) prove to be more suitable. Part of this is a compensation in the databases, which is due to the CAP theorem.
What is the CAP theorem?
The CAP theorem states that, in the best case, we can aspire to two of the following three properties. CAP means:
Consistency – This means that the data in the database remains consistent after the execution of an operation.
Availability – This means that the database system is always up to ensure availability..
Partition tolerance – This means that the system continues to function even if the transfer of information between the servers is not reliable..
The various databases and their relationships with the CAP theorem are shown below:
Database properties:
Despite this, a transactionThe "transaction" refers to the process by which an exchange of goods takes place, services or money between two or more parties. This concept is fundamental in the economic and legal field, since it involves mutual agreement and consideration of specific terms. Transactions can be formal, as contracts, or informal, and are essential for the functioning of markets and businesses.... de base de datos debe ser compatible con ACID. ACID means atomic, consistent, insulated and durable, as explained below:
Atomic: A transaction must be completed with all your data modifications or not.
Consistent: At the end of the transaction, all data must be left consistent.
Isolated : Data modifications made by one transaction must be independent of other transactions.
Durable : At the end of the transaction, the effects of the modifications made by the transaction must be permanent in the system.
To counteract ACID, consistent services provide BASE features (Simply available, soft state, eventual consistency).
Command set in SQL
SELECT- El siguiente es un ejemplo de una consulta SELECTThe command "SELECT" is fundamental in SQL, used to query and retrieve data from a database. Allows you to specify columns and tables, filtering results using clauses such as "WHERE" and ordering with "ORDER BY". Its versatility makes it an essential tool for data manipulation and analysis, facilitating the obtaining of specific information efficiently.... que devuelve una lista de libros económicos. The query retrieves all the rows from the Library table in which the price The column contains a value less than 10,00. The result is sorted in ascending order by price. The asterisk in the choose list indicates that all columns of the Book
SELECT * FROM Library WHERE"WHERE" is a term in English that translates as "where" in Spanish. Used to ask questions about the location of people, Objects or events. In grammatical contexts, it can function as an adverb of place and is fundamental in the formation of questions. Its correct application is essential in everyday communication and in language teaching, facilitating the understanding and exchange of information on positions and directions.... price < 10.00 ORDER BY price;
The table must be included in the result set.
UPGRADE –
This query helps update tables in a database. Además se puede combinar la consulta SELECT con el operador GROUP BYThe clause "GROUP BY" in SQL it is used to group rows that share values into specific columns. This allows aggregation functions to be performed, as SUM, COUNT or AVG, About the resulting groups. Its use is essential to analyze data and obtain statistical summaries. It is important to remember that all selected columns that are not part of an aggregation function must be included in the "GROUP BY".... para agregar estadísticas de una variableIn statistics and mathematics, a "variable" is a symbol that represents a value that can change or vary. There are different types of variables, and qualitative, that describe non-numerical characteristics, and quantitative, representing numerical quantities. Variables are fundamental in experiments and studies, since they allow the analysis of relationships and patterns between different elements, facilitating the understanding of complex phenomena.... numérica por una variable categórica.
JOINTS-
Therefore, SQL is widely used not only for querying data, but also to join the data returned by such queries or tables. The Data FusionData fusion is a process that integrates information from various sources to obtain a unified and coherent whole. This technique is fundamental in areas such as artificial intelligence, Data mining and analytics, as it improves the accuracy and quality of analyses. By combining heterogeneous data, Patterns and trends can be discovered that, else, pasarían desapercibidos.... en SQL se realiza a través de ‘uniones’. The next infographic is often used to explain SQL joins:
Cómo utilizar join"JOIN" is a fundamental operation in databases that allows you to combine records from two or more tables based on a logical relationship between them. There are different types of JOIN, as INNER JOIN, LEFT JOIN and RIGHT JOIN, each with its own characteristics and uses. This technique is essential for complex queries and more relevant and detailed information from multiple data sources.... and SQL
CASE- We have the case operator / when / then / else / than a SQL. Works like but
in other programming languages:
CASE WHEN n > 0 THEN 'positive' WHEN n < 0 THEN 'negative' ELSE 'zero' END
Nested subqueries – Queries can be nested so that the results of one query can be used in another query via a relational operator or an aggregate function. A nested query is also known assubqueryA subquery is a query within another query in SQL. Used to obtain results from a database that depend on the results of an external query. Subqueries can appear in SELECT clauses, WHERE o FROM, and enable more complex operations by efficiently filtering or modifying data. Su uso adecuado optimiza el rendimiento y la claridad del código SQL....
.
Where do we use SQL?
- SQL has been used extensively to retrieve data, merge data, query group and nested cases over decades. Even for data science, SQL has been widely adopted. Then, some examples of the specific use of SQL parsing are shown:
- In the case of the SAS language that uses PROC SQL, we can write SQL queries to query, update and manipulate data.
- An R, sqldf package can be used to run SQL queries on data frames.
In Python, pandasql library enables you to query Pandas DataFrames using SQL syntax.
Does SQL also influence other languages?
The downside of relational databases is that they cannot handle unstructured data. To cope with the appearance, new databases have emerged and are given NoSQL as an alternative name to DBMS. But SQL is not dead yet. See also:
A mapping from SQL to MongoDB
Here are some languages in which SQL has a significant influence:
.
SQL-Mapreduce
– Teradata utiliza la base de datos Aster que utiliza SQL con MapReduceMapReduce is a programming model designed to efficiently process and generate large data sets. Powered by Google, This approach breaks down work into smaller tasks, which are distributed among multiple nodes in a cluster. Each node processes its part and then the results are combined. This method allows you to scale applications and handle massive volumes of information, being fundamental in the world of Big Data.... para grandes conjuntos de datos en la era de Big Data. SQL-MapReduce® is a framework created by Teradata Aster to allow developers to write powerful and highly expressive SQL-MapReduce functions in languages such as Java, C #, Python, C ++ and R and bring them to the discovery platform for high-performance analytics. After, analysts can invoke SQL-MapReduce functions using standard SQL or R via the Aster database.
Spark SQL – Apache's Spark project is forReal-time processing, in-memory and parallel Hadoop data
. Spark SQL builds on it to allow SQL queries to be written to the data. In Impala of Cloudera, los datos almacenados en HDFSHDFS, o Hadoop Distributed File System, It is a key infrastructure for storing large volumes of data. Designed to run on common hardware, HDFS enables data distribution across multiple nodes, ensuring high availability and fault tolerance. Its architecture is based on a master-slave model, where a master node manages the system and slave nodes store the data, facilitating the efficient processing of information.. o HBase se pueden consultar, y la sintaxis SQL es la misma que la de Apache HiveHive is a decentralized social media platform that allows its users to share content and connect with others without the intervention of a central authority. Uses blockchain technology to ensure data security and ownership. Unlike other social networks, Hive allows users to monetize their content through crypto rewards, which encourages the creation and active exchange of information.....
See also: Learn more about ways to query Hadoop using SQLhere
.
Final notes
In this post we discuss SQL, its uses, the CAP theorem and influence of SQL in other languages. A basic knowledge of SQL is very relevant in today's world, where python, R, SAS are dominant languages in data science. SQL is still relevant in the age of BIG DATA. The beauty of language remains its elegant and simple structure. Thinkpot:
Do you think SQL has become an inevitable weapon for data management? Would you recommend any other database languages?
Share your views / opinion / feedback with us in the comment section below. We would love to hear from you!! If you like what you have just read and want to continue learning about analytics,subscribe to our emails , Follow us on twitter or like ourspage the Facebook
.
Related