Syntactic analysis | Guide to Mastering Natural Language Processing (Part 11)

This article was published as part of the Data Science Blogathon

Introduction

This article is part of an ongoing blog series on Natural Language Processing (NLP). In the previous article, we discussed an entity extraction technique called Named Entity Recognition. Another popular technique, topic modeling, will be discussed in later articles in this series.

In this article, we will delve into syntactic analysis, which is one of the crucial levels of NLP.

This is Part 11 of the blog series on the Step-by-Step Guide to Natural Language Processing.

Table of Contents

1. What is parsing?

2. What is the difference between lexical and syntactic analysis?

3. What is a parser?

4. What are the different types of parsers?

5. What is derivation, and what are its types?

6. What are the types of parsing based on derivation?

7. What is a parse tree?

What is parsing?

Syntactic analysis is defined as the analysis that tells us the logical meaning of given sentences or of parts of those sentences. We must also consider the rules of grammar in order to define the logical meaning and correctness of sentences.

Or, in simple words, syntactic analysis is the process of analyzing natural language with the rules of a formal grammar. Grammar rules are applied only to categories and groups of words; they do not apply to individual words.

Syntactic analysis basically assigns a syntactic structure to the text. It is also known as syntax analysis or parsing. The word ‘parsing’ originates from the Latin word ‘pars’, which means ‘part’. Syntactic analysis deals with the syntax of natural language, and grammar rules are used to carry it out.

Let's take an example to better understand:

Consider the following sentence:

Sentence: School go a boy 

The above sentence does not logically convey its meaning, and its grammatical structure is not correct. So, syntactic analysis tells us whether a particular sentence conveys its logical meaning and whether its grammatical structure is correct.

As we discussed when covering the steps or different levels of NLP, the third level of NLP is syntactic analysis, also called parsing or syntax analysis. The main objective of this level is to draw the exact meaning, or in simple words the dictionary meaning, from the text. Syntax analysis checks the text for meaningfulness against the rules of formal grammar.

For instance, consider the following sentence

Sentence: “hot ice cream” 

The above sentence would be rejected by the semantic parser.

Now, let's formally define syntactic analysis:

In the above sense, syntactic analysis or parsing can be defined as the process of analyzing strings of symbols in natural language according to the rules of formal grammar.

Difference between lexical and syntactic analysis

The goal of lexical analysis is data cleaning and feature extraction with the help of techniques such as:

  • Stemming,
  • Lemmatization,
  • Correcting misspelled words, etc.

On the other hand, in syntactic analysis our goal is to:

  • Find the roles that words play in a sentence,
  • Interpret the relationship between words,
  • Interpret the grammatical structure of sentences.

Let's consider the following example with 2 sentences:

Sentences:
Patna is the capital of Bihar.
Is Patna the of Bihar capital?

In both sentences, all the words are the same, but only the first sentence is syntactically correct and easily understandable.

But we cannot make these distinctions using basic lexical processing techniques. Therefore, we need more sophisticated syntax processing techniques to understand the relationship between individual words in a sentence.
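To make this point concrete, here is a minimal sketch in plain Python. The helper `tokenize` and the bag-of-words comparison are our own illustrative assumptions, not a technique prescribed by this series; they simply stand in for an order-insensitive lexical representation.

```python
from collections import Counter

def tokenize(sentence):
    """Lowercase the sentence, drop '.' and '?', and split on whitespace."""
    cleaned = sentence.lower().replace(".", "").replace("?", "")
    return cleaned.split()

s1 = "Patna is the capital of Bihar."
s2 = "Is Patna the of Bihar capital?"

# A bag of words only counts tokens, so word order is lost.
same_bag = Counter(tokenize(s1)) == Counter(tokenize(s2))
# Comparing the token sequences keeps the order.
same_order = tokenize(s1) == tokenize(s2)

print(same_bag)    # True
print(same_order)  # False
```

Both sentences look identical to the order-insensitive representation, which is why telling them apart needs information about structure.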

Syntactic analysis considers the following aspects of a sentence that lexical analysis does not:

Order and meaning of words

Syntactic analysis aims to extract the dependencies between the words in the document. If we change the order of the words, it becomes difficult to understand the sentence.

Stopword retention

If we remove the stopwords, it can completely change the meaning of a sentence.
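As a small illustration, the sketch below removes stopwords from two sentences with opposite meanings. The `STOPWORDS` set is a deliberately tiny, hypothetical list made up for this example; note that it contains the negation ‘not’, as some general-purpose stopword lists do.

```python
# Hypothetical, deliberately tiny stopword list (an assumption for illustration).
STOPWORDS = {"the", "was", "not", "is", "a", "an"}

def remove_stopwords(sentence):
    """Return the lowercase tokens of the sentence that are not stopwords."""
    return [word for word in sentence.lower().split() if word not in STOPWORDS]

positive = remove_stopwords("The movie was good")
negative = remove_stopwords("The movie was not good")

print(positive)  # ['movie', 'good']
print(negative)  # ['movie', 'good']
```

After stopword removal the two sentences become indistinguishable, even though one says the opposite of the other.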

Word morphology

Stemming and lemmatization reduce words to their base form, thus modifying the grammar of the sentence.

Parts of speech for words in a sentence

It is important to identify the correct part of speech of a word.

For instance, consider the following phrases:

‘cuts on his hand’ (Here, ‘cuts’ is a noun)
‘he cuts a pineapple’ (Here, ‘cuts’ is a verb)

What is a parser?

A parser is used to implement the task of parsing.

Now, let's see what exactly a parser is.

It is defined as a software component designed to take input data (text) and give a structural representation of the input after checking it for correct syntax with the help of a formal grammar. It also builds a data structure, generally in the form of a parse tree, an abstract syntax tree, or some other hierarchical structure.

[Figure: top-down parsing]

Image source: Google images

We can understand the relevance of parsing in NLP with the help of the following points:

  • The parser can be used to report any syntax errors.
  • It helps recover from commonly occurring errors so that processing of the rest of the program can continue.
  • A parse tree is created with the help of a parser.
  • The parser is used to create a symbol table, which plays an important role in NLP.
  • A parser is also used to produce intermediate representations (IR).

Different types of parsers

As discussed, a parser is basically a procedural interpretation of a grammar. It tries to find an optimal tree for a particular sentence after searching through the space of a variety of trees.

Let's take a look at some of the available parsers:

  • Recursive descent parser
  • Shift-reduce parser
  • Chart parser
  • Regular expression parser

Recursive descent parser

It is one of the simplest forms of parsing. Some important points about the recursive descent parser are as follows:

  • It follows a top-down process.
  • It tries to check whether the syntax of the input stream is correct.
  • It scans the input text from left to right.
  • The operation required for this type of parser is to read characters from the input stream and match them against the terminals with the help of the grammar.
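The points above can be sketched as a small recognizer in plain Python. The toy `GRAMMAR` and its vocabulary are assumptions made up for this illustration (and it matches whole words rather than characters); it is a minimal sketch of the top-down, left-to-right idea with backtracking, not a production parser.

```python
# Toy grammar (hypothetical): each non-terminal maps to its alternative productions.
GRAMMAR = {
    "S": [["NP", "VP"]],
    "NP": [["Det", "N"], ["N"]],
    "VP": [["V", "PP"], ["V"]],
    "PP": [["P", "NP"]],
    "Det": [["a"], ["the"]],
    "N": [["boy"], ["school"]],
    "V": [["goes"], ["go"]],
    "P": [["to"]],
}

def expand(symbol, tokens, pos):
    """Yield every position reachable after matching `symbol` from `pos`."""
    if symbol not in GRAMMAR:
        # Terminal: it must equal the next input token.
        if pos < len(tokens) and tokens[pos] == symbol:
            yield pos + 1
        return
    # Non-terminal: try each production in turn (this is the backtracking).
    for production in GRAMMAR[symbol]:
        yield from match_sequence(production, tokens, pos)

def match_sequence(symbols, tokens, pos):
    """Match the symbols one after another, left to right."""
    if not symbols:
        yield pos
        return
    for middle in expand(symbols[0], tokens, pos):
        yield from match_sequence(symbols[1:], tokens, middle)

def recognize(sentence):
    """Return True if the whole sentence can be derived from the start symbol S."""
    tokens = sentence.lower().split()
    return any(end == len(tokens) for end in expand("S", tokens, 0))

print(recognize("a boy goes to school"))  # True
print(recognize("School go a boy"))       # False
```

The grammar has no left-recursive rules (such as NP → NP PP); a naive recursive descent parser would loop forever on those. It also ignores subject-verb agreement, so it only checks word order and grouping.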

Shift-reduce parser

Some of the important points about the shift-reduce parser are as follows:

  • It follows a simple bottom-up process.
  • Its goal is to find the sequence of words and phrases that corresponds to the right-hand side of a grammar production and replace it with the left-hand side of the production.
  • It keeps trying to find such sequences until the entire sentence is reduced.
  • In simple words, this parser starts with the input symbols and aims to build the parse tree up to the start symbol.
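Here is a minimal sketch of that shift and reduce loop in plain Python. The `RULES` list is a hypothetical toy grammar, and the strategy is a simple greedy one that always reduces when it can and never backtracks, so treat it as an illustration of the idea rather than a complete parser.

```python
# Toy grammar (hypothetical): (left-hand side, right-hand side) pairs.
RULES = [
    ("S", ["NP", "VP"]),
    ("NP", ["Det", "N"]),
    ("VP", ["V", "NP"]),
    ("Det", ["the"]),
    ("Det", ["a"]),
    ("N", ["dog"]),
    ("N", ["man"]),
    ("V", ["saw"]),
]

def reduce_once(stack):
    """If the top of the stack matches a right-hand side, replace it with the left-hand side."""
    for lhs, rhs in RULES:
        if len(stack) >= len(rhs) and stack[-len(rhs):] == rhs:
            del stack[-len(rhs):]
            stack.append(lhs)
            return True
    return False

def shift_reduce(sentence):
    """Greedy shift-reduce: reduce whenever possible, otherwise shift the next word."""
    buffer = sentence.lower().split()
    stack = []
    trace = []
    while True:
        if reduce_once(stack):
            trace.append(("reduce", list(stack)))
        elif buffer:
            stack.append(buffer.pop(0))
            trace.append(("shift", list(stack)))
        else:
            break
    # The sentence is accepted only if everything was reduced to the start symbol.
    return stack == ["S"], trace

accepted, trace = shift_reduce("the dog saw a man")
print(accepted)   # True
print(trace[-1])  # ('reduce', ['S'])

rejected, _ = shift_reduce("dog the saw man a")
print(rejected)   # False
```

Because this sketch never backtracks, it can fail on a valid sentence if an early reduction turns out to be the wrong choice under a more ambiguous grammar.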

Chart parser

Some of the important points about the chart parser are as follows:

  • Basically, this parser is useful for ambiguous grammars, including the grammars of natural languages.
  • It applies the concept of dynamic programming to parsing problems.
  • Because of dynamic programming, it stores partial hypothesized results in a structure called a ‘chart’.
  • The ‘chart’ can also be reused in different scenarios.
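To illustrate the chart idea, the sketch below uses a CYK-style table, which is one well-known dynamic programming approach and not necessarily the strategy used by any particular library. The grammar in `BINARY` and `LEXICON` is a hypothetical toy grammar in which every rule has either two non-terminals or a single word on the right-hand side.

```python
from collections import defaultdict

# Toy grammar (hypothetical): binary rules as (parent, left child, right child).
BINARY = [
    ("S", "NP", "VP"),
    ("NP", "Det", "N"),
    ("NP", "NP", "PP"),
    ("VP", "V", "NP"),
    ("VP", "VP", "PP"),
    ("PP", "P", "NP"),
]
LEXICON = {
    "the": ["Det"], "a": ["Det"],
    "dog": ["N"], "man": ["N"], "park": ["N"],
    "saw": ["V"], "in": ["P"],
}

def count_parses(sentence, start="S"):
    """Count the distinct parse trees of the sentence using a CYK-style chart."""
    tokens = sentence.lower().split()
    n = len(tokens)
    # chart[(i, j)][symbol] = number of ways `symbol` can cover tokens[i:j]
    chart = defaultdict(lambda: defaultdict(int))
    for i, word in enumerate(tokens):
        for tag in LEXICON.get(word, []):
            chart[(i, i + 1)][tag] += 1
    # Fill in longer spans by reusing the stored results for shorter spans.
    for width in range(2, n + 1):
        for i in range(0, n - width + 1):
            j = i + width
            for k in range(i + 1, j):
                for parent, left, right in BINARY:
                    ways = chart[(i, k)][left] * chart[(k, j)][right]
                    if ways:
                        chart[(i, j)][parent] += ways
    return chart[(0, n)][start]

print(count_parses("the dog saw a man"))              # 1
print(count_parses("the dog saw a man in the park"))  # 2
```

The second sentence has two parses under this grammar because ‘in the park’ can attach either to ‘a man’ or to the verb phrase. The chart lets both readings share the work done on the smaller spans.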

Regular expression parser

It is one of the most commonly used parsers. Some of the important points about the Regexp parser are as follows:

  • It uses a regular expression, defined in the form of a grammar, on top of a POS-tagged string.
  • Basically, it uses these regular expressions to parse the input sentences and produce a parse tree from them.
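The idea can be sketched with Python's built-in `re` module. The hand-tagged sentence, the tag names, and the `NP_PATTERN` rule are assumptions for illustration; the pattern is loosely modelled on the `<DT>?<JJ>*<NN>` style of chunk rule, and this sketch produces flat noun-phrase chunks rather than a full parse tree.

```python
import re

# A hand-tagged sentence (assumed input; a real pipeline would use a POS tagger).
tagged = [
    ("the", "DT"), ("little", "JJ"), ("dog", "NN"),
    ("barked", "VBD"), ("at", "IN"), ("the", "DT"), ("cat", "NN"),
]

# Noun phrase: optional determiner, any number of adjectives, one or more nouns.
NP_PATTERN = re.compile(r"(?:<DT>)?(?:<JJ>)*(?:<NN>)+")

def chunk_noun_phrases(tagged_tokens):
    """Find noun-phrase chunks by matching a regular expression over the tag sequence."""
    tag_string = "".join(f"<{tag}>" for _, tag in tagged_tokens)
    chunks = []
    for match in NP_PATTERN.finditer(tag_string):
        # Convert character offsets back to token indices by counting '<'.
        start = tag_string.count("<", 0, match.start())
        end = tag_string.count("<", 0, match.end())
        chunks.append(" ".join(word for word, _ in tagged_tokens[start:end]))
    return chunks

print(chunk_noun_phrases(tagged))  # ['the little dog', 'the cat']
```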

What is derivation?

To obtain the input string, we need a sequence of production rules. A derivation is the sequence of production rules applied to do so. During parsing, we have to decide two things: which non-terminal is to be replaced, and which production rule will be used to replace it.

Types of derivation

In this section, we will discuss the two types of derivation, which can be used to decide which non-terminal to replace with a production rule:

Leftmost derivation

In the leftmost derivation, the sentential form of the input is scanned and replaced from left to right, that is, the leftmost non-terminal is replaced at each step. In this case, the sentential form is known as the left-sentential form.

Rightmost derivation

In the rightmost derivation, the sentential form of the input is scanned and replaced from right to left, that is, the rightmost non-terminal is replaced at each step. In this case, the sentential form is known as the right-sentential form.
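The difference between the two is easiest to see on a tiny example. In the sketch below, `GRAMMAR` is a hypothetical toy grammar with exactly one production per non-terminal, so the only choice left at each step is which non-terminal to replace.

```python
# Toy grammar (hypothetical): one production per non-terminal.
GRAMMAR = {
    "S": ["NP", "VP"],
    "NP": ["the", "dog"],
    "VP": ["barks"],
}

def derive(start, leftmost=True):
    """Return the sentential forms produced by a leftmost or rightmost derivation."""
    form = [start]
    steps = [" ".join(form)]
    while True:
        positions = [i for i, symbol in enumerate(form) if symbol in GRAMMAR]
        if not positions:
            return steps  # only terminals are left
        i = positions[0] if leftmost else positions[-1]
        form[i:i + 1] = GRAMMAR[form[i]]  # replace the chosen non-terminal
        steps.append(" ".join(form))

print(derive("S", leftmost=True))
# ['S', 'NP VP', 'the dog VP', 'the dog barks']
print(derive("S", leftmost=False))
# ['S', 'NP VP', 'NP barks', 'the dog barks']
```

Both derivations end in the same sentence; only the intermediate sentential forms differ.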

Types of parsing

Derivation divides parsing into the following two types:

[Figure: types of parsing]

Image source: Google images

Top-down parsing

In top-down parsing, the parser starts building the parse tree from the start symbol and then tries to transform the start symbol into the input. The most common form of top-down parsing uses a recursive procedure to process the input, but its main disadvantage is backtracking.

Bottom-up parsing

In bottom-up parsing, the parser starts with the input symbols and tries to build the parse tree up to the start symbol.

What is a Parse tree?

A parse tree is the graphical representation of a derivation. The start symbol of the derivation is the root node of the parse tree, the leaf nodes are terminals, and the interior nodes are non-terminals.

The most useful property of the parse tree is that an in-order traversal of the tree will produce the original input string.

For instance, consider the following sentence:

Sentence: the dog saw a man in the park

After parsing the sentence, the following parse tree is generated:

[Figure: parse tree for the sentence “the dog saw a man in the park”]

Image source: Google images
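As a rough sketch, a parse tree for this sentence can also be written as nested tuples in plain Python. The tree below is one possible analysis under an assumed toy grammar (the phrase ‘in the park’ is attached to ‘a man’), so it may differ from the tree in the figure; the helper names `leaves` and `bracketed` are our own.

```python
# One possible parse tree, written as (label, child, child, ...) tuples.
tree = (
    "S",
    ("NP", ("Det", "the"), ("N", "dog")),
    ("VP",
     ("V", "saw"),
     ("NP",
      ("NP", ("Det", "a"), ("N", "man")),
      ("PP", ("P", "in"), ("NP", ("Det", "the"), ("N", "park"))))),
)

def leaves(node):
    """Collect the terminal words by visiting the children from left to right."""
    if isinstance(node, str):
        return [node]
    _label, *children = node
    words = []
    for child in children:
        words.extend(leaves(child))
    return words

def bracketed(node):
    """Render the tree in bracketed notation, e.g. (NP (Det the) (N dog))."""
    if isinstance(node, str):
        return node
    label, *children = node
    return "(" + label + " " + " ".join(bracketed(child) for child in children) + ")"

print(" ".join(leaves(tree)))  # the dog saw a man in the park
print(bracketed(tree[1]))      # (NP (Det the) (N dog))
```

Reading the leaves from left to right gives back the original sentence, which is the property described above. The root is the start symbol S, the interior nodes are non-terminals, and the leaves are the words.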

This ends Part 11 of our blog series on Natural Language Processing!

Final notes

Thank you for reading!

I hope you liked the article. If you did, share it with your friends too. Is something not mentioned, or do you want to share your thoughts? Feel free to comment below and I'll get back to you. 😉

The media shown in this article is not the property of DataPeaker and is used at the author's discretion.
