Guide to Natural Language Processing in Python (Part 1)


This article was published as part of the Data Science Blogathon

Introduction

Computers and machines are great at working with tabular data or spreadsheets. However, humans generally communicate in words and sentences, not in tables or spreadsheets, and most of the information that humans speak or write is unstructured. As a result, it is not easy for computers to interpret human language.

Therefore, in natural language processing (NLP), our goal is to make unstructured text understandable to computers and to retrieve meaningful information from it.

Let's formally define natural language processing (NLP):

Natural language processing (NLP) is a subfield of artificial intelligence concerned with the interactions between computers and humans through natural language.

In this article, we will discuss some of the basic concepts of NLP. This article is part of a blog series on natural language processing (NLP).

This is Part 1 of the blog series Step-by-Step Guide to Natural Language Processing.

Important note

After some topics, there are practice questions (Test your knowledge) that you should solve and answer in the comment box, so that you can check your understanding of that topic.

Table of Contents

1. What is natural language processing (NLP)?

2. Natural language processing applications

3. Understanding Natural Language Processing

4. Difference between rule-based NLP and statistical NLP

5. Components of natural language processing

6. Ambiguity and uncertainty in natural language processing

What is natural language processing?

Natural language processing (NLP) is a subfield of computer science and artificial intelligence that deals with the interactions between computers and human (natural) languages. It becomes crucial when we want to apply machine learning or deep learning algorithms to a dataset containing text and speech.

For instance, we can use NLP to build artificial intelligence systems for:

  • Speech recognition
  • Document summarization
  • Machine translation
  • Spam detection
  • Named entity recognition
  • Question answering
  • Autocomplete
  • Predictive typing, etc.

Nowadays, most smartphones have a voice recognition system. These smartphones use NLP to understand natural language and respond. In addition, many laptop operating systems have built-in voice recognition.

Test your knowledge

Which of the following fields does natural language processing belong to?

  • Computer science
  • Artificial intelligence
  • Computational linguistics
  • All of the above

NLP Applications

Some applications of natural language processing are as follows:

Cortana


Image source: Google images

Microsoft's Windows operating system has a virtual assistant called Cortana that can recognize natural speech. With it, you can:

  • Set reminders
  • Open applications
  • Email anyone
  • Play games
  • Track flights and packages
  • Check the weather, etc.

If you want to read more about Cortana commands, see the link here.

Siri


Image source: Google Images

Siri is a virtual assistant built into Apple Inc.'s iOS, watchOS, macOS, and tvOS operating systems and the HomePod. Again, you can do many things with it using voice commands:

  • Start a call with anyone
  • Send a text message to someone
  • Send an e-mail
  • Set a timer
  • Take a photo
  • Open an app
  • Set an alarm
  • Use navigation, etc.

Here is a complete list of all Siri commands.

Gmail


Image source: Google images

Gmail is the popular email service developed by Google. It uses spam detection to filter out spam emails: the text of each email is processed to decide whether that particular email is spam or not.
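Gmail's actual spam filter is not described here, so the following is only a minimal Python sketch, under the assumption that such a filter starts by turning an email's text into word counts (a "bag of words") that a classifier can then score. The function name `email_to_features` and the tokenization regex are illustrative choices, not part of any Google API.

```python
import re
from collections import Counter


def email_to_features(text: str) -> Counter:
    """Hypothetical helper: turn raw email text into a bag of words (word -> count)."""
    # Lowercase so that "FREE" and "free" are counted as the same word,
    # then keep runs of letters, digits and apostrophes as tokens.
    tokens = re.findall(r"[a-z0-9']+", text.lower())
    return Counter(tokens)


email = "WIN a FREE prize! Click now to claim your free prize."
features = email_to_features(email)
print(features["free"], features["prize"], features["win"])  # 2 2 1
```

Repeated words such as "free" and "prize" are the kind of signal a trained spam classifier can learn to weigh.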

Test your knowledge

Which of the following are NLP use cases?

  • Detect objects from an image
  • Facial recognition
  • Speech biometrics
  • Text summarization

Understanding Natural Language Processing


Image source: Google images

For us human beings, performing natural language processing is not a very difficult task, but we are still not perfect. We often mistake one thing for another, and we often interpret the same sentences or words in different ways.

For instance, consider the following sentences and try to interpret each of them in several different ways:

Example 1

Sentence: I saw a student on a hill with a microscope.

Here are various interpretations of the sentence above:

  • There is a student on the hill and I looked at him with my microscope.
  • There is a student on the hill and he has a microscope.
  • I am on a hill and I saw a student using my microscope.
  • I am on a hill and I saw a student who has a microscope.
  • There is a student on a hill and I saw something with my microscope.

Example 2

Sentence: Can you help me with the can?

In this sentence, the word “can” appears twice, but with different meanings:

The first “can” is a modal verb used to form a question.

The second “can”, at the end of the sentence, is a noun meaning a container that holds things such as food or liquids.

What conclusions can we infer from the two previous examples?

From the two examples above, we can see that language processing is not “deterministic”: the same language does not always have the same interpretation, and an interpretation that suits one person may not suit another. Therefore, natural language processing (NLP) takes a non-deterministic approach.

In simple words, we can use natural language processing to build intelligent (AI) systems that understand language the way humans do and interpret it in different situations.

Difference between rule-based NLP and statistical NLP

There are two different approaches to natural language processing:

Rule-based natural language processing

It uses common-sense reasoning, encoded as hand-written rules, to process tasks.

For instance,

  • Freezing temperatures can cause death
  • Hot coffee can burn people's skin
  • Other common-sense reasoning tasks, etc.

However, these processes can be slow and require manual effort.
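To make the rule-based approach concrete, here is a minimal Python sketch of hand-written rules that decide whether “can” is a modal verb or a noun, as in Example 2 above. The rule set and the determiner list are hypothetical and deliberately tiny.

```python
# Hypothetical hand-written rules, for illustration only.
DETERMINERS = {"the", "a", "an", "this", "that", "my", "your"}


def tag_can(sentence: str) -> list[str]:
    """Tag every occurrence of "can" as MODAL (verb) or NOUN (container)."""
    words = sentence.lower().rstrip("?.!").split()
    tags = []
    for i, word in enumerate(words):
        if word != "can":
            continue
        previous = words[i - 1] if i > 0 else None
        if previous in DETERMINERS:
            # Rule 1: "can" right after a determiner ("the can") is a noun.
            tags.append("NOUN")
        else:
            # Rule 2: otherwise treat it as the modal verb ("Can you ...").
            tags.append("MODAL")
    return tags


print(tag_can("Can you help me with the can?"))  # ['MODAL', 'NOUN']
```

The rules work for this sentence, but they break on new inputs: in “I can can tomatoes”, the second “can” (the verb “to can”) follows “can” rather than a determiner, so Rule 2 wrongly tags it MODAL. Each such case needs yet another hand-written rule, which is the manual effort mentioned above.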

Statistical processing of natural language

This type of NLP uses large amounts of data and aims to draw conclusions from them. It uses machine learning algorithms to train NLP models. Once trained on large amounts of data, the model can make reasonably accurate inferences about new text.
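As a minimal sketch of the statistical approach, the Python code below trains a multinomial naive Bayes classifier, a standard statistical text classifier, to separate spam from normal ("ham") messages. Instead of hand-written rules, it learns word probabilities from labeled examples. The six training messages are made up for illustration; real models are trained on far larger datasets.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


class NaiveBayes:
    """Multinomial naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, texts, labels):
        self.classes = sorted(set(labels))
        # Prior P(class): fraction of training examples with that label.
        self.priors = {c: labels.count(c) / len(labels) for c in self.classes}
        # Word counts per class, learned from the data rather than written by hand.
        self.word_counts = {c: Counter() for c in self.classes}
        for text, label in zip(texts, labels):
            self.word_counts[label].update(tokenize(text))
        self.vocab = set().union(*self.word_counts.values())
        return self

    def predict(self, text):
        best_class, best_score = None, -math.inf
        for c in self.classes:
            total = sum(self.word_counts[c].values())
            # Work in log space to avoid multiplying many small probabilities.
            score = math.log(self.priors[c])
            for word in tokenize(text):
                if word not in self.vocab:
                    continue  # ignore words never seen during training
                count = self.word_counts[c][word]
                score += math.log((count + 1) / (total + len(self.vocab)))
            if score > best_score:
                best_class, best_score = c, score
        return best_class


texts = [
    "win a free prize now",
    "claim your free money",
    "cheap pills free offer",
    "meeting at noon tomorrow",
    "please review the project report",
    "lunch with the team tomorrow",
]
labels = ["spam", "spam", "spam", "ham", "ham", "ham"]

model = NaiveBayes().fit(texts, labels)
print(model.predict("free prize offer"))       # spam
print(model.predict("team meeting tomorrow"))  # ham
```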

Comparison (pros and cons)


Components of NLP

NLP can be divided into two basic components:

  • Natural language understanding (NLU)
  • Natural language generation (NLG)


Image source: Google images

Natural language understanding (NLU)

NLU is generally harder than NLG. Let's look at the challenges a machine faces when it tries to understand natural language.

When learning or trying to interpret a language, a machine encounters many ambiguities.

Sentence: He is looking for a match.

Here, what do you understand by “match”: a partner match, or a cricket/soccer match?

Lexical ambiguity occurs when a word has more than one meaning, so the sentence in which it is used can be interpreted differently depending on which meaning is intended. To resolve these ambiguities to some extent, we can use part-of-speech tagging techniques.
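Part-of-speech tagging helps when the competing meanings have different parts of speech, but in “He is looking for a match” both meanings of “match” are nouns. For that case, word sense disambiguation is the technique normally used. Below is a minimal Python sketch of the simplified Lesk algorithm, which picks the sense whose dictionary gloss shares the most words with the surrounding context. The two glosses and the stopword list are hypothetical; a real system would take glosses from a lexical database such as WordNet.

```python
import re

# Hypothetical sense inventory: each sense of "match" has a short gloss.
SENSES = {
    "sports game": "a game or contest in which teams or players compete, such as cricket or soccer",
    "romantic partner": "a person who is a suitable partner for marriage or a relationship",
}

# Small hand-picked stopword list (an assumption, not a standard list).
STOPWORDS = {"a", "an", "the", "is", "he", "she", "for", "or", "in", "of",
             "to", "and", "as", "such", "which", "who"}


def content_words(text):
    """Lowercased words of the text, minus stopwords."""
    return {w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS}


def simplified_lesk(context, senses):
    """Pick the sense whose gloss shares the most content words with the context."""
    context_words = content_words(context)
    best_sense, best_overlap = None, -1
    for sense, gloss in senses.items():
        overlap = len(context_words & content_words(gloss))
        if overlap > best_overlap:
            best_sense, best_overlap = sense, overlap
    return best_sense


print(simplified_lesk("He is looking for a match to play cricket with his team", SENSES))
print(simplified_lesk("He is looking for a match because he wants a partner for marriage", SENSES))
```

For the bare sentence “He is looking for a match.”, neither gloss overlaps with the context, so the function simply returns the first sense in the dictionary: without more context, the ambiguity cannot be resolved.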

Sentence: The chicken is ready to eat.

Is the chicken ready to eat its meal, or is the chicken ready for someone else to eat? You never know.

Syntactic ambiguity occurs when a sequence of words can have more than one meaning because of the way it is structured. It is also known as grammatical ambiguity.

Sentence: Chirag met Kshitiz and Dinesh. They went to a restaurant.

Here, does “they” refer to Kshitiz and Dinesh, or to all three of them?

Referential ambiguity: very often an entity (something or somebody) is mentioned in a text and then referred to again, possibly in a different sentence, with another word such as a pronoun. These pronouns can cause ambiguity when it is not clear which noun they refer to.

Natural language generation (NLG)

It is defined as the process of generating meaningful phrases and sentences in natural language from some internal representation.

This component involves three basic steps:

  • Text planning: It involves the retrieval of relevant information from the knowledge base.
  • Sentence planning: It involves processes like choosing the required words, forming meaningful sentences, and setting the tone of the sentence.
  • Realization of text: It involves mapping the sentence plans onto the sentence structure.
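The three NLG steps above can be sketched as three small Python functions. This is a hypothetical template-based example: the weather “knowledge base”, its values, and the tone rule are made up for illustration, and real NLG systems are considerably more sophisticated.

```python
# Hypothetical knowledge base of weather facts (values made up for illustration).
KNOWLEDGE_BASE = [
    {"city": "Jodhpur", "temp_c": 38, "condition": "sunny"},
    {"city": "Shimla", "temp_c": 12, "condition": "rainy"},
]


def plan_text(kb, city):
    """Text planning: retrieve the facts relevant to the request."""
    return [fact for fact in kb if fact["city"] == city]


def plan_sentence(fact):
    """Sentence planning: choose the words and the tone for one fact."""
    tone = "hot" if fact["temp_c"] >= 30 else "cool"  # assumed threshold
    return {
        "subject": fact["city"],
        "verb": "is",
        "complements": [fact["condition"], tone],
        "temp": fact["temp_c"],
    }


def realize(plan):
    """Realization: map the sentence plan onto an actual English sentence."""
    adjectives = " and ".join(plan["complements"])
    return f"{plan['subject']} {plan['verb']} {adjectives} today, at {plan['temp']} degrees Celsius."


def generate(kb, city):
    """Run the three steps in order for every relevant fact."""
    return " ".join(realize(plan_sentence(fact)) for fact in plan_text(kb, city))


print(generate(KNOWLEDGE_BASE, "Jodhpur"))
# Jodhpur is sunny and hot today, at 38 degrees Celsius.
```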

Test your knowledge

Question 1: NLP is divided into two subfields:

  • symbolic and numeric
  • algorithmic and heuristic
  • time and movement
  • understanding and generation

Question 2: Which of the following maps sentence plans onto the sentence structure?

  • Text planning
  • Sentence planning
  • Realization of text
  • All of the above

Ambiguity and uncertainty in NLP

In natural language processing, ambiguity refers to the ability of a word or sentence to be understood in more than one way. Natural language is highly ambiguous.

NLP deals with the following five types of ambiguity:

Lexical ambiguity

Lexical ambiguity is the ambiguity of a single word, which may have several meanings or parts of speech.

For instance, consider the following sentences:

She won two silver medals
She made a silver speech
His worries had silvered his hair

In these sentences, the word “silver” is treated as a noun, an adjective, and a verb, respectively.

Syntactic ambiguity

Syntactic ambiguity occurs when a sentence is parsed in different ways.

For instance, consider the following sentence:

Sentence: The man saw the girl with the microscope

This sentence is ambiguous because it is unclear whether the man saw a girl who had the microscope, or used his microscope to see the girl.
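One way to see this syntactic ambiguity is to count parse trees. The minimal Python sketch below runs a CYK chart parser over a toy grammar in Chomsky normal form; the grammar and lexicon are assumptions that cover only this sentence. The prepositional phrase can attach either to the verb phrase (the man used the microscope) or to the noun phrase (the girl has the microscope), so the parser finds two parses.

```python
from collections import defaultdict

# Toy grammar in Chomsky normal form (an assumption made for illustration).
BINARY_RULES = [
    ("S", "NP", "VP"),
    ("VP", "V", "NP"),
    ("VP", "VP", "PP"),  # PP attaches to the verb phrase: "saw ... with the microscope"
    ("NP", "Det", "N"),
    ("NP", "NP", "PP"),  # PP attaches to the noun phrase: "the girl with the microscope"
    ("PP", "P", "NP"),
]
LEXICON = {
    "the": {"Det"}, "man": {"N"}, "girl": {"N"}, "microscope": {"N"},
    "saw": {"V"}, "with": {"P"},
}


def count_parses(sentence, start="S"):
    """CYK chart parsing that counts how many parse trees cover the whole sentence."""
    words = sentence.lower().split()
    n = len(words)
    # chart[i][j][label] = number of ways words[i:j] can be derived from label
    chart = [[defaultdict(int) for _ in range(n + 1)] for _ in range(n + 1)]
    for i, word in enumerate(words):
        for label in LEXICON[word]:
            chart[i][i + 1][label] += 1
    # Fill longer spans from shorter ones.
    for length in range(2, n + 1):
        for i in range(n - length + 1):
            j = i + length
            for k in range(i + 1, j):
                for parent, left, right in BINARY_RULES:
                    ways = chart[i][k][left] * chart[k][j][right]
                    if ways:
                        chart[i][j][parent] += ways
    return chart[0][n][start]


print(count_parses("The man saw the girl with the microscope"))  # 2
print(count_parses("The man saw the girl"))                      # 1
```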

Semantic ambiguity

This type of ambiguity occurs when the meaning of the words themselves can be misinterpreted. In simple words, semantic ambiguity occurs when a sentence contains an ambiguous word or phrase.

For instance, consider the following sentence:

Sentence: The bus hit the pole while it was moving

This sentence has semantic ambiguity because it can be interpreted in two ways:

  • “The moving bus hit the pole.”
  • “The bus hit the pole while the pole was moving.”

Anaphoric ambiguity

Anaphora is the use of an expression, typically a pronoun, that refers back to an entity mentioned earlier in the text. Anaphoric ambiguity occurs when it is unclear which earlier entity such an expression refers to.

For instance, consider the following sentences:

Sentence: The dog ran up the hill. It was very steep. It soon got tired. 

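Here, “It” is used twice, and each occurrence could refer to either the dog or the hill, which causes ambiguity. Common sense tells us that the first “It” is the hill and the second is the dog, but the word itself does not say so.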
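A naive resolver shows why this is hard. The minimal Python sketch below links each “It” to the most recently mentioned noun, a simple recency heuristic. The nouns in each sentence are annotated by hand (an assumption), since no real parser is used. The heuristic links the first “It” to the hill, which is correct, but it also links the second “It” to the hill, although a hill cannot get tired.

```python
# Hand-annotated input (an assumption): each sentence with the nouns it mentions.
SENTENCES = [
    ("The dog ran up the hill.", ["dog", "hill"]),
    ("It was very steep.", []),
    ("It soon got tired.", []),
]


def resolve_by_recency(sentences):
    """Naive heuristic: link each 'it' to the most recently mentioned noun."""
    mentioned = []  # nouns seen so far, in order of mention
    links = []
    for text, nouns in sentences:
        words = text.lower().rstrip(".").split()
        if "it" in words:
            guess = mentioned[-1] if mentioned else None
            links.append((text, list(mentioned), guess))
        mentioned.extend(nouns)
    return links


for text, candidates, guess in resolve_by_recency(SENTENCES):
    print(f"{text} candidates={candidates} guess={guess}")
```

Fixing the second link requires world knowledge (only animals get tired), which is exactly what makes anaphora resolution difficult.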

Pragmatic ambiguity

These types of ambiguities occur when the context of a sentence gives it multiple interpretations. In simple words, we can say that these ambiguities arise when the statement is not specific.

For instance, consider the following sentence:

Sentence: I like you too

This sentence can have multiple interpretations, such as:

  • I like you (just as you like me).
  • I like you (just as someone else does).

This ends Part 1 of the blog series on natural language processing!

Final notes

Thank you for reading!

If you liked this and want to know more, visit my other articles on data science and machine learning by clicking on the Link

Feel free to contact me on LinkedIn or by email.

Did I miss anything, or do you want to share your thoughts? Feel free to comment below and I'll get back to you.

About the Author

Chirag Goyal

I am currently pursuing a Bachelor of Technology (B.Tech) in Computer Science and Engineering at the Indian Institute of Technology Jodhpur (IITJ). I am very excited about machine learning, deep learning, and artificial intelligence.

The media shown in this article is not the property of DataPeaker and is used at the author's discretion.
