How to query a MongoDB database using PyMongo in Python?

Contents

Overview

  • We will discuss how you can query a MongoDB database using PyMongo library.
  • We will cover the basic aggregation operations in MongoDB.

Introduction

Following the global expansion of the Internet, we are generating data at an unprecedented rate now. Because conducting any type of analysis would require us to collect / let's consult the necessary data from the database, it is extremely important that we choose the right tool to consult the data. Consequently, we can't imagine using SQL to work with this volume of data, since each consultation will be expensive.

query-mongodb-using-pymongo-9814919

Query a MongoDB database using PyMongo

This is precisely where MongoDB comes in. MongoDB is an unstructured database that stores data in the form of documents. What's more, MongoDB can handle large volumes of data very efficiently and is the most widely used NoSQL database, as it offers a rich query language and fast and flexible access to data.

In this article, we will see several examples of how to query a MongoDB database using PyMongo. What's more, we will see how to use comparison operators and logical operators, Regular expression and aggregation pipelines basics.

This article is a continuation of the MongoDB in Python Tutorial for Beginners, where we cover the challenges of unstructured databases, MongoDB basic operations and installation steps. Then, if you are a complete beginner at MongoDB, I would recommend that you read that article first.

Table of Contents

  1. What is PyMongo?
  2. Installation steps
  3. Insert the data into the database
  4. Consult the database
    1. Field based filter
    2. Filter based on comparison operators
    3. Filter based on logical operators
    4. Regular expressions
    5. Aggregation pipes
  5. Final notes

What is PyMongo?

PyMongo is a Python library that allows us to connect with MongoDB. What's more, this is the most recommended way to work with MongoDB and Python.

What's more, We have chosen Python to interact with MongoDB because it is one of the most widely used and considerably more powerful languages ​​for Data science. PyMongo allows us to retrieve the data with a syntax similar to that of a dictionary.

In case you are a beginner in Python, I will recommend that you enroll in this free course: Introduction to Python.

Installation steps

Installing PyMongo is simple and straightforward. Here, i guess you already have python 3 and MongoDB installed. The following command will help you install PyMongo:

pip3 install pymongo

Insert the data into the database

Now let's set things up before querying a MongoDB database using PyMongo. First we will insert the data into the database. The following steps will help you in this:

  1. Import the libraries and connect to the mongo client

    Start the MongoDB server on your machine. I assume a file is running on localhost: 27017.

    Let's start by importing some of the libraries that we are going to use. By default, MongoDB server running on port 27017 from the local machine. Later, we will connect to the MongoDB client using the Pymongo Library.

    Later, get db instance from sample_db db. In case there is no, MongoDB will create one for you.

  2. Create the collections from the JSON files

    We will use the data of a food delivery company that operates in multiple cities. What's more, they have several logistics centers in these cities to send food orders to their customers. You can download the data and code here.

    1. weekly_demand:
      • ID: unique ID for each document
      • week: Week number
      • center_id: Unique ID for the fulfillment center
      • food_id: unique food ID
      • checkout_price: Final price with discount, taxes and shipping costs
      • base price: Base meal price
      • emailer_for_promotion: Emailer sent for food promotion
      • homepage_featured: Food featured on the home page.
      • num_orders: (Destiny) Order count
    2. food_info:
      • food_id: Unique ID for food
      • category: Kind of food (drinks / snacks / soups….)
      • kitchen room: Food kitchen (india / Italian /…)

    Then we will create two collections in the sample_db database:

    screenshot-from-2020-08-18-01-34-57-4989570

    screenshot-from-2020-08-18-01-35-10-4366550

  3. Insert data into collections

    Now, the data we have is in JSON format. Then we will get the collection instance, We will read the data file and insert the data using the insert_many function.

Finally, have 456548 documents in the weekly demand collection and 51 documents in the food information collection. Now, Let's take a look at a document from each of these collections.

weekly_collection

screenshot-from-2020-08-18-01-36-57-9527843

meal_info_collection

screenshot-from-2020-08-18-01-37-17-5694111

Now, our data is ready. Let's go to consult this database.

Consult the database

We can query a MongoDB database using PyMonfo with the find function to obtain all the results that satisfy the given condition and also using the find one function that will return only a result that satisfies the condition.

The following is the syntax of find and find_one:

your_collection.find( {<< query >>} , { << fields>>} )

You can query the database using the following filtering techniques:

  1. Field based filter

    For instance, you have hundreds of fields and you want to see only some of them. You can do it by simply putting all the required field names with the value 1. For instance:

    screenshot-from-2020-08-18-01-38-49-4552120

    Secondly, if you want to discard some fields only from the whole document, you can put the field names equal to 0. Therefore, only those fields will be excluded. Note that you cannot use a combination of 1 Y 0 to get the fields. Either all must be one or all must be zero.

    screenshot-from-2020-08-18-01-39-38-7933201

  2. Filter with a condition

    Now, in this section, we will provide a condition in the first braces and fields to discard in the second. Consequently, will return the first document with center_id equals 55 and meal_id equals 1885 and it will also discard the _id and week fields.

    screenshot-from-2020-08-18-01-40-27-4463234

  3. Filter based on comparison operators

    The following are the nine comparison operators in MongoDB.

    NAME DESCRIPTION
    $eq Will match values ​​that are equal to a specified value.
    $gt Will match values ​​that are greater than a specified value.
    $gte Will match all values ​​that are greater than or equal to a specified value.
    $in Will match any of the specified values ​​in an array.
    $lt Will match all values ​​that are less than a specified value.
    $lte Will match all values ​​that are less than or equal to a specified value.
    $ne Will match all values ​​that are not equal to a specified value.
    $nin Will not match any of the specified values ​​in an array.

    The following are some examples of the use of these comparison operators:

    1. Same as and not equal to

      We will find all the documents where center_id is equal to 55 and homepage_featured is not equal to 0. How we are going to use the search function, will return the cursor for that command. What's more, use a for loop to iterate through the query results.

      screenshot-from-2020-08-18-01-41-19-9573970

    2. On the list and not on the list

      For instance, must match one element with multiple elements. Then, instead of using the operator $ eq several times, we can use the operator $ in. We will try to find all the documents where center_id is 24 u 11.

      screenshot-from-2020-08-18-01-43-39-2455630

      Later, we look for all documents where center_id is not present in the specified list. The following query will return all documents where center_id is not 24 and neither 11.

      result_3-9662150

    3. Less than and Greater than

      Now, let's find all the documents where center_id is 55 and checkout_price is greater than 100 and less than 200. Use the following syntax for this-

      result_4-9980613

  4. Filter based on logical operator

    NAME DESCRIPTION
    $and Will join the query clauses with a logic. AND and returns all documents that meet both conditions.
    $not It will reverse the effect of a query and return documents that are not no match the query expression.
    $nor Will join the query clauses with a logic. NOR and return all documents that do not comply with the clauses.
    $or Will join the query clauses with a logic. OR and return all documents that match the conditions of any of the clauses.

    The following examples illustrate the use of logical operators:

    1. And operator

      The following query will return all documents where center_id equals 11 and also meal_id is not equal to 1778. The subqueries for the Y The operator will enter a list.

      result_5-2958763

    2. Operador OR

      The following query will return all documents where center_id equals 11 o meal_id es 1207 O 2707. What's more, the subqueries for the O The operator will enter a list.

      result_6-1186208

  5. Filter with regular expressions

    Regular expressions are very useful when you have text fields and you want to search for documents with a specific pattern. In case you want to learn more about regular expressions, I recommend that you read this article: Beginner's Tutorial for Regular Expressions in Python.

    Can be used with operator. $ regex and we can provide value to the operator so that the regex pattern is matc. We will use the meal_info collection for this query and then we will find the docs where the kitchen field starts with character C.

    result_7-9268219

    Let's take another example of regular expressions. We will discover all the documents in which the category starts from the character. “S” and the kitchen ends with “Ian“.

    result_8-8426772

  6. Aggregation pipes

    The MongoDB Aggregation Pipeline provides a framework for performing a series of data transformations on a dataset. The following is its syntax:

    your_collection.aggregate( [ { <stage1> }, { <stage2> },.. ] )
    

    The first stage takes the full set of documents as input and, from there, each subsequent stage takes the result set of the previous transformation as input to the next stage and produces the output.

    There are around of 10 transformations available in the MongoDB aggregate, of which we will see $ match Y $ group in this article. We will discuss each of the transforms in detail in the next MongoDB article.

    For instance, In the first stage, we will match the documents where center_id equals 11 and in the next stage, will count the number of documents with center_id equal to 11. Please note that we have assigned the $ count operator equal value total_rows in the second stage that is the name of the field that we want in the output.

    result_9-9401019

    Now, let's take another example where the first stage is the same as before, namely, center_id equals 11 and in the second stage, we want to calculate the average of the num_orders field for the center_id 11 and the only meal_ids for the center_id 11.

    result_10-1733037

Final notes

The unfathomable amount of data that is generated today makes it necessary to find better alternatives like this to consult data. To sum up, in this article, we learned how to query a MongoDB database using PyMongo. What's more, we understood how to apply various filters according to the required situation.

In case you want more information about the data query, I recommend the following course: Structured query language (SQL) for data science

In the next article, we will analyze aggregation pipelines in detail.

I encourage you to try things on your own and share your experiences in the comments section. What's more, if you have a problem with any of the above concepts, feel free to ask me in the comments below.

Subscribe to our Newsletter

We will not send you SPAM mail. We hate it as much as you.