Web Scraping with the RPA Tool UiPath


This article was published as part of the Data Science Blogathon.

The world is moving fast towards AI, so you had better go with the flow. This line captures how quickly technology is being adopted in the real world for better and faster results.

Introduction

Web scraping, also known as web data extraction or web harvesting, is the collection of data from the web. These days, everything and everyone needs data to function. Data is the most precious jewel in running any organization, and the most challenging part is collecting quality data. Finding the data is good; extracting it is even better; doing it with automation is perfect.

What is UiPath?

UiPath is an RPA tool. But wait, what is RPA?

What is RPA?

Quoting from the UiPath site:

Robotic process automation is the technology that allows anyone today to configure computer software, or a “robot”, to emulate and integrate the actions of a human interacting within digital systems to execute a business process. RPA robots use the user interface to capture data and manipulate applications just like humans do. They interpret, trigger responses, and communicate with other systems in order to perform a wide variety of repetitive tasks.

Only substantially better: an RPA software robot never sleeps and makes no mistakes.

Hands-on session

Versions used

UiPath – 20.4.3

Let's do web scraping using UiPath. First, inspect the web portal to see the data you want to extract, and look at the parent and child HTML tags for a better understanding of the page structure.

Steps to follow for web scraping

  • Select the web portal and data
  • Create a project in your desired directory
  • Create a flowchart file for web scraping flow layout
  • Design the flow
  • Run the automation flow
  • Open the Excel file and check the scraped data

Step 1 – Select the web portal and data

I selected the web portal “https://www.bullion-rates.com/gold/INR/2007-1-history.htm”, and I want to extract the gold rates along with their dates.
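(Outside of UiPath, you can do the tag inspection mentioned earlier in the browser's developer tools, or with a short Python sketch like the one below. This is only a rough helper, assuming requests and beautifulsoup4 are installed; it is not part of the UiPath flow.)

```python
# Optional helper for the "check the parent and child HTML tags" step; not part of the UiPath flow.
# Assumes requests and beautifulsoup4 are installed.
import requests
from bs4 import BeautifulSoup

url = "https://www.bullion-rates.com/gold/INR/2007-1-history.htm"
html = requests.get(url, timeout=30).text
soup = BeautifulSoup(html, "html.parser")

# List every table on the page with its parent tag and first-level child tags,
# to see which one holds the dates and gold rates.
for table in soup.find_all("table"):
    children = {child.name for child in table.find_all(recursive=False)}
    print(table.get("id"), "| parent:", table.parent.name, "| children:", children)
```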

Step 2 – Create a project in your desired directory

Provide the name, the path and a brief description of your project.

Step 3 – Create a flowchart file

Now create a flowchart file to design your web scraping flow.

Step 4 – Design the flow

a) Choose the Open Browser activity from the Activities panel

b) Set the properties of the Open Browser activity

i) Choose the browser type, e.g., Chrome

ii) Set the URL: insert the URL of the page with the data to scrape, in quotes

iii) Set the NewSession property to True

iv) Add a Delay activity with a duration of 6 seconds in the format 00:00:06 (hh:mm:ss) so that the page loads correctly. There are alternatives, such as waiting for a specific element to appear, but for now I am using the Delay option.

v) Choose the Data Scraping option

a) A wizard window appears, asking you to choose an element

b) Select the Next option

c) The element selector highlight will now appear, so select the item. Once the items have been selected, you can see a preview of the data. If the data looks as expected, select the Finish button; otherwise, choose the data again.

d) Now a pop-up box appears asking about multi-page scraping. If you want to scrape multiple pages, select Yes and choose the element that redirects you to the next page. In this case, I want to scrape just one page, so I am using the No option.

e) The data extraction activity will appear in the flow layout. Select the Extract Structured Data activity ‘TABLE dtDGrid’ and you will notice two things in its properties:

i) The default maximum number of results is 100; you can change it based on the number of records on the page.

ii) In the Output section, you can see the DataTable variable ExtractDataTable.

f) Now we have to write the scraped data to Excel, so we use the Write Range activity.

i) The first field is for the Excel workbook path; provide it based on the location of the Excel file.

ii) The second field is for the sheet name and the starting cell. Provide the sheet name in quotes and clear the cell name, so that the activity creates the sheet and writes all the data from the beginning.

iii) The last field is for the DataTable variable; in my case, the variable name is ExtractDataTable.
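(For comparison only, here is a minimal Python sketch of the same end-to-end idea: read the rates table, then write it to an Excel workbook. It assumes pandas, lxml, and openpyxl are installed, and that the first HTML table on the page is the rates table, which may not match exactly what UiPath extracts as ‘TABLE dtDGrid’.)

```python
# Minimal sketch for comparison, not the UiPath flow itself.
# Assumes pandas, lxml, and openpyxl are installed; the table index (0) is a guess.
import pandas as pd

url = "https://www.bullion-rates.com/gold/INR/2007-1-history.htm"
rates = pd.read_html(url)[0]    # roughly what Extract Structured Data produces

# Roughly what Write Range does: create the sheet and write all rows from the top.
rates.to_excel("gold_rates.xlsx", sheet_name="Rates", index=False)
print(rates.head())
```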

Step 5 – Run the automation flow

Click the Run option or press Ctrl + F6 to run the automation flow.

Step 6 – Open the Excel file and check the scraped data

Conclusion

I have tried to explain web scraping using the RPA tool UiPath in a very simple way; I hope this helps.

Find the complete code on GitHub.

If you have any questions about the code or web scraping in general, feel free to reach out:

Connect with Gyan on LinkedIn

We will meet again with something new.

Until then,

Happy coding!

The media shown in this post is not the property of DataPeaker and is used at the author's discretion.
