Web Scraping Using Python

Nihar javiya
Jul 28, 2021

What is Web Scraping?

Web scraping is a method to extract data from websites. Using web scraping, we can turn unstructured website data into structured data. There are different ways to scrape websites; in this article, we'll do web scraping with Python.

In general, web data extraction is used by people and businesses who want to make use of the vast amount of publicly available web data to make smarter decisions.

Requirements for Web Scraping

  1. Selenium
  2. Beautiful Soup
  3. Requests
  4. lxml
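
Selenium drives a real browser, which is what we need for pages that render their content with JavaScript (like Flipkart). For static pages, the last two items on the list, Requests and lxml, are enough on their own. Here is a minimal sketch of that lighter approach (the URL is just a placeholder):

import requests
from bs4 import BeautifulSoup

# Fetch a static page over HTTP and parse it with the lxml parser
response = requests.get('https://example.com')
soup = BeautifulSoup(response.text, 'lxml')
print(soup.title.text)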

Installation

We have to install the following libraries and packages for web scraping. To do that, run the following commands in a terminal or notebook cell (the apt line installs the Chrome driver on Linux environments such as Google Colab):

pip install selenium
!apt install chromium-chromedriver
pip install beautifulsoup4
pip install requests lxml

Import Libraries

We have to import the following libraries:

from selenium import webdriver
from bs4 import BeautifulSoup
import pandas as pd

Website URL

Once the driver is set up (next section), point it at the Flipkart search results page and parse the page source with Beautiful Soup:

driver.get("https://www.flipkart.com/search?q=mi+mobiles&sid=tyy%2C4io&as=on&as-show=on&otracker=AS_QueryStore_OrganicAutoSuggest_1_2_na_na_na&otracker1=AS_QueryStore_OrganicAutoSuggest_1_2_na_na_na&as-pos=1&as-type=RECENT&suggestionId=mi+mobiles%7CMobiles&requestId=1fc15655-826d-4cb0-8cd9-cd9660380f80&as-backfill=on")
content = driver.page_source
soup = BeautifulSoup(content, 'lxml')

Webdriver

To load a page and get its content, Selenium needs a web driver for the browser we are using. Here we configure a headless Chrome session:

from selenium import webdriver

chrome_options = webdriver.ChromeOptions()
chrome_options.add_argument('--headless')             # run Chrome without a visible window
chrome_options.add_argument('--no-sandbox')
chrome_options.add_argument('--disable-dev-shm-usage')
driver = webdriver.Chrome(options=chrome_options)     # Selenium 4 style; older versions used chrome_options=
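
Flipkart builds its result page with JavaScript, so it can help to wait until the product cards have actually rendered before reading page_source. A minimal sketch using Selenium's explicit waits (the class name '_1fQZEK' is the product-card class we use later in this article):

from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

# Wait up to 10 seconds for at least one product card to appear
WebDriverWait(driver, 10).until(
    EC.presence_of_element_located((By.CLASS_NAME, '_1fQZEK'))
)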

Fetching data from website

First, we create empty lists to hold the data fetched from the website:

products = []  # List to store the name of the product
prices = []    # List to store the price of the product
features = []  # List to store the features of the product

Now, run a loop that fetches each field and appends it to the corresponding list. To find the right elements, inspect the webpage in the browser's developer tools, note the class name of each particular div tag, and use it in the code.

for a in soup.find_all('a', href=True, attrs={'class': '_1fQZEK'}):
    name = a.find('div', attrs={'class': '_4rR01T'})
    price = a.find('div', attrs={'class': '_30jeq3 _1_WHN1'})
    feature = a.find('div', attrs={'class': 'fMghEO'})
    if name and price and feature:  # skip cards missing any field
        products.append(name.text)
        prices.append(price.text)
        features.append(feature.text)

Data Frame

Store the scraped lists in a pandas DataFrame and preview the first ten rows:

df = pd.DataFrame({'Product Name': products, 'Price': prices, 'Feature': features})
print(df.head(10))

Convert data into CSV file

df.to_csv('products.csv', index=False, encoding='utf-8')
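
Once the scraping is done, it's good practice to close the browser session:

driver.quit()

To check the export, you can read the file back with pd.read_csv('products.csv') and confirm the columns look right.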
