This content originally appeared on DEV Community and was authored by Nelson Orina
Introduction
In our first tutorial, we learned the basics of web scraping. Now, let's dive deeper into BeautifulSoup's powerful features.
1. Finding Elements: find() vs find_all()
Single vs Multiple Elements
<div class="product">
  <h3 class="title">Product 1</h3>
  <a href="/product1">View Product</a>
</div>
<div class="product">
  <h3 class="title">Product 2</h3>
  <a href="/product2">View Product</a>
</div>
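The examples in this tutorial assume the sample HTML has already been parsed into a soup object. A minimal setup, with the snippet above stored in a string, might look like this:
from bs4 import BeautifulSoup

#Sample HTML from above, stored as a string for the demo
html = """
<div class="product">
  <h3 class="title">Product 1</h3>
  <a href="/product1">View Product</a>
</div>
<div class="product">
  <h3 class="title">Product 2</h3>
  <a href="/product2">View Product</a>
</div>
"""

soup = BeautifulSoup(html, 'html.parser')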
#find() - returns FIRST matching element
first_product = soup.find('div', class_='product')
print(first_product)
#Output is the first div
#find_all() - returns LIST of all matching elements
all_products = soup.find_all('div', class_='product')
print(all_products)
# Output: [<div class="product">...</div>, <div class="product">...</div>]
Practical Examples
#Get the first h1 tag
main_title = soup.find('h1')
#Get all paragraph tags
all_paragraphs = soup.find_all('p')
#Get all links
all_links = soup.find_all('a')
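Worth knowing: find() and find_all() also accept attribute filters and a limit argument. These are standard BeautifulSoup parameters, shown here as a quick sketch:
#Get only links that actually have an href attribute
links_with_href = soup.find_all('a', href=True)
#Stop searching after the first two matches
first_two_paragraphs = soup.find_all('p', limit=2)
#Match attributes explicitly using the attrs dict
first_title = soup.find('h3', attrs={'class': 'title'})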
2. CSS Selectors with select() and select_one()
Powerful Selection Methods
<div class="product">
  <h3 class="title">Product 1</h3>
  <a href="/product1">View Product</a>
</div>
<div class="product">
  <h3 class="title">Product 2</h3>
  <a href="/product2">View Product</a>
</div>
#select_one() - returns first match (like find())
first_product = soup.select_one('.product')
print(first_product)
#Output is the div holding product one
#select() - returns all matches (like find_all())
all_products = soup.select('.product')
print(all_products)
#Output is both divs in this case as they are of the same class.
CSS Selector Examples
#Select by class
products = soup.select('.product-item')
#Select by ID
header = soup.select('#main-header')
#Complex selectors
featured_products = soup.select('div.product.featured')
product_titles = soup.select('div.product > h3.title')
select() is often more flexible than find_all() because you can use full CSS selectors, including nested element targeting.
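For instance, selectors like these have no direct find_all() equivalent. This is a sketch against the sample HTML above; attribute and positional selectors are standard CSS supported by select():
#Links inside products whose href starts with "/product"
product_links = soup.select('div.product a[href^="/product"]')
#The title of the second product only
second_title = soup.select_one('div.product:nth-of-type(2) > h3.title')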
3. Extracting Attributes and Data
Once we’ve located elements, the next step is usually to extract the actual information inside them such as text, links, or image URLs.
Extracting Text From Elements
Often, we just need the text content inside an element. You can access this using the .text attribute or the .get_text() method:
<div class="product">
  <h3 class="title">Product 1</h3>
  <a href="/product1">View Product</a>
  <img src="/images/product1.png" alt="Product 1">
</div>
<div class="product">
  <h3 class="title">Product 2</h3>
  <a href="/product2">View Product</a>
  <img src="/images/product2.png" alt="Product 2">
</div>
product_titles = soup.select('div.product > h3.title')
for title in product_titles:
    print(title.text)
#Output:
#Product 1
#Product 2
Both .text and .get_text() return the element's text content; in fact, .text is just a shortcut for calling .get_text() with no arguments, so for most cases you can use whichever you prefer.
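One difference worth noting: .get_text() accepts optional arguments, which .text (a plain attribute) cannot. For example:
product = soup.find('div', class_='product')
#Join nested text with a separator and strip surrounding whitespace
print(product.get_text(separator=' | ', strip=True))
#Output: Product 1 | View Product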
Extracting Attributes (href, src, etc.)
If you need to get the value of an attribute (like a link’s href or an image’s src), use .get():
#Extract all links and print their href values
links = soup.find_all('a')
for link in links:
    print(link.get('href'))
#Output:
# /product1
# /product2
images = soup.find_all('img')
for img in images:
    print(img.get('src'))
#Output:
#/images/product1.png
#/images/product2.png
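You can also read attributes with dictionary-style indexing, but .get() is the safer choice when an attribute might be absent:
link = soup.find('a')
print(link['href'])                   #Raises KeyError if the attribute is missing
print(link.get('href'))               #Returns None instead of raising
print(link.get('title', 'no title'))  #You can supply your own default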
Pro Tip: You can combine text and attributes in one pass. For example, printing both the product name and its link:
products = soup.select('div.product')
for product in products:
    title = product.select_one('h3.title').text
    link = product.select_one('a').get('href')
    print(f"{title} -> {link}")
#Output:
# Product 1 -> /product1
# Product 2 -> /product2
This is the typical pattern you'll use when building real scrapers: find the container element, then drill down into it to extract the data you need.
4. Handling Missing Data Gracefully
Sometimes websites have missing fields, and trying to access them directly can raise an AttributeError. A helper function like the one below ensures your scraper doesn't break when data is missing.
Safe Data Extraction Techniques
def safe_extract(element, selector, default="Not available"):
    """Safely extract text from an element, with a fallback default"""
    #select_one() accepts CSS selectors like '.price', unlike find()
    found = element.select_one(selector)
    return found.text.strip() if found else default
# Usage in scraping loop
for product in products:
    name = safe_extract(product, 'h3')
    price = safe_extract(product, '.price', '$0.00')
    rating = safe_extract(product, '.rating', 'No rating')

    # Extract optional attributes safely
    image = product.find('img')
    image_src = image.get('src') if image else 'default.jpg'
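As a quick illustration, here is how the helper behaves against a product that is missing its price (the HTML below is made up just for this demo):
from bs4 import BeautifulSoup

demo_html = '<div class="product"><h3>Widget</h3></div>'  #no .price element
demo_product = BeautifulSoup(demo_html, 'html.parser').find('div')
print(safe_extract(demo_product, 'h3'))              #Widget
print(safe_extract(demo_product, '.price', '$0.00')) #$0.00 (fallback used)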
5. Error Handling Pattern
Always wrap network calls in try/except blocks to prevent your script from crashing on connection issues:
try:
    #A timeout stops the request from hanging forever on a dead connection
    response = requests.get(url, timeout=10)
    soup = BeautifulSoup(response.content, 'html.parser')
    products = soup.find_all('div', class_='product')
    if not products:
        print("No products found - check your selectors!")
except Exception as e:
    print(f"Error during scraping: {e}")
Conclusion
In this tutorial, we moved beyond the basics and learned:
- How to use find()/find_all() and CSS selectors (select()/select_one())
- How to extract text and attributes from elements
- How to combine data points for structured results
- How to handle missing elements safely
- How to implement error handling for more reliable scrapers
With these techniques, you can now confidently scrape more complex websites and build robust, maintainable web scrapers.
Complete End-to-End Example
Here is a simple scraper that takes data from a sample e-commerce platform and prints out the details.
#importing the BeautifulSoup library and requests module
from bs4 import BeautifulSoup
import requests

# Fetching the content of a webpage and parsing it with BeautifulSoup
# The URL is a test site for web scraping
url = "https://webscraper.io/test-sites/e-commerce/scroll"

#Sending a GET request to the URL
#This returns a response object
response = requests.get(url)
soup = BeautifulSoup(response.content, "html.parser")

cards = soup.select('div.product-wrapper')
for card in cards:
    image = card.select_one('.img-fluid').get('src')
    price = card.select_one('span').text
    #The strip is used to remove any leading or trailing whitespace
    title = card.select_one('a.title').text.strip()
    description = card.select_one('p.description').text
    rating = card.select_one('p.review-count > span').text
    print(f"Title: {title}")
    print(f"Description: {description}")
    print(f"Price: {price}")
    print(f"Rating: {rating}")
    print(f"Image URL: {image}")
    print("-" * 40)
Sample output:
Title: Apple MacBook...
Description: Apple MacBook Air 13", Intel Core i5 1.8GHz, 8GB, 256GB SSD, Intel HD 6000, RUS
Price: $1347.78
Rating: 11
Image URL: /images/test-sites/e-commerce/items/cart2.png
----------------------------------------
Title: HP 250 G3
Description: 15.6", Core i5-4210U, 4GB, 500GB, Windows 8.1
Price: $520.99
Rating: 13
Image URL: /images/test-sites/e-commerce/items/cart2.png
----------------------------------------
Title: Lenovo ThinkPa...
Description: Lenovo ThinkPad Yoga 370 Black, 13.3" FHD IPS Touch, Core i5-7200U, 8GB, 256GB SSD, 4G, Windows 10 Pro
Price: $1362.24
Rating: 12
Image URL: /images/test-sites/e-commerce/items/cart2.png
----------------------------------------
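If you would rather collect the results as structured data than print them, the same loop can build a list of dictionaries instead (a small variation on the scraper above, using the same selectors):
results = []
for card in cards:
    results.append({
        'title': card.select_one('a.title').text.strip(),
        'price': card.select_one('span').text,
        'rating': card.select_one('p.review-count > span').text,
    })
#results is now a list of dicts, easy to write out as CSV or JSON
print(results[0]['title'])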