This content originally appeared on DEV Community and was authored by Fahad Shah
(From Courses 2 & 3 of Python for Everybody – Applied Like a Pro)
Most beginners stop at print statements.
I used every course module to build scripts that scrape, parse, and automate real data pipelines.
Here are 5 scripts that went beyond the basics — each one feels like a tool, not a toy.
1️⃣ Spam Confidence Extractor
Reads an email mbox file and calculates the average spam confidence from X-DSPAM-Confidence: headers.
Skills:
startswith(), split(), float(), string parsing
File reading, data cleaning
count = 0
total = 0

with open("mbox.txt") as f:
    for line in f:
        # Header lines look like: "X-DSPAM-Confidence: 0.8475"
        if line.startswith("X-DSPAM-Confidence:"):
            num = float(line.split(":")[1].strip())
            count += 1
            total += num

print("Average spam confidence:", total / count)
Real-World Use: Email filtering, NLP pre-cleaning, header analysis.
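If you'd rather not slice headers by hand, Python's stdlib mailbox module can parse the file for you. A minimal sketch, assuming mbox.txt is a standard mbox file:

import mailbox

# Let the mailbox module handle the mbox format and header lookup
confidences = []
for msg in mailbox.mbox("mbox.txt"):
    value = msg.get("X-DSPAM-Confidence")  # None if the header is missing
    if value is not None:
        confidences.append(float(value))

if confidences:
    print("Average spam confidence:", sum(confidences) / len(confidences))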
2️⃣ Email Address Counter
Counts how many times each sender appears and prints the most frequent one.
Skills:
dict counting, string parsing, file handling
emails = {}

with open("mbox.txt") as f:
    for line in f:
        # "From " lines carry the sender's address as the second field
        if line.startswith("From "):
            email = line.split()[1]
            emails[email] = emails.get(email, 0) + 1

# The key with the highest count is the most frequent sender
max_email = max(emails, key=emails.get)
print(max_email, emails[max_email])
Real-World Use: Inbox analytics, sender clustering, contact insights.
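For comparison, collections.Counter collapses the same counting idiom into a few lines; a sketch against the same mbox.txt:

from collections import Counter

# Count senders in one pass; Counter handles the get()-or-zero pattern
with open("mbox.txt") as f:
    senders = Counter(line.split()[1] for line in f if line.startswith("From "))

# most_common(1) returns a one-element list of (sender, count)
email, count = senders.most_common(1)[0]
print(email, count)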
3️⃣ Hour Histogram
Parses timestamps from From lines and prints an hour-by-hour distribution.
Skills:
split(), dict, sorting keys
hours = {}

with open("mbox.txt") as f:
    for line in f:
        if line.startswith("From "):
            # The timestamp is the sixth field, e.g. "09:14:16" -> hour "09"
            time = line.split()[5]
            hour = time.split(":")[0]
            hours[hour] = hours.get(hour, 0) + 1

# Print the counts in hour order
for hour in sorted(hours):
    print(hour, hours[hour])
Real-World Use: Time-based behavior analysis, email scheduling data, log monitoring.
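To actually see the distribution, the counts can be drawn as text bars. A sketch that scales the busiest hour to 50 characters wide (assumes mbox.txt has at least one From line):

hours = {}
with open("mbox.txt") as f:
    for line in f:
        if line.startswith("From "):
            hour = line.split()[5].split(":")[0]
            hours[hour] = hours.get(hour, 0) + 1

# Scale the bars so the busiest hour is at most 50 characters wide
peak = max(hours.values())
for hour in sorted(hours):
    bar = "#" * max(1, round(hours[hour] / peak * 50))
    print(hour, bar)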
4️⃣ BeautifulSoup Scraper
Pulls every link (href) from the anchor tags of a live webpage using BeautifulSoup.
Skills:
HTTP requests, HTML parsing, bs4 tag navigation
import urllib.request
from bs4 import BeautifulSoup

url = input("Enter URL: ")
html = urllib.request.urlopen(url).read()
soup = BeautifulSoup(html, "html.parser")

# soup("a") is shorthand for soup.find_all("a")
for tag in soup("a"):
    print(tag.get("href", None))
Real-World Use: Link scraping, data crawling, sitemap audits.
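Some HTTPS sites fail certificate verification on a fresh local setup; the py4e course works around this with an unverified SSL context. A sketch of that variant, which also prints the visible anchor text next to each href (fine for coursework, not for production code):

import ssl
import urllib.request
from bs4 import BeautifulSoup

# Skip certificate verification (the py4e workaround for local cert errors)
ctx = ssl.create_default_context()
ctx.check_hostname = False
ctx.verify_mode = ssl.CERT_NONE

url = input("Enter URL: ")
html = urllib.request.urlopen(url, context=ctx).read()
soup = BeautifulSoup(html, "html.parser")

# Print the link target and the visible anchor text side by side
for tag in soup("a"):
    print(tag.get("href"), "-", tag.get_text(strip=True))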
5️⃣ JSON API Extractor
Fetches data from a REST API, parses JSON, and processes nested fields.
Skills:
urllib, json, nested dictionary access
import urllib.request, json

url = "http://py4e-data.dr-chuck.net/comments_42.json"

# Fetch the raw bytes and decode them into a JSON string
data = urllib.request.urlopen(url).read().decode()
info = json.loads(data)

# info["comments"] is a list of {"name": ..., "count": ...} objects
total = sum(int(item["count"]) for item in info["comments"])
print("Sum:", total)
Real-World Use: API response processing, backend pipelines, data analytics inputs.
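For anything beyond a one-off script, I'd wrap the fetch and parse in error handling and avoid hard KeyErrors if the response shape changes. A more defensive sketch of the same extractor:

import json
import urllib.error
import urllib.request

url = "http://py4e-data.dr-chuck.net/comments_42.json"

# Fail loudly (but cleanly) on network or JSON errors
try:
    with urllib.request.urlopen(url) as response:
        info = json.loads(response.read().decode())
except (urllib.error.URLError, json.JSONDecodeError) as err:
    raise SystemExit(f"Fetch or parse failed: {err}")

# .get() with defaults avoids a KeyError on a changed API shape
total = sum(int(item.get("count", 0)) for item in info.get("comments", []))
print("Sum:", total)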
Why This Matters
These aren’t random exercises.
Each script taught me core data processing patterns that show up in real-world systems:
- Parsing messy input → extracting value
- Aggregating + filtering data
- Understanding structure behind unstructured sources
Not toy problems — these are backend blueprints.
Follow My Build Journey
- GitHub: github.com/1FahadShah
- Twitter/X: x.com/1FahadShah
- Medium: 1fahadshah.medium.com
- LinkedIn: linkedin.com/in/1fahadshah
- Hashnode: hashnode.com/@1FahadShah
- Personal Site: 1fahadshah.com (Launching soon)
#1FahadShah #Python #DataParsing #BackendEngineering #BuildInPublic #WebScraping #JSON #APIs #LearningInPublic