5 Python Scripts That Taught Me Real-World Parsing and Automation



This content originally appeared on DEV Community and was authored by Fahad Shah

(From Courses 2 & 3 of Python for Everybody – Applied Like a Pro)

Most beginners stop at print statements.
I used every course module to build scripts that scrape, parse, and automate real data pipelines.

Here are 5 scripts that went beyond the basics — each one feels like a tool, not a toy.

1⃣ 📬 Spam Confidence Extractor

Parses an mbox email archive and calculates the average spam confidence from X-DSPAM-Confidence: headers.

✅ Skills:

find(), float(), string parsing

File reading, data cleaning

count = 0
total = 0

with open("mbox.txt") as f:
    for line in f:
        if line.startswith("X-DSPAM-Confidence:"):
            # Grab the value after the colon, e.g. "X-DSPAM-Confidence: 0.8475"
            num = float(line.split(":")[1].strip())
            count += 1
            total += num

# Guard against a file with no matching headers so the division can't fail
if count:
    print("Average spam confidence:", total / count)
else:
    print("No X-DSPAM-Confidence headers found")

📎 Real-World Use: Email filtering, NLP pre-cleaning, header analysis.

2⃣ 📧 Email Address Counter

Counts how many times each sender appears and prints the most frequent one.

✅ Skills:

dict counting, string parsing, file handling

emails = {}

with open("mbox.txt") as f:
    for line in f:
        # mbox "From " lines look like:
        # From stephen.marquard@uct.ac.za Sat Jan  5 09:14:16 2008
        if line.startswith("From "):
            parts = line.split()
            email = parts[1]
            emails[email] = emails.get(email, 0) + 1

max_email = max(emails, key=emails.get)
print(max_email, emails[max_email])
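The standard library can compress this whole pattern into a few lines. A minimal sketch using collections.Counter on the same mbox.txt:

from collections import Counter

with open("mbox.txt") as f:
    senders = Counter(line.split()[1] for line in f if line.startswith("From "))

# most_common(1) returns a list with the single (sender, count) pair
email, count = senders.most_common(1)[0]
print(email, count)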

📎 Real-World Use: Inbox analytics, sender clustering, contact insights.

3⃣ ⏰ Hour Histogram

Parses timestamps from From lines and prints an hour-by-hour distribution.

✅ Skills:

split(), dict, sorting keys

hours = {}

with open("mbox.txt") as f:
    for line in f:
        if line.startswith("From "):
            # The timestamp (e.g. 09:14:16) is the sixth field of a "From " line
            time = line.split()[5]
            hour = time.split(":")[0]
            hours[hour] = hours.get(hour, 0) + 1

for hour in sorted(hours):
    print(hour, hours[hour])
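To make the distribution readable at a glance, the counts can be drawn as a text bar chart. A minimal sketch (the bar width of 40 and the sample values are arbitrary, just so it runs standalone):

# Continuing from the hours dict built above; sample values for illustration
hours = {"04": 3, "09": 12, "15": 7, "18": 9}

widest = max(hours.values())
for hour in sorted(hours):
    bar = "#" * round(hours[hour] / widest * 40)  # scale longest bar to 40 chars
    print(hour, bar.ljust(40), hours[hour])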

📎 Real-World Use: Time-based behavior analysis, email scheduling data, logs monitoring.

4⃣ 🌐 BeautifulSoup Scraper

Pulls every link (each anchor tag's href) from a live webpage using BeautifulSoup.

✅ Skills:

HTTP requests, HTML parsing, bs4 tag navigation

import urllib.request
from bs4 import BeautifulSoup  # pip install beautifulsoup4

url = input("Enter URL: ")
html = urllib.request.urlopen(url).read()
soup = BeautifulSoup(html, "html.parser")

# soup("a") is shorthand for soup.find_all("a")
for tag in soup("a"):
    print(tag.get("href", None))
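Some HTTPS sites reject a bare urlopen call with certificate errors; the py4e sample code works around this with an unverified SSL context. A minimal sketch of the same scraper with that added:

import ssl
import urllib.request
from bs4 import BeautifulSoup

# Unverified context: skips certificate checks, fine for practice scraping only
ctx = ssl.create_default_context()
ctx.check_hostname = False
ctx.verify_mode = ssl.CERT_NONE

url = input("Enter URL: ")
html = urllib.request.urlopen(url, context=ctx).read()
soup = BeautifulSoup(html, "html.parser")

for tag in soup("a"):
    print(tag.get("href", None))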

📎 Real-World Use: Link scraping, data crawling, sitemap audits.

5⃣ 🔗 JSON API Extractor

Fetches data from a REST API, parses JSON, and processes nested fields.

✅ Skills:

urllib, json, nested dictionary access

import urllib.request, json

url = "http://py4e-data.dr-chuck.net/comments_42.json"
data = urllib.request.urlopen(url).read().decode()
info = json.loads(data)

# JSON shape: {"note": ..., "comments": [{"name": ..., "count": ...}, ...]}
total = sum(int(item["count"]) for item in info["comments"])
print("Sum:", total)

📎 Real-World Use: API response processing, backend pipelines, data analytics inputs.

🧩 Why This Matters

These aren’t random exercises.
Each script taught me core data processing patterns that show up in real-world systems:

  • Parsing messy input → extracting value
  • Aggregating + filtering data
  • Understanding structure behind unstructured sources

Not toy problems — these are backend blueprints.

🔗 Follow My Build Journey

#1FahadShah #Python #DataParsing #BackendEngineering #BuildInPublic #WebScraping #JSON #APIs #LearningInPublic

