How I Built a Free Shopify Products Scraper Using ElectronJS



This content originally appeared on DEV Community and was authored by StartupNoon 💰 🚀

Having access to product data is crucial for market research, competitor analysis, and building product catalogs. While there are paid tools available, I’ll show you how to build your own free Shopify products scraper using ElectronJS that can extract comprehensive product information from any Shopify store.

Why ElectronJS for Web Scraping?

ElectronJS is a framework for building native Windows, Mac, and Linux applications with web technologies (JavaScript, HTML, CSS). It embeds a fully configurable Chromium browser, which makes it a good fit for web scraping:

  • Real Browser Environment: Unlike bare HTTP scrapers, Electron runs a full Chromium browser
  • JavaScript Support: Handles dynamic content and AJAX requests naturally
  • Cross-Platform: Works on Windows, Mac, and Linux
  • User-Friendly GUI: No command-line expertise required
  • Cookie & Session Management: Maintains state across requests

Understanding Shopify’s Architecture

Before diving into code, it helps to understand how Shopify works. Most Shopify stores expose a public products.json endpoint that returns products with their prices, variants, images, and other attributes. You can usually find it at:

https://[store-name].com/products.json
https://[store-name].com/collections/[collection-handle]/products.json
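As a quick sanity check, you can query the endpoint directly. A small helper (the names here are my own, not part of the project) builds the URL, including the limit and page query parameters the endpoint accepts:

```javascript
// Sketch: build a store's products.json URL. The endpoint accepts
// ?limit= (up to 250 products per page) and ?page= query parameters.
function productsJsonUrl(storeUrl, { limit = 250, page = 1 } = {}) {
    const base = storeUrl.replace(/\/$/, ''); // drop any trailing slash
    return `${base}/products.json?limit=${limit}&page=${page}`;
}

// Usage with Node 18+'s built-in fetch:
// const res = await fetch(productsJsonUrl('https://example.myshopify.com'));
// const { products } = await res.json();
```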

Project Setup

Prerequisites

  • Node.js (v14 or higher)
  • Basic knowledge of JavaScript
  • Text editor (VS Code recommended)

Initialize the Project

mkdir shopify-scraper-electron
cd shopify-scraper-electron
npm init -y

Install Dependencies

npm install electron --save-dev
npm install axios cheerio csv-writer --save

Project Structure

Create the following directory structure:

shopify-scraper-electron/
├── package.json
├── main.js              # Main Electron process
├── index.html           # Main window UI
├── renderer.js          # Renderer process logic
├── scraper.js           # Scraping logic
└── preload.js           # Preload script for security

Core Implementation

1. Main Process (main.js)

const { app, BrowserWindow, ipcMain, dialog } = require('electron');
const path = require('path');

let mainWindow;

function createWindow() {
    mainWindow = new BrowserWindow({
        width: 1200,
        height: 800,
        webPreferences: {
            nodeIntegration: false,
            contextIsolation: true,
            preload: path.join(__dirname, 'preload.js')
        },
        icon: path.join(__dirname, 'assets/icon.png') // Optional
    });

    mainWindow.loadFile('index.html');

    // Open DevTools in development
    // mainWindow.webContents.openDevTools();
}

app.whenReady().then(() => {
    createWindow();

    app.on('activate', () => {
        if (BrowserWindow.getAllWindows().length === 0) {
            createWindow();
        }
    });
});

app.on('window-all-closed', () => {
    if (process.platform !== 'darwin') {
        app.quit();
    }
});

// Handle file save dialog
ipcMain.handle('save-file-dialog', async () => {
    const result = await dialog.showSaveDialog(mainWindow, {
        filters: [
            { name: 'CSV Files', extensions: ['csv'] },
            { name: 'JSON Files', extensions: ['json'] }
        ],
        defaultPath: 'shopify-products.csv'
    });
    return result.filePath;
});
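One piece of glue the main process above doesn't show: preload.js invokes a scrape-products channel, so main.js also needs a matching handler. A sketch of that wiring, with the orchestration pulled into a plain function so it stays testable (runScrape is my name, not the article's):

```javascript
// Hypothetical glue: run a scrape and stream progress back to the renderer.
// ScraperClass is the class exported by scraper.js; sendProgress is any
// callback that forwards progress data.
async function runScrape(ScraperClass, config, sendProgress) {
    const scraper = new ScraperClass(config);
    // Route the scraper's progress callbacks out to the caller
    scraper.notifyProgress = (current, total, message) =>
        sendProgress({ current, total, message });
    return scraper.scrapeStore();
}

// In main.js, registered alongside the save-file-dialog handler:
// const ShopifyScraper = require('./scraper');
// ipcMain.handle('scrape-products', (event, config) =>
//     runScrape(ShopifyScraper, config,
//         (data) => mainWindow.webContents.send('scraping-progress', data)));
```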

2. User Interface (index.html)

<!DOCTYPE html>
<html lang="en">
<head>
    <meta charset="UTF-8">
    <meta name="viewport" content="width=device-width, initial-scale=1.0">
    <title>Shopify Products Scraper</title>
    <style>
        body {
            font-family: 'Segoe UI', Tahoma, Geneva, Verdana, sans-serif;
            margin: 0;
            padding: 20px;
            background: linear-gradient(135deg, #667eea 0%, #764ba2 100%);
            min-height: 100vh;
        }

        .container {
            max-width: 800px;
            margin: 0 auto;
            background: white;
            border-radius: 10px;
            padding: 30px;
            box-shadow: 0 10px 30px rgba(0,0,0,0.2);
        }

        h1 {
            text-align: center;
            color: #333;
            margin-bottom: 30px;
        }

        .input-group {
            margin-bottom: 20px;
        }

        label {
            display: block;
            margin-bottom: 8px;
            font-weight: 600;
            color: #555;
        }

        input[type="text"], textarea, select {
            width: 100%;
            padding: 12px;
            border: 2px solid #ddd;
            border-radius: 6px;
            font-size: 14px;
            transition: border-color 0.3s;
        }

        input[type="text"]:focus, textarea:focus, select:focus {
            outline: none;
            border-color: #667eea;
        }

        textarea {
            height: 120px;
            resize: vertical;
        }

        .button-group {
            display: flex;
            gap: 10px;
            margin-top: 30px;
        }

        button {
            padding: 12px 24px;
            border: none;
            border-radius: 6px;
            font-size: 16px;
            font-weight: 600;
            cursor: pointer;
            transition: all 0.3s;
        }

        .btn-primary {
            background: #667eea;
            color: white;
            flex: 1;
        }

        .btn-primary:hover {
            background: #5a6fd8;
            transform: translateY(-2px);
        }

        .btn-secondary {
            background: #6c757d;
            color: white;
            flex: 1;
        }

        .btn-secondary:hover {
            background: #5a6268;
        }

        .progress-container {
            margin-top: 20px;
            display: none;
        }

        .progress-bar {
            width: 100%;
            height: 20px;
            background: #f0f0f0;
            border-radius: 10px;
            overflow: hidden;
        }

        .progress-fill {
            height: 100%;
            background: #28a745;
            width: 0%;
            transition: width 0.3s;
        }

        .log-container {
            margin-top: 20px;
            max-height: 200px;
            overflow-y: auto;
            background: #f8f9fa;
            border-radius: 6px;
            padding: 15px;
            border: 1px solid #ddd;
            display: none;
        }

        .log-entry {
            margin-bottom: 5px;
            padding: 5px 0;
            border-bottom: 1px solid #eee;
        }

        .log-success { color: #28a745; }
        .log-error { color: #dc3545; }
        .log-info { color: #17a2b8; }
    </style>
</head>
<body>
    <div class="container">
        <h1>🛍 Shopify Products Scraper</h1>

        <div class="input-group">
            <label for="shopifyUrl">Shopify Store URL:</label>
            <input type="text" id="shopifyUrl" placeholder="https://example.myshopify.com" />
        </div>

        <div class="input-group">
            <label for="urlList">Or paste multiple product URLs (one per line):</label>
            <textarea id="urlList" placeholder="https://store.com/products/product-1&#10;https://store.com/products/product-2"></textarea>
        </div>

        <div class="input-group">
            <label for="outputFormat">Output Format:</label>
            <select id="outputFormat">
                <option value="csv">CSV</option>
                <option value="json">JSON</option>
            </select>
        </div>

        <div class="input-group">
            <label for="delay">Delay between requests (seconds):</label>
            <input type="number" id="delay" value="2" min="1" max="10" />
        </div>

        <div class="button-group">
            <button class="btn-primary" onclick="startScraping()">Start Scraping</button>
            <button class="btn-secondary" onclick="stopScraping()">Stop</button>
        </div>

        <div class="progress-container" id="progressContainer">
            <div class="progress-bar">
                <div class="progress-fill" id="progressFill"></div>
            </div>
            <p id="progressText">Processing...</p>
        </div>

        <div class="log-container" id="logContainer">
            <div id="logEntries"></div>
        </div>
    </div>

    <script src="renderer.js"></script>
</body>
</html>

3. Preload Script (preload.js)

const { contextBridge, ipcRenderer } = require('electron');

contextBridge.exposeInMainWorld('electronAPI', {
    saveFileDialog: () => ipcRenderer.invoke('save-file-dialog'),
    scrapeProducts: (config) => ipcRenderer.invoke('scrape-products', config),
    onScrapingProgress: (callback) => ipcRenderer.on('scraping-progress', callback),
    onScrapingComplete: (callback) => ipcRenderer.on('scraping-complete', callback),
    onScrapingError: (callback) => ipcRenderer.on('scraping-error', callback)
});

4. Renderer Process (renderer.js)

let isScrapingActive = false;
let currentScrapeConfig = null;

async function startScraping() {
    if (isScrapingActive) return;

    const shopifyUrl = document.getElementById('shopifyUrl').value.trim();
    const urlList = document.getElementById('urlList').value.trim();
    const outputFormat = document.getElementById('outputFormat').value;
    const delay = parseInt(document.getElementById('delay').value) * 1000;

    if (!shopifyUrl && !urlList) {
        alert('Please enter either a Shopify store URL or a list of product URLs');
        return;
    }

    try {
        const outputPath = await window.electronAPI.saveFileDialog();
        if (!outputPath) return;

        isScrapingActive = true;
        document.querySelector('.btn-primary').textContent = 'Scraping...';
        document.getElementById('progressContainer').style.display = 'block';
        document.getElementById('logContainer').style.display = 'block';

        currentScrapeConfig = {
            shopifyUrl,
            urlList: urlList ? urlList.split('\n').filter(url => url.trim()) : [],
            outputFormat,
            outputPath,
            delay
        };

        // Start the scraping process
        const result = await window.electronAPI.scrapeProducts(currentScrapeConfig);

        if (result.success) {
            logMessage(`Scraping completed! Saved ${result.count} products to ${result.outputPath}`, 'success');
        }

    } catch (error) {
        logMessage(`Error: ${error.message}`, 'error');
    } finally {
        isScrapingActive = false;
        document.querySelector('.btn-primary').textContent = 'Start Scraping';
    }
}

function stopScraping() {
    if (isScrapingActive) {
        isScrapingActive = false;
        logMessage('Scraping stopped by user', 'info');
    }
}

function updateProgress(current, total) {
    const percentage = Math.round((current / total) * 100);
    document.getElementById('progressFill').style.width = `${percentage}%`;
    document.getElementById('progressText').textContent = `Processing ${current} of ${total} (${percentage}%)`;
}

function logMessage(message, type = 'info') {
    const logEntries = document.getElementById('logEntries');
    const entry = document.createElement('div');
    entry.className = `log-entry log-${type}`;
    entry.textContent = `${new Date().toLocaleTimeString()} - ${message}`;
    logEntries.appendChild(entry);
    logEntries.scrollTop = logEntries.scrollHeight;
}

// Listen for scraping events
window.electronAPI.onScrapingProgress((event, data) => {
    updateProgress(data.current, data.total);
    logMessage(data.message, 'info');
});

window.electronAPI.onScrapingComplete((event, data) => {
    logMessage(`Scraping completed successfully! ${data.count} products saved.`, 'success');
});

window.electronAPI.onScrapingError((event, data) => {
    logMessage(`Error: ${data.message}`, 'error');
});
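Worth noting: stopScraping() above only flips a flag in the renderer; the scrape already running in the main process carries on. One way to make Stop effective (a sketch with names of my own) is a cancellation token that the scraper's loops check between requests:

```javascript
// Minimal cancellation token. The scraper's loops would check
// token.cancelled between requests and bail out early.
function createCancelToken() {
    let cancelled = false;
    return {
        cancel: () => { cancelled = true; },
        get cancelled() { return cancelled; }
    };
}

// In scraper.js loops:  if (this.config.cancelToken?.cancelled) break;
// In main.js:           ipcMain.on('stop-scraping', () => token.cancel());
```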

5. Scraping Logic (scraper.js)

const axios = require('axios');
const cheerio = require('cheerio');
const createCsvWriter = require('csv-writer').createObjectCsvWriter;
const fs = require('fs').promises;
const { URL } = require('url');

class ShopifyScraper {
    constructor(config) {
        this.config = config;
        this.products = [];
        this.userAgent = 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36';
    }

    async scrapeStore() {
        try {
            if (this.config.shopifyUrl) {
                await this.scrapeFromStoreUrl();
            }

            if (this.config.urlList && this.config.urlList.length > 0) {
                await this.scrapeFromUrlList();
            }

            return await this.saveResults();
        } catch (error) {
            throw new Error(`Scraping failed: ${error.message}`);
        }
    }

    async scrapeFromStoreUrl() {
        const baseUrl = this.normalizeUrl(this.config.shopifyUrl);

        // Try different approaches to get products
        const endpoints = [
            '/products.json',
            '/collections/all/products.json',
            '/sitemap_products_1.xml'
        ];

        for (const endpoint of endpoints) {
            try {
                const url = baseUrl + endpoint;

                if (endpoint.endsWith('.xml')) {
                    await this.scrapeFromSitemap(url);
                } else {
                    await this.scrapeFromProductsJson(url);
                }

                if (this.products.length > 0) break;
            } catch (error) {
                console.log(`Failed to scrape from ${endpoint}: ${error.message}`);
                continue;
            }
        }
    }

    async scrapeFromProductsJson(url) {
        const response = await axios.get(url, {
            headers: { 'User-Agent': this.userAgent },
            timeout: 10000
        });

        const data = response.data;

        if (data.products) {
            // All products arrive in a single response, so no per-item delay is needed
            for (const product of data.products) {
                this.products.push(this.extractProductData(product));
            }
        }
    }

    async scrapeFromSitemap(url) {
        const response = await axios.get(url, {
            headers: { 'User-Agent': this.userAgent },
            timeout: 10000
        });

        const $ = cheerio.load(response.data, { xmlMode: true });
        const productUrls = [];

        $('url loc').each((i, elem) => {
            const loc = $(elem).text();
            if (loc.includes('/products/')) {
                productUrls.push(loc);
            }
        });

        // Limit to first 100 products for demo
        const limitedUrls = productUrls.slice(0, 100);

        for (let i = 0; i < limitedUrls.length; i++) {
            try {
                await this.scrapeProductPage(limitedUrls[i]);
                this.notifyProgress(i + 1, limitedUrls.length, `Scraped product ${i + 1}`);
                await this.delay();
            } catch (error) {
                console.log(`Failed to scrape ${limitedUrls[i]}: ${error.message}`);
            }
        }
    }

    async scrapeFromUrlList() {
        for (let i = 0; i < this.config.urlList.length; i++) {
            const url = this.config.urlList[i].trim();
            if (!url) continue;

            try {
                await this.scrapeProductPage(url);
                this.notifyProgress(i + 1, this.config.urlList.length, `Scraped ${url}`);
                await this.delay();
            } catch (error) {
                console.log(`Failed to scrape ${url}: ${error.message}`);
            }
        }
    }

    async scrapeProductPage(url) {
        // Try to get JSON data first (faster)
        const productHandle = this.extractProductHandle(url);
        if (productHandle) {
            const jsonUrl = url.replace(/\/products\/.*/, `/products/${productHandle}.json`);

            try {
                const response = await axios.get(jsonUrl, {
                    headers: { 'User-Agent': this.userAgent },
                    timeout: 10000
                });

                if (response.data.product) {
                    this.products.push(this.extractProductData(response.data.product));
                    return;
                }
            } catch (error) {
                // Fall back to HTML scraping
            }
        }

        // Fallback: scrape HTML page
        await this.scrapeProductHtml(url);
    }

    async scrapeProductHtml(url) {
        const response = await axios.get(url, {
            headers: { 'User-Agent': this.userAgent },
            timeout: 10000
        });

        const $ = cheerio.load(response.data);

        // Extract product data from HTML
        const product = {
            title: this.extractText($, 'h1, .product-title, [data-product-title]'),
            price: this.extractPrice($),
            description: this.extractText($, '.product-description, .description, .product-content'),
            images: this.extractImages($, url),
            availability: this.extractAvailability($),
            url: url,
            sku: this.extractText($, '[data-sku], .sku'),
            vendor: this.extractText($, '.vendor, [data-vendor]'),
            tags: this.extractTags($),
            variants: this.extractVariants($)
        };

        this.products.push(product);
    }

    extractProductData(shopifyProduct) {
        return {
            id: shopifyProduct.id,
            title: shopifyProduct.title,
            description: shopifyProduct.body_html?.replace(/<[^>]*>/g, '').substring(0, 500),
            vendor: shopifyProduct.vendor,
            product_type: shopifyProduct.product_type,
            handle: shopifyProduct.handle,
            created_at: shopifyProduct.created_at,
            updated_at: shopifyProduct.updated_at,
            price: shopifyProduct.variants?.[0]?.price || 'N/A',
            compare_at_price: shopifyProduct.variants?.[0]?.compare_at_price,
            sku: shopifyProduct.variants?.[0]?.sku,
            inventory_quantity: shopifyProduct.variants?.[0]?.inventory_quantity,
            availability: shopifyProduct.variants?.[0]?.available ? 'In Stock' : 'Out of Stock',
            images: shopifyProduct.images?.map(img => img.src).join(', ') || '',
            tags: shopifyProduct.tags?.join(', ') || '',
            url: `${this.normalizeUrl(this.config.shopifyUrl)}/products/${shopifyProduct.handle}`,
            variants_count: shopifyProduct.variants?.length || 0,
            options: shopifyProduct.options?.map(opt => `${opt.name}: ${opt.values.join(', ')}`).join(' | ') || ''
        };
    }

    extractText($, selector) {
        return $(selector).first().text()?.trim() || '';
    }

    extractPrice($) {
        const priceSelectors = [
            '.price, .product-price',
            '[data-price]',
            '.money',
            '.current-price',
            '.sale-price'
        ];

        for (const selector of priceSelectors) {
            const price = $(selector).first().text()?.trim();
            if (price) {
                return price.replace(/[^\d.,]/g, '');
            }
        }

        return 'N/A';
    }

    extractImages($, baseUrl) {
        const images = [];
        const baseUrlObj = new URL(baseUrl);

        $('img').each((i, elem) => {
            let src = $(elem).attr('src') || $(elem).attr('data-src');
            if (src) {
                if (src.startsWith('//')) {
                    src = 'https:' + src;
                } else if (src.startsWith('/')) {
                    src = baseUrlObj.origin + src;
                }
                images.push(src);
            }
        });

        return images.slice(0, 5).join(', '); // Limit to 5 images
    }

    extractAvailability($) {
        const availabilitySelectors = [
            '.availability, .stock-status',
            '[data-availability]',
            '.in-stock, .out-of-stock'
        ];

        for (const selector of availabilitySelectors) {
            const availability = $(selector).first().text()?.trim();
            if (availability) {
                return availability.toLowerCase().includes('out') ? 'Out of Stock' : 'In Stock';
            }
        }

        return 'Unknown';
    }

    extractTags($) {
        const tags = [];
        $('.tags a, .product-tags a').each((i, elem) => {
            const tag = $(elem).text()?.trim();
            if (tag) tags.push(tag);
        });
        return tags.join(', ');
    }

    extractVariants($) {
        const variants = [];
        $('.variant-option, .product-variant').each((i, elem) => {
            const variant = $(elem).text()?.trim();
            if (variant) variants.push(variant);
        });
        return variants.join(', ');
    }

    extractProductHandle(url) {
        const match = url.match(/\/products\/([^\/\?]+)/);
        return match ? match[1] : null;
    }

    normalizeUrl(url) {
        if (!url.startsWith('http')) {
            url = 'https://' + url;
        }
        return url.replace(/\/$/, '');
    }

    async delay() {
        if (this.config.delay) {
            await new Promise(resolve => setTimeout(resolve, this.config.delay));
        }
    }

    notifyProgress(current, total, message) {
        // This would be implemented to communicate with the main process
        console.log(`Progress: ${current}/${total} - ${message}`);
    }

    async saveResults() {
        if (this.products.length === 0) {
            throw new Error('No products found to save');
        }

        if (this.config.outputFormat === 'csv') {
            return await this.saveToCsv();
        } else {
            return await this.saveToJson();
        }
    }

    async saveToCsv() {
        const csvWriter = createCsvWriter({
            path: this.config.outputPath,
            header: [
                { id: 'title', title: 'Title' },
                { id: 'price', title: 'Price' },
                { id: 'description', title: 'Description' },
                { id: 'vendor', title: 'Vendor' },
                { id: 'product_type', title: 'Product Type' },
                { id: 'availability', title: 'Availability' },
                { id: 'sku', title: 'SKU' },
                { id: 'tags', title: 'Tags' },
                { id: 'images', title: 'Images' },
                { id: 'url', title: 'URL' },
                { id: 'variants_count', title: 'Variants Count' },
                { id: 'options', title: 'Options' }
            ]
        });

        await csvWriter.writeRecords(this.products);

        return {
            success: true,
            count: this.products.length,
            outputPath: this.config.outputPath
        };
    }

    async saveToJson() {
        await fs.writeFile(this.config.outputPath, JSON.stringify(this.products, null, 2));

        return {
            success: true,
            count: this.products.length,
            outputPath: this.config.outputPath
        };
    }
}

module.exports = ShopifyScraper;
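
One limitation of scrapeFromProductsJson above: /products.json is paginated (30 products by default, at most 250 per request via ?limit=), so a single request rarely covers a large catalog. A sketch of a page walker, with the HTTP call injected so the axios client used in scraper.js stays swappable (collectAllProducts and fetchPage are illustrative names):

```javascript
// products.json pagination: fetchPage(page, limit) should resolve to the
// array of products for that page, e.g. with axios:
//   (page, limit) => axios.get(`${baseUrl}/products.json`,
//       { params: { limit, page } }).then(r => r.data.products)
async function collectAllProducts(fetchPage, { limit = 250, maxPages = 40 } = {}) {
    const all = [];
    for (let page = 1; page <= maxPages; page++) {
        const products = await fetchPage(page, limit);
        if (!products || products.length === 0) break; // no more pages
        all.push(...products);
        if (products.length < limit) break; // short page means last page
    }
    return all;
}
```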

6. Update package.json

The build scripts below rely on electron-builder, which was not part of the earlier install step; add it with npm install electron-builder --save-dev.

{
    "name": "shopify-scraper-electron",
    "version": "1.0.0",
    "description": "A free Shopify products scraper built with ElectronJS",
    "main": "main.js",
    "scripts": {
        "start": "electron .",
        "dev": "electron . --enable-logging",
        "build": "electron-builder",
        "build-win": "electron-builder --win",
        "build-mac": "electron-builder --mac",
        "build-linux": "electron-builder --linux"
    },
    "keywords": ["electron", "shopify", "scraper", "ecommerce"],
    "author": "Your Name",
    "license": "MIT",
    "devDependencies": {
        "electron": "latest",
        "electron-builder": "latest"
    },
    "dependencies": {
        "axios": "^1.6.0",
        "cheerio": "^1.0.0-rc.12",
        "csv-writer": "^1.6.0"
    }
}

Advanced Features

1. Proxy Support

Add proxy rotation to avoid IP blocking:

// In scraper.js
// v7+ uses a named export; v5 and earlier export the class directly
const { HttpsProxyAgent } = require('https-proxy-agent');

class ShopifyScraper {
    constructor(config) {
        this.config = config;
        this.userAgent = 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36';
        this.proxies = config.proxies || [];
        this.currentProxyIndex = 0;
    }

    getAxiosConfig() {
        const config = {
            headers: { 'User-Agent': this.userAgent },
            timeout: 10000
        };

        // Rotate through the proxy list on each request
        if (this.proxies.length > 0) {
            const proxy = this.proxies[this.currentProxyIndex];
            config.httpsAgent = new HttpsProxyAgent(proxy);
            this.currentProxyIndex = (this.currentProxyIndex + 1) % this.proxies.length;
        }

        return config;
    }
}

2. Rate Limiting & Retry Logic

async makeRequest(url, retries = 3) {
    for (let i = 0; i < retries; i++) {
        try {
            const response = await axios.get(url, this.getAxiosConfig());
            return response;
        } catch (error) {
            if (i === retries - 1) throw error;

            // Exponential backoff
            const delay = Math.pow(2, i) * 1000;
            await new Promise(resolve => setTimeout(resolve, delay));
        }
    }
}

3. Data Validation & Cleaning


validateProduct(product) {
    return {
        ...product,
        title: product.title?.substring(0, 255) || 'Untitled',
        price: this.cleanPrice(product.price),
        description: this.cleanDescription(product.description)
    };
}

cleanPrice(price) {
    if (!price) return 'N/A';
    const cleaned = price.toString().replace(/[^\d.,]/g, '');
    return cleaned || 'N/A';
}

cleanDescription(description) {
    if (!description) return '';
    return description
        .replace(/<[^>]*>/g, '') // Remove HTML tags
        .replace(/\s+/g, ' ') // Normalize whitespace
        .trim()
        .substring(0, 1000);
}
