This content originally appeared on DEV Community and was authored by bikash119
This is a submission for the AI Agents Challenge powered by n8n and Bright Data
What I Built
I created an AI-powered assistant to streamline fundamental analysis for stock market investing in India. Investors typically spend hours reviewing annual reports, credit ratings, quarterly results, and earnings-call transcripts before making investment decisions, a process that is slow and overwhelming.
My solution uses Large Language Models (LLMs) to analyze and summarize these documents, enabling users to ask questions and instantly receive targeted, contextual answers. This tool dramatically accelerates research, reduces manual effort, and helps analysts focus on insights, not information overload.
Demo
n8n Workflow
Equity Fundamental Octo Researcher N8N Workflow
Technical Implementation
Overview
The Equity Fundamental Octo Researcher workflow automates the extraction, processing, and retrieval of fundamental stock data and reports from Screener.in. It enables interactive fundamental analysis by combining web scraping, automated document parsing (using docling-project by IBM), vector storage (Pinecone), and leading LLMs for Q&A.
Architecture/Components
- Form Trigger: Accepts Screener.in company URLs from the user.
- BrightData HTTP Nodes: Automate scraping of key disclosure documents (annual reports, transcripts, presentations, etc.).
- Intermediary Storage & Extraction: Handles job tracking and discovery of report/document URLs.
- Docling OSS: All document parsing, text extraction, and conversion (from PDF, HTML, image, etc.) are performed through the open-source docling-project library, orchestrated via a custom Gradio API deployed to AWS ECS.
- Async/Task Management: Manages and monitors asynchronous parsing with job/event IDs.
- Embedding and Vector Database: Processes parsed content into OpenAI embeddings and loads into Pinecone for fast semantic search.
- LLM-Powered Q&A Agent: Uses Anthropic Claude for question/answer generation, with context retrieved from Pinecone.
- n8n Chat Trigger: Interface for handling user questions about the company or its documents.
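The retrieval step behind the Q&A agent can be sketched as a minimal similarity search. This is a toy illustration, not the actual Pinecone/Claude integration: the vectors below are hand-made stand-ins for OpenAI embeddings, and the in-memory dict stands in for the Pinecone index.

```python
from math import sqrt

def cosine(a, b):
    # Cosine similarity between two equal-length vectors.
    dot = sum(x * y for x, y in zip(a, b))
    na = sqrt(sum(x * x for x in a))
    nb = sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def retrieve(query_vec, store, top_k=2):
    # `store` maps chunk text -> embedding vector (stand-in for Pinecone).
    ranked = sorted(store.items(), key=lambda kv: cosine(query_vec, kv[1]), reverse=True)
    return [text for text, _ in ranked[:top_k]]

# Toy vectors stand in for OpenAI embeddings of parsed document chunks.
store = {
    "Revenue grew 12% YoY in FY24.": [0.9, 0.1, 0.0],
    "The board approved a new dividend policy.": [0.1, 0.9, 0.1],
    "Capex guidance was raised for FY25.": [0.2, 0.2, 0.9],
}
query = [0.85, 0.15, 0.05]  # pretend embedding of "How did revenue do?"
context = retrieve(query, store)
print(context[0])  # the most similar chunks become context for the LLM prompt
```

In the real workflow, the retrieved chunks are injected into the Claude prompt so that answers stay grounded in the company's own disclosures.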
Execution Flow
- User Input: The user submits a company URL from Screener.in.
- Web Scraping with BrightData: The workflow triggers a BrightData scraper run to find and collect links to all relevant fundamental documents.
- Extract and Track Results: Extraction nodes gather the discovered URLs and manage job progress.
- Document Processing (Docling OSS): Every collected URL is processed by docling (invoked via a Gradio-based API deployed to AWS). docling-project extracts, parses, and converts source documents, including PDFs, scanned images, and web pages, into clean markdown/text suitable for downstream NLP.
- Async Task Management: The workflow creates, tracks, and retrieves outputs from asynchronous Docling processing jobs using event/task IDs.
- Text Embeddings and Storage: Extracted text is transformed into embeddings using OpenAI and loaded into Pinecone for retrieval-augmented analysis.
- Interactive Chat/Q&A: The user interacts via chat; queries are answered by Anthropic Claude using context pulled via semantic search from the Pinecone vector store.
Integrations
- BrightData API: Web scraping for financial documents.
- n8n: Workflow automation and orchestration.
- Docling OSS (deployed to AWS ECS): Core document conversion and extraction.
- OpenAI API: Embedding generation.
- Pinecone: Vector database.
- Anthropic Claude API: Question-answering language model.
Usage Instructions
- Deploy the workflow with all service credentials and dependencies configured.
- Expose the webform to users for them to submit Screener.in URLs.
- Scraping, parsing (via Docling OSS), and embedding are automated in the pipeline.
- Users chat with the agent to ask analysis or document-grounded questions.
- Extend as needed—add new endpoints, sources, or Docling capabilities for more coverage.
Bright Data Verified Node
There was no scraper available for https://screener.in that could reliably capture all relevant document links (annual reports, con-call transcripts, quarterly reports, investor presentations, etc.). To overcome this, I built a custom collector from scratch using the BrightData IDE. You can find the scraper here.
Challenges
Building a Custom Scraper for Screener.in
There was no ready-made scraper available for Screener.in that could reliably capture all relevant document links (annual reports, con-call transcripts, quarterly reports, investor presentations, etc.). To overcome this, I built a custom collector from scratch using the BrightData IDE. You can find the scraper here.
Self-Hosting docling-project OSS as a Cloud Service
docling, which handles document parsing and conversion, is not available as a hosted SaaS. To make it production-ready, I converted Docling's docker-compose deployment into an AWS ECS task, set up networking with a load balancer, and ensured stable API access from n8n. This required container orchestration, secure environment setup, persistent storage, and custom integration so that document parsing at scale could be achieved in a robust and maintainable way.
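For readers attempting a similar deployment, an ECS task definition for a containerized docling service might look like the fragment below. This is a hedged sketch, not the actual configuration used: the image URI is a placeholder, and the CPU/memory values and port (Gradio's default 7860) are assumptions you would tune for your workload.

```json
{
  "family": "docling-serve",
  "requiresCompatibilities": ["FARGATE"],
  "networkMode": "awsvpc",
  "cpu": "1024",
  "memory": "4096",
  "containerDefinitions": [
    {
      "name": "docling",
      "image": "<account>.dkr.ecr.<region>.amazonaws.com/docling:latest",
      "portMappings": [{ "containerPort": 7860, "protocol": "tcp" }],
      "essential": true
    }
  ]
}
```

The task would then sit behind an Application Load Balancer target group so n8n has a stable HTTPS endpoint to call.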
Learnings
- Use BrightData to scrape and collect whatever data you are interested in.
- A ton about AWS networking, IAM, ECS & load balancers.
- I have only scratched the surface of docling-project.