This content originally appeared on DEV Community and was authored by Muhibbudin Suretno
Hey DEV community!
Like many of you, I have a habit—some might call it an obsession—of keeping up with the latest technology. My evenings are often spent browsing GitHub Trending, watching “new tool” roundups on YouTube, and checking out what’s new on Hugging Face.
The thrill of discovery is great, but it always leads to the same bottleneck: a mountain of README files. The real challenge isn’t finding new projects; it’s evaluating them quickly. I constantly ask myself:
- How steep is the learning curve?
- Is it practical for a real-world project?
- How much of a headache will it be to deploy?
These questions aren’t answered by star counts alone. I needed deeper insights, but I didn’t have the time to investigate every single interesting repo.
So, winding down last Friday evening here in Indonesia, I decided to finally build the solution I’d been dreaming of. I call it Spy.
What is Spy?
Simply put, Spy is an automated curator and analyst for open-source projects. It fetches the latest trending projects and then uses a pipeline of tools and AI models to enrich the data, giving you a summary that feels like a mini-blog post, complete with an analysis of how easy it is to learn, use, and deploy.
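To make that concrete, here is a rough sketch of the kind of record each project ends up as once the pipeline has run. This is purely illustrative Python, not the actual schema; the field names are just my shorthand for the data described in the rest of this post.

```python
from dataclasses import dataclass, field

# Illustrative only: a rough shape for one enriched project record.
@dataclass
class ProjectRecord:
    name: str
    repo_url: str
    stars: int
    forks: int
    topics: list[str] = field(default_factory=list)
    languages: list[str] = field(default_factory=list)
    excerpt: str = ""         # blog-style summary generated by the LLM
    easy_to_learn: str = ""   # one-sentence AI assessment
    easy_to_use: str = ""
    easy_to_deploy: str = ""
    screenshot_url: str = ""
```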
The Tech Stack: How It’s Made
This project was a blast to build because it combines scraping, automation, and the magic of modern LLMs. Here’s a look under the hood.
The Data Sources
The foundation is a curated list of sources. Right now, it’s GitHub Trending and Hugging Face Papers. I use simple HTTP requests in a scheduled job to scrape the list of new projects. For papers, I specifically filter for those that have an associated GitHub repository.
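There is no official API for GitHub Trending, so the fetch step boils down to parsing the HTML of the trending page. Here is a minimal sketch of what that looks like with requests and BeautifulSoup; the CSS selectors reflect the page layout at the time of writing and may need adjusting if GitHub changes its markup.

```python
import requests
from bs4 import BeautifulSoup

def fetch_github_trending(language: str = "") -> list[dict]:
    """Scrape the GitHub Trending page and return basic repo info.

    GitHub has no official Trending API, so this parses the HTML directly;
    the selectors below match the current page layout and may break if it changes.
    """
    url = f"https://github.com/trending/{language}".rstrip("/")
    resp = requests.get(url, timeout=15)
    resp.raise_for_status()

    soup = BeautifulSoup(resp.text, "html.parser")
    repos = []
    for article in soup.select("article.Box-row"):
        link = article.select_one("h2 a")
        if not link:
            continue
        full_name = link["href"].strip("/")  # e.g. "owner/repo"
        repos.append({
            "full_name": full_name,
            "url": f"https://github.com/{full_name}",
        })
    return repos
```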
The Automation Engine
The entire process is orchestrated by n8n. If you haven’t used it, it’s a fantastic workflow automation tool that acts as the central nervous system for this project. My n8n workflow looks something like this:
- Trigger: A cron job kicks off the workflow daily.
- Fetch: It runs the scraping scripts to get the latest lists.
- Enrich: For each repository, it makes a call to the GitHub API to pull down crucial metadata: star/fork counts, languages, topics, and most importantly, the full README.md content (sketched in code below).
- Process: It then calls out to other services for the AI magic and image processing.
Using n8n saved me from writing a ton of boilerplate backend code for job scheduling and data pipelining.
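The Enrich step is really just a handful of calls to the GitHub REST API. In n8n this lives in HTTP Request or Code nodes, but the logic is easier to show as a standalone Python sketch; one detail worth noting is that the API returns the README base64-encoded, so it has to be decoded before it goes anywhere near an LLM.

```python
import base64
import requests

GITHUB_API = "https://api.github.com"

def enrich_repo(full_name: str, token: str) -> dict:
    """Pull the metadata the workflow needs for one repository."""
    headers = {
        "Authorization": f"Bearer {token}",
        "Accept": "application/vnd.github+json",
    }

    # Core metadata: stars, forks, topics, homepage, etc.
    meta = requests.get(f"{GITHUB_API}/repos/{full_name}", headers=headers, timeout=15).json()

    # Language breakdown (bytes of code per language).
    languages = requests.get(f"{GITHUB_API}/repos/{full_name}/languages", headers=headers, timeout=15).json()

    # The README, returned base64-encoded by the API.
    readme = ""
    readme_resp = requests.get(f"{GITHUB_API}/repos/{full_name}/readme", headers=headers, timeout=15)
    if readme_resp.ok:
        readme = base64.b64decode(readme_resp.json()["content"]).decode("utf-8", errors="replace")

    return {
        "full_name": full_name,
        "stars": meta.get("stargazers_count", 0),
        "forks": meta.get("forks_count", 0),
        "topics": meta.get("topics", []),
        "homepage": meta.get("homepage") or "",
        "languages": list(languages.keys()),
        "readme": readme,
    }
```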
The Brains
This is where it gets really fun. I use a combination of models via OpenRouter and OpenAI to process the README content.
First, I generate a blog-style narrative. I feed the README content to a model like Llama 3 or GPT-4.1 with a prompt similar to this:
As a tech blogger, analyze the following README content. Write a compelling, blog-style excerpt and a longer content body that explains what this project is, who it's for, and its key features. Maintain a knowledgeable yet approachable tone.
README Content:
"""
[...the full README markdown...]
"""
Next, and most importantly, I ask the AI to perform the analysis I used to do manually.
Based on the README, documentation links, and project description, evaluate this open-source project on the following criteria. Provide a short, one-sentence summary for each.
1. **Easy to Learn:** (Does it have good docs, simple APIs, clear examples?)
2. **Easy to Use:** (Is the setup straightforward? Can a developer be productive quickly?)
3. **Easy to Deploy:** (Does it mention Docker, Vercel, or have clear deployment instructions?)
README Content:
"""
[...the full README markdown...]
"""
The results are sometimes surprisingly accurate and provide that at-a-glance insight I was always missing.
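One practical wrinkle: those three one-sentence answers need to land in separate database fields, not one blob of text. One way to keep them structured is to also ask the model to respond as a JSON object and parse it defensively; the helper below sketches that approach, with hypothetical key names rather than my actual schema.

```python
import json

def parse_ease_analysis(raw_response: str) -> dict:
    """Pull the three one-sentence assessments out of the model's reply.

    Assumes the prompt also asks the model to answer as a JSON object with
    the keys below; falls back to empty strings if the reply isn't valid JSON.
    """
    keys = ("easy_to_learn", "easy_to_use", "easy_to_deploy")
    try:
        data = json.loads(raw_response)
        return {k: str(data.get(k, "")).strip() for k in keys}
    except json.JSONDecodeError:
        return {k: "" for k in keys}
```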
The Polish
A wall of text is boring. To make the site more engaging, the workflow also:
- Extracts image URLs from the README markdown (a small regex handles this; see the sketch after this list).
- Uses a screenshot service to capture an image of the project’s homepage if one is listed.
- Saves all this structured data into my database.
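Extracting the image URLs is the simplest piece of the pipeline. Something like the regex-based helper below covers both Markdown image syntax and raw HTML img tags, which is most of what shows up in READMEs.

```python
import re

# Matches Markdown image syntax ![alt](url) plus HTML <img src="url"> tags.
MD_IMAGE = re.compile(r'!\[[^\]]*\]\((\S+?)(?:\s+"[^"]*")?\)')
HTML_IMAGE = re.compile(r'<img[^>]+src=["\']([^"\']+)["\']', re.IGNORECASE)

def extract_image_urls(readme_md: str) -> list[str]:
    """Return the image URLs referenced in a README, in order, without duplicates."""
    urls = MD_IMAGE.findall(readme_md) + HTML_IMAGE.findall(readme_md)
    seen, result = set(), []
    for url in urls:
        if url not in seen:
            seen.add(url)
            result.append(url)
    return result
```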
What’s Next
Of course, it’s not perfect. The biggest challenge is handling “messy” repos. I have to discard some trending projects if they don’t have a homepage or any images, or if the repo link is dead. Quality over quantity.
I’m hoping to add more data sources in the future and refine the AI prompts to provide even more nuanced insights.
I’d Love Your Feedback!
This has been an incredibly rewarding personal project, and I’d be thrilled if you checked it out at spy.hibuno.com.
As developers, I’m especially keen to hear your thoughts:
- How would you approach a task like this? Any tools I should have used instead?
- What other data points from the GitHub API would be valuable for analysis?
- Any ideas for improving the AI prompts for more accurate summaries?
Thanks for reading about this project! Let me know what you think in the comments.