Build Agentic Video RAG with Strands Agents and Amazon Aurora PostgreSQL – Local Infrastructure



This content originally appeared on DEV Community and was authored by Elizabeth Fuentes L


Link to the notebook: 06_video_embeddings_with_strands_enhanced.ipynb / ⭐ Star this repository

This post continues the series that began with “Building a RAG System for Video Content Search and Analysis.” In the previous post, you learned how to build a foundational video RAG system using Amazon Bedrock, Amazon Transcribe, and Amazon Aurora PostgreSQL.

This post shows you how to transform that existing code into intelligent agent tools using the Strands Agents framework: when you already have Python code that works, Strands Agents lets you turn it into tools an autonomous AI agent can use.

🤖 Agent Architecture

1. Video Analysis Agent

Prerequisites: ⚠⚠⚠⚠ Create an Amazon Aurora PostgreSQL cluster with this AWS CDK stack, then follow the steps in 05_create_audio_video_embeddings.ipynb. ⚠⚠⚠⚠

  • Purpose: Processes and searches video content globally
  • Capabilities: Analyzes visual frames, transcribed audio, technical content
  • Tools: video_embedding_local for multimodal video search
  • Use Case: Technical content analysis, finding specific moments in videos

2. Memory-Enhanced Agent

Prerequisites: ⚠⚠⚠⚠ Create an Amazon Aurora PostgreSQL cluster with this AWS CDK stack, follow the steps in 05_create_audio_video_embeddings.ipynb, and create an Amazon S3 vector bucket that will serve as the backend for your vector memory. ⚠⚠⚠⚠

  • Purpose: Provides personalized, context-aware video analysis
  • Capabilities: Remembers user preferences, learns from interactions, provides tailored responses
  • Tools: video_embedding_local + s3_vector_memory for persistent user context
  • Use Case: Personalized learning experiences, adaptive content recommendations
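
Both agents read their AWS connection details from environment variables. Here is a minimal sketch of the expected configuration; the variable names come from the tool code later in this post, and every value below is a placeholder:

import os

# Aurora PostgreSQL connection (created by the CDK stack) -- placeholder ARNs
os.environ.setdefault("AURORA_CLUSTER_ARN", "arn:aws:rds:us-east-1:111122223333:cluster:video-rag")
os.environ.setdefault("AURORA_SECRET_ARN", "arn:aws:secretsmanager:us-east-1:111122223333:secret:video-rag")
os.environ.setdefault("AURORA_DATABASE_NAME", "kbdata")

# Amazon S3 Vectors backend (memory-enhanced agent only) -- placeholder names
os.environ.setdefault("VECTOR_BUCKET_NAME", "my-vector-bucket")
os.environ.setdefault("VECTOR_INDEX_NAME", "agent-memory")
os.environ.setdefault("EMBEDDING_MODEL", "amazon.titan-embed-text-v2:0")  # placeholder embedding model ID

os.environ.setdefault("AWS_REGION", "us-east-1")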

The Magic of the @tool Decorator

Strands Agents transforms your existing video processing code into agent tools. Here’s an example of existing video processing code that becomes an agent tool:

import os
from typing import Any, Dict, Optional

from strands import tool

@tool
def video_embedding_local(
    video_path: str,
    user_id: str = "default_user",
    action: str = "process",
    query: Optional[str] = None,
    similarity_threshold: float = 0.8,
    frames_per_second: int = 1,
    region: Optional[str] = None,
) -> Dict[str, Any]:
    """
    Simple video embedding processor following notebook 05 pattern exactly.

    Args:
        video_path: Path to video file (local)
        user_id: User identifier for data isolation
        action: Action to perform ('process', 'search', 'list')
        query: Search query for retrieval (when action is 'search')
        similarity_threshold: Threshold for frame similarity comparison (0.0-1.0)
        frames_per_second: Frames to extract per second
        region: AWS region

    Returns:
        Dictionary with processing results
    """
    try:
        # Complex video processing logic here...
        cluster_arn = os.getenv('AURORA_CLUSTER_ARN')
        secret_arn = os.getenv('AURORA_SECRET_ARN')
        database_name = os.getenv('AURORA_DATABASE_NAME', 'kbdata')
        # Assumed environment variable for the S3 bucket used during processing
        # (the original excerpt references s3_bucket without defining it)
        s3_bucket = os.getenv('S3_BUCKET_NAME')

        aurora = AuroraPostgres(cluster_arn, database_name, secret_arn, region)

        if action == "search":
            return _search_videos(query, aurora, region)
        elif action == "list":
            return _list_videos(user_id, aurora)
        else:
            return _process_video(video_path, user_id, similarity_threshold, 
                                frames_per_second, s3_bucket, aurora, region)

    except Exception as e:
        return {
            "status": "error",
            "message": f"Error: {str(e)}",
            "error_type": type(e).__name__
        }

That’s it! By adding @tool and a descriptive docstring, the complex logic for video processing, frame extraction, embedding generation, and Amazon Aurora PostgreSQL storage becomes intelligently usable by agents.

Behind this simple decorator, Strands Agents handles:

  • ✅ Tool registration and discovery
  • ✅ Input validation and error handling
  • ✅ Result formatting and response generation
  • ✅ Conversation state management
  • ✅ Multi-step reasoning coordination
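
To make that concrete, here is a minimal sketch of how the decorated tool is consumed. The decorated function remains callable as plain Python, and handing it to an Agent is all the registration needed (the prompt is illustrative):

from strands import Agent

# Direct call: handy for testing the tool in isolation
# (video_path is ignored by the 'list' action, as shown in the tool code above)
result = video_embedding_local(video_path="unused", action="list")

# Agent call: Strands discovers the tool from its signature and docstring
agent = Agent(tools=[video_embedding_local])
agent("List the videos that have been processed so far.")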

Persistent Memory with S3 Vectors

Here’s another powerful example from my blog Building Scalable Multi-Modal AI Agents with Strands Agents and Amazon S3 Vectors – a tool for persistent agent memory using Amazon S3 Vectors:

import os
from typing import Dict

import boto3
from strands import tool

@tool
def s3_vector_memory(
    action: str,
    content: str = None,
    query: str = None,
    user_id: str = None,
    vector_bucket_name: str = None,
    index_name: str = None,
    region_name: str = None,
    embedding_model: str = None,
    top_k: int = 20,
    min_score: float = 0.1
) -> Dict:
    """
    AWS-native memory management using Amazon S3 Vectors.

    Actions:
    - store: Store new memory content
    - retrieve: Search and retrieve relevant memories
    - list: List all user memories
    """
    if not user_id:
        return {"status": "error", "message": "user_id is required for memory isolation"}

    try:
        config = {
            "bucket_name": vector_bucket_name or os.environ.get('VECTOR_BUCKET_NAME'),
            "index_name": index_name or os.environ.get('VECTOR_INDEX_NAME'),
            "region": region_name or os.environ.get('AWS_REGION', 'us-east-1'),
            "model_id": embedding_model or os.environ.get('EMBEDDING_MODEL')
        }

        bedrock = boto3.client("bedrock-runtime", region_name=config["region"])
        s3vectors = boto3.client("s3vectors", region_name=config["region"])

        if action == "store":
            return _store_memory(s3vectors, bedrock, config, content, user_id)
        elif action == "retrieve":
            return _retrieve_memories(s3vectors, bedrock, config, query, user_id, top_k, min_score)
        elif action == "list":
            return _list_memories(s3vectors, bedrock, config, user_id, top_k)
        else:
            return {"status": "error", "message": f"Unknown action: {action}"}

    except Exception as e:
        return {"status": "error", "message": str(e)}

This tool handles user isolation, semantic search, and persistent memory – all with a simple interface the agent understands perfectly.
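
In practice the agent supplies these arguments itself, but the tool can also be exercised directly. A hypothetical store-and-retrieve round trip, assuming the environment variables sketched earlier are set (the user ID and content are placeholders):

# Store a preference, then retrieve it later by semantic similarity
s3_vector_memory(
    action="store",
    content="Prefers deep technical content about vector databases",
    user_id="user-123",
)

memories = s3_vector_memory(
    action="retrieve",
    query="What kind of content does this user prefer?",
    user_id="user-123",
    top_k=5,
)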

Results Visualization Tool

Even specialized display functions become powerful tools:

import os
from typing import Any, Dict, List

import boto3
from strands import tool

@tool
def display_video_images(
    search_results: List[Dict[str, Any]],
    region: str = None,
    base_path: str = "images/"
) -> Dict[str, Any]:
    """
    Display images from video search results.

    Args:
        search_results: List of search results from the video embedding tool
        region: AWS region for the S3 client
        base_path: Local path to save downloaded images
    """
    try:
        os.makedirs(base_path, exist_ok=True)
        s3_client = boto3.client('s3', region_name=region)

        displayed_count = 0
        text_count = 0

        for result in search_results:
            metadata = result.get('metadata', {})
            content_type = metadata.get('content_type', 'unknown')

            if content_type == "text":
                text_count += 1
                print(f"📝 Text Result: {result.get('content_preview', '')}")
            elif content_type == "image":
                displayed_count += 1
                # Download and display logic...

        return {
            "status": "success",
            "images_displayed": displayed_count,
            "text_results": text_count,
            "total_processed": len(search_results)
        }

    except Exception as e:
        return {"status": "error", "message": f"Failed to display images: {str(e)}"}

Creating Agents with Strands Agents

With your tools ready, creating agents is straightforward:

from strands import Agent
from strands.models import BedrockModel

model = BedrockModel(model_id="us.anthropic.claude-3-5-sonnet-20241022-v2:0")

VIDEO_SYSTEM_PROMPT = """You are a video processing AI assistant.

Available actions:
- process: Upload and process videos 
- search: Search video content using semantic similarity
- list: List all processed videos

Use video_embedding_local for all video operations.
Use display_video_images to show search results.
"""

Total: 6 lines of new code for a production-ready AI agent.

🎯 Model Configuration Options

Strands supports multiple model configuration approaches:

Option 1: Default Configuration

from strands import Agent
agent = Agent()  # Uses Claude 4 Sonnet by default

Option 2: Specify Model ID

agent = Agent(model="anthropic.claude-sonnet-4-20250514-v1:0")

Option 3: BedrockModel (Recommended)

from strands.models import BedrockModel

model = BedrockModel(
    model_id="anthropic.claude-sonnet-4-20250514-v1:0",
    temperature=0.3,
    top_p=0.8
)
agent = Agent(model=model)

Option 4: Anthropic Direct

from strands.models.anthropic import AnthropicModel

model = AnthropicModel(
    model_id="claude-sonnet-4-20250514",
    max_tokens=1028,
    params={"temperature": 0.7}
)

You can also use other model providers supported by Strands Agents, such as Ollama, OpenAI, or LiteLLM.
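
For example, here is a minimal sketch using the Ollama provider (this assumes the optional Ollama dependency for Strands Agents is installed and a local Ollama server is running; the host and model name are placeholders):

from strands import Agent
from strands.models.ollama import OllamaModel

# Point the agent at a locally running Ollama server (placeholder host and model)
ollama_model = OllamaModel(
    host="http://localhost:11434",
    model_id="llama3",
)
local_agent = Agent(model=ollama_model)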

With a model and the tools in place, each agent takes a single constructor call:

# Video analysis agent
video_agent = Agent(
    model=model, 
    tools=[video_embedding_local, display_video_images],
    system_prompt=VIDEO_SYSTEM_PROMPT
)

# Memory-enhanced agent
memory_agent = Agent(
    model=model, 
    tools=[video_embedding_local, s3_vector_memory, display_video_images],
    system_prompt=VIDEO_SYSTEM_PROMPT
)

As outlined earlier, the repository demonstrates two distinct agent architectures: the Video Analysis Agent uses video_embedding_local alone to process and search video content globally (visual frames, transcribed audio, technical content), while the Memory-Enhanced Agent adds s3_vector_memory for personalized, context-aware analysis that remembers user preferences and learns from interactions.

Natural Agent Interactions

The real power emerges in natural conversations:

# Simple processing
response = video_agent(f"What is the video about in {VIDEO_PATH}?")

Once the video is processed, the agent proceeds to perform the analysis.

# Memory-enhanced interaction
response = memory_agent(f"""I'm interested in learning about AI and database technologies. 
Store this preference for user {USER_ID}, then search the video in {VIDEO_PATH} for technical 
discussions about vector databases and embeddings.""")


response = memory_agent(f"What did user {USER_ID} ask before?")

Behind the scenes, the agent:

  1. Uses video_embedding_local to process the video
  2. Uses the same tool with action="search" to find similar content
  3. Uses display_video_images to show results
  4. Combines everything into a coherent response
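
To see that loop in a single request, here is a hypothetical prompt that exercises all of the steps above (VIDEO_PATH is a placeholder, as before):

# One request that triggers process -> search -> display
VIDEO_PATH = "videos/demo.mp4"  # placeholder path

response = video_agent(
    f"Process the video at {VIDEO_PATH}, then find the moments where vector databases "
    "are discussed and show me the matching frames."
)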

Configuration and Performance

The system accepts flexible configuration parameters [1]:

Parameter            | Description                      | Default
video_path           | Path to video (local or S3)      | Required
user_id              | User identifier                  | Required
action               | ‘process’, ‘search’, ‘list’      | ‘process’
similarity_threshold | Similarity threshold (0.0-1.0)   | 0.8
frames_per_second    | Frame extraction rate            | 1

Performance optimization examples:

  • High precision: frames_per_second: 2, similarity_threshold: 0.7
  • Balanced: frames_per_second: 1, similarity_threshold: 0.8
  • Fast processing: frames_per_second: 0.5, similarity_threshold: 0.9
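
These profiles map directly onto the tool's parameters. A minimal sketch of a high-precision run, calling the decorated tool directly (the path is a placeholder):

# High-precision processing via the tool's own parameters
result = video_embedding_local(
    video_path="videos/demo.mp4",   # placeholder path
    user_id="default_user",
    action="process",
    frames_per_second=2,
    similarity_threshold=0.7,
)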

Learn More About Strands Agents

Strands Agents provides comprehensive documentation and examples to help you get started.

The agent loop documentation explains how agents process user input, make decisions, execute tools, and generate responses through an intelligent cycle of reasoning and action.

Get Started Today

Ready to transform your code into agentic tools? The complete implementation is available in the LangChain embeddings repository.

Ready to create your own Strands agent? Start with the Strands Agents documentation and the agent loop guide mentioned above.

Gracias!
