This content originally appeared on DEV Community and was authored by Elizabeth Fuentes L
Link to the notebook: 06_video_embeddings_with_strands_enhanced.ipynb. Star the repository if you find it useful.
This post continues the series that began with “Building a RAG System for Video Content Search and Analysis.” In the previous post, you learned how to build a foundational video RAG system using Amazon Bedrock, Amazon Transcribe, and Amazon Aurora PostgreSQL.
This post shows you how to transform that existing code into intelligent agent tools using the Strands Agents framework. When you have Python code that already works, Strands Agents lets you turn it into tools for autonomous AI agents.
Agent Architecture
1. Video Analysis Agent

Prerequisites: Create Amazon Aurora PostgreSQL with this Amazon CDK Stack. Follow the steps in 05_create_audio_video_embeddings.ipynb.

- Purpose: Processes and searches video content globally
- Capabilities: Analyzes visual frames, transcribed audio, and technical content
- Tools: video_embedding_local for multimodal video search
- Use Case: Technical content analysis, finding specific moments in videos
2. Memory-Enhanced Agent

Prerequisites: Create Amazon Aurora PostgreSQL with this Amazon CDK Stack. Follow the steps in 05_create_audio_video_embeddings.ipynb and create an Amazon S3 vector bucket that will serve as the backend for your vector memory.

- Purpose: Provides personalized, context-aware video analysis
- Capabilities: Remembers user preferences, learns from interactions, provides tailored responses
- Tools: video_embedding_local + s3_vector_memory for persistent user context
- Use Case: Personalized learning experiences, adaptive content recommendations
The Magic of the @tool Decorator
Strands Agents transforms your existing video processing code into agent tools. Here’s an example of existing video processing code that becomes an agent tool:
import os
from typing import Any, Dict, Optional

from strands import tool

@tool
def video_embedding_local(
    video_path: str,
    user_id: str = "default_user",
    action: str = "process",
    query: Optional[str] = None,
    similarity_threshold: float = 0.8,
    frames_per_second: float = 1.0,
    region: Optional[str] = None,
) -> Dict[str, Any]:
    """
    Simple video embedding processor following notebook 05 pattern exactly.

    Args:
        video_path: Path to video file (local)
        user_id: User identifier for data isolation
        action: Action to perform ('process', 'search', 'list')
        query: Search query for retrieval (when action is 'search')
        similarity_threshold: Threshold for frame similarity comparison (0.0-1.0)
        frames_per_second: Frames to extract per second (can be fractional, e.g. 0.5)
        region: AWS region

    Returns:
        Dictionary with processing results
    """
    try:
        # Complex video processing logic here...
        cluster_arn = os.getenv('AURORA_CLUSTER_ARN')
        secret_arn = os.getenv('AURORA_SECRET_ARN')
        database_name = os.getenv('AURORA_DATABASE_NAME', 'kbdata')
        s3_bucket = os.getenv('S3_BUCKET_NAME')  # bucket that stores extracted frames

        # AuroraPostgres is the helper class from the repository
        aurora = AuroraPostgres(cluster_arn, database_name, secret_arn, region)

        if action == "search":
            return _search_videos(query, aurora, region)
        elif action == "list":
            return _list_videos(user_id, aurora)
        else:
            return _process_video(video_path, user_id, similarity_threshold,
                                  frames_per_second, s3_bucket, aurora, region)
    except Exception as e:
        return {
            "status": "error",
            "message": f"Error: {str(e)}",
            "error_type": type(e).__name__
        }
That's it! By adding @tool and a descriptive docstring, the complex logic for video processing, frame extraction, embedding generation, and Amazon Aurora PostgreSQL storage becomes intelligently usable by agents.
Behind this simple decorator, Strands Agents handles:

- Tool registration and discovery
- Input validation and error handling
- Result formatting and response generation
- Conversation state management
- Multi-step reasoning coordination
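To see how little ceremony this takes end to end, here is a minimal, self-contained sketch (the word_count tool is a made-up example for illustration, not part of the repository):

```python
from strands import Agent, tool

@tool
def word_count(text: str) -> int:
    """Count the words in a piece of text."""
    return len(text.split())

# Register the tool; the agent decides when to call it
agent = Agent(tools=[word_count])
agent("How many words are in 'Strands turns functions into tools'?")
```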
Persistent Memory with S3 Vectors
Here's another powerful example from my blog post Building Scalable Multi-Modal AI Agents with Strands Agents and Amazon S3 Vectors: a tool for persistent agent memory using Amazon S3 Vectors:
import os
from typing import Dict

import boto3
from strands import tool

@tool
def s3_vector_memory(
    action: str,
    content: str = None,
    query: str = None,
    user_id: str = None,
    vector_bucket_name: str = None,
    index_name: str = None,
    top_k: int = 20,
    min_score: float = 0.1
) -> Dict:
    """
    AWS-native memory management using Amazon S3 Vectors.

    Actions:
    - store: Store new memory content
    - retrieve: Search and retrieve relevant memories
    - list: List all user memories
    """
    if not user_id:
        return {"status": "error", "message": "user_id is required for memory isolation"}

    try:
        config = {
            "bucket_name": vector_bucket_name or os.environ.get('VECTOR_BUCKET_NAME'),
            "index_name": index_name or os.environ.get('VECTOR_INDEX_NAME'),
            "region": os.environ.get('AWS_REGION', 'us-east-1'),
            "model_id": os.environ.get('EMBEDDING_MODEL')
        }
        bedrock = boto3.client("bedrock-runtime", region_name=config["region"])
        s3vectors = boto3.client("s3vectors", region_name=config["region"])

        if action == "store":
            return _store_memory(s3vectors, bedrock, config, content, user_id)
        elif action == "retrieve":
            return _retrieve_memories(s3vectors, bedrock, config, query, user_id, top_k, min_score)
        elif action == "list":
            return _list_memories(s3vectors, bedrock, config, user_id, top_k)
        else:
            return {"status": "error", "message": f"Unknown action: {action}"}
    except Exception as e:
        return {"status": "error", "message": str(e)}
This tool handles user isolation, semantic search, and persistent memory – all with a simple interface the agent understands perfectly.
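The helper functions are elided above. As a rough idea of what _store_memory involves, here is a hedged sketch using a Bedrock text embedding model and the S3 Vectors put_vectors API; the vector key scheme and metadata fields are my assumptions, not necessarily the repository's exact schema:

```python
import json
import uuid
from datetime import datetime, timezone

def _store_memory(s3vectors, bedrock, config, content, user_id):
    # Embed the memory text with the configured Bedrock embedding model
    response = bedrock.invoke_model(
        modelId=config["model_id"],  # e.g. amazon.titan-embed-text-v2:0
        body=json.dumps({"inputText": content}),
    )
    embedding = json.loads(response["body"].read())["embedding"]

    # Store the vector with user_id metadata so retrieval can filter per user
    s3vectors.put_vectors(
        vectorBucketName=config["bucket_name"],
        indexName=config["index_name"],
        vectors=[{
            "key": f"{user_id}/{uuid.uuid4()}",  # assumed key scheme
            "data": {"float32": embedding},
            "metadata": {  # assumed metadata fields
                "user_id": user_id,
                "content": content,
                "timestamp": datetime.now(timezone.utc).isoformat(),
            },
        }],
    )
    return {"status": "success", "message": "Memory stored"}
```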
Results Visualization Tool
Even specialized display functions become powerful tools:
import os
from typing import Any, Dict, List

import boto3
from strands import tool

@tool
def display_video_images(
    search_results: List[Dict[str, Any]],
    region: str = None,
    base_path: str = "images/"
) -> Dict[str, Any]:
    """
    Display images from video search results.

    Args:
        search_results: List of search results from video_embedding_local
        region: AWS region for S3 client
        base_path: Local path to save downloaded images
    """
    try:
        os.makedirs(base_path, exist_ok=True)
        s3_client = boto3.client('s3', region_name=region)
        displayed_count = 0
        text_count = 0

        for result in search_results:
            metadata = result.get('metadata', {})
            content_type = metadata.get('content_type', 'unknown')
            if content_type == "text":
                text_count += 1
                print(f"📝 Text Result: {result.get('content_preview', '')}")
            elif content_type == "image":
                displayed_count += 1
                # Download and display logic...

        return {
            "status": "success",
            "images_displayed": displayed_count,
            "text_results": text_count,
            "total_processed": len(search_results)
        }
    except Exception as e:
        return {"status": "error", "message": f"Failed to display images: {str(e)}"}
Creating Agents with Strands Agents
With your tools ready, creating agents is straightforward:
from strands import Agent
from strands.models import BedrockModel
model = BedrockModel(model_id="us.anthropic.claude-3-5-sonnet-20241022-v2:0")
VIDEO_SYSTEM_PROMPT = """You are a video processing AI assistant.
Available actions:
- process: Upload and process videos
- search: Search video content using semantic similarity
- list: List all processed videos
Use video_embedding_local for all video operations.
Use display_video_images to show search results.
"""
That's just a few lines of new code for a production-ready AI agent.
Model Configuration Options
Strands supports multiple model configuration approaches:
Option 1: Default Configuration
from strands import Agent
agent = Agent() # Uses Claude 4 Sonnet by default
Option 2: Specify Model ID
agent = Agent(model="anthropic.claude-sonnet-4-20250514-v1:0")
Option 3: BedrockModel (Recommended)
from strands.models import BedrockModel
model = BedrockModel(
model_id="anthropic.claude-sonnet-4-20250514-v1:0",
temperature=0.3,
top_p=0.8
)
agent = Agent(model=model)
Option 4: Anthropic Direct
from strands.models.anthropic import AnthropicModel
model = AnthropicModel(
model_id="claude-sonnet-4-20250514",
max_tokens=1028,
params={"temperature": 0.7}
)
You can also use other model providers, such as Ollama for local development.
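For instance, here is a sketch of a local Ollama setup (assuming an Ollama server is running; the host and model_id values are illustrative):

```python
from strands import Agent
from strands.models.ollama import OllamaModel

# Local model served by Ollama (illustrative host and model)
model = OllamaModel(
    host="http://localhost:11434",
    model_id="llama3",
)
agent = Agent(model=model)
```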
With a model configured, create the agents and register your tools:

# Video analysis agent
video_agent = Agent(
    model=model,
    tools=[video_embedding_local, display_video_images],
    system_prompt=VIDEO_SYSTEM_PROMPT
)

# Memory-enhanced agent
memory_agent = Agent(
    model=model,
    tools=[video_embedding_local, s3_vector_memory, display_video_images],
    system_prompt=VIDEO_SYSTEM_PROMPT
)
Natural Agent Interactions
The real power emerges in natural conversations:
# Simple processing
response = video_agent(f"What is the video about in {VIDEO_PATH}?")
Once the video is processed, the agent proceeds to perform the analysis.
# Memory-enhanced interaction
response = memory_agent(f"""I'm interested in learning about AI and database technologies.
Store this preference for user {USER_ID}, then search the video in {VIDEO_PATH} for technical
discussions about vector databases and embeddings.""")
response = memory_agent(f"What did user {USER_ID} ask before?")
Behind the scenes, the agent:

- Uses video_embedding_local to process the video
- Uses the same tool with action="search" to find similar content
- Uses display_video_images to show results
- Combines everything into a coherent response
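Expressed as direct tool calls, that orchestration looks roughly like this (a sketch: the agent chooses and sequences these calls on its own, and the shape of the search result dictionary is an assumption):

```python
# Roughly what the agent does autonomously, written out by hand
s3_vector_memory(
    action="store",
    content="Interested in AI and database technologies",
    user_id=USER_ID,
)
search_result = video_embedding_local(
    video_path=VIDEO_PATH,
    user_id=USER_ID,
    action="search",
    query="vector databases and embeddings",
)
# 'results' key is assumed; adapt to the tool's actual return shape
display_video_images(search_result.get("results", []))
```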
Configuration and Performance
The system accepts flexible configuration parameters:

| Parameter | Description | Default |
|---|---|---|
| `video_path` | Path to video (local or S3) | Required |
| `user_id` | User identifier | 'default_user' |
| `action` | 'process', 'search', 'list' | 'process' |
| `similarity_threshold` | Similarity threshold (0.0-1.0) | 0.8 |
| `frames_per_second` | Frame extraction rate | 1 |
Performance optimization examples:

- High precision: frames_per_second: 2, similarity_threshold: 0.7
- Balanced: frames_per_second: 1, similarity_threshold: 0.8
- Fast processing: frames_per_second: 0.5, similarity_threshold: 0.9
Learn More About Strands Agents
Strands Agents provides comprehensive documentation and examples to help you get started.
The agent loop documentation explains how agents process user input, make decisions, execute tools, and generate responses through an intelligent cycle of reasoning and action.
Get Started Today
Ready to transform your code into agentic tools? The complete implementation is available in the LangChain embeddings repository.
Ready to create your own Strands agent? Here are some resources:
- Previous post in the series: Multi-Modal Content Processing with Strands Agent
- Previous post in the series: Building Strands Agents with a few lines of code
- Strands Agent Framework
- Complete Code Examples
- Getting Started with Strands Agents
Thank you!