This content originally appeared on DEV Community and was authored by Elizabeth Fuentes L
The application code lives in the container-video-embeddings directory of the langchain-embeddings repository. Star the repository if you find it useful.
In this second part of the series, you’ll learn how to implement a containerized version of Ask Your Video using AWS Step Functions for orchestration. The application processes video content in parallel streams, enabling natural language search across visual and audio elements.
In Part 1: Building a RAG System for Video Content Search and Analysis, you explored implementing a RAG system using Jupyter notebooks. While that approach works well for prototypes and small applications, scaling it presents a principal challenge: video processing demands intensive CPU resources, especially during frame extraction and embedding generation.
To address this constraint, this post demonstrates a containerized application that offers improved scalability and resource management. The containerized architecture provides these key benefits:
- Unlimited processing time using Amazon Elastic Container Service (Amazon ECS).
- Consistent environment management through Docker containers.
- Robust workflow orchestration with AWS Step Functions.
This architecture creates an application for processing video content at scale.
Architecture Deep Dive
The solution uses AWS Step Functions to orchestrate a parallel workflow that processes both visual and audio content simultaneously:
1. Trigger: When a video is uploaded to Amazon S3, it initiates the Step Functions workflow.
2. Parallel Processing Branches:
   - Visual Branch:
     - An Amazon ECS task runs a containerized FFmpeg process that extracts frames at 1 FPS.
     - Each frame is compared for similarity with previously extracted frames to minimize storage costs (a minimal sketch of this filtering step follows the list).
     - Unique frames are sent to Amazon Bedrock for embedding generation.
   - Audio Branch:
     - Amazon Transcribe processes the audio track with speaker diarization enabled.
     - The transcription is segmented based on speaker changes and timing.
     - Text segments are converted to embeddings using Amazon Bedrock.
3. Convergence:
   - A Lambda function processes both streams' outputs.
   - Final embeddings are generated using the Amazon Bedrock Titan multimodal model.
   - Vectors are stored in Amazon Aurora PostgreSQL with pgvector.
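The exact frame-filtering logic lives in the containerized ECS task, but the idea is straightforward. The sketch below is illustrative only, not the repository's implementation: it extracts frames at 1 FPS with FFmpeg and keeps a frame only when its perceptual hash differs enough from the last kept frame. It assumes ffmpeg is on the PATH and that the Pillow and imagehash packages are installed.

import glob
import subprocess

import imagehash
from PIL import Image

def extract_unique_frames(video_path, output_dir, max_distance=5):
    # Extract one frame per second with FFmpeg
    subprocess.run(
        ["ffmpeg", "-i", video_path, "-vf", "fps=1", f"{output_dir}/frame_%05d.jpg"],
        check=True,
    )
    unique_frames, last_hash = [], None
    for frame_path in sorted(glob.glob(f"{output_dir}/frame_*.jpg")):
        frame_hash = imagehash.phash(Image.open(frame_path))
        # Keep the frame only if it differs enough from the last kept frame
        if last_hash is None or frame_hash - last_hash > max_distance:
            unique_frames.append(frame_path)
            last_hash = frame_hash
    return unique_frames

Only the frames kept by a filter like this are sent to Amazon Bedrock for embedding, which keeps storage and embedding costs proportional to the amount of genuinely new visual content.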
Container Implementation
Step 0: Clone the GitHub repository
git clone https://github.com/build-on-aws/langchain-embeddings
cd container-video-embeddings
Set up the environment:
- Create a virtual environment:
python3 -m venv .venv
- Activate the virtual environment:
# For Linux/macOS
source .venv/bin/activate
# For Windows
.venv\Scripts\activate.bat
- Install dependencies:
pip install -r 04-retrieval/requirements.txt
Step 1: Deploy Amazon ECS Cluster for Audio/Video Embeddings Processing
This CDK project creates the foundational infrastructure for an audio and video processing application that generates embeddings from media files. The infrastructure includes:
- An Amazon ECS cluster named “video-processing”
- A VPC with public and private subnets for secure networking
- SSM parameters to store cluster and VPC information for use by other stacks
cd 01-ecs-cluster
cdk deploy
This deployment takes approximately 162 s.
Verify Deployment
After deployment, you can verify the resources in the AWS CloudFormation console:
- Check the parameters in the Systems Manager Parameter Store, which are necessary to deploy the other stacks in this application:
  - /videopgvector/cluster-name: Contains the ECS cluster name
  - /videopgvector/vpc-id: Contains the VPC ID
Step 2: Deploy Amazon Aurora PostgreSQL Vector Database for Audio/Video Embeddings
This CDK project creates an Amazon Aurora PostgreSQL database with vector capabilities for storing and querying embeddings generated from audio and video files.
The infrastructure includes:
- An Aurora PostgreSQL Serverless v2 cluster with pgvector extension
- Lambda functions for database setup and management
- Security groups and IAM roles for secure access
- SSM parameters to store database connection information
cd ../02-aurora-pg-vector
cdk deploy
This deployment takes approximately 594.29s.
Verify Deployment
After deployment, you can verify the resources in the AWS CloudFormation console.
Check the parameters in the Systems Manager Parameter Store, which are necessary to deploy the other stacks in this application:
- /videopgvector/cluster_arn: Contains the Aurora cluster ARN
- /videopgvector/secret_arn: Contains the secret ARN for database credentials
- /videopgvector/video_table_name: Contains the table name for video embeddings
Once the stack is up, you can also run a quick similarity query against the database, as sketched below.
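To sanity-check the database (and to preview how the retrieval Lambda can query it), here is a minimal, illustrative sketch of a pgvector similarity search through the RDS Data API. The database name, the embedding column, and the three-element dummy vector are assumptions for illustration; in practice the query vector must come from Amazon Bedrock and match the dimension used by the table.

import boto3

ssm = boto3.client("ssm")
rds_data = boto3.client("rds-data")

def ssm_value(name):
    return ssm.get_parameter(Name=name)["Parameter"]["Value"]

cluster_arn = ssm_value("/videopgvector/cluster_arn")
secret_arn = ssm_value("/videopgvector/secret_arn")
table_name = ssm_value("/videopgvector/video_table_name")

# Dummy query vector, for illustration only; use a Bedrock embedding of the
# correct dimension when querying the real table.
query_embedding = [0.1, 0.2, 0.3]

response = rds_data.execute_statement(
    resourceArn=cluster_arn,
    secretArn=secret_arn,
    database="postgres",  # assumed database name
    sql=f"""
        SELECT *
        FROM {table_name}
        ORDER BY embedding <=> CAST(:q AS vector)  -- pgvector cosine distance operator
        LIMIT 5
    """,
    parameters=[{"name": "q", "value": {"stringValue": str(query_embedding)}}],
)
print(response["records"])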
Step 3: Deploy Audio/Video processing workflow
This CDK project creates a complete workflow for processing audio and video files to generate embeddings.
The infrastructure includes:
- A Step Functions workflow that orchestrates the entire process
- Lambda functions for various processing steps
- An ECS Fargate task for video frame extraction
- Integration with Amazon Transcribe for audio transcription (an illustrative call is sketched after this list)
- DynamoDB tables for tracking job status
- S3 bucket for storing media files and processing results
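For the audio branch, the workflow starts an Amazon Transcribe job with speaker diarization enabled. The following is only a rough sketch of what such a call looks like with boto3; the job name, media URI, language code, and speaker limit are placeholders rather than the values the deployed Lambda uses.

import boto3

transcribe = boto3.client("transcribe")

transcribe.start_transcription_job(
    TranscriptionJobName="ask-your-video-demo-job",              # placeholder
    Media={"MediaFileUri": "s3://YOUR_BUCKET/videos/demo.mp4"},  # placeholder
    MediaFormat="mp4",
    LanguageCode="en-US",
    Settings={
        "ShowSpeakerLabels": True,  # enable speaker diarization
        "MaxSpeakerLabels": 10,     # assumed upper bound on distinct speakers
    },
)

The resulting transcript, with speaker labels and timestamps, is what the workflow later segments by speaker changes and timing before generating text embeddings.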
Install Docker Desktop and then:
cd ../03-audio-video-workflow
cdk deploy
This deployment takes approximately 171s.
Verify Deployment
After deployment, you can verify the resources in the AWS CloudFormation console.
Step 4: Deploy retrieval API for Audio/Video embeddings
This CDK project creates a retrieval API for searching and querying embeddings generated from audio and video files.
The infrastructure includes:
- An API Gateway REST API with Cognito authentication.
- Lambda functions for retrieval operations.
- Integration with the Aurora PostgreSQL vector database.
cd ../04-retrieval
cdk deploy
This deployment takes approximately 56.77s.
Verify Deployment
After deployment, you can verify the resources in the AWS CloudFormation console.
Check the parameters in the Systems Manager Parameter Store, which are needed to call the API:
- /videopgvector/api_retrieve: Contains the API endpoint URL
- /videopgvector/lambda_retreval_name: Contains the retrieval Lambda function name
Testing the Application
Navigate to the test environment:
../04-retrieval/test-retrieval/
Upload a video file to the bucket created in the previous deployment. First, look up the bucket name:
import boto3

region = "us-east-1"  # set this to the AWS Region where you deployed the stacks
ssm = boto3.client(service_name="ssm", region_name=region)

def get_ssm_parameter(name):
    # Read a parameter created by the CDK stacks
    response = ssm.get_parameter(Name=name, WithDecryption=True)
    return response["Parameter"]["Value"]
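The parameter that holds the bucket name is created by the workflow stack. The path below is a placeholder; check the Parameter Store console (the parameters live under /videopgvector/) for the exact name.

# Placeholder parameter path; look up the actual name in Parameter Store
bucket_name = get_ssm_parameter("/videopgvector/bucket_name")
print(bucket_name)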
Then upload the file with this function:
s3_client = boto3.client("s3")

# Upload the video to the Amazon S3 bucket to trigger the workflow
def upload_file_to_s3(video_path, bucket_name, s3_key):
    s3_client.upload_file(video_path, bucket_name, s3_key)
    print("Upload successful!")
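For example, with a local file and an S3 key of your choice (both values here are illustrative):

# Illustrative values; adjust the local path and S3 key to your own video
upload_file_to_s3("demo.mp4", bucket_name, "videos/demo.mp4")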
Once the file upload is complete, the Step Functions workflow is triggered automatically. The pipeline will:
- Extract audio and start transcription
- Process video frames and generate embeddings
- Store results in Aurora PostgreSQL
You can test the application in two ways:
Query:
Open the notebook 01_query_audio_video_embeddings.ipynb and make queries directly to Aurora PostgreSQL, similar to what we did in the previous blog.
Try the API:
Open the notebook 02_test_webhook.ipynb. This notebook demonstrates how to:
- Upload video files to the S3 bucket for processing
- Test the retrieval API endpoints with different query parameters
After uploading video files to the S3 bucket for processing, you can check the workflow executions:
import boto3

sfn_client = boto3.client("stepfunctions")

# state_machine_arn: ARN of the workflow's state machine (see the Step Functions console)
response = sfn_client.list_executions(
    stateMachineArn=state_machine_arn,
    maxResults=12
)
response["executions"][0]
You can also see the status in the AWS Step Functions console.
Test the retrieval API endpoints with different query parameters:
- Set the method to retrieve for basic search functionality.
- Set the method to retrieve_generate for enhanced search with generated responses.
An illustrative example of calling the API follows.
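As a rough illustration of such a request (the payload field names, the Authorization header, and the token handling are assumptions; the 02_test_webhook.ipynb notebook shows the exact request format and how to obtain a Cognito token):

import requests

api_url = get_ssm_parameter("/videopgvector/api_retrieve")
id_token = "<COGNITO_ID_TOKEN>"  # placeholder: obtain a valid token from Cognito first

# Assumed payload shape, for illustration only
payload = {
    "query": "Where does the presenter talk about pricing?",
    "method": "retrieve",  # or "retrieve_generate"
}

response = requests.post(api_url, json=payload, headers={"Authorization": id_token})
print(response.json())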
What’s Next?
This containerized implementation of Ask Your Video demonstrates how you can scale video content processing using AWS Step Functions and Amazon ECS. The parallel processing architecture significantly improves performance while maintaining cost efficiency through optimized resource utilization.
The solution provides several key advantages over traditional approaches:
- Scalability: Handle multiple video files simultaneously without resource constraints
- Reliability: Robust error handling and workflow orchestration through Step Functions
- Cost optimization: Pay only for the compute resources you use with Fargate
- Maintainability: Containerized components ensure consistent deployments across environments
The complete source code and deployment instructions are available in the GitHub repository; star it if you find it useful.
Try implementing this solution in your AWS environment and share your feedback on how it performs with your video content. Stay tuned for Part 3, where we’ll dive into building AI agents that can intelligently interact with your video content!
Taking It Further with AI Agents
Now that you have a robust video processing pipeline, the next logical step is to integrate this capability with AI agents for more sophisticated interactions. In the upcoming Part 3 of this series, you'll learn how to turn this containerized video analysis system into a powerful tool for the Strands Agents open source framework.
By creating a custom tool that connects to your video processing API, you can build conversational AI agents that can:
- Analyze video content through natural language queries
- Provide contextual responses based on both visual and audio elements
- Enable complex multi-modal interactions across your video library
- Integrate seamlessly with other business workflows through agent orchestration
This integration opens up possibilities for applications like intelligent video search assistants, content moderation agents, and automated video analysis workflows that respond to natural language instructions.
Thanks,
Eli