AWS Bedrock Agent: Intelligent PDF to Markdown Converter🔗︎
A production-ready serverless application that leverages AWS Bedrock Agents with Claude AI to convert PDF documents into structured Markdown with intelligent image positioning and bulk processing capabilities.
Project Overview🔗︎
This project demonstrates how to build a sophisticated document processing pipeline using AWS Bedrock Agents, Lambda functions, and Claude AI. The system automatically extracts text and images from PDF files, analyses their spatial relationships, and generates high-quality Markdown output with contextually positioned images.
Using AI to build this AI solution
This project showcases an interesting meta-application of AI: using Claude Code in VS Code to build an AI-powered document processing system. The development process itself demonstrates semi-agentic AI assistance in action.
The Development Partnership: As an Azure-focused developer with limited AWS experience, I relied heavily on Claude Code's autonomous capabilities to navigate the AWS ecosystem. Claude Code acted as both a knowledgeable AWS consultant and a hands-on development partner, executing commands and analysing results in real-time.
Semi-Agentic Development in Action:
- AWS CLI Operations: Claude Code autonomously executed AWS CLI commands to discover available Bedrock models, check account permissions, and configure service access
- Model Discovery: When the initial Claude model IDs failed, Claude Code systematically explored available inference profiles, eventually discovering the correct model path:
us.anthropic.claude-sonnet-4-20250514-v1:0
- Configuration Troubleshooting: Claude Code automatically ran diagnostic commands to resolve authentication issues, check IAM permissions, and validate service configurations
- Code Generation: Real-time creation of Lambda functions, Docker configurations, and deployment scripts based on iterative testing and feedback
- Error Resolution: When faced with runtime errors, Claude Code autonomously analysed CloudWatch logs, identified issues, and proposed code fixes
Platform Choice: AWS vs Azure: The decision to use AWS instead of my familiar Azure platform was driven by AI model availability. At the time of development, Azure's AI services were heavily based on OpenAI's ChatGPT models, whilst AWS Bedrock offered direct access to Anthropic's Claude family - specifically Claude Sonnet 4, which provided superior document analysis capabilities.
Interestingly, Microsoft and OpenAI are evolving their partnership away from the exclusive relationship model, with Microsoft diversifying to include Anthropic's Claude models in Office 365 applications. This shift may significantly change the Azure AI landscape in the future.
The Human-AI Development Flow:
- Problem Definition: I described the goal (PDF to Markdown conversion)
- Architecture Exploration: Claude Code suggested AWS Bedrock Agents and researched the implementation approach
- Hands-on Implementation: Claude Code wrote code, executed AWS commands, and debugged issues autonomously
- Iterative Refinement: Together we refined the solution through multiple deployment and testing cycles
- Production Optimisation: Claude Code implemented bulk processing, error handling, and performance improvements
Version Control Strategy: To maintain clarity during the rapid iterative development process, I implemented a simple but effective version control approach: each Lambda function deployment included a hardcoded version number that Claude Code would automatically increment with every code change. This seemingly simple practice proved invaluable:
- Deployment Verification: Instant confirmation that the correct code version was running in AWS
- Communication Clarity: Easy reference to specific iterations during troubleshooting ("version 0.2.8 had the timeout issue")
- Development Tracking: Clear progression through the evolution from basic PDF processing to full bulk operations
- Rollback Identification: Quick identification of which version to revert to when issues arose
This human-directed, AI-executed versioning approach exemplified the collaborative nature of the development process - strategic oversight combined with autonomous implementation.
The Crucial Role of Context7 MCP: A critical component of this development success was the Context7 MCP (Model Context Protocol) integration. Context7 served as Claude Code's "fact-checker" and up-to-date documentation source, preventing the common AI pitfall of working with outdated information.
How Context7 Kept Development on Track:
- Real-time Documentation: Provided current AWS Bedrock API specifications and model availability
- Version Verification: Ensured PyMuPDF installation commands matched the latest best practices
- Parameter Validation: Confirmed correct AWS CLI syntax and Lambda configuration options
- Dependency Management: Verified compatible Python package versions for Lambda runtime
MCP Configuration: Context7 MCP is configured in `%userprofile%\.claude.json` (not the `.claude` directory).
Key Learning: This project demonstrates how AI assistants can serve as domain experts in unfamiliar technology stacks, effectively acting as both consultant and implementation partner. The semi-agentic nature of Claude Code - autonomously executing commands whilst maintaining human oversight - combined with Context7's real-time documentation access, proved invaluable for cross-platform development.
Production Status
✅ PRODUCTION READY - Version 0.3.2 deployed and operational
- Real PDF text and image extraction using PyMuPDF
- Intelligent image positioning with Claude AI analysis
- Bulk processing with smart skip logic
- Conversational interface with progress indicators
- 15-minute timeout, 1GB memory, comprehensive error handling
Architecture Overview🔗︎
System Components🔗︎
![AWS Architecture Diagram Placeholder] Screenshot needed: AWS Console showing Bedrock Agent with action groups, model configuration, and Lambda function integration
The solution consists of several integrated AWS services:
- Amazon Bedrock Agent: Orchestrates the conversion workflow with conversational interface
- AWS Lambda: Serverless compute for PDF processing (Python 3.12)
- Amazon S3: Storage for input PDFs and output Markdown files
- Amazon Bedrock: Claude AI model access for intelligent content analysis
- Amazon CloudWatch: Logging and monitoring
Data Flow🔗︎
```mermaid
graph TD
    A[User Request] --> B[Bedrock Agent]
    B --> C[Lambda Function]
    C --> D[S3 Input Bucket]
    D --> E[PyMuPDF Processing]
    E --> F[Claude AI Analysis]
    F --> G[Markdown Generation]
    G --> H[S3 Output Bucket]
    H --> I[Optional Zip Creation]
    I --> J[User Response]
```
Key Features🔗︎
🔄 Intelligent PDF Processing🔗︎
The system uses PyMuPDF for real PDF text and image extraction, going beyond simple text parsing:
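A minimal sketch of the PyMuPDF extraction pattern described above, assuming PyMuPDF 1.26.x; the function and variable names are illustrative rather than the project's exact code.

```python
# Illustrative sketch only: assumes the PyMuPDF (fitz) 1.26.x layer described
# later in this article. Names are hypothetical, not the production code.
import fitz  # PyMuPDF

def extract_pdf_content(pdf_path: str) -> dict:
    """Extract text and embedded raster images from a PDF, page by page."""
    doc = fitz.open(pdf_path)
    pages, images = [], []
    for page_number, page in enumerate(doc, start=1):
        # Plain text for this page, in reading order
        pages.append({"page": page_number, "text": page.get_text("text")})
        # Embedded raster images (vector graphics are not extracted)
        for image_index, img in enumerate(page.get_images(full=True), start=1):
            xref = img[0]
            info = doc.extract_image(xref)
            images.append({
                "page": page_number,
                "index": image_index,
                "ext": info["ext"],     # e.g. "png" or "jpeg"
                "data": info["image"],  # raw image bytes
            })
    doc.close()
    return {"pages": pages, "images": images}
```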
🧠 Claude AI Integration🔗︎
The extracted content is sent to Claude Sonnet 4 for intelligent analysis:
Model Configuration
- Model: Claude Sonnet 4 (inference profile)
- API Version: `bedrock-2023-05-31`
- Processing Time: ~2-5 minutes per PDF
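A minimal sketch of the Bedrock call, assuming the boto3 `bedrock-runtime` client and the Anthropic messages format; the model ID and API version are those quoted above, while the prompt wording and function name are illustrative.

```python
# Hedged sketch, not the project's exact code: prompt text and helper name are
# assumptions; the model ID and anthropic_version come from the article.
import json
import boto3

bedrock_runtime = boto3.client("bedrock-runtime", region_name="us-east-1")

def analyse_with_claude(extracted_text: str, image_names: list) -> str:
    """Ask Claude Sonnet 4 (via Bedrock) to turn extracted PDF content into Markdown."""
    body = {
        "anthropic_version": "bedrock-2023-05-31",
        "max_tokens": 8192,
        "messages": [{
            "role": "user",
            "content": (
                "Convert the following PDF content to well-structured Markdown, "
                "placing these images where they best fit the text: "
                f"{', '.join(image_names)}\n\n{extracted_text}"
            ),
        }],
    }
    response = bedrock_runtime.invoke_model(
        modelId="us.anthropic.claude-sonnet-4-20250514-v1:0",
        body=json.dumps(body),
    )
    payload = json.loads(response["body"].read())
    return payload["content"][0]["text"]
```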
📦 Bulk Processing Capabilities🔗︎
Smart Processing Features
- Auto-discovery: Finds all PDFs in specified S3 paths
- Skip logic: Avoids reprocessing existing files
- Batch limits: Max 10 files, 100MB total, 200 pages
- Error resilience: Continues processing if individual files fail
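A minimal sketch of the auto-discovery and skip logic described above, assuming the bucket layout from the usage examples; the limits and helper names mirror the article but are not the exact production code.

```python
# Hypothetical sketch of PDF discovery with skip logic; bucket names, limits
# and function names are assumptions based on the article.
import boto3

s3 = boto3.client("s3")
MAX_FILES = 10  # batch limit described in the article

def discover_pdfs(input_bucket: str, prefix: str, output_bucket: str) -> list:
    """List PDFs under the prefix, skipping ones that already have output."""
    pdfs = []
    paginator = s3.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=input_bucket, Prefix=prefix):
        for obj in page.get("Contents", []):
            key = obj["Key"]
            if not key.lower().endswith(".pdf"):
                continue
            # Skip logic: an output folder named after the PDF already exists
            stem = key.rsplit("/", 1)[-1][:-4]
            existing = s3.list_objects_v2(
                Bucket=output_bucket, Prefix=f"{stem}/", MaxKeys=1
            )
            if existing.get("KeyCount", 0) > 0:
                continue
            pdfs.append(key)
            if len(pdfs) >= MAX_FILES:
                return pdfs
    return pdfs
```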
💬 Conversational Interface🔗︎
The Bedrock Agent provides an intuitive chat interface:
Example Conversation:
```text
User:  Process all PDFs in s3://pdf-input-bucket/
Agent: Found 3 PDF(s) to process. Would you like the output zipped?
       1. Yes - Create zip file
       2. No - Keep individual folders
User:  Yes
Agent: ✅ Processed 3 PDFs. Zip created: s3://pdf-output-bucket/PDF2MD-bulk-20250121-143022.zip
```
Implementation Details🔗︎
Lambda Function Architecture🔗︎
![Lambda Configuration Screenshot Placeholder] Screenshot needed: AWS Lambda console showing function configuration with Runtime (Python 3.12), Memory (1024 MB), Timeout (15 min), and attached layers
Function Configuration:
- Runtime: Python 3.12
- Memory: 1GB
- Timeout: 15 minutes (900 seconds)
- Layer: Custom PyMuPDF layer (50MB)
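A hedged sketch of the Lambda entry point, assuming the standard Bedrock Agent action-group event and response shape; the parameter names, the placement of the version constant, and the placeholder result are illustrative rather than the project's actual handler.

```python
# Hedged sketch of the handler; parameter names and the placeholder result are
# assumptions. The event/response envelope follows the standard Bedrock Agent
# action-group contract.
import logging

logger = logging.getLogger()
logger.setLevel(logging.INFO)

VERSION = "0.3.2"  # hardcoded version used for deployment verification

def lambda_handler(event, context):
    logger.info("PDF2MD Lambda version %s invoked", VERSION)

    # Bedrock Agent action groups pass parameters as a list of name/value pairs
    params = {p["name"]: p["value"] for p in event.get("parameters", [])}
    input_path = params.get("input_path", "")
    create_zip = params.get("create_zip", "no").lower() == "yes"

    # Real pipeline (extract -> Claude -> S3 -> optional zip) omitted here
    result_text = f"Processed PDFs under {input_path} (zip requested: {create_zip})"

    # Response envelope the Bedrock Agent expects back from an action group
    return {
        "messageVersion": "1.0",
        "response": {
            "actionGroup": event.get("actionGroup"),
            "function": event.get("function"),
            "functionResponse": {
                "responseBody": {"TEXT": {"body": result_text}}
            },
        },
    }
```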
PyMuPDF Integration🔗︎
The system uses a custom Lambda layer for PyMuPDF integration:
Lambda Layer Requirements
- Size: 50MB PyMuPDF 1.26.4 compiled for AWS Lambda
- Paths: `/opt/python/` and `/opt/python/lib/python3.12/site-packages/`
- Build: Docker-based compilation using AWS Lambda Python 3.12 base image
Creating the PyMuPDF Lambda Layer
PyMuPDF requires native libraries that must be compiled specifically for the AWS Lambda runtime environment. The path to the final Docker-based solution involved several failed attempts and important lessons about Lambda's constraints.
Initial Attempts and Why They Failed:
Before settling on the Docker approach, Claude Code first attempted to find pre-built PyMuPDF packages compatible with AWS Lambda:
- PyPI Wheel Search: Searched for existing wheels (1) compiled for Linux x86_64 that might work in Lambda's Amazon Linux environment
- AWS Lambda Layers: Looked for community-contributed layers containing PyMuPDF
- Conda Packages: Investigated conda-forge packages as an alternative source
The Size Problem: AWS Lambda has strict size limits that made PyMuPDF particularly challenging:
- Layer Limit: 250MB uncompressed (50MB compressed)
- PyMuPDF Dependencies: Includes large graphics libraries (MuPDF, FreeType, OpenJPEG)
- Architecture Mismatch: Windows/Mac wheels wouldn't work on Lambda's Linux environment
- Wheel Bloat: Standard PyMuPDF wheels often include unnecessary components for our PDF-only use case
Why Docker Became Necessary: After the initial approaches failed, Claude Code determined that building from source in Lambda's exact runtime environment was the only reliable solution:
- Environment Matching: Docker uses the official AWS Lambda Python 3.12 base image
- Dependency Control: Can exclude unnecessary components during compilation
- Size Optimisation: Targeted installation to specific paths reduces bloat
- Reproducible Builds: Ensures consistent results across different development machines
Here's the complete build process with the actual scripts used:
Build Files Overview:
- `requirements.txt` - Python dependencies
- `build-layer.dockerfile` - Docker configuration for Lambda environment
- `build-layer.bat` - Windows batch script for automated building
- `create-layer.py` - Python script alternative for cross-platform building
`requirements.txt`:

```text
PyMuPDF==1.26.4
```
build-layer.dockerfile
build-layer.bat (Windows)
create-layer.py (Cross-platform Python)
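A minimal, hedged sketch of a cross-platform build script of this kind (not the project's actual `create-layer.py`), assuming Docker is installed and that `build-layer.dockerfile` installs PyMuPDF into `/opt/python` inside the image:

```python
# Illustrative sketch only; image, container and file names are assumptions
# that mirror the manual Docker commands shown below.
import shutil
import subprocess
from pathlib import Path

IMAGE = "pymupdf-layer-builder"
CONTAINER = "pymupdf-layer-temp"

def run(cmd: list) -> None:
    print("+", " ".join(cmd))
    subprocess.run(cmd, check=True)

def main() -> None:
    # 1. Build the image from the AWS Lambda Python 3.12 base
    run(["docker", "build", "-t", IMAGE, "-f", "build-layer.dockerfile", "."])

    # 2. Copy the compiled /opt/python tree out of a temporary container
    run(["docker", "create", "--name", CONTAINER, IMAGE])
    try:
        run(["docker", "cp", f"{CONTAINER}:/opt/python", "./"])
    finally:
        run(["docker", "rm", CONTAINER])

    # 3. Zip it up as pymupdf-layer.zip, ready to upload as a Lambda layer
    shutil.make_archive("pymupdf-layer", "zip", root_dir=".", base_dir="python")
    print("Created", Path("pymupdf-layer.zip").resolve())

if __name__ == "__main__":
    main()
```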
Build Process:
1. Using Windows Batch Script:

    ```bash
    # Simply run the batch file
    ./build-layer.bat
    ```

2. Using Python Script (Cross-platform):

    ```bash
    # Run the Python script
    python create-layer.py
    ```

3. Manual Docker Commands:

    ```bash
    # Build the Docker image
    docker build -t pymupdf-layer-builder -f build-layer.dockerfile .

    # Extract layer files
    docker create --name temp-container pymupdf-layer-builder
    docker cp temp-container:/opt/python ./
    docker rm temp-container

    # Create ZIP file (Linux/Mac)
    zip -r pymupdf-layer.zip python/

    # Create ZIP file (Windows PowerShell)
    Compress-Archive -Path python -DestinationPath pymupdf-layer.zip -Force
    ```
Upload to AWS Lambda:
The resulting `pymupdf-layer.zip` file (approximately 50MB) is uploaded to AWS Lambda:

1. Navigate to AWS Lambda → Layers → Create layer
2. Upload the `pymupdf-layer.zip` file
3. Set compatible runtimes to Python 3.12
4. Attach the layer to the PDF processing Lambda function
Why This Approach?
- Native Compatibility: Uses AWS Lambda's exact runtime environment
- Automated Process: Scripts handle the complex Docker operations
- Size Optimisation: Dual-path installation ensures compatibility while staying under limits
- Reproducible: Version-locked dependencies ensure consistent builds
(1) Wheels are pre-compiled Python packages that include binary dependencies. They're faster to install than source distributions because they don't require compilation, but they must match the target platform's architecture and operating system exactly.
Image Processing Pipeline🔗︎
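A hedged sketch of the image-handling step, assuming the output naming convention shown in the Output Structure section below; the helper and bucket names are illustrative rather than the project's exact pipeline code.

```python
# Illustrative sketch only; mirrors the naming convention
# document-name_image_001_page1.png used in the output structure.
import boto3

s3 = boto3.client("s3")

def save_images(images: list, doc_name: str, output_bucket: str) -> list:
    """Upload extracted images to S3 and return their Markdown references."""
    references = []
    for i, image in enumerate(images, start=1):
        filename = f"{doc_name}_image_{i:03d}_page{image['page']}.{image['ext']}"
        key = f"{doc_name}/images/{filename}"
        s3.put_object(Bucket=output_bucket, Key=key, Body=image["data"])
        # Relative link inserted into the generated Markdown file
        references.append(f"![Image {i} (page {image['page']})](images/{filename})")
    return references
```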
Setup and Configuration🔗︎
Prerequisites🔗︎
AWS Requirements
- AWS Account with appropriate permissions
- Bedrock service access in us-east-1 region
- Claude model access (requires separate request)
- S3 buckets for input and output
Why us-east-1?
The us-east-1 region is chosen because it has the best availability of newer Claude models and Bedrock features compared to other regions. While this may introduce slightly higher latency for users outside North America, the access to latest AI capabilities outweighs the marginal performance difference for this use case.
1. S3 Bucket Configuration🔗︎
![S3 Bucket Configuration Screenshot Placeholder] Screenshot needed: S3 console showing both input and output buckets with bucket policies, permissions tab, and access control lists
Create two S3 buckets:
- Input bucket: `pdf-input-bucket`
- Output bucket: `pdf-output-bucket`

Grant the Lambda execution role access to both buckets with a bucket policy along these lines:
```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "AWS": "arn:aws:iam::YOUR-ACCOUNT-ID:role/lambda-execution-role"
      },
      "Action": [
        "s3:GetObject",
        "s3:PutObject",
        "s3:DeleteObject"
      ],
      "Resource": [
        "arn:aws:s3:::YOUR-INPUT-BUCKET/*",
        "arn:aws:s3:::YOUR-OUTPUT-BUCKET/*"
      ]
    }
  ]
}
```
2. Lambda Function Deployment🔗︎
```bash
# Package and deploy Lambda function
cd aws/lambda/pdf-processing-function/
zip -r function.zip .
aws lambda update-function-code \
    --function-name pdf-processing-function \
    --zip-file fileb://function.zip
```
3. Bedrock Agent Configuration🔗︎
![Bedrock Agent Configuration Screenshot Placeholder] Screenshot needed: Bedrock console showing Agent details page with model selection (Claude Sonnet 4), action groups configured, and instructions panel
Agent Configuration:
- Model: Claude Sonnet 4
- Instructions: Custom instructions for PDF processing
- Action Groups: Lambda function integration
- Knowledge Base: Optional for enhanced context
Agent Instructions:

```text
You are a PDF to Markdown conversion specialist. Your role is to:
1. Accept PDF file paths from users
2. Process PDFs through the Lambda function
3. Provide conversational feedback on processing status
4. Offer zip options for bulk processing
5. Handle errors gracefully with helpful messages
```
4. IAM Permissions🔗︎
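A hedged sketch of the kind of inline policy the Lambda execution role needs (S3 access, Bedrock model invocation, CloudWatch logging), applied here with boto3; the role, policy, and bucket names are placeholders rather than the project's actual configuration.

```python
# Illustrative sketch only; names are placeholders and the statements reflect
# the permissions described in the article, not the exact production policy.
import json
import boto3

iam = boto3.client("iam")

policy = {
    "Version": "2012-10-17",
    "Statement": [
        {   # Read input PDFs and write Markdown, images, and zips
            "Effect": "Allow",
            "Action": ["s3:GetObject", "s3:PutObject", "s3:ListBucket"],
            "Resource": [
                "arn:aws:s3:::pdf-input-bucket", "arn:aws:s3:::pdf-input-bucket/*",
                "arn:aws:s3:::pdf-output-bucket", "arn:aws:s3:::pdf-output-bucket/*",
            ],
        },
        {   # Call Claude Sonnet 4 via Bedrock
            "Effect": "Allow",
            "Action": ["bedrock:InvokeModel"],
            "Resource": "*",
        },
        {   # CloudWatch logging
            "Effect": "Allow",
            "Action": ["logs:CreateLogGroup", "logs:CreateLogStream", "logs:PutLogEvents"],
            "Resource": "*",
        },
    ],
}

iam.put_role_policy(
    RoleName="lambda-execution-role",       # role name from the bucket policy example
    PolicyName="pdf2md-processing-policy",  # placeholder name
    PolicyDocument=json.dumps(policy),
)
```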
Usage Examples🔗︎
Single PDF Processing🔗︎
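As an illustrative example (the file name is hypothetical): upload a single document with `aws s3 cp report.pdf s3://pdf-input-bucket/`, then ask the agent to "Process s3://pdf-input-bucket/report.pdf and save the Markdown to pdf-output-bucket".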
Bulk Processing🔗︎
```bash
# Upload multiple PDFs
aws s3 sync ./documents/ s3://pdf-input-bucket/

# Process all PDFs
"Process all PDFs in s3://pdf-input-bucket/ and save to pdf-output-bucket"
```
Output Structure🔗︎
The system generates organised output:
```text
pdf-output-bucket/
├── document-name/
│   ├── document-name.md          # Main Markdown file
│   └── images/                   # Extracted images
│       ├── document-name_image_001_page1.png
│       ├── document-name_image_002_page2.png
│       └── ...
└── PDF2MD-bulk-20250121-143022.zip  # Optional bulk zip
```
Performance and Limitations🔗︎
Performance Metrics🔗︎
Production Performance
- Processing Time: 2-5 minutes per PDF
- Throughput: Up to 10 PDFs per batch
- Size Limits: 100MB total batch size
- Page Limits: 200 pages estimated per batch
- Memory: 1GB Lambda allocation
- Timeout: 15 minutes maximum
Current Limitations🔗︎
Known Limitations
- Complex Layouts: Multi-column layouts simplified to linear flow
- Table Processing: Tables converted to simple Markdown format
- Font Information: Font styles not preserved in output
- Vector Graphics: Only raster images extracted
- File Size: Large PDFs may timeout (>50MB individual files)
Monitoring and Troubleshooting🔗︎
CloudWatch Logging🔗︎
![CloudWatch Logs Screenshot Placeholder] Screenshot needed: CloudWatch Logs console showing log stream with emoji progress indicators (🔍 Step 1/5, 📥 Step 2/5, etc.) and processing details
The system provides detailed logging with emoji indicators:
logger.info("🔍 Step 1/5: Discovering PDFs to process...")
logger.info("📥 Step 2/5: Downloading and extracting PDF content...")
logger.info("🧠 Step 3/5: Processing content with Claude AI...")
logger.info("💾 Step 4/5: Saving Markdown and images to S3...")
logger.info("📦 Step 5/5: Creating zip file (if requested)...")
Common Issues and Solutions🔗︎
Troubleshooting Guide
- Issue: Model access denied. Solution: Ensure Claude model access is granted in the Bedrock console.
- Issue: Lambda timeout. Solution: Reduce batch size or increase the timeout limit.
- Issue: Image extraction fails. Solution: Check PDF format compatibility with PyMuPDF.
- Issue: Memory errors. Solution: Process smaller batches or increase Lambda memory.
Future Enhancements🔗︎
Planned Web Interface🔗︎
Roadmap: Web Interface Development
The next major enhancement involves creating a modern web interface to replace the Bedrock Agent chat interface:
- Modern UI: Drag & drop upload with progress indicators
- Real-time Progress: WebSocket updates for processing status
- Batch Management: Visual controls for bulk operations
- Authentication: Entra ID OIDC integration
- Mobile Support: Responsive design for all devices
Technical Architecture for Web Interface🔗︎
```mermaid
graph TD
    A[React/Vue.js Frontend] --> B[API Gateway]
    B --> C[Lambda Functions]
    C --> D[Existing PDF Processor]
    C --> E[WebSocket API]
    E --> F[Real-time Progress]
    A --> G[Entra ID OIDC]
    G --> H[JWT Validation]
```
Cost Analysis🔗︎
AWS Service Costs🔗︎
Estimated Monthly Costs (100 PDFs/month)
- Lambda: ~$5-10 (execution time and memory)
- Bedrock: ~$20-30 (Claude model usage)
- S3: ~$1-2 (storage and requests)
- CloudWatch: ~$1-2 (logging)
- Total: ~$27-44/month
Cost Optimisation Tips🔗︎
- Use S3 lifecycle policies for old outputs
- Implement CloudWatch log retention policies
- Monitor Bedrock token usage
- Consider Reserved Capacity for high volume
Security Considerations🔗︎
Data Protection🔗︎
Security Best Practices
- No Hardcoded Credentials: Uses IAM roles exclusively
- Encrypted Storage: S3 buckets use server-side encryption
- Access Logging: All API calls logged to CloudTrail
- Network Security: Lambda runs in VPC if needed
- Content Validation: PDF malware scanning recommended
Access Control🔗︎
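As a hedged illustration of the practices listed above, default server-side encryption and a public-access block can be applied to both buckets with boto3; this is a sketch, not the project's actual configuration.

```python
# Illustrative sketch only; bucket names are the examples used earlier in the
# article, and the settings shown are common defaults rather than the
# project's confirmed configuration.
import boto3

s3 = boto3.client("s3")

for bucket in ("pdf-input-bucket", "pdf-output-bucket"):
    # Default server-side encryption (SSE-S3)
    s3.put_bucket_encryption(
        Bucket=bucket,
        ServerSideEncryptionConfiguration={
            "Rules": [{"ApplyServerSideEncryptionByDefault": {"SSEAlgorithm": "AES256"}}]
        },
    )
    # Block all forms of public access
    s3.put_public_access_block(
        Bucket=bucket,
        PublicAccessBlockConfiguration={
            "BlockPublicAcls": True,
            "IgnorePublicAcls": True,
            "BlockPublicPolicy": True,
            "RestrictPublicBuckets": True,
        },
    )
```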
Conclusion🔗︎
This AWS Bedrock Agent project demonstrates the power of combining serverless computing with AI services to create intelligent document processing solutions. The system successfully processes PDFs with real text extraction, intelligent image positioning, and bulk processing capabilities, all wrapped in a conversational interface.
The production-ready implementation provides a solid foundation for enterprise document processing workflows, with clear paths for enhancement through web interfaces and additional AI capabilities.
Next Steps
- Deploy the web interface for improved user experience
- Add OCR capabilities for scanned documents
- Implement table recognition for better data extraction
- Add support for additional formats (Word, PowerPoint)
- Integrate with enterprise systems via APIs
This project showcases the integration of AWS Bedrock Agents, Lambda functions, and Claude AI to create a sophisticated document processing pipeline that balances automation with intelligent content analysis.
Meta-Documentation Note
In a delightful case of AI recursion, this entire article was written by Claude Code using the project's own `CLAUDE.md` context file as source material. With just a little human guidance (and the occasional "fix the spelling mistakes" reminder), Claude Code transformed a technical development log into comprehensive project documentation.
It's rather fitting that an AI assistant wrote the documentation for an AI-powered document processing system - we've essentially created an AI that helps write about AI that processes documents! 🤖📄✨