# Project 1: AI Chatbot - Well-Architected Production Design

**Document Created**: March 19, 2026
**Author**: Rus Teston
**Status**: Reference Architecture (Not Deployed)

---

## Overview

This document outlines how the AI Chatbot (Cloud Architecture Advisor) would be designed and deployed in a true production environment following the **AWS Well-Architected Framework**. The current implementation is a functional demo; this document describes the enterprise-grade version suitable for internal employee use.

---

## Current Architecture (Demo)

| Component | Current Implementation |
|-----------|----------------------|
| Frontend | S3 static website (HTTP) |
| API | API Gateway (REST) - no auth |
| Compute | Single Lambda function |
| AI Model | Bedrock Nova Lite |
| Auth | None - open to public |
| IaC | Bash scripts |
| Monitoring | CloudWatch Logs only |
| CDN/HTTPS | None |

---

## Production Architecture by Well-Architected Pillar

### 1. Operational Excellence

**Principle**: Run and monitor systems to deliver business value and continually improve processes.

| Area | Production Design | Reasoning |
|------|------------------|-----------|
| **Infrastructure as Code** | AWS SAM (Serverless Application Model) template defining all resources in a single stack | Repeatable deployments, version-controlled, peer-reviewable, rollback capable |
| **CI/CD Pipeline** | CodePipeline triggered by GitHub commits → CodeBuild → SAM deploy | Eliminates manual deployment errors, enables automated testing before production |
| **Structured Logging** | Lambda Powertools for Python with correlation IDs, JSON-formatted logs | Enables log aggregation, filtering, and tracing across requests |
| **Monitoring Dashboard** | CloudWatch dashboard with widgets for: invocation count, error rate, latency (p50/p90/p99), Bedrock token usage | Single pane of glass for operational health |
| **Alerting** | CloudWatch Alarms → SNS notifications for: error rate > 5%, latency p99 > 10s, 5xx responses, monthly cost threshold | Proactive issue detection before users report problems |
| **Runbooks** | Documented procedures for common operational tasks: deployment, rollback, log investigation, scaling | Reduces mean time to recovery (MTTR) |

### 2. Security

**Principle**: Protect information, systems, and assets through risk assessments and mitigation strategies.

| Area | Production Design | Reasoning |
|------|------------------|-----------|
| **Authentication** | Amazon Cognito User Pool with employee email domain restriction | Only authorized employees can access the chatbot; supports MFA |
| **Authorization** | Cognito Authorizer on API Gateway; JWT token validation | Every API request verified against valid session token |
| **CORS Policy** | Locked to specific domain (e.g., `https://chatbot.company.com`) | Prevents cross-origin abuse from unauthorized domains |
| **WAF** | AWS WAF on API Gateway with rate limiting rules (100 requests/min per IP), SQL injection protection, known bad IP blocking | Protects against DDoS, abuse, and injection attacks |
| **IAM Least Privilege** | Lambda role scoped to: specific Bedrock model ARN, CloudWatch Logs for its own log group only | Minimizes blast radius if credentials are compromised |
| **Encryption in Transit** | CloudFront with TLS 1.2+ enforced, HTTPS-only | All data encrypted between client and server |
| **Encryption at Rest** | S3 bucket encryption (SSE-S3), CloudWatch Logs encryption | Protects stored data |
| **Input Validation** | Lambda validates message length (max 2000 chars), sanitizes input before sending to Bedrock | Prevents prompt injection and excessive token consumption |
| **Secrets Management** | No hardcoded values; all configuration via environment variables and SSM Parameter Store | Credentials never in source code |

### 3. Reliability

**Principle**: Ensure a workload performs its intended function correctly and consistently.

| Area | Production Design | Reasoning |
|------|------------------|-----------|
| **Throttling** | API Gateway usage plan: 100 requests/second burst, 50 requests/second steady state | Prevents any single user or bot from overwhelming the system |
| **Dead Letter Queue** | SQS DLQ attached to Lambda for failed async invocations | Failed requests captured for investigation rather than silently lost |
| **Retry Logic** | Bedrock calls wrapped with exponential backoff (3 retries, 1s/2s/4s) | Handles transient Bedrock throttling or timeouts gracefully |
| **Health Check** | API Gateway `/health` endpoint returning Lambda + Bedrock connectivity status | Enables monitoring systems to detect outages |
| **Graceful Degradation** | If Bedrock is unavailable, return cached "service temporarily unavailable" message | Users get a clear message instead of a cryptic error |
| **Multi-AZ** | Lambda runs across multiple AZs by default | Built-in redundancy |

### 4. Performance Efficiency

**Principle**: Use computing resources efficiently to meet system requirements and maintain efficiency as demand changes.

| Area | Production Design | Reasoning |
|------|------------------|-----------|
| **CloudFront CDN** | CloudFront distribution in front of S3 for frontend assets | Sub-100ms load times globally, HTTPS, caching |
| **Lambda Memory** | 256MB (current) with performance testing to right-size | Bedrock API calls are I/O-bound, not CPU-bound; 256MB is likely optimal |
| **API Gateway Caching** | Optional: 5-minute cache on identical requests | Reduces Bedrock costs for repeated questions (trade-off: slightly stale responses) |
| **Lambda Provisioned Concurrency** | Not recommended unless cold start latency is unacceptable | Cold starts add ~1-2s on first request; Bedrock response time (3-8s) dominates total latency |
| **Connection Reuse** | Bedrock client initialized outside handler (already done) | Reuses TCP connections across warm invocations |

### 5. Cost Optimization

**Principle**: Avoid unnecessary costs and understand where money is being spent.

| Area | Production Design | Reasoning |
|------|------------------|-----------|
| **Usage Plans** | API Gateway usage plan with monthly quota (e.g., 10,000 requests/month) | Hard cap prevents runaway Bedrock costs |
| **Cost Alarms** | CloudWatch Billing alarm at $25/month threshold → SNS email alert | Early warning before costs spiral |
| **Bedrock Token Limits** | maxTokens set to 1000 in inference config; input capped at 2000 chars | Controls per-request cost |
| **S3 Lifecycle** | No lifecycle needed (static assets only, minimal storage) | Already cost-optimized |
| **Right-Sized Lambda** | 256MB memory, 30s timeout | Matches workload requirements without over-provisioning |
| **Estimated Monthly Cost** | ~$8-15/month for moderate usage (500 requests/day) | Cognito free tier (50K MAU), API Gateway ($3.50/million), Lambda ($0.20/million), Bedrock (~$5-10 for Nova Lite) |

### 6. Sustainability

**Principle**: Minimize environmental impact of running cloud workloads.

| Area | Production Design | Reasoning |
|------|------------------|-----------|
| **Serverless Architecture** | Lambda + API Gateway + S3 (no idle resources) | Zero compute when not in use; scales to zero automatically |
| **Right-Sized Responses** | System prompt limits response to 300-500 words | Reduces unnecessary token generation and compute |
| **CloudFront Caching** | Reduces origin requests for static assets | Fewer S3 reads, lower data transfer |
| **Region Selection** | us-east-1 (one of AWS's most energy-efficient regions) | Lower carbon footprint |

---

## Production Architecture Components

```
User (Browser)
  → CloudFront (HTTPS, CDN, caching)
    → S3 (static frontend assets, encrypted)

User (Browser)
  → CloudFront
    → WAF (rate limiting, IP filtering, injection protection)
      → API Gateway (REST API)
        → Cognito Authorizer (JWT validation)
          → Lambda (input validation, structured logging)
            → Bedrock Nova Lite (AI inference)

Lambda failures → SQS Dead Letter Queue
CloudWatch Logs → CloudWatch Dashboard → CloudWatch Alarms → SNS → Email
AWS SAM Template → CodePipeline → CodeBuild → Automated Deployment
```

---

## Key Differences: Demo vs Production

| Aspect | Demo (Current) | Production (Proposed) |
|--------|---------------|----------------------|
| Authentication | None | Cognito + JWT |
| HTTPS | No (HTTP S3) | Yes (CloudFront + ACM) |
| Firewall | None | WAF with rate limiting |
| Deployment | Bash scripts | SAM + CI/CD pipeline |
| Monitoring | Basic logs | Dashboard + alarms + SNS |
| Error Handling | Generic catch | DLQ + retries + graceful degradation |
| Input Validation | None | Length + sanitization |
| Cost Controls | None | Usage plans + billing alarms |
| CORS | Open (`*`) | Domain-restricted |
| IaC | None | Full SAM template |

---

## Architecture Diagram

See the production architecture diagram: [Project 1 - Well-Architected Production Design](project-1-well-architected.png)

---

*This document demonstrates understanding of the AWS Well-Architected Framework applied to a real-world serverless AI application. The current demo proves functionality; this design proves production readiness.*