# Go-Live Checklist

## Overview

This checklist ensures a smooth transition from staging to production. Work through the phases in order: One Week Before Go-Live, Go-Live Day, Launch Period Monitoring, and Post-24 Hour Validation, then continue with the ongoing monitoring checks.

---

## Phase 1: One Week Before Go-Live

Timeline: T-7 days

### Pre-Deployment Verification

- [ ] **E2E Tests Passed (100%)**
  - All workflow tests successful: `bash tests/curl-test-collection.sh`
  - No critical bugs or failures
  - Test results documented

- [ ] **Staging Environment Verified**
  - Staging deployment mirrors the production configuration
  - Run full load test (simulate 100+ concurrent tickets)
  - Verify integration with TEST Freescout account
  - Verify integration with TEST Baramundi account
  - All workflows processing correctly in staging

- [ ] **Production Database Setup**
  - PostgreSQL audit schema initialized
  - Backup strategy configured and tested
  - Database performance baseline recorded
  - Disk space verified (minimum 20GB available)

- [ ] **API Credentials Verified**
  - Freescout API key tested and active
  - Freescout custom fields created and working
  - Baramundi API key tested and active
  - n8n encryption key generated and secure
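
The disk space item above (minimum 20GB available) can be checked with a small helper rather than by eye. This is a minimal sketch: the function name `has_min_free_gb` and the example path are assumptions for illustration, not part of the existing tooling.

```bash
# Sketch: verify that a filesystem has at least a minimum amount of free space.
# The function name and the example path are illustrative assumptions.

# Usage: has_min_free_gb <path> <min_gb>
# Succeeds (exit 0) when <path> has at least <min_gb> GB available.
has_min_free_gb() {
  local path="$1"
  local min_gb="$2"
  local avail_kb
  # df -Pk prints POSIX-format output in 1024-byte blocks; column 4 is "Available".
  avail_kb=$(df -Pk "$path" | awk 'NR == 2 { print $4 }')
  [ -n "$avail_kb" ] && [ "$avail_kb" -ge $(( min_gb * 1024 * 1024 )) ]
}

# Example (the path is a placeholder for the PostgreSQL data volume):
# has_min_free_gb /var/lib/postgresql/data 20 || echo "Less than 20 GB free" >&2
```

Run it against the volume that actually holds the database files; checking the root filesystem can give a misleading result when data lives on a separate mount.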

### Team Readiness

- [ ] **Team Training Completed**
  - Operations team trained on:
    - System architecture overview
    - Deployment and rollback procedures
    - Monitoring dashboard usage
    - Alert response procedures
    - Escalation paths
  - Support team trained on:
    - Workflow functionality overview
    - Expected behavior and timing
    - How to verify system health
    - When to escalate issues

- [ ] **Documentation Review**
  - All team members reviewed DEPLOYMENT.md
  - All team members reviewed MONITORING.md
  - Runbooks reviewed and acknowledged
  - Contact list updated (on-call schedule)

- [ ] **Backup Strategy Finalized**
  - Daily backup schedule defined
  - Backup retention policy set (7 days minimum)
  - Backup restore procedure tested
  - Backup storage verified (separate location from production)
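
The retention policy above (7 days minimum) implies that older backups get pruned at some point. The following is a hedged sketch of how expired files could be listed before deletion; it assumes backups are plain `.sql` files in a single directory, and `expired_backups` is a hypothetical helper name.

```bash
# Sketch: list backup files older than the retention window so they can be pruned.
# Assumes backups are plain files ending in .sql inside a single directory.

# Usage: expired_backups <backup_dir> <retention_days>
expired_backups() {
  local dir="$1"
  local days="$2"
  # -mtime +N matches files last modified more than N full days ago.
  find "$dir" -maxdepth 1 -type f -name '*.sql' -mtime +"$days"
}

# Review the list first, then delete explicitly, for example:
# expired_backups backups 7 | while IFS= read -r f; do rm -- "$f"; done
```

Listing and deleting are kept separate on purpose, so a wrong directory or retention value shows up in the output before anything is removed.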

### Risk Mitigation

- [ ] **Rollback Plan Confirmed**
  - Rollback procedures documented
  - Rollback tested in staging environment
  - Estimated rollback time: < 30 minutes
  - All team members trained on rollback

- [ ] **Communication Plan Ready**
  - Stakeholder notification list prepared
  - Status page update process defined
  - Internal update frequency established (30-minute intervals initially)
  - Escalation contacts verified

- [ ] **Monitoring & Alerting**
  - All monitoring dashboards configured
  - Alert recipients confirmed
  - Alert thresholds set and validated
  - On-call rotation established

---

## Phase 2: Go-Live Day

Timeline: T-0 (Launch day)

### Pre-Launch Checks (T-2 hours)

- [ ] **Final System Status**
  - All Docker services running and healthy
  - `docker-compose ps` output verified
  - All services show "Up (healthy)"
  - No services in "Restarting" state

- [ ] **Service Health Verification**
  - n8n health check: `curl -f http://localhost:5678/healthz`
  - PostgreSQL connection: `docker-compose exec postgres pg_isready`
  - Milvus connectivity: Vector DB responding
  - External integrations reachable (Freescout, Baramundi)

- [ ] **Database Integrity**
  - Audit schema verified: `SELECT COUNT(*) FROM audit.workflows;`
  - No corruption or errors in logs
  - Backup created and verified: `ls -lh backups/`
  - Backup restore tested

- [ ] **n8n Workflows Status**
  - All 3 workflows imported successfully
  - Workflow A (Mail Processing): Ready
  - Workflow B (Approval Execution): Ready
  - Workflow C (KB Update): Ready
  - All workflows set to Inactive (will activate after final check)

- [ ] **Monitoring System Active**
  - Monitoring dashboard accessible
  - All metric collectors running
  - Alert system armed and tested
  - Log aggregation working (docker-compose logs verified)

- [ ] **Final Pre-Launch Meeting**
  - All team members present and ready
  - Roles and responsibilities confirmed:
    - Platform Lead: Overall coordination
    - n8n Administrator: Workflow management
    - Database Administrator: Database monitoring
    - System Administrator: Infrastructure monitoring
    - Support Lead: User support readiness
  - Communication channels verified (Slack, phone, etc.)
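
The "Final System Status" check above relies on reading `docker-compose ps` output by eye. As a sketch, the same check can be filtered so that only problem services remain. It assumes every service defines a healthcheck and that healthy services include the text `(healthy)` in their status; `unhealthy_services` is a hypothetical helper name.

```bash
# Sketch: flag services whose `docker-compose ps` line does not report "(healthy)".
# Assumes every service defines a healthcheck; header and separator lines are skipped.

# Reads `docker-compose ps` output on stdin and prints the lines that are not healthy.
unhealthy_services() {
  awk '
    /^[ \t]*(NAME|Name)[ \t]/   { next }   # header row
    /^-+$/ || /^[ \t]*$/        { next }   # separator or blank line
    !/\(healthy\)/              { print }  # anything not reporting healthy
  '
}

# Usage: prints a warning only when at least one service is flagged.
# docker-compose ps | unhealthy_services | grep -q . && echo "Unhealthy services found" >&2
```

A service without a healthcheck never reports `(healthy)`, so it would always be flagged by this filter.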

### Launch Window (T-0 hours)

- [ ] **Final Backup (T-15 minutes)**
  - Backup created immediately before activation
  - Backup file verified and tested
  - Backup location: `backups/pre-golive-backup-$(date +%Y%m%d-%H%M%S).sql`

- [ ] **Activate Workflows (T-0 minutes)**
  - n8n Dashboard accessed
  - Workflow A (Mail Processing) activated:
    - Toggle "Active" switch ON
    - Verify activation confirmed in UI
    - Check logs: `docker-compose logs -f n8n | grep "Workflow A"`
  - Workflow B (Approval Execution) activated
  - Workflow C (KB Update) activated
  - All three workflows showing "Active" status

- [ ] **Launch Announcement**
  - Internal team notified: "System is LIVE"
  - Stakeholders notified of go-live
  - Status page updated: "System operational"
  - Time of launch recorded: __________

- [ ] **Confirm System Accepting Requests**
  - Send test email to Freescout inbox
  - Verify ticket created in Freescout
  - Verify n8n workflow triggered (check logs)
  - Verify workflow execution started

---

## Phase 3: Launch Period Monitoring (First 24 Hours)

Timeline: T+0 to T+24 hours

### Continuous Monitoring (Every 15 minutes)

- [ ] **n8n Workflow Execution**
  - Command: `docker-compose logs --tail=50 n8n`
  - Check for:
    - No error messages
    - Workflows executing successfully
    - No hung or stuck executions
  - Log location: `/d/n8n-compose/logs/n8n.log`

- [ ] **Freescout Integration**
  - New tickets arriving in system
  - Custom fields populated correctly
  - No integration errors in Freescout logs
  - Ticket processing speed acceptable

- [ ] **Baramundi Job Queue**
  - Check job queue status
  - Verify jobs accepted from n8n
  - Monitor job completion rate
  - Check for failed jobs

- [ ] **Alert System**
  - All critical alerts functioning
  - No false positive alerts
  - Escalation procedures working
  - On-call team responsive

- [ ] **Database Performance**
  - Query performance acceptable
  - No locks or deadlocks
  - Disk space usage normal
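  - Command: `docker-compose exec postgres psql -U n8n_user -d n8n_production -c "SELECT calls, mean_exec_time, query FROM pg_stat_statements ORDER BY mean_exec_time DESC LIMIT 10;"` (requires the `pg_stat_statements` extension; the column is named `mean_time` before PostgreSQL 13)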
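
The recurring log check can be reduced to a single number per interval. The sketch below counts lines that look like errors. The match patterns are an assumption about how errors appear in the n8n logs and should be tuned to the real log format; `count_error_lines` is a hypothetical helper name.

```bash
# Sketch: count error-level lines in a chunk of n8n log output.
# The patterns are an assumption about the log format; tune them as needed.

# Reads log lines on stdin and prints how many look like errors.
count_error_lines() {
  awk 'tolower($0) ~ /error|fatal|timed out/ { n++ } END { print n + 0 }'
}

# Usage for the 15-minute check (bounded output, so the command returns):
# docker-compose logs --tail=200 n8n | count_error_lines
```

Recording this count at each check makes a sudden increase easier to spot than scanning raw log output.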

### Hourly System Status Report (First 6 hours)

Document every hour:

**Hour 1 (T+1h)**
- Total tickets processed: _____
- Total workflows executed: _____
- Failed executions: _____
- System health: [ ] Green [ ] Yellow [ ] Red
- Issues encountered: _____

**Hour 2 (T+2h)**
- Total tickets processed: _____
- Total workflows executed: _____
- Failed executions: _____
- System health: [ ] Green [ ] Yellow [ ] Red
- Issues encountered: _____

**Hour 3-6 (T+3h to T+6h)**
- Repeat above for each hour
- Escalate any issues immediately
- Document all changes or interventions

### Functional Validation (T+2 hours and T+12 hours)

**After 2 hours:**
- [ ] **AI Suggestions Displayed**
  - Sample processed tickets show AI suggestions
  - Suggestion accuracy acceptable
  - Custom field updated with ai_suggestion
  - Performance acceptable (processing time < 5 seconds)

- [ ] **Approval Workflow Operating**
  - HIGH priority tickets flagged for approval
  - Approval custom field populated
  - Notifications sent to approvers
  - Approvals received and reflected in system

- [ ] **Knowledge Base Updates**
  - KB articles being created/updated
  - Vector embeddings generated (Milvus)
  - PostgreSQL KB table growing
  - Query: `SELECT COUNT(*) FROM audit.kb_articles;`

**After 12 hours (overnight validation):**
- [ ] **Validate Overnight Processing**
  - All workflows executed correctly overnight
  - No race conditions or deadlocks occurred
  - Database backups completed successfully
  - All alerts functioned as expected

### Critical Metrics (Monitor Continuously)

```bash
# Count recent n8n executions (up to 100); the public API returns results under .data
curl -s -H "X-N8N-API-KEY: $N8N_API_KEY" \
  "http://localhost:5678/api/v1/executions?limit=100" | jq '.data | length'

# Check database growth
docker exec n8n-postgres psql -U n8n_user -d n8n_production -c \
  "SELECT schemaname, tablename, pg_size_pretty(pg_total_relation_size(schemaname||'.'||tablename))
   FROM pg_tables ORDER BY pg_total_relation_size(schemaname||'.'||tablename) DESC LIMIT 10;"

# Monitor CPU/Memory
docker stats --no-stream
```

### Incident Response (If Issues Occur)

**Critical Issue (System Down):**
1. Immediately notify team lead
2. Assess severity and scope
3. Execute rollback if necessary (see DEPLOYMENT.md)
4. Document incident details
5. Begin root cause analysis

**Performance Degradation:**
1. Check system resources: `docker stats`
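2. Check for blocked locks: `docker-compose exec postgres psql -U n8n_user -d n8n_production -c "SELECT pid, locktype, mode FROM pg_locks WHERE NOT granted;"` (deadlocks also appear in the logs: `docker-compose logs postgres | grep -i deadlock`)
3. Scale resources if needed: `docker-compose up -d --scale n8n=2` (a fixed host port mapping prevents a second replica from starting, and n8n generally needs queue mode to run multiple instances safely)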
4. Monitor improvement

**Integration Failures:**
1. Verify API credentials still valid
2. Check external service status
3. Review integration logs
4. Test connectivity manually
5. Retry or escalate
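
For the manual connectivity test in the integration steps above, the HTTP status code already narrows down the cause. This is a minimal sketch: the mapping from status to next step is a suggested reading, not a documented policy, and `http_status`, `classify_status`, and `SERVICE_URL` are hypothetical names.

```bash
# Sketch: probe an integration endpoint and map the HTTP status to a next step.
# The status-to-action mapping is a suggested reading, not a documented policy.

# Prints the HTTP status code for a request, or 000 if it failed or timed out.
# Extra curl arguments (for example -H with an API key header) are passed through.
http_status() {
  curl -s -o /dev/null -w '%{http_code}' --max-time 10 "$@"
}

# Maps a status code to a short diagnosis.
classify_status() {
  case "$1" in
    2??)      echo "reachable" ;;
    401|403)  echo "check credentials" ;;
    5??)      echo "external service error" ;;
    000)      echo "no response (network or DNS)" ;;
    *)        echo "unexpected status $1" ;;
  esac
}

# Usage (SERVICE_URL is a placeholder for the Freescout or Baramundi endpoint):
# classify_status "$(http_status "$SERVICE_URL")"
```

A "check credentials" result points at step 1, while "external service error" or "no response" points at steps 2 and 4.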

---

## Phase 4: Post-24 Hour Validation (T+24 hours to T+7 days)

### Day 2 Validation (T+24 hours)

- [ ] **Verify AI Suggestions Working**
  - Sample 10 random processed tickets
  - AI suggestions present and relevant
  - Suggestion accuracy rate > 80%
  - Processing time < 5 seconds average
  - Document findings: ________

- [ ] **Approval Workflow Performance**
  - [ ] All HIGH priority tickets flagged
  - [ ] Approval response time < 2 hours
  - [ ] Approval completion rate > 95%
  - [ ] No pending approvals > 4 hours old
  - Total approvals processed: _____
  - Approval success rate: _____%

- [ ] **Baramundi Integration Validation**
  - [ ] Jobs submitted successfully
  - [ ] Job queue processing normally
  - [ ] Job completion rate > 90%
  - [ ] No stuck or failed jobs
  - Total jobs processed: _____
  - Job success rate: _____%

- [ ] **Knowledge Base Growth**
  - [ ] KB articles being created
  - [ ] Vector embeddings calculated
  - [ ] Query performance acceptable
  - Total KB articles: _____
  - Total embeddings: _____
  - Query response time: _____ ms

- [ ] **System Stability**
  - [ ] No service crashes
  - [ ] No memory leaks
  - [ ] Disk usage normal
  - [ ] Database integrity verified
  - [ ] No orphaned records
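
Several Day 2 items compare a measured rate with a target (accuracy > 80%, completion rate > 95%, job completion > 90%). A small helper keeps that arithmetic consistent across checks. This is a sketch; `percent` and `meets_target` are hypothetical names.

```bash
# Sketch: turn sampled counts into a percentage and compare it with a target.

# Usage: percent <part> <total>   -> prints the percentage with one decimal place
percent() {
  awk -v p="$1" -v t="$2" 'BEGIN { if (t == 0) { print "0.0" } else { printf "%.1f\n", 100 * p / t } }'
}

# Usage: meets_target <part> <total> <min_percent>   -> exit 0 when strictly above the target
meets_target() {
  awk -v p="$1" -v t="$2" -v m="$3" 'BEGIN { exit !(t > 0 && 100 * p / t > m) }'
}

# Example: 9 of 10 sampled tickets had a relevant suggestion.
# percent 9 10                                    # prints 90.0
# meets_target 9 10 80 && echo "accuracy target met"
```

The comparison is strict, matching the ">" wording of the targets, so exactly 80% does not pass an 80% target.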

### Day 7 Comprehensive Review (T+7 days)

- [ ] **Collect Statistics**

  **Email Processing:**
  - Total emails processed: _____
  - Success rate: _____%
  - Average processing time: _____ seconds
  - Error rate: _____%

  **AI Suggestions:**
  - Total suggestions generated: _____
  - Acceptance rate: _____%
  - Average accuracy: _____%
  - Processing time p95: _____ seconds

  **Approvals:**
  - Total approval requests: _____
  - Total approvals completed: _____
  - Approval completion rate: _____%
  - Average response time: _____ minutes
  - HIGH priority count: _____

  **Baramundi Jobs:**
  - Total jobs submitted: _____
  - Total jobs completed: _____
  - Success rate: _____%
  - Failed jobs: _____

  **Knowledge Base:**
  - Total KB articles created: _____
  - Total articles updated: _____
  - Total searches: _____
  - Average search response: _____ ms

- [ ] **Performance Analysis**
  - [ ] n8n CPU usage normal: _____ %
  - [ ] n8n Memory usage normal: _____ MB
  - [ ] PostgreSQL query time p95: _____ ms
  - [ ] Database size: _____ GB
  - [ ] Backup size: _____ GB

- [ ] **Team Feedback Collected**
  - [ ] Operations team feedback: ________
  - [ ] Support team feedback: ________
  - [ ] End user feedback: ________
  - [ ] Issues encountered: ________
  - [ ] Improvement suggestions: ________

- [ ] **Issue Resolution Status**
  - [ ] All critical issues resolved
  - [ ] All high priority issues resolved
  - [ ] Medium priority issues tracked
  - [ ] Minor issues documented for next release
  - Issue tracking document: __________

### Go-Live Success Criteria - Final Sign-Off

All criteria must be met to declare go-live successful:

- [ ] **Stability (99% uptime minimum)**
  - System remained operational for 7 consecutive days
  - Unplanned downtime < 100.8 minutes total (1% of the 7-day window)
  - All services restarted cleanly without issues

- [ ] **Functionality (100% requirements met)**
  - Mail processing working correctly
  - AI suggestions functional and accurate
  - Approval workflow operational
  - Baramundi job submission successful
  - KB updates functioning

- [ ] **Performance (Acceptable for workload)**
  - Average email processing < 5 seconds
  - Average workflow execution < 10 seconds
  - Database queries < 1 second (p95)
  - No performance degradation observed

- [ ] **Data Integrity (100% accuracy)**
  - All processed tickets correctly handled
  - No duplicate records
  - No data loss or corruption
  - Audit trail complete and accurate

- [ ] **Monitoring (All systems active)**
  - Real-time dashboards operational
  - Alerts functioning correctly
  - Logs aggregated and searchable
  - Performance metrics recorded

- [ ] **Team Readiness (100% trained)**
  - Operations team fully trained
  - Support team fully trained
  - All runbooks completed
  - On-call schedule established
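
The downtime allowance in the stability criterion follows from the uptime target and the length of the measurement window. As a quick check of the arithmetic, the sketch below computes the allowance in minutes; `downtime_budget_minutes` is a hypothetical helper name.

```bash
# Sketch: compute the downtime allowed by an uptime target over a window.

# Usage: downtime_budget_minutes <days> <uptime_percent>
downtime_budget_minutes() {
  awk -v d="$1" -v u="$2" 'BEGIN { printf "%.1f\n", d * 24 * 60 * (100 - u) / 100 }'
}

# downtime_budget_minutes 7 99   # 7-day window -> 100.8
# downtime_budget_minutes 1 99   # 1-day window -> 14.4
```

The same target gives very different allowances depending on the window, so the sign-off should state which window the figure refers to.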

**Sign-Off By:**

Project Manager: _________________ Date: _______

Operations Lead: _________________ Date: _______

Technical Lead: _________________ Date: _______

---

## Ongoing Monitoring (Post Go-Live)

### Daily Checks (First 30 Days)

- [ ] Review system health dashboard
- [ ] Check backup completion status
- [ ] Review error logs for new issues
- [ ] Verify workflow execution metrics
- [ ] Check database growth rate
- [ ] Monitor alert frequency and relevance

### Weekly Checks (Ongoing)

- [ ] Generate performance report
- [ ] Review all system logs
- [ ] Verify backup restore capability
- [ ] Update documentation as needed
- [ ] Team retrospective meeting
- [ ] Plan for optimization improvements

### Monthly Reviews (Ongoing)

- [ ] Comprehensive system audit
- [ ] Capacity planning review
- [ ] Security assessment
- [ ] Performance optimization review
- [ ] Team training refresher (as needed)
- [ ] Update escalation procedures

---

## Contacts and Escalation

### Primary Contacts

**Project Manager:**
- Name: _____________________
- Phone: _____________________
- Email: _____________________

**Technical Lead:**
- Name: _____________________
- Phone: _____________________
- Email: _____________________

**On-Call Engineer:**
- Name: _____________________
- Phone: _____________________
- Email: _____________________

### Escalation Matrix

**Level 1 - Application Issue:**
- On-call engineer
- Response time: 15 minutes

**Level 2 - System Down:**
- Technical lead + On-call engineer
- Response time: 5 minutes

**Level 3 - Critical Data Loss:**
- Technical lead + Project manager + Database admin
- Response time: Immediate

---

## Related Documentation

- [DEPLOYMENT.md](DEPLOYMENT.md) - Deployment procedures and rollback
- [MONITORING.md](MONITORING.md) - Monitoring dashboard and alerts
- [ARCHITECTURE.md](ARCHITECTURE.md) - System architecture details
- [TROUBLESHOOTING.md](TROUBLESHOOTING.md) - Common issues and solutions