Go-Live Checklist
Overview
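This checklist is intended to guide a smooth transition from staging to production. Work through the four phases in order: Phase 1 (One Week Before Go-Live), Phase 2 (Go-Live Day), Phase 3 (Launch Period Monitoring, first 24 hours), and Phase 4 (Post-24 Hour Validation, through day 7).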
Phase 1: One Week Before Go-Live
Timeline: T-7 days
Pre-Deployment Verification
E2E Tests Passed (100%)
- All workflow tests successful: bash tests/curl-test-collection.sh
- No critical bugs or failures
- Test results documented
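The E2E run can be wrapped so that every execution leaves a dated log behind, which doubles as the documented test result. A minimal sketch, assuming the collection script exits non-zero when any test fails; the run_e2e_tests function and the results/ directory are hypothetical.

```bash
# Hypothetical helper: run the E2E collection and keep a dated log.
# Assumes the script exits non-zero when any test fails.
run_e2e_tests() {
  local script="${1:-tests/curl-test-collection.sh}"
  local outdir="${2:-results}"   # hypothetical results directory
  mkdir -p "$outdir"
  local log
  log="$outdir/e2e-$(date +%Y%m%d-%H%M%S).log"
  # Show output live and copy it to the log; PIPESTATUS[0] is the
  # script's exit code rather than tee's.
  bash "$script" 2>&1 | tee "$log"
  local rc=${PIPESTATUS[0]}
  if [ "$rc" -eq 0 ]; then
    echo "E2E PASSED (log: $log)"
  else
    echo "E2E FAILED with exit code $rc (log: $log)" >&2
  fi
  return "$rc"
}
```

Run it from the repository root with no arguments to use the default script path.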
Staging Environment Verified
- Staging environment deployed with a configuration identical to production
- Run full load test (simulate 100+ concurrent tickets)
- Verify integration with TEST Freescout account
- Verify integration with TEST Baramundi account
- All workflows processing correctly in staging
Production Database Setup
- PostgreSQL audit schema initialized
- Backup strategy configured and tested
- Database performance baseline recorded
- Disk space verified (minimum 20GB available)
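The 20GB free-space check can be scripted with df. A sketch; the check_free_space_gb function and its default path are hypothetical, and it compares in GiB (1024³ bytes), which is slightly stricter than 20 decimal GB.

```bash
# Hypothetical helper: fail when the filesystem holding PATH has less
# than MIN_GB GiB available. Point PATH at the Docker/PostgreSQL data
# directory on the production host (the "." default is a placeholder).
check_free_space_gb() {
  local path="${1:-.}" min_gb="${2:-20}"
  local avail_kb avail_gb
  # -P: one line per filesystem in POSIX columns; -k: 1 KiB units.
  # Column 4 of the second line is the available space.
  avail_kb=$(df -Pk "$path" 2>/dev/null | awk 'NR == 2 { print $4 }')
  [ -n "$avail_kb" ] || { echo "FAIL: cannot read free space for $path" >&2; return 1; }
  avail_gb=$(( avail_kb / 1024 / 1024 ))
  if [ "$avail_gb" -ge "$min_gb" ]; then
    echo "OK: ${avail_gb} GiB free on $path (minimum ${min_gb})"
  else
    echo "FAIL: ${avail_gb} GiB free on $path (minimum ${min_gb})" >&2
    return 1
  fi
}
```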
API Credentials Verified
- Freescout API key tested and active
- Freescout custom fields created and working
- Baramundi API key tested and active
- n8n encryption key generated and secure
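n8n reads its credential encryption key from the N8N_ENCRYPTION_KEY environment variable. One way to generate a strong random value is sketched below, assuming a Linux host with /dev/urandom; the generate_encryption_key function is hypothetical. Generate the key once and back it up: if it changes after credentials are stored, n8n can no longer decrypt them.

```bash
# Hypothetical helper: print 32 random bytes as 64 lowercase hex
# characters, suitable as a value for N8N_ENCRYPTION_KEY.
generate_encryption_key() {
  # od -An drops the offset column, -tx1 prints one hex byte per field,
  # -N32 reads exactly 32 bytes; tr joins everything into one string.
  od -An -tx1 -N32 /dev/urandom | tr -d ' \n'
}
```

Usage, assuming the compose stack reads a .env file: grep -q '^N8N_ENCRYPTION_KEY=' .env || echo "N8N_ENCRYPTION_KEY=$(generate_encryption_key)" >> .env (the grep guard writes a key only if none exists yet).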
Team Readiness
Team Training Completed
- Operations team trained on:
  - System architecture overview
  - Deployment and rollback procedures
  - Monitoring dashboard usage
  - Alert response procedures
  - Escalation paths
- Support team trained on:
  - Workflow functionality overview
  - Expected behavior and timing
  - How to verify system health
  - When to escalate issues
Documentation Review
- All team members reviewed DEPLOYMENT.md
- All team members reviewed MONITORING.md
- Runbooks reviewed and acknowledged
- Contact list updated (on-call schedule)
Backup Strategy Finalized
- Daily backup schedule defined
- Backup retention policy set (7 days minimum)
- Backup restore procedure tested
- Backup storage verified (separate location from production)
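The 7-day minimum retention can be enforced with find. A sketch assuming dumps are plain *.sql files in backups/, as elsewhere in this checklist; the prune_backups function is hypothetical. Run it only after the newest backup has been verified, so pruning never leaves you without a restorable copy.

```bash
# Hypothetical helper: delete SQL dumps in DIR older than the retention
# window. find rounds a file's age down to whole days, so -mtime +7
# matches dumps that are at least 8 full days old.
prune_backups() {
  local dir="${1:-backups}" days="${2:-7}"
  # -print lists each file before -delete removes it.
  find "$dir" -maxdepth 1 -type f -name '*.sql' -mtime +"$days" -print -delete
}
```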
Risk Mitigation
Rollback Plan Confirmed
- Rollback procedures documented
- Rollback tested in staging environment
- Estimated rollback time: < 30 minutes
- All team members trained on rollback
Communication Plan Ready
- Stakeholder notification list prepared
- Status page update process defined
- Internal update frequency established (every 30 minutes initially)
- Escalation contacts verified
Monitoring & Alerting
- All monitoring dashboards configured
- Alert recipients confirmed
- Alert thresholds set and validated
- On-call rotation established
Phase 2: Go-Live Day
Timeline: T-0 (Launch day)
Pre-Launch Checks (T-2 hours)
Final System Status
- All Docker services running and healthy
- docker-compose ps output verified
- All services show "Up (healthy)"
- No services in "Restarting" state
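The "Up (healthy)" and "Restarting" checks can be automated by scanning the docker-compose ps output. A sketch that assumes every service defines a healthcheck (a service without one shows plain "Up" and would pass); the check_compose_health function is hypothetical.

```bash
# Hypothetical helper: read `docker-compose ps` text on stdin and fail
# if any container is restarting, exited, unhealthy, or still starting.
check_compose_health() {
  local bad
  # "|| true" keeps grep's no-match exit status from aborting callers.
  bad=$(grep -E 'Restarting|unhealthy|health: starting|Exit' || true)
  if [ -n "$bad" ]; then
    echo "Services not ready:" >&2
    echo "$bad" >&2
    return 1
  fi
  echo "All listed services look healthy"
}
```

Usage: docker-compose ps | check_compose_health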
Service Health Verification
- n8n health check: curl http://localhost:5678/healthz
- PostgreSQL connection: docker-compose exec postgres pg_isready
- Milvus connectivity: vector DB responding (a default Milvus standalone deployment answers on curl http://localhost:9091/healthz)
- External integrations reachable (Freescout, Baramundi)
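The health checks can be retried instead of run once, which helps right after containers start. A sketch: the default URL assumes n8n's /healthz endpoint on the default port 5678, and the wait_for_http function, retry count and delay are illustrative.

```bash
# Hypothetical helper: poll URL until curl reports success (no HTTP
# error status), up to TRIES attempts spaced DELAY seconds apart.
wait_for_http() {
  local url="${1:-http://localhost:5678/healthz}" tries="${2:-10}" delay="${3:-3}"
  local i
  for ((i = 1; i <= tries; i++)); do
    # -f: non-zero exit on HTTP >= 400; -sS: quiet but show errors.
    if curl -fsS -o /dev/null --max-time 5 "$url"; then
      echo "OK: $url (attempt $i)"
      return 0
    fi
    if [ "$i" -lt "$tries" ]; then sleep "$delay"; fi
  done
  echo "FAIL: $url not healthy after $tries attempts" >&2
  return 1
}
```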
Database Integrity
- Audit schema verified: SELECT COUNT(*) FROM audit.workflows;
- No corruption or errors in logs
- Backup created and verified: ls -lh backups/
- Backup restore tested
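A quick integrity check for the backup file, short of a full restore: plain-format pg_dump output ends with a "PostgreSQL database dump complete" comment, so its absence usually means the dump was cut off. A sketch assuming plain SQL dumps (not pg_dump -Fc custom format); the verify_sql_dump function is hypothetical.

```bash
# Hypothetical helper: sanity-check a plain-format pg_dump file.
# This catches empty or truncated dumps; it does not replace a restore test.
verify_sql_dump() {
  local file="$1"
  [ -s "$file" ] || { echo "FAIL: $file missing or empty" >&2; return 1; }
  # The completion comment sits within the last few lines of the dump.
  if ! tail -n 20 "$file" | grep -q 'PostgreSQL database dump complete'; then
    echo "FAIL: $file has no completion trailer (truncated?)" >&2
    return 1
  fi
  echo "OK: $file ($(du -h "$file" | cut -f1))"
}
```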
n8n Workflows Status
- All 3 workflows imported successfully
- Workflow A (Mail Processing): Ready
- Workflow B (Approval Execution): Ready
- Workflow C (KB Update): Ready
- All workflows set to Inactive (will activate after final check)
Monitoring System Active
- Monitoring dashboard accessible
- All metric collectors running
- Alert system armed and tested
- Log aggregation working (docker-compose logs verified)
Final Pre-Launch Meeting
- All team members present and ready
- Roles and responsibilities confirmed:
  - Platform Lead: Overall coordination
  - n8n Administrator: Workflow management
  - Database Administrator: Database monitoring
  - System Administrator: Infrastructure monitoring
  - Support Lead: User support readiness
- Communication channels verified (Slack, phone, etc.)
Launch Window (T-0 hours)
Final Backup (T-15 minutes)
- Backup created immediately before activation
- Backup file verified and tested
- Backup location: backups/pre-golive-backup-$(date +%Y%m%d-%H%M%S).sql
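The pre-go-live dump can be written straight to that path with pg_dump inside the PostgreSQL container. A sketch: the compose service name postgres comes from the pg_isready check on Go-Live Day, and the n8n_user / n8n_production names from the database-growth query in Phase 3; the create_prelive_backup function is hypothetical.

```bash
# Hypothetical helper: dump the production database to a dated file
# under DIR and print the file name on success.
create_prelive_backup() {
  local dir="${1:-backups}"
  mkdir -p "$dir"
  local file
  file="$dir/pre-golive-backup-$(date +%Y%m%d-%H%M%S).sql"
  # -T disables TTY allocation so the redirected dump is not mangled.
  if docker-compose exec -T postgres pg_dump -U n8n_user n8n_production > "$file"; then
    echo "$file"
  else
    echo "FAIL: pg_dump exited non-zero; removing partial $file" >&2
    rm -f "$file"
    return 1
  fi
}
```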
Activate Workflows (T-0 minutes)
- n8n Dashboard accessed
- Workflow A (Mail Processing) activated:
  - Toggle "Active" switch ON
  - Verify activation confirmed in UI
  - Check logs: docker-compose logs -f n8n | grep "Workflow A"
- Workflow B (Approval Execution) activated
- Workflow C (KB Update) activated
- All three workflows showing "Active" status
Launch Announcement
- Internal team notified: "System is LIVE"
- Stakeholders notified of go-live
- Status page updated: "System operational"
- Time of launch recorded: __________
Confirm System Accepting Requests
- Send test email to Freescout inbox
- Verify ticket created in Freescout
- Verify n8n workflow triggered (check logs)
- Verify workflow execution started
Phase 3: Launch Period Monitoring (First 24 Hours)
Timeline: T+0 to T+24 hours
Continuous Monitoring (Every 15 minutes)
n8n Workflow Execution
- Command: docker-compose logs --tail=50 n8n (piping "logs -f" into tail never prints, because the followed stream never ends; add -f only when you want to keep watching)
- Check for:
  - No error messages
  - Workflows executing successfully
  - No hung or stuck executions
- Log location: /d/n8n-compose/logs/n8n.log
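The error check can be turned into a pass/fail test with a threshold. A sketch: matching "error" or "failed" case-insensitively is a heuristic, since what n8n writes depends on N8N_LOG_LEVEL and related settings; the check_log_errors function and its default threshold of zero are hypothetical.

```bash
# Hypothetical helper: count error-looking lines in log text on stdin
# and fail if the count exceeds MAX.
check_log_errors() {
  local max="${1:-0}" n
  # grep -c prints 0 and exits 1 when nothing matches; "|| true" keeps the 0.
  n=$(grep -ciE 'error|failed' || true)
  if [ "$n" -gt "$max" ]; then
    echo "WARN: $n error-like lines (threshold $max)" >&2
    return 1
  fi
  echo "OK: $n error-like lines"
}
```

Usage: docker-compose logs --tail=200 n8n | check_log_errors 0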
Freescout Integration
- New tickets arriving in system
- Custom fields populated correctly
- No integration errors in Freescout logs
- Ticket processing speed acceptable
Baramundi Job Queue
- Check job queue status
- Verify jobs accepted from n8n
- Monitor job completion rate
- Check for failed jobs
Alert System
- All critical alerts functioning
- No false positive alerts
- Escalation procedures working
- On-call team responsive
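Database Performance
- Query performance acceptable
- No locks or deadlocks
- Disk space usage normal
- Command (pg_stat_statements is a view, so query it through psql; requires the pg_stat_statements extension, column names as of PostgreSQL 13):
docker-compose exec postgres psql -U n8n_user -d n8n_production -c "SELECT calls, round(mean_exec_time::numeric, 1) AS mean_ms, left(query, 60) AS query FROM pg_stat_statements ORDER BY mean_exec_time DESC LIMIT 10;"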
Hourly System Status Report (First 6 hours)
Document every hour:
Hour 1 (T+1h)
- Total tickets processed: _____
- Total workflows executed: _____
- Failed executions: _____
- System health: [ ] Green [ ] Yellow [ ] Red
- Issues encountered: _____
Hour 2 (T+2h)
- Total tickets processed: _____
- Total workflows executed: _____
- Failed executions: _____
- System health: [ ] Green [ ] Yellow [ ] Red
- Issues encountered: _____
Hours 3-6 (T+3h to T+6h)
- Repeat above for each hour
- Escalate any issues immediately
- Document all changes or interventions
Functional Validation (T+2 hours and T+12 hours)
After 2 hours:
AI Suggestions Displayed
- Sample processed tickets show AI suggestions
- Suggestion accuracy acceptable
- Custom field updated with ai_suggestion
- Performance acceptable (< 5 second processing time)
Approval Workflow Operating
- HIGH priority tickets flagged for approval
- Approval custom field populated
- Notifications sent to approvers
- Approvals received and reflected in system
Knowledge Base Updates
- KB articles being created/updated
- Vector embeddings generated (Milvus)
- PostgreSQL KB table growing
- Query: SELECT COUNT(*) FROM audit.kb_articles;
After 12 hours (overnight validation):
Validate Overnight Processing
- All workflows executed correctly overnight
- No race conditions or deadlocks occurred
- Database backups completed successfully
- All alerts functioned as expected
Critical Metrics (Monitor Continuously)
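# Count the most recent n8n executions (the public API returns them under "data"; limit caps the count at 100)
curl -s -H "X-N8N-API-KEY: $N8N_API_KEY" \
  "http://localhost:5678/api/v1/executions?limit=100" | jq '.data | length'
# Count recent failed executions
curl -s -H "X-N8N-API-KEY: $N8N_API_KEY" \
  "http://localhost:5678/api/v1/executions?status=error&limit=100" | jq '.data | length'
# Check database growth (10 largest user tables)
docker exec n8n-postgres psql -U n8n_user -d n8n_production -c \
"SELECT schemaname, tablename, pg_size_pretty(pg_total_relation_size(format('%I.%I', schemaname, tablename)::regclass)) AS total_size
FROM pg_tables WHERE schemaname NOT IN ('pg_catalog', 'information_schema')
ORDER BY pg_total_relation_size(format('%I.%I', schemaname, tablename)::regclass) DESC LIMIT 10;"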
# Monitor CPU/Memory
docker stats --no-stream
Incident Response (If Issues Occur)
Critical Issue (System Down):
- Immediately notify team lead
- Assess severity and scope
- Execute rollback if necessary (see DEPLOYMENT.md)
- Document incident details
- Begin root cause analysis
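Performance Degradation:
- Check system resources: docker stats
- Check database locks: docker-compose logs postgres | grep -i lock (case-insensitive, so "deadlock detected" also matches)
- Scale resources if needed: docker-compose up -d --scale n8n=2. Caution: a second copy of the main n8n service conflicts on a fixed host port (5678) and can execute the same triggers twice; horizontal scaling generally requires n8n queue mode, where worker services are scaled instead
- Monitor improvement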
Integration Failures:
- Verify API credentials still valid
- Check external service status
- Review integration logs
- Test connectivity manually
- Retry or escalate
Phase 4: Post-24 Hour Validation (T+24 to T+7 days)
Day 2 Validation (T+24 hours)
Verify AI Suggestions Working
- Sample 10 random processed tickets
- AI suggestions present and relevant
- Suggestion accuracy rate > 80%
- Processing time < 5 seconds average
- Document findings: ________
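The sampled rates (accuracy > 80% here, approval completion > 95% and job success > 90% below) are simple ratios; bash arithmetic is integer-only, so awk does the division. A sketch; the rate_pct and rate_above functions are hypothetical.

```bash
# Hypothetical helper: print PART/TOTAL as a percentage with one decimal.
rate_pct() {
  awk -v p="$1" -v t="$2" 'BEGIN {
    if (t == 0) { print "n/a"; exit 1 }
    else printf "%.1f\n", 100 * p / t
  }'
}

# Hypothetical helper: succeed only when the rate strictly exceeds THRESHOLD.
rate_above() {
  awk -v p="$1" -v t="$2" -v th="$3" 'BEGIN { exit !(t > 0 && 100 * p / t > th) }'
}
```

For example, rate_pct 9 10 prints 90.0, which clears the 80% bar. Note that 19 completed approvals out of 20 requests is exactly 95.0%, so rate_above 19 20 95 fails a strict "> 95%" criterion.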
Approval Workflow Performance
- All HIGH priority tickets flagged
- Approval response time < 2 hours
- Approval completion rate > 95%
- No pending approvals > 4 hours old
- Total approvals processed: _____
- Approval success rate: _____%
Baramundi Integration Validation
- Jobs submitted successfully
- Job queue processing normally
- Job completion rate > 90%
- No stuck or failed jobs
- Total jobs processed: _____
- Job success rate: _____%
Knowledge Base Growth
- KB articles being created
- Vector embeddings calculated
- Query performance acceptable
- Total KB articles: _____
- Total embeddings: _____
- Query response time: _____ ms
System Stability
- No service crashes
- No memory leaks
- Disk usage normal
- Database integrity verified
- No orphaned records
Day 7 Comprehensive Review (T+7 days)
Collect Statistics
Email Processing:
- Total emails processed: _____
- Success rate: _____%
- Average processing time: _____ seconds
- Error rate: _____%
AI Suggestions:
- Total suggestions generated: _____
- Acceptance rate: _____%
- Average accuracy: _____%
- Processing time p95: _____ seconds
Approvals:
- Total approval requests: _____
- Total approvals completed: _____
- Approval completion rate: _____%
- Average response time: _____ minutes
- HIGH priority count: _____
Baramundi Jobs:
- Total jobs submitted: _____
- Total jobs completed: _____
- Success rate: _____%
- Failed jobs: _____
Knowledge Base:
- Total KB articles created: _____
- Total articles updated: _____
- Total searches: _____
- Average search response: _____ ms
Performance Analysis
- n8n CPU usage normal: _____ %
- n8n Memory usage normal: _____ MB
- PostgreSQL query time p95: _____ ms
- Database size: _____ GB
- Backup size: _____ GB
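The timing figures (averages and p95 values) can be computed from exported durations, one number per line. A sketch using the nearest-rank percentile, i.e. the ceil(0.95 × n)-th smallest value; monitoring tools that interpolate may report slightly different numbers. How you export the durations (from n8n execution data or PostgreSQL) is left open, and the latency_summary function is hypothetical.

```bash
# Hypothetical helper: read durations on stdin (one per line) and print
# count, mean and nearest-rank p95.
latency_summary() {
  sort -n | awk '
    NF { n++; v[n] = $1; sum += $1 }   # skip blank lines
    END {
      if (n == 0) { print "no data"; exit 1 }
      r = int((95 * n + 99) / 100)     # integer ceil(0.95 * n)
      printf "count=%d mean=%.2f p95=%s\n", n, sum / n, v[r]
    }'
}
```

For example, seq 1 20 | latency_summary prints count=20 mean=10.50 p95=19.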
Team Feedback Collected
- Operations team feedback: ________
- Support team feedback: ________
- End user feedback: ________
- Issues encountered: ________
- Improvement suggestions: ________
Issue Resolution Status
- All critical issues resolved
- All high priority issues resolved
- Medium priority issues tracked
- Minor issues documented for next release
- Issue tracking document: __________
Go-Live Success Criteria - Final Sign-Off
All criteria must be met to declare go-live successful:
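Stability (99% uptime minimum)
- System remained operational for 7 consecutive days
- Unplanned downtime < 100.8 minutes total (1% of the 10,080 minutes in 7 days; a 14.4-minute limit is the 1% budget for a single day, about 99.86% uptime over the week)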
- All services restarted cleanly without issues
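The downtime budget for any uptime target and window is days × 1,440 × (100 - target) / 100 minutes. A sketch of that arithmetic; the downtime_budget_min function is hypothetical.

```bash
# Hypothetical helper: allowed downtime in minutes for an uptime target
# (percent) over a window of DAYS days, e.g. 99% over 7 days:
# 10,080 min x 1 / 100 = 100.8 min.
downtime_budget_min() {
  awk -v p="$1" -v d="$2" 'BEGIN { printf "%.1f\n", d * 24 * 60 * (100 - p) / 100 }'
}
```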
Functionality (100% of requirements met)
- Mail processing working correctly
- AI suggestions functional and accurate
- Approval workflow operational
- Baramundi job submission successful
- KB updates functioning
Performance (Acceptable for workload)
- Average email processing < 5 seconds
- Average workflow execution < 10 seconds
- Database queries < 1 second (p95)
- No performance degradation observed
Data Integrity (100% accuracy)
- All processed tickets correctly handled
- No duplicate records
- No data loss or corruption
- Audit trail complete and accurate
Monitoring (All systems active)
- Real-time dashboards operational
- Alerts functioning correctly
- Logs aggregated and searchable
- Performance metrics recorded
Team Readiness (100% trained)
- Operations team fully trained
- Support team fully trained
- All runbooks completed
- On-call schedule established
Sign-Off By:
Project Manager: _________________ Date: _______
Operations Lead: _________________ Date: _______
Technical Lead: _________________ Date: _______
Ongoing Monitoring (Post Go-Live)
Daily Checks (First 30 Days)
- Review system health dashboard
- Check backup completion status
- Review error logs for new issues
- Verify workflow execution metrics
- Check database growth rate
- Monitor alert frequency and relevance
Weekly Checks (Ongoing)
- Generate performance report
- Review all system logs
- Verify backup restore capability
- Update documentation as needed
- Team retrospective meeting
- Plan for optimization improvements
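For the weekly "Verify backup restore capability" check, a dump can be restored into a throwaway database on the same server and probed with one query. A sketch under several assumptions: n8n_user may create and drop databases (true if it is the official image's POSTGRES_USER superuser), restore_check is a hypothetical scratch database name, the restore_test function is hypothetical, and the audit.workflows table from the Go-Live Day integrity check is a meaningful probe. Run it off-peak, since the restore competes with production for I/O.

```bash
# Hypothetical helper: restore a plain SQL dump into a scratch database,
# print the row count of audit.workflows, then drop the scratch database.
restore_test() {
  local dump="$1" db="restore_check"   # hypothetical scratch DB name
  docker-compose exec -T postgres dropdb -U n8n_user --if-exists "$db" || return 1
  docker-compose exec -T postgres createdb -U n8n_user "$db" || return 1
  # ON_ERROR_STOP makes psql exit non-zero on the first failed statement.
  docker-compose exec -T postgres psql -U n8n_user -d "$db" -v ON_ERROR_STOP=1 -q < "$dump" || return 1
  # -tA prints the bare count without headers or alignment.
  docker-compose exec -T postgres psql -U n8n_user -d "$db" -tA -c "SELECT COUNT(*) FROM audit.workflows;"
  local rc=$?
  docker-compose exec -T postgres dropdb -U n8n_user "$db"
  return "$rc"
}
```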
Monthly Reviews (Ongoing)
- Comprehensive system audit
- Capacity planning review
- Security assessment
- Performance optimization review
- Team training refresher (as needed)
- Update escalation procedures
Contacts and Escalation
Primary Contacts
Project Manager:
- Name: _____________________
- Phone: _____________________
- Email: _____________________
Technical Lead:
- Name: _____________________
- Phone: _____________________
- Email: _____________________
On-Call Engineer:
- Name: _____________________
- Phone: _____________________
- Email: _____________________
Escalation Matrix
Level 1 - Application Issue:
- On-call engineer
- Response time: 15 minutes
Level 2 - System Down:
- Technical lead + On-call engineer
- Response time: 5 minutes
Level 3 - Critical Data Loss:
- Technical lead + Project manager + Database admin
- Response time: Immediate
Related Documentation
- DEPLOYMENT.md - Deployment procedures and rollback
- MONITORING.md - Monitoring dashboard and alerts
- ARCHITECTURE.md - System architecture details
- TROUBLESHOOTING.md - Common issues and solutions