n8n-compose/docs/GO-LIVE-CHECKLIST.md
2026-03-16 17:32:59 +01:00

Go-Live Checklist

Overview

This checklist ensures a smooth transition from staging to production. Follow all phases sequentially: Pre-Launch, Go-Live Day, Launch Period, and Post-Launch Monitoring.


Phase 1: One Week Before Go-Live

Timeline: T-7 days

Pre-Deployment Verification

  • E2E Tests Passed (100%)

    • All workflow tests successful: bash tests/curl-test-collection.sh
    • No critical bugs or failures
    • Test results documented
  • Staging Environment Verified

    • Staging deployment identical to production
    • Run full load test (simulate 100+ concurrent tickets)
    • Verify integration with TEST Freescout account
    • Verify integration with TEST Baramundi account
    • All workflows processing correctly in staging
  • Production Database Setup

    • PostgreSQL audit schema initialized
    • Backup strategy configured and tested
    • Database performance baseline recorded
    • Disk space verified (minimum 20GB available)
  • API Credentials Verified

    • Freescout API key tested and active
    • Freescout custom fields created and working
    • Baramundi API key tested and active
    • n8n encryption key generated and secure
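The 20GB disk space minimum from the database setup items can be checked from a script. The following is a minimal sketch, assuming GNU `df` (for the `--output` flag); the function names `free_gb` and `has_min_free_gb` are illustrative and not part of the project.

```shell
# free_gb PATH -> prints the whole gigabytes available on the filesystem
# holding PATH, as reported by GNU df (assumption: coreutils df is installed)
free_gb() {
  df --output=avail -BG "$1" | tail -n 1 | tr -dc '0-9'
}

# has_min_free_gb AVAILABLE_GB REQUIRED_GB -> exit status 0 if enough space
has_min_free_gb() {
  [ "$1" -ge "$2" ]
}
```

A possible call is `has_min_free_gb "$(free_gb .)" 20 && echo "disk OK"`, run from the directory that holds the database volume.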

Team Readiness

  • Team Training Completed

    • Operations team trained on:
      • System architecture overview
      • Deployment and rollback procedures
      • Monitoring dashboard usage
      • Alert response procedures
      • Escalation paths
    • Support team trained on:
      • Workflow functionality overview
      • Expected behavior and timing
      • How to verify system health
      • When to escalate issues
  • Documentation Review

    • All team members reviewed DEPLOYMENT.md
    • All team members reviewed MONITORING.md
    • Runbooks reviewed and acknowledged
    • Contact list updated (on-call schedule)
  • Backup Strategy Finalized

    • Daily backup schedule defined
    • Backup retention policy set (7 days minimum)
    • Backup restore procedure tested
    • Backup storage verified (separate location from production)
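A retention policy such as the 7-day minimum above is often enforced with a small cleanup job. This is a sketch only, assuming backups are plain `.sql` files in a single directory; `prune_backups` is a hypothetical helper name.

```shell
# prune_backups DIR DAYS -> deletes *.sql files directly inside DIR whose age
# exceeds DAYS full days (find's "-mtime +N" semantics) and prints each one
prune_backups() {
  local dir="$1" days="$2"
  find "$dir" -maxdepth 1 -type f -name '*.sql' -mtime +"$days" -print -delete
}
```

For example, `prune_backups backups 7` would keep everything from the last week. Running it once without `-delete` first is a cheap way to confirm what would be removed.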

Risk Mitigation

  • Rollback Plan Confirmed

    • Rollback procedures documented
    • Rollback tested in staging environment
    • Estimated rollback time: < 30 minutes
    • All team members trained on rollback
  • Communication Plan Ready

    • Stakeholder notification list prepared
    • Status page update process defined
    • Internal update frequency established (30-minute intervals initially)
    • Escalation contacts verified
  • Monitoring & Alerting

    • All monitoring dashboards configured
    • Alert recipients confirmed
    • Alert thresholds set and validated
    • On-call rotation established

Phase 2: Go-Live Day

Timeline: T-0 (Launch day)

Pre-Launch Checks (T-2 hours)

  • Final System Status

    • All Docker services running and healthy
    • docker-compose ps output verified
    • All services show "Up (healthy)"
    • No services in "Restarting" state
  • Service Health Verification

    • n8n health check: curl http://localhost:5678/healthz
    • PostgreSQL connection: docker-compose exec postgres pg_isready
    • Milvus connectivity: Vector DB responding
    • External integrations reachable (Freescout, Baramundi)
  • Database Integrity

    • Audit schema verified: SELECT COUNT(*) FROM audit.workflows;
    • No corruption or errors in logs
    • Backup created and verified: ls -lh backups/
    • Backup restore tested
  • n8n Workflows Status

    • All 3 workflows imported successfully
    • Workflow A (Mail Processing): Ready
    • Workflow B (Approval Execution): Ready
    • Workflow C (KB Update): Ready
    • All workflows set to Inactive (will activate after final check)
  • Monitoring System Active

    • Monitoring dashboard accessible
    • All metric collectors running
    • Alert system armed and tested
    • Log aggregation working (docker-compose logs verified)
  • Final Pre-Launch Meeting

    • All team members present and ready
    • Roles and responsibilities confirmed:
      • Platform Lead: Overall coordination
      • n8n Administrator: Workflow management
      • Database Administrator: Database monitoring
      • System Administrator: Infrastructure monitoring
      • Support Lead: User support readiness
    • Communication channels verified (Slack, phone, etc.)
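The "all services healthy" check can be scripted instead of read by eye. The sketch below assumes every service defines a Docker healthcheck, so that each line of `docker-compose ps` contains "(healthy)"; the helper name `unhealthy_services` and the header patterns it skips are assumptions about the output layout.

```shell
# unhealthy_services: reads `docker-compose ps` output on stdin and prints every
# service line that does not report "(healthy)". Header and separator lines are
# skipped. Exit status is 1 if any such line was found, 0 otherwise.
unhealthy_services() {
  local bad
  bad=$(grep -v -e '(healthy)' -e '^NAME' -e '^ *Name' -e '^-' -e '^$' || true)
  if [ -n "$bad" ]; then
    printf '%s\n' "$bad"
    return 1
  fi
}
```

Typical use would be `docker-compose ps | unhealthy_services || echo "do not launch yet"`.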

Launch Window (T-0 hours)

  • Final Backup (T-15 minutes)

    • Backup created immediately before activation
    • Backup file verified and tested
    • Backup location: backups/pre-golive-backup-$(date +%Y%m%d-%H%M%S).sql
  • Activate Workflows (T-0 minutes)

    • n8n Dashboard accessed
    • Workflow A (Mail Processing) activated:
      • Toggle "Active" switch ON
      • Verify activation confirmed in UI
      • Check logs: docker-compose logs -f n8n | grep "Workflow A"
    • Workflow B (Approval Execution) activated
    • Workflow C (KB Update) activated
    • All three workflows showing "Active" status
  • Launch Announcement

    • Internal team notified: "System is LIVE"
    • Stakeholders notified of go-live
    • Status page updated: "System operational"
    • Time of launch recorded: __________
  • Confirm System Accepting Requests

    • Send test email to Freescout inbox
    • Verify ticket created in Freescout
    • Verify n8n workflow triggered (check logs)
    • Verify workflow execution started
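The final backup step of the launch window can be sketched as a pair of helpers. This assumes the compose service is called `postgres` and reuses the `n8n_user` and `n8n_production` names from the metrics commands in this checklist; `backup_path` and `take_backup` are hypothetical names.

```shell
# backup_path -> prints a timestamped path following the naming scheme above
backup_path() {
  printf 'backups/pre-golive-backup-%s.sql\n' "$(date +%Y%m%d-%H%M%S)"
}

# take_backup -> dumps the database into a fresh timestamped file, checks that
# the file is non-empty, and prints its path (assumed service/user/db names)
take_backup() {
  local out
  out=$(backup_path)
  mkdir -p "$(dirname "$out")"
  docker-compose exec -T postgres pg_dump -U n8n_user -d n8n_production > "$out" \
    && [ -s "$out" ] && echo "$out"
}
```

A non-empty file is only a first sanity check; it does not replace the restore test listed in the pre-launch checks.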

Phase 3: Launch Period Monitoring (First 24 Hours)

Timeline: T+0 to T+24 hours

Continuous Monitoring (Every 15 minutes)

  • n8n Workflow Execution

    • Command: docker-compose logs --tail=50 n8n
    • Check for:
      • No error messages
      • Workflows executing successfully
      • No hung or stuck executions
    • Log location: /d/n8n-compose/logs/n8n.log
  • Freescout Integration

    • New tickets arriving in system
    • Custom fields populated correctly
    • No integration errors in Freescout logs
    • Ticket processing speed acceptable
  • Baramundi Job Queue

    • Check job queue status
    • Verify jobs accepted from n8n
    • Monitor job completion rate
    • Check for failed jobs
  • Alert System

    • All critical alerts functioning
    • No false positive alerts
    • Escalation procedures working
    • On-call team responsive
  • Database Performance

    • Query performance acceptable
    • No locks or deadlocks
    • Disk space usage normal
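    • Command: docker-compose exec postgres psql -U n8n_user -d n8n_production -c "SELECT query, calls, mean_exec_time FROM pg_stat_statements ORDER BY mean_exec_time DESC LIMIT 5;" (requires the pg_stat_statements extension; column name as of PostgreSQL 13)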
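For the 15-minute log check, counting error lines gives a quick signal that is easy to compare between rounds. A minimal sketch, assuming that problems show up as lines containing the word "error"; `count_errors` is an illustrative name.

```shell
# count_errors: reads log lines on stdin and prints how many of them contain
# "error" (case-insensitive). Always exits 0, even when the count is 0.
count_errors() {
  grep -ci 'error' || true
}
```

Example: `docker-compose logs --tail=50 n8n | count_errors`. A rising count between two checks is a reason to read the matching lines in full.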

Hourly System Status Report (First 6 hours)

Document every hour:

Hour 1 (T+1h)

  • Total tickets processed: _____
  • Total workflows executed: _____
  • Failed executions: _____
  • System health: [ ] Green [ ] Yellow [ ] Red
  • Issues encountered: _____

Hour 2 (T+2h)

  • Total tickets processed: _____
  • Total workflows executed: _____
  • Failed executions: _____
  • System health: [ ] Green [ ] Yellow [ ] Red
  • Issues encountered: _____

Hour 3-6 (T+3h to T+6h)

  • Repeat above for each hour
  • Escalate any issues immediately
  • Document all changes or interventions
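The hourly entries can be produced in a uniform format by a small helper. The sketch below is illustrative only: the Green/Yellow/Red thresholds (no failures, under 5% failures, otherwise) are assumptions and should be replaced by whatever the team agrees on.

```shell
# health_color FAILED TOTAL -> Green, Yellow or Red
# Assumed thresholds: 0 failures = Green, below 5% failed = Yellow, else Red
health_color() {
  local failed="$1" total="$2"
  if [ "$failed" -eq 0 ]; then
    echo Green
  elif [ $((failed * 100)) -lt $((total * 5)) ]; then
    echo Yellow
  else
    echo Red
  fi
}

# hourly_report HOUR TICKETS WORKFLOWS FAILED -> one status line per hour
hourly_report() {
  printf 'T+%sh tickets=%s workflows=%s failed=%s health=%s\n' \
    "$1" "$2" "$3" "$4" "$(health_color "$4" "$3")"
}
```

For instance, `hourly_report 1 40 120 0` prints `T+1h tickets=40 workflows=120 failed=0 health=Green`.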

Functional Validation (T+2 hours and T+12 hours)

After 2 hours:

  • AI Suggestions Displayed

    • Sample processed tickets show AI suggestions
    • Suggestion accuracy acceptable
    • Custom field updated with ai_suggestion
    • Performance acceptable (processing time < 5 seconds)
  • Approval Workflow Operating

    • HIGH priority tickets flagged for approval
    • Approval custom field populated
    • Notifications sent to approvers
    • Approvals received and reflected in system
  • Knowledge Base Updates

    • KB articles being created/updated
    • Vector embeddings generated (Milvus)
    • PostgreSQL KB table growing
    • Query: SELECT COUNT(*) FROM audit.kb_articles;

After 12 hours (overnight validation):

  • Validate Overnight Processing
    • All workflows executed correctly overnight
    • No race conditions or deadlocks occurred
    • Database backups completed successfully
    • All alerts functioned as expected

Critical Metrics (Monitor Continuously)

# Count recent n8n workflow executions (up to 100)
curl -s -H "X-N8N-API-KEY: $N8N_API_KEY" \
  "http://localhost:5678/api/v1/executions?limit=100" | jq '.data | length'

# Check database growth
docker exec n8n-postgres psql -U n8n_user -d n8n_production -c \
  "SELECT schemaname, tablename, pg_size_pretty(pg_total_relation_size(schemaname||'.'||tablename))
   FROM pg_tables ORDER BY pg_total_relation_size(schemaname||'.'||tablename) DESC LIMIT 10;"

# Monitor CPU/Memory
docker stats --no-stream

Incident Response (If Issues Occur)

Critical Issue (System Down):

  1. Immediately notify team lead
  2. Assess severity and scope
  3. Execute rollback if necessary (see DEPLOYMENT.md)
  4. Document incident details
  5. Begin root cause analysis

Performance Degradation:

  1. Check system resources: docker stats
  2. Check database locks: docker-compose logs postgres | grep -i lock
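  3. Scale resources if needed: docker-compose up -d --scale n8n=2 (only works if the n8n service has no fixed host port or container_name, and n8n runs in queue mode)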
  4. Monitor improvement

Integration Failures:

  1. Verify API credentials still valid
  2. Check external service status
  3. Review integration logs
  4. Test connectivity manually
  5. Retry or escalate
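The "retry or escalate" step for integration failures can be sketched as a retry loop with a doubling delay. This is a generic pattern, not something taken from the project; the function name `retry` and its argument order are assumptions.

```shell
# retry MAX_ATTEMPTS DELAY_SECONDS COMMAND... -> runs COMMAND until it succeeds
# or MAX_ATTEMPTS is reached, doubling the delay after every failed attempt.
# Returns 0 on success and 1 when all attempts failed (time to escalate).
retry() {
  local max="$1" delay="$2" attempt=1
  shift 2
  until "$@"; do
    if [ "$attempt" -ge "$max" ]; then
      echo "giving up after $attempt attempts: $*" >&2
      return 1
    fi
    attempt=$((attempt + 1))
    sleep "$delay"
    delay=$((delay * 2))
  done
}
```

Example: `retry 3 5 docker-compose exec -T postgres pg_isready || echo "escalate to Level 2"`.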

Phase 4: Post-24 Hour Validation (T+24 hours to T+7 days)

Day 2 Validation (T+24 hours)

  • Verify AI Suggestions Working

    • Sample 10 random processed tickets
    • AI suggestions present and relevant
    • Suggestion accuracy rate > 80%
    • Processing time < 5 seconds average
    • Document findings: ________
  • Approval Workflow Performance

    • All HIGH priority tickets flagged
    • Approval response time < 2 hours
    • Approval completion rate > 95%
    • No pending approvals > 4 hours old
    • Total approvals processed: _____
    • Approval success rate: _____%
  • Baramundi Integration Validation

    • Jobs submitted successfully
    • Job queue processing normally
    • Job completion rate > 90%
    • No stuck or failed jobs
    • Total jobs processed: _____
    • Job success rate: _____%
  • Knowledge Base Growth

    • KB articles being created
    • Vector embeddings calculated
    • Query performance acceptable
    • Total KB articles: _____
    • Total embeddings: _____
    • Query response time: _____ ms
  • System Stability

    • No service crashes
    • No memory leaks
    • Disk usage normal
    • Database integrity verified
    • No orphaned records
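The percentage thresholds in the Day 2 validation (for example approval completion rate > 95% or job completion rate > 90%) can be evaluated from raw counts. A minimal sketch using `awk` for the decimal arithmetic; `rate_pct` and `meets_min` are hypothetical helper names.

```shell
# rate_pct PART TOTAL -> prints PART/TOTAL as a percentage with one decimal
# place; prints 0.0 when TOTAL is 0 to avoid a division by zero
rate_pct() {
  awk -v p="$1" -v t="$2" \
    'BEGIN { if (t == 0) { print "0.0" } else { printf "%.1f\n", 100 * p / t } }'
}

# meets_min RATE MIN -> exit status 0 if RATE is strictly greater than MIN
meets_min() {
  awk -v r="$1" -v m="$2" 'BEGIN { exit !(r > m) }'
}
```

Example: with 96 of 100 approvals completed, `meets_min "$(rate_pct 96 100)" 95 && echo "approval rate OK"` reports success.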

Day 7 Comprehensive Review (T+7 days)

  • Collect Statistics

    Email Processing:

    • Total emails processed: _____
    • Success rate: _____%
    • Average processing time: _____ seconds
    • Error rate: _____%

    AI Suggestions:

    • Total suggestions generated: _____
    • Acceptance rate: _____%
    • Average accuracy: _____%
    • Processing time p95: _____ seconds

    Approvals:

    • Total approval requests: _____
    • Total approvals completed: _____
    • Approval completion rate: _____%
    • Average response time: _____ minutes
    • HIGH priority count: _____

    Baramundi Jobs:

    • Total jobs submitted: _____
    • Total jobs completed: _____
    • Success rate: _____%
    • Failed jobs: _____

    Knowledge Base:

    • Total KB articles created: _____
    • Total articles updated: _____
    • Total searches: _____
    • Average search response: _____ ms
  • Performance Analysis

    • n8n CPU usage normal: _____ %
    • n8n Memory usage normal: _____ MB
    • PostgreSQL query time p95: _____ ms
    • Database size: _____ GB
    • Backup size: _____ GB
  • Team Feedback Collected

    • Operations team feedback: ________
    • Support team feedback: ________
    • End user feedback: ________
    • Issues encountered: ________
    • Improvement suggestions: ________
  • Issue Resolution Status

    • All critical issues resolved
    • All high priority issues resolved
    • Medium priority issues tracked
    • Minor issues documented for next release
    • Issue tracking document: __________

Go-Live Success Criteria - Final Sign-Off

All criteria must be met to declare go-live successful:

  • Stability (99% uptime minimum)

    • System remained operational for 7 consecutive days
    • Unplanned downtime < 100.8 minutes total over the 7 days (1% of 168 hours; 14.4 minutes is the 1% budget for a single day)
    • All services restarted cleanly without issues
  • Functionality (100% requirements met)

    • Mail processing working correctly
    • AI suggestions functional and accurate
    • Approval workflow operational
    • Baramundi job submission successful
    • KB updates functioning
  • Performance (Acceptable for workload)

    • Average email processing < 5 seconds
    • Average workflow execution < 10 seconds
    • Database queries < 1 second (p95)
    • No performance degradation observed
  • Data Integrity (100% accuracy)

    • All processed tickets correctly handled
    • No duplicate records
    • No data loss or corruption
    • Audit trail complete and accurate
  • Monitoring (All systems active)

    • Real-time dashboards operational
    • Alerts functioning correctly
    • Logs aggregated and searchable
    • Performance metrics recorded
  • Team Readiness (100% trained)

    • Operations team fully trained
    • Support team fully trained
    • All runbooks completed
    • On-call schedule established
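For the stability criterion, the allowed downtime follows directly from the uptime target and the length of the observation window: budget = (100 - uptime %) / 100 × window in minutes. The helper below is a small sketch of that arithmetic; `downtime_budget_min` is an illustrative name.

```shell
# downtime_budget_min UPTIME_PERCENT WINDOW_DAYS -> allowed downtime in minutes,
# printed with one decimal place
downtime_budget_min() {
  awk -v u="$1" -v d="$2" 'BEGIN { printf "%.1f\n", (100 - u) / 100 * d * 24 * 60 }'
}
```

`downtime_budget_min 99 7` prints `100.8`, and `downtime_budget_min 99 1` prints `14.4`, so the same 99% target yields very different budgets depending on whether it is measured per day or over the whole week.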

Sign-Off By:

Project Manager: _________________ Date: _______

Operations Lead: _________________ Date: _______

Technical Lead: _________________ Date: _______


Ongoing Monitoring (Post Go-Live)

Daily Checks (First 30 Days)

  • Review system health dashboard
  • Check backup completion status
  • Review error logs for new issues
  • Verify workflow execution metrics
  • Check database growth rate
  • Monitor alert frequency and relevance
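The daily backup completion check can be automated by looking for a sufficiently fresh dump file. This sketch assumes backups are `.sql` files written directly into one directory; `has_recent_backup` is a hypothetical name.

```shell
# has_recent_backup DIR HOURS -> exit status 0 if DIR directly contains at
# least one *.sql file modified within the last HOURS hours
has_recent_backup() {
  local dir="$1" hours="$2"
  find "$dir" -maxdepth 1 -type f -name '*.sql' -mmin -"$((hours * 60))" | grep -q .
}
```

Example: `has_recent_backup backups 24 || echo "no backup in the last 24 hours"`.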

Weekly Checks (Ongoing)

  • Generate performance report
  • Review all system logs
  • Verify backup restore capability
  • Update documentation as needed
  • Team retrospective meeting
  • Plan for optimization improvements

Monthly Reviews (Ongoing)

  • Comprehensive system audit
  • Capacity planning review
  • Security assessment
  • Performance optimization review
  • Team training refresher (as needed)
  • Update escalation procedures

Contacts and Escalation

Primary Contacts

Project Manager:

  • Name: _____________________
  • Phone: _____________________
  • Email: _____________________

Technical Lead:

  • Name: _____________________
  • Phone: _____________________
  • Email: _____________________

On-Call Engineer:

  • Name: _____________________
  • Phone: _____________________
  • Email: _____________________

Escalation Matrix

Level 1 - Application Issue:

  • On-call engineer
  • Response time: 15 minutes

Level 2 - System Down:

  • Technical lead + On-call engineer
  • Response time: 5 minutes

Level 3 - Critical Data Loss:

  • Technical lead + Project manager + Database admin
  • Response time: Immediate