docs: deployment and go-live documentation
docs/DEPLOYMENT.md
# Deployment Guide

## Prerequisites

Before deploying to production, ensure the following are available:

- Docker & Docker Compose (v20.0+)
- Freescout API Key (from Freescout instance admin panel)
- Baramundi API Access (with valid credentials)
- PostgreSQL Database Server or Container capability
- Git access to Gitea repository
- n8n instance access credentials
- System resources: 4 GB RAM minimum, 20 GB storage

## Architecture Overview

The deployment consists of:

- **n8n**: Workflow orchestration engine
- **PostgreSQL**: Database for audit logs and workflow state
- **Milvus**: Vector database for knowledge base embeddings
- **Freescout**: Helpdesk system (external integration)
- **Baramundi**: Asset management system (external integration)

## Deployment Steps

### 1. Clone Repository from Gitea

```bash
git clone https://git.eks-intec.de/eksadmin/n8n-compose.git
cd n8n-compose
```

### 2. Setup Environment Variables

Copy the example configuration and update with production values:

```bash
cp .env.example .env
```

Edit `.env` and configure the following:

```
# n8n Configuration
N8N_HOST=your-n8n-domain.com
N8N_PROTOCOL=https
N8N_PORT=5678
N8N_ENCRYPTION_KEY=your-secure-encryption-key

# PostgreSQL Configuration
POSTGRES_USER=n8n_user
POSTGRES_PASSWORD=your-secure-postgres-password
POSTGRES_DB=n8n_production
POSTGRES_HOST=postgres

# Freescout Integration
FREESCOUT_API_KEY=your-freescout-api-key
FREESCOUT_HOST=https://your-freescout-instance.com
FREESCOUT_ACCOUNT_ID=1

# Baramundi Integration
BARAMUNDI_API_KEY=your-baramundi-api-key
BARAMUNDI_HOST=https://your-baramundi-instance.com
BARAMUNDI_API_ENDPOINT=/api/v1

# Milvus Vector Database
MILVUS_HOST=milvus
MILVUS_PORT=19530

# Monitoring
LOG_LEVEL=info
NODE_ENV=production
```

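Before starting any services, it can help to confirm that no example value survived the copy from `.env.example`. The following is a minimal sketch and not part of the repository: the function name `check_env_file` and the selection of required keys are assumptions based on the sample above, and placeholders are detected by the `your-` prefix used there.

```bash
#!/usr/bin/env bash
# Sketch: flag .env entries that are missing or still hold example placeholders.
# The key list and the "your-" placeholder prefix are assumptions taken from
# the sample configuration; adjust both to the real .env.example.

check_env_file() {
  local env_file="$1" key value status=0
  local required="N8N_HOST N8N_ENCRYPTION_KEY POSTGRES_USER POSTGRES_PASSWORD POSTGRES_DB FREESCOUT_API_KEY FREESCOUT_HOST BARAMUNDI_API_KEY BARAMUNDI_HOST"

  for key in $required; do
    # Use the last assignment of the key, if there is one.
    value="$(grep -E "^${key}=" "$env_file" | tail -n 1 | cut -d= -f2-)"
    if [ -z "$value" ]; then
      echo "MISSING: $key"
      status=1
    elif printf '%s' "$value" | grep -q 'your-'; then
      echo "PLACEHOLDER: $key"
      status=1
    fi
  done
  return "$status"
}
```

Running `check_env_file .env` prints one line per problem and returns non-zero if any is found, so it can gate the next step.
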
### 3. Prepare Database

Initialize the PostgreSQL audit schema:

```bash
docker-compose up -d postgres
sleep 5

# Run audit schema initialization
docker exec -i n8n-postgres psql -U n8n_user -d n8n_production < sql/01-audit-schema.sql
```

Verify schema creation:

```bash
docker exec n8n-postgres psql -U n8n_user -d n8n_production -c "\dt audit.*"
```

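The fixed `sleep 5` above may be too short on a slow host. A more robust pattern polls until PostgreSQL accepts connections. This is a sketch under assumptions: the helper name `wait_until` is hypothetical, and the usage line assumes `pg_isready` is available inside the `n8n-postgres` container, as it is in the official PostgreSQL image.

```bash
#!/usr/bin/env bash
# wait_until TIMEOUT_SECONDS COMMAND [ARGS...]
# Re-runs COMMAND once per second until it succeeds or the timeout elapses.
wait_until() {
  local timeout="$1" elapsed=0
  shift
  until "$@" >/dev/null 2>&1; do
    if [ "$elapsed" -ge "$timeout" ]; then
      echo "timed out after ${timeout}s: $*" >&2
      return 1
    fi
    sleep 1
    elapsed=$((elapsed + 1))
  done
}

# Example (replaces the fixed sleep):
#   wait_until 60 docker exec n8n-postgres pg_isready -U n8n_user -d n8n_production
```

The helper returns non-zero on timeout, so a calling script can stop before the schema import instead of failing halfway through it.
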
### 4. Run Docker Services

Start all services:

```bash
docker-compose up -d
```

Verify all services are running:

```bash
docker-compose ps
```

Expected output:
```
NAME       COMMAND         STATUS
n8n        n8n start       Up (healthy)
postgres   postgres        Up (healthy)
milvus     milvus server   Up (healthy)
```

Wait for services to become healthy (typically 30-60 seconds):

```bash
docker-compose logs -f
```

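Rather than reading the `docker-compose ps` table by eye, the health column can be checked with a small filter. This sketch assumes the output resembles the expected output shown above (one header line, then one line per service containing `(healthy)` when the health check passes); the function name `unhealthy_services` is hypothetical, and other Compose versions may format the table differently.

```bash
#!/usr/bin/env bash
# Reads `docker-compose ps` output on stdin and prints the first column of
# every service line that does not contain "(healthy)".
# Returns non-zero if at least one such service is found.
unhealthy_services() {
  awk '
    NR > 1 && NF > 0 && $1 !~ /^-+$/ && $0 !~ /\(healthy\)/ { print $1; bad = 1 }
    END { exit (bad ? 1 : 0) }
  '
}

# Example:
#   docker-compose ps | unhealthy_services || echo "some services are not healthy yet"
```
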
### 5. Setup Freescout Custom Fields

The Freescout integration requires custom fields for workflow state tracking. Run the setup script:

```bash
bash scripts/setup-freescout-fields.sh
```

This script creates:
- `ai_suggestion` field: AI-generated ticket suggestions
- `approval_status` field: Approval workflow state (pending/approved/rejected)
- `kb_reference` field: Knowledge base article references

Verify fields in Freescout Admin > Custom Fields.

### 6. Import n8n Workflows

Access n8n at `https://your-n8n-domain.com:5678`

#### Option A: Manual Import via UI

1. Open n8n Dashboard
2. Click "Import" button
3. Select workflow JSON file from `n8n-workflows/` directory
4. Confirm and save

#### Option B: Command Line Import

Import the workflows in the following order. `N8N_API_KEY` must hold a valid n8n API key:

1. **Workflow A: Mail Processing**
   ```bash
   curl -X POST http://localhost:5678/api/v1/workflows \
     -H "X-N8N-API-KEY: $N8N_API_KEY" \
     -H "Content-Type: application/json" \
     -d @n8n-workflows/workflow-a-mail-processing.json
   ```

2. **Workflow B: Approval Execution**
   ```bash
   curl -X POST http://localhost:5678/api/v1/workflows \
     -H "X-N8N-API-KEY: $N8N_API_KEY" \
     -H "Content-Type: application/json" \
     -d @n8n-workflows/workflow-b-approval-execution.json
   ```

3. **Workflow C: Knowledge Base Update**
   ```bash
   curl -X POST http://localhost:5678/api/v1/workflows \
     -H "X-N8N-API-KEY: $N8N_API_KEY" \
     -H "Content-Type: application/json" \
     -d @n8n-workflows/workflow-c-kb-update.json
   ```

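The three calls in Option B can be wrapped in a loop that preserves the order and stops at the first failure. The following is a sketch, not the repository's `scripts/import-workflows.sh`: the function name `import_workflows`, the `N8N_BASE_URL` override, and the `DRY_RUN` switch are assumptions introduced for illustration.

```bash
#!/usr/bin/env bash
# Imports the three workflow files in the documented order (A, B, C) and
# stops at the first failure. With DRY_RUN=1 the requests are only printed.
# N8N_BASE_URL and DRY_RUN are hypothetical variables, not part of the repo.
import_workflows() {
  local base_url="${N8N_BASE_URL:-http://localhost:5678}" file
  for file in workflow-a-mail-processing workflow-b-approval-execution workflow-c-kb-update; do
    if [ "${DRY_RUN:-0}" = "1" ]; then
      echo "POST ${base_url}/api/v1/workflows <- n8n-workflows/${file}.json"
      continue
    fi
    # --fail makes curl return non-zero on HTTP 4xx/5xx responses.
    curl --fail --silent --show-error -X POST "${base_url}/api/v1/workflows" \
      -H "X-N8N-API-KEY: ${N8N_API_KEY}" \
      -H "Content-Type: application/json" \
      -d @"n8n-workflows/${file}.json" >/dev/null || {
        echo "import failed: ${file}.json" >&2
        return 1
      }
    echo "imported: ${file}.json"
  done
}
```

Calling `DRY_RUN=1 import_workflows` first shows which requests would be sent without touching the n8n instance.
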
### 7. Run E2E Tests

Execute the comprehensive test suite:

```bash
bash tests/curl-test-collection.sh
```

Expected test results:
- Mail processing flow: PASS
- Approval workflow: PASS
- KB update cycle: PASS
- Baramundi integration: PASS
- Error handling: PASS

All tests must pass before proceeding to production activation.

### 8. Enable Workflows in Production

Once tests pass, activate all workflows:

1. Open n8n Dashboard
2. For each workflow (A, B, C):
   - Click workflow
   - Toggle "Active" switch ON
   - Confirm activation

3. Verify in logs:
   ```bash
   docker-compose logs -f n8n | grep "Workflow"
   ```

Expected log output:
```
n8n | [INFO] Workflow A (Mail Processing) activated
n8n | [INFO] Workflow B (Approval Execution) activated
n8n | [INFO] Workflow C (KB Update) activated
```

## Monitoring and Verification

### Health Checks

Monitor service health:

```bash
docker-compose ps
docker-compose logs -f n8n
docker-compose logs -f postgres
```

### Database Verification

Check PostgreSQL audit tables:

```bash
docker exec n8n-postgres psql -U n8n_user -d n8n_production -c \
  "SELECT COUNT(*) FROM audit.workflow_executions;"
```

### Workflow Execution Logs

View the ten most recent workflow executions:

```bash
# Quote the URL so the shell does not interpret the "?"
curl -H "X-N8N-API-KEY: $N8N_API_KEY" \
  "http://localhost:5678/api/v1/executions?limit=10"
```

For detailed monitoring setup, see [docs/MONITORING.md](MONITORING.md).

## Rollback Procedure

In case of critical issues, follow these rollback steps:

### Immediate Action (Within 5 minutes)

1. **Deactivate all workflows**
   ```bash
   # Mark every workflow inactive, then restart n8n so the change takes effect
   docker-compose exec -T n8n n8n update:workflow --all --active=false
   docker-compose restart n8n
   ```

2. **Verify no ongoing operations**
   ```bash
   docker-compose logs -f n8n | grep "execution"
   ```

3. **Notify stakeholders**
   - Send alert to team
   - Pause Freescout ticket processing
   - Document incident time

### Rollback Steps (15-30 minutes)

1. **Option A: Revert to Last Known Good State**
   ```bash
   # Stop services
   docker-compose down

   # Restore database from backup (psql runs inside the postgres container)
   docker-compose up -d postgres
   sleep 10
   docker exec -i n8n-postgres psql -U n8n_user -d n8n_production < backups/pre-golive-backup.sql

   # Start other services
   docker-compose up -d n8n milvus
   ```

2. **Option B: Complete Reset (if corruption suspected)**
   ```bash
   # Remove all containers and data
   docker-compose down -v

   # Restore from last full backup (psql runs inside the postgres container)
   docker-compose up -d postgres
   sleep 10
   docker exec -i n8n-postgres psql -U n8n_user -d n8n_production < backups/full-restore.sql

   # Restart all services
   docker-compose up -d

   # Re-import workflows
   bash scripts/import-workflows.sh
   ```

3. **Verify rollback success**
   ```bash
   docker-compose ps
   # n8n exposes its health check at /healthz
   curl http://localhost:5678/healthz
   ```

### Data Integrity Checks

```bash
# Check audit logs consistency
docker exec n8n-postgres psql -U n8n_user -d n8n_production -c \
  "SELECT status, COUNT(*) FROM audit.workflow_executions GROUP BY status;"

# Verify no orphaned records
docker exec n8n-postgres psql -U n8n_user -d n8n_production -c \
  "SELECT COUNT(*) FROM audit.workflow_executions WHERE workflow_id NOT IN
   (SELECT id FROM audit.workflows);"
```

### Communication

After successful rollback:
1. Send all-clear notification
2. Schedule post-mortem meeting
3. Document root cause
4. Plan remediation

## Troubleshooting

### Service Startup Issues

**n8n won't start:**
```bash
docker-compose logs n8n
# Options are placed before the service name so that older docker-compose versions accept them
docker-compose up -d --no-deps --build n8n
```

**PostgreSQL connection fails:**
```bash
docker-compose exec postgres pg_isready
docker-compose logs postgres
```

**Milvus vector DB issues:**
```bash
docker-compose logs milvus
docker-compose restart milvus
```

### Integration Issues

**Freescout API authentication fails:**
- Verify API key in `.env`
- Check Freescout API endpoint accessibility
- Test with curl: `curl -H "Authorization: Bearer $FREESCOUT_API_KEY" https://freescout-host/api/v1/accounts`

**Baramundi connection problems:**
- Verify credentials and endpoint
- Check network connectivity
- Test API: `curl -H "Authorization: Bearer $BARAMUNDI_API_KEY" https://baramundi-host/api/v1/health`

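When one of the curl tests above fails, the HTTP status code usually narrows down the cause. The sketch below is an illustration only: the function names `diagnose_status` and `probe` are hypothetical, and the mapping from status code to cause is a rule of thumb, not behavior documented by Freescout or Baramundi.

```bash
#!/usr/bin/env bash
# Maps an HTTP status code (as printed by curl's %{http_code}) to a likely cause.
# curl prints 000 when no HTTP response was received at all.
diagnose_status() {
  case "$1" in
    000)      echo "unreachable: check DNS, firewall, and TLS" ;;
    2??)      echo "ok" ;;
    401|403)  echo "auth: check the API key in .env" ;;
    404)      echo "not found: check the host and endpoint path" ;;
    5??)      echo "server error: check the remote service" ;;
    *)        echo "unexpected status $1" ;;
  esac
}

# probe [CURL_ARGS...] URL
# Prints only the HTTP status code of the request.
probe() {
  curl --silent --output /dev/null --max-time 10 --write-out '%{http_code}' "$@"
}

# Example:
#   diagnose_status "$(probe -H "Authorization: Bearer $FREESCOUT_API_KEY" \
#     "$FREESCOUT_HOST/api/v1/accounts")"
```
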
### Performance Issues

**High CPU/Memory usage:**
```bash
docker stats
docker-compose logs -f n8n | grep "memory\|cpu"
```

**Slow database queries:**
```bash
# Enable query logging
docker-compose exec postgres psql -U n8n_user -d n8n_production -c \
  "ALTER SYSTEM SET log_statement = 'all';"
docker-compose restart postgres
```

## Support and Documentation

- n8n Documentation: https://docs.n8n.io
- Docker Compose Reference: https://docs.docker.com/compose
- PostgreSQL Admin Guide: https://www.postgresql.org/docs/current/admin.html
- Additional setup details: See [docs/ARCHITECTURE.md](ARCHITECTURE.md)

## Deployment Checklist

- [ ] All prerequisites installed and configured
- [ ] Repository cloned successfully
- [ ] `.env` file configured with production values
- [ ] PostgreSQL initialized with audit schema
- [ ] All Docker services running and healthy
- [ ] Freescout custom fields created
- [ ] All n8n workflows imported
- [ ] E2E test suite passed (100%)
- [ ] All workflows activated
- [ ] Monitoring configured and active
- [ ] Backup strategy in place
- [ ] Team trained on deployment and rollback

Once all items are checked, the system is ready for Go-Live.

docs/GO-LIVE-CHECKLIST.md
# Go-Live Checklist

## Overview

This checklist ensures a smooth transition from staging to production. Follow all phases sequentially: Pre-Launch, Go-Live Day, Launch Period, and Post-Launch Monitoring.

---

## Phase 1: One Week Before Go-Live

Timeline: T-7 days

### Pre-Deployment Verification

- [ ] **E2E Tests Passed (100%)**
  - All workflow tests successful: `bash tests/curl-test-collection.sh`
  - No critical bugs or failures
  - Test results documented

- [ ] **Staging Environment Verified**
  - Staging deployment identical to production
  - Run full load test (simulate 100+ concurrent tickets)
  - Verify integration with TEST Freescout account
  - Verify integration with TEST Baramundi account
  - All workflows processing correctly in staging

- [ ] **Production Database Setup**
  - PostgreSQL audit schema initialized
  - Backup strategy configured and tested
  - Database performance baseline recorded
  - Disk space verified (minimum 20 GB available)

- [ ] **API Credentials Verified**
  - Freescout API key tested and active
  - Freescout custom fields created and working
  - Baramundi API key tested and active
  - n8n encryption key generated and secure

### Team Readiness

- [ ] **Team Training Completed**
  - Operations team trained on:
    - System architecture overview
    - Deployment and rollback procedures
    - Monitoring dashboard usage
    - Alert response procedures
    - Escalation paths
  - Support team trained on:
    - Workflow functionality overview
    - Expected behavior and timing
    - How to verify system health
    - When to escalate issues

- [ ] **Documentation Review**
  - All team members reviewed DEPLOYMENT.md
  - All team members reviewed MONITORING.md
  - Runbooks reviewed and acknowledged
  - Contact list updated (on-call schedule)

- [ ] **Backup Strategy Finalized**
  - Daily backup schedule defined
  - Backup retention policy set (7 days minimum)
  - Backup restore procedure tested
  - Backup storage verified (separate location from production)

### Risk Mitigation

- [ ] **Rollback Plan Confirmed**
  - Rollback procedures documented
  - Rollback tested in staging environment
  - Estimated rollback time: < 30 minutes
  - All team members trained on rollback

- [ ] **Communication Plan Ready**
  - Stakeholder notification list prepared
  - Status page update process defined
  - Internal update frequency established (30-minute intervals initially)
  - Escalation contacts verified

- [ ] **Monitoring & Alerting**
  - All monitoring dashboards configured
  - Alert recipients confirmed
  - Alert thresholds set and validated
  - On-call rotation established

---

## Phase 2: Go-Live Day

Timeline: T-0 (Launch day)

### Pre-Launch Checks (T-2 hours)

- [ ] **Final System Status**
  - All Docker services running and healthy
  - `docker-compose ps` output verified
  - All services show "Up (healthy)"
  - No services in "Restarting" state

- [ ] **Service Health Verification**
  - n8n health check: `curl http://localhost:5678/healthz`
  - PostgreSQL connection: `docker-compose exec postgres pg_isready`
  - Milvus connectivity: Vector DB responding
  - External integrations reachable (Freescout, Baramundi)

- [ ] **Database Integrity**
  - Audit schema verified: `SELECT COUNT(*) FROM audit.workflows;`
  - No corruption or errors in logs
  - Backup created and verified: `ls -lh backups/`
  - Backup restore tested

- [ ] **n8n Workflows Status**
  - All 3 workflows imported successfully
  - Workflow A (Mail Processing): Ready
  - Workflow B (Approval Execution): Ready
  - Workflow C (KB Update): Ready
  - All workflows set to Inactive (will activate after final check)

- [ ] **Monitoring System Active**
  - Monitoring dashboard accessible
  - All metric collectors running
  - Alert system armed and tested
  - Log aggregation working (`docker-compose logs` verified)

- [ ] **Final Pre-Launch Meeting**
  - All team members present and ready
  - Roles and responsibilities confirmed:
    - Platform Lead: Overall coordination
    - n8n Administrator: Workflow management
    - Database Administrator: Database monitoring
    - System Administrator: Infrastructure monitoring
    - Support Lead: User support readiness
  - Communication channels verified (Slack, phone, etc.)

### Launch Window (T-0 hours)

- [ ] **Final Backup (T-15 minutes)**
  - Backup created immediately before activation
  - Backup file verified and tested
  - Backup location: `backups/pre-golive-backup-$(date +%Y%m%d-%H%M%S).sql`

- [ ] **Activate Workflows (T-0 minutes)**
  - n8n Dashboard accessed
  - Workflow A (Mail Processing) activated:
    - Toggle "Active" switch ON
    - Verify activation confirmed in UI
    - Check logs: `docker-compose logs -f n8n | grep "Workflow A"`
  - Workflow B (Approval Execution) activated
  - Workflow C (KB Update) activated
  - All three workflows showing "Active" status

- [ ] **Launch Announcement**
  - Internal team notified: "System is LIVE"
  - Stakeholders notified of go-live
  - Status page updated: "System operational"
  - Time of launch recorded: __________

- [ ] **Confirm System Accepting Requests**
  - Send test email to Freescout inbox
  - Verify ticket created in Freescout
  - Verify n8n workflow triggered (check logs)
  - Verify workflow execution started

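For the Final Backup item in the list above, the timestamped file name and a basic sanity check can be scripted. This is a sketch under assumptions: the function names `backup_path` and `verify_backup` are hypothetical, the dump is assumed to be a plain-format `pg_dump`, and the check only confirms that the dump ran to completion, not that it restores cleanly.

```bash
#!/usr/bin/env bash
# Builds a timestamped backup path matching the pattern used in the checklist.
backup_path() {
  echo "backups/pre-golive-backup-$(date +%Y%m%d-%H%M%S).sql"
}

# Succeeds only if the file is non-empty and its last lines contain the
# "PostgreSQL database dump complete" marker written by plain-format pg_dump.
verify_backup() {
  local file="$1"
  [ -s "$file" ] && tail -n 10 "$file" | grep -q 'PostgreSQL database dump complete'
}

# Example:
#   file="$(backup_path)"
#   docker exec n8n-postgres pg_dump -U n8n_user n8n_production > "$file" \
#     && verify_backup "$file" && echo "backup ok: $file"
```

A restore into a scratch database remains the only real proof that a backup is usable, which is why the checklist asks for the backup to be tested as well.
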
---

## Phase 3: Launch Period Monitoring (First 24 Hours)

Timeline: T+0 to T+24 hours

### Continuous Monitoring (Every 15 minutes)

- [ ] **n8n Workflow Execution**
  - Command: `docker-compose logs --tail=50 n8n`
  - Check for:
    - No error messages
    - Workflows executing successfully
    - No hung or stuck executions
  - Log location: `/d/n8n-compose/logs/n8n.log`

- [ ] **Freescout Integration**
  - New tickets arriving in system
  - Custom fields populated correctly
  - No integration errors in Freescout logs
  - Ticket processing speed acceptable

- [ ] **Baramundi Job Queue**
  - Check job queue status
  - Verify jobs accepted from n8n
  - Monitor job completion rate
  - Check for failed jobs

- [ ] **Alert System**
  - All critical alerts functioning
  - No false positive alerts
  - Escalation procedures working
  - On-call team responsive

- [ ] **Database Performance**
  - Query performance acceptable
  - No locks or deadlocks
  - Disk space usage normal
  - Command (requires the `pg_stat_statements` extension; column names as of PostgreSQL 13): `docker-compose exec postgres psql -U n8n_user -d n8n_production -c "SELECT query, calls, mean_exec_time FROM pg_stat_statements ORDER BY mean_exec_time DESC LIMIT 10;"`

### Hourly System Status Report (First 6 hours)

Document every hour:

**Hour 1 (T+1h)**
- Total tickets processed: _____
- Total workflows executed: _____
- Failed executions: _____
- System health: [ ] Green [ ] Yellow [ ] Red
- Issues encountered: _____

**Hour 2 (T+2h)**
- Total tickets processed: _____
- Total workflows executed: _____
- Failed executions: _____
- System health: [ ] Green [ ] Yellow [ ] Red
- Issues encountered: _____

**Hours 3-6 (T+3h to T+6h)**
- Repeat above for each hour
- Escalate any issues immediately
- Document all changes or interventions

### Functional Validation (T+2 hours and T+12 hours)

**After 2 hours:**
- [ ] **AI Suggestions Displayed**
  - Sample processed tickets show AI suggestions
  - Suggestion accuracy acceptable
  - Custom field updated with `ai_suggestion`
  - Performance acceptable (< 5 second processing time)

- [ ] **Approval Workflow Operating**
  - HIGH priority tickets flagged for approval
  - Approval custom field populated
  - Notifications sent to approvers
  - Approvals received and reflected in system

- [ ] **Knowledge Base Updates**
  - KB articles being created/updated
  - Vector embeddings generated (Milvus)
  - PostgreSQL KB table growing
  - Query: `SELECT COUNT(*) FROM audit.kb_articles;`

**After 12 hours (overnight validation):**
- [ ] **Validate Overnight Processing**
  - All workflows executed correctly overnight
  - No race conditions or deadlocks occurred
  - Database backups completed successfully
  - All alerts functioned as expected

### Critical Metrics (Monitor Continuously)

```bash
# Check n8n workflow execution rate
# (the n8n public API returns the list of executions under the "data" key)
curl -s -H "X-N8N-API-KEY: $N8N_API_KEY" \
  "http://localhost:5678/api/v1/executions?limit=100" | jq '.data | length'

# Check database growth
docker exec n8n-postgres psql -U n8n_user -d n8n_production -c \
  "SELECT schemaname, tablename, pg_size_pretty(pg_total_relation_size(schemaname||'.'||tablename))
   FROM pg_tables ORDER BY pg_total_relation_size(schemaname||'.'||tablename) DESC LIMIT 10;"

# Monitor CPU/Memory
docker stats --no-stream
```

### Incident Response (If Issues Occur)

**Critical Issue (System Down):**
1. Immediately notify team lead
2. Assess severity and scope
3. Execute rollback if necessary (see DEPLOYMENT.md)
4. Document incident details
5. Begin root cause analysis

**Performance Degradation:**
1. Check system resources: `docker stats`
2. Check database locks: `docker-compose logs postgres | grep LOCK`
3. Scale resources if needed: `docker-compose up -d --scale n8n=2`
4. Monitor improvement

**Integration Failures:**
1. Verify API credentials still valid
2. Check external service status
3. Review integration logs
4. Test connectivity manually
5. Retry or escalate

---

## Phase 4: Post-24 Hour Validation (T+24 to T+7 days)

### Day 2 Validation (T+24 hours)

- [ ] **Verify AI Suggestions Working**
  - Sample 10 random processed tickets
  - AI suggestions present and relevant
  - Suggestion accuracy rate > 80%
  - Processing time < 5 seconds average
  - Document findings: ________

- [ ] **Approval Workflow Performance**
  - [ ] All HIGH priority tickets flagged
  - [ ] Approval response time < 2 hours
  - [ ] Approval completion rate > 95%
  - [ ] No pending approvals > 4 hours old
  - Total approvals processed: _____
  - Approval success rate: _____%

- [ ] **Baramundi Integration Validation**
  - [ ] Jobs submitted successfully
  - [ ] Job queue processing normally
  - [ ] Job completion rate > 90%
  - [ ] No stuck or failed jobs
  - Total jobs processed: _____
  - Job success rate: _____%

- [ ] **Knowledge Base Growth**
  - [ ] KB articles being created
  - [ ] Vector embeddings calculated
  - [ ] Query performance acceptable
  - Total KB articles: _____
  - Total embeddings: _____
  - Query response time: _____ ms

- [ ] **System Stability**
  - [ ] No service crashes
  - [ ] No memory leaks
  - [ ] Disk usage normal
  - [ ] Database integrity verified
  - [ ] No orphaned records

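The percentage thresholds in the Day 2 list (approval completion above 95%, job completion above 90%, suggestion accuracy above 80%) can be checked from raw counts. The helper below is a sketch; the name `rate_meets` is hypothetical, and the counts themselves have to come from Freescout, Baramundi, or the audit tables.

```bash
#!/usr/bin/env bash
# rate_meets COMPLETED TOTAL THRESHOLD_PERCENT
# Prints the completion rate with one decimal place and succeeds only if the
# rate is strictly above the threshold. A total of 0 prints "n/a" and fails.
rate_meets() {
  awk -v c="$1" -v t="$2" -v min="$3" 'BEGIN {
    if (t == 0) { print "n/a"; exit 1 }
    rate = 100 * c / t
    printf "%.1f%%\n", rate
    exit (rate > min) ? 0 : 1
  }'
}

# Example: 188 of 200 Baramundi jobs completed, threshold 90%
#   rate_meets 188 200 90 && echo "job completion rate OK"
```
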
### Day 7 Comprehensive Review (T+7 days)

- [ ] **Collect Statistics**

  **Email Processing:**
  - Total emails processed: _____
  - Success rate: _____%
  - Average processing time: _____ seconds
  - Error rate: _____%

  **AI Suggestions:**
  - Total suggestions generated: _____
  - Acceptance rate: _____%
  - Average accuracy: _____%
  - Processing time p95: _____ seconds

  **Approvals:**
  - Total approval requests: _____
  - Total approvals completed: _____
  - Approval completion rate: _____%
  - Average response time: _____ minutes
  - HIGH priority count: _____

  **Baramundi Jobs:**
  - Total jobs submitted: _____
  - Total jobs completed: _____
  - Success rate: _____%
  - Failed jobs: _____

  **Knowledge Base:**
  - Total KB articles created: _____
  - Total articles updated: _____
  - Total searches: _____
  - Average search response: _____ ms

- [ ] **Performance Analysis**
  - [ ] n8n CPU usage normal: _____ %
  - [ ] n8n Memory usage normal: _____ MB
  - [ ] PostgreSQL query time p95: _____ ms
  - [ ] Database size: _____ GB
  - [ ] Backup size: _____ GB

- [ ] **Team Feedback Collected**
  - [ ] Operations team feedback: ________
  - [ ] Support team feedback: ________
  - [ ] End user feedback: ________
  - [ ] Issues encountered: ________
  - [ ] Improvement suggestions: ________

- [ ] **Issue Resolution Status**
  - [ ] All critical issues resolved
  - [ ] All high priority issues resolved
  - [ ] Medium priority issues tracked
  - [ ] Minor issues documented for next release
  - Issue tracking document: __________

### Go-Live Success Criteria - Final Sign-Off

All criteria must be met to declare go-live successful:

- [ ] **Stability (99% uptime minimum)**
  - System remained operational for 7 consecutive days
  - Unplanned downtime < 100.8 minutes total over the 7 days (1% of 10,080 minutes, or 14.4 minutes per day on average)
  - All services restarted cleanly without issues

- [ ] **Functionality (100% requirements met)**
  - Mail processing working correctly
  - AI suggestions functional and accurate
  - Approval workflow operational
  - Baramundi job submission successful
  - KB updates functioning

- [ ] **Performance (Acceptable for workload)**
  - Average email processing < 5 seconds
  - Average workflow execution < 10 seconds
  - Database queries < 1 second (p95)
  - No performance degradation observed

- [ ] **Data Integrity (100% accuracy)**
  - All processed tickets correctly handled
  - No duplicate records
  - No data loss or corruption
  - Audit trail complete and accurate

- [ ] **Monitoring (All systems active)**
  - Real-time dashboards operational
  - Alerts functioning correctly
  - Logs aggregated and searchable
  - Performance metrics recorded

- [ ] **Team Readiness (100% trained)**
  - Operations team fully trained
  - Support team fully trained
  - All runbooks completed
  - On-call schedule established

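The downtime budget behind the Stability criterion follows directly from the window length and the uptime target: a 99% target over 7 days (10,080 minutes) allows 100.8 minutes, while 14.4 minutes is the allowance for a single day. The helper below is a small sketch for recomputing the budget; the name `downtime_budget_minutes` is hypothetical.

```bash
#!/usr/bin/env bash
# downtime_budget_minutes DAYS UPTIME_PERCENT
# Prints the allowed downtime in minutes for a window of DAYS days.
# awk is used because the shell has no built-in decimal arithmetic.
downtime_budget_minutes() {
  local days="$1" uptime_percent="$2"
  awk -v d="$days" -v u="$uptime_percent" \
    'BEGIN { printf "%.1f\n", d * 24 * 60 * (100 - u) / 100 }'
}

# Example:
#   downtime_budget_minutes 7 99   # 7-day window at 99% uptime
#   downtime_budget_minutes 1 99   # single day at 99% uptime
```
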
**Sign-Off By:**

Project Manager: _________________ Date: _______

Operations Lead: _________________ Date: _______

Technical Lead: _________________ Date: _______

---

## Ongoing Monitoring (Post Go-Live)

### Daily Checks (First 30 Days)

- [ ] Review system health dashboard
- [ ] Check backup completion status
- [ ] Review error logs for new issues
- [ ] Verify workflow execution metrics
- [ ] Check database growth rate
- [ ] Monitor alert frequency and relevance

### Weekly Checks (Ongoing)

- [ ] Generate performance report
- [ ] Review all system logs
- [ ] Verify backup restore capability
- [ ] Update documentation as needed
- [ ] Team retrospective meeting
- [ ] Plan for optimization improvements

### Monthly Reviews (Ongoing)

- [ ] Comprehensive system audit
- [ ] Capacity planning review
- [ ] Security assessment
- [ ] Performance optimization review
- [ ] Team training refresher (as needed)
- [ ] Update escalation procedures

---

## Contacts and Escalation

### Primary Contacts

**Project Manager:**
- Name: _____________________
- Phone: _____________________
- Email: _____________________

**Technical Lead:**
- Name: _____________________
- Phone: _____________________
- Email: _____________________

**On-Call Engineer:**
- Name: _____________________
- Phone: _____________________
- Email: _____________________

### Escalation Matrix

**Level 1 - Application Issue:**
- On-call engineer
- Response time: 15 minutes

**Level 2 - System Down:**
- Technical lead + On-call engineer
- Response time: 5 minutes

**Level 3 - Critical Data Loss:**
- Technical lead + Project manager + Database admin
- Response time: Immediate

---

## Related Documentation

- [DEPLOYMENT.md](DEPLOYMENT.md) - Deployment procedures and rollback
- [MONITORING.md](MONITORING.md) - Monitoring dashboard and alerts
- [ARCHITECTURE.md](ARCHITECTURE.md) - System architecture details
- [TROUBLESHOOTING.md](TROUBLESHOOTING.md) - Common issues and solutions