# Monitoring & Logging Setup

## Overview

This document provides comprehensive monitoring and logging guidelines for the n8n AI Support Automation system. It includes key metrics, troubleshooting procedures, and log inspection commands.

## Key Metrics

### 1. Mail Processing Rate (Workflow A)

**Description:** Track the number of conversations processed through the system.

**n8n Logs:**
```bash
docker-compose logs -f n8n | grep "processed"
```

**PostgreSQL Query:**
```sql
SELECT COUNT(*) as total_executions,
       COUNT(CASE WHEN status = 'success' THEN 1 END) as successful_executions,
       -- NULLIF avoids a division-by-zero error when no executions exist yet
       ROUND(100.0 * COUNT(CASE WHEN status = 'success' THEN 1 END) / NULLIF(COUNT(*), 0), 2) as success_rate
FROM workflow_executions
WHERE workflow_name = 'workflow-a';
```

**Expected Behavior:**
- Consistent processing rate (depends on Freescout mail polling interval)
- Success rate > 95%
- Monitor for sudden drops in processing rate

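
The "sudden drop" check can be automated. The following is a minimal Python sketch, not part of the existing tooling: it assumes per-hour execution counts for workflow-a are already collected (for example from `workflow_executions`). The function name `processing_rate_dropped` is hypothetical, and the 50% default mirrors the "Drop > 50%" alert threshold used in this document.

```python
from statistics import mean


def processing_rate_dropped(hourly_counts, drop_threshold=0.5):
    """Return True if the latest count fell more than drop_threshold below the trailing average.

    hourly_counts is an oldest-to-newest list of executions per hour (hypothetical input).
    """
    if len(hourly_counts) < 2:
        return False  # not enough history to compare against
    *history, latest = hourly_counts
    baseline = mean(history)
    if baseline == 0:
        return False  # nothing was being processed before, so there is no drop
    return (baseline - latest) / baseline > drop_threshold


# Baseline is 11 per hour; the latest hour processed only 4 conversations.
print(processing_rate_dropped([12, 10, 11, 4]))
```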
---

### 2. Approval Rate (Workflow B)

**Description:** Monitor the ratio of approved vs rejected KB updates from the AI suggestions.

**PostgreSQL Query:**
```sql
SELECT status, COUNT(*) as count,
       ROUND(100.0 * COUNT(*) / SUM(COUNT(*)) OVER (), 2) as percentage
FROM knowledge_base_updates
GROUP BY status
ORDER BY count DESC;
```

**Alternative Query for detailed breakdown:**
```sql
SELECT
    status,
    COUNT(*) as count,
    AVG(EXTRACT(EPOCH FROM (updated_at - created_at))) as avg_approval_time_seconds
FROM knowledge_base_updates
GROUP BY status;
```

**Expected Behavior:**
- Majority of updates should be APPROVED (typically 70-90%)
- REJECTED rate should be < 15%
- PENDING updates should be resolved within 24 hours

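
As a rough illustration, the status counts returned by the approval query could be checked against these expectations in a few lines of Python. This is only a sketch: the function name `check_approval_mix` and its parameters are hypothetical, and the 70% lower bound is taken from the "typically 70-90%" range above rather than from any enforced limit.

```python
def check_approval_mix(status_counts, min_approved_pct=70.0, max_rejected_pct=15.0):
    """Return a list of warning strings; an empty list means the mix looks healthy.

    status_counts maps a status name (APPROVED, REJECTED, PENDING) to its row count.
    """
    total = sum(status_counts.values())
    if total == 0:
        return []  # no updates yet, nothing to evaluate
    approved_pct = 100.0 * status_counts.get("APPROVED", 0) / total
    rejected_pct = 100.0 * status_counts.get("REJECTED", 0) / total
    warnings = []
    if approved_pct < min_approved_pct:
        warnings.append(f"approved share {approved_pct:.1f}% is below {min_approved_pct:.0f}%")
    if rejected_pct >= max_rejected_pct:
        warnings.append(f"rejected share {rejected_pct:.1f}% is not below {max_rejected_pct:.0f}%")
    return warnings


# Example counts as they might come back from the GROUP BY status query.
print(check_approval_mix({"APPROVED": 80, "REJECTED": 10, "PENDING": 10}))
```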
---

### 3. KB Growth (Workflow C)

**Description:** Track the growth of the knowledge base as new information is added.

**Milvus Query:**
```bash
# First, connect to Milvus
# (assumes python3 and pymilvus are available inside the milvus container)
docker-compose exec milvus python3 -c "
from pymilvus import connections, Collection

connections.connect('default', host='localhost', port=19530)
collection = Collection('knowledge_base')
print(f'Total vectors: {collection.num_entities}')
"
```

**PostgreSQL Query for tracking:**
```sql
SELECT COUNT(*) as total_entries,
       COUNT(DISTINCT source) as unique_sources,
       MAX(created_at) as latest_entry
FROM knowledge_base
WHERE status = 'approved';
```

**Daily Growth Query:**
```sql
SELECT DATE(created_at) as date, COUNT(*) as entries_added
FROM knowledge_base
WHERE status = 'approved'
GROUP BY DATE(created_at)
ORDER BY date DESC
LIMIT 30;
```

**Expected Behavior:**
- +1 vector per approved ticket (approximately)
- Steady growth correlates with approved KB updates
- Monitor for stalled growth (may indicate Milvus issues)

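
One way to watch for stalled growth is to compare the `latest_entry` timestamp from the tracking query with the current time. The Python sketch below is illustrative only: `kb_growth_stalled` is a hypothetical helper, it assumes timezone-aware timestamps, and the 4-hour window is borrowed from the "KB Growth Stalled" row of the Alert Thresholds table in this document.

```python
from datetime import datetime, timedelta, timezone


def kb_growth_stalled(latest_entry, now=None, max_gap=timedelta(hours=4)):
    """Return True if no approved entry has been added within max_gap.

    latest_entry and now must both be timezone-aware datetimes.
    """
    if now is None:
        now = datetime.now(timezone.utc)
    return now - latest_entry > max_gap


# Fixed timestamps so the example is reproducible: the last entry is 5 hours old.
now = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
latest = datetime(2024, 1, 1, 7, 0, tzinfo=timezone.utc)
print(kb_growth_stalled(latest, now=now))
```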
---

### 4. Error Rate

**Description:** Monitor workflow execution errors across all workflows.

**PostgreSQL Query - Overall Error Rate:**
```sql
SELECT
    COUNT(*) as total_executions,
    COUNT(CASE WHEN status = 'ERROR' THEN 1 END) as error_count,
    -- NULLIF avoids a division-by-zero error when the table is empty
    ROUND(100.0 * COUNT(CASE WHEN status = 'ERROR' THEN 1 END) / NULLIF(COUNT(*), 0), 2) as error_percentage
FROM workflow_executions;
```

**Detailed Error Analysis:**
```sql
SELECT
    workflow_name,
    status,
    COUNT(*) as count,
    ROUND(100.0 * COUNT(*) / SUM(COUNT(*)) OVER (PARTITION BY workflow_name), 2) as percentage
FROM workflow_executions
GROUP BY workflow_name, status
-- Sort by the "count" alias; this query defines no error_count column
ORDER BY workflow_name, count DESC;
```

**Error Details for Investigation:**
```sql
SELECT
    workflow_name,
    status,
    error_message,
    COUNT(*) as occurrences,
    MAX(executed_at) as latest_error
FROM workflow_executions
WHERE status = 'ERROR'
GROUP BY workflow_name, status, error_message
ORDER BY occurrences DESC;
```

**Expected Behavior:**
- Error rate < 5%
- No recurring errors (a recurring error indicates a systemic issue)
- Quick recovery from transient errors

---

## Troubleshooting Guide

### Workflow A (Mail Processing) - Not Running

**Symptoms:**
- No new conversations being processed
- n8n logs show no activity
- PostgreSQL query returns unchanged row count

**Troubleshooting Steps:**

1. **Check if workflow trigger is active:**
   ```bash
   docker-compose logs -f n8n | grep "workflow-a"
   ```

2. **Verify Cron trigger configuration:**
   - Log into n8n UI at `https://<SUBDOMAIN>.<DOMAIN>`
   - Navigate to workflow-a
   - Check cron expression (typically: `0 */5 * * * *` for every 5 minutes)
   - Verify "Active" toggle is ON

3. **Test Freescout API credentials:**
   ```bash
   docker-compose exec n8n curl -X GET \
     -H "Authorization: Bearer ${FREESCOUT_API_TOKEN}" \
     https://<freescout-instance>/api/v1/conversations
   ```

4. **Check Freescout API reachability:**
   ```bash
   # -c 3 sends three packets and exits instead of pinging indefinitely
   docker-compose exec n8n ping -c 3 <freescout-instance>
   docker-compose exec n8n curl -I https://<freescout-instance>/api/v1/health
   ```

5. **Review n8n logs for errors:**
   ```bash
   docker-compose logs n8n | grep -i "error\|exception" | tail -20
   ```

6. **Verify PostgreSQL connection:**
   ```bash
   docker-compose logs n8n | grep -i "database\|postgres"
   ```

---

### Workflow B (AI Suggestions) - Not Triggering

**Symptoms:**
- No new AI suggestions in Freescout
- workflow_executions table shows no recent B entries
- knowledge_base_updates status stuck in PENDING

**Troubleshooting Steps:**

1. **Check if Freescout custom field is being updated:**
   ```sql
   SELECT * FROM freescout_conversation_custom_fields
   WHERE field_name = 'AI_SUGGESTION_STATUS'
   ORDER BY updated_at DESC
   LIMIT 10;
   ```

2. **Verify polling interval:**
   - Check n8n workflow B settings
   - Polling trigger should be running (typically every 1 minute)
   - Confirm: `docker-compose logs n8n | grep -i "polling\|workflow-b"`

3. **Check webhook configuration:**
   ```bash
   # If using webhook instead of polling
   docker-compose logs -f n8n | grep -i "webhook"
   ```

4. **Review Freescout API response:**
   ```bash
   docker-compose exec postgres psql -U kb_user -d n8n_kb -c \
     "SELECT * FROM api_logs WHERE endpoint LIKE '%conversation%' ORDER BY timestamp DESC LIMIT 5;"
   ```

5. **Verify OpenAI/AI provider connectivity:**
   ```bash
   docker-compose logs n8n | grep -i "openai\|api\|llm" | tail -20
   ```

6. **Check if there are unprocessed conversations:**
   ```sql
   SELECT COUNT(*) as pending_conversations
   FROM workflow_executions
   WHERE workflow_name = 'workflow-a'
     AND status = 'success'
     AND ai_suggestion_generated = false
     AND created_at > NOW() - INTERVAL '1 hour';
   ```

---

### Workflow C (KB Storage) - Not Saving to Milvus

**Symptoms:**
- knowledge_base table updates but Milvus count doesn't increase
- KB search returns no results
- Milvus health check failures

**Troubleshooting Steps:**

1. **Check Milvus health status:**
   ```bash
   # The health endpoint is expected to return plain text (e.g. OK), so the output is not piped through jq
   docker-compose exec milvus curl -s http://localhost:9091/healthz
   ```

2. **Verify Milvus is running:**
   ```bash
   docker-compose ps milvus
   docker-compose logs milvus | tail -30
   ```

3. **Check if embeddings are being generated:**
   ```sql
   SELECT COUNT(*) as embeddings_generated
   FROM knowledge_base
   WHERE embedding IS NOT NULL;
   ```

4. **Verify Milvus connection in n8n logs:**
   ```bash
   docker-compose logs n8n | grep -i "milvus\|embedding" | tail -20
   ```

5. **Test Milvus directly:**
   ```bash
   # -T disables TTY allocation so the heredoc can be passed on stdin
   docker-compose exec -T milvus python3 << 'EOF'
   from pymilvus import connections, Collection
   connections.connect('default', host='localhost', port=19530)
   try:
       collection = Collection('knowledge_base')
       print(f'✓ Milvus connected, collection entities: {collection.num_entities}')
   except Exception as e:
       print(f'✗ Milvus error: {e}')
   EOF
   ```

6. **Check for rate limiting or connection timeouts:**
   ```bash
   docker-compose logs n8n | grep -i "timeout\|connection\|refused" | tail -20
   ```

7. **Verify vector dimension matches:**
   - Check embedding model (should match Milvus collection definition)
   - Default: 1536 dimensions (OpenAI embeddings)
   ```sql
   SELECT vector_dimension FROM milvus_schema WHERE collection_name = 'knowledge_base';
   ```

---

## Logs & Debugging Commands

### View Real-time Logs

**n8n Logs:**
```bash
# All n8n logs
docker-compose logs -f n8n

# Follow specific keywords
docker-compose logs -f n8n | grep -i "error\|workflow\|processed"

# Last 100 lines
docker-compose logs --tail 100 n8n
```

**PostgreSQL Logs:**
```bash
# View recent PostgreSQL operations
docker-compose logs -f postgres

# Check database activity
docker-compose exec postgres psql -U kb_user -d n8n_kb -c \
  "SELECT now(), datname, usename, state FROM pg_stat_activity;"
```

**Milvus Logs:**
```bash
# View Milvus startup and operation logs
docker-compose logs -f milvus

# Check Milvus status
docker-compose exec milvus curl -s http://localhost:9091/healthz
```

### Database Inspection

**Recent Workflow Executions:**
```bash
docker-compose exec postgres psql -U kb_user -d n8n_kb -c \
  "SELECT workflow_name, status, executed_at, error_message FROM workflow_executions ORDER BY executed_at DESC LIMIT 10;"
```

**KB Updates Status:**
```bash
docker-compose exec postgres psql -U kb_user -d n8n_kb -c \
  "SELECT status, COUNT(*) FROM knowledge_base_updates GROUP BY status;"
```

**Last 24h Activity:**
```bash
docker-compose exec postgres psql -U kb_user -d n8n_kb -c \
  "SELECT DATE(executed_at) as date, workflow_name, status, COUNT(*) as count
   FROM workflow_executions
   WHERE executed_at > NOW() - INTERVAL '24 hours'
   GROUP BY DATE(executed_at), workflow_name, status
   ORDER BY date DESC, workflow_name;"
```

### Performance Monitoring

**PostgreSQL Connection Count:**
```bash
docker-compose exec postgres psql -U kb_user -d n8n_kb -c \
  "SELECT count(*) as connections FROM pg_stat_activity;"
```

**PostgreSQL Cache Hit Ratio:**
```bash
# NULLIF returns NULL instead of raising a division-by-zero error when no blocks have been read yet
docker-compose exec postgres psql -U kb_user -d n8n_kb -c \
  "SELECT sum(heap_blks_hit) / NULLIF(sum(heap_blks_hit) + sum(heap_blks_read), 0) as ratio
   FROM pg_statio_user_tables;"
```

**Disk Usage:**
```bash
docker-compose exec postgres psql -U kb_user -d n8n_kb -c \
  "SELECT schemaname, tablename, pg_size_pretty(pg_total_relation_size(schemaname||'.'||tablename))
   FROM pg_tables
   WHERE schemaname NOT IN ('pg_catalog', 'information_schema')
   ORDER BY pg_total_relation_size(schemaname||'.'||tablename) DESC;"
```

### Debugging Network Issues

**Test connectivity between services:**
```bash
# From n8n to PostgreSQL (-c 3 sends three packets and exits)
docker-compose exec n8n ping -c 3 postgres

# From n8n to Milvus (health endpoint on port 9091, as in the health checks elsewhere in this guide; 19530 is the gRPC port)
docker-compose exec n8n curl -v http://milvus:9091/healthz

# From n8n to Freescout
docker-compose exec n8n ping -c 3 <freescout-host>
```

---

## Alert Thresholds

Configure monitoring/alerting for these conditions:

| Metric | Threshold | Action |
|--------|-----------|--------|
| Error Rate | > 5% | Page on-call, review workflow logs |
| KB Growth Stalled | 0 entries in 4 hours | Check Milvus health and embeddings |
| Approval Rate | < 50% | Review AI suggestion quality |
| Processing Rate | Drop > 50% | Check Freescout connection |
| Milvus Health | Not healthy | Restart Milvus, check etcd/minio |
| PostgreSQL Connections | > 80% of max | Investigate connection leaks |

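
For teams without a dedicated alerting stack, several of the thresholds above can be evaluated with a small script. The Python sketch below is a hedged illustration, not an existing component: the function `evaluate_alerts` and the metric key names are assumptions, and the values would have to be gathered separately (for example from the SQL queries and health checks in this guide).

```python
def evaluate_alerts(metrics):
    """Return the actions from the thresholds table whose conditions are met.

    metrics is a dict with hypothetical keys: error_rate_pct, approval_rate_pct,
    milvus_healthy, pg_connections, and pg_max_connections.
    """
    alerts = []
    if metrics["error_rate_pct"] > 5:
        alerts.append("Error Rate: page on-call, review workflow logs")
    if metrics["approval_rate_pct"] < 50:
        alerts.append("Approval Rate: review AI suggestion quality")
    if not metrics["milvus_healthy"]:
        alerts.append("Milvus Health: restart Milvus, check etcd/minio")
    if metrics["pg_connections"] > 0.8 * metrics["pg_max_connections"]:
        alerts.append("PostgreSQL Connections: investigate connection leaks")
    return alerts


# Sample values chosen so that only the error-rate threshold is exceeded.
sample = {
    "error_rate_pct": 7.5,
    "approval_rate_pct": 82.0,
    "milvus_healthy": True,
    "pg_connections": 40,
    "pg_max_connections": 100,
}
for alert in evaluate_alerts(sample):
    print(alert)
```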
---

## Regular Maintenance

### Daily
- [ ] Check error rate < 5%
- [ ] Verify KB growth is progressing
- [ ] Review Freescout API response times

### Weekly
- [ ] Analyze approval rate trends
- [ ] Check PostgreSQL disk usage
- [ ] Review n8n workflow performance

### Monthly
- [ ] Full system health audit
- [ ] Database maintenance (VACUUM, ANALYZE)
- [ ] Log rotation verification
- [ ] Capacity planning review

---

## Version Information

- **n8n**: Latest from `docker.n8n.io/n8nio/n8n`
- **PostgreSQL**: 15-alpine
- **Milvus**: v2.4.0
- **Logging Driver**: json-file with max 100MB per file, 10 files rotation

## Contact & Escalation

For issues not resolved by this guide:
1. Collect logs: `docker-compose logs > system_logs.txt`
2. Export database state for analysis
3. Contact DevOps team with reproducible steps