# Monitoring & Logging Setup

## Overview

This document describes how to monitor and log the n8n AI Support Automation system. It covers key metrics, troubleshooting procedures, and log inspection commands.

## Key Metrics

### 1. Mail Processing Rate (Workflow A)

**Description:** Track the number of conversations processed through the system.

**n8n Logs:**
```bash
docker-compose logs -f n8n | grep "processed"
```

**PostgreSQL Query:**
```sql
SELECT
  COUNT(*) as total_executions,
  COUNT(CASE WHEN status = 'success' THEN 1 END) as successful_executions,
  -- NULLIF avoids a division-by-zero error when no executions exist yet
  ROUND(100.0 * COUNT(CASE WHEN status = 'success' THEN 1 END) / NULLIF(COUNT(*), 0), 2) as success_rate
FROM workflow_executions
WHERE workflow_name = 'workflow-a';
```

**Expected Behavior:**
- Consistent processing rate (depends on Freescout mail polling interval)
- Success rate > 95%
- Monitor for sudden drops in processing rate

---

### 2. Approval Rate (Workflow B)

**Description:** Monitor the ratio of approved to rejected KB updates from the AI suggestions.

**PostgreSQL Query:**
```sql
SELECT
  status,
  COUNT(*) as count,
  ROUND(100.0 * COUNT(*) / SUM(COUNT(*)) OVER (), 2) as percentage
FROM knowledge_base_updates
GROUP BY status
ORDER BY count DESC;
```

**Alternative Query for detailed breakdown:**
```sql
SELECT
  status,
  COUNT(*) as count,
  AVG(EXTRACT(EPOCH FROM (updated_at - created_at))) as avg_approval_time_seconds
FROM knowledge_base_updates
GROUP BY status;
```

**Expected Behavior:**
- Majority of updates should be APPROVED (typically 70-90%)
- REJECTED rate should be < 15%
- PENDING updates should be resolved within 24 hours

---

### 3. KB Growth (Workflow C)

**Description:** Track the growth of the knowledge base as new information is added.


**Milvus Query:**
```bash
# Connect to Milvus and print the number of stored vectors
docker-compose exec milvus python3 -c "
from pymilvus import connections, Collection
connections.connect('default', host='localhost', port=19530)
collection = Collection('knowledge_base')
print(f'Total vectors: {collection.num_entities}')
"
```

**PostgreSQL Query for tracking:**
```sql
SELECT
  COUNT(*) as total_entries,
  COUNT(DISTINCT source) as unique_sources,
  MAX(created_at) as latest_entry
FROM knowledge_base
WHERE status = 'approved';
```

**Daily Growth Query:**
```sql
SELECT
  DATE(created_at) as date,
  COUNT(*) as entries_added
FROM knowledge_base
WHERE status = 'approved'
GROUP BY DATE(created_at)
ORDER BY date DESC
LIMIT 30;
```

**Expected Behavior:**
- +1 vector per approved ticket (approximately)
- Steady growth correlates with approved KB updates
- Monitor for stalled growth (may indicate Milvus issues)

---

### 4. Error Rate

**Description:** Monitor workflow execution errors across all workflows.

**PostgreSQL Query - Overall Error Rate:**
```sql
SELECT
  COUNT(*) as total_executions,
  COUNT(CASE WHEN status = 'ERROR' THEN 1 END) as error_count,
  -- NULLIF avoids a division-by-zero error when the table is empty
  ROUND(100.0 * COUNT(CASE WHEN status = 'ERROR' THEN 1 END) / NULLIF(COUNT(*), 0), 2) as error_percentage
FROM workflow_executions;
```

**Detailed Error Analysis:**
```sql
SELECT
  workflow_name,
  status,
  COUNT(*) as count,
  ROUND(100.0 * COUNT(*) / SUM(COUNT(*)) OVER (PARTITION BY workflow_name), 2) as percentage
FROM workflow_executions
GROUP BY workflow_name, status
-- Sort by the "count" alias; this query defines no error_count column
ORDER BY workflow_name, count DESC;
```

**Error Details for Investigation:**
```sql
SELECT
  workflow_name,
  status,
  error_message,
  COUNT(*) as occurrences,
  MAX(executed_at) as latest_error
FROM workflow_executions
WHERE status = 'ERROR'
GROUP BY workflow_name, status, error_message
ORDER BY occurrences DESC;
```

**Expected Behavior:**
- Error rate < 5%
- No recurring errors (a recurring error indicates a systemic issue)
- Quick recovery from transient errors

---

## Troubleshooting Guide

### Workflow A (Mail Processing) - Not Running

**Symptoms:**
- No new conversations being processed
- n8n logs show no activity
- PostgreSQL query returns unchanged row count

**Troubleshooting Steps:**

1. **Check if workflow trigger is active:**
   ```bash
   docker-compose logs -f n8n | grep "workflow-a"
   ```

2. **Verify Cron trigger configuration:**
   - Log in to the n8n UI at `https://.`
   - Navigate to workflow-a
   - Check cron expression (typically: `0 */5 * * * *` for every 5 minutes)
   - Verify "Active" toggle is ON

3. **Test Freescout API credentials:**
   ```bash
   docker-compose exec n8n curl -X GET \
     -H "Authorization: Bearer ${FREESCOUT_API_TOKEN}" \
     https:///api/v1/conversations
   ```

4. **Check Freescout API reachability:**
   ```bash
   docker-compose exec n8n ping
   docker-compose exec n8n curl -I https:///api/v1/health
   ```

5. **Review n8n logs for errors:**
   ```bash
   docker-compose logs n8n | grep -i "error\|exception" | tail -20
   ```

6. **Verify PostgreSQL connection:**
   ```bash
   docker-compose logs n8n | grep -i "database\|postgres"
   ```

---

### Workflow B (AI Suggestions) - Not Triggering

**Symptoms:**
- No new AI suggestions in Freescout
- workflow_executions table shows no recent B entries
- knowledge_base_updates status stuck in PENDING

**Troubleshooting Steps:**

1. **Check if Freescout custom field is being updated:**
   ```sql
   SELECT *
   FROM freescout_conversation_custom_fields
   WHERE field_name = 'AI_SUGGESTION_STATUS'
   ORDER BY updated_at DESC
   LIMIT 10;
   ```

2. **Verify polling interval:**
   - Check n8n workflow B settings
   - Polling trigger should be running (typically every 1 minute)
   - Confirm: `docker-compose logs n8n | grep -i "polling\|workflow-b"`

3. **Check webhook configuration:**
   ```bash
   # If using webhook instead of polling
   docker-compose logs -f n8n | grep -i "webhook"
   ```

4. **Review Freescout API response:**
   ```bash
   docker-compose exec postgres psql -U kb_user -d n8n_kb -c \
     "SELECT * FROM api_logs WHERE endpoint LIKE '%conversation%' ORDER BY timestamp DESC LIMIT 5;"
   ```

5. **Verify OpenAI/AI provider connectivity:**
   ```bash
   docker-compose logs n8n | grep -i "openai\|api\|llm" | tail -20
   ```

6. **Check if there are unprocessed conversations:**
   ```sql
   SELECT COUNT(*) as pending_conversations
   FROM workflow_executions
   WHERE workflow_name = 'workflow-a'
     AND status = 'success'
     AND ai_suggestion_generated = false
     AND created_at > NOW() - INTERVAL '1 hour';
   ```

---

### Workflow C (KB Storage) - Not Saving to Milvus

**Symptoms:**
- knowledge_base table updates but Milvus count doesn't increase
- KB search returns no results
- Milvus health check failures

**Troubleshooting Steps:**

1. **Check Milvus health status:**
   ```bash
   # /healthz answers with plain text rather than JSON, so it is not piped through jq
   docker-compose exec milvus curl -s http://localhost:9091/healthz
   ```

2. **Verify Milvus is running:**
   ```bash
   docker-compose ps milvus
   docker-compose logs milvus | tail -30
   ```

3. **Check if embeddings are being generated:**
   ```sql
   SELECT COUNT(*) as embeddings_generated
   FROM knowledge_base
   WHERE embedding IS NOT NULL;
   ```

4. **Verify Milvus connection in n8n logs:**
   ```bash
   docker-compose logs n8n | grep -i "milvus\|embedding" | tail -20
   ```

5. **Test Milvus directly:**
   ```bash
   # -T disables TTY allocation so the heredoc can be read from stdin
   docker-compose exec -T milvus python3 << 'EOF'
   from pymilvus import connections, Collection
   try:
       # Connect inside the try block so connection failures are reported too
       connections.connect('default', host='localhost', port=19530)
       collection = Collection('knowledge_base')
       print(f'✓ Milvus connected, collection entities: {collection.num_entities}')
   except Exception as e:
       print(f'✗ Milvus error: {e}')
   EOF
   ```

6. **Check for rate limiting or connection timeouts:**
   ```bash
   docker-compose logs n8n | grep -i "timeout\|connection\|refused" | tail -20
   ```

7. **Verify vector dimension matches:**
   - Check embedding model (should match Milvus collection definition)
   - Default: 1536 dimensions (OpenAI embeddings)
   ```sql
   SELECT vector_dimension
   FROM milvus_schema
   WHERE collection_name = 'knowledge_base';
   ```

---

## Logs & Debugging Commands

### View Real-time Logs

**n8n Logs:**
```bash
# All n8n logs
docker-compose logs -f n8n

# Follow specific keywords
docker-compose logs -f n8n | grep -i "error\|workflow\|processed"

# Last 100 lines
docker-compose logs --tail 100 n8n
```

**PostgreSQL Logs:**
```bash
# View recent PostgreSQL operations
docker-compose logs -f postgres

# Check database activity
docker-compose exec postgres psql -U kb_user -d n8n_kb -c \
  "SELECT now(), datname, usename, state FROM pg_stat_activity;"
```

**Milvus Logs:**
```bash
# View Milvus startup and operation logs
docker-compose logs -f milvus

# Check Milvus status
docker-compose exec milvus curl -s http://localhost:9091/healthz
```

### Database Inspection

**Recent Workflow Executions:**
```bash
docker-compose exec postgres psql -U kb_user -d n8n_kb -c \
  "SELECT workflow_name, status, executed_at, error_message
   FROM workflow_executions
   ORDER BY executed_at DESC
   LIMIT 10;"
```

**KB Updates Status:**
```bash
docker-compose exec postgres psql -U kb_user -d n8n_kb -c \
  "SELECT status, COUNT(*) FROM knowledge_base_updates GROUP BY status;"
```

**Last 24h Activity:**
```bash
docker-compose exec postgres psql -U kb_user -d n8n_kb -c \
  "SELECT DATE(executed_at) as date, workflow_name, status, COUNT(*) as count
   FROM workflow_executions
   WHERE executed_at > NOW() - INTERVAL '24 hours'
   GROUP BY DATE(executed_at), workflow_name, status
   ORDER BY date DESC, workflow_name;"
```

### Performance Monitoring

**PostgreSQL Connection Count:**
```bash
docker-compose exec postgres psql -U kb_user -d n8n_kb -c \
  "SELECT count(*) as connections FROM pg_stat_activity;"
```

**PostgreSQL Cache Hit Ratio:**
```bash
# NULLIF avoids a division-by-zero error on a database with no block activity yet
docker-compose exec postgres psql -U kb_user -d n8n_kb -c \
  "SELECT sum(heap_blks_hit) / NULLIF(sum(heap_blks_hit) + sum(heap_blks_read), 0) as ratio
   FROM pg_statio_user_tables;"
```

**Disk Usage:**
```bash
docker-compose exec postgres psql -U kb_user -d n8n_kb -c \
  "SELECT schemaname, tablename,
          pg_size_pretty(pg_total_relation_size(schemaname||'.'||tablename))
   FROM pg_tables
   WHERE schemaname NOT IN ('pg_catalog', 'information_schema')
   ORDER BY pg_total_relation_size(schemaname||'.'||tablename) DESC;"
```

### Debugging Network Issues

**Test connectivity between services:**
```bash
# From n8n to PostgreSQL
docker-compose exec n8n ping postgres

# From n8n to Milvus (health endpoint on port 9091, as in the checks above; 19530 is the client port)
docker-compose exec n8n curl -v http://milvus:9091/healthz

# From n8n to Freescout
docker-compose exec n8n ping
```

---

## Alert Thresholds

Configure monitoring/alerting for these conditions:

| Metric | Threshold | Action |
|--------|-----------|--------|
| Error Rate | > 5% | Page on-call, review workflow logs |
| KB Growth Stalled | 0 entries in 4 hours | Check Milvus health and embeddings |
| Approval Rate | < 50% | Review AI suggestion quality |
| Processing Rate | Drop > 50% | Check Freescout connection |
| Milvus Health | Not healthy | Restart Milvus, check etcd/minio |
| PostgreSQL Connections | > 80% of max | Investigate connection leaks |

---

## Regular Maintenance

### Daily
- [ ] Check error rate < 5%
- [ ] Verify KB growth is progressing
- [ ] Review Freescout API response times

### Weekly
- [ ] Analyze approval rate trends
- [ ] Check PostgreSQL disk usage
- [ ] Review n8n workflow performance

### Monthly
- [ ] Full system health audit
- [ ] Database maintenance (VACUUM, ANALYZE)
- [ ] Log rotation verification
- [ ] Capacity planning review

---

## Version Information

- **n8n**: Latest from `docker.n8n.io/n8nio/n8n`
- **PostgreSQL**: 15-alpine
- **Milvus**: v2.4.0
- **Logging Driver**: json-file with max 100MB per file, 10 files rotation

## Contact & Escalation

For issues not resolved by this guide:

1. Collect logs: `docker-compose logs > system_logs.txt`
2. Export database state for analysis
3. Contact DevOps team with reproducible steps
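
## Example: Evaluating the Alert Thresholds

The conditions in the Alert Thresholds table can be checked programmatically once the metric values have been collected, for example with the queries and health checks in this guide. The following is a minimal sketch, not part of the deployed system: the `Metrics` class, its field names, and the `evaluate_alerts` function are hypothetical, and the "Processing Rate" rule assumes the drop is measured against a comparable earlier window.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Metrics:
    """Snapshot of monitoring values. All field names are hypothetical."""
    error_rate_pct: float      # overall error percentage across workflows
    kb_entries_last_4h: int    # approved KB entries added in the last 4 hours
    approval_rate_pct: float   # percentage of KB updates that were approved
    processed_now: int         # conversations processed in the current window
    processed_baseline: int    # conversations processed in a comparable earlier window
    milvus_healthy: bool       # outcome of the Milvus health check
    pg_connections: int        # current PostgreSQL connection count
    pg_max_connections: int    # configured PostgreSQL connection limit


def evaluate_alerts(m: Metrics) -> List[str]:
    """Return the names of the alert-table conditions that fire for this snapshot."""
    alerts = []
    if m.error_rate_pct > 5:
        alerts.append("Error Rate")
    if m.kb_entries_last_4h == 0:
        alerts.append("KB Growth Stalled")
    if m.approval_rate_pct < 50:
        alerts.append("Approval Rate")
    # Fires only when throughput fell by more than 50% relative to the baseline.
    if m.processed_baseline > 0 and m.processed_now < 0.5 * m.processed_baseline:
        alerts.append("Processing Rate")
    if not m.milvus_healthy:
        alerts.append("Milvus Health")
    # Fires when more than 80% of the connection limit is in use.
    if m.pg_max_connections > 0 and m.pg_connections > 0.8 * m.pg_max_connections:
        alerts.append("PostgreSQL Connections")
    return alerts
```

A scheduled job could build a `Metrics` snapshot, call `evaluate_alerts`, and forward any returned names to whichever alerting channel is in use. Keeping the rules in one pure function makes them easy to unit-test without a running stack.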