n8n-compose/docs/MONITORING.md
2026-03-16 17:32:28 +01:00

Monitoring & Logging Setup

Overview

This document describes how to monitor and debug the n8n AI Support Automation system. It covers the key metrics to track, troubleshooting procedures for each workflow, and commands for inspecting logs and the database.

Key Metrics

1. Mail Processing Rate (Workflow A)

Description: Track the number of conversations processed through the system.

N8N Logs:

docker-compose logs -f n8n | grep "processed"

PostgreSQL Query:

-- NULLIF avoids a division-by-zero error when no executions have been recorded yet
SELECT COUNT(*) as total_executions,
       COUNT(CASE WHEN status = 'success' THEN 1 END) as successful_executions,
       ROUND(100.0 * COUNT(CASE WHEN status = 'success' THEN 1 END) / NULLIF(COUNT(*), 0), 2) as success_rate
FROM workflow_executions
WHERE workflow_name = 'workflow-a';

Expected Behavior:

  • Consistent processing rate (depends on Freescout mail polling interval)
  • Success rate > 95%
  • Monitor for sudden drops in processing rate
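
The success-rate check above can be sketched in a few lines of Python, for example inside a small cron job that reads the two counts from the query. This is a minimal illustration, not part of the deployed system: the function names are hypothetical, and the 95% threshold is the one stated in this section.

```python
from typing import Optional


def success_rate(total: int, successful: int) -> Optional[float]:
    """Success rate in percent, or None when there are no executions yet."""
    if total == 0:
        return None  # nothing recorded, so a rate cannot be computed
    return round(100.0 * successful / total, 2)


def processing_healthy(total: int, successful: int, minimum: float = 95.0) -> bool:
    """True when the success rate is strictly above the minimum."""
    rate = success_rate(total, successful)
    # Assumption: an empty table is treated as unhealthy, since it may mean
    # that Workflow A is not running at all.
    return rate is not None and rate > minimum
```

Treating "no executions" as unhealthy is a design choice; a freshly installed system would trip this check until the first mail is processed.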

2. Approval Rate (Workflow B)

Description: Monitor the ratio of approved vs rejected KB updates from the AI suggestions.

PostgreSQL Query:

SELECT status, COUNT(*) as count,
       ROUND(100.0 * COUNT(*) / SUM(COUNT(*)) OVER (), 2) as percentage
FROM knowledge_base_updates
GROUP BY status
ORDER BY count DESC;

Alternative Query for detailed breakdown:

SELECT
    status,
    COUNT(*) as count,
    AVG(EXTRACT(EPOCH FROM (updated_at - created_at))) as avg_approval_time_seconds
FROM knowledge_base_updates
GROUP BY status;

Expected Behavior:

  • Majority of updates should be APPROVED (typically 70-90%)
  • REJECTED rate should be < 15%
  • PENDING updates should be resolved within 24 hours
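
As an illustration of how the per-status counts translate into the expectations above, the following sketch takes a mapping of status to count (the shape returned by the first query) and reports which expectations are violated. The function names and warning texts are hypothetical. "Majority" is read here as more than 50%, which is an assumption.

```python
from typing import Dict, List


def approval_breakdown(counts: Dict[str, int]) -> Dict[str, float]:
    """Percentage of updates per status; empty when there are no updates."""
    total = sum(counts.values())
    if total == 0:
        return {}
    return {status: round(100.0 * n / total, 2) for status, n in counts.items()}


def approval_warnings(counts: Dict[str, int]) -> List[str]:
    """List the expectations from this section that the counts violate."""
    pct = approval_breakdown(counts)
    if not pct:
        return []  # no data yet, nothing to judge
    warnings = []
    if pct.get("APPROVED", 0.0) <= 50.0:
        warnings.append("APPROVED is not the majority")
    if pct.get("REJECTED", 0.0) >= 15.0:
        warnings.append("REJECTED rate is 15% or higher")
    return warnings
```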

3. KB Growth (Workflow C)

Description: Track the growth of the knowledge base as new information is added.

Milvus Query:

# First, connect to Milvus
docker-compose exec milvus python3 -c "
from pymilvus import connections, Collection

connections.connect('default', host='localhost', port=19530)
collection = Collection('knowledge_base')
print(f'Total vectors: {collection.num_entities}')
"

PostgreSQL Query for tracking:

SELECT COUNT(*) as total_entries,
       COUNT(DISTINCT source) as unique_sources,
       MAX(created_at) as latest_entry
FROM knowledge_base
WHERE status = 'approved';

Daily Growth Query:

SELECT DATE(created_at) as date, COUNT(*) as entries_added
FROM knowledge_base
WHERE status = 'approved'
GROUP BY DATE(created_at)
ORDER BY date DESC
LIMIT 30;

Expected Behavior:

  • +1 vector per approved ticket (approximately)
  • Steady growth correlates with approved KB updates
  • Monitor for stalled growth (may indicate Milvus issues)
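
One way to act on these expectations is to compare the approved row count in PostgreSQL with the Milvus entity count, and to check how long ago the latest entry was created. The sketch below assumes both numbers have already been fetched with the queries above. The function names and the `max_drift` and `max_idle_days` defaults are hypothetical and should be tuned to the actual ticket volume.

```python
from datetime import date


def kb_drift(approved_rows: int, milvus_entities: int) -> int:
    """How many approved rows have no corresponding vector (roughly)."""
    return approved_rows - milvus_entities


def days_since_last_entry(latest_entry: date, today: date) -> int:
    """Whole days between the newest approved entry and today."""
    return (today - latest_entry).days


def growth_status(approved_rows: int, milvus_entities: int,
                  latest_entry: date, today: date,
                  max_drift: int = 5, max_idle_days: int = 1) -> str:
    """Return 'milvus_behind', 'stalled', or 'ok'."""
    # Rows are approved in PostgreSQL but vectors are missing in Milvus.
    if kb_drift(approved_rows, milvus_entities) > max_drift:
        return "milvus_behind"
    # Nothing new has been approved for longer than the allowed idle time.
    if days_since_last_entry(latest_entry, today) > max_idle_days:
        return "stalled"
    return "ok"
```

Because the daily growth query only returns days that have at least one entry, checking the age of `latest_entry` is more reliable than looking for zero-count rows.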

4. Error Rate

Description: Monitor workflow execution errors across all workflows.

PostgreSQL Query - Overall Error Rate:

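-- String comparison is case-sensitive: use the same casing as the stored status values.
-- NULLIF avoids a division-by-zero error on an empty table.
SELECT
    COUNT(*) as total_executions,
    COUNT(CASE WHEN status = 'ERROR' THEN 1 END) as error_count,
    ROUND(100.0 * COUNT(CASE WHEN status = 'ERROR' THEN 1 END) / NULLIF(COUNT(*), 0), 2) as error_percentage
FROM workflow_executions;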

Detailed Error Analysis:

SELECT
    workflow_name,
    status,
    COUNT(*) as count,
    ROUND(100.0 * COUNT(*) / SUM(COUNT(*)) OVER (PARTITION BY workflow_name), 2) as percentage
FROM workflow_executions
GROUP BY workflow_name, status
ORDER BY workflow_name, count DESC;

Error Details for Investigation:

SELECT
    workflow_name,
    status,
    error_message,
    COUNT(*) as occurrences,
    MAX(executed_at) as latest_error
FROM workflow_executions
WHERE status = 'ERROR'
GROUP BY workflow_name, status, error_message
ORDER BY occurrences DESC;

Expected Behavior:

  • Error rate < 5%
  • No recurring errors (a recurring error indicates a systemic issue)
  • Quick recovery from transient errors
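
To separate recurring errors from transient ones, the error rows can be grouped by workflow and message and filtered by a minimum count. The sketch below does this in Python on rows shaped like `(workflow_name, error_message)`. The function name and the `min_occurrences` default of 3 are hypothetical.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def recurring_errors(rows: Iterable[Tuple[str, str]],
                     min_occurrences: int = 3) -> List[Tuple[str, str, int]]:
    """Return (workflow, message, count) for errors seen at least min_occurrences times."""
    counts = Counter(rows)
    # most_common() yields the most frequent (workflow, message) pairs first.
    return [(wf, msg, n) for (wf, msg), n in counts.most_common()
            if n >= min_occurrences]
```

An error that appears once and disappears is probably transient. The same message repeating within one workflow is the pattern worth escalating.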

Troubleshooting Guide

Workflow A (Mail Processing) - Not Running

Symptoms:

  • No new conversations being processed
  • N8N logs show no activity
  • PostgreSQL query returns unchanged row count

Troubleshooting Steps:

  1. Check if workflow trigger is active:

    docker-compose logs -f n8n | grep "workflow-a"
    
  2. Verify Cron trigger configuration:

    • Log into n8n UI at https://<SUBDOMAIN>.<DOMAIN>
    • Navigate to workflow-a
    • Check cron expression (typically: 0 */5 * * * * for every 5 minutes)
    • Verify "Active" toggle is ON
  3. Test Freescout API credentials:

    docker-compose exec n8n curl -X GET \
      -H "Authorization: Bearer ${FREESCOUT_API_TOKEN}" \
      https://<freescout-instance>/api/v1/conversations
    
  4. Check Freescout API reachability:

    docker-compose exec n8n ping <freescout-instance>
    docker-compose exec n8n curl -I https://<freescout-instance>/api/v1/health
    
  5. Review n8n logs for errors:

    docker-compose logs n8n | grep -i "error\|exception" | tail -20
    
  6. Verify PostgreSQL connection:

    docker-compose logs n8n | grep -i "database\|postgres"
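
The cron expression in step 2 has six fields rather than the usual five. Reading the first field as seconds is what makes `0 */5 * * * *` mean "at second 0 of every fifth minute". The sketch below expands the first two fields to make that explicit. It is an illustration only: it handles just `*`, `*/n`, and plain numbers, and it is not how n8n itself parses expressions.

```python
from typing import List, Tuple


def expand_field(field: str, lo: int, hi: int) -> List[int]:
    """Expand a single cron field into the values it matches (subset of cron syntax)."""
    if field == "*":
        return list(range(lo, hi + 1))
    if field.startswith("*/"):
        step = int(field[2:])
        return list(range(lo, hi + 1, step))
    return [int(field)]


def fire_times(expr: str) -> Tuple[List[int], List[int]]:
    """Return (seconds, minutes) at which a six-field expression fires within an hour."""
    parts = expr.split()
    if len(parts) != 6:
        raise ValueError("expected six fields: sec min hour day month weekday")
    seconds = expand_field(parts[0], 0, 59)
    minutes = expand_field(parts[1], 0, 59)
    return seconds, minutes
```

If the leading `0` is dropped and the expression is pasted as five fields, `*/5` moves into the minute position of a standard cron expression and still means every 5 minutes. Adding an extra `*` in front instead would fire every second of every fifth minute.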
    

Workflow B (AI Suggestions) - Not Triggering

Symptoms:

  • No new AI suggestions in Freescout
  • workflow_executions table shows no recent workflow-b entries
  • knowledge_base_updates status stuck in PENDING

Troubleshooting Steps:

  1. Check if Freescout custom field is being updated:

    SELECT * FROM freescout_conversation_custom_fields
    WHERE field_name = 'AI_SUGGESTION_STATUS'
    ORDER BY updated_at DESC
    LIMIT 10;
    
  2. Verify polling interval:

    • Check n8n workflow B settings
    • Polling trigger should be running (typically every 1 minute)
    • Confirm: docker-compose logs n8n | grep -i "polling\|workflow-b"
  3. Check webhook configuration:

    # If using webhook instead of polling
    docker-compose logs -f n8n | grep -i "webhook"
    
  4. Review Freescout API response:

    docker-compose exec postgres psql -U kb_user -d n8n_kb -c \
      "SELECT * FROM api_logs WHERE endpoint LIKE '%conversation%' ORDER BY timestamp DESC LIMIT 5;"
    
  5. Verify OpenAI/AI provider connectivity:

    docker-compose logs n8n | grep -i "openai\|api\|llm" | tail -20
    
  6. Check if there are unprocessed conversations:

    SELECT COUNT(*) as pending_conversations
    FROM workflow_executions
    WHERE workflow_name = 'workflow-a'
    AND status = 'success'
    AND ai_suggestion_generated = false
    AND executed_at > NOW() - INTERVAL '1 hour';
    

Workflow C (KB Storage) - Not Saving to Milvus

Symptoms:

  • knowledge_base table updates but Milvus count doesn't increase
  • KB search returns no results
  • Milvus health check failures

Troubleshooting Steps:

  1. Check Milvus health status:

    docker-compose exec milvus curl -s http://localhost:9091/healthz
    
  2. Verify Milvus is running:

    docker-compose ps milvus
    docker-compose logs milvus | tail -30
    
  3. Check if embeddings are being generated:

    SELECT COUNT(*) as embeddings_generated
    FROM knowledge_base
    WHERE embedding IS NOT NULL;
    
  4. Verify Milvus connection in n8n logs:

    docker-compose logs n8n | grep -i "milvus\|embedding" | tail -20
    
  5. Test Milvus directly:

    docker-compose exec -T milvus python3 << 'EOF'
    from pymilvus import connections, Collection
    connections.connect('default', host='localhost', port=19530)
    try:
        collection = Collection('knowledge_base')
        print(f'✓ Milvus connected, collection entities: {collection.num_entities}')
    except Exception as e:
        print(f'✗ Milvus error: {e}')
    EOF
    
  6. Check for rate limiting or connection timeouts:

    docker-compose logs n8n | grep -i "timeout\|connection\|refused" | tail -20
    
  7. Verify vector dimension matches:

    • Check embedding model (should match Milvus collection definition)
    • Default: 1536 dimensions (OpenAI embeddings)
    SELECT vector_dimension FROM milvus_schema WHERE collection_name = 'knowledge_base';
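
A dimension mismatch is cheap to catch before the insert reaches Milvus. The sketch below compares the length of an embedding with the expected collection dimension, using the 1536 default mentioned above. The function name is hypothetical, and where such a check would live in Workflow C is an assumption.

```python
from typing import Optional, Sequence


def dimension_mismatch(embedding: Sequence[float],
                       expected_dim: int = 1536) -> Optional[str]:
    """Return a description of the mismatch, or None when the dimensions agree."""
    actual = len(embedding)
    if actual == expected_dim:
        return None
    return f"embedding has {actual} dimensions, collection expects {expected_dim}"
```

Switching the embedding model without recreating the collection is a typical way to end up with this mismatch.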
    

Logs & Debugging Commands

View Real-time Logs

N8N Logs:

# All n8n logs
docker-compose logs -f n8n

# Follow specific keywords
docker-compose logs -f n8n | grep -i "error\|workflow\|processed"

# Last 100 lines
docker-compose logs --tail 100 n8n

PostgreSQL Logs:

# View recent PostgreSQL operations
docker-compose logs -f postgres

# Check database activity
docker-compose exec postgres psql -U kb_user -d n8n_kb -c \
  "SELECT now(), datname, usename, state FROM pg_stat_activity;"

Milvus Logs:

# View Milvus startup and operation logs
docker-compose logs -f milvus

# Check Milvus status
docker-compose exec milvus curl -s http://localhost:9091/healthz

Database Inspection

Recent Workflow Executions:

docker-compose exec postgres psql -U kb_user -d n8n_kb -c \
  "SELECT workflow_name, status, executed_at, error_message FROM workflow_executions ORDER BY executed_at DESC LIMIT 10;"

KB Updates Status:

docker-compose exec postgres psql -U kb_user -d n8n_kb -c \
  "SELECT status, COUNT(*) FROM knowledge_base_updates GROUP BY status;"

Last 24h Activity:

docker-compose exec postgres psql -U kb_user -d n8n_kb -c \
  "SELECT DATE(executed_at) as date, workflow_name, status, COUNT(*) as count
   FROM workflow_executions
   WHERE executed_at > NOW() - INTERVAL '24 hours'
   GROUP BY DATE(executed_at), workflow_name, status
   ORDER BY date DESC, workflow_name;"

Performance Monitoring

PostgreSQL Connection Count:

docker-compose exec postgres psql -U kb_user -d n8n_kb -c \
  "SELECT count(*) as connections FROM pg_stat_activity;"

PostgreSQL Cache Hit Ratio:

docker-compose exec postgres psql -U kb_user -d n8n_kb -c \
  "SELECT sum(heap_blks_hit) / NULLIF(sum(heap_blks_hit) + sum(heap_blks_read), 0) as ratio
   FROM pg_statio_user_tables;"

Disk Usage:

docker-compose exec postgres psql -U kb_user -d n8n_kb -c \
  "SELECT schemaname, tablename, pg_size_pretty(pg_total_relation_size(schemaname||'.'||tablename))
   FROM pg_tables
   WHERE schemaname NOT IN ('pg_catalog', 'information_schema')
   ORDER BY pg_total_relation_size(schemaname||'.'||tablename) DESC;"

Debugging Network Issues

Test connectivity between services:

# From n8n to PostgreSQL
docker-compose exec n8n ping postgres

# From n8n to Milvus
docker-compose exec n8n curl -v http://milvus:9091/healthz

# From n8n to Freescout
docker-compose exec n8n ping <freescout-host>

Alert Thresholds

Configure monitoring/alerting for these conditions:

| Metric                 | Threshold            | Action                             |
|------------------------|----------------------|------------------------------------|
| Error Rate             | > 5%                 | Page on-call, review workflow logs |
| KB Growth Stalled      | 0 entries in 4 hours | Check Milvus health and embeddings |
| Approval Rate          | < 50%                | Review AI suggestion quality       |
| Processing Rate Drop   | > 50%                | Check Freescout connection         |
| Milvus Health          | Not healthy          | Restart Milvus, check etcd/minio   |
| PostgreSQL Connections | > 80% of max         | Investigate connection leaks       |
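
The thresholds in the table above can be expressed as a single evaluation function, which may be useful as a starting point for a scheduled check. This is a sketch under stated assumptions: the dictionary keys are hypothetical names for values collected with the queries and health checks in this document, and only the threshold numbers come from the table.

```python
from typing import Any, Dict, List


def evaluate_alerts(m: Dict[str, Any]) -> List[str]:
    """Return the names of the alert conditions that currently fire."""
    alerts = []
    if m["error_rate_pct"] > 5:
        alerts.append("Error Rate")
    if m["kb_entries_last_4h"] == 0:
        alerts.append("KB Growth Stalled")
    if m["approval_rate_pct"] < 50:
        alerts.append("Approval Rate")
    if m["processing_rate_drop_pct"] > 50:
        alerts.append("Processing Rate Drop")
    if not m["milvus_healthy"]:
        alerts.append("Milvus Health")
    # Connection usage is compared against 80% of the configured maximum.
    if m["pg_connections"] > 0.8 * m["pg_max_connections"]:
        alerts.append("PostgreSQL Connections")
    return alerts
```

For example, with `pg_max_connections` at 100, the connection alert fires from 81 open connections upward.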

Regular Maintenance

Daily

  • Check error rate < 5%
  • Verify KB growth is progressing
  • Review Freescout API response times

Weekly

  • Analyze approval rate trends
  • Check PostgreSQL disk usage
  • Review n8n workflow performance

Monthly

  • Full system health audit
  • Database maintenance (VACUUM, ANALYZE)
  • Log rotation verification
  • Capacity planning review

Version Information

  • n8n: Latest from docker.n8n.io/n8nio/n8n
  • PostgreSQL: 15-alpine
  • Milvus: v2.4.0
  • Logging Driver: json-file with max 100MB per file, 10 files rotation

Contact & Escalation

For issues not resolved by this guide:

  1. Collect logs: docker-compose logs > system_logs.txt
  2. Export database state for analysis
  3. Contact DevOps team with reproducible steps