Infrastructure Monitoring
Overview
                     ┌─────────────────┐
                     │     Grafana     │
                     │   mon.b2g.kz    │
                     │      :3000      │
                     └────────┬────────┘
                              │
                     ┌────────┴────────┐
                     │   Prometheus    │
                     │      :9090      │
                     └────────┬────────┘
                              │ scrape
     ┌─────────┬───────┬──────┼──────┬────────┬─────────┐
     │         │       │      │      │        │         │
┌────┴───┐ ┌───┴───┐ ┌─┴──┐ ┌─┴──┐ ┌─┴──┐ ┌───┴───┐ ┌───┴───┐
│  lb-1  │ │ lb-2  │ │web1│ │web2│ │db-1│ │ db-2  │ │redis-1│
│ :9100  │ │ :9100 │ │9100│ │9100│ │9100│ │ :9100 │ │ :9100 │
└────────┘ └───────┘ └────┘ └────┘ └────┘ └───────┘ └───────┘
Not shown: voip-1, voip-2 and mon-1, which Prometheus scrapes on :9100 in the same way.
Components
Prometheus (mon-1)
| Parameter | Value |
|---|---|
| URL | http://10.10.19.60:9090 |
| Version | 3.9.1 |
| Retention | 30 days |
| Scrape Interval | 15s |
| Config | /etc/prometheus/prometheus.yml |
| Data Dir | /var/lib/prometheus |
Grafana (mon-1)
| Parameter | Value |
|---|---|
| Internal URL | http://10.10.19.60:3000 |
| External URL | https://mon.b2g.kz |
| Version | 12.3.1 |
| Admin User | admin |
| Admin Password | GrafanaAdmin2026 |
| Data Dir | /var/lib/grafana |
Node Exporter
| VM | IP | Port | Version |
|---|---|---|---|
| lb-1 | 10.10.19.10 | 9100 | 1.7.0 |
| lb-2 | 10.10.19.11 | 9100 | 1.8.2 |
| web-1 | 10.10.19.21 | 9100 | 1.7.0 |
| web-2 | 10.10.19.22 | 9100 | 1.7.0 |
| db-1 | 10.10.19.31 | 9100 | 1.7.0 |
| db-2 | 10.10.19.32 | 9100 | 1.7.0 |
| redis-1 | 10.10.19.40 | 9100 | 1.7.0 |
| voip-1 | 10.10.19.51 | 9100 | 1.7.0 |
| voip-2 | 10.10.19.52 | 9100 | 1.7.0 |
| mon-1 | 10.10.19.60 | 9100 | 1.10.2 |
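The table shows three different Node Exporter versions in use. A minimal sketch of how that drift could be detected automatically is shown here; the inventory dictionary is copied from the table, while the helper name `version_drift` and the "most common version is the baseline" rule are illustrative assumptions, not part of the deployment.

```python
from collections import Counter

# Inventory copied from the Node Exporter table: VM -> (IP, version).
NODE_EXPORTERS = {
    "lb-1": ("10.10.19.10", "1.7.0"),
    "lb-2": ("10.10.19.11", "1.8.2"),
    "web-1": ("10.10.19.21", "1.7.0"),
    "web-2": ("10.10.19.22", "1.7.0"),
    "db-1": ("10.10.19.31", "1.7.0"),
    "db-2": ("10.10.19.32", "1.7.0"),
    "redis-1": ("10.10.19.40", "1.7.0"),
    "voip-1": ("10.10.19.51", "1.7.0"),
    "voip-2": ("10.10.19.52", "1.7.0"),
    "mon-1": ("10.10.19.60", "1.10.2"),
}


def version_drift(inventory):
    """Return (baseline, outliers): the most common version and the VMs that differ from it."""
    counts = Counter(version for _ip, version in inventory.values())
    baseline, _count = counts.most_common(1)[0]
    outliers = {vm: v for vm, (_ip, v) in inventory.items() if v != baseline}
    return baseline, outliers


baseline, outliers = version_drift(NODE_EXPORTERS)
print(f"baseline version: {baseline}")
for vm, version in outliers.items():
    print(f"  {vm} runs {version}")
```

With the data above, the baseline is 1.7.0 and the outliers are lb-2 and mon-1.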
Prometheus Configuration
/etc/prometheus/prometheus.yml
global:
  scrape_interval: 15s
  evaluation_interval: 15s
alerting:
  alertmanagers:
    - static_configs:
        - targets: []
rule_files:
  - "alerts/*.yml"
scrape_configs:
  - job_name: "prometheus"
    static_configs:
      - targets: ["localhost:9090"]
  - job_name: "node"
    static_configs:
      - targets:
          - "10.10.19.10:9100"  # lb-1
          - "10.10.19.11:9100"  # lb-2
          - "10.10.19.21:9100"  # web-1
          - "10.10.19.22:9100"  # web-2
          - "10.10.19.31:9100"  # db-1
          - "10.10.19.32:9100"  # db-2
          - "10.10.19.40:9100"  # redis-1
          - "10.10.19.51:9100"  # voip-1
          - "10.10.19.52:9100"  # voip-2
          - "10.10.19.60:9100"  # mon-1
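Because the `node` job repeats the VM inventory by hand, the list can drift from reality when a VM is added. One possible approach, sketched here under the assumption that the inventory is kept as a simple name-to-IP mapping, is to render the job from that mapping. The function `render_node_job` is hypothetical and only produces text; it does not write to /etc/prometheus/prometheus.yml.

```python
def render_node_job(inventory, port=9100):
    """Render the "node" scrape job as YAML text from a {vm_name: ip} mapping."""
    lines = [
        '  - job_name: "node"',
        "    static_configs:",
        "      - targets:",
    ]
    for vm, ip in inventory.items():
        # The VM name is kept as a trailing comment, as in the config above.
        lines.append(f'          - "{ip}:{port}"  # {vm}')
    return "\n".join(lines)


# Two entries taken from the inventory, for illustration.
print(render_node_job({"lb-1": "10.10.19.10", "mon-1": "10.10.19.60"}))
```

The output would still need to pass `promtool check config` after being merged into the full file.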
Grafana Dashboards
Installed Dashboards
| Dashboard | ID | Description |
|---|---|---|
| Node Exporter Full | 1860 | Full system metrics for all VMs |
Access
- Internal: http://10.10.19.60:3000
- External: https://mon.b2g.kz (via the reverse proxy on lb-1/lb-2)
Data Source Settings
Name: Prometheus
Type: Prometheus
URL: http://localhost:9090
Access: Server (default)
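The same settings could also be applied through Grafana's HTTP API (`POST /api/datasources`) instead of the UI. The sketch below only builds the request body; in the API, the UI's "Server (default)" access mode corresponds to `"access": "proxy"`. The function name is illustrative, and authentication and the actual HTTP call are deliberately left out.

```python
import json


def prometheus_datasource_payload(name="Prometheus", url="http://localhost:9090"):
    """Build the JSON body for Grafana's POST /api/datasources endpoint."""
    return {
        "name": name,
        "type": "prometheus",
        "url": url,
        "access": "proxy",  # shown as "Server (default)" in the Grafana UI
    }


print(json.dumps(prometheus_datasource_payload(), indent=2))
```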
Alerts
Configured alerts (folder: Sintegra Alerts)
| Alert | Condition | Duration | Severity |
|---|---|---|---|
| High CPU Usage | > 80% | 5 min | warning |
| High Memory Usage | > 85% | 5 min | warning |
| High Disk Usage | > 90% | 1 min | critical |
| VM Unavailable | up < 1 | 1 min | critical |
Example Alert Rule
# /etc/prometheus/alerts/node.yml
groups:
  - name: node_alerts
    rules:
      - alert: HighCPUUsage
        expr: 100 - (avg by(instance) (irate(node_cpu_seconds_total{mode="idle"}[5m])) * 100) > 80
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "High CPU usage on {{ $labels.instance }}"
          description: "CPU usage is above 80% for 5 minutes"
      - alert: HighMemoryUsage
        expr: (1 - (node_memory_MemAvailable_bytes / node_memory_MemTotal_bytes)) * 100 > 85
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "High memory usage on {{ $labels.instance }}"
      - alert: HighDiskUsage
        expr: (1 - (node_filesystem_avail_bytes / node_filesystem_size_bytes)) * 100 > 90
        for: 1m
        labels:
          severity: critical
        annotations:
          summary: "High disk usage on {{ $labels.instance }}"
      - alert: VMUnavailable
        expr: up == 0
        for: 1m
        labels:
          severity: critical
        annotations:
          summary: "VM {{ $labels.instance }} is unavailable"
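The `for:` field in these rules means the condition has to hold at every evaluation for the whole duration before the alert fires; until then it is only pending. The following is a simplified simulation of that lifecycle for a single series, assuming the 15s `evaluation_interval` from the configuration. It is a sketch of the semantics, not Prometheus code, and it ignores options such as `keep_firing_for`.

```python
def alert_states(values, threshold, for_seconds, interval=15):
    """Simulate inactive -> pending -> firing for one alert.

    values: one expression result per rule evaluation, `interval` seconds apart.
    """
    states = []
    active_since = None  # evaluation time at which the condition first held
    for i, value in enumerate(values):
        now = i * interval
        if value > threshold:
            if active_since is None:
                active_since = now
            held = now - active_since
            states.append("firing" if held >= for_seconds else "pending")
        else:
            active_since = None  # a single false evaluation resets the timer
            states.append("inactive")
    return states


# CPU stays at 90% for 21 evaluations (0s .. 300s) against "> 80" with "for: 5m".
states = alert_states([90] * 21, threshold=80, for_seconds=300)
print(states[0], states[19], states[20])
```

In this run the alert is pending from the first evaluation and starts firing at the 21st, 300 seconds later. A single dip below the threshold in between would restart the wait.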
Service Management
Prometheus
# Status
sudo systemctl status prometheus
# Restart
sudo systemctl restart prometheus
# Validate the configuration
promtool check config /etc/prometheus/prometheus.yml
# Reload the configuration without a restart (requires Prometheus to run with --web.enable-lifecycle)
curl -X POST http://localhost:9090/-/reload
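After a restart or reload it can be useful to confirm that every target is still being scraped. Prometheus exposes this through `GET /api/v1/targets`. The sketch below filters that response for targets whose health is not `up`. The sample payload is made up for illustration (including the error text), and `fetch_targets` is defined but not called here because it needs network access to mon-1.

```python
import json
from urllib.request import urlopen


def fetch_targets(base_url="http://localhost:9090"):
    """Fetch the raw /api/v1/targets response from Prometheus."""
    with urlopen(f"{base_url}/api/v1/targets", timeout=5) as resp:
        return json.load(resp)


def unhealthy_targets(payload):
    """Return (job, instance, last_error) for each active target that is not 'up'."""
    result = []
    for target in payload["data"]["activeTargets"]:
        if target["health"] != "up":
            labels = target["labels"]
            result.append(
                (labels["job"], labels["instance"], target.get("lastError", ""))
            )
    return result


# Illustrative payload, trimmed to the fields used above.
SAMPLE_PAYLOAD = {
    "status": "success",
    "data": {
        "activeTargets": [
            {"labels": {"job": "node", "instance": "10.10.19.31:9100"},
             "health": "up", "lastError": ""},
            {"labels": {"job": "node", "instance": "10.10.19.32:9100"},
             "health": "down", "lastError": "connection refused"},
        ]
    },
}

for job, instance, error in unhealthy_targets(SAMPLE_PAYLOAD):
    print(f"{job} {instance}: {error}")
```

On mon-1 itself, `unhealthy_targets(fetch_targets())` would run the same check against the live server.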
Grafana
# Status
sudo systemctl status grafana-server
# Restart
sudo systemctl restart grafana-server
# Logs
sudo journalctl -u grafana-server -f
Node Exporter
# Status
sudo systemctl status node_exporter
# Restart
sudo systemctl restart node_exporter
# Check metrics
curl http://localhost:9100/metrics | head -50
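The `/metrics` output is plain text in the Prometheus exposition format: `# HELP` and `# TYPE` comment lines followed by `name{labels} value` samples. A minimal parser sketch for picking one metric out of that text is shown below. It assumes samples carry no trailing timestamp (as is the case for Node Exporter output) and is not a full implementation of the format; the sample values are invented.

```python
import re

# Matches key="value" pairs inside the {...} label block.
LABEL_RE = re.compile(r'(\w+)="((?:[^"\\]|\\.)*)"')


def parse_samples(text, metric):
    """Return [(labels, value)] for every sample of `metric` in exposition text."""
    samples = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        name_and_labels, _, value = line.rpartition(" ")
        name, _, label_part = name_and_labels.partition("{")
        if name != metric:
            continue
        samples.append((dict(LABEL_RE.findall(label_part)), float(value)))
    return samples


SAMPLE = """\
# HELP node_cpu_seconds_total Seconds the CPUs spent in each mode.
# TYPE node_cpu_seconds_total counter
node_cpu_seconds_total{cpu="0",mode="idle"} 1000.5
node_cpu_seconds_total{cpu="0",mode="user"} 20.25
node_load1 0.21
"""

for labels, value in parse_samples(SAMPLE, "node_cpu_seconds_total"):
    print(labels["mode"], value)
```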
Nginx Reverse Proxy for Grafana
/etc/nginx/sites-available/mon.b2g.kz (on lb-1, lb-2)
server {
    listen 80;
    server_name mon.b2g.kz;
    return 301 https://$server_name$request_uri;
}
server {
    listen 443 ssl http2;
    server_name mon.b2g.kz;
    ssl_certificate /etc/nginx/ssl/mon.b2g.kz.crt;
    ssl_certificate_key /etc/nginx/ssl/mon.b2g.kz.key;
    location / {
        proxy_pass http://10.10.19.60:3000;
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header X-Forwarded-Proto $scheme;
    }
}
Useful PromQL Queries
CPU Usage
100 - (avg by(instance) (irate(node_cpu_seconds_total{mode="idle"}[5m])) * 100)
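To make the arithmetic behind this query concrete: `irate` takes the last two samples of each per-core idle counter and turns them into idle seconds per second, `avg by(instance)` averages the cores, and the result is subtracted from 100. A worked sketch with invented counter values for a two-core VM:

```python
def cpu_usage_percent(idle_before, idle_after, elapsed_seconds):
    """CPU usage of one instance from per-core idle counters sampled twice.

    idle_before / idle_after: {cpu: value of node_cpu_seconds_total{mode="idle"}}.
    """
    idle_rates = [
        (idle_after[cpu] - idle_before[cpu]) / elapsed_seconds
        for cpu in idle_before
    ]
    avg_idle = sum(idle_rates) / len(idle_rates)  # avg by(instance)
    return 100 - avg_idle * 100


# Two scrapes 15 s apart: core 0 was idle for 12 s, core 1 for 6 s.
usage = cpu_usage_percent(
    idle_before={"0": 1000.0, "1": 2000.0},
    idle_after={"0": 1012.0, "1": 2006.0},
    elapsed_seconds=15,
)
print(f"{usage:.1f}%")
```

The cores were idle 80% and 40% of the time, so the average idle share is 60% and the reported usage is 40%.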
Memory Usage
(1 - (node_memory_MemAvailable_bytes / node_memory_MemTotal_bytes)) * 100
Disk Usage
(1 - (node_filesystem_avail_bytes{mountpoint="/"} / node_filesystem_size_bytes{mountpoint="/"})) * 100
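Queries such as the disk usage expression above can also be evaluated outside Grafana through Prometheus's HTTP API (`GET /api/v1/query`). The sketch below builds the request URL and flattens an instant-vector response into a per-instance dictionary. The helper names and the sample response values are illustrative; the base URL is the internal Prometheus address from the Components section.

```python
from urllib.parse import urlencode

DISK_USAGE = (
    '(1 - (node_filesystem_avail_bytes{mountpoint="/"}'
    ' / node_filesystem_size_bytes{mountpoint="/"})) * 100'
)


def instant_query_url(expr, base_url="http://10.10.19.60:9090"):
    """Build the URL for an instant query; urlencode escapes the PromQL text."""
    return f"{base_url}/api/v1/query?{urlencode({'query': expr})}"


def vector_by_instance(payload):
    """Map instance label -> float value for an instant-vector response."""
    return {
        item["metric"].get("instance", ""): float(item["value"][1])
        for item in payload["data"]["result"]
    }


# Illustrative response; each sample value arrives as [timestamp, "string"].
SAMPLE_RESPONSE = {
    "status": "success",
    "data": {
        "resultType": "vector",
        "result": [
            {"metric": {"instance": "10.10.19.31:9100"}, "value": [1768780800, "42.5"]},
            {"metric": {"instance": "10.10.19.32:9100"}, "value": [1768780800, "91.0"]},
        ],
    },
}

print(instant_query_url(DISK_USAGE))
print(vector_by_instance(SAMPLE_RESPONSE))
```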
Network Traffic
rate(node_network_receive_bytes_total{device="ens192"}[5m])
rate(node_network_transmit_bytes_total{device="ens192"}[5m])
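These two queries return bytes per second. `rate` divides the counter's increase over the window by the elapsed time and compensates for counter resets (for example after a reboot). The sketch below is a simplified version of that calculation: it handles resets but omits the extrapolation to the window boundaries that Prometheus performs, so its results will differ slightly from the real function. The sample values are invented.

```python
def simple_rate(samples):
    """Per-second increase of a counter from [(timestamp_seconds, value), ...].

    A value lower than its predecessor is treated as a counter reset,
    so the whole new value counts as increase.
    """
    increase = 0.0
    for (_, prev), (_, cur) in zip(samples, samples[1:]):
        increase += (cur - prev) if cur >= prev else cur
    elapsed = samples[-1][0] - samples[0][0]
    return increase / elapsed


# Receive counter scraped every 15 s; it resets between t=30 and t=45.
rx_samples = [(0, 1000), (15, 2500), (30, 4000), (45, 500), (60, 3000)]
bytes_per_second = simple_rate(rx_samples)
print(f"{bytes_per_second:.1f} B/s = {bytes_per_second * 8:.1f} bit/s")
```

The total increase here is 1500 + 1500 + 500 + 2500 = 6000 bytes over 60 seconds, that is 100 B/s or 800 bit/s. Multiplying by 8 in the same way inside the PromQL expression gives bits per second on a dashboard.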
Updated: 2026-01-19