Grafana Can Do This? Performance Monitoring + Network Monitoring Visualization (Light Up the Globe) - Tech • VPS • Network vlx@tech

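Goal:
Use 1 Master VPS plus several Agent VPSes to build a complete monitoring system that tracks both server resources and latency / packet loss to the three major Chinese carriers across the country.

I. Architecture overview (know what you are building first)

Before getting hands-on, here is the whole system in one sentence:

The Master VPS handles "collecting + charting"; the Agent VPSes handle "running the probes".

📌 Architecture roles

Role	Count	Purpose
Master VPS	1	Runs Prometheus + Grafana + Blackbox; collects and displays everything centrally
Agent VPS	Several	Spread across different regions; run the Ping / TCP / HTTP probes
Node Exporter	One per server	Collects CPU / memory / network / disk metrics
Blackbox Exporter	One per server	Executes the Ping / TCP / HTTP probes

📊 What will you see in the end?

✅ CPU / memory / traffic for all 10 servers at a glance
✅ Latency from each region → Beijing / Shanghai / Guangzhou / Shenzhen on all three carriers
✅ Connectivity quality to international targets (Google / Cloudflare)
✅ Packet loss and RTT jitter trends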

II. Phase 1: Install Docker on every server

⚠️ The steps below must be done on all 10 servers.

1. Get root privileges

sudo -i

(If your prompt already shows #, skip this step.)

2. Install Docker & Docker Compose

curl -fsSL https://get.docker.com | bash
systemctl enable --now docker

3. Create the working directory

mkdir -p /opt/monitor/config
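
Before moving on, it can help to confirm that each server is actually ready. The following is a minimal sketch of such a check; `have_cmd` and `check_docker_host` are hypothetical helper names introduced here for illustration, and the only assumptions are the `docker compose` plugin installed by the script above and the directory created in step 3.

```shell
#!/bin/sh
# have_cmd NAME: succeeds if NAME can be found on PATH.
have_cmd() {
    command -v "$1" >/dev/null 2>&1
}

# check_docker_host: prints one line per prerequisite from Phase 1
# and returns non-zero if anything is missing.
check_docker_host() {
    status=0
    if have_cmd docker; then
        echo "ok: docker found"
        # `docker compose version` exits non-zero when the plugin is absent.
        if docker compose version >/dev/null 2>&1; then
            echo "ok: docker compose plugin found"
        else
            echo "missing: docker compose plugin"
            status=1
        fi
    else
        echo "missing: docker"
        status=1
    fi
    if [ -d /opt/monitor/config ]; then
        echo "ok: /opt/monitor/config exists"
    else
        echo "missing: /opt/monitor/config"
        status=1
    fi
    return "$status"
}
```

After pasting the functions into a shell, run `check_docker_host` on each of the servers; any line starting with "missing" points at the step to repeat.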

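Phase 2: Configure the other VPSes as Agents

Repeat the following steps on every VPS except the Master VPS.

1. Write the Blackbox probe configuration. Copy the whole block below, paste it into the terminal, and press Enter: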

cat > /opt/monitor/config/blackbox.yml <<EOF
modules:
  http_2xx:
    prober: http
    timeout: 5s
  icmp:
    prober: icmp
    timeout: 5s
    icmp:
      preferred_ip_protocol: "ip4"
  tcp_connect:
    prober: tcp
    timeout: 5s
EOF

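📌 This file defines which probe types the Agent can run:

icmp: Ping (the most important one)
http_2xx: HTTP availability
tcp_connect: port connectivity

2. Write the Compose file. Copy the whole block below, paste it into the terminal, and press Enter: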

cat > /opt/monitor/docker-compose.yml <<EOF
services:
  # Reports CPU / memory
  node_exporter:
    image: prom/node-exporter:latest
    container_name: node_exporter
    ports:
      - "9100:9100"
    restart: always

  # Runs the Ping probes
  blackbox_exporter:
    image: prom/blackbox-exporter:master
    container_name: blackbox_exporter
    ports:
      - "9115:9115"
    volumes:
      - ./config/blackbox.yml:/etc/blackbox_exporter/config.yml
    cap_add:
      - NET_RAW
    restart: always
EOF

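⚠️ NET_RAW is the capability Ping needs; without it, ICMP probes fail.
Note: as written, node_exporter runs on Docker's default bridge network, so its network-interface metrics describe the container's own interfaces rather than the host's. If host traffic figures matter to you, the usual approach is to run it with host networking and the host filesystem mounted read-only, as the node_exporter README describes.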

3. Start the services

cd /opt/monitor
docker compose up -d

You should see these two containers start:

node_exporter
blackbox_exporter

👉 Complete all of the steps above on every other VPS.
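
To confirm that an Agent can really probe, you can ask its Blackbox Exporter to run one probe by hand. The sketch below builds the `/probe` URL from a module name in blackbox.yml and a target; `probe_url`, `probe_succeeded`, and `run_probe` are hypothetical helper names, and the URL builder assumes a plain IP or hostname target (an HTTP URL target would need URL-encoding).

```shell
#!/bin/sh
# probe_url EXPORTER MODULE TARGET: builds the exporter's /probe URL.
probe_url() {
    printf 'http://%s/probe?module=%s&target=%s\n' "$1" "$2" "$3"
}

# probe_succeeded: reads exporter output on stdin and succeeds
# only if it contains the sample "probe_success 1".
probe_succeeded() {
    grep -q '^probe_success 1$'
}

# run_probe MODULE TARGET: asks the local exporter (127.0.0.1:9115)
# to probe TARGET once and reports the outcome.
run_probe() {
    if curl -fsS --max-time 10 "$(probe_url 127.0.0.1:9115 "$1" "$2")" | probe_succeeded; then
        echo "probe ok:     $1 -> $2"
    else
        echo "probe failed: $1 -> $2"
    fi
}
```

For example, `run_probe icmp 1.1.1.1` exercises the icmp module. A failure here, on the Agent itself, points at the exporter or its configuration rather than at the firewall.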

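⚠️ Important firewall setting: in the security group of your cloud console (Tencent Cloud / Alibaba Cloud / AWS, etc.), or in the firewall of any other VPS, allow TCP ports 9100 and 9115.
Otherwise the Master VPS cannot scrape any data.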
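
A quick way to tell whether the ports are open is to fetch each exporter's /metrics page from the Master. This is only a sketch: `agent_urls` and `check_agent` are hypothetical helper names, and 203.0.113.10 in the usage note is a documentation placeholder, not a real Agent.

```shell
#!/bin/sh
# agent_urls HOST: prints the two exporter URLs the Master scrapes on an Agent.
agent_urls() {
    printf 'http://%s:9100/metrics\n' "$1"   # Node Exporter
    printf 'http://%s:9115/metrics\n' "$1"   # Blackbox Exporter
}

# check_agent HOST: tries each URL with a 5 second limit and reports
# whether it answered. Run this on the Master.
check_agent() {
    agent_urls "$1" | while read -r url; do
        if curl -fsS --max-time 5 -o /dev/null "$url"; then
            echo "reachable:   $url"
        else
            echo "unreachable: $url"
        fi
    done
}
```

Running `check_agent 203.0.113.10` (with your Agent's public IP) on the Master prints one line per port. An "unreachable" line while the containers are running on the Agent usually means the security group or firewall rule is still missing.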

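Phase 3: Configure the Master

Run these steps on the one remaining server, the Master.

1. Create the directory structure

mkdir -p /opt/monitor/config/targets

2. Write the probe target list (Beijing / Shanghai / Guangzhou / Shenzhen × the three major carriers). Copy the whole block below (it contains the full IP list), paste it into the terminal, and press Enter: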

cat > /opt/monitor/config/targets/ping_hosts.yml <<EOF
# === Beijing ===
- targets: ['202.96.12.9']
  labels: { name: 'BJ Telecom', city: 'Beijing', isp: 'CT', group: 'CN' }
- targets: ['123.125.96.156']
  labels: { name: 'BJ Unicom', city: 'Beijing', isp: 'CU', group: 'CN' }
- targets: ['211.136.25.153']
  labels: { name: 'BJ Mobile', city: 'Beijing', isp: 'CM', group: 'CN' }

# === Shanghai ===
- targets: ['202.96.209.133']
  labels: { name: 'SH Telecom', city: 'Shanghai', isp: 'CT', group: 'CN' }
- targets: ['210.22.70.3']
  labels: { name: 'SH Unicom', city: 'Shanghai', isp: 'CU', group: 'CN' }
- targets: ['211.136.112.200']
  labels: { name: 'SH Mobile', city: 'Shanghai', isp: 'CM', group: 'CN' }

# === Guangzhou ===
- targets: ['202.96.128.86']
  labels: { name: 'GZ Telecom', city: 'Guangzhou', isp: 'CT', group: 'CN' }
- targets: ['210.21.196.6']
  labels: { name: 'GZ Unicom', city: 'Guangzhou', isp: 'CU', group: 'CN' }
- targets: ['211.139.145.229']
  labels: { name: 'GZ Mobile', city: 'Guangzhou', isp: 'CM', group: 'CN' }

# === Shenzhen ===
- targets: ['202.96.134.133']
  labels: { name: 'SZ Telecom', city: 'Shenzhen', isp: 'CT', group: 'CN' }
- targets: ['58.250.0.1']
  labels: { name: 'SZ Unicom', city: 'Shenzhen', isp: 'CU', group: 'CN' }
- targets: ['120.196.165.24']
  labels: { name: 'SZ Mobile', city: 'Shenzhen', isp: 'CM', group: 'CN' }

# === International ===
- targets: ['8.8.8.8']
  labels: { name: 'Google DNS', city: 'Global', isp: 'Google', group: 'Global' }
- targets: ['1.1.1.1']
  labels: { name: 'Cloudflare', city: 'Global', isp: 'CF', group: 'Global' }
EOF
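
Each entry in this file is one target plus the labels that the dashboards later filter on. If you want to add targets of your own, a small helper can keep the format consistent. This is a sketch: `target_entry` is a hypothetical name, and the Quad9 address in the usage note is just an example, not part of the original list.

```shell
#!/bin/sh
# target_entry IP NAME CITY ISP GROUP
# Prints one file_sd entry in the same layout as ping_hosts.yml.
target_entry() {
    printf '%s\n' "- targets: ['$1']"
    printf '%s\n' "  labels: { name: '$2', city: '$3', isp: '$4', group: '$5' }"
}
```

For instance, `target_entry 9.9.9.9 'Quad9 DNS' Global Quad9 Global >> /opt/monitor/config/targets/ping_hosts.yml` appends one more international target. The `name` label matters most, because the Prometheus configuration in step 4 copies it into `instance`.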

3. Write the base Blackbox configuration

cat > /opt/monitor/config/blackbox.yml <<EOF
modules:
  http_2xx:
    prober: http
    timeout: 5s
  icmp:
    prober: icmp
    timeout: 5s
    icmp:
      preferred_ip_protocol: "ip4"
EOF

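4. Write the Prometheus configuration file (the most critical step). Open the file in an editor:

vim /opt/monitor/config/prometheus.yml

Paste the content below, replacing the placeholder 192.168.1.x addresses with the real public IPs of your Agents: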

global:
  scrape_interval: 15s
  evaluation_interval: 15s

scrape_configs:
  # ---------------------------------------------------
  # Job 1: collect CPU / memory / traffic from every machine (Node Exporter)
  # ---------------------------------------------------
  - job_name: 'server-stats'
    static_configs:
      - targets: ['node_exporter:9100']
        labels: { instance: 'Master-Local', env: 'prod' }   # the Master itself
      - targets: ['192.168.1.2:9100']     # [edit] public IP of Agent 1
        labels: { instance: 'VPS1', env: 'prod' }
      - targets: ['192.168.1.3:9100']     # [edit] public IP of Agent 2
        labels: { instance: 'VPS2', env: 'prod' }
        # ... add the remaining VPSes in the same format ...

  # ---------------------------------------------------
  # Job 2: have the Master itself ping Beijing / Shanghai / Guangzhou / Shenzhen
  # ---------------------------------------------------
  - job_name: 'probe-Master'
    metrics_path: /probe
    params:
      module: [icmp]
    file_sd_configs:
      - files: ['/etc/prometheus/targets/ping_hosts.yml']
    relabel_configs:
      - source_labels: [__address__]
        target_label: __param_target
      - source_labels: [name]
        target_label: instance
      - target_label: __address__
        replacement: blackbox_exporter:9115

  # ---------------------------------------------------
  # Job 3: have Agent 1 ping Beijing / Shanghai / Guangzhou / Shenzhen
  # ---------------------------------------------------
  - job_name: 'VPS1'  # consider renaming to a region name such as 'VPS-HK'
    metrics_path: /probe
    params:
      module: [icmp]
    file_sd_configs:
      - files: ['/etc/prometheus/targets/ping_hosts.yml']
    relabel_configs:
      - source_labels: [__address__]
        target_label: __param_target
      - source_labels: [name]
        target_label: instance
      - target_label: __address__
        # !!! Put the public IP of Agent 1 below !!!
        replacement: 192.168.1.2:9115

  # ---------------------------------------------------
  # Job 4: have Agent 2 ping Beijing / Shanghai / Guangzhou / Shenzhen
  # ---------------------------------------------------
  - job_name: 'VPS2'  # consider renaming to a region name such as 'VPS-US'
    metrics_path: /probe
    params:
      module: [icmp]
    file_sd_configs:
      - files: ['/etc/prometheus/targets/ping_hosts.yml']
    relabel_configs:
      - source_labels: [__address__]
        target_label: __param_target
      - source_labels: [name]
        target_label: instance
      - target_label: __address__
        # !!! Put the public IP of Agent 2 below !!!
        replacement: 192.168.1.3:9115

  # ... copy the job block above until all 9 Agents are configured ...

(Press ESC, type :wq, then press Enter to save and exit.)
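
Since every Agent needs the same probe job with only the job name and address changed, the blocks can also be generated instead of copied by hand. The sketch below prints blocks in the same layout as the probe jobs above; `probe_job` and `probe_jobs` are hypothetical helper names, and the IPs in the usage note are documentation placeholders.

```shell
#!/bin/sh
# probe_job JOB_NAME AGENT_ADDR
# Prints one scrape job that sends the ICMP probes through the
# Blackbox Exporter at AGENT_ADDR (host:port).
probe_job() {
    cat <<EOF
  - job_name: '$1'
    metrics_path: /probe
    params:
      module: [icmp]
    file_sd_configs:
      - files: ['/etc/prometheus/targets/ping_hosts.yml']
    relabel_configs:
      - source_labels: [__address__]
        target_label: __param_target
      - source_labels: [name]
        target_label: instance
      - target_label: __address__
        replacement: $2

EOF
}

# probe_jobs: reads "job_name agent_ip" pairs from stdin, one per line,
# and prints a job block for each, assuming port 9115 on every Agent.
probe_jobs() {
    while read -r name ip; do
        if [ -n "$name" ] && [ -n "$ip" ]; then
            probe_job "$name" "$ip:9115"
        fi
    done
}
```

For example, `printf 'VPS-HK 203.0.113.10\nVPS-US 203.0.113.20\n' | probe_jobs` prints two blocks that can be pasted under scrape_configs. The Prometheus image also ships promtool, so a command along the lines of `docker run --rm --entrypoint promtool -v /opt/monitor/config:/etc/prometheus prom/prometheus:latest check config /etc/prometheus/prometheus.yml` should catch indentation mistakes before you start the stack.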

5. Start the Master

cat > /opt/monitor/docker-compose.yml <<EOF
services:
  node_exporter:
    image: prom/node-exporter:latest
    container_name: node_exporter
    ports:
      - "9100:9100"
    restart: always

  blackbox_exporter:
    image: prom/blackbox-exporter:master
    container_name: blackbox_exporter
    ports:
      - "9115:9115"
    volumes:
      - ./config/blackbox.yml:/etc/blackbox_exporter/config.yml
    cap_add:
      - NET_RAW
    restart: always

  prometheus:
    image: prom/prometheus:latest
    container_name: prometheus
    ports:
      - "9090:9090"
    volumes:
      - ./config/prometheus.yml:/etc/prometheus/prometheus.yml
      - ./config/targets:/etc/prometheus/targets
    command:
      - '--config.file=/etc/prometheus/prometheus.yml'
    depends_on:
      - blackbox_exporter
    restart: always

  grafana:
    image: grafana/grafana:latest
    container_name: grafana
    ports:
      - "3000:3000"
    environment:
      - GF_SECURITY_ADMIN_PASSWORD=admin
    depends_on:
      - prometheus
    restart: always
EOF

Start command:

cd /opt/monitor
docker compose up -d

Once everything is up, these services are available:

Prometheus (http://<Master-IP>:9090)
Grafana (http://<Master-IP>:3000)
Node Exporter (http://<Master-IP>:9100)
Blackbox Exporter (http://<Master-IP>:9115)
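
The four services can also be checked from the command line. The sketch below uses Prometheus's `/-/healthy` endpoint, Grafana's `/api/health` endpoint, and the exporters' `/metrics` pages; `health_urls` and `check_master` are hypothetical helper names.

```shell
#!/bin/sh
# health_urls HOST: prints one check URL per service on the Master.
health_urls() {
    printf 'http://%s:9090/-/healthy\n' "$1"    # Prometheus
    printf 'http://%s:3000/api/health\n' "$1"   # Grafana
    printf 'http://%s:9100/metrics\n' "$1"      # Node Exporter
    printf 'http://%s:9115/metrics\n' "$1"      # Blackbox Exporter
}

# check_master [HOST]: requests each URL (default host 127.0.0.1)
# and reports which services answered.
check_master() {
    health_urls "${1:-127.0.0.1}" | while read -r url; do
        if curl -fsS --max-time 5 -o /dev/null "$url"; then
            echo "up:   $url"
        else
            echo "down: $url"
        fi
    done
}
```

Run `check_master` on the Master itself. To see whether the Agents are being scraped, the Prometheus web UI lists every target and its UP / DOWN state at http://<Master-IP>:9090/targets.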

Phase 4: Configure the visualization (Grafana)

1️⃣ Log in to Grafana

Open in a browser:

http://<Master-IP>:3000

Username / password:

admin / admin

2️⃣ Add the Prometheus data source

Connections → Data Sources
Add new data source → Prometheus
Set the URL to:

http://prometheus:9090

Click Save & Test; a green Success message means it worked.
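
The same data source can be created without the web UI through Grafana's HTTP API (POST /api/datasources). This is a sketch under the assumptions of this tutorial: the admin / admin credentials and the in-network URL http://prometheus:9090. `datasource_json` and `add_datasource` are hypothetical helper names.

```shell
#!/bin/sh
# datasource_json NAME URL: prints the JSON body for a Prometheus data source.
datasource_json() {
    printf '{"name":"%s","type":"prometheus","access":"proxy","url":"%s","isDefault":true}\n' "$1" "$2"
}

# add_datasource GRAFANA_URL USER:PASSWORD
# Posts the data source definition to Grafana using basic auth.
add_datasource() {
    datasource_json Prometheus http://prometheus:9090 |
        curl -fsS -u "$2" -H 'Content-Type: application/json' \
            -X POST --data @- "$1/api/datasources"
}
```

For example, `add_datasource http://127.0.0.1:3000 admin:admin` run on the Master. The URL inside the JSON stays http://prometheus:9090 because Grafana reaches Prometheus over the Compose network, not through the public IP.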

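3️⃣ Import the [System Resource Monitoring] dashboard

Dashboard ID: 22869
What you get:
- CPU
- Memory
- Network traffic
- All 10 machines at a glance

4️⃣ Import the [Three-Carrier Latency / Packet Loss] dashboard

Dashboard ID: 22500
How to use it:
- Job: pick one of the probe jobs defined in prometheus.yml (probe-Master / VPS1 / VPS2, or whatever you renamed them to, such as VPS-HK / VPS-US)
- Instance: pick BJ Telecom / SH Unicom, etc.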
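
The Job and Instance selectors map directly onto the `job` and `instance` labels set in prometheus.yml, so the same data can be queried by hand. The sketch below builds two PromQL expressions from the standard Blackbox Exporter metrics `probe_success` and `probe_icmp_duration_seconds`; these are illustrative queries, not necessarily the ones dashboard 22500 uses, and `loss_query`, `rtt_query`, and `run_query` are hypothetical helper names.

```shell
#!/bin/sh
# loss_query JOB INSTANCE RANGE
# Packet loss ratio over RANGE: 1 minus the share of successful probes.
loss_query() {
    printf '1 - avg_over_time(probe_success{job="%s",instance="%s"}[%s])\n' "$1" "$2" "$3"
}

# rtt_query JOB INSTANCE: round-trip time of the latest ICMP probe, in seconds.
rtt_query() {
    printf 'probe_icmp_duration_seconds{job="%s",instance="%s",phase="rtt"}\n' "$1" "$2"
}

# run_query PROMETHEUS_URL EXPRESSION
# Evaluates the expression through the Prometheus HTTP API.
run_query() {
    curl -fsS -G "$1/api/v1/query" --data-urlencode "query=$2"
}
```

For example, `run_query http://127.0.0.1:9090 "$(loss_query VPS1 'BJ Telecom' 5m)"` returns the loss ratio from Agent 1 to the Beijing Telecom target over the last five minutes, as JSON.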

Phase 5: Reverse-proxy with Caddy (Grafana & Prometheus)

Run the following on the Master VPS:

curl -sL https://raw.githubusercontent.com/vlongx/caddy/main/caddy.sh -o /tmp/caddy.sh && bash /tmp/caddy.sh

Follow the prompts to complete the configuration.
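
What the script writes is not shown in this article. For orientation only, a hand-written Caddy v2 site block that proxies a domain to a local service generally looks like the output of the sketch below; `caddy_site` is a hypothetical helper, and grafana.example.com in the usage note is a placeholder domain.

```shell
#!/bin/sh
# caddy_site DOMAIN UPSTREAM
# Prints a minimal Caddyfile site block that reverse-proxies DOMAIN
# to UPSTREAM (host:port).
caddy_site() {
    printf '%s {\n    reverse_proxy %s\n}\n' "$1" "$2"
}
```

`caddy_site grafana.example.com 127.0.0.1:3000` prints a block for Grafana, and `caddy_site prometheus.example.com 127.0.0.1:9090` does the same for Prometheus. With a real domain pointing at the Master, Caddy obtains the HTTPS certificate for such a block automatically.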

Acknowledgement:
This article borrows parts of Bene's tutorial posts, including the Dashboard IDs. The original posts are worth reading:
https://www.nodeseek.com/post-218787-1
https://www.nodeseek.com/post-229346-1
