diff --git a/hosts/skydick/README.md b/hosts/skydick/README.md
index 4c96761..e90741a 100644
--- a/hosts/skydick/README.md
+++ b/hosts/skydick/README.md
@@ -1,6 +1,10 @@
 # Skydick
 
-`skydick` is the multi-user data-pool host at `10.0.1.1`.
+`skydick` is the multi-user data-pool and monitoring host at `10.0.1.1`.
+
+## Services
+
+### Data pool (SMB / NFS / iSCSI)
 
 The full user guide and admin runbook live in [`DATAPOOL.md`](./DATAPOOL.md).
 
@@ -12,3 +16,54 @@
 - share paths for SMB and NFS
 - per-user dataset layout
 - verification and troubleshooting
+
+### InfluxDB (fleet monitoring)
+
+InfluxDB v2 runs as a native NixOS service (`services.influxdb2`), declared in
+`modules/influxdb.nix`. It stores time-series data for the entire fleet.
+
+- **Data path**: `/var/lib/influxdb2` → bind-mounted from ZFS dataset `dick/system/influxdb`
+  (recordsize=128K, zstd compression, 500G quota)
+- **API**: `http://10.0.1.1:8086`
+- **Organization**: `door1`
+- **Buckets**: `door1` (door1 host + iDRAC + IoT sensors), `skydick` (skydick host metrics)
+- **Retention**: infinite (ZFS handles the storage)
+- **Auth**: operator token in agenix (`secrets/influxdb-token.age`), same token
+  used by Telegraf on both skydick and door1
+- **InfluxQL v1 compat**: enabled via DBRP mapping (`door1` database → `door1` bucket),
+  used by Grafana dashboards that predate the Flux migration
+
+#### Data sources writing here
+
+| Source | Bucket | Transport | What |
+|--------|--------|-----------|------|
+| skydick Telegraf (local) | `skydick` | localhost:8086 | CPU, mem, disk, ZFS, SMART, sensors, zpool health |
+| door1 Telegraf (Docker) | `door1` | 10.0.1.1:8086 | CPU, mem, disk, GPU, Docker, iDRAC SNMP, UPS, WireGuard, Fala IoT sensor |
+
+#### ZFS dataset
+
+```bash
+# Created once (already done):
+zfs create -o recordsize=128K -o mountpoint=/srv/system/influxdb \
+  -o quota=500G dick/system/influxdb
+chown influxdb:influxdb /srv/system/influxdb
+
+# Check:
+zfs list -o name,used,avail,compressratio dick/system/influxdb
+```
+
+#### Grafana connection
+
+Grafana runs on door1 (Docker) and queries skydick over 10GbE. Three datasources:
+
+| UID | Name | Type | Bucket | Used by |
+|-----|------|------|--------|---------|
+| `eei6px00mwsu8e` | influxdb-door1 | InfluxQL | door1 | Legacy dashboards (iDRAC, system, storage, UPS, GPU, Docker) |
+| `influxdb` | InfluxDB | Flux | door1 | Server room dashboard, newer alert rules |
+| `influxdb-skydick` | InfluxDB-Skydick | Flux | skydick | Skydick system dashboards |
+
+### Telegraf
+
+Declared in `modules/monitoring.nix`. Collects host metrics and writes to the
+local InfluxDB. The `influxUrl` option defaults to `http://10.0.1.1:8086` but
+skydick overrides it to `http://127.0.0.1:8086` for local writes.