nfs-rdma: bind listener lifecycle to nfs-server.service

The nfsd-rdma-listener oneshot writes "rdma 20049" to
/proc/fs/nfsd/portlist exactly once (at boot or first activation)
and then stays in `active (exited)`. But nfsd's portlist is wiped
whenever nfs-server.service stops, and every `nixos-rebuild switch`
that touches services.nfs.server or its config restarts nfs-server.
After the restart, port 20049 silently goes dead while the listener
unit still shows "active", because it never re-ran.

Caught today (2026-05-26):

  01:05:31     ldx ran nixos-rebuild switch on skydick
  01:05:49     nfs-server stopped → portlist wiped
  01:05:52     nfs-server started, port 2049 back up
               nfsd-rdma-listener.service NOT restarted (still in
               "active (exited)"), so 20049 stays unbound
  ~14h later:  door-pek's NFS-RDMA mount finally hangs hard, df
               blocks, telegraf disk input stops emitting, my
               disk-data-full alert goes NoData and pages with
               "[no value]" labels. qBittorrent goes unhealthy
               from its /mnt/media-touching healthcheck. *arr
               health issues climb.

Fix: add `partOf = [ "nfs-server.service" ]` to the listener.
systemd then propagates stop and restart from nfs-server.service
to the listener oneshot, which re-runs its ExecStart (re-writing
"rdma 20049" into portlist) on every nfs-server restart. PartOf
does not propagate a plain start, so nfs-server.service now also
wants the listener (via wantedBy), and a stop-then-start sequence
brings the listener back too.

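For reference, the resulting unit wiring can be sketched roughly as
below. This is a minimal illustration, not the literal contents of
hosts/skydick/datapool.nix: the `description`, the `after` ordering,
and the use of `script` are assumptions made to keep the fragment
self-contained.

```nix
# Sketch only: option values other than partOf/wantedBy are assumed.
{ ... }:
{
  systemd.services.nfsd-rdma-listener = {
    description = "Register the NFS-RDMA listener in nfsd's portlist";

    # Stop and restart of nfs-server.service propagate to this unit.
    partOf = [ "nfs-server.service" ];

    # Starting nfs-server.service pulls this unit in as well, which
    # covers the stop-then-start case that PartOf does not.
    wantedBy = [ "nfs-server.service" ];

    # Assumed ordering: write portlist only once nfsd is up.
    after = [ "nfs-server.service" ];

    serviceConfig = {
      Type = "oneshot";
      RemainAfterExit = true; # unit stays "active (exited)" after the write
    };

    # Re-runs on every (re)start of the unit.
    script = ''
      echo "rdma 20049" > /proc/fs/nfsd/portlist
    '';
  };
}
```

With this wiring, `systemctl restart nfs-server.service` should also
restart the listener, and `cat /proc/fs/nfsd/portlist` should list
"rdma 20049" again afterwards.
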
Manual mitigation is already applied: port 20049 is back up after
`sudo systemctl restart nfsd-rdma-listener.service`. This commit
is the structural fix, so the next nfs-server restart doesn't
silently break NFS-RDMA again.

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
| hosts/skydick/datapool.nix |
|---|