nfs-rdma: bind listener lifecycle to nfs-server.service

The nfsd-rdma-listener oneshot writes "rdma 20049" to
/proc/fs/nfsd/portlist exactly once (at boot or first activation).
It then stays in `active (exited)`. But the nfsd kernel module's
portlist is wiped when nfs-server.service STOPS — and every
`nixos-rebuild switch` that touches services.nfs.server or its
config cycles nfs-server. After the cycle, port 20049 silently
goes dead while the listener unit shows "active" (it never re-ran).

Caught today (2026-05-26):
  01:05:31  ldx ran nixos-rebuild switch on skydick
  01:05:49  nfs-server stopped → portlist wiped
  01:05:52  nfs-server started, port 2049 back up
            nfsd-rdma-listener.service NOT restarted (still in
            "active exited"), so 20049 stays unbound
  ~14h later: door-pek's NFS-RDMA mount finally hangs hard, df
            blocks, telegraf disk input stops emitting, my
            disk-data-full alert goes NoData and pages with
            "[no value]" labels. qBittorrent unhealthy from
            its /mnt/media-touching healthcheck. *arr health
            issues climb.

Fix: add `partOf = [ "nfs-server.service" ]`. systemd then
propagates stop and restart from nfs-server.service to the listener
oneshot, which re-runs its ExecStart (re-writing "rdma 20049" into
portlist) on every nfs-server restart. Also added nfs-server.service
to the listener's wantedBy, so that a stop-then-start sequence (which
PartOf= alone does not cover, since it propagates only stop and
restart) brings the listener back too.
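
For reference, a minimal sketch of what the listener unit could look like with both settings applied. This is an illustrative reconstruction, not the actual contents of hosts/skydick/datapool.nix: the `description`, the `after` ordering, and the use of `script` are assumptions, and `RemainAfterExit` is inferred from the unit sitting in `active (exited)`.

```nix
{ ... }:
{
  # Hypothetical shape of the unit; only partOf and wantedBy come from
  # this commit, everything else is assumed for illustration.
  systemd.services.nfsd-rdma-listener = {
    description = "Register the NFS-RDMA listener on port 20049";

    # Assumed: run only after nfsd is up, so /proc/fs/nfsd/portlist exists.
    after = [ "nfs-server.service" ];

    # Stop and restart of nfs-server.service propagate to this unit.
    partOf = [ "nfs-server.service" ];

    # A plain start of nfs-server.service pulls this unit in as well.
    wantedBy = [ "nfs-server.service" ];

    serviceConfig = {
      Type = "oneshot";
      # Keeps the unit "active (exited)" after the write succeeds.
      RemainAfterExit = true;
    };

    script = ''
      echo "rdma 20049" > /proc/fs/nfsd/portlist
    '';
  };
}
```

After a rebuild, `cat /proc/fs/nfsd/portlist` should list an rdma entry for port 20049 following any nfs-server stop/start or restart.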

Manual mitigation already applied — port 20049 is back up after
`sudo systemctl restart nfsd-rdma-listener.service`. This commit
is the structural fix so the next nfs-server cycle doesn't
silently break NFS-RDMA again.

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
1 parent 3778868 commit a51f7a4a3343e99789bdd16d000bfe064c92ffa8
@Dixiao-L Dixiao-L authored 13 days ago
Showing 1 changed file
hosts/skydick/datapool.nix