| 2026-05-26 |

nfs-rdma: bind listener lifecycle to nfs-server.service
...
The nfsd-rdma-listener oneshot writes "rdma 20049" to
/proc/fs/nfsd/portlist exactly once (at boot or first activation).
It then stays in `active (exited)`. But the nfsd kernel module's
portlist is wiped when nfs-server.service STOPS — and every
`nixos-rebuild switch` that touches services.nfs.server or its
config cycles nfs-server. After the cycle, port 20049 silently
goes dead while the listener unit shows "active" (it never re-ran).
Caught today (2026-05-26):
  01:05:31  ldx ran nixos-rebuild switch on skydick
  01:05:49  nfs-server stopped → portlist wiped
  01:05:52  nfs-server started, port 2049 back up
            nfsd-rdma-listener.service NOT restarted (still in
            "active exited"), so 20049 stays unbound
  ~14h later: door-pek's NFS-RDMA mount finally hangs hard, df
            blocks, telegraf disk input stops emitting, my
            disk-data-full alert goes NoData and pages with
            "[no value]" labels. qBittorrent unhealthy from
            its /mnt/media-touching healthcheck. *arr health
            issues climb.
Fix: add `partOf = [ "nfs-server.service" ]` to the listener unit.
systemd then propagates stop and restart from nfs-server.service to
the listener oneshot, which re-runs its ExecStart (re-writing
"rdma 20049" into portlist) on every nfs-server restart. Also added
nfs-server.service to the listener's wantedBy so a stop-then-start
sequence brings the listener back too.
Manual mitigation already applied — port 20049 is back up after
`sudo systemctl restart nfsd-rdma-listener.service`. This commit
is the structural fix so the next nfs-server cycle doesn't
silently break NFS-RDMA again.
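The failure mode above (unit reports active while the port is unbound) can be caught by checking the portlist itself rather than the unit state. A minimal sketch, assuming portlist prints one `transport port` pair per line; the function name and the file-path parameter are illustrative and not part of this commit:

```shell
# Return 0 if an "rdma <port>" entry exists in the given portlist file.
# Assumption: the file lists one "<transport> <port>" pair per line.
rdma_listener_present() {
    portlist="${1:-/proc/fs/nfsd/portlist}"
    port="${2:-20049}"
    grep -Eq "^rdma[[:space:]]+${port}[[:space:]]*\$" "$portlist"
}
```

On the server this could run from a monitoring check, for example `rdma_listener_present || echo "NFS-RDMA listener missing"`.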
Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
|
skydick: onboard zhuyz24 to datapool
...
LDAP user (uidNumber 2200000020). Local override pins UID; GID stays at
100 (users) per existing convention so on-disk ownership matches what
NFS exports anongid to.
|
| 2026-05-16 |

Revert "skydick/samba: advertise RSS + speed for SMB Multichannel"
...
The `interfaces = "lo;capability=DYNAMIC,speed=1 10.0.1.1;capability=RSS,speed=..."`
directive from 4f21721 is malformed for Samba's parser. Samba's
interfaces list uses spaces between entries and semicolons to attach
multichannel options to a single interface, but the parser in 4.22
splits on semicolons FIRST, producing 6 invalid tokens instead of 2
tagged interfaces:
lo capability=DYNAMIC speed=1 10.0.1.1 capability=RSS speed=80000000000
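The difference between the intended reading and the flattened one can be reproduced with plain text tools. This sketch only mimics the tokenization described above; it is not Samba's parser, and both function names are made up for illustration:

```shell
# Intended reading: entries are separated by whitespace only.
count_entries() {
    printf '%s\n' "$1" | tr ' ' '\n' | grep -c .
}

# Flattened reading: ';' and ',' are treated as separators too.
count_flat_tokens() {
    printf '%s\n' "$1" | tr ' ;,' '\n\n\n' | grep -c .
}
```

Fed the two-interface value with the speed shown in the token list, `count_entries` reports 2 and `count_flat_tokens` reports 6.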
Symptom chain caused by this:
- getaddrinfo failed for "capability=DYNAMIC" (logged in smbd debug)
- interfaces table corrupted → rpcd_classic crash-loops with
NT_STATUS_CONNECTION_DISCONNECTED on svcctl endpoint init
- SMB auth from real clients rejected as "Authentication error"
even with valid credentials (LDAP backend was fine; the proximate
cause was the broken RPC fabric below the auth layer)
- Both ldx@MacBook and ldx@Mac-mini couldn't connect via Finder
SMB to \\10.0.1.1\ldx (verified with smbutil view -N //[email protected]
→ "server rejected the authentication")
- LDAP entries were intact (sambaNTPassword still 981fb5a6...,
POSIX userPassword still SSHA-hashed, account flags [U ])
Reverting drops `interfaces`, `bind interfaces only`, and keeps only
`server multi channel support = yes` so clients can still negotiate
multichannel (just without the server-side RSS advertise hint).
Re-enabling capability advertising can be tried later with verified
syntax. Candidate per the Samba wiki examples:
interfaces = bond40g;capability=RSS,speed=80000000000
(without loopback's DYNAMIC tag, which may be what tripped the parser).
Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
ldx committed 28 days ago
|
| 2026-05-15 |

skydick/samba: advertise RSS + speed for SMB Multichannel
...
Closest thing to "use the whole bond from SMB" without the
ksmbd/userspace fork dilemma. `server multi channel support = yes`
was already on but the client side (macOS, Windows) needs an
explicit capability hint to actually open more than one TCP
stream — without `interfaces ... ;capability=RSS,speed=...`,
Sequoia clients negotiate a single channel and that's it.
Now advertised:
lo capability=DYNAMIC speed=1 (loopback)
10.0.1.1 capability=RSS speed=80_000_000_000 (bond40g aggregate)
Effect: Sequoia 15.4+ opens up to 32 channels per SMB3 session;
LACP layer3+4 xmit hash distributes those streams across both
40 GbE bond slaves. Expected SMB throughput improvement is 2-4×
on bulk Finder copies vs single-channel TCP, with zero feature
loss (still userspace smbd: LDAP, Spotlight, fruit, recycle, TM
all intact).
Paired with `bind interfaces only = yes` so smbd doesn't listen
on any incidental interface (the existing `hosts allow` IP filter
stays as a second layer).
Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
ldx authored 29 days ago
Dixiao-L committed 29 days ago
|

Revert "skydick/samba: enable SMB-Direct"
...
The previous commit (407a0b3) was based on a wrong premise. Userspace
Samba's smbd does NOT implement an SMB-Direct (RDMA) transport even
with --with-smb-direct passed to waf — the flag is silently accepted
but the resulting binary contains no ibverbs code (verified post-
deploy: ldd /bin/smbd shows no libibverbs linkage, smbd doesn't
listen on port 5445, and testparm rejects "smb direct" as an unknown
parameter).
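The post-deploy checks above can be scripted. A sketch under the assumption that `ldd` and `ss` are available on the host; the helper names are illustrative:

```shell
# Return 0 if the binary is dynamically linked against libibverbs.
links_ibverbs() {
    ldd "$1" 2>/dev/null | grep -q 'libibverbs'
}

# Return 0 if `ss -ltn` output on stdin shows a listener on the given port.
listens_on() {
    grep -Eq ":${1}[[:space:]]"
}
```

Typical use would be `links_ibverbs /bin/smbd || echo "no ibverbs linkage"` and `ss -ltn | listens_on 5445 || echo "nothing on 5445"`.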
SMB Direct in Linux is implemented in the kernel server `ksmbd`
(fs/smb/server/ in the kernel tree), which is a separate
implementation from Samba. Switching to ksmbd would cost us:
- passdb backend = ldapsam (LDAP-backed posix users)
- Spotlight + tinysparql tracker integration
- vfs_fruit (metadata stream / macOS attrs / Time Machine sparse-
bundle support — central to ldx-timemachine share)
Not a worthwhile trade for the SMB workload, which is interactive
Finder browsing, not bulk throughput. NFS-over-RDMA on the same
RoCE fabric (mlx4_ib via bond40g) already covers the bulk-throughput
case.
Replaced the misleading "SMB Direct" comment block with an explicit
"why this is NOT enabled" note so this doesn't get re-tried.
Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
ldx committed 29 days ago
|

skydick/samba: enable SMB-Direct (SMB3 over RDMA, port 5445)
...
Two coordinated changes:
1. sambaFull overlay extended to build with SMB-Direct support:
- rdma-core added to buildInputs (provides libibverbs + librdmacm)
- --with-smb-direct passed via configureFlags so waf wires up the
transport layer at compile time
2. settings.global gains `smb direct = yes` + 8 MiB read/write knobs
matching the NFS rsize/wsize on the same fabric. smbd now advertises
capability 0x40 on protocol negotiate; clients that speak SMB-Direct
(Win Server / Win Pro for Workstations / macOS Sequoia 15.4+) can
upgrade SMB3 sessions onto the bond40g RoCE fabric. Clients without
SMB-Direct silently fall back to plain TCP on 445.
The 2×40 GbE bond40g (ConnectX-3, post-cutover 2026-05-15) is the same
RDMA fabric NFS uses; SMB-Direct shares it without contention since
the queue-pair fanout is per-session. The "10 GbE NIC" comment in the
settings block is stale — replaced with the current 80 Gbps reality.
Build cost: sambaFull overlay forces a local rebuild on deploy
(~10-15 min, one CPU bound on smbd compilation).
Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
ldx committed 29 days ago
|
skydick/nfs: enable NFS-over-RDMA listener on port 20049
...
Additive to the existing TCP listener — clients choose one transport
per mount, so adding RDMA doesn't disrupt anything. The hardware path
exists: mlx5_bond_0 (the LACP bond's RDMA representation) is ACTIVE
with link_layer=Ethernet (RoCEv2). Bonded RoCE on ConnectX-5 surfaces
both 25 GbE slaves as a single RDMA device, so RDMA traffic uses the
full 50 Gbps aggregate via the firmware's own LAG handling.
Clients (door-pek) use proto=rdma,port=20049 in nfs.nix to opt in.
RDMA transports have intrinsic parallelism (queue pairs), so nconnect
becomes a no-op — drop it from the mount options when switching.
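Switching an existing mount over means dropping `nconnect` and adding the RDMA transport options. A small sketch of that rewrite on a comma-separated option string; the helper name and any sample options are hypothetical, not taken from nfs.nix:

```shell
# Rewrite a comma-separated NFS option string for RDMA:
# drop nconnect/proto/port (and empty fields), then append the RDMA pair.
to_rdma_opts() {
    printf '%s\n' "$1" | tr ',' '\n' \
        | grep -Ev '^(nconnect=|proto=|port=|$)' \
        | { tr '\n' ','; printf 'proto=rdma,port=20049\n'; }
}
```

For example, `to_rdma_opts 'vers=4.2,nconnect=8,rsize=1048576'` prints `vers=4.2,rsize=1048576,proto=rdma,port=20049`.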
Idempotent listener registration: nfsd's portlist rejects duplicate
adds with EINVAL, so the oneshot checks for an existing entry before
writing.
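The pre-check can be sketched as follows. This is an illustration rather than the unit's actual ExecStart; the function name is hypothetical and the portlist line format is assumed to be `rdma <port>`:

```shell
# Register the RDMA listener only if it is not already present, so that
# re-running the oneshot never triggers the duplicate-add error.
register_rdma_listener() {
    portlist="${1:-/proc/fs/nfsd/portlist}"
    port="${2:-20049}"
    if grep -Eq "^rdma[[:space:]]+${port}[[:space:]]*\$" "$portlist" 2>/dev/null; then
        return 0   # already registered, nothing to do
    fi
    # Append mode keeps this safe to try against a regular file as well.
    printf 'rdma %s\n' "$port" >> "$portlist"
}
```

Calling it twice in a row leaves a single `rdma 20049` entry, which is the property the oneshot relies on.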
Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
ldx committed on 15 May
|
| 2026-05-14 |

skydick/nfs: crossmnt on per-user exports so child datasets are reachable
...
Per-user namespace is structured as:
dick/users/ldx — parent (quota boundary, no content of its own)
dick/users/ldx/files — SMB-exposed personal files (\\SKYDICK\ldx)
dick/users/ldx/bt-state — *arr / qBT runtime state
dick/users/ldx/timemachine — macOS sparsebundle target (\\SKYDICK\ldx-timemachine)
dick/users/ldx/vm — VM disk roots
Without crossmnt on the parent export, NFS clients mounting
/srv/users/ldx only see the parent dataset and hit empty placeholders
where the children mount. 2026-05-14 incident: door-pek's baidunetdisk
container bound /mnt/users/ldx/baidu (top-level placeholder location)
because /mnt/users/ldx/files showed empty over NFSv3 — downloads landed
outside the SMB-visible namespace until the dataset boundary was
diagnosed.
Adding crossmnt makes the children visible from the existing parent
export with no separate export entries; equivalent to `nohide` on each
child. Options (all_squash, anonuid=1000) inherit naturally — exactly
the behaviour the parent already provides.
Also applied to /srv/users/ye-lw21 for parity (same dataset shape).
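A quick way to confirm that an export actually carries `crossmnt` is to parse the exports file. A sketch assuming the usual `path client(opt,opt,...)` layout of exports(5); the helper name is illustrative:

```shell
# Return 0 if the export for PATH in FILE lists OPTION for any client.
# Usage: export_has_option FILE PATH OPTION
export_has_option() {
    awk -v path="$2" -v opt="$3" '
        $1 == path {
            for (i = 2; i <= NF; i++) {
                s = $i
                sub(/^[^(]*\(/, "", s)   # strip the "client(" prefix
                sub(/\).*$/, "", s)      # strip the closing ")"
                n = split(s, opts, ",")
                for (j = 1; j <= n; j++)
                    if (opts[j] == opt) found = 1
            }
        }
        END { exit (found ? 0 : 1) }
    ' "$1"
}
```

For instance, `export_has_option /etc/exports /srv/users/ldx crossmnt` succeeds only when the option is present on that export line.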
Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
ldx committed on 14 May
|
| 2026-04-01 |
skydick: use async NFS export for media dataset
...
Media data is re-downloadable torrents — sync write guarantees are
unnecessary. Switching to async bypasses SLOG round-trips and improves
write throughput from 358 to 490 MB/s. All other exports remain sync.
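The quoted figures work out to a gain of roughly 37%. The arithmetic, as a sketch with a hypothetical helper name:

```shell
# Percentage gain from OLD to NEW, rounded to the nearest integer.
# Usage: pct_gain OLD NEW
pct_gain() {
    echo $(( (($2 - $1) * 100 + $1 / 2) / $1 ))
}
```

`pct_gain 358 490` prints 37.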
Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
|
skydick: add mirrored NVMe special vdev + mirrored SLOG
...
Replaced single-drive SLOG + L2ARC with dual-Optane mirrored setup:
- 690G mirrored special vdev for metadata + files ≤128K
- 8G mirrored SLOG for sync writes
- special_small_blocks=128K set in ZFS properties service
- nvme1 formatted to 4Kn to match nvme0
The special vdev is the biggest performance win for an HDD pool: all
metadata lookups, directory listings, and small files now hit NVMe.
Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
|
| 2026-03-29 |
Add InfluxDB v2 on skydick for fleet monitoring
...
- New modules/influxdb.nix: declarative InfluxDB v2 with ZFS-backed
storage (dick/system/influxdb, bind-mounted to /var/lib/influxdb2)
- monitoring.nix: make influxUrl configurable (default: skydick)
- skydick/default.nix: enable influxdb, point telegraf to localhost
- datapool.nix: document influxdb dataset in hierarchy + creation cmds
Consolidates all monitoring data (door1 + skydick + IoT sensors) into
a single InfluxDB on the ZFS storage server for infinite retention.
Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
|
| 2026-03-24 |
skydick: document drive10 added as second hot spare
...
sg_format completed on drive10 (c9bcfa0f). Both LUNs added as spares,
bringing the pool to 8 mirrors + 2 hot spares (4 spare LUNs total).
Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
|
| 2026-03-23 |
skydick: document pool expansion to 8 mirrors (~50.9T)
...
Added 4 new SAS Mach2 drives (drive6-9) as 4 mirror vdevs. Updated
drive inventory, layout diagram, expansion commands, and runbook
with sg_format/wipefs steps. drive10 pending sg_format completion.
Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
|
| 2026-03-16 |
skydick: fix localsearch, I/O schedulers, wait-online, NIC tuning
...
- Replace broken localsearch oneshot with proper miner daemon running as
ldx on ldx's session bus (lingering enabled) so Samba Spotlight queries
from macOS clients actually work
- Fix systemd-networkd-wait-online 2-min boot timeout (anyInterface=true)
- Add storage-tuning service to enforce mq-deadline on SAS HDDs and
increase Mellanox ring buffers (1024→4096) at boot
- Simplify udev I/O scheduler rules to match by rotational attribute
instead of hardcoded kernel device names
- Update TM dataset recordsize comments to reflect 1M (applied on pool)
- Fix deprecated linuxPackages_6_6.perf → perf
ZFS properties applied separately on skydick:
com.sun:auto-snapshot=true on dick (was unset — no snapshots taken)
com.sun:auto-snapshot=false on dick/users/ldx/timemachine
recordsize=1M on dick/users/ldx/timemachine
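The rotational-attribute matching mentioned in the list above can be previewed from a shell before touching udev rules. A sketch that lists rotational block devices; the function name is illustrative, and the sysfs root is a parameter only so the logic can be tried against a scratch directory:

```shell
# Print the names of block devices whose queue/rotational flag is 1.
# Usage: rotational_disks [SYSFS_BLOCK_ROOT]
rotational_disks() {
    blkroot="${1:-/sys/block}"
    for dev in "$blkroot"/*; do
        [ -r "$dev/queue/rotational" ] || continue
        if [ "$(cat "$dev/queue/rotational")" = "1" ]; then
            basename "$dev"
        fi
    done
}
```

Devices printed here are the ones a rotational-based rule would target for mq-deadline; NVMe and other non-rotational devices are skipped.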
Co-Authored-By: Claude Opus 4.6 <[email protected]>
|
skydick: drop fruit:time machine max size (buggy bandsize parser)
...
Samba's fruit_get_bandsize() regex-based plist parser fails on valid
Info.plist files. Rely on ZFS refquota on dick/users/ldx instead.
Co-Authored-By: Claude Opus 4.6 <[email protected]>
|
skydick: upgrade to nixos-25.11, add Spotlight + recycle bin
...
- Upgrade nixpkgs from nixos-24.11 to nixos-25.11 (Samba 4.20→4.22)
- Build sambaFull with Spotlight/tracker support via overlay:
- Patch waf to detect tracker-sparql-3.0 (upstream only checks ≤2.0)
- Patch rpcd_mdssvc for tinysparql 3.x bus API rename
(get_async/get_finish → bus_new_async/bus_new_finish)
- Disable tevent_glib_tracker test (uses removed tracker 2.x API, test-only)
- Add icu for Unicode normalisation required by Spotlight
- Add Spotlight search with tracker backend for Finder search over SMB
- Add localsearch indexer service for public, media, and ldx files
- Add recycle bin (vfs recycle) for public/homes shares
- Add global fruit VFS for Apple compatibility
- Move fruit:model=TimeCapsule to ldx-timemachine share only
- Disable Spotlight on timemachine share
- Fix package renames for 25.11: targetcli→targetcli-fb, dstat→dool
Co-Authored-By: Claude Opus 4.6 <[email protected]>
|
skydick: bump TM max size to 3T for three Macs
...
1T was too tight — 579G already used across 3 sparsebundles left only
~450G visible to macOS. 3T leaves headroom for growth while keeping 7T
of the 10T ldx quota available for other datasets.
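The sizing figures can be checked with integer arithmetic. A sketch assuming 1T means 1024G; the helper name is hypothetical:

```shell
# Remaining space in G for a cap given in T and usage given in G.
# Usage: headroom_g CAP_T USED_G
headroom_g() {
    echo $(( $1 * 1024 - $2 ))
}
```

`headroom_g 1 579` gives 445, in line with the ~450G macOS reported, and `headroom_g 3 579` gives 2493.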
Co-Authored-By: Claude Opus 4.6 <[email protected]>
|
skydick: rename TM share to ldx-timemachine
...
Per-user naming makes ownership unambiguous. The share points to ldx's
dedicated timemachine ZFS dataset, not a shared location.
Co-Authored-By: Claude Opus 4.6 <[email protected]>
|
skydick: add Samba Time Machine share for macOS backups
...
Exposes dick/users/ldx/timemachine as an SMB share with Apple fruit VFS
extensions (fruit:time machine = yes) so Macs can back up directly to
skydick instead of door1. Capped at 1T via fruit:time machine max size.
Co-Authored-By: Claude Opus 4.6 <[email protected]>
|
skydick: add timemachine dataset for macOS backups
...
Dedicated ZFS dataset with recordsize=64K and zstd compression, better
matched for Time Machine sparsebundle band files than the media dataset
(1M recordsize, compression=off) where backups were previously dumped.
Co-Authored-By: Claude Opus 4.6 <[email protected]>
|
| 2026-03-15 |
Merge branch 'main' of https://gitbucket.skyw.top/git/Skyworks/skyworks-Nix-infra
|
skydick: clean SMB tuning and enforce datapool atime
|
add wg-peers to SMB allowed ips
|
Add RSS support flag from server side
|
skydick: keep SMB passwords synced from LDAP
|
skydick: switch Samba to ldapsam, rename ylw→ye-lw21, drop legacy datasets
...
- Samba passdb backend changed from tdbsam to ldapsam:ldap://10.0.0.1
- Added samba-ldap-admin-password oneshot to seed LDAP admin cred before smbd
- Pinned storage group to GID 997 to match LDAP posixGroup
- Renamed ylw to ye-lw21 across all hosts (users.nix, skydick, xlab-gateway)
- Removed legacy tmpfiles and NFS exports (share/backup/torrent/vm destroyed)
- Added bootstrap LDIF for sambaDomain, storage group, machines OU
Co-Authored-By: Claude Opus 4.6 <[email protected]>
|
| 2026-03-14 |
skydick: redesign datapool with per-user datasets and service model
...
Replace flat purpose-first layout (share/media/torrent/backup/vm) with
user-first hierarchy:
- dick/public: shared collaborative files
- dick/media: shared media with data/ + library/ in one hardlink domain
- dick/users/<user>/{files,bt-state,vm}: per-user private trees with
ZFS quotas, per-user NFS all_squash, and Samba [homes]
- dick/system/{backup,vm}: admin-only system datasets
- dick/templates/vm: read-only shared VM base images
NFS exports split media into rw writer (all_squash to qbittorrent) and
ro reader (/media/library). Per-user exports use explicit anonuid/gid.
Samba uses [public] for shared, [homes] for per-user, [media] ro for
library. Legacy exports preserved for active migration.
Add DATAPOOL.md with user/admin guide covering SMB/NFS connection,
new-user provisioning, quotas, and troubleshooting.
Co-Authored-By: Claude Opus 4.6 <[email protected]>
|
skydick: increase NFS server threads to 64
...
The default of 8 threads is insufficient for 10GbE throughput.
64 threads allow better parallelism across concurrent NFS clients.
Co-Authored-By: Claude Opus 4.6 <[email protected]>
|
skydick: use all_squash for media/torrent NFS exports
...
Map all NFS client UIDs to qbittorrent:storage (900:997) on
media and torrent exports. Eliminates need for UID/GID
coordination between NFS clients and server.
Co-Authored-By: Claude Opus 4.6 <[email protected]>
|
skydick: fix qbittorrent UID collision with ylw
...
UID 1002 was already assigned to ylw on skydick. Change qbittorrent
system user to UID 900 to avoid the collision. NFS sec=sys maps by
UID number, so this must not conflict with any normal user.
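A collision like this can be spotted ahead of time by scanning a passwd-format file for repeated UIDs. A sketch with a hypothetical helper name; it reads any file in passwd(5) layout:

```shell
# Print "UID: name1,name2" for every UID that appears more than once.
# Usage: duplicate_uids PASSWD_FILE
duplicate_uids() {
    awk -F: '
        $3 != "" {
            count[$3]++
            names[$3] = (names[$3] == "" ? $1 : names[$3] "," $1)
        }
        END {
            for (u in count)
                if (count[u] > 1) print u ": " names[u]
        }
    ' "$1" | sort
}
```

Empty output means no two accounts share a UID, which is the state this commit restores by moving qbittorrent to UID 900.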
Co-Authored-By: Claude Opus 4.6 <[email protected]>
|