| 2026-06-11 |
skydick: lock ldx NFS export to single client (10.0.75.15 + v6 /128)
...
Was exported to the whole 10.0.0.0/16 LAN; restrict to ldx's host only,
per request. v6 uses the /128 (fd99:23eb:1682::75:15), NOT a /64 — the
LAN ULA /64 is shared by every host, so a /64 would defeat the lock
(unlike zhuyz24's /64, which is their own private remote network).
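The /128-versus-/64 reasoning above can be checked with a small sketch. It uses Python's standard `ipaddress` module; the addresses come from this log, while the `allowed` helper and the list names are illustrative and not part of the actual export configuration.

```python
import ipaddress

# Addresses quoted in this log; everything else here is illustrative.
LDX_V4 = ipaddress.ip_address("10.0.75.15")
LDX_V6 = ipaddress.ip_address("fd99:23eb:1682::75:15")
OTHER_V4 = ipaddress.ip_address("10.0.0.1")           # another host on the LAN
OTHER_V6 = ipaddress.ip_address("fd99:23eb:1682::1")  # another host on the shared ULA /64

OLD_EXPORT = [ipaddress.ip_network("10.0.0.0/16")]
LOCKED = [
    ipaddress.ip_network("10.0.75.15/32"),
    ipaddress.ip_network("fd99:23eb:1682::75:15/128"),
]
TOO_WIDE_V6 = [ipaddress.ip_network("fd99:23eb:1682::/64")]


def allowed(client, networks):
    """Return True if the client address falls inside any listed network."""
    return any(client in net for net in networks)


print(allowed(OTHER_V4, OLD_EXPORT))   # the old /16 admitted any LAN host
print(allowed(OTHER_V6, LOCKED))       # the /128 admits only ldx's host
print(allowed(OTHER_V6, TOO_WIDE_V6))  # a /64 would admit every host on the ULA
```

Under these assumptions the /128 matches exactly one address, whereas the /64 matches any host that shares the LAN prefix.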
Co-Authored-By: Claude Opus 4.8 (1M context) <[email protected]>
Dixiao-L
committed
2 days ago
|
Merge PR #2 (xuelin): skydisk allow zzy via xuelin ip
|
skydisk: allow zzy via xuelin ip
Xuelin Yang
committed
2 days ago
|
| 2026-06-09 |
skydick/networking: disable IPv6 temporary addresses on bond40g
...
The IPv6 SLAAC config aimed to pin the stable ::d1c0 token address as the
source, but NixOS emits net.ipv6.conf.bond40g.use_tempaddr=2 by default,
so the kernel generated and preferred a rotating temporary privacy
address. networkd's IPv6PrivacyExtensions=false does not override that
sysctl. Set the interface tempAddress = "disabled" so NixOS emits
use_tempaddr=0 — a storage server needs a fixed IPv6 source for ACLs,
logging, and reverse DNS.
Co-Authored-By: Claude Opus 4.8 (1M context) <[email protected]>
|
skydick/networking: mkForce IPv6PrivacyExtensions to fix eval conflict
...
The merged IPv6 PR set IPv6PrivacyExtensions = false on the 40-bond40g
networkd unit, but NixOS unconditionally sets it to "kernel" at normal
priority for every interface in networking.interfaces. Equal-priority
double definition aborted evaluation. mkForce resolves it while keeping
the intent: disable privacy extensions so the stable ::d1c0 token
address is the source for outbound IPv6.
Co-Authored-By: Claude Opus 4.8 (1M context) <[email protected]>
|
| 2026-06-08 |
skydisk: do not add a via route for IPv6
Xuelin Yang
committed
5 days ago
|
skydisk: add IPv6 address
Xuelin Yang
committed
5 days ago
|
| 2026-05-26 |

nfs-rdma: bind listener lifecycle to nfs-server.service
...
The nfsd-rdma-listener oneshot writes "rdma 20049" to
/proc/fs/nfsd/portlist exactly once (at boot or first activation).
It then stays in `active (exited)`. But the nfsd kernel module's
portlist is wiped when nfs-server.service STOPS — and every
`nixos-rebuild switch` that touches services.nfs.server or its
config cycles nfs-server. After the cycle, port 20049 silently
goes dead while the listener unit shows "active" (it never re-ran).
Caught today (2026-05-26):
  01:05:31  ldx ran nixos-rebuild switch on skydick
  01:05:49  nfs-server stopped → portlist wiped
  01:05:52  nfs-server started, port 2049 back up;
            nfsd-rdma-listener.service NOT restarted (still in
            "active exited"), so 20049 stays unbound
  ~14h later: door-pek's NFS-RDMA mount finally hangs hard, df
            blocks, telegraf disk input stops emitting, my
            disk-data-full alert goes NoData and pages with
            "[no value]" labels. qBittorrent unhealthy from
            its /mnt/media-touching healthcheck. *arr health
            issues climb.
Fix: add `partOf = [ "nfs-server.service" ]`. systemd then
propagates restart from the parent unit to the listener oneshot,
which re-runs its ExecStart (re-writes "rdma 20049" into
portlist) on every nfs-server cycle. Also made the listener
wanted by nfs-server.service (wantedBy) so a stop-then-start
sequence brings the listener back too.
Manual mitigation already applied — port 20049 is back up after
`sudo systemctl restart nfsd-rdma-listener.service`. This commit
is the structural fix so the next nfs-server cycle doesn't
silently break NFS-RDMA again.
Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
|
skydick: onboard zhuyz24 to datapool
...
LDAP user (uidNumber 2200000020). Local override pins UID; GID stays at
100 (users) per existing convention so on-disk ownership matches what
NFS exports anongid to.
|
| 2026-05-16 |

Revert "skydick/samba: advertise RSS + speed for SMB Multichannel"
...
The `interfaces = "lo;capability=DYNAMIC,speed=1 10.0.1.1;capability=RSS,speed=..."`
directive from 4f21721 is malformed for Samba's parser. Samba's
interfaces list uses spaces between entries and semicolons to attach
multichannel options to a single interface, but the parser in 4.22
splits on semicolons FIRST, producing 6 invalid tokens instead of 2
tagged interfaces:
lo capability=DYNAMIC speed=1 10.0.1.1 capability=RSS speed=80000000000
Symptom chain caused by this:
- getaddrinfo failed for "capability=DYNAMIC" (logged in smbd debug)
- interfaces table corrupted → rpcd_classic crash-loops with
NT_STATUS_CONNECTION_DISCONNECTED on svcctl endpoint init
- SMB auth from real clients rejected as "Authentication error"
even with valid credentials (LDAP backend was fine; the proximate
cause was the broken RPC fabric below the auth layer)
- Both ldx@MacBook and ldx@Mac-mini couldn't connect via Finder
SMB to \\10.0.1.1\ldx (verified with smbutil view -N //[email protected]
→ "server rejected the authentication")
- LDAP entries were intact (sambaNTPassword still 981fb5a6...,
POSIX userPassword still SSHA-hashed, account flags [U ])
Reverting drops `interfaces`, `bind interfaces only`, and keeps only
`server multi channel support = yes` so clients can still negotiate
multichannel (just without the server-side RSS advertise hint).
Re-enabling capability advertising can be tried later with verified
syntax. Candidate per the Samba wiki examples:
interfaces = bond40g;capability=RSS,speed=80000000000
(without loopback's DYNAMIC tag, which may be what tripped the parser).
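The token split described in this revert can be reproduced with a short sketch. This is a model of the observed behaviour, not Samba's parser: it assumes that `;`, `,` and whitespace all end a token, which yields the six tokens listed above. The function names are hypothetical, and the speed value is the one from the token list.

```python
import re

# Directive value as reported in this commit (speed taken from the token list).
VALUE = "lo;capability=DYNAMIC,speed=1 10.0.1.1;capability=RSS,speed=80000000000"


def split_all_separators(value):
    """Model of the failure: ';', ',' and whitespace each end a token."""
    return [tok for tok in re.split(r"[;,\s]+", value) if tok]


def split_entries(value):
    """Intended reading: whitespace separates entries, ';' attaches options."""
    entries = []
    for entry in value.split():
        name, _, opts = entry.partition(";")
        options = dict(opt.split("=", 1) for opt in opts.split(",") if opt)
        entries.append((name, options))
    return entries


print(split_all_separators(VALUE))  # six bare tokens, none tagged
print(split_entries(VALUE))         # two interfaces, each with its options
```

The second function shows what the directive was meant to express; whether Samba can be made to read it that way (for example through quoting) is not established by this log.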
Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
ldx
committed
28 days ago
|
| 2026-05-15 |

skydick/samba: advertise RSS + speed for SMB Multichannel
...
Closest thing to "use the whole bond from SMB" without the
ksmbd/userspace fork dilemma. `server multi channel support = yes`
was already on but the client side (macOS, Windows) needs an
explicit capability hint to actually open more than one TCP
stream — without `interfaces ... ;capability=RSS,speed=...`,
Sequoia clients negotiate a single channel and that's it.
Now advertised:
lo capability=DYNAMIC speed=1 (loopback)
10.0.1.1 capability=RSS speed=80_000_000_000 (bond40g aggregate)
Effect: Sequoia 15.4+ opens up to 32 channels per SMB3 session;
LACP layer3+4 xmit hash distributes those streams across both
40 GbE bond slaves. Expected SMB throughput improvement is 2-4×
on bulk Finder copies vs single-channel TCP, with zero feature
loss (still userspace smbd: LDAP, Spotlight, fruit, recycle, TM
all intact).
Paired with `bind interfaces only = yes` so smbd doesn't listen
on any incidental interface (the existing `hosts allow` IP filter
stays as a second layer).
Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
ldx
authored
29 days ago
Dixiao-L
committed
29 days ago
|
skydick/networking: tidy bond40g comments + reference cisco Po5
ldx
committed
29 days ago
|
skydick/networking: drop bond0, rename references to bond40g
...
bond0 (ConnectX-4 LX 25G, active-backup) was carrying 10.0.1.1/16
until the 2026-05-15 cutover onto bond40g (2× 40G ConnectX-3 LACP
layer3+4, MTU 9200). With cutover done and verified — Aggregator ID
1 on both slaves, jumbo end-to-end to gateway, traffic flowing —
the old bond is dead weight.
* Remove the `bonds.bond0` and `interfaces.bond0` blocks.
* Rename the remaining active `bond0` references to `bond40g`:
- `systemd.network.networks."40-bond0"` → `."40-bond40g"`
- `"net.ipv6.conf.bond0.accept_ra"` sysctl
- `skyworks.monitoring.netInterfaces = [ "bond0" ]`
- wait-online and RA-leak comments
* The freed enp4s0f0np0/enp4s0f1np1 are now standalone DOWN,
available for future use.
The live kernel `bond0` device persisted past nixos-rebuild
because networkd doesn't destroy unmanaged ifaces; cleaned up
manually with `ip link set <slave> nomaster; ip link del bond0`.
ldx
authored
29 days ago
ldx
committed
29 days ago
|

Revert "skydick/samba: enable SMB-Direct"
...
The previous commit (407a0b3) was based on a wrong premise. Userspace
Samba's smbd does NOT implement an SMB-Direct (RDMA) transport even
with --with-smb-direct passed to waf — the flag is silently accepted
but the resulting binary contains no ibverbs code (verified
post-deploy: ldd /bin/smbd shows no libibverbs linkage, smbd doesn't
listen on port 5445, and testparm rejects "smb direct" as an unknown
parameter).
SMB Direct in Linux is implemented in the kernel server `ksmbd`
(fs/smb/server/ in the kernel tree), which is a separate
implementation from Samba. ksmbd would lose us:
- passdb backend = ldapsam (LDAP-backed posix users)
- Spotlight + tinysparql tracker integration
- vfs_fruit (metadata stream / macOS attrs / Time Machine
sparsebundle support, central to the ldx-timemachine share)
Not a worthwhile trade for the SMB workload, which is interactive
Finder browsing, not bulk throughput. NFS-over-RDMA on the same
RoCE fabric (mlx4_ib via bond40g) covers the bulk-throughput case
already.
Replaced the misleading "SMB Direct" comment block with an explicit
"why this is NOT enabled" note so this doesn't get re-tried.
Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
ldx
committed
29 days ago
|

skydick/samba: enable SMB-Direct (SMB3 over RDMA, port 5445)
...
Two coordinated changes:
1. sambaFull overlay extended to build with SMB-Direct support:
- rdma-core added to buildInputs (provides libibverbs + librdmacm)
- --with-smb-direct passed via configureFlags so waf wires up the
transport layer at compile time
2. settings.global gains `smb direct = yes` + 8 MiB read/write knobs
matching the NFS rsize/wsize on the same fabric. smbd now advertises
capability 0x40 on protocol negotiate; clients that speak SMB-Direct
(Win Server / Win Pro for Workstations / macOS Sequoia 15.4+) can
upgrade SMB3 sessions onto the bond40g RoCE fabric. Clients without
SMB-Direct silently fall back to plain TCP on 445.
The 2×40 GbE bond40g (ConnectX-3, post-cutover 2026-05-15) is the same
RDMA fabric NFS uses; SMB-Direct shares it without contention since
the queue-pair fanout is per-session. The "10 GbE NIC" comment in the
settings block is stale — replaced with the current 80 Gbps reality.
Build cost: sambaFull overlay forces a local rebuild on deploy
(~10-15 min, one CPU bound on smbd compilation).
Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
ldx
committed
29 days ago
|
skydick/nfs: enable NFS-over-RDMA listener on port 20049
...
Additive to the existing TCP listener — clients choose one transport
per mount, so adding RDMA doesn't disrupt anything. The hardware path
exists: mlx5_bond_0 (the LACP bond's RDMA representation) is ACTIVE
with link_layer=Ethernet (RoCEv2). Bonded RoCE on ConnectX-5 surfaces
both 25 GbE slaves as a single RDMA device, so RDMA traffic uses the
full 50 Gbps aggregate via the firmware's own LAG handling.
Clients (door-pek) use proto=rdma,port=20049 in nfs.nix to opt in.
RDMA transports have intrinsic parallelism (queue pairs), so nconnect
becomes a no-op — drop it from the mount options when switching.
Idempotent listener registration: nfsd's portlist rejects duplicate
adds with EINVAL, so the oneshot pre-checks before writing.
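The pre-check idea can be sketched as follows. This is not the oneshot's actual script, which this log does not show; the function names are hypothetical, and the portlist format (one `transport port` pair per line) is an assumption based on the "rdma 20049" string quoted in this log.

```python
def is_registered(portlist_text, transport="rdma", port=20049):
    """True if a 'transport port' line is already present in the portlist text."""
    wanted = (transport, str(port))
    return any(tuple(line.split()) == wanted for line in portlist_text.splitlines())


def ensure_listener(portlist_path, transport="rdma", port=20049):
    """Register the listener only when it is missing; return True if a write happened.

    On the real /proc/fs/nfsd/portlist a write adds a listener. A regular file
    used for testing is simply overwritten, which is enough to exercise the check.
    """
    with open(portlist_path) as f:
        current = f.read()
    if is_registered(current, transport, port):
        return False  # already bound; skip the write that would fail with EINVAL
    with open(portlist_path, "w") as f:
        f.write(f"{transport} {port}\n")
    return True
```

Calling `ensure_listener("/proc/fs/nfsd/portlist")` twice in a row would then write at most once, which is the idempotence the commit describes.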
Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
ldx
committed
on 15 May
|
| 2026-05-14 |
skydick/networking: skyw VLAN MTU 9000 → 9200 jumbo frames
...
Match the skyw storage VLAN end-to-end:
cisco Po4 (switch port-channel): 9216
skydick (bond0 + skyw VLAN): 9200 ← this commit
door-pek (bond0 + skyw VLAN): 9200
The 9000 → 9200 bump leaves 16 bytes of headroom under cisco Po4 9216
for VLAN tag + L2 overhead.
Pairs with nix-infra commit 0xxxxxxx (door-pek/networking: skyw VLAN
MTU 9200 jumbo frames).
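The headroom arithmetic is simple enough to check mechanically. A minimal sketch using the three MTU values from this commit; the dictionary keys and function names are illustrative.

```python
# MTU values as listed in this commit message.
PATH_MTU = {
    "cisco Po4": 9216,
    "skydick skyw VLAN": 9200,
    "door-pek skyw VLAN": 9200,
}
SWITCH = "cisco Po4"


def headroom(switch_mtu, host_mtu):
    """Bytes left under the switch MTU once the host MTU is accounted for."""
    return switch_mtu - host_mtu


def hosts_fit(path):
    """True if every host MTU is at or below the switch port-channel MTU."""
    return all(mtu <= path[SWITCH] for name, mtu in path.items() if name != SWITCH)


print(headroom(PATH_MTU[SWITCH], PATH_MTU["skydick skyw VLAN"]))
print(hosts_fit(PATH_MTU))
```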
Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
ldx
committed
on 14 May
|

skydick/nfs: crossmnt on per-user exports so child datasets are reachable
...
Per-user namespace is structured as:
dick/users/ldx — parent (quota boundary, no content of its own)
dick/users/ldx/files — SMB-exposed personal files (\\SKYDICK\ldx)
dick/users/ldx/bt-state — *arr / qBT runtime state
dick/users/ldx/timemachine — macOS sparsebundle target (\\SKYDICK\ldx-timemachine)
dick/users/ldx/vm — VM disk roots
Without crossmnt on the parent export, NFS clients mounting
/srv/users/ldx only see the parent dataset and hit empty placeholders
where the children mount. 2026-05-14 incident: door-pek's baidunetdisk
container bound /mnt/users/ldx/baidu (top-level placeholder location)
because /mnt/users/ldx/files showed empty over NFSv3 — downloads landed
outside the SMB-visible namespace until the dataset boundary was
diagnosed.
Adding crossmnt makes the children visible from the existing parent
export with no separate export entries; equivalent to `nohide` on each
child. Options (all_squash, anonuid=1000) inherit naturally — exactly
the behaviour the parent already provides.
Also applied to /srv/users/ye-lw21 for parity (same dataset shape).
Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
ldx
committed
on 14 May
|
| 2026-05-08 |
monitoring: add nodeExporter option, enable on skydick
...
Replaces telegraf-as-only-monitoring with a declarative node-exporter that
the skyw-gw Prometheus scrapes directly. Telegraf->InfluxDB(door1) keeps
running until door1 retirement so the legacy skydick.json grafana
dashboard does not go dark mid-migration.
ldx
committed
on 8 May
|
| 2026-05-06 |
skydick: also disable RA in systemd-networkd userspace
...
Sysctl accept_ra=0 only stops the kernel — systemd-networkd does
its own RA processing in userspace and was caching the link-DNS
even after the kernel sysctl was applied. Override the
auto-generated 40-bond0.network with networkConfig.IPv6AcceptRA=false.
ldx
committed
on 6 May
|
skydick: suppress IPv6 RA processing on bond0
...
`networking.enableIPv6 = false` only disables IPv6 forwarding/use;
the kernel still accepts router advertisements unless told otherwise.
The gateway's radvd was seeding fd99:23eb:1682::1 as a per-link DNS
on bond0, which then took precedence in systemd-resolved for AAAA
queries — making blocked names error as 'Connection refused' instead
of returning a clean NXDOMAIN through 10.0.0.1's mosdns.
Set accept_ra=0 globally + on bond0 explicitly. Existing 'enableIPv6
= false' continues to handle the higher-level disable.
ldx
committed
on 6 May
|
skydick: route DNS via 10.0.0.1 only, AliDNS as fallback
...
Was: nameservers = [ "10.0.0.1" "223.5.5.5" ] — both treated as
primary by systemd-resolved, which then load-balanced to AliDNS
and bypassed mosdns's analytics blocking (resolvectl confirmed
hm.baidu.com / google-analytics.com leaking through).
Now: 10.0.0.1 only as primary, AliDNS demoted to fallbackDns so
it activates only when 10.0.0.1 is unreachable.
ldx
committed
on 6 May
|
| 2026-04-01 |
skydick: use async NFS export for media dataset
...
Media data is re-downloadable torrents — sync write guarantees are
unnecessary. Switching to async bypasses SLOG round-trips and improves
write throughput from 358 to 490 MB/s. All other exports remain sync.
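For scale, the two throughput figures quoted above correspond to a gain of roughly 37%. A small sketch of the arithmetic; the function name is illustrative and the inputs are the numbers from this commit.

```python
def speedup(before_mb_s, after_mb_s):
    """Ratio of the new throughput to the old one."""
    return after_mb_s / before_mb_s


ratio = speedup(358, 490)
gain_pct = (ratio - 1) * 100
print(f"{ratio:.2f}x, {gain_pct:.1f}% faster")
```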
Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
|
skydick: add mirrored NVMe special vdev + mirrored SLOG
...
Replaced single-drive SLOG + L2ARC with dual-Optane mirrored setup:
- 690G mirrored special vdev for metadata + files ≤128K
- 8G mirrored SLOG for sync writes
- special_small_blocks=128K set in ZFS properties service
- nvme1 formatted to 4Kn to match nvme0
The special vdev is the biggest performance win for an HDD pool: all
metadata lookups, directory listings, and small files now hit NVMe.
Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
|
| 2026-03-30 |
Update skydick README with InfluxDB and monitoring docs
...
Documents the fleet monitoring architecture: InfluxDB on ZFS,
Telegraf data sources, Grafana datasource layout, and ZFS
dataset management.
Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
|
| 2026-03-29 |
Add InfluxDB v2 on skydick for fleet monitoring
...
- New modules/influxdb.nix: declarative InfluxDB v2 with ZFS-backed
storage (dick/system/influxdb, bind-mounted to /var/lib/influxdb2)
- monitoring.nix: make influxUrl configurable (default: skydick)
- skydick/default.nix: enable influxdb, point telegraf to localhost
- datapool.nix: document influxdb dataset in hierarchy + creation cmds
Consolidates all monitoring data (door1 + skydick + IoT sensors) into
a single InfluxDB on the ZFS storage server for infinite retention.
Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
|
| 2026-03-24 |
harden and fix: nftables input chain, sudo, agenix, ZFS, NAT priority
...
- Add inet input_filter table to xlab-gateway (policy drop on WAN)
- Restrict NOPASSWD sudo to ldx only; ylw uses password sudo via wheel
- Restructure secrets.nix with admins list, prepare for ylw ed25519 key
- Add ye-lw21 to trusted-users in common.nix
- Remove contradictory relatime=on when atime=off on rpool
- Fix NAT postrouting priority: filter → srcnat
- Remove duplicate nixpkgs.hostPlatform from xlab-gateway hardware-configuration
Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
|
skydick: document drive10 added as second hot spare
...
sg_format completed on drive10 (c9bcfa0f). Both LUNs added as spares,
bringing the pool to 8 mirrors + 2 hot spares (4 spare LUNs total).
Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
|
| 2026-03-23 |
skydick: document pool expansion to 8 mirrors (~50.9T)
...
Added 4 new SAS Mach2 drives (drive6-9) as 4 mirror vdevs. Updated
drive inventory, layout diagram, expansion commands, and runbook
with sg_format/wipefs steps. drive10 pending sg_format completion.
Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
|
| 2026-03-16 |
skydick: fix localsearch, I/O schedulers, wait-online, NIC tuning
...
- Replace broken localsearch oneshot with proper miner daemon running as
ldx on ldx's session bus (lingering enabled) so Samba Spotlight queries
from macOS clients actually work
- Fix systemd-networkd-wait-online 2-min boot timeout (anyInterface=true)
- Add storage-tuning service to enforce mq-deadline on SAS HDDs and
increase Mellanox ring buffers (1024→4096) at boot
- Simplify udev I/O scheduler rules to match by rotational attribute
instead of hardcoded kernel device names
- Update TM dataset recordsize comments to reflect 1M (applied on pool)
- Fix deprecated linuxPackages_6_6.perf → perf
ZFS properties applied separately on skydick:
com.sun:auto-snapshot=true on dick (was unset — no snapshots taken)
com.sun:auto-snapshot=false on dick/users/ldx/timemachine
recordsize=1M on dick/users/ldx/timemachine
Co-Authored-By: Claude Opus 4.6 <[email protected]>
|