systemd hardening causes starting with 0 workers and partial outbound federation, ignores `clusterLimit` config option
so after running into #586 (closed), i started fiddling with systemd unit hardening, and i have stumbled upon the offending directive: `ProcSubset=pid`
this is my systemd unit for reference:
sharkey.service
[Unit]
Description=Sharkey fedi daemon
[Service]
Type=simple
User=sharkey
Group=sharkey
SupplementaryGroups=redis postgres
ExecStart=/usr/bin/pnpm start
WorkingDirectory=/opt/sharkey/sharkey
Environment="NODE_OPTIONS=--max-old-space-size=8192"
Environment="NODE_ENV=production"
TimeoutSec=60
Restart=always
ReadWritePaths=/opt/sharkey
RuntimeDirectory=sharkey
RuntimeDirectoryMode=0750
SecureBits=noroot noroot-locked no-setuid-fixup no-setuid-fixup-locked
#RestrictFileSystems=btrfs @temporary
#RestrictFileSystems=~@known @privileged-api @network @historical-block @common-block @auxiliary-api @basic-api
#SocketBindDeny=any
DynamicUser=true
CapabilityBoundingSet=
AmbientCapabilities=
MemoryHigh=15000M
MemoryMax=17000M
DevicePolicy=closed
PrivateTmp=false
ProtectSystem=strict
ProtectHome=true
PrivateDevices=true
NoNewPrivileges=true
RestrictNamespaces=true
RestrictRealtime=true
RestrictSUIDSGID=true
#ProcSubset=pid
#RestrictAddressFamilies=AF_INET AF_INET6 AF_UNIX AF_NETLINK
ProtectControlGroups=true
ProtectHostname=true
ProtectKernelLogs=true
ProtectKernelModules=true
ProtectKernelTunables=true
ProtectProc=invisible
PrivateIPC=true
RemoveIPC=true
PrivateUsers=true
#MemoryDenyWriteExecute=true
LockPersonality=true
ProtectClock=true
UMask=0077
#SystemCallFilter=@system-service @resources @privileged
#SystemCallFilter=@system-service @ipc
#SystemCallFilter=@system-service @resources
SystemCallFilter=~@clock @debug @module @mount @reboot @swap @cpu-emulation @obsolete
#SystemCallFilter=~@clock @debug @module @mount @reboot @swap @cpu-emulation @obsolete @timer @chown @setuid @keyring @ipc
SystemCallArchitectures=native
LimitNOFILE=9999999
[Install]
WantedBy=multi-user.target
the relevant line is `#ProcSubset=pid`. this directive restricts how much of the `/proc` filesystem an application can read, for privacy and host security reasons: https://www.freedesktop.org/software/systemd/man/latest/systemd.exec.html#ProcSubset=
now, i don't know why sharkey needs this filesystem to determine how many workers it should let me listen on, but i fail to see any valid reason this should ever be `0` just because it can't read some APIs in `/proc`. falling back to `0` is what caused #586 (closed) for me. it should just fall back to `1` like it defaults to, or actually respect `clusterLimit`.
the next problem i found is that this entire time i had `clusterLimit` set to 8, because i have tons of resources. so, why didn't it just use this, if this is what determines how many workers to listen on? if i have `ProcSubset=pid` set, sharkey will completely ignore `clusterLimit` no matter what and always start with 0 workers.
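
a guess at the mechanism, not verified against the sharkey source: if the worker count is computed as the smaller of `clusterLimit` and the detected cpu count, and cpu detection comes back empty when `/proc` is restricted (as far as i know, node's `os.cpus()` returns an empty list on linux when it can't read cpu info), then the result is `0` regardless of the configured limit. the function name and parameters below are hypothetical.

```typescript
// hypothetical reconstruction of the suspected logic, NOT copied from sharkey.
// `detectedCpus` stands in for something like `os.cpus().length`.
function naiveWorkerCount(clusterLimit: number | undefined, detectedCpus: number): number {
    const limit = clusterLimit ?? 1; // assumed default of 1 when clusterLimit is unset
    return Math.min(limit, detectedCpus);
}

// normal host: 16 cpus visible, clusterLimit set to 8
const normalHost = naiveWorkerCount(8, 16);
// with ProcSubset=pid: cpu detection yields 0, so clusterLimit no longer matters
const restrictedHost = naiveWorkerCount(8, 0);
console.log(normalHost, restrictedHost);
```

if that guess is right, it would explain both symptoms at once: 0 workers, and `clusterLimit` appearing to be ignored.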
my kind requests with this one are:
- if sharkey somehow fails to determine how many workers it should let me listen on (why is this a thing though? it seems very arbitrary, especially when used in autoscaling environments), it should fall back to `1`, or use the defined `clusterLimit`
- respect `clusterLimit` in all cases
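
a minimal sketch of the behaviour i'm asking for, assuming the worker count is derived from `clusterLimit` plus some detected cpu count. `resolveWorkerCount` and its parameters are made up for illustration and are not sharkey's actual code.

```typescript
// hypothetical helper, not sharkey's actual api.
// `detectedCpus` stands in for whatever cpu detection reports; it may be 0 or NaN
// when /proc is restricted (e.g. by ProcSubset=pid).
function resolveWorkerCount(clusterLimit: number | undefined, detectedCpus: number): number {
    // a missing or invalid clusterLimit falls back to 1
    const limit = clusterLimit !== undefined && clusterLimit >= 1
        ? Math.floor(clusterLimit)
        : 1;

    // detection failed: trust the configured limit instead of collapsing to 0
    if (!(detectedCpus >= 1)) {
        return limit;
    }

    // detection worked: cap the limit by the cpus actually visible
    return Math.min(limit, detectedCpus);
}

console.log(resolveWorkerCount(8, 0));         // detection failed, clusterLimit set
console.log(resolveWorkerCount(undefined, 0)); // detection failed, no clusterLimit
```

this version still caps by the cpu count when detection succeeds. to honour `clusterLimit` unconditionally, the `Math.min` cap could be dropped and `limit` returned directly.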