Linux | /contrib/famzah

April 2, 2026
by Ivan Zahariev Leave a comment

How to Limit the Download Speed of a Docker Container

TL;DR: if a container uses the usual Docker bridge networking, you can often limit its download speed from the host with tc and a TBF qdisc. The useful detail is that the container is typically connected through a veth pair, so traffic that is incoming (download) for the container is outgoing (upload) on the host-side veth.

Limiting upload is trivial because Linux shapes egress directly. Limiting download is more awkward, because the packets are already arriving from the outside world and you can’t control how fast a remote host sends packets. But with the usual Docker bridge setup, the host is effectively the last router hop before the container. Once the packets are about to cross from the host namespace into the container namespace, they are host egress and container ingress at the same time. That is the point where we can apply the limit.

This is not meant to be a universal recipe for every Docker setup. It is a short working example for the common case where the container sits behind the default bridge plumbing and has a host-side veth. If you use network_mode: host, macvlan, ipvlan, rootless Docker, or anything more exotic, this exact approach may not apply.

Here is the script I used for one of my Compose services:

cd ~/your-docker-project-compose-dir

CID=$(docker compose ps -q ollama)
PID=$(docker inspect -f '{{.State.Pid}}' "$CID")
IFACE=$(sudo nsenter -t "$PID" -n sh -c "ip -o -4 route show to default | awk '{print \$5}'")
IDX=$(sudo nsenter -t "$PID" -n -m cat /sys/class/net/$IFACE/iflink)
VETH=$(ip -o link | awk -F': ' -v idx="$IDX" '$1 == idx {print $2}' | cut -d@ -f1)

echo "CID=$CID"
echo "PID=$PID"
echo "IFACE=$IFACE"
echo "IDX=$IDX"
echo "VETH=$VETH"

sudo tc qdisc replace dev "$VETH" root tbf rate 40000kbit burst 32kbit latency 400ms
sudo tc qdisc show dev "$VETH"

In this example, 40000kbit means 40 Mbit/s, which is about 5 MBytes/s. The script finds the container PID, enters its network namespace just long enough to discover the default interface, resolves the matching host-side peer, and then attaches the rate limit there.

One practical caveat is that the host-side veth name is not stable across container recreation. If you rebuild or recreate the container, run the script again. To remove the limit later, resolve $VETH the same way and then run sudo tc qdisc del dev "$VETH" root.

March 29, 2026
by Ivan Zahariev Leave a comment

Speed up SSH connections by splitting known_hosts per host

For a while, my mpssh runs were getting slow. I use it daily against about 1400 Linux hosts, and a trivial true command across 999 parallel SSH sessions had drifted to roughly two minutes. During the run, my desktop would get a sharp CPU spike, and the mpssh executions started interfering with interactive work. I started wondering whether newer OpenSSH packages, the growing host count, or even ssh-agent were to blame.

It turned out that the biggest win was splitting my 2.1 MB ~/.ssh/known_hosts into one small file per host. The ssh_config(5) documentation says that UserKnownHostsFile accepts runtime tokens such as %h, so a path like ~/.ssh/known_hosts_single/%h is valid.

I did not prove the exact lookup algorithm OpenSSH uses internally, so I will not speculate too much there. But the benchmark was clear enough: once I stopped feeding SSH a monolithic known_hosts file, the runtime dropped from about two minutes to about thirty seconds with the same host list and the same default 50 ms delay between forks.

Benchmark Summary

Setup	Best time	What it showed
Baseline, default SSH behavior, monolithic `known_hosts`, parallelism of 999	2m03.482s	This was the original pain point.
Per-host `known_hosts`, default 50 ms delay	26.840s	About 4.6x faster without any aggressive client-side tuning.
Same per-host setup, but 0 ms delay	16.228s	Faster again, but much harsher on local CPU.
Per-host setup plus agent/key experiments	Roughly 27-32s at 50 ms	Disabling `ssh-agent` or switching RSA to Ed25519 did not materially change the result.

The spawn delay also mattered, but in a different way. Reducing it from the default 50 ms to 5 ms or 0 ms shaved off more seconds, but it also pushed much harder on local CPU. In one 0 ms run, CPU idle dropped to 0% for about five seconds. That is why I kept the default 50 ms in normal use. Getting down to about 27 to 30 seconds while keeping the machine responsive was already good enough.

I also chased a couple of dead ends. I saw ssh-agent spike to 100% CPU often enough that it looked suspicious, so I tested a temporary passwordless key and also forced IdentityAgent=none. I also tried Ed25519 instead of my older RSA key. Neither changed the overall picture in a meaningful way.

My ~/.ssh/config is also fairly large. I even tried splitting the alias-heavy part into a separate include file of about 78 KB, guarded by a Match originalhost stanza, because mpssh uses the full hostnames and those aliases are irrelevant for the benchmarked hosts. That did not help either. OpenSSH still reads the included file in order to parse it, even if it does not end up matching the current host. I still keep that Match stanza around, though, because it may become useful in the future if OpenSSH ever starts handling this case more efficiently.

			
# mpssh uses full hostnames, so this alias file is irrelevant here
Match originalhost ??,???,????
    Include config.short-host-aliases

How To Split `known_hosts` Per Host

I wrote a small helper script for this and put it in the mpssh repository. The script reads hostnames from standard input or from a file, resolves hostnames to IP addresses, extracts matching entries from the monolithic file with ssh-keygen -F, and writes one small file per host into ~/.ssh/known_hosts_single. It also handles custom-port entries such as [git.example.com]:7999.

If HashKnownHosts was enabled in your SSH configuration, converting usually requires a plain-text list of all your servers, because the monolithic file does not contain readable hostnames anymore. If HashKnownHosts was disabled, you can usually extract that list from the existing monolithic known_hosts file with a simple cat and awk pipeline.

Here is the migration flow I used, rewritten with generic hostnames and paths:

mv ~/.ssh/known_hosts ~/.ssh/known_hosts.monolith
mkdir -p ~/.ssh/known_hosts_single

python3 known_hosts_single/convert.py \
  --known-hosts-file ~/.ssh/known_hosts.monolith \
  --input-file ./servers.list \
  --progress

If you want to test a couple of entries first, the script can also read from standard input:

printf '%s\n' example.com '[git.example.com]:7999' 203.0.113.10 | \
python3 known_hosts_single/convert.py \
  --known-hosts-file ~/.ssh/known_hosts.monolith \
  --progress

Then edit ~/.ssh/config so that SSH uses the per-host files. I explicitly disable GlobalKnownHostsFile because my setup does not rely on a system-wide known_hosts file. If yours does, do not copy that line. I also set HashKnownHosts no, because once the host identity is already visible in the %h filename, hashing the contents of the tiny per-host file no longer buys much. I kept strict host key checking enabled because this was a performance optimization, not a security shortcut:

Host *
    GlobalKnownHostsFile none
    UserKnownHostsFile ~/.ssh/known_hosts_single/%h
    HashKnownHosts no
    StrictHostKeyChecking yes

The important part is %h. SSH expands it to the target hostname, so each connection only opens the tiny file for that host instead of making every connection consult one large shared file.

Reproducing The Benchmark

For an apples-to-apples comparison, these are the important commands. I kept -p 999 because that was the clean baseline I measured before and after the change:

# Baseline
time mpssh -p 999 -u root -f ./servers.list true

# Same host list, but with per-host known_hosts files
time mpssh -p 999 -u root -f ./servers.list \
  -O 'o UserKnownHostsFile=~/.ssh/known_hosts_single/%h' \
  -O 'o StrictHostKeyChecking=yes' \
  true

# More aggressive spawning
time mpssh -p 999 -d 0 -u root -f ./servers.list \
  -O 'o UserKnownHostsFile=~/.ssh/known_hosts_single/%h' \
  -O 'o StrictHostKeyChecking=yes' \
  true

If you want to experiment further, mpssh also lets you adjust the delay between SSH forks with -d MSEC. In my case, lower values were useful for benchmarks but not for everyday use because they pushed too much CPU pressure back onto the local machine.

One more thing worth keeping in mind is ControlMaster with ControlPersist. That OpenSSH feature can reuse an already established connection to the same host for later sessions. I have not benchmarked it for this workload, but for repeated connections to the same machines it has the potential to reduce SSH connection setup overhead a lot.

Long story short, if you fan out SSH connections to hundreds or thousands of hosts, do not assume that the network or the private key type is the only thing worth checking. A large known_hosts file can be enough to waste more than a minute and a lot of CPU per batch. Splitting it per host kept host key verification in place and made mpssh feel fast again.

December 20, 2024
by Ivan Zahariev Leave a comment

Achieving Zero-Downtime PHP-FPM Restarts and Atomic Updates

The Challenge with PHP-FPM Restarts

PHP-FPM (FastCGI Process Manager) is a powerful solution for managing PHP processes, but it poses challenges when updating PHP applications or configurations without impacting active requests. A key setting in PHP-FPM is process_control_timeout, which dictates how long the master process waits for a child to finish before forcefully terminating it on graceful restart. If a slow child lingers, incoming requests queue up, causing delays for the duration of the timeout. This delay can lead to significant downtime, especially for high-traffic applications.

The Solution: Zero-Downtime Restarts and Atomic Deployments

These challenges are addressed through a proof-of-concept designed to enable zero-downtime PHP-FPM restarts and atomic source code updates, ensuring seamless service continuity. Key components of the solution include:

Redundant PHP-FPM Pools with Replicas:
- PHP-FPM instances are managed in redundant pools with at least two replicas.
- While one replica restarts, the other remains active, ensuring no downtime.
Load Balancer Management:
- A lightweight load balancer dynamically toggles traffic between replicas during restarts, making the switch invisible to users.
Atomic Code Deployment:
- Instead of directly using the “release” symlink as the PHP-FPM working directory, the release target is mounted in an isolated user namespace before starting the PHP-FPM master processes.
- This ensures that changes to the symlink location can be made immediately for new deployments, while running PHP-FPM masters continue to use their isolated view of the release directory until they are restarted at a convenient time.

Important Considerations

Static Content and Atomicity:
- In this setup, Apache serves static content (files not ending in “.php") directly by following the “release” symlink. This means that updates to the “release” symlink immediately impact Apache-served static files, diverging from the atomic deployment of PHP sources.
- To achieve true atomic deployment of both static content and PHP files, the setup must be reworked. This could involve putting both PHP and static file serving behind a single backend and managing traffic between two such backends in the same way that traffic is currently managed between PHP-FPM backends.
Temporary Capacity Reduction:
- During the restart of one replica (PHP-FPM master), only half of the capacity is available. This capacity reduction should be considered in the context of the expected traffic load during deployments or restarts.
Increased Memory Usage:
- Running two (or more) identical PHP-FPM masters introduces higher memory consumption, as each master maintains its own independent OPcache. This redundancy ensures reliability and atomicity but comes at the cost of increased resource usage.

Demo, documentation and source code

The full code and setup guide are available on GitHub. Contributions and feedback are welcome!

August 27, 2024
by Ivan Zahariev Leave a comment

Troubleshooting Zigbee Network Stability and Device Connectivity | Raspberry Pi + ConBee

I have been running a Zigbee network with 20 Aqara & IKEA devices for almost two years now. The coordinator is a Raspberry Pi 400 server with a ConBee II USB dongle. It worked more or less flawlessly with very few issues until recently. I had to reboot the coordinator server in order to migrate its root file system from an SD card to a USB-connected SSD drive.

Once I rebooted the coordinator server, almost half of the devices couldn’t reconnect to the network. I upgraded the Zigbee2MQTT daemon, which bridges Zigbee to MQTT events but this didn’t help. A few more restarts didn’t help either. There are plenty of comments on the Internet that the Zigbee USB dongles aren’t stable and some users suggested that the USB port of the Raspberry Pi may be to blame, because it couldn’t supply enough (stable) power. This made sense and I purchased an active USB 3.0 hub with external power supply, which is especially designed for power hungry USB devices like HDDs, etc. Once I plugged in the ConBee II USB dongle into the active powered USB 3.0 hub, my Zigbee network started working again flawlessly!

There was only one more minor issue. While the network was incomplete I tried to remove and then re-pair an Aqara sensor, which didn’t work. This is something that other people are experiencing, too. Even after the Zigbee network was stable again, I couldn’t re-pair the sensor. I brought it close to the coordinator, the LED of the sensor was blinking as expected during re-pair but nothing was registered by the coordinator. Finally, I suspected that the battery of the sensor was too weak and couldn’t provide enough power for the re-pairing interview between the sensor and the coordinator. Once I replaced the battery, the re-pairing worked flawlessly on the first attempt. The LED was also blinking much stronger now. To be honest, the Zigbee2MQTT web interface had been showing the battery at 0% for a long time., but since the sensor was communicating events successfully, I didn’t pay attention to this indicator. I have a few more sensors with a 0% battery report but they all are still communicating events successfully.

May 24, 2023
by Ivan Zahariev Leave a comment

Arduino ESP32 development under Linux

I am using a Windows desktop and I run Linux as a Virtualbox guest. ESP32 development under Windows is super easy to set up – you only need to install the Arduino IDE. Unfortunately, it really bugged me that compilation time is so slow. I’m not enough experienced and test a lot on the real hardware, and slow compilation really slows down the entire development process.

The Arduino IDE v2 has improved and supports parallel compilation within a submodule, but still it works slower than expected on my computer and is not ideally parallelized. Additionally, I noticed that some libraries are recompiled every time which is a huge waste of time (and resources) because libraries don’t change. Only my sketch source code changes.

I decided to see how compilation performs on Linux. The Arduino project has a command-line tool to manage, compile and upload code. The tool is still in active development but works flawlessly for me.

Here is how I installed it on my Linux desktop:

wget https://downloads.arduino.cc/arduino-cli/arduino-cli_latest_Linux_64bit.tar.gz
tar -xf arduino-cli_latest_Linux_64bit.tar.gz
mv arduino-cli ~/bin/
rm arduino-cli_latest_Linux_64bit.tar.gz

arduino-cli config init
vi /home/famzah/.arduino15/arduino-cli.yaml
  # board_manager -> additional_urls -> add "https://raw.githubusercontent.com/espressif/arduino-esp32/gh-pages/package_esp32_index.json"
  # library -> enable_unsafe_install -> set to "true" # so that we can install .zip libraries
arduino-cli core update-index
arduino-cli core install esp32:esp32
arduino-cli board listall # you must see lots of "esp32" boards now

Here is how to compile a sketch and upload it to an ESP32 board:

cd %the_folder_where_you_saved_your_Arduino_sketch%

arduino-cli compile --warnings all -b esp32:esp32:nodemcu-32s WifiRelay.ino --output-dir /tmp/this-sketch-tmp

arduino-cli upload -v --input-dir /tmp/this-sketch-tmp -p /dev/ttyUSB0 -b esp32:esp32:nodemcu-32s

arduino-cli monitor -p /dev/ttyUSB0 -c baudrate=115200

I have created a small script to automate this task which also uses a dynamically created temporary folder, in order to avoid race conditions and artefacts. The sketch folder on my Windows host is shared (read-only) with my Linux guest. I still write the sketch source code on my Windows machine using the Arduino IDE.

The ESP32 board is connected to my Windows host but I need to communicate with it from my Linux guest. This is done easily in Virtualbox using USB filters. A USB filter allows you to do a direct pass-through of the ESP32 serial port. It also works if you plug in the ESP32 board dynamically while the Linux machine is already running.

March 5, 2023
by Ivan Zahariev 1 Comment

Custom key fetch script for cryptsetup @ systemd

Disk encryption in a Linux environment is a common setup and is easily done as demonstrated by this tutorial, for example. However the easy part vanishes quickly if you don’t want to keep the encryption key on the same machine and if you don’t want to enter the key manually on each server reboot.

How do we auto-mount (unattended boot) an encrypted disk but protect the data if the server gets stolen physically? One possible solution is to store the encryption key in a remote network location and fetch it ephemerally (if you are still allowed to) when the server starts. Cloudflare did this for their servers in Russia due to the military conflict.

A custom script to fetch the encryption key, use it and then discard it sounds like a good approach. Such a script is broadly called a “keyscript”. But this is not supported by systemd in the standard “/etc/crypttab” file which describes the encrypted block devices that are set up during system boot. Luckily, there is another way to get this done by using the following feature of “/etc/crypttab”:

If the specified key file path refers to an AF_UNIX stream socket in the file system, the key is acquired by connecting to the socket and reading it from the connection. This allows the implementation of a service to provide key information dynamically, at the moment when it is needed.

With “systemd” you can easily build a service which responds to Unix sockets (or any other socket type as described in the man page). The socket is controlled and supervised by “systemd” and the mechanism is called “socket-based activation”. You have the option to execute a new process for each socket request, or a single program can process all requests. In this case I’m using the first approach because it’s very simple to implement and because the load on this service is negligible.

Here is how the socket service definition looks like. It’s stored in a file named “/etc/systemd/system/fetch-luks-key-volume1.socket”:

[Unit]

Description=Socket activator for service "fetch-luks-key-volume1"
After=local-fs.target

# recommended by "man systemd.socket"
CollectMode=inactive-or-failed

[Socket]

ListenStream=/run/luks-key-volume1.sock
SocketUser=root
SocketGroup=root
SocketMode=0600
RemoveOnStop=true

# execute a new Service process for each request
Accept=yes

[Install]

WantedBy=sockets.target

A typical “systemd” service unit needs to be configured with the same name as the socket service. This is where the custom logic to fetch the key is executed. Because “systemd” feeds additional meta data to the service unit, its name must be suffixed with “@”. The whole file name is “/etc/systemd/system/fetch-luks-key-volume1@.service” and contains the following code:

[Unit]

Description=Remotely fetch LUKS key for "volume1"

After=network-online.target
Wants=network-online.target

[Service]

Type=simple
RuntimeMaxSec=10
ExecStart=curl --max-time 5 -sS https://my-restricted.secure/key.location
StandardOutput=socket
StandardError=journal
# ignore the LUKS request packet which specifies the volume (man crypttab)
StandardInput=null

The new files are activated in “systemd” in the following way:

systemctl daemon-reload
systemctl enable fetch-luks-key-volume1.socket
systemctl start fetch-luks-key-volume1.socket

There is no need to enable the “service” unit because it’s activated by the socket when needed and is then immediately terminated upon completion.

Here is a command-line test of the new system:

# ls -la /run/luks-key-volume1.sock
srw------- 1 root root 0 Mar  4 18:09 /run/luks-key-volume1.sock

# nc -U /run/luks-key-volume1.sock|md5sum
4f7bac5cf51037495e323e338100ad46  -

# journalctl -n 100
Mar 04 18:09:38 home-server systemd[1]: Reloading.
Mar 04 18:09:45 home-server systemd[1]: Starting Socket activator for service "fetch-luks-key-volume1"...
Mar 04 18:09:45 home-server systemd[1]: Listening on Socket activator for service "fetch-luks-key-volume1".
Mar 04 18:10:05 home-server systemd[1]: Started Remotely fetch LUKS key for "volume1" (PID 2371/UID 0).
Mar 04 18:10:05 home-server systemd[1]: fetch-luks-key-volume1@0-2371-0.service: Deactivated successfully.

You can use the newly created Unix socket in “/etc/crypttab” like this:

# <target name>  <source device>           <key file>                 <options>
backup-decrypted /dev/vg0/backup-encrypted /run/luks-key-volume1.sock luks,headless=true,nofail,keyfile-timeout=10,_netdev

Disclaimer: This “always on” remote key protection works only if you can disable the remote key quickly enough. If someone breaks into your home and steals your NAS server, you probably have more than enough time to disable the remote key which is accessible only by the remote IP address of your home network. But if you are targeted by highly skilled hackers who can physically breach into your server, then they could boot your Linux server into rescue mode (or read the hard disk physically) while they are still on your premises, find the URL where you keep your remote key and then fetch the key to use it later to decrypt what they managed to steal physically. The Mandos system tries to narrow the attack surface by facilitating keep-alive checks and auto-locking of the key server.

If your hardware supports UEFI Secure Boot and TPM 2.0, you can greatly improve the security of your encryption keys and encrypted data. Generally speaking, UEFI Secure Boot will ensure a fully verified boot chain (boot loader, initrd, running kernel). Only a verified system boot state can request the encryption keys from the TPM hardware device. This verified system boot state is something which you control and you can disable the Linux “rescue mode” or other ways of getting to the root file-system without supplying a password. Here are two articles (¹, ²) where this is further discussed.

Last but not least, if a highly-skilled attacker has enough time and physical access to your hardware, they can perform many different Evil maid attacks, install hardware backdoors on your keyboard, for example, or even read the encryption keys directly from your running RAM. Additionally, a system could also be compromised via the network, by various social engineering attacks, etc. You need to assess the likelihood of each attack against your data and decide which defense strategy is practical.

Update: This setup failed to boot after a regular OS upgrade. Probably due to incorrect ordering of the services. I didn’t have enough time to debug it and therefore created the file “/root/mount-home-backup” which does the mount manually:

#!/bin/bash
set -u

echo "Executing mount-home-backup"

set -x

systemctl start systemd-cryptsetup@backup\\x2ddecrypted.service
mount /home/backup

The I marked all definitions in “/etc/crypttab” and “/etc/fstab” with the option “noauto” which tells the system scripts to not try to mount the file systems at boot.

Finally I created the following service in “/etc/systemd/system/mount-home-backup.service”:

[Unit]

Description=Late mount for /home/backup
After=network-online.target fetch-luks-key-volume1.socket

[Service]

Type=oneshot
RemainAfterExit=yes

StandardOutput=journal
StandardError=journal

ExecStart=/root/mount-home-backup

[Install]

WantedBy=multi-user.target

This new service needs to be activated, too:

systemctl daemon-reload
systemctl enable mount-home-backup.service

March 4, 2023
by Ivan Zahariev Leave a comment

Proxy SSH traffic using Cloudflare Tunnels

Long story short, Cloudflare Tunnel started as a network service which lets you expose a web server with private IP address to the public Internet. This way you don’t have to punch a hole in your firewall infrastructure, in order to have inbound access to the server. There are additional benefits like the fact that nobody knows the real IP address of your server, they can’t DDoS you by sending malicious traffic, etc.

Today I was pleasantly surprised to discover that Cloudflare Tunnels can be used for SSH traffic as well. It’s true that most machines with an SSH server have public IP addresses. But think about the time when you want to access the Linux laptop or workstation of a relative, so that you can remotely control their desktop, in order to help them out. Modern Linux distros all provide remote desktop functionality but the question is how do you get direct network access to the Linux workstation.

If you can connect via SSH to a remote machine without a public IP address, then you can set up SSH port forwarding, in order to connect to their remote desktop local service, too.

Here is what you have to execute at the remote machine to which you want to connect:

$ wget https://github.com/cloudflare/cloudflared/releases/latest/download/cloudflared-linux-amd64
$ chmod +x cloudflared-linux-amd64 
$ ./cloudflared-linux-amd64 tunnel --url ssh://localhost:22

2023-03-04T20:51:16Z INF Thank you for trying Cloudflare Tunnel. Doing so, without a Cloudflare account, is a quick way to experiment and try it out. However, be aware that these account-less Tunnels have no uptime guarantee. If you intend to use Tunnels in production you should use a pre-created named tunnel by following: https://developers.cloudflare.com/cloudflare-one/connections/connect-apps
2023-03-04T20:51:16Z INF Requesting new quick Tunnel on trycloudflare.com...
2023-03-04T20:51:20Z INF +--------------------------------------------------------------------------------------------+
2023-03-04T20:51:20Z INF |  Your quick Tunnel has been created! Visit it at (it may take some time to be reachable):  |
2023-03-04T20:51:20Z INF |  https://statistics-feel-icon-applies.trycloudflare.com                                    |
2023-03-04T20:51:20Z INF +--------------------------------------------------------------------------------------------+

When you have the URL “statistics-feel-icon-applies.trycloudflare.com” (which changes with every quick Cloudflare tunnel execution), you have to do the following on your machine (documentation is here):

$ wget https://github.com/cloudflare/cloudflared/releases/latest/download/cloudflared-linux-amd64
$ chmod +x cloudflared-linux-amd64 

$ vi ~/.ssh/config # and then add the following
Host home.server
	ProxyCommand /home/famzah/cloudflared-linux-amd64 access ssh --hostname statistics-feel-icon-applies.trycloudflare.com

$ ssh root@home.server 'id ; hostname' # try the connection

uid=0(root) gid=0(root) groups=0(root)
home-server

The quick Cloudflare Tunnels are free and don’t require that you have an account with Cloudflare. What a great ad-hoc replacement of VPN networks! On Linux this network infrastructure lets you replace Teamviewer, AnyDesk, etc. with a free secure remote desktop solution.

February 4, 2023
by Ivan Zahariev Leave a comment

How to reliably get the system time zone on Linux?

If you want to get the time zone abbreviation, use the following command:

date +%Z

If you want the full time zone name, use the following command instead:

timedatectl show | grep ^Timezone= | perl -pe 's/^Timezone=(\S+)$/$1/'

There are other methods for getting the time zone. But they depend either on the environment variable $TZ (which may not be set), or on the statically configured “/etc/timezone” file which might be out of sync with the system time zone file “/etc/localtime”.

It’s important to note that the Linux system utilities reference the “/etc/localtime” file (not “/etc/timezone”) when working with the system time. Here is a proof:

$ strace -e trace=file date
execve("/bin/date", ["date"], 0x7ffda462e360 /* 67 vars */) = 0
access("/etc/ld.so.preload", R_OK) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "/etc/ld.so.cache", O_RDONLY|O_CLOEXEC) = 3
openat(AT_FDCWD, "/lib/x86_64-linux-gnu/libc.so.6", O_RDONLY|O_CLOEXEC) = 3
openat(AT_FDCWD, "/usr/lib/locale/locale-archive", O_RDONLY|O_CLOEXEC) = 3
openat(AT_FDCWD, "/etc/localtime", O_RDONLY|O_CLOEXEC) = 3
Sat 04 Feb 2023 10:33:35 AM EET
+++ exited with 0 +++

The “/etc/timezone” file is a static helper that is set up when “/etc/localtime” is configured. Typically, “/etc/localtime” and “/etc/timezone” are in sync so it shouldn’t matter which one you query. However, I prefer to use the source of truth.

January 12, 2023
by Ivan Zahariev Leave a comment

Can your local NFS connections get broken by your external Internet connection?

Long story short: Yes! A flaky Internet connection to the outside world can make your local NFS client-server connections unusable. Even when they run on a dedicated storage network using dedicated switches and cables. This is a tale of dependencies, wrong assumptions, desperate restart of services, the butterfly effect and learning something new.

The company I work for operates 1000+ production servers in three data centers around the globe. This all started after a planned, trivial mass-restart of some internal services which are used by the online Control Panel. A couple of minutes after the restarts, the support team alarmed me that the Backup section of the Control Panel is not working. I acted as a typical System Administrator and just tried if the NFS backups are accessible from an SSH console. They were. So I concluded that most probably it wasn’t something with the NFS service but it was a problem caused by the restart of the internal services which keep the Control Panel running.

So we called a system developer to look into this. In the meantime I discovered by tests that the issue is limited only to one of our data centers. This raised an eyebrow but still with no further debug info and with everything working under the SSH console, I had to wait for input from the system development team. About an hour later they came up with a super simple reproducer:

perl -e 'use Path::Tiny; print path("/nfs/backup/somefile")->slurp;'

strace() shown that this hung on “flock(LOCK_SH)”. OMG! So it was a problem with the System Administrators’ systems after all. My previous test was to simply browse and read the files, and it didn’t occur to me to try file locking. I didn’t even know that this was used by the Control Panel. It turns out to be some (weird) default by Path::Tiny. A couple of minutes later I simplified the reproducer even more to just the following:

flock --shared /nfs/backup/somefile true

This also hung on “flock(LOCK_SH)”. Only in the USA data center. The backup servers were complaining about the following:

statd: server rpc.statd not responding, timed out lockd: cannot monitor %server-XXX-of-ours%

The NFS clients were reporting:

lockd: server %backup-IP% not responding, still trying xs_tcp_setup_socket: connect returned unhandled error -107

Right! So it’s the “rpc.statd” which just died! On both of our backup servers, simultaneously? Hmm… I raised the eyebrow even more. All servers had weeks of uptime, no changes at the time when the incident started, etc. Nothing suspicious caused by activity from any of our teams. Nevertheless, it doesn’t hurt to restart the NFS services. So I did it — restarted the backup NFS services (two times), the client NFS services for one of the production servers, unmounted and mounted the NFS directories. Nothing. Finally, I restarted the backup servers because there was a “[lockd]” kernel process hung in “D” state. After all it is possible that two backup servers with the same uptime get the same kernel bug at the same time…

The restart of the server machines fixed it! Pfew! Yet another unresolved mystery fixed by restart. Wait! Three minutes later the joy was gone because the Control Panel Backup section started to be sluggish again. The production machine where I was testing was intermittendly able to use the NFS locking.

2h30m elapsed already. Now it finally occurred to me that I need to pay closer attention to what the “rpc.statd” process was doing. To my surprise strace() shown that the process was waiting for 5+ seconds for some… DNS queries! It was trying to resolve “a.b.c.x.in-addr.arpa” and was timing out. The request was going to the local DNS cache server. The global DNS resolvers 8.8.8.8 and 1.1.1.1 were working properly and immediately returned “NXDOMAIN” for this DNS query. So I configured them on the backup servers and the NFS connections got much more stable. Still not perfect though.

The situation started to clear up. The NFS client was connecting to the NFS server. The server then tried to resolve the client’s private IP address to a hostname but was failing and this DNS failure was taking too many seconds. The reverse DNS zone for this private IPv4 network is served by the DNS servers “blackhole-1.iana.org” and “blackhole-2.iana.org”. Unfortunately, our upstream Internet provider was experiencing a problem and the connection to those DNS servers was failing with “Time to live exceeded” because of a network loop.

But why the NFS locking was still a bit sluggish after I fixed the NFS servers? It turned out that the “rpc.statd” of the NFS clients also does DNS resolve for the IP address of the NFS server.

30 minutes later I blacklisted the whole “x.in-addr.arpa” DNS zone for the private IPv4 network in all our local DNS resolvers and now they were replying with SERVFAIL immediately. The NFS locking started to work fast again and the Online Control panels were responding as expected.

Case closed. In three hours. Could have been done must faster – if I only knew NFS better, our NFS usage pattern and if I didn’t jump into the wrong assumptions. I’m still happy that I got to the root cause and have the confidence that the service is completely fixed for our customers.

October 16, 2022
by Ivan Zahariev Leave a comment

Open many interactive sessions in Konsole using a script

For 99.999% of my mass-execute tasks on many servers I use MPSSH.py which opens non-interactive SSH shells in parallel. But still there is one tedious weekly task that needs to be done using an interactive SSH shell.

In order to make the task semi-automated and to avoid typo errors, I open the Konsole sessions (tabs) from a list of servers. Here is how to do it:

for srv in $(cat server-list.txt); do konsole --new-tab --hold -e bash -i -c "ssh root@$srv" ; done

Once I have all sessions opened, I use Edit -> Copy Input to in Konsole, so that the input from the first “master” session (tab) is sent simultaneously to all sessions (tabs).

The "--hold" option ensures that no session ends without you noticing it. For example, if you had many sessions started without "--hold" and one of them terminates because the SSH connection was closed unexpectedly, the session (tab) would simply be closed and the tab would disappear. Having "--hold" keeps the terminated session (tab) opened so that you can see the exit messages, etc. Finally, you have to close each terminated session (tab) with File -> Close Session or the shortcut Ctrl + Shift + W.

/contrib/famzah

Enthusiasm never stops

Category Archives: Linux

How to Limit the Download Speed of a Docker Container

Speed up SSH connections by splitting known_hosts per host

Benchmark Summary

How To Split `known_hosts` Per Host

Reproducing The Benchmark

Achieving Zero-Downtime PHP-FPM Restarts and Atomic Updates

The Challenge with PHP-FPM Restarts

The Solution: Zero-Downtime Restarts and Atomic Deployments

Important Considerations

Demo, documentation and source code

Troubleshooting Zigbee Network Stability and Device Connectivity | Raspberry Pi + ConBee

Arduino ESP32 development under Linux

Custom key fetch script for cryptsetup @ systemd

Proxy SSH traffic using Cloudflare Tunnels

How to reliably get the system time zone on Linux?

Can your local NFS connections get broken by your external Internet connection?

Open many interactive sessions in Konsole using a script

Benchmark Summary

How To Split known_hosts Per Host

Reproducing The Benchmark

The Challenge with PHP-FPM Restarts

The Solution: Zero-Downtime Restarts and Atomic Deployments

Important Considerations

Demo, documentation and source code

How To Split `known_hosts` Per Host