Enthusiasm never stops

Leave a comment

Amazon EFS benchmarks

The Amazon Elastic File System (EFS) is a very intriguing storage product. It provides simple, scalable, elastic file storage for use on an EC2 virtual machine. The file system can be mounted over NFS at one or more EC2 machines simultaneously, and it also supports file locking.

Here are some important facts which I found out while doing my tests:

  • I/O operations per second (IOPS) are not the same metric that we’re used to measure when dealing with block devices like HDD or SSD disks. When working with EFS, we measure the NFS I/O operations per second. These correspond 1:1 to the read() or write() system calls that your applications make.
  • The size of the issued I/O requests are another very important metric for EFS. This is the real bytes transfer between your EC2 instance and the NFS server.
  • Therefore, we’re limited by both the NFS I/O requests per second, and the total transferred bytes for those NFS I/O per second.
  • The EFS performance and EFS limits documentation pages give a lot of insight. You have to monitor your EFS metrics using CloudWatch.
  • NFS I/O requests smaller than 4096 bytes are accounted as 4096 bytes. Regardless if you request 1 bytes, 1000 bytes, or 4096 bytes, you will get 4096 bytes accounted. Once you request more than 4096 bytes, they are accounted correctly.
  • You need more than one reader/writer thread or program, in order to achieve the full IOPS potential. One writer thread in my tests did 130 op/s, while 20 writer threads did 1500 op/s, for example.
  • The documentation says: “In General Purpose mode, there is a limit of 7000 file system operations per second. This operations limit is calculated for all clients connected to a single file system”. Our tests confirm this — we could do 3500 reading or 3000 writing operations per second.
  • CloudWatch has different aggregation functions for the *IOBytes metrics: min/max/average; sum; count. They represent different aspects of your EFS metrics, namely: the min/max/average IO operation size in bytes; the total transferred bytes in a minute (you need to divide to 60 to get the “per second” value); the total operations in a minute (you need to divide to 60 to get the “per second” value).
  • The CloudWatch EFS metrics “DataReadIOBytes” and “DataWriteIOBytes” reflect exactly what we see on the Linux system for “kB/s” and “ops/s” by the nfsiostat program. The transferred bytes reflect exactly the used bandwidth on the Linux network interfaces.
  • The “Metered size” in the AWS Console which is the same value as what you see by the “df” command is not updated in real-time. It could take more than an hour to reflect the real disk usage.
  • There is plenty of initial burst credit balance which lets you do some heavy I/O on your freshly created EFS file system. Our benchmark tests ran for hours with block sizes between 1 byte and 10k bytes, and we still had some positive burst credit balance left at the end.

I’m using the default NFS settings by the NFS mount helper provided in the “Amazon Linux 2” OS:

[root@ip-172-31-11-75 ~]# mount -t efs fs-7513e02c:/ /efs

[root@ip-172-31-11-75 ~]# mount
fs-7513e02c.efs.eu-central-1.amazonaws.com:/ on /efs type nfs4 (rw,relatime,vers=4.1,rsize=1048576,wsize=1048576,namlen=255,hard,proto=tcp,timeo=600,retrans=2,sec=sys,clientaddr=,local_lock=none,addr=

The tests were performed using two “m4.xlarge” EC2 instances in the “eu-central” AWS region. This EC2 instance type provides “High” network performance.

The NFS I/O operations per second limits were tested using a simple C program which basically does the following:

fd = open(testfile, O_RDWR|O_DIRECT|O_SYNC);

while (1) {
  lseek(fd, SEEK_SET, 0);

  read(fd, buf, sizeof(buf));
  // or
  write(fd, buf, sizeof(buf));

I created 40 different files, so that I can run 40 different single benchmark programs on an EC2 instance – one for each file. This increases concurrency and lets the total throughput scale better.

Sequential writing and reading

Sequential writing and reading performed as expected – up to the “PermittedThroughput” limit shown in the CloudWatch metrics. In my case, for such a small EFS file system, the limit was 105 MB/s.

Writing: NFS I/O operations per second

Here are the results:

  • Writing from one EC2 instance using 1 byte, 1k bytes, or 10k bytes: regardless of the request size, we get up to 2000 IOPS. Typically the IOPS are between 1400 and 1700.
  • Writing from two EC2 instances using 1 byte, 1k bytes, or 10k bytes: regardless of the request size, we get up to 3000 IOPS in total which are equally spread across the two EC2 instances.
  • The “PercentIOLimit” CloudWatch metric shows 84% when we do 2880 ops/s, for example. Therefore, the total IOPS limit for writing is about 3500 ops/s.
  • When doing only write() system calls with 1 byte data, only “DataWriteIOBytes” is accounted by EFS which is an advantage for us. A real block file system needs to read the block (usually 4k bytes), update 1 byte in it, and then write it back on disk. I feel like this needs additional testing with more random data, so test for yourself, too. Note that the minimum accounted request size in EFS is 4kB.

Reading: NFS I/O operations per second

Here are the results:

  • Reading from one EC2 instance using 1 byte or 10k bytes: regardless of the request size, we get up to 3500 IOPS. One EC2 instance is enough to saturate the EFS limit.
  • Reading from two EC2 instances using 1 byte or 10k bytes: regardless of the request size, we get up to 3500 IOPS in total which are equally spread across the two EC2 instances.
  • The “PercentIOLimit” CloudWatch metric shows 100% when we do 3500 ops/s. Therefore, the total IOPS limit for reading is 3500 ops/s.


Leave a comment

Poor man’s AWS CLI “s3 sync” bandwidth limit

Here you go:

PID=13424; while [ 1 ]; do kill -STOP "$PID" ; sleep 0.4 ; kill -CONT "$PID" ; sleep 0.6 ; done

You may need to adjust the two sleep intervals in seconds.

This is a quick hack until the AWS CLI team releases an official option, which is being discussed under AWS CLI issue #1090.


Linux md-RAID scalability on a 10 Gigabit network

The question for today is – does Linux md-RAID scale to 10 Gbit/s?

I wanted to build a proof of concept for a scalable, highly available, fault tolerant, distributed block storage, which utilizes commodity hardware, runs on a 10 Gigabit Ethernet network, and uses well-tested open-source technologies. This is a simplified version of Ceph. The only single point of failure in this cluster is the client itself, which is inevitable in any solution.

Here is an overview diagram of the setup:
Linux md-RAID scalability on a 10 Gigabit network

My test lab is hosted on AWS:

  • 3x “c4.8xlarge” storage servers
    • each of them has 5x 50 GB General Purpose (SSD) EBS attached volumes which provide up to 160 MiB/s and 3000 IOPS for extended periods of time; practical tests shown 100 MB/s sustained sequential read/write performance per volume
    • each EBS volume is managed via LVM and there is one logical volume with size 15 GB
    • each 15 GB logical volume is being exported by iSCSI to the client machine
  • 1x “c4.8xlarge” client machine
    • the client machine initiates an iSCSI connection to each single 15 GB logical volume, and thus has 15 identical iSCSI block devices (3 storage servers x 5 block devices = 15 block devices)
    • to achieve a 3x replication factor, the block devices from each storage server are grouped into 5x mdadm software RAID-1 (mirror) devices; each RAID-1 device “md1” to “md5” contains three disks from a different storage server, so that if one or two of the storage servers fail, this won’t affect the operation of the whole RAID-1 device
    • all RAID-1 devices “md1” to “md5” are grouped into a single RAID-0 (stripe), in order to utilize the full bandwidth of all devices into a single block device, namely the “md99” RAID-0 device, which also combines the size capacity of all “md1” to “md5” devices and it equals to 75 GB
  • 10 Gigabit network in a VPC using Jumbo frames
  • the storage servers and the client machine were limited on boot to 4 CPUs and 2 GB RAM, in order to minimize the effect of the Linux disk cache
  • only sequential and random reading were benchmarked
  • Linux md RAID-1 (mirror) does not read from all underlying disks by default, so I had to create a RAID-1E (mirror) configuration; more info here and here; the “mdadm create” options follow: --level=10 --raid-devices=3 --layout=o3 Continue reading

Leave a comment

Locally encrypted secure remote backup over Internet on Linux (iSCSI / TrueCrypt)

Recently I decided to start using Amazon AWS as my backup storage but my paranoid soul wasn’t satisfied until I figured it out how to secure my private data. It’s not that I don’t trust Amazon but a lot of bad things could happen if I decided that I just copy my data to a remote server on Amazon:

  • Amazon staff would have access to my data.
  • A breach in Amazon’s systems would expose my data.
  • A breach in my remote server OS would expose my data.

One of the solutions which I considered was to encrypt my local file-system with eCryptfs but it has some issues with relatively long file names.

Finally I came out with the following working backup solution which I currently use to backup both my Windows and Linux partitions. I share the Windows root directory with the VirtualBox Linux machine and run the backup scripts from there. Here is a short explanation of the properties and features of the backup setup:

  • Locally encrypted — all files which I store on the iSCSI volume are encrypted on my personal desktop, before being sent to the remote server. This ensures that the files cannot be read by anyone else.
  • Secure — besides the local volume encryption, the whole communication is done over an SSH tunnel which secures the Internet point-to-point client-to-server communication.
  • Remote — having a remote backup ensures that even if someone breached in my house and steals my laptop and my offline backup, I can still recover my data from the remote server. Furthermore, it is more convenient to frequently backup on a remote machine, because we have Internet access everywhere now. Note that remote backups are not a substitution for offline backups.
  • Over Internet — very convenient. Of course, this backup scheme can be used in any TCP/IP network — private LAN, WAN, VPN networks, etc.

The following two articles provide detailed instructions on how to setup the backup solution:

Daily usage example

Here are the commands which I execute, in order to make a backup of my laptop. Those can be further scripted and automated if a daily or more frequent backup is required:

IP= # the public DNS IP address of the EC2 instance / server

## Execute the following, in order to mount the remote encrypted iSCSI volume:

sudo -E \
  ssh -F /dev/null \
  -o PermitLocalCommand=yes \
  -o LocalCommand="ifconfig tun0 pointopoint netmask" \
  -o ServerAliveInterval=60 \
  -w 0:0 root@"$IP" \
  'sudo ifconfig tun0 pointopoint netmask; hostname; echo tun0 ready'

sudo iscsiadm -m node --targetname "iqn.2012-03.net.famzah:storage.backup" --portal "" --login
sudo truecrypt --filesystem=none -k "" --protect-hidden=no /dev/sdb
sudo mount /dev/mapper/truecrypt1 /mnt

## You can now work on /mnt -- make a backup, copy files, etc.

ls -la /mnt

## Execute the following, in order to unmount the encrypted iSCSI volume:

sudo umount /mnt
sudo truecrypt -d /dev/sdb
sudo iscsiadm -m node --targetname "iqn.2012-03.net.famzah:storage.backup" --portal "" --logout
# stop the SSH tunnel

Disaster recovery plan

Any backup is useless if you cannot restore your data. If your main computer is totally out, you would need the following, in order to access your backed up data:

In order to be able to log in to the remote server via SSH, you need to set up the following:

vi /etc/ssh/sshd_config # PasswordAuthentication yes
/etc/init.d/ssh restart
passwd root # set a very long password which you CAN remember

Make sure that you test if you can log in using an SSH client which does not have your SSH key and thus requires you to enter the root password manually.

I do not consider password authentication for the root account to be a security threat here. The backup server is online only during the time a backup is being made, after which I shut it down in order to save money from Amazon AWS. Furthermore, the backup has a new IP address on each new EC2 machine start, so an attacker cannot continue a brute-force attack easily, even if they started it.

Leave a comment

Secure iSCSI setup via an SSH tunnel on Linux

This article will demonstrate how to export a raw block storage device over Internet in a secure manner. Re-phrased this means that you can export a hard disk from a remote machine and use it on your local computer as it was a directly attached disk, thanks to iSCSI. Authentication and secure transport channel is provided by an SSH tunnel (more info). The setup has been tested on Ubuntu 11.10 Oneiric.

Server provisioning

Amazon AWS made it really simple to deploy a server setup in a minute:

  1. Launch a Micro EC2 instance and then install Ubuntu server by clicking on the links in the Ubuntu EC2StartersGuide, section “Official Ubuntu Cloud Guest Amazon Machine Images (AMIs)”.
  2. Create an EBS volume in the same availability zone. Attach it to the EC2 instance as “/dev/sdf” (seen as “/dev/xvdf” in latest Ubuntu versions).
  3. (optionally) Allocate an Elastic IP address and associate it with the EC2 instance.

Note that you can lower your AWS bill by buying a Reserved instance slot. Those slots are non-refundable and non-transferrable, so shop wisely. You can also stop the EC2 instance when you’re not using it and you won’t be billed for it but only for the allocated EBS volume storage.

You can use any other dedicated or virtual server which you own and can access by IP. An Amazon AWS EC2 instance is given here only as an example.

iSCSI server-side setup

Execute the following on your server (iSCSI target):

IP= # the public DNS IP address of the EC2 instance / server

# Log in to the server
ssh ubuntu@$IP
# Update your SSH key in ".ssh/authorized_keys", if needed.
sudo bash
cp /home/ubuntu/.ssh/authorized_keys /root/.ssh/ # so that we can log in directly as root

apt-get update
apt-get upgrade

apt-get install linux-headers-virtual # virtual because we're running an EC2 instance
apt-get install iscsitarget iscsitarget-dkms
perl -pi -e 's/^ISCSITARGET_ENABLE=.*$/ISCSITARGET_ENABLE=true/' /etc/default/iscsitarget

# We won't use any iSCSI authentication because the server is totally firewalled
# and we access it only using an SSH tunnel.
# NOTE: If you don't use Amazon EC2, make sure that you firewall this machine completely,
# leaving only SSH access (TCP port 22).

# update your block device location in "Path", if needed
cat >> /etc/iet/ietd.conf <<EOF
Target iqn.2012-03.net.famzah:storage.backup
   Lun 0 Path=/dev/xvdf,Type=fileio

/etc/init.d/iscsitarget restart

echo 'PermitTunnel yes' >> /etc/ssh/sshd_config
/etc/init.d/ssh restart

iSCSI client-side setup

Execute the following on your client / desktop machine (iSCSI initiator):

# Install the iSCSI client
sudo apt-get install open-iscsi

How to attach an iSCSI volume on the client

The following commands show how to attach and detach a remote iSCSI volume on the client machine. The output of the commands is quoted with “#>>”.

IP= # the public DNS IP address of the EC2 instance / server

# Establish the secure SSH tunnel to the remote server
sudo -E \
  ssh -F /dev/null \
  -o PermitLocalCommand=yes \
  -o LocalCommand="ifconfig tun0 pointopoint netmask" \
  -o ServerAliveInterval=60 \
  -w 0:0 root@"$IP" \
  'sudo ifconfig tun0 pointopoint netmask; hostname; echo tun0 ready'

# Make sure that we can reach the remote server via the SSH tunnel

# Execute this one-time; it discovers the available iSCSI volumes
sudo iscsiadm -m discovery -t st -p
#>>,1 iqn.2012-03.net.famzah:storage.backup

# Attach the remote iSCSI volume on the local machine
sudo iscsiadm -m node --targetname "iqn.2012-03.net.famzah:storage.backup" --portal "" --login
#>> Logging in to [iface: default, target: iqn.2012-03.net.famzah:storage.backup, portal:,3260]
#>> Login to [iface: default, target: iqn.2012-03.net.famzah:storage.backup, portal:,3260]: successful

# Check the kernel log
#>> [ 1237.538172] scsi3 : iSCSI Initiator over TCP/IP
#>> [ 1238.657846] scsi 3:0:0:0: Direct-Access     IET      VIRTUAL-DISK     0    PQ: 0 ANSI: 4
#>> [ 1238.662985] sd 3:0:0:0: Attached scsi generic sg2 type 0
#>> [ 1239.578079] sd 3:0:0:0: [sdb] 167772160 512-byte logical blocks: (85.8 GB/80.0 GiB)
#>> [ 1239.751271] sd 3:0:0:0: [sdb] Write Protect is off
#>> [ 1239.751279] sd 3:0:0:0: [sdb] Mode Sense: 77 00 00 08
#>> [ 1240.099649] sd 3:0:0:0: [sdb] Write cache: disabled, read cache: enabled, doesn't support DPO or FUA
#>> [ 1241.962729]  sdb: unknown partition table
#>> [ 1243.568470] sd 3:0:0:0: [sdb] Attached SCSI disk

# Double-check that the iSCSI volume is with the expected size (80 GB in our case)
cat /proc/partitions
#>> major minor  #blocks  name
#>> ...
#>> 8       16   83886080 sdb

# The remote iSCSI volume is now available under /dev/sdb on our local machine.
# You can use it as any other locally attached hard disk (block device).

# Detach the iSCSI volume from the local machine
sudo iscsiadm -m node --targetname "iqn.2012-03.net.famzah:storage.backup" --portal "" --logout
#>> Logging out of session [sid: 1, target: iqn.2012-03.net.famzah:storage.backup, portal:,3260]
#>> Logout of [sid: 1, target: iqn.2012-03.net.famzah:storage.backup, portal:,3260]: successful

# Check the kernel log
#>> [ 1438.942277]  connection1:0: detected conn error (1020)

# Double-check that the iSCSI volume is no longer available on the local machine
cat /proc/partitions
#>> no "sdb"

Once you have the iSCSI block device volume attached on your local computer, you can use it as you need, just like it was a normal hard disk. Only it will be slower because each I/O operation takes place over Internet. For example, you can locally encrypt the iSCSI volume with TrueCrypt, in order to prevent the administrators of the remote machine to be able to see your files.



Associate Amazon EC2 Elastic IP in a different region

If you allocate an Elastic IP address in the US-East region (N.Virginia), then sorry folks but you can’t use this IP address with an EC2 instance which is located in another region, say EU-West (Ireland) for example.
No problem to remap it to an EC2 instance in another availability zone within the same region.

The documentation of Amazon EC2 leaves another impression:

Elastic IP addresses allow you to mask instance or Availability Zone failures by programmatically remapping your public IP addresses to any instance in your account.

This is a bit misleading. I already had started dreaming as how I can transfer my EC2 instance across continents with no DNS propagation issues. And also had started admiring Amazon as to how they achieved this technically… Well, they haven’t. 🙂