October 30, 2016
by Ivan Zahariev

Two AWS CLI tips for S3 — UTF-8 when piping, and migrating the Storage Class

While working on the “youtube-mp3-archive” project, I stumbled across two issues which are worth documenting for future use.

“aws s3 ls” shows “???” instead of the UTF-8 key names of the S3 objects

On my machine this happens when I pipe the output of “aws s3 ls” to another program. Here is an example:

$ aws s3 ls --recursive s3://youtube-mp3.famzah/ | tee | grep 4185710
2016-10-30 08:08:49    4185710 mp3/Youtube/??????? - ?? ???? ?????-BF6KuR8vWN0.mp3

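There is already a discussion about this at the AWS CLI project. The likely cause is that Python cannot detect the output encoding when its output is a pipe and not a terminal. The solution in my case was to set the PYTHONIOENCODING environment variable and force UTF-8: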

$ PYTHONIOENCODING=utf8 aws s3 ls --recursive s3://youtube-mp3.famzah/ | tee | grep 4185710
2016-10-30 08:08:49    4185710 mp3/Youtube/Аналгин - Тя беше ангел-BF6KuR8vWN0.mp3

How to convert all stored S3 objects to another Storage Class

As already explained, the Storage Class cannot be set on a per-bucket basis. It must be specified with each upload operation in your client.

The migration procedure is already documented at the AWS CLI project. Here are the commands to check the current Storage Class of all objects in an S3 bucket, and how to convert them to a different Storage Class:

# all our S3 objects are using the "Standard" Storage Class
$ aws s3api list-objects --bucket youtube-mp3.famzah | grep StorageClass | sort | uniq -c
749  "StorageClass": "STANDARD"

# convert without re-uploading the objects from your computer
$ aws s3 cp --recursive --storage-class STANDARD_IA s3://youtube-mp3.famzah/ s3://youtube-mp3.famzah/

# all our S3 objects are now using the "Standard-Infrequent Access" Storage Class
$ aws s3api list-objects --bucket youtube-mp3.famzah | grep StorageClass | sort | uniq -c
749  "StorageClass": "STANDARD_IA"

The reason to use a different Storage Class is pricing.

AWS S3 icon by isdownrightnow.net

October 23, 2016
by Ivan Zahariev

Dynamic DNS using AWS Route 53

The Internet ecosystem and technologies have advanced so much lately that you can rebuild an entire business from scratch in a few hours of coding and at pretty acceptable costs. I’m referring to the dynamic DNS (a.k.a. DDNS or DynDNS) service which was a hit a few years back. It took me less than a hundred lines of code to create a simple dynamic DNS using AWS Route 53. The AWS API and backend provide the DNS service, while the free service “ipify” lets you look up your real remote IP address. While this solution is not free as in speech, it’s nearly free as in beer: it costs less than a dollar per month.

DNS icon by PRchecker

October 23, 2016
by Ivan Zahariev

Goodbye Acronis cloud — Hello Encrypted S3 backup!

Over time, the backup strategy for my personal laptop has kept changing in the search for the most cost-effective, robust and secure solution. It must also be able to back up both my Windows host and my Linux virtual machine.

I tried a backup to an AWS EC2 instance for a while but this was expensive.
I then switched to Acronis Cloud backup because I’m very satisfied with their local hard disk backups. Their online cloud backup, however, was an unpleasant experience. The cloud backup failed without any indication in the taskbar. When I clicked for more info, the cryptic “error(0x49052524) in lib; please contact support” was displayed. I contacted support to no avail; they only wanted me to reinstall. The problem fixed itself after a dozen days, and this happened twice in a few months. Last but not least, when I wanted to browse my online backup, the web interface was really slow. Sorry Acronis, but you really disappointed me.

Now I’ve settled on an open-source solution for my backup needs: the Encrypted S3 Backup, written in Bash and based on the official Amazon Command-Line Interface (CLI). This simple backup system leaves control and visibility in your hands. Additionally, the backup scripts are very small and you can easily audit them. The README provides all information about the design, security, usage, disaster recovery, etc. More or less, it’s a solution for technical Linux users, and not really suited for end users, who should try Duplicati instead. It doesn’t back up an “image” of your system; it is file-based. Only the file data is archived, so you can’t restore the file owners, permissions and other meta info.

Let’s review the pricing side. In my case I’m doing a daily backup of 125 GB of data in 320,000 files.

The incremental daily backup costs me $2.73 per month. 89% is the cost for S3 (mainly the GB-storage cost) and the rest is for bandwidth.
The initial one-time upload of 70 GB cost me $3.43. Expect about double for 125 GB.
The projected cost for a full restore is $11.59, of which 96% is the price of the bandwidth used from S3 to the Internet.
All prices are without taxes.

As far as performance is concerned, S3 is great!

Browsing my backup versions in the online S3 explorer is lightning fast.
The daily sync of 125 GB of data in 320,000 files takes 23 minutes. I don’t change a lot of files on my laptop during my daily activities.
My initial upload ran at 10 MBytes/s, and it could have been faster if I had more than 80 Mbit/s Internet at my disposal.

Note that in the end you need to trust AWS S3 to encrypt your data server-side, and then to completely discard the unencrypted original.

Backup icon by PRchecker

October 20, 2016
by Ivan Zahariev

Bash: Process null-terminated results piped from external commands

When working with filenames, the safe way to delimit each result record is the special null character. That’s because filenames may contain special symbols, including white-space and even the newline character “\n”.

There is already a great answer on how to do this in the StackOverflow topic “Capturing output of find . -print0 into a bash array”. The proposed solution doesn’t invoke any sub-shells, which is great, and it also explains all caveats in detail. In order to become really universal, this solution must not rely on the static file-descriptor “3”. Another great answer at SO gives an example of how to dynamically use the next available file-descriptor.

Here is the solution which works without using sub-shells and without depending on a static FD:

a=()
while IFS='' read -r -u"$FD" -d $'\0' file; do
  # note that $IFS keeps its default value here; IFS='' applies only to "read"
  a+=("$file") # or however you want to process each file
done {FD}< <(find /tmp -type f -print0)
exec {FD}<&- # close the file descriptor

# the result is available outside the loop, too
echo "${a[0]}" # 1st file
echo "${a[1]}" # 2nd file
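
On Bash 4.4 or newer, the same array can probably be filled more compactly with the “mapfile” built-in, whose “-d ''” option splits on the null character. Here is a minimal sketch; the function name “count_files_nul” and the temporary directory are made up for illustration:

```shell
#!/usr/bin/env bash
# Hypothetical helper: counts the regular files below a directory,
# reading null-terminated names so that spaces and newlines are safe.
# Requires Bash 4.4+ for "mapfile -d".
count_files_nul() {
  local files=()
  # -d '' splits on the null character; -t strips it from each element
  mapfile -d '' -t files < <(find "$1" -type f -print0)
  echo "${#files[@]}"
}

# usage: three files, one with a space and one with a newline in its name
dir=$(mktemp -d)
touch "$dir/plain" "$dir/with space" "$dir/with"$'\n'"newline"
count_files_nul "$dir"
rm -rf "$dir"
```

Unlike the “while read” loop, this reads the whole list before you can process the first file, so the loop is still the better fit when you want to act on each file as it arrives.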


					
		
			

July 28, 2016
by Ivan Zahariev

Poor man’s AWS CLI “s3 sync” bandwidth limit

Here you go:

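PID=13424; while kill -0 "$PID" 2>/dev/null; do kill -STOP "$PID" ; sleep 0.4 ; kill -CONT "$PID" ; sleep 0.6 ; done

The loop pauses the process for 0.4 seconds and lets it run for 0.6 seconds, and it stops once the process has exited. Both sleep intervals are in seconds; adjust them to change the ratio of paused to running time.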

This is a quick hack until the AWS CLI team releases an official option, which is being discussed under AWS CLI issue #1090.
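
To pick the two sleep intervals, it may help to think in terms of a duty cycle. The sketch below assumes, very roughly, that the throughput scales with the fraction of time the process is allowed to run; real transfers may behave differently. The function name “throttle_intervals” is made up for illustration:

```shell
# Hypothetical helper: prints the "stop" and "run" sleep intervals for a
# target run percentage (1-99) and a cycle length in seconds.
throttle_intervals() {
  awk -v pct="$1" -v period="$2" 'BEGIN {
    printf "%.2f %.2f\n", period * (100 - pct) / 100, period * pct / 100
  }'
}

throttle_intervals 60 1   # the ratio used in the one-liner above
throttle_intervals 25 2   # run a quarter of the time, in 2-second cycles
```

The first number goes after “kill -STOP” and the second after “kill -CONT”.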

May 20, 2016
by Ivan Zahariev

mpssh-py — half a million executions for 3 years

It’s the 3rd birthday of “mpssh-py” this year! 🙂

I did some stats analysis on my logs and they revealed the following:

500,000 SSH executions by “mpssh-py” on my local machine over the last 3 years.
No bugs for the last 3 years.

This is proof of the power of interpreted languages, and of how the single-responsibility design approach helps in making programs more reliable.

Font Awesome by Dave Gandy - http://fontawesome.io

December 15, 2015
by Ivan Zahariev

Convert human-readable sizes back to raw numbers

Ever needed to convert lots of lines with 1M or 1G to their raw number representation?

Here is a sample:

$ cat sample
26140   132K   1.9G   1.5G     ?K     0K     8K     0K   5% mysqld
26140   132K   1.9G   1.5G     ?K     4K     8K     0K   5% mysqld
26140   132K   1.9G   1.5G     ?K     0K     0K     0K   5% mysqld
26140   132K   1.9G   1.5G     ?K    -8K     0K     0K   5% mysqld
26140   132K   1.9G   1.6G     ?K     0K    20K     0K   5% mysqld
26140   132K   1.9G   1.6G     ?K     0K    56K     0K   5% mysqld
26140   132K   1.9G   1.7G     ?K    -4K     4K     0K   5% mysqld
26140   132K   1.9G   1.7G     ?K     0K    16K     0K   5% mysqld
26140   132K   1.9G   1.8G     ?K     0K     0K     0K   5% mysqld

The following Perl one-liner comes to the rescue:

perl -Mstrict -Mwarnings -n -e 'my %p=( K=>3, M=>6, G=>9, T=>12); s/(\d+(?:\.\d+)?)([KMGT])/$1*10**$p{$2}/ge; print'

In the end you get:

$ cat sample | perl -Mstrict -Mwarnings -n -e 'my %p=( K=>3, M=>6, G=>9, T=>12); s/(\d+(?:\.\d+)?)([KMGT])/$1*10**$p{$2}/ge; print'
26140   132000   1900000000   1500000000     ?K     0     8000     0   5% mysqld
26140   132000   1900000000   1500000000     ?K     4000     8000     0   5% mysqld
26140   132000   1900000000   1500000000     ?K     0     0     0   5% mysqld
26140   132000   1900000000   1500000000     ?K    -8000     0     0   5% mysqld
26140   132000   1900000000   1600000000     ?K     0    20000     0   5% mysqld
26140   132000   1900000000   1600000000     ?K     0    56000     0   5% mysqld
26140   132000   1900000000   1700000000     ?K    -4000     4000     0   5% mysqld
26140   132000   1900000000   1700000000     ?K     0    16000     0   5% mysqld
26140   132000   1900000000   1800000000     ?K     0     0     0   5% mysqld

You can now paste this output into Excel, for example, in order to create a nice chart of it.

June 26, 2015
by Ivan Zahariev

OpenSSH ciphers performance benchmark (update 2015)

It’s been five years since the last OpenSSH ciphers performance benchmark. There are two fundamentally new things to consider, which also gave me the incentive to redo the tests:

Since OpenSSH version 6.7 the default set of ciphers and MACs has been altered to remove unsafe algorithms. In particular, CBC ciphers and arcfour* are disabled by default. This has been adopted in Debian “Jessie”.
Modern CPUs have hardware acceleration for AES encryption.

I tested five different systems: CPUs with and without AES hardware acceleration, different OpenSSL versions, and different environments including dedicated servers, OpenVZ and AWS.

Since the processing power of each platform is different, I had to choose a criterion to normalize the results in order to be able to compare them. This was a rather confusing decision, and I hope that my conclusion is right. I chose to normalize against the “arcfour*”, “blowfish-cbc”, and “3des-cbc” speeds, because I doubt that their implementation has changed over time. They should run equally fast on each platform because they don’t benefit from the AES acceleration, and nobody has bothered to make them faster, since those ciphers have long been considered obsolete.
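
If you want to check whether your own machine has AES hardware acceleration, here is a minimal sketch. It assumes the x86 Linux format of “/proc/cpuinfo”, where the CPU features are listed on “flags” lines; the function name “has_aes_flag” is made up for illustration:

```shell
# Hypothetical helper: prints "yes" if the cpuinfo file (default:
# /proc/cpuinfo) lists the "aes" flag, otherwise "no".
# Assumes the x86 Linux format, where CPU features are on "flags" lines.
has_aes_flag() {
  if grep '^flags' "${1:-/proc/cpuinfo}" | grep -qw 'aes'; then
    echo yes
  else
    echo no
  fi
}

has_aes_flag
```

To see which ciphers your local OpenSSH client supports, “ssh -Q cipher” prints the list.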

A summary chart with the results follows:

You can download the raw data as an Excel file. Here is the command which was run on each server:

# uses "/root/tmp/dd.txt" as a temporary file!
for cipher in aes128-cbc aes128-ctr aes128-gcm@openssh.com aes192-cbc aes192-ctr aes256-cbc aes256-ctr aes256-gcm@openssh.com arcfour arcfour128 arcfour256 blowfish-cbc cast128-cbc chacha20-poly1305@openssh.com 3des-cbc ; do
	for i in 1 2 3 ; do
		echo
		echo "Cipher: $cipher (try $i)"
		
		dd if=/dev/zero bs=4M count=1024 2>/root/tmp/dd.txt | pv --size 4G | time -p ssh -c "$cipher" root@localhost 'cat > /dev/null'
		grep -v records /root/tmp/dd.txt
	done
done

We can draw the following conclusions:

Servers which run a newer CPU with AES hardware acceleration enjoy two benefits: (1) much faster AES encryption using the recommended OpenSSH ciphers, and (2) some AES ciphers that are now even twice as fast as the old speed champion, “arcfour”. I could get those great speeds only with OpenSSL 1.0.1f or newer, but this may need more testing.
Servers having a CPU without AES hardware acceleration still get AES encryption that is twice as fast with the newest OpenSSH 6.7 using OpenSSL 1.0.1k, as tested on Debian “Jessie”. Maybe something was optimized in the library.

Test results may vary (a lot) depending on your hardware platform, Linux kernel, OpenSSH and OpenSSL versions.

June 25, 2015
by Ivan Zahariev

“iperf” and “iftop” accuracy

While working on my latest pet project which involved 10 GigE transfers, I noticed a significant difference between the results shown by “iperf” and “iftop”. A fellow blogger also noticed this discrepancy. In order to get to the bottom of this, I did some additional tests using different MTU sizes, observing the output of “iperf”, “iftop”, “iptraf”, and the raw Linux network device counters as seen by “ifconfig”.

The test results are summarized in an online spreadsheet: https://goo.gl/MvJC8K

Some notes about each application:

iperf – this tool measures the TCP performance, as per documentation; therefore it counts the useful payload in a TCP/IP transfer; this is layer 4 in the OSI model
iftop – this tool counts all IP packets, as per documentation; my tests show that it also operates on layer 4, just as “iperf”, because ARP traffic (on layer 3) is not counted at all; the fact that “iftop” cares about connections+ports also suggests that it operates at layer 4
iptraf – this tool seems to be too old now, and its results were off by a factor of 4 to 5
ifconfig – shows the most low-level statistics, namely bytes that passed as RX or TX through the network device; the most trusted source of performance data

We notice that both “iperf” and “iftop” measure the useful payload data that we can transfer per second. Since all OSI layers have some overhead, let’s take a look at what theory says about bandwidth efficiency in Ethernet:

with a standard MTU frame of 1500 bytes, we get 94.93% efficiency (5.07% overhead)
with a jumbo MTU frame of 9000 bytes, we get 99.14% efficiency (0.86% overhead)

Those numbers correspond very closely to the results shown by “iperf”.
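
Both percentages can be reproduced with a little arithmetic. The sketch below assumes TCP over IPv4 without any header options and without VLAN tags: 40 bytes of IP and TCP headers are taken from the MTU, and every frame carries 38 extra bytes on the wire (14 bytes Ethernet header, 4 bytes FCS, 8 bytes preamble, 12 bytes inter-frame gap). The function name “eth_efficiency” is made up for illustration:

```shell
# Hypothetical helper: TCP payload efficiency (in percent) for a given MTU.
# Assumes IPv4 + TCP without options (40 bytes inside the MTU) and
# 38 bytes of Ethernet framing per frame (header, FCS, preamble, gap).
eth_efficiency() {
  awk -v mtu="$1" 'BEGIN { printf "%.2f\n", 100 * (mtu - 40) / (mtu + 38) }'
}

eth_efficiency 1500   # standard frame: 1460 / 1538
eth_efficiency 9000   # jumbo frame:    8960 / 9038
```

With TCP options enabled (timestamps, for example) the payload per frame is a bit smaller, so real-world numbers can be slightly lower.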

It’s only “iftop” which differs a lot. Analysis of its source code reveals the reason for this and how we must interpret the displayed results:

/*
 * ui.c
 */

void ui_print() {
...
    mvaddstr(y, COLS - 8 * HISTORY_DIVISIONS - 8, "rates:");

    draw_totals(&totals);
}

void draw_totals(host_pair_line* totals) {
    for(j = 0; j < HISTORY_DIVISIONS; j++) {
        readable_size((totals->sent[j] + totals->recv[j]) , buf, 10, 1024, options.bandwidth_in_bytes);
...
}

/*
 * ui_common.c
 */

/*
 * Format a data size in human-readable format
 */
void readable_size(float n, char* buf, int bsize, int ksize, int bytes) {
    float size = 1;
...
    while(1) {
      size *= ksize;
...
        snprintf(buf, bsize, " %4.2f%s", n / size, bytes ? unit_bytes[i] : unit_bits[i]);

The authors of “iftop” decided to scale the displayed values to Gibibit (multiples of 1024), instead of the more common Gigabit (multiples of 1000). The relative difference grows with each unit step: about 2.4% for kilo, 4.9% for mega, and about 7% for giga.

Once the “iftop” values are converted from Gibibit to Gigabit, they also match the results by “iperf” and the raw Linux network device counters.
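
The conversion itself is a multiplication by 1024³/1000³, which is about 1.0737. Here is a small sketch; the function name “gibi_to_giga” and the sample value are made up for illustration and are not taken from my measurements:

```shell
# Hypothetical helper: converts a rate shown in units of 1024^3
# to units of 1000^3.
gibi_to_giga() {
  awk -v v="$1" 'BEGIN { printf "%.2f\n", v * (1024 ^ 3) / (1000 ^ 3) }'
}

gibi_to_giga 9.00   # an illustrative "iftop" reading, not a measurement
```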

/contrib/famzah

Enthusiasm never stops

Category Archives: Linux