network | /contrib/famzah

If you have a separate private network for your MySQL Galera Cluster, it is a good security practice to configure it to listen only on the private IP address. This way you have less firewall settings to set up and rely on. The following has been tested with Percona XtraDB Cluster.

A MySQL Galera Cluster listens all the time on two different ports, in order to provide the following services:

port 4567 – Galera Cluster communication
port 3306 – MySQL client connections and State Snapshot Transfer that use the “mysqldump” method

While those two services could be bound on different IP addresses, they are usually using the same IP address. Each of these services are configured using different MySQL settings in “my.cnf”:

port 4567 – “wsrep_cluster_address=gcomm://%CLUSTER_IP1%,%CLUSTER_IP2%,%CLUSTER_IP3%?gmcast.listen_addr=tcp://%THIS_NODE_LISTEN_IP%:4567”
port 3306 – “bind-address=%THIS_NODE_LISTEN_IP%”

If we had a cluster, and the current node has an IP address of 169.254.50.1, we would have the following in “my.cnf” regarding the networking configuration:

wsrep_provider_options="gmcast.listen_addr=tcp://169.254.50.1:4567"
wsrep_node_address=169.254.50.1
bind-address=169.254.50.1

There are two other ports which are opened on demand: port 4568 for Incremental State Transfer, and port 4444 for all other State Snapshot Transfer operations. Those two ports are controlled by “wsrep_sst_receive_address” and the “ist.recv_addr” option in “wsrep_provider_options”, as explained at this page. The default listening IP address is the same as configured for “wsrep_node_address”, and therefore doesn’t need any additional tweaks.

EDIT: It turns out that regardless of what is specified for the above two options for ports 4444 and 4568, at least the “other” State Snapshot Transfer port 4444 is always listening on the catch-all IP address “0.0.0.0” which accepts connections on any network interface and local address. I’ve observed this while a node was in a “Donor” state because another node was just joining the cluster.

The question for today is – does Linux md-RAID scale to 10 Gbit/s?

I wanted to build a proof of concept for a scalable, highly available, fault tolerant, distributed block storage, which utilizes commodity hardware, runs on a 10 Gigabit Ethernet network, and uses well-tested open-source technologies. This is a simplified version of Ceph. The only single point of failure in this cluster is the client itself, which is inevitable in any solution.

Here is an overview diagram of the setup:

My test lab is hosted on AWS:

3x “c4.8xlarge” storage servers
- each of them has 5x 50 GB General Purpose (SSD) EBS attached volumes which provide up to 160 MiB/s and 3000 IOPS for extended periods of time; practical tests shown 100 MB/s sustained sequential read/write performance per volume
- each EBS volume is managed via LVM and there is one logical volume with size 15 GB
- each 15 GB logical volume is being exported by iSCSI to the client machine
1x “c4.8xlarge” client machine
- the client machine initiates an iSCSI connection to each single 15 GB logical volume, and thus has 15 identical iSCSI block devices (3 storage servers x 5 block devices = 15 block devices)
- to achieve a 3x replication factor, the block devices from each storage server are grouped into 5x mdadm software RAID-1 (mirror) devices; each RAID-1 device “md1” to “md5” contains three disks from a different storage server, so that if one or two of the storage servers fail, this won’t affect the operation of the whole RAID-1 device
- all RAID-1 devices “md1” to “md5” are grouped into a single RAID-0 (stripe), in order to utilize the full bandwidth of all devices into a single block device, namely the “md99” RAID-0 device, which also combines the size capacity of all “md1” to “md5” devices and it equals to 75 GB
10 Gigabit network in a VPC using Jumbo frames
- “iperf” measured a 10.07 Gbits/s maximum throughput between the servers; at the same time, “iftop” displayed 9.20 Gbit/s
- VPC does not support network broadcasts, so ATA over Ethernet was not an option
the storage servers and the client machine were limited on boot to 4 CPUs and 2 GB RAM, in order to minimize the effect of the Linux disk cache
only sequential and random reading were benchmarked
Linux md RAID-1 (mirror) does not read from all underlying disks by default, so I had to create a RAID-1E (mirror) configuration; more info here and here; the “mdadm create” options follow: --level=10 --raid-devices=3 --layout=o3 Continue reading →

/contrib/famzah

Enthusiasm never stops

Tag Archives: network

Configure MySQL Galera Cluster to listen on a specific IP address

Linux md-RAID scalability on a 10 Gigabit network