The following is part 3 of a 4-part series covering the installation and configuration of Pacemaker, Corosync, Apache, DRBD and a VMware STONITH agent.
Before We Begin
We are going to use DRBD to store web content for Apache. Be advised that this is one of those situations where DRBD is likely not the best choice for the storage need (see here for more info: https://fghaas.wordpress.com/2007/06/26/when-not-to-use-drbd/); a plain rsync would do just fine. In production, you would want to use DRBD as a backend store rather than for frontend content.
The convention followed in the series is that [ALL]# denotes a command that needs to be run on all cluster machines.
Replicate Cluster Storage Using DRBD
It is recommended, though not strictly required, that you run your DRBD replication over a dedicated connection.
It is generally not recommended to run DRBD replication via routers, for reasons of fairly obvious performance drawbacks (adversely affecting both throughput and latency).
We use a dedicated vlan for DRBD in this article.
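Before touching DRBD itself, it is worth confirming that the dedicated interfaces can actually reach each other. A quick sanity check, assuming the replication addresses 172.16.22.11 and 172.16.22.12 that are used later in this article, could look like this:

[ALL]# ip -4 addr show | grep 172.16.22
[pcmk01]# ping -c 3 172.16.22.12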
DRBD Installation
Import the ELRepo package signing key, enable the repository and install the DRBD kernel module with utilities:
[ALL]# rpm --import https://www.elrepo.org/RPM-GPG-KEY-elrepo.org
[ALL]# rpm -Uvh http://www.elrepo.org/elrepo-release-7.0-2.el7.elrepo.noarch.rpm
[ALL]# yum install -y kmod-drbd84 drbd84-utils
To avoid issues with SELinux, for the time being, we are going to exempt DRBD processes from SELinux control:
[ALL]# semanage permissive -a drbd_t
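The exemption can be removed later, once DRBD has been confirmed to work and a proper SELinux policy is in place:

[ALL]# semanage permissive -d drbd_t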
LVM Volume for DRBD
Create a volume group on /dev/sdb and a new 256MB logical volume for DRBD:
[ALL]# vgcreate vg_drbd /dev/sdb
[ALL]# lvcreate --name lv_drbd --size 256M vg_drbd
DRBD Features: Single-primary and Dual-primary modes
In single-primary mode, a resource is, at any given time, in the primary role on only one cluster member. Since it is guaranteed that only one cluster node manipulates the data at any moment, this mode can be used with any conventional file system (ext4, XFS).
Deploying DRBD in single-primary mode is the canonical approach for high availability (failover capable) clusters. This is the mode that we are going to use in our failover cluster.
In dual-primary mode, a resource is, at any given time, in the primary role on both cluster nodes. Since concurrent access to the data is thus possible, this mode requires the use of a shared cluster file system that utilizes a distributed lock manager. Examples include GFS and OCFS2.
Deploying DRBD in dual-primary mode is the preferred approach for load-balancing clusters which require concurrent data access from two nodes. This mode is disabled by default, and must be enabled explicitly in DRBD’s configuration file.
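For reference only, as we stay with single-primary in this series: dual-primary mode is enabled per resource with the allow-two-primaries net option, along these lines (it must not be enabled without a cluster file system on top):

net {
    allow-two-primaries yes;
}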
DRBD Replication Modes
DRBD supports three distinct replication modes, allowing three degrees of replication synchronicity.
- Protocol A. Asynchronous replication protocol. Local write operations on the primary node are considered completed as soon as the local disk write has finished, and the replication packet has been placed in the local TCP send buffer.
- Protocol B. Memory synchronous (semi-synchronous) replication protocol. Local write operations on the primary node are considered completed as soon as the local disk write has occurred, and the replication packet has reached the peer node.
- Protocol C. Synchronous replication protocol. Local write operations on the primary node are considered completed only after both the local and the remote disk write have been confirmed.
The most commonly used replication protocol in DRBD setups is protocol C.
Configure DRBD
Configure DRBD to use single-primary mode with replication protocol C:
[ALL]# cat << EOL > /etc/drbd.d/webdata.res
resource webdata {
  protocol C;
  meta-disk internal;
  device /dev/drbd0;
  disk /dev/vg_drbd/lv_drbd;
  handlers {
    split-brain "/usr/lib/drbd/notify-split-brain.sh root";
  }
  net {
    allow-two-primaries no;
    after-sb-0pri discard-zero-changes;
    after-sb-1pri discard-secondary;
    after-sb-2pri disconnect;
    rr-conflict disconnect;
  }
  disk {
    on-io-error detach;
  }
  syncer {
    verify-alg sha1;
  }
  on pcmk01 {
    address 172.16.22.11:7789;
  }
  on pcmk02 {
    address 172.16.22.12:7789;
  }
}
EOL
We have a resource named webdata which uses /dev/vg_drbd/lv_drbd as the lower-level device and is configured with internal meta data.
The resource uses TCP port 7789 for its network connections, and binds to the IP addresses 172.16.22.11 and 172.16.22.12, respectively.
In case we run into problems, we have to ensure that TCP port 7789 is open on the firewall for the DRBD interface and that the resource name matches the file name.
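Assuming the stock firewalld is in use, opening the replication port could look like this (adjust the zone if the DRBD interface is not in the default one):

[ALL]# firewall-cmd --permanent --add-port=7789/tcp
[ALL]# firewall-cmd --reload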
Create the local metadata for the DRBD resource:
[ALL]# drbdadm create-md webdata
Ensure the DRBD kernel module is loaded:
[ALL]# lsmod | grep drbd
drbd                  392583  0
libcrc32c              12644  1 drbd
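If the module does not show up, it can usually be loaded manually (drbdadm will also try to load it when a resource is brought up):

[ALL]# modprobe drbd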
Finally, bring up the DRBD resource:
[ALL]# drbdadm up webdata
For data consistency, tell DRBD which node should be considered to have the correct data (this can be run on either node, as both contain garbage at this point):
[pcmk01]# drbdadm primary --force webdata
It should now sync:
[pcmk01]# watch -n.5 'cat /proc/drbd'
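Alternatively, the drbd84-utils package normally ships a small helper that prints a one-line summary per resource:

[pcmk01]# drbd-overview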
Create a filesystem on the DRBD device, tune if required:
[pcmk01]# mkfs.ext4 -m 0 -L drbd /dev/drbd0
[pcmk01]# tune2fs -c 200 -i 180d /dev/drbd0
Populate DRBD Content
Mount the newly created disk and populate it with a web document:
[pcmk01]# mount /dev/drbd0 /mnt
[pcmk01]# cat << EOL >/mnt/index.html
DRBD backend test
EOL
We need to give the mount point the same SELinux context as the web document root. Display the security context of the document root:
[pcmk01]# ls -ldZ /var/www/html/
drwxr-xr-x. root root system_u:object_r:httpd_sys_content_t:s0 /var/www/html/
The httpd policy stores data with multiple different file context types under the /var/www directory. If we want to store the data in a different directory, we can use the semanage command to create an equivalence mapping.
[pcmk01]# semanage fcontext --add --equal /var/www /mnt
[pcmk01]# restorecon -R -v /mnt
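As a quick check, assuming /dev/drbd0 is still mounted on /mnt, the mount point should now carry the same httpd_sys_content_t context as the document root:

[pcmk01]# ls -ldZ /mnt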
Please be advised that changes made with the chcon command do not survive a file system relabel, or the execution of the restorecon command. Always use semanage.
[pcmk01]# umount /dev/drbd0
Cluster Configuration for the DRBD Device
Create a cluster resource named my_webdata for the DRBD device, and an additional master/slave resource MyWebClone to allow my_webdata to run on both nodes at the same time (with only one node promoted to master):
[pcmk01]# pcs cluster cib drbd_cfg
[pcmk01]# pcs -f drbd_cfg resource create my_webdata ocf:linbit:drbd \
  drbd_resource=webdata op monitor interval=10s
[pcmk01]# pcs -f drbd_cfg resource master MyWebClone my_webdata \
  master-max=1 master-node-max=1 clone-max=2 clone-node-max=1 \
  notify=true
Verify resources and commit:
[pcmk01]# pcs -f drbd_cfg resource show
 Resource Group: my_webresource
     my_VIP (ocf::heartbeat:IPaddr2): Started pcmk01-cr
     my_website (ocf::heartbeat:apache): Started pcmk01-cr
 Master/Slave Set: MyWebClone [my_webdata]
     Stopped: [ pcmk01-cr pcmk02-cr ]
[pcmk01]# pcs cluster cib-push drbd_cfg
Check the cluster status:
[pcmk01]# pcs status
Cluster name: test_webcluster
Last updated: Sun Dec 13 15:16:31 2015
Last change: Sun Dec 13 15:16:21 2015 by root via cibadmin on pcmk01-cr
Stack: corosync
Current DC: pcmk02-cr (version 1.1.13-10.el7-44eb2dd) - partition with quorum
2 nodes and 4 resources configured

Online: [ pcmk01-cr pcmk02-cr ]

Full list of resources:

 Resource Group: my_webresource
     my_VIP (ocf::heartbeat:IPaddr2): Started pcmk01-cr
     my_website (ocf::heartbeat:apache): Started pcmk01-cr
 Master/Slave Set: MyWebClone [my_webdata]
     Masters: [ pcmk01-cr ]
     Slaves: [ pcmk02-cr ]

PCSD Status:
  pcmk01-cr: Online
  pcmk02-cr: Online

Daemon Status:
  corosync: active/disabled
  pacemaker: active/disabled
  pcsd: active/enabled
Filesystem Configuration for the Cluster
We have a working DRBD device, now we need to mount its filesystem.
Create a cluster resource named my_webfs for the filesystem:
[pcmk01]# pcs cluster cib fs_cfg
[pcmk01]# pcs -f fs_cfg resource create my_webfs Filesystem \
  device="/dev/drbd0" directory="/var/www/html" fstype="ext4"
The Filesystem resource needs to run on the same node as the master of the MyWebClone resource. Since one cluster service depends on another cluster service running on the same node, we assign an infinity score to the colocation constraint:
[pcmk01]# pcs -f fs_cfg constraint colocation add my_webfs with MyWebClone \
  INFINITY with-rsc-role=Master
[pcmk01]# pcs -f fs_cfg constraint order promote MyWebClone then start my_webfs
Adding MyWebClone my_webfs (kind: Mandatory) (Options: first-action=promote then-action=start)
Tell the cluster that the virtual IP needs to run on the same machine as the filesystem, and that the filesystem must be active before the VIP can start:
[pcmk01]# pcs -f fs_cfg constraint colocation add my_VIP with my_webfs INFINITY
[pcmk01]# pcs -f fs_cfg constraint order my_webfs then my_VIP
Adding my_webfs my_VIP (kind: Mandatory) (Options: first-action=start then-action=start)
This way Apache is only started when the filesystem and the VIP are both available.
Verify the updated configuration:
[pcmk01]# pcs -f fs_cfg constraint
Location Constraints:
Ordering Constraints:
  promote MyWebClone then start my_webfs (kind:Mandatory)
  start my_webfs then start my_VIP (kind:Mandatory)
Colocation Constraints:
  my_webfs with MyWebClone (score:INFINITY) (with-rsc-role:Master)
  my_VIP with my_webfs (score:INFINITY)
[pcmk01]# pcs -f fs_cfg resource show
 Resource Group: my_webresource
     my_VIP (ocf::heartbeat:IPaddr2): Started pcmk01-cr
     my_website (ocf::heartbeat:apache): Started pcmk01-cr
 Master/Slave Set: MyWebClone [my_webdata]
     Masters: [ pcmk01-cr ]
     Slaves: [ pcmk02-cr ]
 my_webfs (ocf::heartbeat:Filesystem): Stopped
Commit the changes and check the cluster status:
[pcmk01]# pcs cluster cib-push fs_cfg
[pcmk01]# pcs status
Cluster name: test_webcluster
Last updated: Sun Dec 13 15:19:01 2015
Last change: Sun Dec 13 15:18:55 2015 by root via cibadmin on pcmk01-cr
Stack: corosync
Current DC: pcmk02-cr (version 1.1.13-10.el7-44eb2dd) - partition with quorum
2 nodes and 5 resources configured

Online: [ pcmk01-cr pcmk02-cr ]

Full list of resources:

 Resource Group: my_webresource
     my_VIP (ocf::heartbeat:IPaddr2): Started pcmk01-cr
     my_website (ocf::heartbeat:apache): Started pcmk01-cr
 Master/Slave Set: MyWebClone [my_webdata]
     Masters: [ pcmk01-cr ]
     Slaves: [ pcmk02-cr ]
 my_webfs (ocf::heartbeat:Filesystem): Started pcmk01-cr

PCSD Status:
  pcmk01-cr: Online
  pcmk02-cr: Online

Daemon Status:
  corosync: active/disabled
  pacemaker: active/disabled
  pcsd: active/enabled
Here’s a list of cluster resources:
# pcs resource show --full
 Group: my_webresource
  Resource: my_VIP (class=ocf provider=heartbeat type=IPaddr2)
   Attributes: ip=10.247.50.213 cidr_netmask=32
   Operations: start interval=0s timeout=20s (my_VIP-start-interval-0s)
               stop interval=0s timeout=20s (my_VIP-stop-interval-0s)
               monitor interval=10s (my_VIP-monitor-interval-10s)
  Resource: my_website (class=ocf provider=heartbeat type=apache)
   Attributes: configfile=/etc/httpd/conf/httpd.conf statusurl=http://localhost/server-status
   Operations: start interval=0s timeout=40s (my_website-start-interval-0s)
               stop interval=0s timeout=60s (my_website-stop-interval-0s)
               monitor interval=10s (my_website-monitor-interval-10s)
 Master: MyWebClone
  Meta Attrs: master-max=1 master-node-max=1 clone-max=2 clone-node-max=1 notify=true
  Resource: my_webdata (class=ocf provider=linbit type=drbd)
   Attributes: drbd_resource=webdata
   Operations: start interval=0s timeout=240 (my_webdata-start-interval-0s)
               promote interval=0s timeout=90 (my_webdata-promote-interval-0s)
               demote interval=0s timeout=90 (my_webdata-demote-interval-0s)
               stop interval=0s timeout=100 (my_webdata-stop-interval-0s)
               monitor interval=10s (my_webdata-monitor-interval-10s)
 Resource: my_webfs (class=ocf provider=heartbeat type=Filesystem)
  Attributes: device=/dev/drbd0 directory=/var/www/html fstype=ext4
  Operations: start interval=0s timeout=60 (my_webfs-start-interval-0s)
              stop interval=0s timeout=60 (my_webfs-stop-interval-0s)
              monitor interval=20 timeout=40 (my_webfs-monitor-interval-20)
References
http://clusterlabs.org/doc/en-US/Pacemaker/1.1-pcs/html-single/Clusters_from_Scratch/index.html
https://drbd.linbit.com/users-guide/
Hi,
First, thank you for your great article. I have a question: what happens if the primary node (pcmk01) crashes? Is there any way to create another primary node, or will our cluster be ruined?
Thank you again,
Abe
In case of a failure of the primary node, services are automatically failed over to the other node, which then becomes the primary node for those services. That is basically the idea of having a failover cluster.
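As an aside, failover can also be exercised manually by putting the current primary into standby and watching the resources move to the other node (a non-destructive test with the pcs version used in this series):

[pcmk01]# pcs cluster standby pcmk01-cr
[pcmk01]# pcs status
[pcmk01]# pcs cluster unstandby pcmk01-cr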
Such an amazing tutorial! I now feel so comfortable and confident about the Clustering!
Thanks a ton. Cheers.
You’re welcome!
Hi Tomas,
Could you please suggest how I can create a custom resource? For example, instead of using ocf::heartbeat:apache I need to configure ocf::heartbeat:my_application_service.
Hi, cluster resource scripts are similar to init scripts in that they need to support start, stop and status. Pacemaker supports LSB-compliant scripts natively, so in theory, as long as your script is LSB-compliant, you can create one.
To give you an idea of the process, you will need to do the following (a minimal sketch is included after the list):
1) ensure that the script is LSB-compliant
2) copy the script under /etc/init.d
3) check to make sure that your script can be seen by pacemaker
4) add the script as cluster resource
5) check the configuration of the resource
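To illustrate, here is a minimal, hypothetical sketch; my_application_service and the /usr/local/bin/my_application daemon are placeholders, and a real script would need more robust status handling and proper LSB exit codes:

[ALL]# cat << 'EOL' > /etc/init.d/my_application_service
#!/bin/sh
# Minimal LSB-style wrapper (placeholder commands, replace with your daemon's)
case "$1" in
    start)
        /usr/local/bin/my_application --daemon
        ;;
    stop)
        pkill -f /usr/local/bin/my_application
        ;;
    status)
        # LSB: exit 0 if running, 3 if not running
        if pgrep -f /usr/local/bin/my_application >/dev/null; then
            exit 0
        else
            exit 3
        fi
        ;;
    *)
        echo "Usage: $0 {start|stop|status}"
        exit 2
        ;;
esac
EOL
[ALL]# chmod 755 /etc/init.d/my_application_service
[pcmk01]# pcs resource agents lsb | grep my_application_service
[pcmk01]# pcs resource create my_application_service lsb:my_application_service op monitor interval=30s
[pcmk01]# pcs resource show my_application_service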
I hope that helps!