September 16, 2019

Linux HA (corosync, pacemaker, DRBD) – Part 2

By snowffox in Application, Linux

Using a DRBD disk in a corosync/pacemaker cluster.

See also:
Linux HA (corosync, pacemaker) – Part 1
Linux HA (pacemaker, corosync, iscsi shared storage) – part 3
Setting up a zabbix monitoring server on Linux HA (corosync, pacemaker, shared disk) – part 4

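This post continues from Linux HA (corosync, pacemaker) – Part 1. Part 2 was originally going to attach an iSCSI volume and move on to an active-active cluster, but I stopped partway through writing it (for various reasons…). And somehow two years have already gone by!

Someone recently asked me about DRBD, so here I attach a DRBD volume instead.

The environment is the same as before. Please refer to the previous post.

Think of DRBD as disk mirroring over the network. For example, if disk 1 on server 1 and disk 2 on server 2 are set up as a DRBD pair, any change that server 1 makes to disk 1 is applied to disk 2 on server 2 in real time.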

1. Install DRBD (on both nodes)
The official repositories do not ship a DRBD package, so register ELRepo.

# rpm --import https://www.elrepo.org/RPM-GPG-KEY-elrepo.org

# rpm -ivh http://www.elrepo.org/elrepo-release-7.0-2.el7.elrepo.noarch.rpm

Install DRBD 9.x as shown below. (Because of the error described further down, I later reinstalled with 8.4, but the version turned out not to be the problem.)

# yum install drbd90-utils
# yum install kmod-drbd90

2. Load the DRBD module into the kernel (on both nodes)

# lsmod | grep drbd
# modprobe drbd
# lsmod | grep drbd
drbd                  553913  0 
libcrc32c              12644  2 xfs,drbd

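3. Configure DRBD (on both nodes)
A new disk was added for DRBD to use. (If there is only one disk, DRBD can be given a partition separate from the OS partition.)
The new disk is left without a filesystem at this point (the xfs filesystem is created later, on /dev/drbd0). Checking it with parted -l gives the output below.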

* The new disk (or LV) must not already contain a filesystem. If one has been created, generating the DRBD metadata fails with the error below.
Error message:

Command 'drbdmeta 0 v09 /dev/sdb internal create-md 1' terminated with exit code 40

* Current disk state:

# parted -l
Model: ATA VBOX HARDDISK (scsi)
Disk /dev/sda: 8590MB
Sector size (logical/physical): 512B/512B
Partition Table: msdos
Disk Flags: 

Number  Start   End     Size    Type     File system     Flags
 1      1049kB  513MB   512MB   primary  linux-swap(v1)
 2      513MB   8590MB  8077MB  primary  xfs             boot


Error: /dev/sdb: unrecognised disk label
Model: ATA VBOX HARDDISK (scsi)                                           
Disk /dev/sdb: 8590MB
Sector size (logical/physical): 512B/512B
Partition Table: unknown
Disk Flags:

On both nodes, create the file /etc/drbd.d/cluster_disk.res with the contents below. The IP addresses are the ones assigned to the second NIC (service IPs are 192.x.x.x, data transfer uses 172.x.x.x).

# cat cluster_disk.res
resource disk_wolf {
        on wolf1 {
                device /dev/drbd0;
                disk /dev/sdb;
                address 172.16.1.31:7789;
                meta-disk internal;
        }
        on wolf2 {
                device /dev/drbd0;
                disk /dev/sdb;
                address 172.16.1.32:7789;
                meta-disk internal;
        }
}
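Since the file has to be identical on both nodes, it can help to generate it rather than type it twice. The following is only a sketch: make_drbd_res is a hypothetical helper of my own (not a DRBD tool), and the device /dev/drbd0 and port 7789 are simply the values used in this post.

```shell
# Hypothetical helper (not part of DRBD): print a two-node resource definition.
# Usage: make_drbd_res NAME DISK HOST1 ADDR1 HOST2 ADDR2
make_drbd_res() {
    res_name=$1
    disk=$2
    shift 2
    printf 'resource %s {\n' "$res_name"
    # Emit one "on <host>" section per HOST/ADDR pair.
    # /dev/drbd0 and port 7789 are the values assumed from this post.
    while [ "$#" -ge 2 ]; do
        printf '        on %s {\n' "$1"
        printf '                device /dev/drbd0;\n'
        printf '                disk %s;\n' "$disk"
        printf '                address %s:7789;\n' "$2"
        printf '                meta-disk internal;\n'
        printf '        }\n'
        shift 2
    done
    printf '}\n'
}

# Print the definition used in this post.
make_drbd_res disk_wolf /dev/sdb wolf1 172.16.1.31 wolf2 172.16.1.32
```

Redirecting the output to /etc/drbd.d/cluster_disk.res on each node should give the same file as the one shown above.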

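Set the replication protocol in /etc/drbd.d/global_common.conf. Find the net section and add the protocol line as shown below. Protocol C is fully synchronous: a write is reported as complete only after both the local and the peer disk have confirmed it.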

net {
		# protocol timeout max-epoch-size max-buffers
		# connect-int ping-int sndbuf-size rcvbuf-size ko-count
		# allow-two-primaries cram-hmac-alg shared-secret after-sb-0pri
		# after-sb-1pri after-sb-2pri always-asbp rr-conflict
		# ping-timeout data-integrity-alg tcp-cork on-congestion
		# congestion-fill congestion-extents csums-alg verify-alg
		# use-rle
		protocol C;
	}

Create the DRBD metadata (on both nodes), then start the DRBD service.

Creating the metadata

# drbdadm create-md disk_wolf
md_offset 8589930496
al_offset 8589897728
bm_offset 8589635584

Found some data

 ==> This might destroy existing data! <==

Do you want to proceed?
[need to type 'yes' to confirm] yes

initializing activity log
initializing bitmap (256 KB) to all zero
Writing meta data...
New drbd meta data block successfully created.
success

Starting the service

# systemctl start drbd

On the server that will act as the DRBD primary (wolf1 here), run the commands below.

# drbdadm role disk_wolf
Secondary
[root@wolf1 drbd.d]# drbdadm primary disk_wolf 
disk_wolf: State change failed: (-2) Need access to UpToDate data
Command 'drbdsetup primary disk_wolf' terminated with exit code 17
# drbdadm primary --force disk_wolf
# drbdadm role disk_wolf
Primary

Checking DRBD status

[root@wolf1 ~]# drbdadm status
disk_wolf role:Primary
  disk:UpToDate
  wolf2 role:Secondary
    replication:SyncSource peer-disk:Inconsistent done:1.20

[root@wolf2 ~]# drbdadm status
disk_wolf role:Secondary
  disk:Inconsistent
  wolf1 role:Primary
    replication:SyncTarget peer-disk:UpToDate done:3.27

[root@wolf2 ~]# drbdadm secondary disk_wolf
[root@wolf2 ~]# drbdadm status
disk_wolf role:Secondary
  disk:UpToDate
  wolf1 role:Primary
    peer-disk:UpToDate

[root@wolf1 ~]# drbdadm status
disk_wolf role:Primary
  disk:UpToDate
  wolf2 role:Secondary
    peer-disk:UpToDate
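Before creating the filesystem it is worth waiting until the initial sync has finished. As a rough sketch, the check can be scripted by reading the status text; drbd_in_sync is a hypothetical helper name, and it assumes the output format shown above (disk: and peer-disk: fields).

```shell
# Hypothetical helper: read `drbdadm status` text on stdin and succeed only
# when every disk: / peer-disk: field reports UpToDate.
drbd_in_sync() {
    awk '
        {
            for (i = 1; i <= NF; i++) {
                if ($i ~ /^(peer-)?disk:/) {
                    seen = 1
                    if ($i !~ /:UpToDate$/) bad = 1
                }
            }
        }
        # Fail when no disk field was seen at all, or any was not UpToDate.
        END { if (seen && !bad) exit 0; exit 1 }
    '
}

# Demo on the wolf2 output captured while the sync was still running.
drbd_in_sync <<'EOF' || echo "still syncing"
disk_wolf role:Secondary
  disk:Inconsistent
  wolf1 role:Primary
    replication:SyncTarget peer-disk:UpToDate done:3.27
EOF
```

On a real node the same function would be fed from the command itself, for example drbdadm status | drbd_in_sync.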

Create a filesystem and try mounting it at the desired location.

# mkfs -t xfs /dev/drbd0
meta-data=/dev/drbd0             isize=512    agcount=4, agsize=524270 blks
         =                       sectsz=512   attr=2, projid32bit=1
         =                       crc=1        finobt=0, sparse=0
data     =                       bsize=4096   blocks=2097079, imaxpct=25
         =                       sunit=0      swidth=0 blks
naming   =version 2              bsize=4096   ascii-ci=0 ftype=1
log      =internal log           bsize=4096   blocks=2560, version=2
         =                       sectsz=512   sunit=0 blks, lazy-count=1
realtime =none                   extsz=4096   blocks=0, rtextents=0

# mount /dev/drbd0 /mnt
# umount /mnt

4. Add the DRBD resource to the cluster
First, check the cluster status.

# pcs status
Cluster name: wolf_cluster
Stack: corosync
Current DC: wolf1 (version 1.1.19-8.el7_6.4-c3c624ea3d) - partition with quorum
Last updated: Thu Sep 12 06:36:33 2019
Last change: Wed Sep 11 21:34:30 2019 by root via cibadmin on wolf1

2 nodes configured
2 resources configured

Online: [ wolf1 wolf2 ]

Full list of resources:

 VirtualIP	(ocf::heartbeat:IPaddr2):	Started wolf1
 WebService	(ocf::heartbeat:apache):	Started wolf1

Daemon Status:
  corosync: active/disabled
  pacemaker: active/disabled
  pcsd: active/enabled

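To integrate the DRBD resource into the cluster, run the command below. pcs cluster cib dumps the CIB (Cluster Information Base) as a raw XML file; until it is pushed, all changes are saved only in the drbd_cfg file in the current directory.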

# pcs cluster cib drbd_cfg

Create the DRBD data resource.

# pcs -f drbd_cfg resource create DrbdData ocf:linbit:drbd drbd_resource=disk_wolf op monitor interval=60s

Create the DRBD clone (master/slave) resource.

# pcs -f drbd_cfg resource master DrbdDataClone DrbdData master-max=1 master-node-max=1 clone-max=2 clone-node-max=1 notify=true

Now push the edited CIB to the live CIB.

# pcs cluster cib-push drbd_cfg
CIB updated
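The dump, edit, push sequence above can be wrapped so that it is reviewed before anything touches the live cluster. This is only a sketch: run and stage_drbd_resource are hypothetical helper names, and with DRY_RUN left at its default of 1 the commands are printed rather than executed.

```shell
# Print the command when DRY_RUN=1 (the default here), otherwise execute it.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Hypothetical wrapper around the steps shown above: dump the CIB to a file,
# add the DRBD resources to that file, then push the file to the live CIB.
# Usage: stage_drbd_resource CIB_FILE DRBD_RESOURCE
stage_drbd_resource() {
    cib_file=$1
    drbd_res=$2
    run pcs cluster cib "$cib_file" || return 1
    run pcs -f "$cib_file" resource create DrbdData ocf:linbit:drbd \
        drbd_resource="$drbd_res" op monitor interval=60s || return 1
    run pcs -f "$cib_file" resource master DrbdDataClone DrbdData \
        master-max=1 master-node-max=1 clone-max=2 clone-node-max=1 notify=true || return 1
    run pcs cluster cib-push "$cib_file"
}

stage_drbd_resource drbd_cfg disk_wolf
```

Setting DRY_RUN=0 would run the same four commands for real and stop at the first one that fails.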

Checking with pcs status showed the error below. It turned out to be caused by a typo, so I deleted the resource and created it again.

# pcs status
Cluster name: wolf_cluster
Stack: corosync
Current DC: wolf2 (version 1.1.19-8.el7_6.4-c3c624ea3d) - partition with quorum
Last updated: Thu Sep 12 07:56:22 2019
Last change: Thu Sep 12 07:13:57 2019 by root via cibadmin on wolf1

2 nodes configured
4 resources configured

Online: [ wolf1 wolf2 ]

Full list of resources:

 VirtualIP	(ocf::heartbeat:IPaddr2):	Started wolf1
 WebService	(ocf::heartbeat:apache):	Started wolf1
 Master/Slave Set: DrbdDataClone [DrbdData]
     Masters: [ wolf2 ]
     Stopped: [ wolf1 ]

Failed Actions:
* DrbdData_start_0 on wolf1 'not installed' (5): call=19, status=complete, exitreason='',
    last-rc-change='Thu Sep 12 07:13:59 2019', queued=0ms, exec=134ms


Daemon Status:
  corosync: active/disabled
  pacemaker: active/disabled
  pcsd: active/enabled

# pcs resource delete DrbdData
Attempting to stop: DrbdData... Stopped

Cluster status after the fix

# pcs status 
Cluster name: wolf_cluster

WARNINGS:
No stonith devices and stonith-enabled is not false

Stack: unknown
Current DC: NONE
Last updated: Thu Sep 12 09:45:07 2019
Last change: Thu Sep 12 09:40:04 2019 by root via cibadmin on wolf1

0 nodes configured
2 resources configured


Full list of resources:

 Master/Slave Set: DrbdDataClone [DrbdData]

Daemon Status:
  corosync: active/disabled
  pacemaker: active/disabled
  pcsd: active/enabled

# pcs status resources
 Master/Slave Set: DrbdDataClone [DrbdData]

Create the DRBD filesystem resource.

# pcs cluster cib fs_cfg

Define the mount point for the DRBD resource. It will be used by the web service, so it is mounted on /var/www/html.

# pcs  -f fs_cfg resource create DrbdFS Filesystem device="/dev/drbd0" directory="/var/www/html" fstype="xfs"
Assumed agent name 'ocf:heartbeat:Filesystem' (deduced from 'Filesystem')

Specify where the resources run together and the order in which they start.

# pcs  -f fs_cfg constraint colocation add DrbdFS with DrbdDataClone INFINITY with-rsc-role=Master
# pcs  -f fs_cfg constraint order promote DrbdDataClone then start DrbdFS
Adding DrbdDataClone DrbdFS (kind: Mandatory) (Options: first-action=promote then-action=start)
# pcs -f fs_cfg constraint colocation add WebService with DrbdFS INFINITY
# pcs -f fs_cfg constraint order DrbdFS then WebService
Adding DrbdFS WebService (kind: Mandatory) (Options: first-action=start then-action=start)

Push the edited CIB.

# pcs cluster cib-push fs_cfg

After getting this far I checked the cluster status, and a DRBD error showed up as below.

# pcs status
Cluster name: wolf_cluster
Stack: corosync
Current DC: wolf2 (version 1.1.19-8.el7_6.4-c3c624ea3d) - partition with quorum
Last updated: Sun Sep 15 07:29:22 2019
Last change: Sun Sep 15 07:11:21 2019 by root via cibadmin on wolf1

2 nodes configured
5 resources configured

Online: [ wolf1 wolf2 ]

Full list of resources:

 VirtualIP	(ocf::heartbeat:IPaddr2):	Started wolf1
 WebService	(ocf::heartbeat:apache):	Stopped
 Master/Slave Set: DrbdDataClone [DrbdData]
     Masters: [ wolf2 ]
     Stopped: [ wolf1 ]
 DrbdFS	(ocf::heartbeat:Filesystem):	Started wolf2

Failed Actions:
* DrbdData_start_0 on wolf1 'not installed' (5): call=21, status=complete, exitreason='',
    last-rc-change='Sun Sep 15 07:13:48 2019', queued=0ms, exec=96ms


Daemon Status:
  corosync: active/disabled
  pacemaker: active/disabled
  pcsd: active/enabled

Looking into it:

[root@wolf1 cluster]# df -h
Filesystem      Size  Used Avail Use% Mounted on
/dev/sda2       7.6G  1.4G  6.2G  18% /
devtmpfs        909M     0  909M   0% /dev
tmpfs           920M   38M  882M   5% /dev/shm
tmpfs           920M  8.6M  911M   1% /run
tmpfs           920M     0  920M   0% /sys/fs/cgroup
tmpfs           184M     0  184M   0% /run/user/0
/dev/drbd0      8.0G   33M  8.0G   1% /var/www/html
[root@wolf1 cluster]# drbdadm status
WARN: no normal resources defined for this host (wolf)!?
Misspelled name of the local machine with the 'on' keyword ?

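This looked like a hostname problem: DRBD matches each on section against the local host name (uname -n), so the name reported here has to be wolf1 or wolf2. Checking the hostname shows wolf instead, and /etc/hosts contains an entry for the VirtualIP host, as below.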

[root@wolf1 cluster]# hostname
wolf
# cat /etc/hosts
127.0.0.1   localhost localhost.localdomain localhost4 localhost4.localdomain4
::1         localhost localhost.localdomain localhost6 localhost6.localdomain6

192.168.100.31	wolf1
#172.16.1.31	wolf1-drbd
192.168.100.32	wolf2
#172.16.1.32	wolf2-drbd
192.168.100.30	wolf

Comment out or delete the 192.168.100.30 line above on both nodes and start the cluster again; it then works normally.
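A mismatch like this can be caught before starting DRBD. The sketch below uses a hypothetical helper, res_has_host, which assumes each node is declared on its own line in the form "on NAME {" as in the resource file from this post.

```shell
# Hypothetical helper: succeed when HOST (default: the local `uname -n`)
# has an "on HOST {" section in the given DRBD resource file.
# Usage: res_has_host RES_FILE [HOST]
res_has_host() {
    res_file=$1
    host=${2:-$(uname -n)}
    awk -v h="$host" '
        $1 == "on" && $2 == h { found = 1 }
        END { exit !found }
    ' "$res_file"
}

# Demo against a throwaway copy of the resource layout used in this post.
tmp=$(mktemp)
cat > "$tmp" <<'EOF'
resource disk_wolf {
        on wolf1 {
                address 172.16.1.31:7789;
        }
        on wolf2 {
                address 172.16.1.32:7789;
        }
}
EOF
res_has_host "$tmp" wolf1 && echo "wolf1: section found"
res_has_host "$tmp" wolf || echo "wolf: no matching 'on' section"
rm -f "$tmp"
```

Running res_has_host /etc/drbd.d/cluster_disk.res with no second argument on each node checks that node's own name.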

Stop drbd on both nodes and start it again.

# systemctl stop drbd
# systemctl start drbd

Confirm with drbdadm status that DRBD is working normally.
Node 1 is primary

[root@wolf1 ~]# drbdadm status
disk_wolf role:Primary
  disk:UpToDate
  peer role:Secondary
    replication:Established peer-disk:UpToDate

Node 2 is secondary

[root@wolf2 ~]# drbdadm status
disk_wolf role:Secondary
  disk:UpToDate
  peer role:Primary
    replication:Established peer-disk:UpToDate

Now stop the cluster and start it again.

[root@wolf1 ~]# pcs cluster stop --all
wolf1: Stopping Cluster (pacemaker)...
wolf2: Stopping Cluster (pacemaker)...
wolf2: Stopping Cluster (corosync)...
wolf1: Stopping Cluster (corosync)...

[root@wolf1 cluster]# pcs cluster start --all
wolf1: Starting Cluster (corosync)...
wolf2: Starting Cluster (corosync)...
wolf1: Starting Cluster (pacemaker)...
wolf2: Starting Cluster (pacemaker)...

Check the cluster status.

[root@wolf1 ~]# pcs status
Cluster name: wolf_cluster
Stack: corosync
Current DC: wolf2 (version 1.1.19-8.el7_6.4-c3c624ea3d) - partition with quorum
Last updated: Sun Sep 15 12:12:34 2019
Last change: Sun Sep 15 12:04:27 2019 by root via cibadmin on wolf1

2 nodes configured
5 resources configured

Online: [ wolf1 wolf2 ]

Full list of resources:

 VirtualIP	(ocf::heartbeat:IPaddr2):	Started wolf1
 WebService	(ocf::heartbeat:apache):	Started wolf1
 Master/Slave Set: DrbdDataClone [DrbdData]
     Masters: [ wolf1 ]
     Slaves: [ wolf2 ]
 DrbdFS	(ocf::heartbeat:Filesystem):	Started wolf1

Daemon Status:
  corosync: active/disabled
  pacemaker: active/disabled
  pcsd: active/enabled
[root@wolf1 ~]# df -h
Filesystem      Size  Used Avail Use% Mounted on
/dev/sda2       7.6G  1.4G  6.2G  18% /
devtmpfs        909M     0  909M   0% /dev
tmpfs           920M   38M  882M   5% /dev/shm
tmpfs           920M  8.6M  911M   1% /run
tmpfs           920M     0  920M   0% /sys/fs/cgroup
tmpfs           184M     0  184M   0% /run/user/0
/dev/drbd0      8.0G   33M  8.0G   1% /var/www/html
[root@wolf1 ~]# ip addr
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host 
       valid_lft forever preferred_lft forever
2: enp0s3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP group default qlen 1000
    link/ether 08:00:27:ea:a0:7a brd ff:ff:ff:ff:ff:ff
    inet 192.168.100.31/24 brd 192.168.100.255 scope global noprefixroute enp0s3
       valid_lft forever preferred_lft forever
    inet 192.168.100.30/32 brd 192.168.100.255 scope global enp0s3
       valid_lft forever preferred_lft forever
    inet6 fe80::73b7:ab9b:bae:634b/64 scope link noprefixroute 
       valid_lft forever preferred_lft forever
3: enp0s8: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP group default qlen 1000
    link/ether 08:00:27:b0:39:96 brd ff:ff:ff:ff:ff:ff
    inet 172.16.1.31/24 brd 172.16.1.255 scope global noprefixroute enp0s8
       valid_lft forever preferred_lft forever
    inet6 fe80::a00:27ff:feb0:3996/64 scope link 
       valid_lft forever preferred_lft forever

Stopping node 1 and checking the service on node 2 (cluster status, VirtualIP, mount, web service) gives the following.

[root@wolf2 ~]# pcs cluster stop wolf1
wolf1: Stopping Cluster (pacemaker)...
wolf1: Stopping Cluster (corosync)...
[root@wolf2 ~]# pcs status
Cluster name: wolf_cluster
Stack: corosync
Current DC: wolf2 (version 1.1.19-8.el7_6.4-c3c624ea3d) - partition with quorum
Last updated: Sun Sep 15 12:14:37 2019
Last change: Sun Sep 15 12:04:27 2019 by root via cibadmin on wolf1

2 nodes configured
5 resources configured

Online: [ wolf2 ]
OFFLINE: [ wolf1 ]

Full list of resources:

 VirtualIP	(ocf::heartbeat:IPaddr2):	Started wolf2
 WebService	(ocf::heartbeat:apache):	Started wolf2
 Master/Slave Set: DrbdDataClone [DrbdData]
     Masters: [ wolf2 ]
     Stopped: [ wolf1 ]
 DrbdFS	(ocf::heartbeat:Filesystem):	Started wolf2

Daemon Status:
  corosync: active/disabled
  pacemaker: active/disabled
  pcsd: active/enabled
[root@wolf2 ~]# df -h
Filesystem      Size  Used Avail Use% Mounted on
/dev/sda2       7.6G  1.4G  6.2G  18% /
devtmpfs        909M     0  909M   0% /dev
tmpfs           920M   54M  866M   6% /dev/shm
tmpfs           920M  8.6M  911M   1% /run
tmpfs           920M     0  920M   0% /sys/fs/cgroup
tmpfs           184M     0  184M   0% /run/user/0
/dev/drbd0      8.0G   33M  8.0G   1% /var/www/html
[root@wolf2 ~]# ip addr
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host 
       valid_lft forever preferred_lft forever
2: enp0s3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP group default qlen 1000
    link/ether 08:00:27:c4:95:c2 brd ff:ff:ff:ff:ff:ff
    inet 192.168.100.32/24 brd 192.168.100.255 scope global noprefixroute enp0s3
       valid_lft forever preferred_lft forever
    inet 192.168.100.30/32 brd 192.168.100.255 scope global enp0s3
       valid_lft forever preferred_lft forever
    inet6 fe80::b603:32a0:249b:5278/64 scope link noprefixroute 
       valid_lft forever preferred_lft forever
3: enp0s8: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP group default qlen 1000
    link/ether 08:00:27:53:c9:09 brd ff:ff:ff:ff:ff:ff
    inet 172.16.1.32/24 brd 172.16.1.255 scope global noprefixroute enp0s8
       valid_lft forever preferred_lft forever
    inet6 fe80::a00:27ff:fe53:c909/64 scope link 
       valid_lft forever preferred_lft forever
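The failover check above can also be done by script instead of by eye. This sketch reads pcs status text and tests whether everything ended up on one node; all_on_node is a hypothetical helper, and it assumes the pacemaker 1.1 output layout shown in this post (resource lines with an "(ocf::...)" agent column, plus a "Masters: [ node ]" line).

```shell
# Hypothetical helper: read `pcs status` text on stdin and succeed only when
# every primitive resource is Started on NODE and NODE is the DRBD master.
# Usage: pcs status | all_on_node NODE
all_on_node() {
    awk -v n="$1" '
        # Primitive resources: NAME (ocf::...): Started NODE
        $2 ~ /^\(ocf:/ {
            seen = 1
            if ($3 != "Started" || $4 != n) bad = 1
        }
        # Master/slave set: Masters: [ NODE ]
        $1 == "Masters:" {
            seen = 1
            if ($3 != n) bad = 1
        }
        END { if (seen && !bad) exit 0; exit 1 }
    '
}

# Demo on the resource list captured after wolf1 was stopped.
all_on_node wolf2 <<'EOF' && echo "all resources are on wolf2"
 VirtualIP  (ocf::heartbeat:IPaddr2):   Started wolf2
 WebService (ocf::heartbeat:apache):    Started wolf2
 Master/Slave Set: DrbdDataClone [DrbdData]
     Masters: [ wolf2 ]
     Stopped: [ wolf1 ]
 DrbdFS     (ocf::heartbeat:Filesystem):        Started wolf2
EOF
```

A resource that is Stopped, or started on the other node, makes the check fail, which is the situation seen in the earlier error output.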

If the service does not move back to node1 after node1 is stopped and started again, try adjusting the constraints.
In my case the final constraints are as follows.

# pcs constraint
Location Constraints:
  Resource: DrbdDataClone
    Enabled on: wolf2 (score:50)
  Resource: DrbdFS
    Enabled on: wolf1 (score:INFINITY)
    Enabled on: wolf2 (score:50)
  Resource: WebService
    Enabled on: wolf1 (score:INFINITY)
    Enabled on: wolf2 (score:50)
Ordering Constraints:
  start VirtualIP then start WebService (kind:Mandatory)
  promote DrbdDataClone then start DrbdFS (kind:Mandatory)
  start DrbdFS then start WebService (kind:Mandatory)
Colocation Constraints:
  DrbdFS with DrbdDataClone (score:INFINITY) (with-rsc-role:Master)
  WebService with DrbdFS (score:INFINITY)
  WebService with VirtualIP (score:INFINITY)
Ticket Constraints:
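Location scores like the ones listed above are normally set with pcs constraint location ... prefers. As a hedged sketch, the helper below only prints the commands for review; prefer_cmds is a hypothetical name, and the scores INFINITY and 50 are copied from my constraint list rather than being recommended values.

```shell
# Hypothetical helper: print (not run) location-constraint commands that make
# each resource prefer PREFERRED with score INFINITY and FALLBACK with score 50.
# Usage: prefer_cmds PREFERRED FALLBACK RESOURCE...
prefer_cmds() {
    preferred=$1
    fallback=$2
    shift 2
    for rsc in "$@"; do
        echo "pcs constraint location $rsc prefers $preferred=INFINITY $fallback=50"
    done
}

prefer_cmds wolf1 wolf2 DrbdFS WebService
```

The printed lines can be pasted into a root shell once they look right; existing constraints can be listed with pcs constraint before adding new ones.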

* Notes:
I had intended to use the second NIC for DRBD, but because of the hostname problem I configured it to use the same NIC as the cluster.
DRBD 8.4 was used.

* References
http://fibrevillage.com/sysadmin/304-pcs-command-reference
https://www.learnitguide.net/2016/07/integrate-drbd-with-pacemaker-clusters.html

* Linux HA (Pacemaker, corosync, iscsi shared storage, lvm2) – part 3
