
Author Topic: HACMP two node cluster with SAN storage mirror  (Read 4039 times)


Odil

  • New Member
  • *
  • Posts: 2
  • Karma: +0/-0
HACMP two node cluster with SAN storage mirror
« on: April 07, 2014, 11:31:46 AM »
Hi All,
I have a two-node HACMP cluster with two SAN storage units mirrored using LVM. No cluster sites are defined. Both nodes are connected to both SAN storage units.
First issue: heartbeat network. I configured two disk heartbeat networks, one per SAN storage unit. During redundancy tests, when one of the SAN storage units goes down, the cluster goes into the ERROR state with errors for the heartbeat network being down. What are the guidelines for configuring heartbeat networks in such an environment?

Second issue: during redundancy tests, when I power off the working server and one of the SAN storage units at the same time, the resource group does not come up, and I get the following error message:
+MAIN_Rg:cl_sync_vgs[191] lqueryvg -g 00c5e65700004c0000000143a479f069 -L
+MAIN_Rg:cl_sync_vgs[191] cut -f2- '-d '
+MAIN_Rg:cl_sync_vgs[192] read lv_name stale_count
+MAIN_Rg:cl_sync_vgs[193] (( 1 != 3 ))
+MAIN_Rg:cl_sync_vgs[195] [[ high == high ]]
+MAIN_Rg:cl_sync_vgs[195] set -x
+MAIN_Rg:cl_sync_vgs[197] : This logical volume has stale partitions, so sync it.
+MAIN_Rg:cl_sync_vgs[198] : Doing 4 stale partitions at a time seems to be a
+MAIN_Rg:cl_sync_vgs[199] : win most of the time. However, we will honor the
+MAIN_Rg:cl_sync_vgs[200] : NUM_PARALLEL_LPS value in /etc/environment, if set.
+MAIN_Rg:cl_sync_vgs[202] grep ^NUM_PARALLEL_LPS= /etc/environment
+MAIN_Rg:cl_sync_vgs[202] NPL_VAR=''
+MAIN_Rg:cl_sync_vgs[203] [[ 1 == 0 ]]
+MAIN_Rg:cl_sync_vgs[213] cl_log 999 'Warning: syncvg can take considerable amount of time, depending on data size and network bandwidth.'
+MAIN_Rg:cl_log[+50] version=1.10
+MAIN_Rg:cl_log[+94] SYSLOG_FILE=/var/hacmp/adm/cluster.log
***************************
Apr 5 2014 19:17:02 !!!!!!!!!! ERROR !!!!!!!!!!

Please help!! Thank you ALL in advance!
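
The comments in the trace above say that the sync handles 4 stale partitions at a time unless NUM_PARALLEL_LPS is set in /etc/environment. A minimal sketch of that lookup follows. It is not the actual cl_sync_vgs source; the function name num_parallel_lps and its optional file argument are hypothetical and exist only for illustration.

```shell
# Sketch only: not the actual cl_sync_vgs source.
# num_parallel_lps [FILE] prints the NUM_PARALLEL_LPS value found in FILE
# (default /etc/environment), or 4 when the variable is not set there,
# 4 being the default named in the trace comments.
num_parallel_lps() {
    env_file=${1:-/etc/environment}
    value=""
    if [ -r "$env_file" ]; then
        # Keep only the text after "NUM_PARALLEL_LPS="; last match wins.
        value=$(sed -n 's/^NUM_PARALLEL_LPS=//p' "$env_file" | tail -1)
    fi
    echo "${value:-4}"
}

num_parallel_lps /etc/environment
```

In the trace, the grep against /etc/environment returned nothing (NPL_VAR=''), so the default applied.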

Michael

  • Administrator
  • Hero Member
  • *****
  • Posts: 1052
  • Karma: +0/-0
Re: HACMP two node cluster with SAN storage mirror
« Reply #1 on: April 07, 2014, 12:12:28 PM »
1) It is not uncommon for a network to drop. The main reason for the heartbeat network is so that the cluster managers running on the nodes can still communicate and know that the cluster is alive when the IP adapter and/or network fails. This prevents IP from becoming a SPOF.

2) Perhaps this is an official "error" status, but I would have expected "unstable" instead. Traditionally, "error" means HACMP cannot process events. That is, HACMP/PowerHA will not progress until the error is cleared.

As such, I doubt that your non-IP (aka heartbeat) network is the root cause of the error status.

A couple of quick questions:
AIX level; PowerHA level; patches applied; is this a test cluster, or is there already an application running?

And I shall think of more questions as we proceed (I have not had to debug HACMP/PowerHA/SystemMirror for quite a while).

Odil

Re: HACMP two node cluster with SAN storage mirror
« Reply #2 on: April 07, 2014, 12:29:01 PM »
Hi Michael,
Thank you very much for your kind response!
1) AIX
# oslevel -r
7100-00

2) PowerHA
# /usr/es/sbin/cluster/utilities/halevel -s
6.1.0 SP12

3) There is a single cluster application running, an Oracle DB, which is in fact NOT monitored.

# cllsserv
MAIN_App  /etc/start.sh  /etc/stop.sh

# clRGinfo -m
---------------------------------------------------------------------------------------------------------------------
Group Name     State                                                    Application state            Node
---------------------------------------------------------------------------------------------------------------------
MAIN_Rg        ONLINE                                                                                main2
 MAIN_App                                                                 ONLINE NOT MONITORED

Let me know if you need additional information to be provided.

Thank you for support!
Odil


Michael

Re: HACMP two node cluster with SAN storage mirror
« Reply #3 on: April 07, 2014, 02:33:18 PM »
AIX 7100 TL0 has long been unsupported. Maybe the system is missing a single patch and is reporting an older level than it is actually at.

What I need is the output of oslevel -s. However, in case you have updates installed but are missing a single fileset, use the command below to get a better view of what AIX knows about.

Code:
oslevel -q -s | head -5
You can read up on usage for oslevel at: http://www.rootvg.net/content/view/441/107/
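
For reading the result: oslevel -s prints four dash-separated fields, namely the base level, the technology level, the service pack, and a build date code. A small sketch that splits such a string is below. The function name parse_oslevel is hypothetical, and the sample string is illustrative only; it is not output from the system discussed in this thread.

```shell
# Sketch: split an `oslevel -s` style string into its four fields.
# parse_oslevel is a hypothetical helper, not part of AIX.
parse_oslevel() {
    # $1: a string such as 7100-00-03-1115 (illustrative sample value)
    IFS=- read -r base tl sp build <<EOF
$1
EOF
    echo "base=$base TL=$tl SP=$sp build=$build"
}

parse_oslevel "7100-00-03-1115"
# prints: base=7100 TL=00 SP=03 build=1115
```

On the cluster node itself this could be called as parse_oslevel "$(oslevel -s)".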

Finally, for recommendations about software and firmware levels, use FLRT (Fix Level Recommendation Tool) to verify your software levels: http://www14.software.ibm.com/webapp/set2/flrt/

P.S. If you cannot get to the recommended levels, I will still try to help, but there may be issues we cannot solve if you are not at recent levels.
« Last Edit: April 09, 2014, 07:34:04 AM by Michael »