
Author Topic: HACMP verification and synchronization problem  (Read 14761 times)


pravinkatkade

  • Jr. Member
  • Posts: 5
  • Karma: +0/-0
HACMP verification and synchronization problem
« on: July 27, 2009, 07:51:53 AM »
Hi There,

When I run verification and synchronization of the cluster, it fails. Please find below the screen dump from the verification and synchronization run:

Command: failed        stdout: yes           stderr: no

Before command completion, additional instructions may appear below.

/usr/es/sbin/cluster/utilities/clcheck_server: line 93: open file limit exceeded [Bad file number]
/usr/es/sbin/cluster/utilities/clcheck_server: line 93: open file limit exceeded [Bad file number]

Verification to be performed on the following:
        Cluster Topology
        Cluster Resources

Verification will interactively correct verification errors.


Retrieving data from available cluster nodes.  This could take a few minutes.

        Start data collection on node salpar11
        Start data collection on node sblpar11
        Collector on node sblpar11 completed
        Collector on node salpar11 completed
        Data collection complete

Verifying Cluster Topology...

        Completed 10 percent of the verification checks


For nodes with a single Network Interface Card per logical
network configured, it is recommended to include the file
'/usr/es/sbin/cluster/netmon.cf' with a "pingable"
IP address as described in the 'HACMP Planning Guide'.
WARNING: File 'netmon.cf' is missing or empty on the following nodes:
salpar11
sblpar11
        Completed 20 percent of the verification checks
WARNING: sblpar11: Read on disk /dev/vpath6 failed.
  Check cables and connections.
  A reserve may be set on that disk by another node.

WARNING: sblpar11: Read on disk /dev/vpath13 failed.
  Check cables and connections.
  A reserve may be set on that disk by another node.

/usr/es/sbin/cluster/utilities/cldare[37]: 1745064 Memory fault

cldare: Failures detected during verification.  Please correct
the errors and retry this command.

Verification has completed normally.




Please let me know how to get it synchronized.
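As an aside on the netmon.cf warning in the output above: that file is just a list of addresses, one per line, that the cluster can ping to decide whether a single-adapter network is still up. A minimal sketch, with placeholder addresses rather than anything from this cluster:

# /usr/es/sbin/cluster/netmon.cf -- one pingable IP or hostname per line
# (placeholder targets; use stable addresses outside the cluster, e.g. the default gateway)
192.168.100.1
10.1.1.254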

Michael

  • Administrator
  • Hero Member
  • Posts: 1041
  • Karma: +0/-0
Re: HACMP verification and synchronization problem
« Reply #1 on: August 06, 2009, 11:04:07 PM »
Quote
/usr/es/sbin/cluster/utilities/cldare[37]: 1745064 Memory fault

So that is the error. Up to then it had just been Warnings.

What runs through my mind:

What limits does root have in /etc/security/limits?
What version of AIX (oslevel -s), and in particular, rsct.core.* fileset levels?
What version and patch level of HACMP (or PowerHA)?
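For reference, something like the following should gather all of that on each node (the fileset name patterns are assumptions and may differ slightly depending on your HACMP level):

ulimit -a                                  # limits of the current root session
lsuser -a fsize data stack nofiles root    # limits stored in /etc/security/limits
oslevel -s                                 # AIX technology level / service pack
lslpp -l "rsct.*"                          # RSCT fileset levels
lslpp -l "cluster.es.server.*"             # HACMP/PowerHA server fileset and patch level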

pravinkatkade

  • Jr. Member
  • Posts: 5
  • Karma: +0/-0
Re: HACMP verification and synchronization problem
« Reply #2 on: August 07, 2009, 10:28:46 AM »
Hi Michael,

limits for user root


> ulimit -a
time(seconds)        unlimited
file(blocks)         unlimited
data(kbytes)         unlimited
stack(kbytes)        unlimited
memory(kbytes)       unlimited
coredump(blocks)     0
nofiles(descriptors) unlimited

 > oslevel -s
5300-08-03-0831

 


 cluster.license            5.4.0.0  COMMITTED  HACMP Electronic License
  rsct.basic.hacmp           2.4.9.0  COMMITTED  RSCT Basic Function (HACMP/ES
  rsct.compat.basic.hacmp    2.4.9.0  COMMITTED  RSCT Event Management Basic
                                                 Function (HACMP/ES Support)
  rsct.compat.clients.hacmp  2.4.9.0  COMMITTED  RSCT Event Management Client
                                                 Function (HACMP/ES Support)


I resolved this problem by rebooting the other node and letting auto-correction run while the cluster was starting, but I would still like to know the cause of the problem.

BR

Michael

  • Administrator
  • Hero Member
  • Posts: 1041
  • Karma: +0/-0
Re: HACMP verification and synchronization problem
« Reply #3 on: August 09, 2009, 08:54:57 AM »
Since rebooting the other node resolved the problem, and the verification reported warnings about your vpath availability, I would examine your SAN setup to verify that all vpaths are configured so that every node can access all of the disks at the same time.

The attribute value IBM uses is no_reserve. Per disk you would run a command like:

chdev -l hdisk$i -a reserve_policy=no_reserve

If the disks are active, you can only update the ODM (the change takes effect once the volume groups are varied off or the node is rebooted), using something like

chdev -l hdisk$i -a reserve_policy=no_reserve -P
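A rough loop over the shared disks might look like the following; the hdisk names are placeholders for whatever backs your vpaths, so adjust them to your configuration:

# sketch: update the reserve policy in the ODM only (-P) for each shared disk,
# then confirm the new value; it takes effect after varyoff or reboot
for d in hdisk2 hdisk3 hdisk4          # placeholder names - use your shared disks
do
    chdev -l $d -a reserve_policy=no_reserve -P
    lsattr -El $d -a reserve_policy    # should now show no_reserve
done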

This is just a guess, based on the available information.

btw - a reboot on the other node releases any reserves it might have held, and during auto-correction fewer resource groups (i.e. volume groups varied on) may have been active, which is why synchronization could succeed.
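If you want to check that next time rather than guess, comparing what each node actually has varied on before retrying the synchronization would be a start, e.g.:

lsvg -o        # volume groups currently varied on (run on each node)
lspv           # which hdisks/vpaths belong to which volume group, and their state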

Glad you got it resolved!