
Topic: NIM thread blocked across multiple clusters at exactly the same time


gumboe

I noticed the other day that there were some NIM thread blocked errors in errpt on one of our servers. Curious to see whether this was specific to this machine, I ran the following command from our CSM master:

dsh 'errpt | grep "NIM thread blocked"'

The result I got back was that at exactly the same time all servers reported this error.  Looking at the detail of the errpt entry it says that the affected item is the disk heartbeat device which sits on the SAN.
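
To confirm at a glance that every host logged the entry at the same time, the dsh output can be grouped by timestamp. This is only a rough sketch: the helper name group_by_timestamp is made up for illustration, and it assumes each line is an errpt summary line (IDENTIFIER, then a TIMESTAMP in MMDDhhmmYY form) prefixed by dsh with "hostname: ". If your dsh output looks different, the field numbers will need adjusting.

```shell
# Hypothetical helper: count distinct hosts per errpt timestamp.
# Assumes stdin lines look like:
#   host: IDENTIFIER TIMESTAMP T C RESOURCE_NAME DESCRIPTION
# i.e. errpt summary lines carrying the "host: " prefix added by dsh.
group_by_timestamp() {
  awk '
    {
      host = $1
      sub(/:$/, "", host)            # strip the trailing colon of the dsh prefix
      ts = $3                        # errpt TIMESTAMP column (MMDDhhmmYY)
      if (!((ts, host) in seen)) {   # count each host only once per timestamp
        seen[ts, host] = 1
        count[ts]++
      }
    }
    END {
      for (ts in count) print ts, count[ts]
    }
  ' | sort
}

# Possible usage (assumption: dsh targets are already set in your environment):
#   dsh 'errpt | grep "NIM thread blocked"' | group_by_timestamp
```

Each output line is a timestamp followed by the number of hosts that reported it. Since the summary timestamp only goes down to the minute, "exactly the same time" here really means "within the same minute".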

My question is this: Is it, as I suspect, that there must have been some issue on the SAN that caused this, perhaps some zoning going on or something similar? Could I have reduced the possibility of this error by altering the syncd setting?

Thanks in advance.

Michael

Re: NIM thread blocked across multiple clusters at exactly the same time
« Reply #1 on: April 16, 2007, 08:01:17 AM »
syncd, imho, has no relation with the errpt statement - other than perhaps a sync was being done.

The heartbeat uses RSCT to keep a HB going over an enhanced concurrent volume group, which I am assuming is how you have configured your disks. Most likely someone was doing 'something', which is why it is important that there is also an IP network active, so the nodes know that it is a network failure and not a node failure or, worse, a split cluster.