G'day all,
I have read through the posts here and found some descriptions close to my own problem, but they are not quite the same.
Firstly, I want to state at the outset that I know using EtherChannel would overcome the issue, but HACMP is supposed to do the work, and high availability means high availability: we have it because getting downtime agreed to is difficult.
Now that I have got that off my chest, I will describe the problem. I have a 2-node HACMP cluster using IPAT, running HACMP 5.3 on AIX 5.3. Our persistent address is on the same subnet as our service address (192.168.160.0/24), and the boot addresses on the two cards are on different subnets from each other and from the service address (10.0.1.0/24 and 10.0.2.0/24).

When the switch failed last year, the service address and the gateway moved to the working switch. However, looking at the routing table, we still had a route using the faulty interface (en1). Consequently, roughly every second IP connection on the service address's subnet (192.168.160.0/24) seemed to be sent down the bad route. I did try the boot address subnets and they appeared to work fine, but I can't guarantee how thoroughly I checked; limited downtime meant not much time to test different scenarios. Everything that went through the default gateway worked, so clients didn't notice any difference.
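For reference, this is roughly what I was looking at after the failover, and what I assume the manual cleanup would be. The addresses are made-up examples on the subnets above, and I'm going from memory, so the exact syntax may be slightly off:

    # list interfaces and the routing table - the stale entry against en1 showed up here
    netstat -in
    netstat -rn

    # manually remove the stale interface route for the service subnet
    # (192.168.160.12 is just an example stand-in for en1's address on that subnet)
    route delete -net 192.168.160.0 -netmask 255.255.255.0 192.168.160.12

    # and take the failed interface down so nothing tries to use it
    ifconfig en1 down detach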
I did some research and found seemingly related posts on the internet suggesting that the sort of behaviour we were seeing could be fixed by using dead gateway detection. I tried both active_dgd on the route and passive_dgd, but neither made a difference.
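In case it helps, these are more or less the settings I tried. Again I'm going from memory, so treat the exact flags as approximate, and the gateway address is just an example on our service subnet:

    # passive dead gateway detection, system-wide and persistent across reboots
    no -p -o passive_dgd=1

    # active dead gateway detection set on the default route when (re)adding it
    route add default 192.168.160.1 -active_dgd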
Reading through the posts here, it seems the problem is our choice of addresses and networks. At the same time as all this, our accounting department has somehow managed not to pay our HACMP maintenance, so I can't run this by IBM.
Does anyone have a suggestion as to what might be the cause, and a potential fix for our problem?
Any insight or comments would be appreciated!
Cheers,
Bernie.