
Author Topic: default route with failed switch port on NIC with persistent IP  (Read 14837 times)


u0003193

  • New Member
  • *
  • Posts: 3
  • Karma: +0/-0
default route with failed switch port on NIC with persistent IP
« on: December 16, 2008, 03:54:10 PM »
I have a question.

Setting: A 2-node HACMP cluster with two NICs on each node, using IP aliasing. The service IP and the persistent IP are in the same subnet, to take advantage of the default route for the persistent IP.

Issue: The switch port connected to the NIC holding our persistent IP failed, and the swap_adapter event did not move the default route to the standby NIC. Since the persistent IP and the service IP use the same default gateway, this failure cut the server off from the network.

Cause: Unknown. The logs have been removed, so I don't know why the swap_adapter event failed over the service IP and the persistent IP but did not move the default route to the other NIC, even though the script suggests it should. I will try to reproduce the event this weekend to capture the logs.

If you have any insight regarding this issue, please share.  Thanks!

u0003193

  • New Member
  • *
  • Posts: 3
  • Karma: +0/-0
Re: default route with failed switch port on NIC with persistent IP
« Reply #1 on: December 22, 2008, 06:01:50 PM »
When en0 is disconnected, the swap_adapter event creates .restore_routes. It looks like cl_route_change still puts the default route back on en0 even though en0 has failed. Has anyone seen this problem before? Thanks!

HOSTNAME:/usr/es/sbin/cluster #more .restore_routes
#!/bin/ksh
#
# Script created by cl_swap_IP_addres on Sat Dec 20 17:23:43 CST 2008
#
PATH=/usr/es/sbin/cluster:/usr/es/sbin/cluster/utilities:/usr/es/sbin/cluster/events:/usr/es/sbin/cluster/events/utils:/usr/es/sbin/cluster/events/cmd:/usr/es/sbin/cluster/diag:/usr/es/sbin/cluster/etc:/usr/es/sbin/cluster/sbin:/usr/es/sbin/cluster/cspoc:/usr/es/sbin/cluster/conversion:/usr/es/sbin/cluster/events/emulate:/usr/es/sbin/cluster/events/emulate/driver:/usr/es/sbin/cluster/events/emulate/utils:/usr/es/sbin/cluster/tguides/bin:/usr/es/sbin/cluster/tguides/classes:/usr/es/sbin/cluster/tguides/images:/usr/es/sbin/cluster/tguides/scripts:/usr/es/sbin/cluster/glvm/utils:/usr/bin:/etc:/usr/sbin:/usr/ucb:/usr/bin/X11:/sbin
[[ "$VERBOSE_LOGGING" = "high" ]] && set -x
#
cl_route_change default 127.0.0.1 x.x.x.x-my-default-gw

Michael

  • Administrator
  • Hero Member
  • *****
  • Posts: 1039
  • Karma: +0/-0
Re: default route with failed switch port on NIC with persistent IP
« Reply #2 on: January 04, 2009, 05:05:02 PM »
I will have to look at the script, but normally the route command does not bind a route to an interface. It only sets a routing table entry.

Since your addresses are aliased, I suspect that your default route is not in the "service" address network but in a non-service IP network, in which case the route will continue to use the "failed" interface.

Change some numbers to protect your identity if needed, but please post the routing information with HACMP inactive and with HACMP active so the two can be compared.
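
One possible way to collect that comparison is to save `netstat -rn` output once with HACMP stopped and once with it running, then pull out the default route entries from each. This is only a rough sketch: the helper name `default_route_entries` and the file names are made up for illustration, and it assumes the usual AIX `netstat -rn` layout, with the destination in column 1, the gateway in column 2, and the interface in column 6.

```shell
#!/bin/sh
# Hypothetical helper: print "gateway interface" for each default route
# found in `netstat -rn` output read from standard input.
# Assumes the AIX column layout: Destination Gateway Flags Refs Use If ...
default_route_entries() {
    awk '$1 == "default" { print $2, $6 }'
}

# Example usage (file names are placeholders):
#   netstat -rn > /tmp/routes.hacmp_down    # taken with HACMP stopped
#   netstat -rn > /tmp/routes.hacmp_up      # taken with HACMP running
#   default_route_entries < /tmp/routes.hacmp_down
#   default_route_entries < /tmp/routes.hacmp_up
```

If the interface printed for the default route is the same in both captures, the route is most likely tied to a boot address rather than to an address HACMP moves.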

John R Peck

  • Administrator
  • Senior Member
  • *****
  • Posts: 134
  • Karma: +0/-0
Re: default route with failed switch port on NIC with persistent IP
« Reply #3 on: January 06, 2009, 04:37:50 AM »
Regardless of HACMP, when an AIX OS instance has two NIC ports in the same logical network,
outbound packets go out only via the first configured interface in the routing table,
although inbound packets can be received on either port.  Basically, unless you are using
EtherChannel for resilience, you can't really have two ports in the same logical network.

If the card that handles outbound traffic fails, your connection fails even though you have the other card,
because the OS can't reply.  I think you might get it to recover by running ifconfig detach on the primary card,
so that the other card takes over in the routing table.
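
The manual recovery described above might look roughly like the following. Treat it as a sketch, not a tested procedure: the function names `run` and `move_default_route` and the `DRY_RUN` switch are invented for illustration, the interface and gateway are whatever applies on your node, and by default it only prints the commands instead of running them. Note that detaching an interface removes every address configured on it, so this should be done from the console or from a session on the surviving interface.

```shell
#!/bin/sh
# Hypothetical wrapper: print the command when DRY_RUN is 1 (the default),
# otherwise execute it.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "$@"
    else
        "$@"
    fi
}

# Hypothetical helper: detach the failed interface, then re-add the
# default route so it resolves through the remaining interface.
#   $1 = failed interface (for example en0)
#   $2 = default gateway address
move_default_route() {
    run ifconfig "$1" detach || return 1
    # The old default route may already be gone after the detach,
    # so a failure here is ignored.
    run route delete default "$2" || true
    run route add default "$2"
}

# Example usage (prints the three commands without running them):
#   move_default_route en0 192.0.2.1
```

Setting `DRY_RUN=0` would actually run the commands, which needs root and changes the live routing table.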

Otherwise, with respect to HACMP (not that I'm an expert), don't you have an event script that can change your
route table if required when an interface fails?

I would imagine EtherChannel within a single box is the better way to achieve card resilience for that server,
leaving HACMP to handle shared disk and application migrations.

Michael

  • Administrator
  • Hero Member
  • *****
  • Posts: 1039
  • Karma: +0/-0
Re: default route with failed switch port on NIC with persistent IP
« Reply #4 on: January 06, 2009, 11:33:07 AM »
However, since this is HACMP (or PowerHA, as it is now called), it does matter which type of IPAT you are using: aliasing or replacement. John's comments are valid for IPAT via aliasing, because HACMP only moves the alias IP address, which by definition is in a different IP network than the AIX (aka boot) IP addresses. If you are using IPAT via replacement, this problem should not occur.

My question is: in which IP network is the default route assigned? Is it in the "AIX" network (one of the boot IP networks) or in the "service" IP network?

u0003193

  • New Member
  • *
  • Posts: 3
  • Karma: +0/-0
Re: default route with failed switch port on NIC with persistent IP
« Reply #5 on: January 08, 2009, 08:35:18 PM »
Michael is correct. It turns out that no persistent IP was configured, contrary to what I had imagined. The server name is configured just like on normal AIX, not as a "service" IP address. That is why the default route didn't fail over: the IP is not under HACMP control.

I did try what John suggested, since we are using IP aliasing. I went to the other node, connected to the server through the standby interface, and was removing the default route and attaching it to the standby interface when a coworker suddenly powered down the server.

The HACMP log went away. While testing the same failure on another pair, I noticed from the debug info that there was no persistent IP, which got me checking the configuration. (We have two dozen HACMP clusters; I thought I had fixed them all.)

Thanks!
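
For anyone who needs to check many clusters for the same gap, something along these lines could flag a node with no persistent IP defined. It is a sketch under assumptions: the helper name `has_persistent_ip` is made up, and it assumes that the HACMP `cllsif` utility lists the adapter type (boot, service, persistent) in the second column of its output, which should be confirmed on your own version first.

```shell
#!/bin/sh
# Hypothetical helper: succeed (exit 0) if any line of cllsif-style output
# on standard input has "persistent" in its second (Type) column,
# otherwise fail (exit 1).
has_persistent_ip() {
    awk '$2 == "persistent" { found = 1 } END { exit !found }'
}

# Example usage on a cluster node:
#   /usr/es/sbin/cluster/utilities/cllsif | has_persistent_ip \
#       || echo "no persistent IP defined on $(hostname)"
```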