What happens when the management connection between nodes is lost

Post by TomJohnsonMA » Fri Nov 09, 2018 6:18 pm

I have two network interfaces on each of my PBXs: one I will call the voice network and the other the management network. I've set up the HAAst nodes to point to each other over the management network, and that works fine: when one node fails, the other takes over.

However, if there is a problem with the management network (e.g., I unplug the cable), the inactive node becomes active, so both PBXs report that they are the active node. Is something wrong with my setup?

Post by Telium Support » Fri Nov 09, 2018 7:06 pm

I'll start by answering this question in the context of ANY cluster (e.g., a gateway cluster, router cluster, or file server cluster). If the nodes that make up a cluster cannot talk to one another, they have no way of knowing whether the other node is dead or alive. The correct action for an isolated cluster node is therefore to promote itself to active and assume the other node is dead. Once the nodes can contact each other again, they discover that more than one node is active (a situation called "dual-active contention"). At that point the nodes should negotiate which one remains active and which one demotes itself.
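
As a rough illustration of those two rules (promote on silence, then resolve contention once the peer is heard again), here is a minimal Python sketch. It is hypothetical and not HAAst's implementation: the ClusterNode class, the 3-second heartbeat timeout, and the "higher node ID demotes" tie-break are all assumptions chosen for the example. The key design point is that both nodes apply the same deterministic rule, so exactly one of them steps down without any further coordination.

```python
from dataclasses import dataclass


@dataclass
class ClusterNode:
    """Hypothetical two-node HA peer (illustration only, not HAAst's code)."""
    node_id: int                    # assumption: unique ID; lower ID wins contention
    heartbeat_timeout: float = 3.0  # assumption: seconds of silence before promoting
    active: bool = False
    last_heartbeat: float = 0.0

    def on_tick(self, now: float) -> None:
        """Periodic check: prolonged silence means 'assume the peer is dead'."""
        if not self.active and now - self.last_heartbeat > self.heartbeat_timeout:
            self.active = True

    def on_heartbeat(self, now: float, peer_id: int, peer_active: bool) -> None:
        """Handle a heartbeat (carrying the peer's ID and role) from the peer."""
        self.last_heartbeat = now
        if self.active and peer_active and self.node_id > peer_id:
            # Dual-active contention: both nodes apply the same rule,
            # so only the node with the higher ID demotes itself.
            self.active = False


# Neither node has heard a heartbeat for 10 s (link down), so both promote.
a, b = ClusterNode(node_id=1), ClusterNode(node_id=2)
a.on_tick(10.0)
b.on_tick(10.0)
# Link restored: each node receives the other's heartbeat.
a.on_heartbeat(11.0, peer_id=b.node_id, peer_active=b.active)
b.on_heartbeat(11.0, peer_id=a.node_id, peer_active=a.active)
print(a.active, b.active)  # True False
```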

This is exactly what happens with HAAst. If the management connection between the nodes is lost, neither node has any way of knowing that the other is alive, so both nodes try to take over telephony service. Once the nodes reconnect, one node automatically demotes itself.
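
The sequence from the original question (cable unplugged, then plugged back in) can be replayed with a small discrete-time simulation. This is a hypothetical model, not HAAst code: it assumes a heartbeat every second over the management link, a 3-second timeout before the standby node promotes, and that the node with the higher ID demotes when both are active. It records how many nodes are active at each second.

```python
TIMEOUT = 3  # assumption: seconds without a heartbeat before a node promotes


def simulate(link_up_at, seconds=20):
    """Return, for each second, how many of the two nodes are active.

    link_up_at(t) says whether the management link works at second t.
    Node 0 starts active, node 1 starts on standby; the higher ID
    (node 1) demotes when both are active (assumed tie-break).
    """
    active = [True, False]
    last_heard = [0, 0]  # last second at which each node heard its peer
    history = []
    for t in range(seconds):
        if link_up_at(t):
            # Heartbeats get through in both directions.
            last_heard = [t, t]
            if active[0] and active[1]:
                active[1] = False  # dual-active contention resolved
        for n in (0, 1):
            if not active[n] and t - last_heard[n] > TIMEOUT:
                active[n] = True  # peer presumed dead: take over telephony
        history.append(sum(active))
    return history


# Management cable unplugged from second 5 through second 12.
timeline = simulate(lambda t: not 5 <= t <= 12)
# 1 active node until t=7, 2 (dual-active) for t=8..12, 1 again from t=13
print(timeline)
```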

You will find this scenario plays out identically with any commercial HA product (e.g., Cisco routers running HSRP). Dual-active contention is the worst-case scenario for any cluster, because the two nodes compete for the same resources, traffic, and data.
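
For comparison, HSRP (RFC 2281) elects its active router by comparing priorities, with the higher IP address breaking a tie. The Python function below is a simplified sketch of that comparison only; it ignores HSRP's timers, preemption, and message exchange, and the priorities and addresses in the example are made up.

```python
from ipaddress import IPv4Address


def hsrp_winner(a, b):
    """Pick the router that wins an HSRP-style election.

    Each router is a (priority, ip_address) tuple. Higher priority wins; on a
    tie the higher IP address wins (compared numerically, not as strings).
    Simplified: timers, preemption, and the protocol's Coup/Resign messages
    are ignored.
    """
    return max(a, b, key=lambda r: (r[0], IPv4Address(r[1])))


# Example priorities and addresses are made up.
print(hsrp_winner((100, "10.0.0.2"), (110, "10.0.0.1")))  # (110, '10.0.0.1')
print(hsrp_winner((100, "10.0.0.2"), (100, "10.0.0.1")))  # (100, '10.0.0.2')
```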

There is a workaround called STONITH, available through event handlers in the Commercial Unlimited edition of HAAst. STONITH is an acronym for "Shoot The Other Node In The Head": one node powers off the other. Although HAAst supports STONITH, this functionality is disabled by default because the concept is hotly debated as risky (a failing node may mistakenly shoot the healthy node). There are also many scenarios where STONITH does not work (e.g., two nodes isolated from each other) without an additional out-of-band connection (e.g., a serial link or a third network connection).
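
To see why fencing needs its own out-of-band path, here is a minimal Python sketch. It is not HAAst's event-handler interface: FenceChannel and promote_with_stonith are hypothetical names, and the policy that a node stays on standby when it cannot confirm the peer is powered off is an assumption for the example. If the only fencing path runs over the same management network that just failed, the power-off request cannot be delivered, so fencing fails.

```python
from dataclasses import dataclass


@dataclass
class FenceChannel:
    """Hypothetical path to the peer's power controller."""
    name: str
    link_up: bool

    def power_off_peer(self) -> bool:
        """Return True only if the power-off request could be delivered."""
        return self.link_up


def promote_with_stonith(channels):
    """Take over only after the peer is confirmed powered off.

    Tries each fencing channel in order and returns the name of the channel
    that fenced the peer, or None if none worked (assumed policy: the node
    then stays on standby instead of risking dual-active).
    """
    for channel in channels:
        if channel.power_off_peer():
            return channel.name
    return None


# Management cable unplugged: fencing over that same network cannot work.
print(promote_with_stonith([FenceChannel("management", link_up=False)]))  # None
# With an out-of-band serial link the peer can still be fenced.
print(promote_with_stonith([FenceChannel("management", False),
                            FenceChannel("serial", True)]))  # serial
```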