High Availability

Run backup nodes

Your IoTeX node can go down for many reasons: code bugs, insufficient compute resources, host failure, network failure, and so on. You may also have to shut the node down for maintenance or to update the node software itself.

Running multiple IoTeX nodes is the best way to keep your delegate highly available (zero downtime), which is why IoTeX nodes provide a convenient utility feature to run in "backup mode".

Configure the main and backup nodes

Let's assume that you want to run three nodes for your block producer delegate: one is the main node, i.e. the one that actively participates in the consensus work, and the other two are on "standby" and only listening to the blocks.

You can configure all three nodes with the same producerPrivKey private key setting, as long as you also add the following setting to config.yaml on every node:

...
network:
  ...
  masterKey: producer_private_key-replica_id
  ...
...

while you add this setting for the main node only:

...
system:
  ...
  active: true
  ...
...

and this one for the standby nodes only:

...
system:
  ...
  active: false
  ...
...
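
The active/standby split above can be sanity-checked before the nodes are started. The following is a minimal Python sketch, assuming each node's config.yaml has already been loaded into a dictionary (for example with a YAML parser). The function names and node labels are hypothetical and are not part of the IoTeX tooling.

```python
from typing import Dict, List


def find_active_nodes(configs: Dict[str, dict]) -> List[str]:
    """Return the labels of nodes whose config sets system.active to true."""
    return [
        label
        for label, cfg in configs.items()
        if (cfg.get("system") or {}).get("active") is True
    ]


def check_single_active(configs: Dict[str, dict]) -> None:
    """Raise ValueError unless exactly one node is configured as active."""
    active = find_active_nodes(configs)
    if len(active) != 1:
        raise ValueError(f"expected exactly one active node, found {active}")


# Hypothetical parsed configs for one main node and two standby nodes.
configs = {
    "main": {"system": {"active": True}},
    "backup-1": {"system": {"active": False}},
    "backup-2": {"system": {"active": False}},
}
check_single_active(configs)  # passes silently for this layout
```

A check like this only covers the static configuration files. It does not know about mode changes made at runtime.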

Switch the active mode

Make sure port 9009 is exposed from the node's Docker container.

As soon as your main node goes down, first switch its mode to "standby" in the main node's configuration file (this reduces the risk of running two active delegates, see below). Then open the following URL to turn one of your backup nodes from standby into active mode:

http://ip-to-one-node:9009/ha?activate=true

Always make sure no other node is in active mode before activating a backup node: running two block-producer nodes that use the same private key is considered a serious attack on the network and will result in slashing of your delegate and possibly loss of the self-staked delegate deposit.

Similarly, you can turn an active node into standby mode by using:

http://ip-to-one-node:9009/ha?activate=false

Finally, if you only want to check the active status of a node, use the URL below:

http://ip-to-one-node:9009/ha
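
The three URLs above can also be called from a script instead of a browser. The following is a minimal Python sketch, assuming the endpoint accepts a plain HTTP GET (which is what opening the URL in a browser sends). The helper names ha_url and call_ha are hypothetical, and because the response format is not described here, the body is returned unparsed.

```python
import urllib.request
from typing import Optional


def ha_url(host: str, activate: Optional[bool] = None, port: int = 9009) -> str:
    """Build the HA endpoint URL; activate=None only queries the status."""
    base = f"http://{host}:{port}/ha"
    if activate is None:
        return base
    return f"{base}?activate={'true' if activate else 'false'}"


def call_ha(host: str, activate: Optional[bool] = None, timeout: float = 5.0) -> str:
    """Send a GET request to the HA endpoint and return the raw response body."""
    with urllib.request.urlopen(ha_url(host, activate), timeout=timeout) as resp:
        return resp.read().decode("utf-8", errors="replace")


print(ha_url("ip-to-one-node", activate=True))
# http://ip-to-one-node:9009/ha?activate=true
```

With this sketch, call_ha("ip-to-one-node") would query the status, and call_ha("ip-to-one-node", activate=False) would request standby mode.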

Automatic Leader Election

If you run many nodes, and you want to get rid of the tedious manual operation, or just want to try out the fancy setup of a high availability cluster, you can take a look at this automatic leader election solution:

Please note that running two block-producer nodes using the same private key is considered a serious attack on the network and will result in slashing of your delegate and possibly loss of the self-staked delegate deposit.

Always be careful when using automatic leader election algorithms, and make sure you never have two block producers sharing the same private key in active mode.
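
One way to respect this rule in any automation is a conservative guard that refuses to activate a node unless every other node is confirmed to be in standby. The following is a minimal Python sketch of that idea, not the leader election solution referenced above. The function safe_to_activate and the is_active callback are hypothetical; how the status is actually obtained (for example by querying each node's HA URL) is left to the caller.

```python
from typing import Callable, Iterable, Optional


def safe_to_activate(
    candidate: str,
    all_nodes: Iterable[str],
    is_active: Callable[[str], Optional[bool]],
) -> bool:
    """Return True only if every node other than the candidate is confirmed standby.

    is_active(node) should return True or False when the status is known and
    None when it is unknown. An unknown status blocks activation, because a
    node that cannot be reached may still be producing blocks.
    """
    for node in all_nodes:
        if node == candidate:
            continue
        if is_active(node) is not False:
            return False
    return True


# Hypothetical statuses: the old main node is confirmed standby, one backup is too.
statuses = {"main": False, "backup-1": False, "backup-2": False}
if safe_to_activate("backup-1", statuses.keys(), statuses.get):
    print("backup-1 may be activated")
```

Treating an unknown status as unsafe trades availability for safety: the guard may delay a failover, but it never approves one while another node might still be active.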
