


Overview

The Advanced Recovery commercial module provides an easy-to-configure replication engine along with the ability to fail over automatically to a secondary server. This mechanism protects voice services when the primary server fails.

This module is applicable to PBX 15+ systems only.

If the backup server cannot be active at the same time as the primary, this module helps ensure that the Secondary server comes up quickly when the Primary server goes down, keeping downtime to a minimum.

The key features of this module are:

  1. Easy PBX GUI for configuration: in just a few steps, your current (primary) system configuration is ready to replicate to the secondary server.
  2. Optional automatic switchover: as soon as the primary service(s) die, services switch over automatically to the secondary system.
  3. Active monitoring of primary server services such as the network interface, Asterisk, MySQL, and the PBX stack, so that a switchover to the secondary server happens as soon as any of these services dies.
  4. Built-in notification mechanism that notifies the admin by both call and email during the failover.
  5. Control over the trunks, to decide which trunks are activated automatically after switchover.
  6. Configuration option to control primary server services after the switchover. This covers the scenario where the secondary becomes active because it lost its connection with the primary due to a minor issue such as a power or network fluctuation or quick maintenance. In those situations the primary comes back after some time, so it is better to stop the primary server services to avoid endpoint conflicts, where some endpoints register to the primary and some to the secondary. With this option enabled, the secondary stops the primary server services as soon as the primary comes up.
  7. Auto switchover custom hooks: an option to execute custom hooks or a third-party script during switchover. This is useful for triggering custom or third-party application logic during the switchover.
  8. Integrated support for the Endpoint Manager module to rebuild the templates of Sangoma's S and D series phones so that the phones receive the secondary (failover) server IP.


Each feature and how to configure it are described below.



Prerequisites

In order to successfully deploy the Advanced Recovery module the following requirements must be met:

  1. You have an existing PBX system that will be your Primary server.

  2. You have an identical PBX system that will be your Secondary server.
  3. The two servers can communicate on an IP level.
    1. Both systems are configured with their own IP addresses.
    2. The two systems can be on the same local network or in geographically separate locations.
    3. SSH and HTTP(s) ports should be open between both the servers.
  4. Both servers are running the Advanced Recovery module, and it is "licensed" on each system.
  5. Backup & Restore, API, and Filestore are dependent modules that must be installed on both systems.
  6. ICMP/Ping between Primary and Secondary is required.
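
The TCP part of the network prerequisites above can be spot-checked from either server before starting the wizard. The following is a minimal Python sketch, not part of the module. It assumes the default ports 22 (SSH), 80 (HTTP), and 443 (HTTPS), which may differ on your systems, and the function names are hypothetical. It does not test ICMP.

```python
import socket


def port_is_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


def check_peer(host: str, ports=(22, 80, 443)) -> dict:
    """Map each port (assumed defaults) to whether it accepted a connection."""
    return {port: port_is_open(host, port) for port in ports}
```

For example, running `check_peer("SecondaryServerIP")` on the Primary (with the real address substituted) returns a dictionary such as `{22: True, 80: False, 443: True}`, which shows at a glance which ports still need a firewall rule.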

Setup Configuration

The Advanced Recovery module can be configured with the following simple steps.

  1. Establish SSH connection between both the servers. 
    1. SSH to secondary server from primary without needing password.
    2. SSH to primary server from secondary without needing password.

  2. Configure Advanced Recovery module using "Quick configuration" wizard.

Establish SSH connection between both the servers

First, establish an SSH connection between the two servers.

The instructions below copy the primary server's SSH key to the secondary, so that the primary can SSH to the secondary server without a password.

  1. Log in to your Primary server with an SSH client such as PuTTY or SecureCRT.

  2. At the primary server Linux CLI prompt type: sudo -u asterisk ssh-copy-id -i /home/asterisk/.ssh/id_rsa.pub root@SecondaryServerIP and enter the password when prompted.

    Replace SecondaryServerIP with the IP address of your Secondary PBX. Use an IP address, not a hostname that may be common to both primary and warm spare. If FQDNs are desired, create three records: a common name, a name specific to the primary, and a name specific to the warm spare (for example mypbx.company.com, mypbx1.company.com, and mypbx2.company.com).

    If the Firewall is configured, make sure a rule allows the two servers to talk to each other.

  3. If the above command completes without error, you are ready to test. At the prompt type: ssh -i /home/asterisk/.ssh/id_rsa root@SecondaryServerIP

    If all went well, you should now be logged in to the Secondary server.

Follow the same steps on the Secondary server to copy its SSH key to the Primary, so that the Secondary can SSH to the Primary without a password.
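
To confirm that the key-based login really works without a prompt, the manual test can be scripted. This is a minimal Python sketch under stated assumptions: the remote user defaults to root, the key path is the one used in the steps above, and the helper names are hypothetical. The ssh options BatchMode and ConnectTimeout are standard OpenSSH client options; BatchMode makes ssh fail rather than ask for a password.

```python
import subprocess


def build_ssh_check(host: str, user: str = "root",
                    key: str = "/home/asterisk/.ssh/id_rsa") -> list:
    """Build an ssh command that fails instead of prompting for a password."""
    return [
        "sudo", "-u", "asterisk",    # run as the asterisk user, as in the steps above
        "ssh", "-i", key,
        "-o", "BatchMode=yes",       # never fall back to a password prompt
        "-o", "ConnectTimeout=5",    # give up quickly if the peer is unreachable
        f"{user}@{host}",
        "true",                      # harmless remote command
    ]


def passwordless_ssh_works(host: str, **kwargs) -> bool:
    """Return True when the key-based login succeeds without interaction."""
    result = subprocess.run(build_ssh_check(host, **kwargs),
                            capture_output=True, text=True)
    return result.returncode == 0
```

Run `passwordless_ssh_works(...)` once on each server, pointing at the other one, since the wizard needs the connection in both directions.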

Install Advanced Recovery module

Download and install the "Advanced Recovery" module by using "Check Online" and then following the download and install steps described in the "Checking for Available Upgrades" section of the Module Admin User Guide wiki.

Advanced Recovery Module Configuration


How to Open the Advanced Recovery Module Settings 

Within the PBX GUI, navigate to Admin > Advanced Recovery


How to Configure the Advanced Recovery Module  


The main landing page of the Advanced Recovery module has options to view system status, perform configuration changes, and adjust global settings (like SSH keys).

The 'Quick Configuration' option is displayed only while the system is not yet configured.

Once the system is configured, this option is no longer visible, and any further configuration changes are made through the "Configuration → Primary (or Secondary) Server" option.



  

Quick Configuration Wizard 


The Quick Configuration wizard provides an easy GUI to configure the Advanced Recovery module.
The wizard configures both the primary and the secondary server by itself, so no further configuration is needed afterwards.
Clicking the "Quick configuration" button opens the wizard as shown below:


   

Step-1 Server Configuration

Here, specify the "Secondary" server IP.
After entering the "Secondary Server" instance, click "Next".

If there is any issue with the SSH connection to the secondary server, the wizard raises an alert.


  

  

If the SSH connection to the secondary server is good, the wizard checks whether a properly licensed "Advanced Recovery" module is installed on the secondary server.
If the AR module is not installed on the secondary system, it shows an error like the one below.

   


If the module is installed but not "licensed" on the secondary system, it shows an error like the one below.

   


If the secondary has a properly licensed, active Advanced Recovery module, the wizard proceeds to Step-2 > Sync.

Step-2 Sync

This step defines the sync frequency.
Syncing can take from minutes to hours, depending on the system size (capacity) and on any additional files/directories added to the module configuration to be synced.

The sync process can be CPU intensive, depending on your system capacity, so it is recommended to run it during off hours. Choose the sync frequency with this in mind.



Step-3 Settings

This section contains the Advanced Recovery settings required to replicate the configuration to the secondary server.


   


Each configuration option in this step is described below.

Auto Switch services 

Automatically switch the services to the secondary server when the threshold time is met.

Disable Remote Trunks 
Should the trunks be disabled on the secondary server after replicating/restoring the trunk configuration? Set this to NO only when you want trunks to register from both the primary and secondary servers at the same time. Generally trunks should be active on only one server, so this option should be set to YES. Default is YES.

Exclude NAT Settings 
Should NAT settings from the Primary Server be restored to the Secondary Server?

Exclude Bind Address
Should Bind Address settings on the Primary Server be restored to the Secondary Server ?


Exclude DNS 
Should DNS settings on the Primary Server be restored to the Secondary Server ?


Apply Config 
Should we run "Apply Configs" on the Secondary Server after a restore is completed?


Once the above configuration is done, move on to the next step to configure "Notifications".

Step-4 Notification

This section allows you to configure notifications.

Advanced Recovery module gives the option to receive notifications either via calling to an admin extension or via Email.


  


By default, Call Notification is disabled. If it is enabled, further options are shown as follows:





The parameters to configure in the Notification section are:

  • Notification Extension: the extension to call during a failover event. On a system failure event, the active system initiates a call to the configured extension and plays the configured announcement. The purpose of this call notification is to inform the admin about the system failure.
  • Recording when primary fails: the recording to play when the Primary server fails. The list offers the recordings configured in the System Recording module.
  • Recording when standby fails: the recording to play when the standby/Warm Spare server fails. The list offers the recordings configured in the System Recording module.
  • Notification Email: the email address to which notifications are sent.


Server Failure Notification Frequency

The Advanced Recovery module generates a notification as soon as a failure event is detected.


Once the configuration is done, press "Configure" to finish configuring the Advanced Recovery module.

This completes the "Quick configuration" part of the Advanced Recovery module. If any further modifications of the configuration are needed, refer to the Advanced Recovery Expert Configuration wiki.

As soon as the "Quick configuration" process is done, the "Advanced Recovery Service" daemon must be started, as described in the section below.

Advanced Recovery service daemon


This service daemon is mainly responsible for monitoring the health of the primary system. In the event of a failure, it executes the necessary steps to switch over to the secondary server.
After the "Quick configuration" wizard completes, the status of the Primary and Secondary looks something like the screens below.

The Advanced Recovery service needs to be started only on the Primary server. The service on the Secondary server starts automatically.


Primary Server


Secondary Server

At this point the dashboard shows that configuration is done but the service has not yet started. The next step is to "Start" the service from the Primary.


Advanced Recovery Dashboard

Dashboard provides the information about service status and last sync time. 

We can also use "Sync now" option to forcefully sync the configuration to secondary system.



Advanced Recovery Sync Now 

The "Sync now" option manually syncs the configuration to the secondary server.
This is useful to confirm that syncing works as soon as the initial configuration is done, and to learn how much time a sync takes for the PBX system.

After clicking "Sync now", a confirmation dialog pops up. Once confirmed, the status of the process is displayed as shown below:


If the confirmation dialog box does not pop up when clicking the 'Sync Now' button, check the browser for a popup/ad blocker.

Syncing might take from minutes to hours, depending on system capacity. Refresh the page periodically (every few minutes) to get the latest status of the process.



The dashboard displays "Time since last sync" to show when the last sync from the primary to the secondary server happened.

As soon as the syncing process finishes, it displays "Time taken to finish last sync" in HH:MM:SS format, which gives a rough estimate of how long the system takes to sync the configuration.

If required, change the "Syncing scheduling" frequency using the "Advanced configuration" option.
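
The "Time taken to finish last sync" value can be used to sanity-check a chosen sync frequency. The sketch below is a hedged illustration only: the safety margin of 2 (interval at least twice the last sync duration) is an assumed rule of thumb, not a documented recommendation, and the function names are hypothetical.

```python
def hms_to_seconds(value: str) -> int:
    """Convert an HH:MM:SS string, as shown on the dashboard, to seconds."""
    hours, minutes, seconds = (int(part) for part in value.strip().split(":"))
    return hours * 3600 + minutes * 60 + seconds


def interval_is_safe(last_sync: str, interval_hours: float,
                     margin: float = 2.0) -> bool:
    """True if the interval is at least `margin` times the last sync duration.

    `margin` is an assumed safety factor; tune it to your own system.
    """
    return interval_hours * 3600 >= margin * hms_to_seconds(last_sync)
```

With a last sync of 01:30:15, a 6-hour schedule passes this check, while a 2-hour schedule does not, because a second sync could start shortly after the first one ends.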


Modules and Call Recording syncing 

On the primary server settings page you can choose which modules are included in the sync process. By default all modules are selected and included; unselect the modules you want to exclude.

You can also add custom directories to the sync process. The files/folders selected in these fields will be present on the Secondary server, in the same locations, once the sync completes.

By default the Voicemails folder is added as a directory item, and this folder is synced incrementally.

  

Switchover Configuration

The Advanced Recovery module provides configuration options that determine the actions taken during switchover.
All switchover-related configuration is part of the Advanced Configuration.

To reach the Advanced Configuration, go to "Advanced Recovery Module → Configuration → Primary Server" as shown in the screenshot below.




Trunk Selection Configuration option

As soon as "Auto switch services" is enabled, the list of trunks currently configured in the system is shown.
For every configured trunk, select the desired state of the trunk after a switchover:


Bring down Primary server after switchover configuration option

Execute 'fwconsole stop' on the primary server after a failover.

This option is useful when a partial outage, such as a network or power fluctuation, causes the Primary server to lose communication with the Secondary. In that situation the Secondary server becomes active, but after a period of time the Primary also comes back up, so some phones might try to register to the Primary server while others try the Secondary server.
To avoid this situation, the admin can choose whether to bring down the primary server after switchover.

If this option is set to YES, the Advanced Recovery module keeps checking the configured Primary server to see if it comes up, and brings down all the services on the primary when that happens.
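
The behaviour described above amounts to a poll-then-stop loop. The following Python sketch models that loop for illustration only; it is not the module's code. The probe and stop actions are passed in as callables, and the polling interval and `max_checks` parameter are assumptions made so the sketch can be tried safely.

```python
import time
from typing import Callable


def wait_and_stop_primary(is_up: Callable[[], bool],
                          stop: Callable[[], None],
                          poll_seconds: float = 30.0,
                          max_checks: int = 0) -> bool:
    """Poll until the primary answers, then stop its services once.

    max_checks=0 means poll forever. Returns True if stop() was called.
    """
    checks = 0
    while max_checks == 0 or checks < max_checks:
        checks += 1
        if is_up():
            stop()  # e.g. run 'fwconsole stop' on the primary over SSH
            return True
        time.sleep(poll_seconds)
    return False
```

In a real deployment `is_up` would be a reachability probe against the Primary and `stop` would run 'fwconsole stop' there; keeping both as parameters makes the loop easy to dry-run with fakes.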

Post Switchover Hook

This is for advanced users who would like to perform some special steps after switchover. 
Please specify the custom script path to execute after switchover.


APPLICATION NOTE

Make sure the script has execute permissions for the Asterisk user.
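
As an illustration of what such a script could look like, here is a hypothetical Python hook. The only behaviour taken from this page is that the hook is executed with a "START" argument during switchover; the log file location, the function names, and the handling of any other argument are assumptions.

```python
#!/usr/bin/env python3
"""Hypothetical post-switchover hook; the log path is an assumption."""
import sys
from datetime import datetime, timezone

LOG_FILE = "/tmp/advanced_recovery_hook.log"  # assumed location, change as needed


def handle(args: list) -> str:
    """Return the log line for the argument passed by the caller."""
    event = args[0] if args else "UNKNOWN"
    stamp = datetime.now(timezone.utc).isoformat(timespec="seconds")
    if event == "START":
        # Custom or third-party logic for the switchover would go here.
        return f"{stamp} switchover hook started"
    return f"{stamp} hook called with unexpected argument: {event}"


def main(argv: list) -> int:
    """Append one log line describing this invocation."""
    with open(LOG_FILE, "a", encoding="utf-8") as log:
        log.write(handle(argv[1:]) + "\n")
    return 0


# Do nothing when the script is run without an argument.
if __name__ == "__main__" and len(sys.argv) > 1:
    main(sys.argv)
```

Save the script at the path entered in the Post Switchover Hook field and make it executable for the Asterisk user, for example with chmod +x.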



Advanced Configuration 

Once the Quick configuration wizard is finished, any further configuration or change must be made on the 'Advanced Configuration' page. This includes changes such as changing GraphQL API tokens or modifying the Primary/Secondary server IP address, as described in Advanced Recovery Expert Configuration.

Switchover 

The Advanced Recovery module decides that the Primary is down when it detects at least one of the following conditions:

  1. Network connectivity is down on the Primary server: the Secondary server has lost communication with the Primary server.
  2. Asterisk is not running on the Primary server.
  3. The FreePBX stack is not running on the Primary server.
  4. The database is not running on the Primary server.


Switchover to the secondary server happens as soon as any of the failure conditions above is detected and the threshold time has been reached.
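
The interplay between the four health conditions and the threshold time can be modelled in a few lines. The Python sketch below is an assumption about the general logic (a failure must persist for the whole threshold, and a recovery resets the timer); the module's actual implementation is not documented here, and all class and field names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Health:
    network: bool    # Secondary can reach the Primary
    asterisk: bool   # Asterisk is running on the Primary
    pbx_stack: bool  # FreePBX stack is running on the Primary
    database: bool   # Database is running on the Primary

    def failed(self) -> bool:
        """A single failing check is enough to count as a failure."""
        return not (self.network and self.asterisk
                    and self.pbx_stack and self.database)


class FailureTracker:
    """Report a switchover only after failures persist for the threshold."""

    def __init__(self, threshold_seconds: float):
        self.threshold = threshold_seconds
        self.failing_since = None

    def update(self, health: Health, now: float) -> bool:
        """Feed one health sample taken at time `now` (in seconds)."""
        if not health.failed():
            self.failing_since = None  # recovery resets the timer
            return False
        if self.failing_since is None:
            self.failing_since = now   # first failed check
        return now - self.failing_since >= self.threshold
```

With a 60-second threshold, a failure first seen at t=10 triggers a switchover at t=70, unless a healthy sample arrives in between.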

The Advanced Recovery module performs the following actions during switchover (in order):

  1. Switchover-related actions as configured in the Switchover Configuration section:
    1. Enable the trunks on the secondary as configured in the Trunk Selection Configuration option.
    2. Execute post-switchover hooks to run a custom third-party script with a "START" argument.
  2. Notify the admin with a call to the admin extension, if Call Notification is enabled.
  3. Notify the admin via email.
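
The ordering above can be written down as a small planning function, which is handy when documenting or reviewing a particular site's configuration. This Python sketch only builds a list of human-readable steps; the parameter names and step wording are hypothetical, and nothing is executed.

```python
def switchover_plan(trunks: dict, hook_path: str = "",
                    call_enabled: bool = False, email: str = "") -> list:
    """Return the switchover actions as ordered, human-readable steps.

    `trunks` maps a trunk name to True when it should be enabled
    on the secondary after switchover.
    """
    plan = [f"enable trunk {name}" for name, enable in trunks.items() if enable]
    if hook_path:
        plan.append(f"run {hook_path} START")  # hook receives the START argument
    if call_enabled:
        plan.append("call notification extension")
    if email:
        plan.append(f"email {email}")
    return plan
```

For a site with two trunks where only one is selected, a hook, and both notification types, the plan lists the enabled trunk first, then the hook, then the call, then the email.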



Failover recommendations

The Advanced Recovery module will be beneficial during outages by automatically switching services over to a Secondary server when a failure is detected on Primary server. However, it is critical to understand and be aware that there are other network elements such as IP/SIP Phones, SIP Trunking, routers, etc. that need to be configured properly to ensure they start working smoothly after services are switched over. 


SIP Phones Recommendation

Regenerate existing Sangoma's phone configuration 

The Advanced Recovery module has an option to regenerate the configuration of already connected/configured Sangoma S and D series phones via Endpoint Manager.

"Advanced Recovery → Endpoint → Regenerate EPM config for S and D series phones" 

This option adds the 'Secondary Server' IP address parameter to the selected template as 'Backup SIP Server'. The 'Update Phones' option may also be selected to force all the phones under the template to pull a new configuration from the server.



Manually editing templates for Sangoma's S series phones

The 'Regenerate EPM Config for S and D series phones' option mentioned above takes care of the backup server configuration in the Sangoma templates. Phone configurations for any other brand have to be updated manually by editing the related template.

Sangoma S and D series phones support the configuration of a "Failover" IP along with the Primary IP. 
The Endpoint Manager module, which is free to use for Sangoma's S and D series phones, can be used to help configure this setup. 

Refer to Connecting Sangoma Phone to FreePBX or PBXact Indepth for a detailed guide to using Endpoint Manager with Sangoma's S series phones.

To achieve failover when the primary server fails, enable the Backup destination field and enter the Secondary server information in the template below.

Application Note

If the Advanced Recovery option 'Regenerate EPM Config for S and D series phones' was executed, the associated template(s) will be pre-populated with the Secondary server IP address.






Manually editing templates for Sangoma's D series phones

The backup destination address is added in the D/P series phone template, in the Redundancy tab (EPM → Sangoma → D & P series phones).
Refer to the "Template Creation and Editing (Example with Sangoma Brand)" section of the EPM Admin User Guide for an example of how to edit templates via EPM.




SIP Trunk recommendation

It is recommended to ensure that the SIP Trunk provider allows registration requests from both the Primary and Secondary servers' IPs.
When a failure makes the secondary server active, the SIP Trunk provider must accept the registration request from the secondary server to bring up the SIP traffic. This does not apply if both the Primary and Secondary servers are behind the same public IP, since both servers then register from the same source IP.

IT admin recommendation

IT staff or PBX administrators are advised to take care of the following responsibilities:

  1. Make any networking changes required to bring up the secondary server (such as router port forwarding in a NAT environment) to make sure that:
    1. Secondary server registration messages reach the SIP Trunking provider.
    2. Messages from the SIP Trunk reach the secondary server as they would the primary.
    3. Phones are able to register with the secondary server. 

  2. Make sure the Primary and Secondary server IPs do not change. If they do change, update the GraphQL configuration (server URL only) accordingly, because the two servers talk to each other using GraphQL API URIs.
    IP changes might result in a false "server down" event.

  3. Make sure the Primary and Secondary servers are accessible to each other. The Firewall module needs to have both servers' IPs whitelisted accordingly.

  4. Make sure SSH connectivity works between the two servers.

  5. Keep in mind that with the latest Recording Report module (v15.0.4.28+), call recording files are not part of the "full system backup". Make sure the call recordings directory is included in the Primary Server - Advanced Recovery settings (by default the recordings are located in /var/spool/asterisk/monitor/).
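
For the last item, a quick way to check whether a given list of synced directories covers the call recordings path is shown below. This is a hypothetical helper written for illustration; the list of directories would have to be copied by hand from the Primary Server settings page, and the voicemail path in the usage note is only an example.

```python
from pathlib import PurePosixPath


def is_covered(target: str, synced_dirs: list) -> bool:
    """True if `target` equals, or lies inside, one of the synced directories."""
    target_path = PurePosixPath(target)
    for directory in synced_dirs:
        base = PurePosixPath(directory)
        if target_path == base or base in target_path.parents:
            return True
    return False
```

For example, `is_covered("/var/spool/asterisk/monitor/", ["/var/spool/asterisk/voicemail"])` is False, which signals that the recordings directory still needs to be added to the sync settings.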

Switchback to Primary server 

The Advanced Recovery module is mainly designed for easy failover to the Secondary system when the Primary server fails.
Once the primary server is back up and running, it is recommended to switch services back to the primary, so that any subsequent disruption will not affect the phone system's availability.


To switch back, follow these steps:

  1. Log in to the administrative GUI of the Secondary server, which is currently active.
    1. Stop the Advanced Recovery daemon from the FreePBX/PBXact GUI → Advanced Recovery option to avoid getting notifications.
  2. Repair the Primary server (if possible) or bring up a new Primary server with a fresh installation of FreePBX/PBXact.
  3. Once the Primary server is ready, follow the steps in "Sync back to Primary" to sync the data from the secondary to the primary.
  4. Once syncing is over, switch back to the primary server so that the primary becomes the active node and the secondary becomes the standby server.

Sync back to Primary 

This option is useful for bringing up the Primary server, which can be either the same server or a new server/installation.

While the secondary server is running as active, the dashboard on the secondary server shows that the Primary server is down and offers an option to Sync back to Primary.



Application Note

Make sure that SSH connectivity to the primary server is configured properly. Refer to the "Establish SSH connection between both the servers" section.


The "Sync back to Primary" option opens the wizard below and asks for the Primary server IP. On recent versions the Primary server's IP address is populated automatically from the existing configuration.


After entering the Primary server IP, you have two options: Sync or Skip Sync.


1. Sync: this option syncs the Secondary server data to the Primary server (overwriting any existing configuration on the primary).

2. Skip Sync: this option skips the sync and jumps to the Switch back page.




As part of syncing, data from the secondary server is pushed to the primary server.


Once syncing to the Primary server has finished, the third step gives you the option to "Switch back". This reverts the status of the trunks on the secondary server (disabling them) and turns the trunks on the Primary back on. It also updates the status of the Advanced Recovery module, so that the Primary server is once again the active server and the health check runs from the Primary → Secondary server.



High Level use case scenario using Advanced Recovery module


Advanced Recovery module will help to maintain the below fail over scenario where Secondary server will take over the production when there is a critical failure in Primary server, as illustrated below:



Frequently Asked Questions 


Q: Do we need a floating IP now, like in the old HA setup?
 A: No. A floating IP is not a requirement with this module. Each server has its own IP address. SSH communication must be open between the two servers.

Q: Do both servers have identical configurations, except that on the standby server the trunks are disabled to avoid registrations coming from two machines simultaneously?
 A: Yes. This module provides granular control over the trunks, both during syncing and after switchover. The "Disable Remote Trunks" option controls the trunk status during normal primary/secondary operation, and the "Switchover Trunk selection" option determines the trunk status (enabled or disabled) after switchover.

Q: Do non-Sangoma phones need to be configured to register to the backup server address if registration to the primary is unsuccessful?
 A: Yes. The fallback destination or SIP server address must be set manually on the SIP phones, so that the phones can register to the secondary system while the primary system is down.

Q: Are services like Asterisk, FreePBX, etc. all running on both machines, whether in standby or active?
 A: Yes. During a normal state, all services are fully running on both servers: Active and StandBy.

Q: How does the monitoring happen?
 A: The Primary server and Secondary server monitor each other and send notifications as soon as the peer node fails. If the Primary goes down, a switchover happens. If the Secondary goes down, only a notification is sent to keep the administrator aware of the StandBy server failure.

Q: What does the “Bring down Primary server after switchover” option mean?
 A: After switchover, if this option is enabled, the Secondary keeps monitoring the primary server IP. If the Primary server comes back up, the Secondary performs a 'fwconsole stop' on the primary, which stops all running services such as Asterisk and any other FreePBX processes. This is required when only one node should be active, to avoid:

  • Split registration scenarios where some phones register to the primary and some to the secondary.
  • SIP trunk registration requests being sent from both servers using the same account/credentials/auths.

 This matters because network or power fluctuations can interrupt communication between the servers and trigger a switchover even though the Primary later comes back up.

Q: I have two FreePBX servers, each with two NICs. One NIC provides regular access to the network to which SIP will bind, and the other connects directly to the other FreePBX server. How can I configure the Advanced Recovery module to use the dedicated NIC or LAN interface for monitoring and syncing between the servers?
 A: As long as the two servers can communicate with each other at the IP level, the Advanced Recovery module will work fine. Each NIC has its own IP, so you just have to make sure the "direct" link between the servers is configured so that communication over the dedicated NIC interface works as expected.

For example, suppose Server A has a NIC with IP x.y.z.w, Server B has a NIC with IP a.b.c.d, and there is a direct link between the two servers. The following preconditions must be met during the configuration:

  • Routing or communication between x.y.z.w and a.b.c.d at the IP level is working. 
  • SSH and HTTP(s) traffic is working over those IPs. 
  • These IPs are whitelisted on both servers so that the firewall does not block access between them.
  • SSH keys are set up between the two servers so they can SSH to each other using these IPs.
  • These NIC IPs are used in the Advanced Recovery module configuration: x.y.z.w as the Primary server IP and a.b.c.d as the Secondary server IP.
  • When using Endpoint Manager to rebuild the failover IP configuration, make sure you choose the correct SIP server IP. If the template is for local phones, x.y.z.w and a.b.c.d are the primary and backup IPs respectively. If the template is for remote phones/endpoints, enter the public IP addresses of the primary and secondary servers and not the private IPs (x.y.z.w and a.b.c.d). The same applies to manual configuration for non-Sangoma brands.






