Due to various kernel changes, this upgrade process may result in an unexpected restart of Asterisk. There will also be a short outage as you move the services between nodes. Please be prepared for this, and schedule an outage window as necessary.
Requirements
FreePBX 13
Upgrading is only possible when you are running FreePBX 13. Before attempting to upgrade to Distro 6.6, ensure you are running the latest FreePBX 13 version and its associated modules.
Outage Window
Whilst there is no danger of data loss, both nodes will require a reboot. This means that there will need to be an outage window where you can swap nodes.
Overview
Before you begin, you must ensure your cluster is in maintenance mode! When your cluster is in maintenance mode, running the command 'pcs status' will result in all the resources having the suffix "(unmanaged)". More information is on the page Setting the cluster to maintenance mode
The cluster is upgraded in an A-then-B fashion. This means that the cluster infrastructure on the active node is upgraded first, and then the standby node.
Please read the entire page!
There is a lot of information on this page, and you should take care to follow the script below. There are some potential errors that you may encounter as you proceed, so please take care to follow the script precisely.
An overview of the tasks is as follows:
- Run a cluster health check
- Disable Fencing
- Put the node that is not processing calls explicitly into standby mode
- Put the entire cluster into maintenance mode
- Upgrade the cluster software on the active node
- Start pacemaker service on active node
- Take the cluster out of maintenance mode
- Upgrade the cluster software on the standby node
- Reboot the standby node
- Ensure replication is valid
- Verify cluster integrity
- Reboot the active node to validate failover
Estimated Timeframes
An upgrade of both nodes should not take more than an hour.
Outage Windows
There should be two planned outage windows.
- The first is a potential outage when returning the cluster from maintenance mode. This failure has not been reproduced, but it is theoretically possible if Asterisk returns an unusual error. Ensuring that Asterisk is up and running, and processing calls, before returning the cluster from maintenance mode removes this possibility.
- The primary planned outage window should be approximately 5 minutes, or however long it takes Asterisk to start up. This is when all the services are failed over from the active node to the standby node, before rebooting the previously active node.
Before you begin
Before making any changes, run a cluster health check. If there are any errors, they will be fixed automatically, or you will be given instructions on how to fix them.
Upgrade Process
First, disable fencing (if it is enabled) using:
[root@freepbx-a ~]# pcs property set stonith-enabled=false
And verify that fencing is disabled with:
[root@freepbx-a ~]# pcs property
Cluster Properties:
 cluster-infrastructure: cman
 cluster-recheck-interval: 5m
 dc-version: 1.1.11-97629de
 default-action-timeout: 30
 last-lrm-refresh: 1470610819
 maintenance-mode: false
 no-quorum-policy: ignore
 stonith-enabled: false <--- here
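This check can also be scripted. The following is a minimal sketch, assuming the 'pcs property' output keeps the "name: value" layout shown above. The helper name get_cluster_property is hypothetical and is not part of pcs.

```shell
# Hypothetical helper (not part of pcs): read `pcs property` output on stdin
# and print the value of the property named in $1, if it is listed.
get_cluster_property() {
    awk -v key="$1" '{
        sub(/^[ \t]+/, "")                      # drop leading indentation
        if (index($0, key ": ") == 1) {         # line starts with "key: "
            print substr($0, length(key) + 3)   # text after "key: "
            exit
        }
    }'
}
```

Running 'pcs property | get_cluster_property stonith-enabled' should then print the current fencing setting. The same helper can be pointed at any other property in the list, such as maintenance-mode. If a property is not listed in the output, nothing is printed.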
Fencing can be re-enabled at any time simply by browsing to the Fencing menu item in the High Availability module, so take care not to re-enable it accidentally until the upgrade is finished.
When you have organised your outage window, ensure that at least one node is in standby (We'll be assuming your currently active machine is FreePBX-A):
[root@freepbx-a ~]# pcs cluster standby freepbx-b
[root@freepbx-a ~]#
If any services were running on freepbx-b, they will be moved across to freepbx-a (or vice versa, if you are running services on -b, and are setting -a to standby). Verify that this is complete by running 'pcs status'. All services should say 'Started freepbx-a'. After you have verified this, you must put the cluster into maintenance mode.
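As a rough aid for the verification step, the sketch below scans 'pcs status' output for resources reported as Started on a node other than the one you expect. The function name started_elsewhere is hypothetical, and the sketch assumes resource lines of the form "name (type): Started node". It only looks at "Started" lines, so the Masters/Slaves lines of the DRBD sets still need to be checked by eye.

```shell
# Hypothetical helper: read `pcs status` output on stdin and print every
# resource line that reports "Started" on a node other than $1.
# Exit status is 0 when no such line exists, 1 otherwise.
started_elsewhere() {
    awk -v node="$1" '
        BEGIN { bad = 0 }
        {
            for (i = 1; i < NF; i++)
                if ($i == "Started" && $(i + 1) != node) { print; bad = 1; break }
        }
        END { exit bad }'
}
```

For example, 'pcs status | started_elsewhere freepbx-a' prints nothing and returns success when every started resource is on freepbx-a.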
[root@freepbx-a ~]# pcs property set maintenance-mode=true
[root@freepbx-a ~]#
After you issue this command, the cluster immediately stops managing and monitoring resources. Nothing will be restarted or moved if it fails. (Note that this setting is removed again after upgrading the active node. Please don't forget to do that!)
It is imperative that you put the cluster into maintenance mode! Failure to do so will lead to failures that are extremely difficult to resolve, and may cause an extended outage. If you are unsure of how to verify this, please read FreePBX HA-Setting the cluster to maintenance mode.
You start by upgrading the distro on the node that is processing calls. This does NOT automatically restart the Asterisk process; it only restarts the Cluster Management software. (If your cluster is not in maintenance mode, the secondary node will attempt to take over the cluster services! Read the previous paragraph about maintenance mode!)
This will normally take about 20 minutes. This process will not cause an outage.
This may hang!
It's possible that the 'Cleanup' part of the upgrade may hang. This is due to a (fixed) bug in the Cluster services. If your system does not seem to be proceeding in the 'Cleanup' phase, please read the Stalled Upgrade information.
[root@freepbx-a ~]# curl https://upgrades.freepbxdistro.org/stable/10.13.66/upgrade-10.13.66-1.sh | bash
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 16551  100 16551    0     0   8026      0  0:00:02  0:00:02 --:--:-- 34991
Check to make sure this is a FreePBX Distro system before executing
This appears to be a FreePBX Distro system as it has a Distro Version of 6.12.65-30
Your FreePBX Distro System is being upgraded to 10.13.66-1. Please standby...
STAGE 1 STARTING - GUI Modules Upgrade All FreePBX GUI Modules
... many lines of text skipped ...
Your upgrade may hang! It is possible that your upgrade may hang on or around this point:
Cleanup : libtonezone-devel-2.10.0.1-1.shmz65.1.17.x86_64 262/478
Cleanup : bfa-firmware-3.2.21.1-2.el6.noarch 263/478
Cleanup : asterisk-version-switch-2.0.0.0-10.shmz65.1.17.noarch 264/478
Cleanup : dracut-kernel-004-336.el6_5.2.noarch 265/478
Cleanup : dracut-004-336.el6_5.2.noarch 266/478
Cleanup : iSymphonyServerV3-fpbx-3.2.1.11-1.noarch 267/478
This is due to a bug that is fixed in the latest version of Pacemaker. Please read the Stalled Upgrade page to unblock it.
When it is finished, you will see something like this:
STAGE 4 COMPLETED - Clean Up - Moving to Stage 5
STAGE 5 STARTING - Final Verifications
STAGE 5 COMPLETED - Final Verifications - Moving to Stage 6
Wed Dec 9 15:13:58 AEST 2015
UPGRADE 100% COMPLETED
Error: unable to get crm_config, is pacemaker running?
Unlocked.
REBOOT YOUR BOX NOW
If you would like to change your Asterisk release version from 1.8, 10, 11 or 12 you can do so by typing asterisk-version-switch from the linux CLI at anytime
[root@freepbx-a ~]#
[root@freepbx-a ~]#
DO NOT RESTART YOUR MACHINE! At this point, the cluster services are not running on this machine, and need to be restarted. You can verify this, and then start the services again, with the following commands:
[root@freepbx-a ~]# pcs status
Error: cluster is not currently running on this node
[root@freepbx-a ~]# service pacemaker start
Starting cluster:
   Checking if cluster has been disabled at boot... [ OK ]
   Checking Network Manager... [ OK ]
   Global setup... [ OK ]
   Loading kernel modules... [ OK ]
   Mounting configfs... [ OK ]
   Starting cman... [ OK ]
   Waiting for quorum... [ OK ]
   Starting fenced... [ OK ]
   Starting dlm_controld... [ OK ]
   Tuning DLM kernel config... [ OK ]
   Starting gfs_controld... [ OK ]
   Unfencing self... [ OK ]
   Joining fence domain... [ OK ]
Starting Pacemaker Cluster Manager [ OK ]
[root@freepbx-a ~]# pcs status
Cluster name: freepbx-ha
Last updated: Wed Dec 9 15:14:35 2015
Last change: Wed Dec 9 14:44:27 2015 via cibadmin on freepbx-a
Stack: cman
Current DC: freepbx-b - partition with quorum
Version: 1.1.10-14.el6-368c726
2 Nodes configured
20 Resources configured
( .. many configuration entries skipped .. )
You will notice that everything should look exactly the same as it did prior to the upgrade. At this point you can now take the cluster out of maintenance mode.
WARNING! It is at this point that the cluster may determine that Asterisk needs to be restarted.
In some cases the upgrade does not create the /usr/sbin/fwconsole symlink, so before continuing, check whether it exists and recreate it if it is missing:
# which fwconsole
/usr/bin/which: no fwconsole in (/usr/lib64/qt-3.3/bin:/usr/local/sbin:/usr/local/bin:/sbin:/bin:/usr/sbin:/usr/bin:/root/bin)
# amportal chown
# which fwconsole
/usr/sbin/fwconsole
Running "amportal chown" ensures that the symlink is created in /usr/sbin/.
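The check-then-fix pattern above can be wrapped in a small function. This is only a sketch: ensure_command is a hypothetical name, and it simply runs whatever repair command you pass it when the named program is not found on the PATH.

```shell
# Hypothetical helper: make sure the command named in $1 can be found on PATH.
# If it cannot, run the remaining arguments as a repair command and look again.
ensure_command() {
    name=$1
    shift
    command -v "$name" >/dev/null 2>&1 && return 0   # already present
    "$@" || return 1                                 # repair command failed
    command -v "$name" >/dev/null 2>&1               # success only if now found
}
```

On the upgraded node this would be called as 'ensure_command fwconsole amportal chown'. A non-zero return means fwconsole is still missing and needs investigating before you continue.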
You can take the cluster out of maintenance mode now:
[root@freepbx-a ~]# pcs property set maintenance-mode=false
[root@freepbx-a ~]#
Under NO CIRCUMSTANCES should you take the other node out of standby at this point!
At this time, you can run 'pcs status' and all the services will appear to be running and valid. However, due to version changes, the standby node cannot take control of the cluster services, and any attempt to do so will cause a catastrophic failure.
Finally, because the upgrade script thinks it failed, you should manually update the version number on this machine:
[root@freepbx-a ~]# echo 10.13.66-1 > /etc/schmooze/pbx-version
[root@freepbx-a ~]#
This ensures that the upgrade system knows which track you are on.
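If you want to confirm that the version file was written as intended, a comparison along these lines is enough. The function name pbx_version_is is hypothetical. The default path is the one used in the command above, and the optional second argument exists only so the check can be tried against a scratch file.

```shell
# Hypothetical helper: succeed only if the version file contains exactly $1.
# $2 optionally overrides the path; the default is the file written above.
pbx_version_is() {
    [ "$(cat "${2:-/etc/schmooze/pbx-version}" 2>/dev/null)" = "$1" ]
}
```

For example, 'pbx_version_is 10.13.66-1 && echo "version recorded"' prints the message only when the file holds the expected version.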
Switch to the other node
You must now proceed to upgrading the standby node. This upgrade will take slightly longer than the upgrade on the active node. An average system should complete the upgrade in around 25 minutes. Note that a number of errors and warnings about Asterisk will be shown as part of the upgrade process. These are expected and are of no concern.
This may hang!
It's possible that the 'Cleanup' part of the upgrade may hang. This is due to a (fixed) bug in the Cluster services. If your system does not seem to be proceeding in the 'Cleanup' phase, please read the Stalled Upgrade information.
[root@freepbx-b ~]# curl https://upgrades.freepbxdistro.org/stable/10.13.66/upgrade-10.13.66-1.sh | bash
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 16551  100 16551    0     0   6759      0  0:00:02  0:00:02 --:--:-- 34267
Unable to connect to remote asterisk (does /var/run/asterisk/asterisk.ctl exist?)
Check to make sure this is a FreePBX Distro system before executing
This appears to be a FreePBX Distro system as it has a Distro Version of 6.12.65-30
Your FreePBX Distro System is being upgraded to 10.13.66-1. Please standby...
STAGE 1 STARTING - GUI Modules Upgrade All FreePBX GUI Modules
Fetching settings from amportal.conf file..
/etc/amportal.conf: line 792: reg: command not found
/etc/amportal.conf: line 1009: http://feeds.feedburner.com/InsideTheAsterisk: No such file or directory
FATAL: can not find freepbx_engine to start Asterisk
sudo: /var/lib/asterisk/bin/module_admin: command not found
sudo: /var/lib/asterisk/bin/retrieve_conf: command not found
sudo: /var/lib/asterisk/bin/module_admin: command not found
sudo: /var/lib/asterisk/bin/module_admin: command not found
sudo: /var/lib/asterisk/bin/retrieve_conf: command not found
sudo: /var/lib/asterisk/bin/module_admin: command not found
( .. large number of lines omitted .. )
Your upgrade may hang! It is possible that your upgrade may hang on or around this point:
Cleanup : libtonezone-devel-2.10.0.1-1.shmz65.1.17.x86_64 262/478
Cleanup : bfa-firmware-3.2.21.1-2.el6.noarch 263/478
Cleanup : asterisk-version-switch-2.0.0.0-10.shmz65.1.17.noarch 264/478
Cleanup : dracut-kernel-004-336.el6_5.2.noarch 265/478
Cleanup : dracut-004-336.el6_5.2.noarch 266/478
Cleanup : iSymphonyServerV3-fpbx-3.2.1.11-1.noarch 267/478
This is due to a bug that is fixed in the latest version of Pacemaker. Please read the Stalled Upgrade page to unblock it.
When the upgrade of the standby node is completed, it will appear as if it has encountered a fatal error. This is normal, as this machine is not in control of the cluster.
Moving to Next Step Update to FreePBX 13
bash: line 226: /var/lib/asterisk/bin/freepbx_setting: No such file or directory
bash: line 229: [: : integer expression expected
bash: line 239: [: : integer expression expected
update all FreePBX 13 modules now
bash: line 252: /var/lib/asterisk/bin/freepbx_setting: No such file or directory
unlink: cannot unlink `/usr/local/sbin/fwconsole': No such file or directory
Fetching settings from amportal.conf file..
/etc/amportal.conf: line 792: reg: command not found
/etc/amportal.conf: line 1009: http://feeds.feedburner.com/InsideTheAsterisk: No such file or directory
FATAL: can not find freepbx_engine to start Asterisk
An error has occurred updating to FreePBX 13
[root@freepbx-b ~]#
Because the upgrade script thinks it failed, you should manually update the version number on this machine:
[root@freepbx-b ~]# echo 10.13.66-1 > /etc/schmooze/pbx-version
[root@freepbx-b ~]#
Once you have seen these errors and updated the version number, you must reboot the standby node.
[root@freepbx-b ~]# reboot
Broadcast message from root@freepbx-b (/dev/pts/0) at 16:11 ...
The system is going down for reboot NOW!
[root@freepbx-b ~]#
When the standby node has rebooted, you can now take it out of standby mode and verify that it has rejoined the cluster successfully.
[root@freepbx-a ~]# pcs cluster unstandby freepbx-b
(Wait a second here for the changes to propagate)
[root@freepbx-a ~]# pcs status
Cluster name: freepbx-ha
Last updated: Wed Dec 9 16:14:58 2015
Last change: Wed Dec 9 16:14:54 2015 via cibadmin on freepbx-a
Stack: cman
Current DC: freepbx-a - partition with quorum
Version: 1.1.11-97629de
2 Nodes configured
20 Resources configured
Online: [ freepbx-a freepbx-b ]
Full list of resources:
 spare_ip (ocf::heartbeat:IPaddr2): Started freepbx-a
 floating_ip (ocf::heartbeat:IPaddr2): Started freepbx-a
 Master/Slave Set: ms-asterisk [drbd_asterisk]
     Masters: [ freepbx-a ]
     Slaves: [ freepbx-b ]
 Master/Slave Set: ms-mysql [drbd_mysql]
     Masters: [ freepbx-a ]
     Slaves: [ freepbx-b ]
 Master/Slave Set: ms-httpd [drbd_httpd]
     Masters: [ freepbx-a ]
     Slaves: [ freepbx-b ]
 Master/Slave Set: ms-spare [drbd_spare]
     Masters: [ freepbx-a ]
     Slaves: [ freepbx-b ]
( .. resource configuration omitted .. )
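To check that the rebooted node has rejoined, you can look for its name on the "Online:" line of 'pcs status'. The sketch below does exactly that and nothing more. The function name node_is_online is hypothetical, and it assumes the "Online: [ node node ]" layout shown in the output above.

```shell
# Hypothetical helper: read `pcs status` output on stdin and succeed only if
# the node named in $1 appears on the "Online: [ ... ]" line.
node_is_online() {
    awk -v node="$1" '
        /^[ \t]*Online: \[/ {
            for (i = 1; i <= NF; i++)
                if ($i == node) found = 1
        }
        END { exit !found }'
}
```

For example, 'pcs status | node_is_online freepbx-b && echo "freepbx-b is online"'. This only confirms cluster membership. It says nothing about replication state, which still needs to be validated as described in the next step.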
You should now validate the cluster configuration in FreePBX HA again. If any errors are detected, it will fix them or, where it cannot, tell you how to fix them.
If all tests pass, you should now set the currently active node to standby, in preparation for rebooting it.
WARNING: THIS WILL CAUSE AN OUTAGE.
Simply click on the 'Standby' button in FreePBX HA. This will move all the services across to the other node.
When the services have moved across, run a cluster check AGAIN. This ensures that all software is up to date on both machines.
You can now reboot the original node and, once it has rebooted, return it from standby mode. Your cluster version upgrade is now complete. Any further upgrades can be performed through the System Admin module, as per normal.
Re-enable Fencing if necessary by browsing to the High Availability module, and clicking the Fencing menu item. Verify that fencing is enabled with:
[root@freepbx-a ~]# pcs property
Cluster Properties:
 cluster-infrastructure: cman
 cluster-recheck-interval: 5m
 dc-version: 1.1.11-97629de
 default-action-timeout: 30
 last-lrm-refresh: 1470610819
 maintenance-mode: false
 no-quorum-policy: ignore
 stonith-enabled: true <--- here