
Due to various kernel changes, this upgrade process may result in an unexpected restart of Asterisk. There will also be a short outage as you move the services between nodes. Please be prepared for this, and schedule an outage window as necessary.

Requirements

FreePBX 13

Upgrading is only possible when you are running FreePBX 13. Before attempting to upgrade to Distro 6.6, ensure you are running the latest FreePBX 13 version and its associated modules.

Outage Window

Whilst there is no danger of data loss, both nodes will require a reboot. This means that there will need to be an outage window where you can swap nodes. 

Overview

Before you begin the upgrade itself, you must ensure your cluster is in maintenance mode! When your cluster is in maintenance mode, running the command 'pcs status' will show every resource with the suffix "(unmanaged)". More information is on the page Setting the cluster to maintenance mode.

The cluster is upgraded in an A-then-B fashion. This means that the cluster infrastructure on the active node is upgraded first, followed by the infrastructure on the standby node.

Please read the entire page!

There is a lot of information on this page. You may encounter some errors as you proceed, so please take care to follow the script below precisely.

 

The tasks are as follows:

  1. Run a cluster health check
  2. Disable Fencing
  3. Put the node that is not processing calls explicitly into standby mode
  4. Put the entire cluster into maintenance mode
  5. Upgrade the cluster software on the active node
  6. Start the pacemaker service on the active node
  7. Take the cluster out of maintenance mode
  8. Upgrade the cluster software on the standby node
  9. Reboot the standby node
  10. Ensure replication is valid
  11. Verify cluster integrity
  12. Reboot the active node to validate failover

Estimated Timeframes

An upgrade of both nodes should not take more than an hour.

Outage Windows

There should be two planned outage windows.

  1. A potential outage when taking the cluster out of maintenance mode. This failure hasn't been reproduced, but it is theoretically possible if Asterisk returns an unusual error. You can remove this possibility by ensuring that Asterisk is up, running and processing calls before you take the cluster out of maintenance mode.
  2. The primary planned outage, which should last approximately 5 minutes, or however long it takes Asterisk to start up. This is when all the services are failed over from the active node to the standby node, before the previously active node is rebooted.

Before you begin

Before making any changes, run a cluster health check. If any errors are found, the check will either fix them or give you instructions on how to fix them.

Upgrade Process

First, if fencing is enabled, disable it using:

[root@freepbx-a ~]# pcs property set stonith-enabled=false

And verify that fencing is disabled with:

[root@freepbx-a ~]# pcs property
Cluster Properties:
 cluster-infrastructure: cman
 cluster-recheck-interval: 5m
 dc-version: 1.1.11-97629de
 default-action-timeout: 30
 last-lrm-refresh: 1470610819
 maintenance-mode: false
 no-quorum-policy: ignore
 stonith-enabled: false   <--- here
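If you prefer to script this check, the sketch below reads the 'pcs property' output from standard input. check_pcs_property is a hypothetical helper and is not part of pcs or FreePBX. It assumes the "name: value" layout shown above.

```shell
#!/bin/bash
# Hypothetical helper (not part of pcs or FreePBX): reads 'pcs property'
# output on stdin and checks that one property has the expected value.
# Assumes the " name: value" layout shown in the transcript above.
check_pcs_property() {
    local name="$1" expected="$2" actual
    # Field 1 is the property name with its trailing colon, field 2 the value.
    # Reading all input (no early exit) avoids SIGPIPE on the writer.
    actual=$(awk -v key="$name:" '$1 == key { v = $2 } END { print v }')
    if [ "$actual" = "$expected" ]; then
        echo "OK: $name is $actual"
        return 0
    fi
    echo "FAIL: $name is '${actual:-unset}', expected '$expected'" >&2
    return 1
}

# Usage on a cluster node:
#   pcs property | check_pcs_property stonith-enabled false
```

Use the same helper with 'true' as the expected value when you re-enable fencing at the end of the upgrade.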

Fencing can be re-enabled at any time from the Fencing menu item in the High Availability module. Take care not to re-enable it accidentally before the upgrade is finished.

When you have organised your outage window, ensure that the node that is not processing calls is in standby. We'll assume that your currently active machine is freepbx-a:

[root@freepbx-a ~]# pcs cluster standby freepbx-b
[root@freepbx-a ~]#

If any services were running on freepbx-b, they will be moved across to freepbx-a. It works the other way round if services are running on freepbx-b and you are putting freepbx-a into standby. Verify that the move is complete by running 'pcs status': all services should say 'Started freepbx-a'. After you have verified this, you must put the cluster into maintenance mode.

[root@freepbx-a ~]# pcs property set maintenance-mode=true
[root@freepbx-a ~]#

As soon as you issue this command, the cluster stops managing and monitoring resources: nothing will be restarted or moved if it fails. (This setting is removed again after the active node has been upgraded. Please don't forget to do that!)

It is imperative that you put the cluster into maintenance mode! Failure to do so will lead to failures that are extremely difficult to resolve, and may cause an extended outage. If you are unsure of how to verify this, please read FreePBX HA-Setting the cluster to maintenance mode.
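To double-check maintenance mode before continuing, the sketch below reads 'pcs status' output and fails unless every resource line carries the "(unmanaged)" suffix. check_all_unmanaged is a hypothetical helper and is not part of pcs. It assumes that resource lines name their agent in parentheses, such as "(ocf::heartbeat:IPaddr2)". Summary lines such as "Masters: [ freepbx-a ]" are not checked.

```shell
#!/bin/bash
# Hypothetical helper (not part of pcs): reads 'pcs status' output on stdin
# and succeeds only if every resource line carries the "(unmanaged)" suffix.
# Assumption: resource lines name their agent in parentheses, for example
# "(ocf::heartbeat:IPaddr2)". Summary lines such as "Masters: [ ... ]" are
# not resource lines and are not checked.
check_all_unmanaged() {
    awk '
        /\((ocf|lsb|stonith|systemd):/ {
            total++
            if ($0 !~ /\(unmanaged\)/) { bad++; print "still managed: " $0 }
        }
        END {
            if (total == 0) { print "no resource lines found"; exit 1 }
            if (bad > 0) { exit 1 }
            print "all " total " resources are unmanaged"
        }'
}

# Usage on a cluster node:
#   pcs status | check_all_unmanaged
```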

Start by upgrading the distro on the node that is processing calls. This does NOT automatically restart the Asterisk process; it only restarts the cluster management software. (If your cluster is not in maintenance mode, the secondary node will attempt to take over the cluster services! Read the previous paragraph about maintenance mode!)

This normally takes about 20 minutes and will not cause an outage.

This may hang!

The 'Cleanup' part of the upgrade may hang. This is due to a (since fixed) bug in the cluster services. If your system does not seem to be progressing through the 'Cleanup' phase, please read the Stalled Upgrade information.


[root@freepbx-a ~]# curl https://upgrades.freepbxdistro.org/stable/10.13.66/upgrade-10.13.66-1.sh | bash
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 16551  100 16551    0     0   8026      0  0:00:02  0:00:02 --:--:-- 34991
Check to make sure this is a FreePBX Distro system before executing
This appears to be a FreePBX Distro system as it has a Distro Version of 6.12.65-30
Your FreePBX Distro System is being upgraded to 10.13.66-1. Please standby...

STAGE 1 STARTING - GUI Modules

 Upgrade All FreePBX GUI Modules
 
... many lines of text skipped ... 

Your upgrade may hang! If it does, it will most likely be on or around this point:

  Cleanup    : libtonezone-devel-2.10.0.1-1.shmz65.1.17.x86_64                                262/478
  Cleanup    : bfa-firmware-3.2.21.1-2.el6.noarch                                             263/478
  Cleanup    : asterisk-version-switch-2.0.0.0-10.shmz65.1.17.noarch                          264/478
  Cleanup    : dracut-kernel-004-336.el6_5.2.noarch                                           265/478
  Cleanup    : dracut-004-336.el6_5.2.noarch                                                  266/478
  Cleanup    : iSymphonyServerV3-fpbx-3.2.1.11-1.noarch                                       267/478

This is due to a bug that is fixed in the latest version of Pacemaker. Please read the Stalled Upgrade page to unblock it.

When it is finished, you will see something like this:

STAGE 4 COMPLETED - Clean Up - Moving to Stage 5

STAGE 5 STARTING - Final Verifications

STAGE 5 COMPLETED - Final Verifications - Moving to Stage 6
Wed Dec  9 15:13:58 AEST 2015 UPGRADE 100% COMPLETED
Error: unable to get crm_config, is pacemaker running?
Unlocked.
REBOOT YOUR BOX NOW
If you would like to change your Asterisk release version from
1.8, 10, 11 or 12 you can do so by typing asterisk-version-switch
from the linux CLI at anytime
[root@freepbx-a ~]#
[root@freepbx-a ~]#

DO NOT RESTART YOUR MACHINE, despite the "REBOOT YOUR BOX NOW" message! At this point the cluster services are not running on this machine and need to be started again. You can verify this, and start them, with the following commands:

[root@freepbx-a ~]# pcs status
Error: cluster is not currently running on this node
[root@freepbx-a ~]# service pacemaker start
Starting cluster:
   Checking if cluster has been disabled at boot...        [  OK  ]
   Checking Network Manager...                             [  OK  ]
   Global setup...                                         [  OK  ]
   Loading kernel modules...                               [  OK  ]
   Mounting configfs...                                    [  OK  ]
   Starting cman...                                        [  OK  ]
   Waiting for quorum...                                   [  OK  ]
   Starting fenced...                                      [  OK  ]
   Starting dlm_controld...                                [  OK  ]
   Tuning DLM kernel config...                             [  OK  ]
   Starting gfs_controld...                                [  OK  ]
   Unfencing self...                                       [  OK  ]
   Joining fence domain...                                 [  OK  ]
Starting Pacemaker Cluster Manager                         [  OK  ]
[root@freepbx-a ~]# pcs status
Cluster name: freepbx-ha
Last updated: Wed Dec  9 15:14:35 2015
Last change: Wed Dec  9 14:44:27 2015 via cibadmin on freepbx-a
Stack: cman
Current DC: freepbx-b - partition with quorum
Version: 1.1.10-14.el6-368c726
2 Nodes configured
20 Resources configured
 
( .. many configuration entries skipped .. )
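The check-and-start sequence above can be wrapped in a small function. This is a sketch, and ensure_pacemaker_running is a hypothetical name. It assumes that 'pcs status' exits with a non-zero status when it prints "Error: cluster is not currently running on this node".

```shell
#!/bin/bash
# Hypothetical helper: start the cluster stack on this node only if it is
# not already running. Assumes 'pcs status' exits non-zero when it reports
# "Error: cluster is not currently running on this node".
ensure_pacemaker_running() {
    if pcs status >/dev/null 2>&1; then
        echo "cluster already running on this node"
        return 0
    fi
    echo "cluster not running here; starting pacemaker"
    service pacemaker start || return 1
    # Print the new status so it can be compared with the pre-upgrade output
    pcs status
}

# Usage on the freshly upgraded active node (cluster still in maintenance mode):
#   ensure_pacemaker_running
```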

Everything should look exactly the same as it did prior to the upgrade. You are now almost ready to take the cluster out of maintenance mode.

WARNING! When the cluster is taken out of maintenance mode, it may decide that Asterisk needs to be restarted.

In some cases the upgrade does not create the /usr/sbin/fwconsole symlink, so at this point you should check for it with:

# which fwconsole
/usr/bin/which: no fwconsole in (/usr/lib64/qt-3.3/bin:/usr/local/sbin:/usr/local/bin:/sbin:/bin:/usr/sbin:/usr/bin:/root/bin)
 
# amportal chown
 
# which fwconsole
/usr/sbin/fwconsole

If fwconsole is missing, running "amportal chown" will create the symlink in /usr/sbin/.
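The check and fix can be combined as follows. ensure_fwconsole is a hypothetical helper. It only runs 'amportal chown' when fwconsole cannot be found on the PATH, and then checks again.

```shell
#!/bin/bash
# Hypothetical helper: recreate the fwconsole symlink only when it is
# missing from PATH, using 'amportal chown' as described above.
ensure_fwconsole() {
    if command -v fwconsole >/dev/null 2>&1; then
        echo "fwconsole found at $(command -v fwconsole)"
        return 0
    fi
    echo "fwconsole not found; running 'amportal chown'"
    amportal chown || return 1
    # Forget cached command locations so the new symlink is picked up
    hash -r
    if command -v fwconsole >/dev/null 2>&1; then
        echo "fwconsole now at $(command -v fwconsole)"
        return 0
    fi
    echo "fwconsole is still missing" >&2
    return 1
}
```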

You can take the cluster out of maintenance mode now:

[root@freepbx-a ~]# pcs property set maintenance-mode=false
[root@freepbx-a ~]#

Under NO CIRCUMSTANCES should you take the other node out of standby at this point! 

At this time, you can run 'pcs status' and all the services will appear to be running and valid. However, because of the version changes, the standby node cannot take control of the cluster services. If it attempts to do so, it will cause a catastrophic failure.

Finally, because the upgrade script believes it failed, manually update the version number on this machine:

[root@freepbx-a ~]# echo 10.13.66-1 > /etc/schmooze/pbx-version
[root@freepbx-a ~]#

This ensures that the upgrade system knows which track you are on.
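The sketch below writes the version and reads it back to confirm the write. set_pbx_version is a hypothetical helper. The path and version string come from the step above. The optional second argument is an addition so the helper can be tried against a scratch file.

```shell
#!/bin/bash
# Hypothetical helper: record the distro version in the file the upgrade
# system reads, then read it back to confirm it was written. The optional
# second argument exists only so the helper can be tried on a scratch file.
set_pbx_version() {
    local version="$1" file="${2:-/etc/schmooze/pbx-version}"
    echo "$version" > "$file" || return 1
    if [ "$(cat "$file")" = "$version" ]; then
        echo "pbx-version is $version"
    else
        echo "pbx-version mismatch in $file" >&2
        return 1
    fi
}

# Usage on each node once its upgrade has finished:
#   set_pbx_version 10.13.66-1
```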

Switch to the other node

You must now proceed to upgrading the standby node. This upgrade takes slightly longer than the one on the active node; an average system should complete it in around 25 minutes. A number of errors and warnings about Asterisk will be shown during the upgrade. These are of no concern and should be expected.

This may hang!

The 'Cleanup' part of the upgrade may hang. This is due to a (since fixed) bug in the cluster services. If your system does not seem to be progressing through the 'Cleanup' phase, please read the Stalled Upgrade information.

[root@freepbx-b ~]# curl https://upgrades.freepbxdistro.org/stable/10.13.66/upgrade-10.13.66-1.sh | bash
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 16551  100 16551    0     0   6759      0  0:00:02  0:00:02 --:--:-- 34267
Unable to connect to remote asterisk (does /var/run/asterisk/asterisk.ctl exist?)
Check to make sure this is a FreePBX Distro system before executing
This appears to be a FreePBX Distro system as it has a Distro Version of 6.12.65-30
Your FreePBX Distro System is being upgraded to 10.13.66-1. Please standby...

STAGE 1 STARTING - GUI Modules

 Upgrade All FreePBX GUI Modules

Fetching settings from amportal.conf file..
/etc/amportal.conf: line 792: reg: command not found
/etc/amportal.conf: line 1009: http://feeds.feedburner.com/InsideTheAsterisk: No such file or directory
FATAL: can not find freepbx_engine to start Asterisk
sudo: /var/lib/asterisk/bin/module_admin: command not found
sudo: /var/lib/asterisk/bin/retrieve_conf: command not found
sudo: /var/lib/asterisk/bin/module_admin: command not found
sudo: /var/lib/asterisk/bin/module_admin: command not found
sudo: /var/lib/asterisk/bin/retrieve_conf: command not found
sudo: /var/lib/asterisk/bin/module_admin: command not found


( .. large number of lines omitted .. )
 

Your upgrade may hang! If it does, it will most likely be on or around this point:

  Cleanup    : libtonezone-devel-2.10.0.1-1.shmz65.1.17.x86_64                                262/478
  Cleanup    : bfa-firmware-3.2.21.1-2.el6.noarch                                             263/478
  Cleanup    : asterisk-version-switch-2.0.0.0-10.shmz65.1.17.noarch                          264/478
  Cleanup    : dracut-kernel-004-336.el6_5.2.noarch                                           265/478
  Cleanup    : dracut-004-336.el6_5.2.noarch                                                  266/478
  Cleanup    : iSymphonyServerV3-fpbx-3.2.1.11-1.noarch                                       267/478

This is due to a bug that is fixed in the latest version of Pacemaker. Please read the Stalled Upgrade page to unblock it.

When the upgrade of the standby node has completed, it will appear to have encountered a fatal error, as shown below. This is normal, because this machine is not in control of the cluster.

Moving to Next Step
 Update to FreePBX 13
bash: line 226: /var/lib/asterisk/bin/freepbx_setting: No such file or directory
bash: line 229: [: : integer expression expected
bash: line 239: [: : integer expression expected
 update all FreePBX 13 modules now
bash: line 252: /var/lib/asterisk/bin/freepbx_setting: No such file or directory
unlink: cannot unlink `/usr/local/sbin/fwconsole': No such file or directory
Fetching settings from amportal.conf file..
/etc/amportal.conf: line 792: reg: command not found
/etc/amportal.conf: line 1009: http://feeds.feedburner.com/InsideTheAsterisk: No such file or directory
FATAL: can not find freepbx_engine to start Asterisk
An error has occurred updating to FreePBX 13
[root@freepbx-b ~]#

Because the upgrade script believes it failed, manually update the version number on this machine:

[root@freepbx-b ~]# echo 10.13.66-1 > /etc/schmooze/pbx-version
[root@freepbx-b ~]#

Once you have seen these errors and updated the version number, you must reboot the standby node.

[root@freepbx-b ~]# reboot
Broadcast message from root@freepbx-b
        (/dev/pts/0) at 16:11 ...
The system is going down for reboot NOW!
[root@freepbx-b ~]#

When the standby node has rebooted, you can now take it out of standby mode and verify that it has rejoined the cluster successfully.

[root@freepbx-a ~]# pcs cluster unstandby freepbx-b


(Wait a few seconds here for the changes to propagate.)

[root@freepbx-a ~]# pcs status
Cluster name: freepbx-ha
Last updated: Wed Dec  9 16:14:58 2015
Last change: Wed Dec  9 16:14:54 2015 via cibadmin on freepbx-a
Stack: cman
Current DC: freepbx-a - partition with quorum
Version: 1.1.11-97629de
2 Nodes configured
20 Resources configured

Online: [ freepbx-a freepbx-b ]
Full list of resources:
 spare_ip       (ocf::heartbeat:IPaddr2):       Started freepbx-a
 floating_ip    (ocf::heartbeat:IPaddr2):       Started freepbx-a
 Master/Slave Set: ms-asterisk [drbd_asterisk]
     Masters: [ freepbx-a ]
     Slaves: [ freepbx-b ]
 Master/Slave Set: ms-mysql [drbd_mysql]
     Masters: [ freepbx-a ]
     Slaves: [ freepbx-b ]
 Master/Slave Set: ms-httpd [drbd_httpd]
     Masters: [ freepbx-a ]
     Slaves: [ freepbx-b ]
 Master/Slave Set: ms-spare [drbd_spare]
     Masters: [ freepbx-a ]
     Slaves: [ freepbx-b ]


( .. resource configuration omitted .. )
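Part of this 'pcs status' check can be automated. The sketch below confirms that each named node appears on the "Online: [ ... ]" line and that no resource is reported as Stopped or FAILED. check_nodes_online is a hypothetical helper, and it assumes the line format shown above. It does not replace the FreePBX HA validation.

```shell
#!/bin/bash
# Hypothetical helper: reads 'pcs status' output on stdin, confirms that
# every node named on the command line appears in the "Online: [ ... ]"
# line, and that no resource is reported as Stopped or FAILED.
# Assumes the "Online: [ node1 node2 ]" format shown in the transcript.
check_nodes_online() {
    local status online node rc=0
    status=$(cat)
    # Extract the space-separated node list between the brackets
    online=$(printf '%s\n' "$status" | sed -n 's/^Online: \[ \(.*\) ]$/\1/p')
    for node in "$@"; do
        case " $online " in
            *" $node "*) echo "online: $node" ;;
            *) echo "NOT online: $node" >&2; rc=1 ;;
        esac
    done
    if printf '%s\n' "$status" | grep -Eq 'Stopped|FAILED'; then
        echo "some resources are Stopped or FAILED" >&2
        rc=1
    fi
    return $rc
}

# Usage, once the standby node has rebooted and been taken out of standby:
#   pcs status | check_nodes_online freepbx-a freepbx-b
```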

You should now validate the cluster configuration in FreePBX HA again. If any errors are detected, it will fix them or, if it can't, tell you how to fix them.

If all tests pass, you should now set the currently active node to standby, in preparation for rebooting it.

WARNING: THIS WILL CAUSE AN OUTAGE.

Simply click on the 'Standby' button in FreePBX HA. This will move all the services across to the other node.

When the services have moved across, run a cluster check AGAIN. This ensures that all software is up to date on both machines. 

You can now reboot the original node and, once it has rebooted, take it out of standby mode. Your cluster version upgrade is now complete. Any further upgrades can be performed through the System Admin module as normal.

Re-enable Fencing if necessary by browsing to the High Availability module, and clicking the Fencing menu item. Verify that fencing is enabled with:

[root@freepbx-a ~]# pcs property
Cluster Properties:
 cluster-infrastructure: cman
 cluster-recheck-interval: 5m
 dc-version: 1.1.11-97629de
 default-action-timeout: 30
 last-lrm-refresh: 1470610819
 maintenance-mode: false
 no-quorum-policy: ignore
 stonith-enabled: true   <--- here
