BIG-IQ upgrade to 8.2.x causes /var to run out of space


Cowboy Denny


I've been trying to upgrade from BIG-IQ 8.1.0.2 to 8.2.x, but without much luck: whenever I try to upgrade, /var runs out of space. One mistake I made was not using the LARGE .ova when deploying BIG-IQ to the CM and the DCDs; if I need to rebuild, I will change this, since the disk is 500 GB on large vs. 95 GB on standard.

I've run across this F5 article on how to prep a partition on the CM for the upgrade, which I'll do in a few weeks.

Something to note: your cluster MUST BE green.

CHECK CLUSTER HEALTH

curl -s -u admin:admin --insecure https://localhost:9200/_cluster/health?pretty
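
If you just want the status field, you can filter with jq (assuming jq is available, as it's used later in this thread):

curl -s -u admin:admin --insecure https://localhost:9200/_cluster/health | jq -r .status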

 

CHECK STATUS OF NODES (CM and DCDs)

curl -s -u admin:admin --insecure https://localhost:9200/_cat/nodes?v

An asterisk tells you which node is the elected master; all nodes must report the same device as master.
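
You can also ask Elasticsearch for the elected master directly; _cat/master is a standard endpoint, so it should behave like the _cat/nodes call above:

curl -s -u admin:admin --insecure https://localhost:9200/_cat/master?v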

 

curl -s -u admin:admin --insecure https://localhost:9200/_cluster/settings | jq .

change minimum_master_nodes from 3 to 5
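
For reference, a sketch of applying that change through the cluster settings API, assuming this BIG-IQ release runs an Elasticsearch version where discovery.zen.minimum_master_nodes is still a dynamic setting:

curl -s -u admin:admin --insecure -X PUT https://localhost:9200/_cluster/settings -H 'Content-Type: application/json' -d '{"persistent": {"discovery.zen.minimum_master_nodes": 5}}'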

 

ALWAYS reboot the DCDs first, then the CM

curl -s localhost:9210/_cat/indices?v

 

To watch when a device drops and comes back during a reboot, timestamp each ping reply:

ping 10.47.208.45 | while read pong; do echo "$(date): $pong"; done


  • 4 weeks later...

Sometimes it helps to alter /etc/biq_daemon_provision.json

Default settings

    "elasticsearch": {
      "active": true,
      "memory_allocation": {
        "SYS_4GB": "100m",
        "SYS_8GB": "200m",
        "SYS_16GB": "500m",
        "SYS_32GB": "2600m",
        "SYS_64GB": "8000m",
        "SYS_96GB": "10000m",
        "SYS_128GB": "12000m"
      }
    },

New settings

    "elasticsearch": {
      "active": true,
      "memory_allocation": {
        "SYS_4GB": "100m",
        "SYS_8GB": "200m",
        "SYS_16GB": "500m",
        "SYS_32GB": "1600m",
        "SYS_64GB": "3200m",
        "SYS_96GB": "4800m",
        "SYS_128GB": "6400m"
      }
    },

bigstart restart restjavad

bigstart restart elasticsearch

curl -s localhost:9210/_cat/indices?v
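
To confirm the lowered heap cap actually took effect after the restarts, you can grep the running java process for its -Xmx flag (a quick sanity check; the exact process arguments on BIG-IQ may differ):

ps aux | grep '[e]lasticsearch' | grep -o '\-Xmx[0-9]*[mg]'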


If you installed the new version on HD1.2 and the original version is on HD1.1, to roll back you just type:

tmsh reboot volume HD1.1

Then when it finally comes back online, delete HD1.2:

tmsh delete /sys software volume HD1.2

Before you reinstall 8.2.0 on HD1.2, make sure your cluster is green:

curl -s localhost:9200/_cluster/health?pretty

{
  "cluster_name" : "a6f1a0d6-c57a-40bf-8037-eef75a6b7f5b",
  "status" : "green",
  "timed_out" : false,
  "number_of_nodes" : 9,
  "number_of_data_nodes" : 8,
  "active_primary_shards" : 1567,
  "active_shards" : 3140,
  "relocating_shards" : 0,
  "initializing_shards" : 0,
  "unassigned_shards" : 0,
  "delayed_unassigned_shards" : 0,
  "number_of_pending_tasks" : 0,
  "number_of_in_flight_fetch" : 0,
  "task_max_waiting_in_queue_millis" : 0,
  "active_shards_percent_as_number" : 100.0

If it's green, then continue to reinstall the image

cd /shared/images

tmsh install /sys software image BIG-IQ-8.2.0-0.0.310.iso volume HD1.2 create-volume

tmsh show sys software status

Once it shows fully installed, reboot to the new image

tmsh reboot volume HD1.2

If you get stuck at "Waiting for BIG-IQ services to become available", check for errors:

 

tailf /var/log/tokumon/current

2022-09-30_12:42:29.95716 [INFO] postgres_configurator: configurePostgresqlIsReady:readWriteReady. ready:true
2022-09-30_12:42:29.95725 [INFO] postgres_configurator: configurePostgresqlIsReady:replicationReady. ready:true
2022-09-30_12:42:29.95750 [INFO] postgres_configurator: configurePostgresqlIsReady:rbacReady. ready:false
2022-09-30_12:42:29.95758 [INFO] postgres_configurator: is db ready. dbIsReady:false
2022-09-30_12:42:29.95778 [WARNING] tokumon: isDBReady db is NOT ready

Then try this solution article: https://support.f5.com/csp/article/K61023744


  • 2 weeks later...

Next I have to try what is stated in this article

https://support.f5.com/csp/article/K14532110

In short...

  1. Download the BIG-IQ 8.2.0 disk-sizing tools from F5
  2. Extract the tools: tar xpf /shared/F5_Networks_Disk_Size_Tools.tar -C /shared/.
  3. Discover what size disk BIG-IQ 8.2.0 needs: /shared/F5_Networks_Disk_Size_Tools/imageplan /shared/images/BIG-IQ-8.2.0-0.0.310.iso
    • Standard plan (500 GB disk)
        Mount point: /, Size: 450000k
        Mount point: /usr, Size: 10485760k
        Mount point: /config, Size: 3320000k
        Mount point: /var, Size: 78643200k
    • Tiny plan (95 GB disk)
        Mount point: /, Size: 450000k
        Mount point: /usr, Size: 10485760k
        Mount point: /config, Size: 500000k
        Mount point: /var, Size: 26214400k
  4. Use vgdisplay to see what size your disk is
    • # vgdisplay
        --- Volume group ---
        VG Name               vg-db-sda
        System ID
        Format                lvm2
        Metadata Areas        1
        Metadata Sequence No  188
        VG Access             read/write
        VG Status             resizable
        MAX LV                0
        Cur LV                12
        Open LV               7
        Max PV                0
        Cur PV                1
        Act PV                1
        VG Size               498.50 GiB
        PE Size               4.00 MiB
        Total PE              127617
        Alloc PE / Size       84563 / 330.32 GiB
        Free  PE / Size       43054 / 168.18 GiB
        VG UUID               niI0Pt-8xVm-GghP-hXHK-X1YG-5Bm6-wWFAII
  5. Run lvs to list the current logical volumes and their sizes
  6. Create the volume manually: addvol 1.2 /shared/images/BIG-IQ-8.2.0-0.0.310.iso
    • # addvol 1.2 /shared/images/BIG-IQ-8.2.0-0.0.310.iso
      create volume 1.2 for image /shared/images/BIG-IQ-8.2.0-0.0.310.iso
      product BIG-IQ version 8.2.0 build 0.0.310 (BIGIQ820) selected
      Creating new location sda, 2...
      warning: tm_install::VolumeSet::choose_filesystem_plan -- Selected standard plan (disk = 524288000, sz_standard = 92898960)
      warning: tm_install::VolumeSet::choose_filesystem_plan -- Selected standard plan (disk = 524288000, sz_standard = 92898960)
       
  7. Reboot
  8. Verify HD1.2 exists (software is not installed to it yet): tmsh show sys software status
  9. When you run lvs, you will see that the volumes are not large enough, so you need to resize them first
  10. Normally the only volume that is wrong is /var (set to 75 GB instead of 120 GB), so resize it by running: resizevol /var 125829120 (NOTE: this resizes /var on all software volumes (HD1.1, HD1.2, HD1.3) regardless of which one you are booted from, but it doesn't hurt anything; see the unit math after this list.)
  11. Now that the volume is prepared, install the OS to it:
    • cd /shared/images
    • tmsh install /sys software image BIG-IQ-8.2.0-0.0.310.iso volume HD1.2
    • tmsh show /sys software status
  12. Confirm that /var is still set to 120 GB and has not reverted to 75 GB: lvs
  13. When ready to boot to the new volume run: tmsh reboot volume HD1.2
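
For reference on those resizevol numbers: the size argument appears to be in 1 KiB blocks (that unit is my inference from the values in this thread, not something documented here), so the arithmetic works out like this:

# 120 GiB for /var, as used in step 10
echo $((120 * 1024 * 1024))   # prints 125829120
# 40 GiB, as used in a later post
echo $((40 * 1024 * 1024))    # prints 41943040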

 

ROLLBACK is super easy

  1. tmsh reboot volume HD1.1 (original working volume)
  2. tmsh delete /sys software volume HD1.2 (new volume you created and booted to unsuccessfully)

  • 2 weeks later...

This is my next attempt to upgrade BIG-IQ from 8.1.0.1 to 8.2.0:

1) Create new install volume using addvol

 addvol HD1.2 /shared/images/BIG-IQ-8.2.0-0.0.310.iso

2) Verify the newly created /var is at least 40 GB (based on anticipated resulting /var size of 20 GB)

lvscan | grep _var

  ACTIVE            '/dev/vg-db-sda/set.1._var' [75.00 GiB] inherit
  ACTIVE            '/dev/vg-db-sda/set.2._var' [40.00 GiB] inherit    <-- This should show 40 GB or more

   If the size of set.2._var is less than 40 GB, resize it to 40 GB:

resizevol /var 41943040
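
After resizing, rerun the lvscan check from above; set.2._var should now report 40.00 GiB:

lvscan | grep _var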

3) Install V8.2 with no reboot

 tmsh install sys software image BIG-IQ-8.2.0-0.0.310.iso volume HD1.2 create-volume

4) Mount the new volume 

     volumeset -f mount HD1.2
     cd /mnt/HD1.2

  a) Make changes to allow appiq daemons to log to the proper location:

 find var/config/appiq -name "log4j2.xml" -exec vi {} \;

     Modify the RollingFile name= line to include /var/log/appiq/ for the fileName and filePattern
     Under <Root level="info"> add <AppenderRef ref="RollingFile"/>
     For postaggregator only, modify the following line:
       From: <DefaultRolloverStrategy max="10"/>
       To:   <DefaultRolloverStrategy max="10">

 Or follow the workaround steps in https://cdn.f5.com/product/bugtracker/ID1117597.html: download the .gz to /shared/tmp, extract the files, then run the following:
       i.    yes | cp /shared/tmp/agentmanager_log4j2.xml /mnt/HD1.2/var/config/appiq/agentmanager/config/log4j2.xml
     ii.    yes | cp /shared/tmp/configserver_log4j2.xml /mnt/HD1.2/var/config/appiq/configserver/config/log4j2.xml
    iii.    yes | cp /shared/tmp/queryservice_log4j2.xml /mnt/HD1.2/var/config/appiq/queryservice/config/log4j2.xml
     iv.    yes | cp /shared/tmp/postaggregator_log4j2.xml /mnt/HD1.2/var/config/appiq/postaggregator/config/log4j2.xml
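
Those four copies can also be collapsed into one loop (a sketch assuming the extracted files follow the <daemon>_log4j2.xml naming shown above):

for d in agentmanager configserver queryservice postaggregator; do yes | cp /shared/tmp/${d}_log4j2.xml /mnt/HD1.2/var/config/appiq/${d}/config/log4j2.xml; done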


  b) Make change to /mnt/HD1.2/usr/share/rest/tokumon/config/modules/global.js
                    },
                    "declaration": {
                        "enabled": false
                    },                                 <--- Add
                    "body": {                          <--- Add
                        "enabled": false               <--- Add
                    }

  c) Make changes to /mnt/HD1.2/etc/biq_daemon_provision.json to change the tokumond send_find and send_oplog buffer allocations.
     Under big_iq.tokumond, modify the batch_size for send_find_buffer_allocation.SYS_32GB and send_oplog_buffer_allocation.SYS_32GB to 324 (match SYS_16GB)
     I discussed the changes that were made to big_iq.elasticsearch (SYS_32GB increased to 4000m) and big_iq.appiqpostaggregator (SYS_32GB increased to 1000m) on September 1 during an online session for https://tron.f5net.com/sr/1-8494851811 / https://f5.my.salesforce.com/5001T00001kD2B4. There were no specific notes on this change that I could find, so I decided the change was likely an attempt to fix the UI not coming online. It was later determined that the UI was not coming online because of tokumond, so these changes ultimately had no effect on getting the system to function. Because of this I'm omitting these changes.

  d) Move /mnt/HD1.2/etc/cron.daily/update-top-pg-tables to /mnt/HD1.2/etc/cron.d:
     mv /mnt/HD1.2/etc/cron.daily/update-top-pg-tables /mnt/HD1.2/etc/cron.d
   
  e) chmod 644 /mnt/HD1.2/etc/cron.d/update-top-pg-tables

5) Reboot into the new volume, which will initiate the remainder of the upgrade.
     tmsh reboot volume HD1.2
     
6) Monitoring the upgrade progress:
     Bootstrap (took about 10 minutes):  This is the process initiated following the reboot that gets the database loaded and upgraded to the current version.  The progress can be monitored using the following command:
         tail -f /var/log/bootstrap/bootstrap-<dateTime>.out
         The bootstrap process is finished when we see the following message:
         ==== BOOTSTRAP SUCCESSFUL ====
     
     RBAC Reset (took about 9 hours):  This process is initiated following the database upgrade and causes all of the role-based access controls to be rebuilt.  This is the process that can cause /var to grow rapidly.
     The progress can be monitored using the following command:
         psql -U postgres -d bigiq_db -c "select count(*) from bigiqauthorization.shared_authorization_journals;"
         The RBAC Reset is complete when the count goes to zero.  It will jump up to 7K or so within the first few minutes of the process.
         You can create a short script to monitor the progress at an interval (change the sleep value at the end to control that):
         while true ; do date ; psql -U postgres -d bigiq_db -c "select count(*) from bigiqauthorization.shared_authorization_journals;" ; sleep 60 ; done
     
     Tokumon Database (took about 10 minutes):  This process builds the cross reference tables used by the UI.  The progress can be monitored using the following command:
         While waiting for RBAC Reset to complete, the tokumon logs will show the following messages:
         tailf /var/log/tokumon/current
            [INFO] postgres_configurator: configurePostgresqlIsReady:readWriteReady. ready:true
            [INFO] postgres_configurator: configurePostgresqlIsReady:replicationReady. ready:true
            [INFO] postgres_configurator: configurePostgresqlIsReady:rbacReady. ready:false
            [INFO] postgres_configurator: is db ready. dbIsReady:false
            [WARNING] tokumon: isDBReady db is NOT ready.

         Once the RBAC Reset is complete, the tokumon database rebuild will actually start.
         tail -f /var/log/tokumon/current | grep Progress: 
         Once the Progress bar shows percentComplete:98., check to see that it saves the checkpoint indicating it finished completely:
            [INFO] mode: setting Silent to false
            [INFO] tokumon: Saving Checkpoing.... Checkpoint{ _id: '649/5649DC58',

          
         After the above messages, the logs will begin to show messages similar to these:
            [INFO] logical-replication: Acknowledge 649/5C1E5918
            [INFO] logical-replication: Acknowledge 649/5C4FC188
            [INFO] logical-replication: Acknowledge 649/5CFCEBA8

     guiserver (seconds after tokumon database is finished):  This is responsible for the UI.  The progress can be monitored using the following command:
         tail -f /var/log/guiserver/current
         While waiting for RBAC Reset to finish, we will see messages similar to the following:
            info: f5IndexingStatus: checking status of index...
            info: f5IndexingStatus: received 404 response, indicating initial indexing is not yet complete
            info: initial indexing not yet complete, reason: no indexing status document found

         While waiting for Tokumon Database to be rebuilt, we will see messages similar to the following:
            info: f5IndexingStatus: checking status of index...
            info: initial indexing not yet complete, reason: indexing mode is find

         The UI should be online and accessible when we see the following message:
            info: initial indexing is complete



