Cowboy Denny Posted September 1, 2022

I've been trying to upgrade from BIG-IQ 8.1.0.2 to 8.2.x, but without much luck: every time I try, /var runs out of space. One mistake I made was not using the LARGE .ova when deploying BIG-IQ for the CM and the DCDs; if I need to rebuild, I will change this, since the disk is 500 GB on large vs. 95 GB on standard. I've run across an F5 article on how to prep a partition on the CM for the upgrade, which I'll do in a few weeks.

Something to note: your cluster MUST BE green.

CLUSTER
curl -s -u admin:admin --insecure https://localhost:9200/_cluster/health?pretty

CHECK STATUS OF NODES (CM and DCDs)
curl -s -u admin:admin --insecure https://localhost:9200/_cat/nodes?v
The asterisk tells you which node is master; all nodes must agree on the same master device.

CLUSTER SETTINGS
curl -s -u admin:admin --insecure https://localhost:9200/_cluster/settings | jq .
Change minimum_master_nodes to 5 from 3.

ALWAYS reboot the DCDs first, then the CM.

CHECK INDICES
curl -s localhost:9210/_cat/indices?v

PING WITH TIMESTAMPS
ping 10.47.208.45 | while read pong; do echo "$(date): $pong"; done
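For the minimum_master_nodes change, the cluster settings API can be used directly. This is only a minimal sketch, assuming the BIG-IQ Elasticsearch build still honors the legacy discovery.zen.minimum_master_nodes setting and the same admin credentials used above; verify the setting name against your own _cluster/settings output before applying it.

# Sketch: raise minimum_master_nodes from 3 to 5 as a persistent cluster setting
# (the setting path is an assumption based on the _cluster/settings output above)
curl -s -u admin:admin --insecure -X PUT \
  -H 'Content-Type: application/json' \
  https://localhost:9200/_cluster/settings \
  -d '{"persistent": {"discovery.zen.minimum_master_nodes": 5}}'

Re-run the _cluster/settings check afterwards to confirm every node reports the new value.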
Cowboy Denny Posted September 24, 2022

Sometimes it helps to alter /etc/biq_daemon_provision.json.

Default settings:
"elasticsearch": {
    "active": true,
    "memory_allocation": {
        "SYS_4GB": "100m",
        "SYS_8GB": "200m",
        "SYS_16GB": "500m",
        "SYS_32GB": "2600m",
        "SYS_64GB": "8000m",
        "SYS_96GB": "10000m",
        "SYS_128GB": "12000m"
    }
},

New settings:
"elasticsearch": {
    "active": true,
    "memory_allocation": {
        "SYS_4GB": "100m",
        "SYS_8GB": "200m",
        "SYS_16GB": "500m",
        "SYS_32GB": "1600m",
        "SYS_64GB": "3200m",
        "SYS_96GB": "4800m",
        "SYS_128GB": "6400m"
    }
},

Then restart the affected services and confirm the indices are healthy:
bigstart restart restjavad
bigstart restart elasticsearch
curl -s localhost:9210/_cat/indices?v
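To confirm the lower heap ceilings actually took effect after the restarts, the _cat/nodes API can report heap columns. A small sketch, assuming port 9210 is the local Elasticsearch listener as used above; the column names follow the standard _cat API and may need adjusting for your build:

# Sketch: show each node's configured max heap and current heap usage
curl -s "localhost:9210/_cat/nodes?v&h=name,heap.max,heap.percent"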
Cowboy Denny Posted September 24, 2022

If you installed the new version on HD1.2 and the original version is on HD1.1, roll back by booting the original volume:
tmsh reboot volume HD1.1

Then, once it finally comes back online, delete HD1.2:
tmsh delete /sys software volume HD1.2

Before you reinstall 8.2.0 on HD1.2, make sure your cluster is green:
curl -s localhost:9200/_cluster/health?pretty
{
  "cluster_name" : "a6f1a0d6-c57a-40bf-8037-eef75a6b7f5b",
  "status" : "green",
  "timed_out" : false,
  "number_of_nodes" : 9,
  "number_of_data_nodes" : 8,
  "active_primary_shards" : 1567,
  "active_shards" : 3140,
  "relocating_shards" : 0,
  "initializing_shards" : 0,
  "unassigned_shards" : 0,
  "delayed_unassigned_shards" : 0,
  "number_of_pending_tasks" : 0,
  "number_of_in_flight_fetch" : 0,
  "task_max_waiting_in_queue_millis" : 0,
  "active_shards_percent_as_number" : 100.0
}

If it's green, continue and reinstall the image:
cd /shared/images
tmsh install /sys software image BIG-IQ-8.2.0-0.0.310.iso volume HD1.2 create-volume
tmsh show sys software status

Once it shows fully installed, reboot to the new image:
tmsh reboot volume HD1.2

If you get stuck at "Waiting for BIG-IQ services to become available", check for errors:
tailf /var/log/tokumon/current
2022-09-30_12:42:29.95716 [INFO] postgres_configurator: configurePostgresqlIsReady:readWriteReady. ready:true
2022-09-30_12:42:29.95725 [INFO] postgres_configurator: configurePostgresqlIsReady:replicationReady. ready:true
2022-09-30_12:42:29.95750 [INFO] postgres_configurator: configurePostgresqlIsReady:rbacReady. ready:false
2022-09-30_12:42:29.95758 [INFO] postgres_configurator: is db ready. dbIsReady:false
2022-09-30_12:42:29.95778 [WARNING] tokumon: isDBReady db is NOT ready

Then try this solution article: https://support.f5.com/csp/article/K61023744
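Rather than eyeballing the health output before reinstalling, a small wait loop can block until the cluster reports green. A sketch only, using the same health endpoint shown above:

# Sketch: poll cluster health every 30s and continue once the status is green
until curl -s localhost:9200/_cluster/health | grep -q '"status":"green"'; do
  date
  sleep 30
done
echo "Cluster is green - safe to reinstall"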
Cowboy Denny Posted October 4, 2022

Next I have to try what is stated in this article: https://support.f5.com/csp/article/K14532110

In short...

Download the BIG-IQ 8.2.0 disk-sizing tools from F5, then extract them:
tar xpf /shared/F5_Networks_Disk_Size_Tools.tar -C /shared/.

Discover what disk size BIG-IQ 8.2.0 needs:
/shared/F5_Networks_Disk_Size_Tools/imageplan /shared/images/BIG-IQ-8.2.0-0.0.310.iso

Standard plan (500 GB HDD)
Mount point: /, Size: 450000k
Mount point: /usr, Size: 10485760k
Mount point: /config, Size: 3320000k
Mount point: /var, Size: 78643200k

Tiny plan (95 GB)
Mount point: /, Size: 450000k
Mount point: /usr, Size: 10485760k
Mount point: /config, Size: 500000k
Mount point: /var, Size: 26214400k

Use vgdisplay to see what size your disk is:
# vgdisplay
  --- Volume group ---
  VG Name               vg-db-sda
  System ID
  Format                lvm2
  Metadata Areas        1
  Metadata Sequence No  188
  VG Access             read/write
  VG Status             resizable
  MAX LV                0
  Cur LV                12
  Open LV               7
  Max PV                0
  Cur PV                1
  Act PV                1
  VG Size               498.50 GiB
  PE Size               4.00 MiB
  Total PE              127617
  Alloc PE / Size       84563 / 330.32 GiB
  Free PE / Size        43054 / 168.18 GiB
  VG UUID               niI0Pt-8xVm-GghP-hXHK-X1YG-5Bm6-wWFAII

Check the current logical volumes:
lvs

Create the volume manually:
# addvol 1.2 /shared/images/BIG-IQ-8.2.0-0.0.310.iso
create volume 1.2 for image /shared/images/BIG-IQ-8.2.0-0.0.310.iso
product BIG-IQ version 8.2.0 build 0.0.310 (BIGIQ820) selected
Creating new location sda, 2...
warning: tm_install::VolumeSet::choose_filesystem_plan -- Selected standard plan (disk = 524288000, sz_standard = 92898960)
warning: tm_install::VolumeSet::choose_filesystem_plan -- Selected standard plan (disk = 524288000, sz_standard = 92898960)

reboot

Verify HD1.2 exists (software is not installed to it yet):
tmsh show sys software status

When you run lvs you will see the new volumes are not large enough, so you need to resize them first. Normally the only volume that is wrong is /var (set to 75 GB instead of 120 GB), so resize it by running:
resizevol /var 125829120
(NOTE: this resizes the /var logical volume on every software volume (HD1.1, HD1.2, HD1.3), regardless of which one you are booted into, but that doesn't affect anything.)

NOW the partition is prepared; install the OS to it:
cd /shared/images
tmsh install /sys software image BIG-IQ-8.2.0-0.0.310.iso volume HD1.2
tmsh show /sys software status

Confirm that /var is still set to 120 GB and not 75 GB:
lvs

When ready to boot to the new volume, run:
tmsh reboot volume HD1.2

ROLLBACK is super easy:
tmsh reboot volume HD1.1  (original working volume)
tmsh delete /sys software volume HD1.2  (new volume you created and booted to unsuccessfully)
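Going by the sizes used in this post (78643200k for 75 GB, 125829120 for 120 GB), resizevol appears to take the target size in KiB. A quick sketch for computing that value from a GiB figure instead of hard-coding it:

# Sketch: convert 120 GiB to KiB for resizevol (120 * 1024 * 1024 = 125829120)
echo $((120 * 1024 * 1024))
resizevol /var $((120 * 1024 * 1024))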
Cowboy Denny Posted October 18, 2022

This is my next attempt at the upgrade of BIG-IQ from 8.1.0.1 to 8.2.0.

1) Create the new install volume using addvol:
addvol HD1.2 /shared/images/BIG-IQ-8.2.0-0.0.310.iso

2) Verify the newly created /var is at least 40 GB (based on an anticipated resulting /var size of 20 GB):
lvscan | grep _var
  ACTIVE   '/dev/vg-db-sda/set.1._var' [75.00 GiB] inherit
  ACTIVE   '/dev/vg-db-sda/set.2._var' [40.00 GiB] inherit   <-- This should show 40 GB or more
If the size of set.2._var is less than 40 GB, resize it to 40 GB:
resizevol /var 41943040

3) Install 8.2.0 with no reboot:
tmsh install sys software image BIG-IQ-8.2.0-0.0.310.iso volume HD1.2 create-volume

4) Mount the new volume:
volumeset -f mount HD1.2
cd /mnt/HD1.2

a) Make changes to allow the appiq daemons to log to the proper location:
find var/config/appiq -name "log4j2.xml" -exec vi {} \;
- Modify the RollingFile name= line to include /var/log/appiq/ in the fileName and filePattern.
- Under <Root level="info">, add <AppenderRef ref="RollingFile"/>.
- For postaggregator only, modify the following line:
  From: <DefaultRolloverStrategy max="10"/>
  To: <DefaultRolloverStrategy max="10">
Or follow the workaround steps in https://cdn.f5.com/product/bugtracker/ID1117597.html: download the gz to /shared/tmp, extract the files, then run the following:
i. yes | cp /shared/tmp/agentmanager_log4j2.xml /mnt/HD1.2/var/config/appiq/agentmanager/config/log4j2.xml
ii. yes | cp /shared/tmp/configserver_log4j2.xml /mnt/HD1.2/var/config/appiq/configserver/config/log4j2.xml
iii. yes | cp /shared/tmp/queryservice_log4j2.xml /mnt/HD1.2/var/config/appiq/queryservice/config/log4j2.xml
iv. yes | cp /shared/tmp/postaggregator_log4j2.xml /mnt/HD1.2/var/config/appiq/postaggregator/config/log4j2.xml

b) Make this change to /mnt/HD1.2/usr/share/rest/tokumon/config/modules/global.js:
},
"declaration": {
    "enabled": false
},                       <--- Add
"body": {                <--- Add
    "enabled": false     <--- Add
}

c) Make changes to /mnt/HD1.2/etc/biq_daemon_provision.json to change tokumond find.oplog and send.oplog: under big_iq.tokumond, modify the batch_size for send_find_buffer_allocation.SYS_32GB and send_oplog_buffer_allocation.SYS_32GB to 324 (to match SYS_16GB).

I discussed the changes that were made to big_iq.elasticsearch (SYS_32GB increased to 4000m) and big_iq.appiqpostaggregator (SYS_32GB increased to 1000m) on September 1 during an online session for https://tron.f5net.com/sr/1-8494851811 / https://f5.my.salesforce.com/5001T00001kD2B4. There were no specific notes on this change that I could find, so I decided it was likely an attempt to fix the UI not coming online. It was later determined that the UI was not coming online because of tokumond, so these changes ultimately had no effect on getting the system to function. Because of this, I'm omitting them.

d) Move /mnt/HD1.2/etc/cron.daily/update-top-pg-tables to /mnt/HD1.2/etc/cron.d:
mv /mnt/HD1.2/etc/cron.daily/update-top-pg-tables /mnt/HD1.2/etc/cron.d

e) Fix the permissions on the moved file:
chmod 644 /mnt/HD1.2/etc/cron.d/update-top-pg-tables

5) Reboot into the new volume, which will initiate the remainder of the upgrade:
tmsh reboot volume HD1.2

6) Monitoring the upgrade progress:

Bootstrap (took about 10 minutes): This is the process, initiated after the reboot, that gets the database loaded and upgraded to the current version.
The progress can be monitored using the following command:
tail -f /var/log/bootstrap/bootstrap-<dateTime>.out
The bootstrap process is finished when we see the following message:
==== BOOTSTRAP SUCCESSFUL ====

RBAC Reset (took about 9 hours): This process, initiated after the database upgrade, rebuilds all of the role-based access controls. It is the process that can cause /var to grow rapidly. The progress can be monitored using the following command:
psql -U postgres -d bigiq_db -c "select count(*) from bigiqauthorization.shared_authorization_journals;"
The RBAC Reset is complete when the count goes to zero. It will jump up to 7K or so within the first few minutes of the process. You can create a short script to monitor the progress at an interval (change the sleep value at the end to control that):
while true ; do date ; psql -U postgres -d bigiq_db -c "select count(*) from bigiqauthorization.shared_authorization_journals;" ; sleep 60 ; done

Tokumon Database (took about 10 minutes): This process builds the cross-reference tables used by the UI. While waiting for the RBAC Reset to complete, the tokumon logs will show the following messages:
tailf /var/log/tokumon/current
[INFO] postgres_configurator: configurePostgresqlIsReady:readWriteReady. ready:true
[INFO] postgres_configurator: configurePostgresqlIsReady:replicationReady. ready:true
[INFO] postgres_configurator: configurePostgresqlIsReady:rbacReady. ready:false
[INFO] postgres_configurator: is db ready. dbIsReady:false
[WARNING] tokumon: isDBReady db is NOT ready.
Once the RBAC Reset is complete, the tokumon database rebuild will actually start. Monitor it with:
tail -f /var/log/tokumon/current | grep Progress:
Once the progress shows percentComplete:98., check that it saves the checkpoint indicating it finished completely:
[INFO] mode: setting Silent to false
[INFO] tokumon: Saving Checkpoint.... Checkpoint{ _id: '649/5649DC58',
After the above messages, the logs will begin to show messages similar to these:
[INFO] logical-replication: Acknowledge 649/5C1E5918
[INFO] logical-replication: Acknowledge 649/5C4FC188
[INFO] logical-replication: Acknowledge 649/5CFCEBA8

guiserver (finishes seconds after the tokumon database): This is responsible for the UI. The progress can be monitored using the following command:
tail -f /var/log/guiserver/current
While waiting for the RBAC Reset to finish, we will see messages similar to the following:
info: f5IndexingStatus: checking status of index...
info: f5IndexingStatus: received 404 response, indicating initial indexing is not yet complete
info: initial indexing not yet complete, reason: no indexing status document found
While waiting for the Tokumon Database to be rebuilt, we will see messages similar to the following:
info: f5IndexingStatus: checking status of index...
info: initial indexing not yet complete, reason: indexing mode is find
The UI should be online and accessible when we see the following message:
info: initial indexing is complete
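If you'd rather not keep several terminals open, the monitoring commands above can be rolled into one rough status check. This is only a sketch built from the log paths and messages quoted in this post; adjust the bootstrap log name to your actual timestamp:

# Sketch: one-shot status of bootstrap, RBAC reset, tokumon rebuild and guiserver
grep -l "BOOTSTRAP SUCCESSFUL" /var/log/bootstrap/bootstrap-*.out >/dev/null 2>&1 \
  && echo "bootstrap: done" || echo "bootstrap: still running"
echo -n "RBAC journal rows remaining: "
psql -U postgres -d bigiq_db -t -c "select count(*) from bigiqauthorization.shared_authorization_journals;"
echo -n "tokumon progress: "
grep -o "percentComplete:[0-9.]*" /var/log/tokumon/current | tail -1
grep -q "initial indexing is complete" /var/log/guiserver/current \
  && echo "guiserver: UI indexing complete" || echo "guiserver: indexing not finished"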