Problem:
Attempted BMC ROERG using option REBALANCE with SHRLEVEL
CHANGE between partitions to resolve space issue on a table partition, as
application needed 24/7 availability, which put the table partition in Recovery
pending as the job failed during ALTER Table limit key stage after REORG Switch
phase.
Reason:
During utility terminate phase of BMC REORG with REBALACNE
option it completes the switch to shadow dataset first and then attempt to
alter statement, where the control is passed to DB2. DB2
must invalidate dependent plans and packages on that object to complete the
ALTER successfully.
The BMC Reorg was holding a DRAIN on the object and
successfully completed the switch phase, But that DRAIN did not prevent
dependent packages from running, as long as they are not currently accessing
this object. And if the packages are running, then DB2 cannot invalidate them.
It is also likely in this case that this thread was waiting on the Reorg plus
DRAIN in order to access this object, resulting in a deadlock. To put it
another way, Reorg Plus is waiting on the thread in order to invalidate the
package in order to complete the ALTER, and the thread is waiting on Reorg Plus
to access the object, so its package cannot be invalidated, and so the ALTER
cannot be completed.
The BMC recommended default option ALTRFAIL=RCVRPEND caused
the table to be put in Recovery pending status to allow it to be restarted to
complete the REORG. And this option ALTRFAIL
is an installation option so it cannot be overridden with REORG Control card.
Either you need to change the Default ALTRFAIL=TERM or create new DOPTS module
which takes time to make and needs system DBA support.
With
ALTRFAIL=TERM, REORG PLUS automatically backs out the reorganization when a
failure occurs during processing of limit-key ALTER statements
Fix:
We resolve the space issue with IBM REORG as it build good
compression dictionary and does better space compression than BMC REORG.
Lessons Learnt:
BMC REORG REBALANCE with SHRLEVEL CHANGE even though say no
outage in the manual, it is not really a solution for an application that is
active and busy 24/7.
Always have a user module with ALTRFAIL=TERM and use it to
avoid outage to the application if at all you are going to try to attempt BMC
REORG REBALANACE with SHRLEVL CHANGE for any immediate fix without an outage.