Thursday, May 28, 2015

BMC REORG - Options Parameter - Defaults not set properly




Problem:   We encountered an issue with an online BMC DB2 REORG job. The job was in the utility termination phase and went into RETRY mode, and it was unable to complete the REORG because some thread was holding the REORG job. The job stayed in retry mode for a long time and the table remained locked, due to the BMC default options in the system. As the locked table was one of the critical tables, the impact was huge.
Reason:
The BMC REORG job was not cancellable by anyone, as BMC has made its job non-cancellable during the utility termination phase.  We were not able to track the thread that was holding the BMC REORG job, as ours is a very busy system and even the DB2 Omegamon tool was not showing any lock conflicts. There were no application batch jobs running other than REORG batch jobs, as our system has a mechanism to hold all application batch jobs during DBA REORGs to avoid contention.  Display thread on the table showed a huge list of threads, as it was one of the critical tables in the system.

Fix: We followed the steps below
  1. Terminated the utility.  The utility was terminated but still kept retrying due to the default setting.
  2. Tried cancelling the REORG. BMC had made the REORG job uncancellable in the utility termination phase when DRNWAIT is set to UTIL.
  3. Performed a DB2 rebounce. The table in question was still in recovery pending mode after that.
  4. Restarted the DB2 table in question after a health check of the table. This caused an outage of about an hour.
  5. We were not able to identify the DB2 thread causing the issue. (side issue)


However, the best fix would be to identify the thread that is holding the REORG job and cancel it.
Lesson Learnt: 
The combination of DRNRETRY=255 and DRNWAIT=UTIL caused the REORG to retry 255 times, once every 5 minutes, during which terminating the utility had no effect and cancellation was not allowed.
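To put that retry setting in perspective, here is a rough back-of-the-envelope calculation of how long the job could keep retrying. It is only a sketch: it assumes every drain attempt fails and counts only the 5 minute gap between retries, not the time each attempt itself takes. The function name is made up for illustration.

```python
from datetime import timedelta


def worst_case_retry_window(retries: int, minutes_between: int) -> timedelta:
    """Upper bound on the time spent retrying if every drain attempt fails.

    Counts only the wait between retries, not the duration of each attempt.
    """
    return timedelta(minutes=retries * minutes_between)


# Values taken from the incident: 255 retries, one every 5 minutes.
window = worst_case_retry_window(retries=255, minutes_between=5)
print(window)                         # 21:15:00
print(window.total_seconds() / 3600)  # 21.25 (hours)
```

In other words, with these settings the job could in the worst case stay uncancellable for most of a day.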
Best values to use are:
DRNWAIT=NONE  (this was set to UTIL by default): the drain request issued by REORG PLUS times out immediately if the drain cannot acquire the lock. NONE prevents any application transactions from being queued during the drain process. BMC recommends specifying NONE in high-transaction environments.
DSPLOCKS=RETRY: displays the locks and claimers after every RETRY. This is useful information for identifying a thread that is common across all RETRYs, to narrow down the offending thread.
TIMEOUT=TERM: this parameter leaves the objects in their original state (RW) and terminates the job.
BMC recommends DRNWAIT=NONE and TIMEOUT=TERM as the default options in their manual.
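The idea of finding a thread that is common across all RETRYs can be sketched as a simple set intersection. This is only an illustration: the thread identifiers below are made-up sample data, and the sketch assumes the claimers reported after each retry have already been copied out of the job output by hand. It does not parse any real BMC message format.

```python
def common_claimers(retries):
    """Return the thread IDs that appear in the claimer list of every retry."""
    claimer_sets = [set(claimers) for claimers in retries]
    if not claimer_sets:
        return set()
    return set.intersection(*claimer_sets)


# Hypothetical claimer lists, one per retry (not real output).
retry_claimers = [
    ["THRD0012", "THRD0345", "THRD0777"],
    ["THRD0345", "THRD0901"],
    ["THRD0345", "THRD0012", "THRD0444"],
]

print(sorted(common_claimers(retry_claimers)))  # ['THRD0345']
```

A thread that shows up after every retry is the most likely candidate to cancel. On a busy system several threads may survive the intersection, so more retries narrow the list further.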


