Problem: We encountered an issue with an online BMC DB2 REORG job. The job was
in the utility termination phase, went into RETRY mode, and was unable to
complete the REORG because some thread was holding it. Because of the BMC
default options on our system, the job stayed in retry mode for a long time and
the table remained locked. As the locked table was one of our critical tables,
the impact was huge.
Reason:
Nobody could cancel the BMC REORG job, because BMC makes the job
non-cancellable during the utility termination phase. We were not able to track
down the thread that was holding the REORG job: ours is a very busy system, and
even the DB2 Omegamon tool was not showing any lock conflicts. No application
batch jobs were running other than the REORG batch jobs, as our system has a
mechanism to hold all application batch during DBA REORGs to avoid contention.
Display thread on the table showed a huge list of threads, as it is one of the
critical tables in the system.
Fix: We followed the steps below.
- Terminated the utility. The utility was terminated, but the job still kept retrying because of the default settings.
- Tried cancelling the REORG. BMC makes the REORG job uncancellable in the utility termination phase when DRNWAIT is set to UTIL.
- Bounced (restarted) DB2. The table in question was still in recovery pending status after that.
- Restarted the table in question after running a health check on it. This caused an outage of about an hour.
- We were not able to identify the DB2 thread causing the issue (a side issue).
However, the best fix would have been to identify the thread
that was holding the REORG job and cancel it.
Lesson Learnt:
The combination of DRNRETRY=255 and DRNWAIT=UTIL made the REORG retry up to
255 times, 5 minutes apart. During that time, terminating the utility had no
effect and cancellation was not allowed.
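To put those settings in perspective, here is a rough back-of-the-envelope sketch of how long the job could stay in retry mode. It assumes the retries are spaced a flat 5 minutes apart, as observed in this incident, and ignores the time each drain attempt itself takes; the function name is only illustrative.

```python
from datetime import timedelta


def worst_case_retry_window(retries: int, minutes_between_retries: float) -> timedelta:
    """Approximate how long a job may sit in retry mode.

    Assumes a fixed gap between retries and ignores the duration of
    each drain attempt, so the real window would be somewhat longer.
    """
    return timedelta(minutes=retries * minutes_between_retries)


# Values from this incident: DRNRETRY=255, retries about 5 minutes apart.
window = worst_case_retry_window(255, 5)
print(window)  # 21:15:00
```

In other words, left alone, the job could have held the table for roughly 21 hours before giving up.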
The best values to use are:
- DRNWAIT=NONE (this was set to UTIL by default): the drain request issued by REORG PLUS times out immediately if the drain cannot acquire the lock. NONE prevents any application transactions from being queued during the drain process. BMC recommends specifying NONE in high-transaction environments.
- DSPLOCKS=RETRY: displays the locks and claimers after every retry. This is useful for finding a thread that is common to all the retries, which narrows down the offending thread.
- TIMEOUT=TERM: leaves the objects in their original state (RW) and terminates the job.
BMC recommends DRNWAIT=NONE and TIMEOUT=TERM as the default options in their
manual.
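The idea of using the DSPLOCKS=RETRY output to find a thread that shows up on every retry can be sketched as a simple set intersection. This is only a minimal illustration: it assumes the claimer identifiers have already been copied out of the job output by hand or by a separate script (parsing the actual report layout is not shown), and the thread names below are made up.

```python
def common_claimers(snapshots):
    """Return the identifiers that appear in every retry snapshot.

    `snapshots` is a list with one collection of claimer/thread
    identifiers per retry. An empty list yields an empty set.
    """
    sets = [set(snapshot) for snapshot in snapshots]
    if not sets:
        return set()
    return set.intersection(*sets)


# Hypothetical identifiers, one list per retry (not real output).
retry_snapshots = [
    ["CICSTX01", "BATCHJ07", "DDFUSR22"],
    ["CICSTX09", "DDFUSR22", "CICSTX03"],
    ["DDFUSR22", "CICSTX11"],
]

print(common_claimers(retry_snapshots))  # {'DDFUSR22'}
```

On a busy system most claimers come and go between retries, so whatever survives the intersection is a good candidate for the thread to investigate and cancel.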