Author: sgandhla
Subject: Batch job tuning
Posted: Fri Mar 24, 2017 9:41 pm (GMT 5.5)
Hi Everyone,
This is my first post in the forum. I have spent a month analyzing an issue in my new role without any luck, and I hope to find some assistance here.

We have a batch job hosted at two different sites, both running on EC12. We recycle the job every day at 15:00 GMT, so the job is down from 11:00 to 15:00. After the restart, one site processes ~15 million txn per 15-minute interval, while the other site processes only ~3 million. The slow LPAR catches up after about 2 hours, at which point it can process ~10 million txn in 15 minutes, and by the end of the day both sites have processed the same number of txn (~200 million).

Here is what I have checked so far. System load at that time is below 50% on both LPARs. The WLM policies are the same. The job is exactly the same on both sites (as per the application team). I changed the LPAR weights to get more vertical highs, without any improvement in performance. I pulled numbers from the SMFINTRV member, which show that both jobs consume the same CPU time, but the job on the slow LPAR has higher I/O time than the other one. As one more attempt, we made a WLM change to the service class on the slow LPAR to raise its I/O priority to high.

The one unsolved puzzle: when I watch in real time from SDSF during the intervals where the job processes fewer txn, it goes into DW status and does nothing for at least 10 minutes out of every 15.
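For reference, here is the rough arithmetic behind those figures as a small Python sketch. The only inputs are the approximate numbers quoted above; the variable and function names are mine, and the 5-minute active window is an assumption derived from the "at least 10 minutes in DW out of 15" observation, not a measured value.

```python
# Rough throughput arithmetic using the approximate figures from this post.
INTERVAL_S = 15 * 60  # one 15-minute reporting interval, in seconds


def txn_per_second(txn_count, seconds):
    """Average transaction rate over the given number of seconds."""
    return txn_count / seconds


# Average rates over the whole interval right after the restart.
fast_rate = txn_per_second(15_000_000, INTERVAL_S)
slow_rate = txn_per_second(3_000_000, INTERVAL_S)

# Assumption: the slow job sits in DW for 10 of the 15 minutes,
# so all of its work happens in the remaining 5 minutes.
ACTIVE_S = 5 * 60
slow_active_rate = txn_per_second(3_000_000, ACTIVE_S)

print(f"fast site, whole interval : {fast_rate:,.0f} txn/s")
print(f"slow site, whole interval : {slow_rate:,.0f} txn/s")
print(f"slow site, while active   : {slow_active_rate:,.0f} txn/s")
```

If the DW time really is 10 minutes or more, the slow site is doing at least about 10,000 txn/s while it is actually running, against roughly 16,700 txn/s on the fast site. So a large part of the 5x gap would be explained by the time spent waiting rather than by slower processing.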
_________________
SG