Monitoring job failures, such as runtime errors and delays that exceed a set threshold, is part of SAP batch job monitoring. When a threshold is set, an alert is generated whenever its limits are exceeded, which usually indicates a performance issue that needs to be investigated.
In IT-Conductor, job failures, runtimes, and delays that exceed pre-defined thresholds can be monitored.
Set Threshold for Job Runtime
Runtime is the period during which a program is running. An alert may be generated when a problem or timeout occurs during that period, and the threshold defines the metrics used to alert on long-running jobs.
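The idea of a runtime threshold can be illustrated with a minimal Python sketch. This is not IT-Conductor's implementation or API: the `RuntimeThreshold` class, the `classify_runtime` function, the severity labels, and the limit values are all hypothetical and chosen only to show how a runtime might be compared against a warning limit and an alarm limit.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RuntimeThreshold:
    """Hypothetical threshold: limits are in seconds (illustrative values only)."""
    warning_seconds: float
    alarm_seconds: float


def classify_runtime(runtime_seconds: float, threshold: RuntimeThreshold) -> str:
    """Return a severity label for a job runtime that exceeds a limit."""
    # Check the stricter (alarm) limit first so it takes precedence.
    if runtime_seconds > threshold.alarm_seconds:
        return "ALARM"
    if runtime_seconds > threshold.warning_seconds:
        return "WARNING"
    return "OK"


# Assumed limits: warn after 30 minutes, alarm after 60 minutes.
threshold = RuntimeThreshold(warning_seconds=1800, alarm_seconds=3600)

for job_name, runtime in [("JOB_A", 600), ("JOB_B", 2400), ("JOB_C", 4000)]:
    print(job_name, classify_runtime(runtime, threshold))
```

In this sketch a runtime exactly equal to a limit does not raise an alert, since the text speaks of limits being exceeded; a real monitoring tool may treat the boundary differently.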
In the IT-Conductor dashboard, active alerts are shown in the Alert Panel. Here, you can identify an alert caused by a runtime error. Click the cause section to open a chart with the details of the runtime metric, which currently tracks in-flight time for jobs.
Figure 1: Sample Alerts in Alert Panel
Click a data point to get the list of jobs captured during the selected interval, with details such as user, in-flight time, and job name for each job.
Figure 2: In-Flight Time Graph
Figure 3: In-Flight Time Sample
Returning to the chart, click on the Threshold Override icon to see the alerts captured by the threshold.
Figure 4: Threshold Override Icon
Figure 5: Sample of Runtime Overrides
Click on an alert to see settings such as Warning Value, Warning Severity, and Alarm Severity for each job. All of these can be set to match your standard metrics. You can also set when the alert is triggered.
Figure 6: Modify SAPJobDefinition In-Flight Time
Retrievers deliver all information about a monitored system. Every component or application has a dedicated retriever that helps ensure each application's state is reported accurately.
In the IT-Conductor dashboard, navigate to SAP System ID → Retrievers.
Figure 7: IT-Conductor Retrievers View in Service Grid
Failed Batch Jobs
The most common alerts are caused by failed jobs. Checking for these alerts is one of the basic requirements when monitoring systems.
1. In the IT-Conductor dashboard, navigate to SAP System ID → Background Jobs → Failed.
2. Click on the graph on the chart to see the list of failed batch jobs. You can also click on any of the job names to get more details about the alert.
Figure 9: Sample Chart for Failed Batch Jobs
Figure 10: Sample Failed Batch Jobs
In-Flight Times
In-flight time is the time during which background jobs are running.
1. In the IT-Conductor dashboard, navigate to SAP System ID → Background Jobs → Runtime.
Each data point you click captures the maximum runtime of the jobs during that interval. You can set how many jobs it takes to trigger the alert, or set the threshold based on severity.
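To make the "maximum runtime per interval" idea concrete, here is a small Python sketch. It is an illustration under assumptions, not how IT-Conductor computes its chart: the function name `max_runtime_per_interval`, the tuple layout of the samples, and the 300-second interval are all hypothetical.

```python
def max_runtime_per_interval(samples, interval_seconds=300):
    """Group samples into fixed intervals and keep the longest-running job in each.

    samples: iterable of (timestamp_seconds, job_name, inflight_seconds) tuples.
    Returns {interval_start_seconds: (job_name, inflight_seconds)}, sorted by start.
    """
    buckets = {}
    for timestamp, job_name, inflight in samples:
        # Align the timestamp to the start of its interval.
        start = timestamp - (timestamp % interval_seconds)
        current = buckets.get(start)
        # Keep only the sample with the largest in-flight time per interval.
        if current is None or inflight > current[1]:
            buckets[start] = (job_name, inflight)
    return dict(sorted(buckets.items()))


# Made-up sample data: (timestamp, job name, in-flight seconds).
samples = [
    (0, "JOB_A", 120),
    (200, "JOB_B", 900),
    (310, "JOB_A", 430),
    (590, "JOB_C", 60),
]

for start, (job_name, inflight) in max_runtime_per_interval(samples).items():
    print(f"interval starting at {start}s: {job_name} ran for {inflight}s")
```

Each resulting entry corresponds to one data point on a chart of this kind: the interval it covers and the longest in-flight time seen within it.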
Figure 11: IT-Conductor Runtime View in Service Grid
Figure 12: Sample Chart for Runtime
Figure 13: Sample List of Job Names with In-Flight Time Details
Set Threshold for Delay
A delay means that something is preventing a scheduled background job from starting on time, which affects the job's overall performance. Delays can have several causes (for example, not enough dialog processes for the scheduler). All of these can be captured in IT-Conductor, and a threshold can be created to define the metrics for alerts.
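As a rough illustration of what a delay threshold measures, the following Python sketch computes the gap between a job's scheduled start and its actual start, then compares it with a limit. The function names `start_delay` and `delay_exceeded`, the five-minute limit, and the timestamps are assumptions made for this example and are not taken from IT-Conductor.

```python
from datetime import datetime, timedelta
from typing import Optional


def start_delay(
    scheduled_start: datetime,
    actual_start: Optional[datetime] = None,
    now: Optional[datetime] = None,
) -> timedelta:
    """Return how long a job waited past its scheduled start (never negative).

    If the job has not started yet (actual_start is None), the delay is
    measured up to `now`, which defaults to the current time.
    """
    reference = actual_start if actual_start is not None else (now or datetime.now())
    return max(reference - scheduled_start, timedelta(0))


def delay_exceeded(delay: timedelta, limit: timedelta) -> bool:
    """True when the delay is longer than the configured limit."""
    return delay > limit


# Made-up example: job scheduled for 02:00 but started at 02:07.
scheduled = datetime(2024, 1, 1, 2, 0)
actual = datetime(2024, 1, 1, 2, 7)
limit = timedelta(minutes=5)  # assumed threshold

delay = start_delay(scheduled, actual)
print(delay, delay_exceeded(delay, limit))
```

Measuring a not-yet-started job against the current time is a design choice in this sketch: it lets a job that is stuck waiting for a free work process raise an alert before it ever starts.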
1. In the IT-Conductor dashboard, navigate to SAP System ID → Background Jobs → Delay Time.
Click the chart title, then click the Threshold Override icon to see the available job names.
Figure 14: IT-Conductor Delay Time View in Service Grid
Figure 15: Threshold Override Icon
You can also set an override threshold by clicking on the job name. This override lets you track when the system is busy or when jobs are scheduled but unable to run on time.
Figure 16: Delay Time Overrides
Jobs run on particular servers. Checking the performance metrics on these servers gives you an idea of why a job is taking much longer than usual. It can also show you which resources are being used up, since this can affect other shared resources in the system.
Figure 17: IT-Conductor Server View in Service Grid
Monitor Overall Health
Service Health Monitoring provides all the information about the system's overall health. In the service grid, click on Health. This will show the monitoring components of the system.
Figure 18: IT-Conductor Health View in Service Grid
Expanding the components reveals alert warning symbols that indicate whether specific alerts are currently active.
In this view, all the graphs and details can be seen at the same time, synchronized in time with the other performance criteria.
This helps with analysis to find where the bottlenecks are in terms of workload.
Figure 19: Health Explorer Sample View