Advanced SAP Batch Job Monitoring

Background

SAP batch job monitoring includes tracking job failures, such as runtime errors, and delays that exceed a defined threshold. When a threshold is set, an alert is generated whenever the limit is exceeded, which usually indicates a performance issue that needs to be investigated.

IT-Conductor can track job failures as well as job runtimes and delays that exceed pre-defined thresholds.

Setting Threshold for Job Runtime

Runtime is the period during which a program is running. An alert can be generated when a problem or timeout occurs while the program is running, and a threshold can be set to raise alerts for long-running jobs.

1. In the IT-Conductor dashboard, active alerts appear in the Alert Panel. Here you can identify an alert that was caused by a runtime error. Click the cause section to open a chart with the runtime details, which track the in-flight time of jobs.

Figure 1: Sample Alerts in Alert Panel

2. Click a data point to list the jobs captured during the selected interval, with details such as user, in-flight time, and job name for each job.

Figure 2: In-Flight Time Graph
Figure 3: In-Flight Time Sample

3. Returning to the chart, click on the “Threshold Override” icon to see the alerts that were captured by the threshold.

Figure 4: Threshold Override Icon
Figure 5: Sample of Runtime Overrides

4. Click on an alert to view settings such as Warning Value, Warning Severity, and Alarm Severity for each job. All of these can be adjusted to match your standard metrics. You can also set when the alert is triggered.

Figure 6: Modify SAPJobDefinition In-Flight Time
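
The effect of the warning and alarm settings can be illustrated with a short sketch. This is not IT-Conductor's implementation. The `Threshold` class, its field names, the severity labels, and the sample limits are all hypothetical, chosen only to show how a measured in-flight time might map to a severity.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Threshold:
    """Hypothetical threshold settings for one metric (values in seconds)."""
    warning_value: float
    alarm_value: float
    warning_severity: str = "Warning"
    alarm_severity: str = "Critical"


def classify(inflight_seconds: float, threshold: Threshold) -> Optional[str]:
    """Return the severity raised by a measurement, or None if within limits."""
    # Check the alarm limit first so the more severe level wins.
    if inflight_seconds >= threshold.alarm_value:
        return threshold.alarm_severity
    if inflight_seconds >= threshold.warning_value:
        return threshold.warning_severity
    return None


# Assumed example limits: warn after 30 minutes, alarm after 60 minutes.
runtime_threshold = Threshold(warning_value=1800, alarm_value=3600)
```

With these assumed limits, a job that has been in flight for 10 minutes raises nothing, while one that has run for more than an hour reaches the alarm severity.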

Monitoring Specific SAP Batch Job Performance

Retrievers

Retrievers deliver all information about a monitored system. Every component or application has a dedicated retriever that can help to ensure that the state of each application is reported accurately.

1. In the IT-Conductor dashboard, navigate to SAP System ID > Retrievers.

Figure 7: IT-Conductor Retrievers View in Service Grid

Failed Batch Jobs

The most common alerts are caused by failed jobs. Checking for these alerts is a basic requirement of system monitoring.

1. In the IT-Conductor dashboard, navigate to SAP System ID > Background Jobs > Failed.

2. Click the graph in the chart to see the list of failed batch jobs. You can also click any of the job names to get more details about the alert.

Figure 9: Sample Chart for Failed Batch Jobs
Figure 10: Sample Failed Batch Jobs
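
As a rough illustration of what a failed-jobs chart aggregates, the sketch below counts failed jobs per hour. The `JobRecord` type, its fields, the status labels, and the sample jobs are hypothetical and do not reflect IT-Conductor's data model.

```python
from collections import Counter
from datetime import datetime
from typing import Iterable, NamedTuple


class JobRecord(NamedTuple):
    """Hypothetical job record; field names are illustrative only."""
    name: str
    status: str          # assumed labels: "finished" or "failed"
    ended_at: datetime


def failed_per_hour(jobs: Iterable[JobRecord]) -> Counter:
    """Count failed jobs per hour, keyed by the start of the hour."""
    counts: Counter = Counter()
    for job in jobs:
        if job.status == "failed":
            # Truncate the end time to the hour to form the interval bucket.
            bucket = job.ended_at.replace(minute=0, second=0, microsecond=0)
            counts[bucket] += 1
    return counts


# Placeholder sample data.
sample_jobs = [
    JobRecord("JOB_A", "failed", datetime(2024, 1, 1, 9, 15)),
    JobRecord("JOB_B", "finished", datetime(2024, 1, 1, 9, 40)),
    JobRecord("JOB_C", "failed", datetime(2024, 1, 1, 9, 55)),
    JobRecord("JOB_D", "failed", datetime(2024, 1, 1, 10, 5)),
]
failed_counts = failed_per_hour(sample_jobs)
```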

In-Flight Times

In-flight time is the length of time that a background job has been running.

1. In the IT-Conductor dashboard, navigate to SAP System ID > Background Jobs > Runtime.

Each data point you click shows the maximum runtime of the jobs during that interval. You can set how many jobs trigger the alert, or set the threshold based on severity.

Figure 11: IT-Conductor Runtime View in Service Grid
Figure 12: Sample Chart for Runtime
Figure 13: Sample List of Job Names with In-Flight Time Details
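
The idea of a data point holding the maximum runtime in an interval can be sketched as follows. The function names, the dictionary of running jobs, and the sample times are assumptions made for illustration; the sketch only shows one plausible way to derive in-flight time from job start times at a sampling instant.

```python
from datetime import datetime
from typing import Dict, Optional, Tuple


def inflight_seconds(started_at: datetime, sampled_at: datetime) -> float:
    """Seconds a still-running job has been in flight at the sample time."""
    return max((sampled_at - started_at).total_seconds(), 0.0)


def max_inflight(
    running_jobs: Dict[str, datetime], sampled_at: datetime
) -> Optional[Tuple[str, float]]:
    """Return (job_name, seconds) for the longest-running job, or None."""
    if not running_jobs:
        return None
    # The job with the earliest start time has been in flight the longest.
    name = min(running_jobs, key=lambda job: running_jobs[job])
    return name, inflight_seconds(running_jobs[name], sampled_at)


# Placeholder sample: two running jobs observed at noon.
sample_time = datetime(2024, 1, 1, 12, 0)
running = {
    "JOB_A": datetime(2024, 1, 1, 11, 30),
    "JOB_B": datetime(2024, 1, 1, 10, 0),
}
longest = max_inflight(running, sample_time)
```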

Setting Threshold for Delay

A delay means that something is preventing a scheduled background job from starting, which affects the job's overall performance. Delays can have several causes (for example, not enough dialog processes for the scheduler). All of these causes can be captured in IT-Conductor, and a threshold can be created to set metrics for alerts.

1. In the IT-Conductor dashboard, navigate to SAP System ID > Background Jobs > Delay Time.

Click on the title of the chart and then the “Threshold Override” icon to see the available job names.

Figure 14: IT-Conductor Delay Time View in Service Grid
Figure 15: Threshold Override Icon

2. You can also set the override threshold by clicking on the job name. The override lets you track when the system is busy or when scheduled jobs are not able to run on time.

Figure 16: Delay Time Overrides
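
A per-job delay override can be pictured with the sketch below, which treats delay as the gap between a job's planned and actual start and falls back to a default limit when no override exists. The constants, job names, and function names are hypothetical and are not taken from IT-Conductor.

```python
from datetime import datetime
from typing import Dict

DEFAULT_DELAY_LIMIT = 300.0  # assumed default limit: 5 minutes, in seconds

# Hypothetical per-job overrides; the job name is a placeholder.
DELAY_OVERRIDES: Dict[str, float] = {"NIGHTLY_LOAD": 900.0}


def delay_seconds(scheduled_at: datetime, started_at: datetime) -> float:
    """Delay between planned and actual start; early starts count as zero."""
    return max((started_at - scheduled_at).total_seconds(), 0.0)


def exceeds_delay_limit(
    job_name: str, scheduled_at: datetime, started_at: datetime
) -> bool:
    """True when the delay is above the job's override or the default limit."""
    limit = DELAY_OVERRIDES.get(job_name, DEFAULT_DELAY_LIMIT)
    return delay_seconds(scheduled_at, started_at) > limit


# Placeholder sample: a job planned for 02:00 that started at 02:10.
planned = datetime(2024, 1, 1, 2, 0)
actual = datetime(2024, 1, 1, 2, 10)
```

In this sketch the same 10-minute delay exceeds the default limit but stays within the looser override assumed for `NIGHTLY_LOAD`.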

Monitoring Job Performance in Application Server

Jobs run on particular application servers. Checking the performance metrics of these servers helps explain why a job is taking much longer than usual. It can also show which resources the job is consuming, since this can affect other shared resources in the system.

Figure 17: IT-Conductor Server View in Service Grid

Monitoring Overall Health

Service Health Monitoring provides all the information about the system's overall health. In the service grid, click on Health. This will show the monitoring components of the system.

Figure 18: IT-Conductor Health View in Service Grid

Expanding the components reveals alert warning symbols that indicate whether specific alerts are active in the current state.

In this view, all the graphs and details are shown together, synchronized in time with the other performance criteria.

This helps with analysis to find where the bottlenecks are in terms of workload.

Figure 19: Health Explorer Sample View
