Advanced SAP Batch Job Monitoring

Monitoring job failures, such as runtime errors and delays that exceed a certain threshold, is part of SAP Batch Job Monitoring. When a set limit is exceeded, an alert message is generated; this usually indicates a performance issue that needs to be investigated.

In IT-Conductor, you can track job failures as well as job runtimes and delays that exceed predefined thresholds.

Setting Threshold for Job Runtime

Runtime is the period during which a program is running. An alert can be generated when a problem or timeout occurs while the program is running, and a threshold can define the metrics that raise alerts for long-running jobs.

1. In the IT-Conductor dashboard, active alerts are listed in the Alert Panel, where you can identify the alert caused by a runtime error. Click the cause section to open a chart with the runtime details; this chart tracks the inflight time of jobs.

2. Click a data point to list the jobs captured during the selected interval, with details such as user, inflight time, and job name for each job.

3. Returning to the chart, click on the Threshold Override icon to see the alerts that were captured by the threshold.

4. Click an alert to see settings such as Warning Value, Warning Severity, and Alarm Severity for each job. All of these can be set to match your standard metrics. You can also set when the alert is triggered.
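
The threshold settings in the steps above can be pictured with a small sketch. This is not IT-Conductor's implementation: the class, the field names, and the separate alarm value are assumptions made for illustration. It only shows how a warning limit and an alarm limit might map a job's inflight time to a severity.

```python
from dataclasses import dataclass


@dataclass
class RuntimeThreshold:
    """Hypothetical threshold settings; field names are illustrative only."""
    warning_value: float  # inflight seconds at which a warning is raised
    alarm_value: float    # inflight seconds at which an alarm is raised (assumed)
    warning_severity: str = "WARNING"
    alarm_severity: str = "ALARM"


def classify_runtime(inflight_seconds: float, threshold: RuntimeThreshold) -> str:
    """Return the severity for one job's inflight time, or "OK" if within limits."""
    # Check the alarm limit first so the most severe match wins.
    if inflight_seconds >= threshold.alarm_value:
        return threshold.alarm_severity
    if inflight_seconds >= threshold.warning_value:
        return threshold.warning_severity
    return "OK"


# Example: warn after 30 minutes, alarm after 60 minutes.
threshold = RuntimeThreshold(warning_value=1800, alarm_value=3600)
for seconds in (600, 2400, 4000):
    print(seconds, classify_runtime(seconds, threshold))
```

Checking the alarm limit before the warning limit keeps the result unambiguous when a job exceeds both.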

Monitoring Specific SAP Batch Job Performance

Retrievers

Retrievers deliver all information about a monitored system. Every component or application has a dedicated retriever that can help to ensure that the state of each application is reported accurately.

1. In the IT-Conductor dashboard, navigate to SAP System ID → Retrievers.

Failed Batch Jobs

The most common alerts are caused by failed jobs. Checking for these alerts is a basic requirement of system monitoring.

1. In the IT-Conductor dashboard, navigate to SAP System ID → Background Jobs → Failed.

2. Click the graph in the chart to see the list of failed batch jobs. You can also click any job name to get more details about the alert.

In-Flight Times

In-flight time is the time during which background jobs are running.

1. In the IT-Conductor dashboard, navigate to SAP System ID → Background Jobs → Runtime.

Each data point you click shows the maximum runtime of the jobs captured during that interval. You can set the frequency, that is, how many jobs trigger the alert, or set the threshold based on severity.
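
As a rough illustration of the two ideas above, the following sketch groups job samples into fixed intervals and keeps the maximum runtime per interval, then checks whether enough jobs exceed a limit to raise an alert. The function names, the tuple layout, and the reading of "frequency" as a minimum job count are assumptions, not IT-Conductor behavior.

```python
from collections import defaultdict


def max_runtime_per_interval(samples, interval_seconds):
    """Map each interval start to the longest runtime seen in that interval.

    samples: iterable of (timestamp_seconds, job_name, runtime_seconds)
    """
    buckets = defaultdict(float)
    for timestamp, _job_name, runtime in samples:
        # Round the timestamp down to the start of its interval.
        bucket_start = int(timestamp // interval_seconds) * interval_seconds
        buckets[bucket_start] = max(buckets[bucket_start], runtime)
    return dict(sorted(buckets.items()))


def should_alert(samples, runtime_limit, min_jobs):
    """Alert only when at least min_jobs jobs run longer than runtime_limit."""
    over_limit = sum(1 for _ts, _name, runtime in samples if runtime > runtime_limit)
    return over_limit >= min_jobs


# Hypothetical samples: (timestamp, job name, runtime in seconds).
samples = [
    (0, "JOB_A", 120.0),
    (30, "JOB_B", 900.0),
    (70, "JOB_C", 45.0),
    (110, "JOB_A", 300.0),
]
print(max_runtime_per_interval(samples, interval_seconds=60))
print(should_alert(samples, runtime_limit=250, min_jobs=2))
```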

Setting Threshold for Delay

A delay means that something is preventing a scheduled background job from starting, which impacts the job's overall performance. A delay can have several causes (for example, not enough dialog processes available for the scheduler). IT-Conductor can capture these delays, and a threshold can be created to set the metrics for alerts.

1. In the IT-Conductor dashboard, navigate to SAP System ID → Background Jobs → Delay Time.

Click on the title of the chart and then the Threshold Override icon to see the available job names.

2. You can also set the override threshold by clicking the job name. This override lets you track when the system is busy or when jobs are scheduled but unable to run on time.
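
The delay and the per-job override described above can be sketched as follows. This is a minimal illustration under stated assumptions: delay is taken to be the gap between a job's scheduled start and its actual start, and the function names, tuple layout, and override dictionary are hypothetical rather than part of IT-Conductor.

```python
from datetime import datetime


def start_delay_seconds(scheduled_start: datetime, actual_start: datetime) -> float:
    """Seconds between scheduled and actual start; never negative."""
    return max(0.0, (actual_start - scheduled_start).total_seconds())


def delayed_jobs(jobs, default_limit, overrides=None):
    """Return (job_name, delay) for jobs whose delay exceeds their limit.

    jobs: iterable of (job_name, scheduled_start, actual_start)
    overrides: hypothetical per-job limits in seconds that replace default_limit
    """
    overrides = overrides or {}
    result = []
    for name, scheduled, actual in jobs:
        delay = start_delay_seconds(scheduled, actual)
        # A job-specific override takes precedence over the default limit.
        if delay > overrides.get(name, default_limit):
            result.append((name, delay))
    return result


jobs = [
    ("JOB_A", datetime(2024, 1, 1, 2, 0, 0), datetime(2024, 1, 1, 2, 0, 20)),
    ("JOB_B", datetime(2024, 1, 1, 2, 0, 0), datetime(2024, 1, 1, 2, 5, 0)),
]
print(delayed_jobs(jobs, default_limit=60, overrides={"JOB_A": 10}))
```

With the default limit alone, only JOB_B would be reported; the override makes the time-critical JOB_A alert on a much shorter delay.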

Monitoring Job Performance in Application Server

Jobs run on particular servers. Checking the performance metrics on these servers gives you an idea of why a job is taking much longer than usual. It can also show which resources the job is consuming, since this can impact other shared resources in the system.

Monitoring Overall Health

Service Health Monitoring provides all the information about the system's overall health. In the service grid, click on Health. This will show the monitoring components of the system.

Expanding the components reveals alert warning symbols that indicate whether specific alerts are currently active.

In this view, all graphs and details are displayed together, synchronized in time with the other performance criteria.

This helps with analysis to find where the bottlenecks are in terms of workload.
