Job history is an important topic in the job scheduling world. In recent articles, we’ve talked about the importance of reliable and thorough audit history reports for your job schedule, which serve as an important security feature. When you know who is changing your job schedule and how they’re changing it, and you have the details of every job that runs in your environment, you are not only prepared for an audit but can also optimize your job schedule based on detailed knowledge of what’s working and what isn’t.
The Building Blocks of Job History
So what are the important components of job history? Jared breaks job history into five basic facts (when, what, where, why, and who), plus some extra-credit features that make job history a critical part of your enterprise scheduling.
Your job history reports should tell you when a job entered the job queue, when it left the queue, when it started running, and when it ended. Your job history should also report any difference between the clock on your agent and the clock on your server; you need to track both sides so that you have a record of which events occurred on the server side and which occurred on the agent side.
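The four timestamps and the clock difference described above can be sketched as a small record. This is only an illustration: the class name, the field names, and the choice of which clock stamps which event are assumptions, not the layout of any particular scheduler.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class JobRunTimes:
    """Hypothetical timing record for one job run (names are illustrative)."""
    queued_at: datetime      # entered the job queue (assumed: server clock)
    dequeued_at: datetime    # left the job queue (assumed: server clock)
    started_at: datetime     # began running (assumed: agent clock)
    ended_at: datetime       # finished running (assumed: agent clock)
    clock_offset: timedelta  # agent clock minus server clock

    def queue_wait(self) -> timedelta:
        # Both values come from the same clock, so no correction is needed.
        return self.dequeued_at - self.queued_at

    def run_duration(self) -> timedelta:
        # Both values come from the agent clock.
        return self.ended_at - self.started_at

    def started_at_server_time(self) -> datetime:
        # Shift the agent-side timestamp onto the server's clock.
        return self.started_at - self.clock_offset


run = JobRunTimes(
    queued_at=datetime(2024, 1, 1, 2, 0, 0),
    dequeued_at=datetime(2024, 1, 1, 2, 0, 5),
    started_at=datetime(2024, 1, 1, 2, 0, 37),
    ended_at=datetime(2024, 1, 1, 2, 10, 37),
    clock_offset=timedelta(seconds=30),  # agent runs 30 seconds ahead
)
```

Keeping the offset next to the raw timestamps means the report can show both the agent's view and the server's view of the same run without losing either.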
What was the status of each job? The “what” of the job history report should explain how the job went: did it run to completion? Did it fail? Job history is especially useful when it shows each command that was executed, along with the values of any environment variables or paths that you set. Being able to see the log files in their entirety, plus any output generated by your job scheduling program, is also extremely important.
It is useful to know where your jobs are running, especially if you have load balancing groups or multiple agents from which you could run a single job. Because enterprise schedulers can run jobs on multiple servers across the enterprise, it’s helpful to know where a job was running in case it fails; you’ll need to know where to begin troubleshooting. Did your job run on one agent? Did you specify that the scheduler should choose the agent that’s least busy within a load balancing group and run the job there?
Did the job scheduler run a job because it had reached its scheduled run time? Was the job reacting to an event that you had previously defined? Or was the job kicked off by one of the users of your product? In this way, “who” is a subset of asking why a job ran. If someone manually triggered a job, you will want to know which user ran it. Understanding why certain jobs ran, whether in reaction to a previous event or because an employee saw the need to run a job on a certain server, becomes really important if something goes wrong. If a business-critical job ran two weeks early, you’d want to know whether the cause was an error in the command or someone accidentally triggering this important job at the wrong time.
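As a rough sketch of how the “where,” “why,” and “who” of a run might be recorded together, the example below uses a hypothetical `Trigger` enum and `RunOrigin` record. All names are assumptions made for illustration.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Trigger(Enum):
    """Hypothetical reasons a job can start."""
    SCHEDULE = "schedule"  # reached its scheduled run time
    EVENT = "event"        # reacted to a previously defined event
    MANUAL = "manual"      # kicked off by a user


@dataclass(frozen=True)
class RunOrigin:
    trigger: Trigger
    agent: str                  # where the job ran
    user: Optional[str] = None  # who started it, for manual runs

    def __post_init__(self) -> None:
        # "Who" only matters for manual runs, but there it is mandatory.
        if self.trigger is Trigger.MANUAL and not self.user:
            raise ValueError("a manual run must record the user who started it")


def explain(origin: RunOrigin) -> str:
    """Return a one-line, human-readable reason for the run."""
    if origin.trigger is Trigger.MANUAL:
        return f"started manually by {origin.user} on {origin.agent}"
    return f"started by {origin.trigger.value} on {origin.agent}"


manual_run = RunOrigin(Trigger.MANUAL, agent="agent-02", user="jsmith")
scheduled_run = RunOrigin(Trigger.SCHEDULE, agent="agent-01")
```

Rejecting a manual run that has no user attached keeps the history able to answer “who ran this?” whenever that question applies.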
Rounding Out Job History
There are a few extra features beyond the basics that make for a robust job history tool in your enterprise scheduler.
The restart or recover feature is built on the understanding that job failures are going to happen, not because of the way jobs are designed or set up but because of unforeseen and occasionally inevitable outages. After system downtime, the restart or recover feature lets you kick off a job some number of commands from the start so that you do not need to repeat any processes that have already completed.
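The idea of restarting a job partway through can be sketched in a few lines. The `run_steps` function and the three sample steps below are hypothetical; a real scheduler would persist the failure point in its job history rather than in a local variable.

```python
from typing import Callable, Dict, List, Sequence


def run_steps(steps: Sequence[Callable[[], None]], start_at: int = 0) -> int:
    """Run steps in order, beginning at start_at.

    Returns the index of the first step that failed, or len(steps)
    if every step from start_at onward completed.
    """
    for index in range(start_at, len(steps)):
        try:
            steps[index]()
        except Exception:
            return index  # record where to resume from
    return len(steps)


# Illustrative job with a simulated outage during the second step.
log: List[str] = []
outage: Dict[str, bool] = {"active": True}


def extract() -> None:
    log.append("extract")


def transform() -> None:
    if outage["active"]:
        raise RuntimeError("simulated outage")
    log.append("transform")


def load() -> None:
    log.append("load")


steps = [extract, transform, load]
failed_at = run_steps(steps)                     # stops at the failing step
outage["active"] = False                         # the outage is resolved
finished_at = run_steps(steps, start_at=failed_at)  # resume; extract is not repeated
```

After the resume, `extract` has run only once, which is the point of the feature: completed work is not repeated.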
The more jobs you have in your job schedule, the more job history you’ll accumulate. A filtering feature lets you look at smaller segments of that history. You can filter your job history to see just a single job. You can also use tags to quickly categorize and view the job histories for different groupings of jobs within your schedule. This feature is great for analyzing how different parts of your job schedule are performing.
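Filtering by job or by tag might look something like the sketch below. `HistoryEntry` and `filter_history` are hypothetical names, and the sample entries are made up for the example.

```python
from dataclasses import dataclass
from typing import FrozenSet, Iterable, List, Optional


@dataclass(frozen=True)
class HistoryEntry:
    """Hypothetical job history row (field names are illustrative)."""
    job_name: str
    status: str
    tags: FrozenSet[str] = frozenset()


def filter_history(
    entries: Iterable[HistoryEntry],
    job_name: Optional[str] = None,
    tag: Optional[str] = None,
) -> List[HistoryEntry]:
    """Keep entries matching every filter that was supplied."""
    return [
        entry
        for entry in entries
        if (job_name is None or entry.job_name == job_name)
        and (tag is None or tag in entry.tags)
    ]


history = [
    HistoryEntry("nightly_backup", "completed", frozenset({"backup"})),
    HistoryEntry("invoice_export", "failed", frozenset({"finance"})),
    HistoryEntry("ledger_close", "completed", frozenset({"finance"})),
]

finance_runs = filter_history(history, tag="finance")
backup_runs = filter_history(history, job_name="nightly_backup")
```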
Different jobs will require different rules for how long their history is kept. If a job is high frequency, perhaps running multiple times a day, you might want to keep less of its history. In turn, if a larger job runs only once a month, you might want to keep its job history reports for three months in order to understand how successful that job has been.
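Per-job retention rules could be expressed as a simple lookup, as in this sketch. The job names, the 7-day and 30-day values, and the approximation of three months as 90 days are all assumptions for illustration.

```python
from datetime import datetime, timedelta
from typing import Dict

# Hypothetical retention rules, keyed by job name.
RETENTION: Dict[str, timedelta] = {
    "hourly_sync": timedelta(days=7),       # high-frequency job: keep less
    "month_end_close": timedelta(days=90),  # monthly job: keep about three months
}
DEFAULT_RETENTION = timedelta(days=30)      # assumed fallback for other jobs


def is_expired(job_name: str, ended_at: datetime, now: datetime) -> bool:
    """Return True if this history entry is older than the job's retention window."""
    keep_for = RETENTION.get(job_name, DEFAULT_RETENTION)
    return now - ended_at > keep_for


now = datetime(2024, 6, 1)
sync_expired = is_expired("hourly_sync", datetime(2024, 5, 20), now)
close_expired = is_expired("month_end_close", datetime(2024, 5, 20), now)
```

Here an entry from the same day is purged for the hourly job but kept for the monthly one, because each job carries its own window.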
Finally, having good access to your job history through Web services is an important “extra” feature of job history reporting with an enterprise job scheduler. This allows external applications to kick off jobs inside your job scheduler and then uniquely and accurately track the execution of those jobs. This is helpful because each job run carries a unique identifier: you’ll know you’re looking at the same instance of the job you kicked off.
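To illustrate why unique identifiers matter, here is a minimal in-memory stand-in for a scheduler's Web service. The `InMemoryScheduler` class and its `submit` and `history` methods are hypothetical and do not represent any real product's API; only the use of Python's standard `uuid` module is real.

```python
import uuid
from typing import Dict


class InMemoryScheduler:
    """Hypothetical stand-in for a scheduler's Web service."""

    def __init__(self) -> None:
        self._runs: Dict[str, Dict[str, str]] = {}

    def submit(self, job_name: str, requested_by: str) -> str:
        """Start a run and return its unique identifier."""
        run_id = str(uuid.uuid4())
        self._runs[run_id] = {
            "job_name": job_name,
            "requested_by": requested_by,
            "status": "queued",
        }
        return run_id

    def history(self, run_id: str) -> Dict[str, str]:
        """Look up the history of exactly one run."""
        return self._runs[run_id]


scheduler = InMemoryScheduler()
first = scheduler.submit("nightly_backup", requested_by="billing-app")
second = scheduler.submit("nightly_backup", requested_by="billing-app")
```

Both calls start the same job, yet each returns a different identifier, so the external application can follow the exact run it requested.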
Conclusion: Why Job History is Indispensable
Job history reports provide the backbone for making informed and effective decisions about the future of your enterprise job schedule. Armed with the detailed information of what jobs are happening when, where, why, and by whom across your enterprise, you’ll better understand the nuances of your scheduled production jobs and be better prepared to forecast any necessary or helpful changes to build a smarter, more flexible and successful schedule.