How to monitor cron jobs: a practical guide

Cron is fire-and-forget by design. It runs your job, mails any output to a local mailbox that nobody reads, and never tells you when something breaks. This guide explains why scheduled jobs fail silently, the one pattern that catches every failure mode, and how to wire it up for plain crontab, systemd timers, Symfony Scheduler, Laravel, and WP-Cron.

Why a failed cron job is invisible

A crontab line like 0 2 * * * /usr/local/bin/backup.sh runs at 02:00 and exits. If the script crashes, the host is rebooting at 02:00, the disk is full, or someone commented the line out three deploys ago, nothing happens — no exception reaches you, no dashboard turns red. Cron mails output to the local user by default, but on a headless server that mail goes nowhere anyone reads.

The failure you actually care about is the job that didn't run. You can't catch the absence of an event from inside the job — there's no code to run. You need something watching from the outside that expects a check-in and raises the alarm when it doesn't arrive.

Four ways a scheduled job fails

Failure	What happened	Caught by
Didn't run	Crontab edited away, daemon down, host off, timer disabled.	A missed expected check-in.
Ran late	Queue backed up, a previous run still holding a lock.	A check-in after the grace window.
Ran but errored	Exit code ≠ 0 — exception, failed query, missing file.	An explicit failure signal from the job.
Ran too long	Started, then hung — network stall, runaway loop.	A start signal with no matching finish.

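A bare "did it ping?" check covers only the first two rows directly. An errored or hung run surfaces, at best, as a missed check-in once the grace window expires, with no hint of the cause. Catching all four means reporting the start, the success, and the failure of each run; see the recipe below.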
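To make the table concrete, here is a minimal sketch of how a watcher could sort one scheduled run into those four rows. The `Run` fields, the `classify` function, and the `max_runtime` limit are hypothetical names chosen for illustration; this is not Cronheart's implementation.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass
class Run:
    expected_at: datetime                   # when the schedule says the run is due
    started_at: Optional[datetime] = None   # time of the start signal, if any
    finished_at: Optional[datetime] = None  # time of the success/fail signal, if any
    exit_code: Optional[int] = None         # reported with the finish signal


def classify(run: Run, now: datetime, grace: timedelta, max_runtime: timedelta) -> str:
    """Map one run onto the failure table (hypothetical helper)."""
    deadline = run.expected_at + grace
    if run.started_at is None:
        # No signal at all: only a problem once the grace window has passed.
        return "didn't run" if now > deadline else "pending"
    if run.finished_at is None:
        # A start with no matching finish.
        return "ran too long" if now - run.started_at > max_runtime else "running"
    if run.exit_code != 0:
        return "ran but errored"
    if run.finished_at > deadline:
        return "ran late"
    return "ok"
```

Only the first branch works without any signal from the job, which is why the start, success, and failure reports are needed to tell the other rows apart.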

The heartbeat pattern (dead man's switch)

The fix is a heartbeat: every run, your job makes one tiny HTTP request to a known URL to say "I'm alive and I finished." A monitoring service knows your schedule, so it knows when the next heartbeat is due. Miss it past a grace period and it pages you. Because the alert is driven by the absence of a ping, it fires even when your server is on fire and your code never executes — hence the older name, dead man's switch.

The mechanics Cronheart uses:

Create a monitor

Tell us the schedule (cron, interval, or a simple preset) and a grace window. You get back a ping URL with an opaque UUID — treat that UUID as a secret.

Ping from the job

Add one curl after your command. GET or POST, both work. Pings are idempotent within a 2-second window, so a --retry that stutters won't double-count.

Get paged on a miss

A scanner sweeps every 30 seconds. Past next_expected_at + grace with no ping, the monitor goes late and the alert fans out to your channels.
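The three steps above can be sketched in a few lines. This is a toy model under stated assumptions, not Cronheart's code: it handles only a fixed interval schedule, it assumes the next expected time is counted from the last accepted ping, and every name except `next_expected_at` is hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional

DEDUPE_WINDOW = timedelta(seconds=2)  # pings closer together than this count once


@dataclass
class Monitor:
    interval: timedelta
    grace: timedelta
    next_expected_at: datetime
    last_ping_at: Optional[datetime] = None
    ping_count: int = 0
    status: str = "up"

    def ping(self, at: datetime) -> bool:
        """Record a ping. Returns False when it is a duplicate of the previous one."""
        if self.last_ping_at is not None and at - self.last_ping_at < DEDUPE_WINDOW:
            return False  # e.g. a curl --retry that reached the server twice
        self.last_ping_at = at
        self.ping_count += 1
        # Assumption: the next run is expected one interval after this ping.
        self.next_expected_at = at + self.interval
        self.status = "up"
        return True

    def scan(self, now: datetime) -> str:
        """One scanner sweep: mark the monitor late once the deadline has passed."""
        if now > self.next_expected_at + self.grace:
            self.status = "late"
        return self.status
```

Note that `scan` never needs the job to run. The alert is derived purely from the clock and the last recorded ping, which is what makes the pattern work when the host is down.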

Monitor a plain cron job

The minimal version — chain a heartbeat onto a successful run with &&:

0 2 * * * /usr/local/bin/backup.sh && curl -fsS -m 10 --retry 5 https://cronheart.com/ping/<uuid>

That catches "didn't run" and "ran late". To also catch errors and hangs, report start up front and branch on the exit code so a failure pages you immediately instead of waiting for the next window:

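#!/usr/bin/env bash
# Replace <uuid> with your monitor's UUID. Keep the quotes: unquoted angle
# brackets would be parsed by the shell as redirections.
BASE="https://cronheart.com/ping/<uuid>"
# Report the start so a hung run shows up as a start with no finish.
curl -fsS -m 10 --retry 5 "$BASE/start"
if /usr/local/bin/backup.sh > /tmp/backup.log 2>&1; then
    curl -fsS -m 10 --retry 5 --data-binary @<(tail -c 8000 /tmp/backup.log) "$BASE/success"
else
    status=$?  # exit code of backup.sh
    # Attach the tail of the log so the alert carries the error output.
    curl -fsS -m 10 --retry 5 --data-binary @<(tail -c 8000 /tmp/backup.log) "$BASE/fail"
    exit "$status"  # keep the job's own failure visible to cron
fi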

A POST may carry up to 10 KB of captured output (stdout/stderr); anything longer is truncated, not rejected. That snippet rides along on the alert, so the page that wakes you already contains the stack trace.
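If you build the request body yourself rather than piping through tail, it helps to trim on the client so that the end of the log (usually where the error is) survives. The sketch below is one way to do that; `tail_bytes` is a hypothetical helper, and the 8000-byte default simply mirrors the `tail -c 8000` used in the script above to stay under the 10 KB limit.

```python
def tail_bytes(output: str, limit: int = 8000) -> bytes:
    """Return at most `limit` bytes from the end of `output`, encoded as UTF-8."""
    data = output.encode("utf-8")
    if len(data) <= limit:
        return data
    tail = data[-limit:]
    # The cut may land inside a multi-byte character; drop the broken fragment.
    return tail.decode("utf-8", errors="ignore").encode("utf-8")
```

Cutting from the front rather than the back is a design choice: a stack trace normally ends the output, so the last bytes are the ones worth keeping.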

Monitor a systemd timer

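For a systemd timer, the cleanest place to ping is an ExecStartPost on the service unit. With Type=oneshot, systemd runs it only after the ExecStart command has exited successfully, so a crash never reports success. (With Type=simple, ExecStartPost runs as soon as the main process has been started, so keep the unit oneshot.) Cronheart's own production timers are wired exactly this way.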

# /etc/systemd/system/backup.service
[Service]
Type=oneshot
ExecStart=/usr/local/bin/backup.sh
ExecStartPost=/usr/bin/curl -fsS -m 10 --retry 5 https://cronheart.com/ping/<uuid>

Size the monitor's grace window to clear the timer's own AccuracySec (one minute by default) plus the job's typical runtime. Otherwise a run that legitimately starts a little late trips a false alert.
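As a worked example of that sizing rule, the arithmetic looks like this. The `suggested_grace` helper and the one-minute safety margin are illustrative assumptions; pick numbers that match your own job.

```python
from datetime import timedelta


def suggested_grace(accuracy: timedelta, typical_runtime: timedelta,
                    margin: timedelta = timedelta(minutes=1)) -> timedelta:
    """Grace window covering timer slack, a normal run, and a safety margin."""
    return accuracy + typical_runtime + margin


# Default AccuracySec (1 minute) and a backup that usually takes 4 minutes.
grace = suggested_grace(timedelta(minutes=1), timedelta(minutes=4))
print(grace)  # 0:06:00
```

If the timer also sets RandomizedDelaySec, add that value to the sum as well, since it delays the start by up to that amount.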

Monitor Symfony Scheduler tasks

If you run Symfony Scheduler, the cron-monitor/php-sdk package wires heartbeats in without touching each task by hand. Install it:

composer require cron-monitor/php-sdk

Register the bundle and map each console command to its monitor UUID. The SDK pings start/success/fail around the command for you:

# config/packages/cron_monitor.yaml
cron_monitor:
    commands:
        'app:reports:nightly': '<monitor-uuid>'

Prefer to keep the UUID next to the code? Put it on the command class with the #[Monitor] attribute — read from an env var in production:

use CronMonitor\Attribute\Monitor;
use Symfony\Component\Console\Attribute\AsCommand;
use Symfony\Component\Console\Command\Command;

#[AsCommand(name: 'app:reports:nightly')]
#[Monitor(env: 'CRON_MONITOR_REPORTS_NIGHTLY_UUID')]
final class GenerateNightlyReportCommand extends Command
{
    // ...
}

Run php bin/console cron-monitor:sync to list every scheduled task and the YAML you'd add to map it. The SDK supports Symfony 6.4 and 7.x on PHP 8.2+; source is on GitHub.

Monitor Laravel scheduled tasks

On Laravel, the same package auto-discovers a service provider and adds a ->monitor() macro to the scheduler. Chain it onto any scheduled command with the monitor UUID:

use Illuminate\Support\Facades\Schedule;

Schedule::command('reports:nightly')
    ->dailyAt('02:00')
    ->monitor('<monitor-uuid>');

The macro hooks the scheduler's before, onSuccess, and onFailure callbacks, so a failed run reports fail rather than just going silent. Publish config/cron-monitor.php with php artisan vendor:publish --tag=cron-monitor-config to tune timeout and endpoint.

Monitor WordPress cron (WP-Cron)

WP-Cron is request-driven: WordPress only runs due events when a visitor hits the site. On a low-traffic site, no visit means no run. A scheduled backup or digest can stall for weeks while the site itself answers HTTPS perfectly, so an uptime monitor never notices. The Cronheart plugin puts a dead man's switch on WP-Cron.
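A tiny simulation shows why request-driven scheduling stalls. It is a simplified model, not WordPress code: times are plain seconds, and `wp_cron_run_times` is a hypothetical function that runs each event at the first visit on or after its due time.

```python
from typing import Dict, List, Optional


def wp_cron_run_times(due: Dict[str, int], visits: List[int]) -> Dict[str, Optional[int]]:
    """Return when each event actually runs, or None if no visit ever triggers it."""
    ordered = sorted(visits)
    result: Dict[str, Optional[int]] = {}
    for name, due_at in due.items():
        # The event fires on the first request that arrives once it is due.
        result[name] = next((v for v in ordered if v >= due_at), None)
    return result


# A backup due at t=120 with visits at t=30 and t=900 runs 780 seconds late.
print(wp_cron_run_times({"backup": 120}, [30, 900]))  # {'backup': 900}
```

With no visit after the due time the result is `None`: the event never runs, and nothing on the site reports it.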

Install it from the WordPress plugin directory (search "Cronheart") or with Composer:

composer require cronheart/wp

It registers a 5-minute site heartbeat that proves WP-Cron is firing at all, and you can attach start / success / fail pings to any individual scheduled hook:

cronheart_monitor( 'my_nightly_report', '<monitor-uuid>' );

Keep the UUID out of the database and out of git by defining CRONHEART_HEARTBEAT_UUID in wp-config.php, or set it from Settings → Cronheart. The plugin also captures PHP fatal errors raised inside a scheduled callback and ships the summary in the failure ping. Source is on GitHub.

Any other language

There's no SDK requirement — the ping endpoint is plain HTTP, so anything that can make a request works. From PHP without a framework, the SDK ships a tiny client that never throws:

use CronMonitor\Client\CronMonitorClient;

$client = CronMonitorClient::create();
$uuid   = '<monitor-uuid>';

$client->start($uuid);
try {
    run_the_job();
    $client->success($uuid);
} catch (\Throwable $e) {
    $client->fail($uuid, $e->getMessage());
    throw $e;
}

For non-PHP stacks, a single curl (or the standalone vendor/bin/cron-monitor heartbeat <uuid> CLI) is all you need. See the full ping API reference for the start / success / fail actions and the outgoing webhook contract.
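As an illustration of "anything that can make a request", here is a minimal Python sketch using only the standard library. It assumes the same /start, /success and /fail paths as the shell recipe earlier in this guide; the `send` and `monitored` names are hypothetical, and the sender is injectable so you can test the wrapper without touching the network.

```python
import urllib.request
from typing import Callable, Optional

BASE = "https://cronheart.com/ping/<uuid>"  # replace <uuid> with your monitor's UUID


def send(url: str, body: Optional[bytes] = None, timeout: float = 10.0) -> None:
    """GET when there is no body, POST otherwise. Network errors are swallowed."""
    try:
        request = urllib.request.Request(url, data=body)
        with urllib.request.urlopen(request, timeout=timeout):
            pass
    except OSError:
        pass  # a monitoring outage must never break the job itself


def monitored(job: Callable[[], None], base: str = BASE,
              ping: Callable[..., None] = send) -> None:
    """Run `job`, reporting start, then success or fail."""
    ping(f"{base}/start")
    try:
        job()
    except Exception as exc:
        # Ship the tail of the error message with the failure signal, then re-raise.
        ping(f"{base}/fail", str(exc).encode("utf-8")[-8000:])
        raise
    ping(f"{base}/success")
```

Call it as `monitored(run_the_job)`. Like the PHP client above, it re-raises the original exception after reporting the failure, so the job's own error handling is unaffected.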

Best practices

Size the grace window to reality. Set it to the job's typical runtime plus a margin, not zero. Too tight and a normal slow run pages you; too loose and a real outage hides for an hour.
Send the success ping after the work, not at the top of the script. A success heartbeat fired before the work runs reports "alive" even when the work then crashes. A separate start ping up front is fine; the success signal must wait until the job exits 0.
Report failures explicitly. Branch on the exit code and send fail so an errored run pages you in seconds instead of waiting for the next missed window.
Test the alert path before you depend on it. Fire a test alert from the dashboard so you know the email or Slack message actually lands. A real incident is the worst time to discover a broken channel.
Treat the ping URL as a credential. The UUID is the only thing guarding the endpoint. Keep it out of public repos and CI logs; rotate it from the dashboard if it leaks.
Tune for alert fatigue. Route noisy low-stakes jobs to a different channel than the ones that should wake someone, and let the anti-flap dedupe stop a recovering job from paging you twice.

Evaluating tools? See how Cronheart compares to Healthchecks.io, Cronitor, and Dead Man's Snitch.

Start watching your first job

Free for 20 monitors, no credit card. One curl and you're covered.
