
5 Ways Your Cron Jobs Are Failing Silently

Matt · 6 min read

I Lost a Week of Backups Before I Learned This

TL;DR: Cron jobs fail silently when errors go unlogged, PATH breaks in the cron environment, required environment variables are missing, disk space fills up, or overlapping runs corrupt shared state. The fix is dead man's switch monitoring: your job pings a URL after every successful run, and the absence of that ping triggers an alert.

Last year I got paged because a customer needed a database restore. I SSH'd in, checked the backup directory, and found the last file was from 8 days ago. The crontab entry was still there. The script was still there. Nothing had changed, except the disk had filled up, and the script had been silently producing zero-byte files ever since.

That experience sent me down a rabbit hole. Turns out there are at least five ways cron jobs fail without making a sound, and I'd hit most of them at various points in my career without realizing they were systemic problems.

Why Does Cron Output Disappear?

Cron tries to email stdout/stderr to the local user. On basically every modern server, local mail isn't configured, so the output just vanishes. You redirect to a log file, logrotate deletes it, nobody reads it. The error happened, got recorded, got deleted.

# This output goes into the void on most servers
0 2 * * * /usr/local/bin/backup.sh

# Better, but only if someone actually reads the log
0 2 * * * /usr/local/bin/backup.sh >> /var/log/backup.log 2>&1

Even the "better" version only works if someone checks the log. And if the script exits 0 despite failing (which happens more than you'd think), the log might not even have anything useful.

Why Does PATH Break in Cron?

This one bit me early in my career. I tested a script in my shell, it worked fine, I added it to crontab, and... nothing. Cron runs with a minimal PATH, usually just /usr/bin:/bin. Commands like node, python3, docker, or aws aren't on that path.

# Works in your shell, silently fails in cron
0 3 * * * node /app/scripts/sync.js

# Use full paths
0 3 * * * /usr/local/bin/node /app/scripts/sync.js

# Or set PATH at the top of crontab
PATH=/usr/local/bin:/usr/bin:/bin
0 3 * * * node /app/scripts/sync.js

The really annoying part: the line "runs" (cron executes it), but the binary isn't found, so the shell exits immediately with status 127, an error code that nobody sees (see point #1).
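
You can catch this before it bites by approximating cron's environment from your own shell with env -i (a rough approximation; cron also sets a few variables like HOME and SHELL):

```shell
# Empty environment plus cron's default PATH. If the command
# can't be found here, cron won't find it either.
env -i PATH=/usr/bin:/bin /bin/sh -c 'command -v node || echo "node not found under cron PATH"'
```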

Why Are Environment Variables Empty in Cron?

Your shell sources .bashrc, .profile, and whatever else you've set up. Cron doesn't. If your script relies on DATABASE_URL or AWS_ACCESS_KEY_ID, those variables are empty in cron.

# This connects to... nothing, because $DATABASE_URL is empty
#!/bin/bash
pg_dump "$DATABASE_URL" > /backups/db.sql

# Source your env explicitly (the file must export its variables,
# e.g. export DATABASE_URL=..., or child processes won't see them)
0 2 * * * . /home/deploy/.env && /usr/local/bin/backup.sh

I've seen this produce a valid-looking backup file that was actually a dump of the default local database (which was empty). Everything looked fine until restore day.
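
When you're not sure what your script actually sees, the quickest check is to have cron tell you. A one-off entry dumps cron's environment to a file (the /tmp path here is just an example), and then you can diff it against your login shell:

```shell
# Temporary crontab entry: capture the environment cron provides
* * * * * env > /tmp/cron-env.txt

# A minute later, compare it with your login shell (run this
# in your shell, not in cron):
diff <(sort /tmp/cron-env.txt) <(env | sort)
```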

What Happens When the Disk Fills Up?

Disks fill up. When they do, writes fail partway through. Your backup script starts pg_dump, it writes half the data, disk runs out, and you end up with a truncated file. The pipeline swallows the error because the shell reports the exit status of its last command: gzip succeeds, so pg_dump's failure is masked.

# gzip succeeds even when pg_dump fails -exit code is 0
pg_dump mydb | gzip > /backups/db.sql.gz

# Fix: pipefail + file size check
#!/bin/bash
set -euo pipefail
pg_dump mydb | gzip > /tmp/db.sql.gz
SIZE=$(stat --format=%s /tmp/db.sql.gz)  # GNU stat; on BSD/macOS use: stat -f%z
if [ "$SIZE" -lt 1000 ]; then
  echo "Backup suspiciously small: $SIZE bytes" >&2
  exit 1
fi
mv /tmp/db.sql.gz /backups/db.sql.gz
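
pipefail is the part doing the heavy lifting there. A minimal demonstration of what it changes:

```shell
#!/bin/bash
# Without pipefail, the pipeline's status is the status of its
# last command (gzip), even though the producer failed:
false | gzip > /dev/null
echo "without pipefail: exit $?"   # prints: without pipefail: exit 0

# With pipefail, the rightmost non-zero status wins:
set -o pipefail
false | gzip > /dev/null
echo "with pipefail: exit $?"      # prints: with pipefail: exit 1
```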

What If the Previous Cron Run Is Still Going?

Cron doesn't care if the last run finished. A job that normally takes 3 minutes suddenly takes 45 because the database is under load. Cron fires the next run anyway. Now you've got two instances fighting over the same resources.

# Runs every 5 minutes, but sometimes takes 20
*/5 * * * * /usr/local/bin/process-queue.sh

# flock prevents overlapping -but silently skips
*/5 * * * * flock -n /tmp/process-queue.lock /usr/local/bin/process-queue.sh

flock prevents overlap, but now you've got a different problem: if the job is stuck, every subsequent run gets silently skipped. The job is effectively dead, and flock is just quietly covering for it.

How Do You Catch Silent Cron Failures?

I tried better logging, email alerts from cron, log monitoring -all of it was fragile in its own way. What actually worked was inverting the problem entirely.

Instead of trying to detect every possible failure, I added a single curl at the end of the script that only fires on success. An external service expects that ping. If it doesn't arrive, I get alerted. That's it.

#!/bin/bash
set -euo pipefail

# Your actual job
pg_dump mydb | gzip > /backups/db.sql.gz

# If we got here, it worked
curl -fsS --retry 3 https://deadping.io/api/ping/your-monitor-id

PATH wrong? Ping never fires. Disk full? Script exits before the curl. Process stuck? Ping never arrives. Every failure mode results in exactly one observable thing: a missed ping.

I built DeadPing because I got tired of duct-taping monitoring onto cron. You create a monitor, set the schedule, add one curl. If the ping doesn't show up on time, you get an email or Slack message. Grace periods keep it from being noisy. Check out the API reference or the Docker integration guide to get started.

Start monitoring in 60 seconds

Free forever for up to 20 monitors. No credit card required.

Get Started Free