From the Keyboard of Zachary Wagner

Just a few things on my mind.

If you've been using Home Assistant long enough, you've been woken up at 2am by a notification that shouldn't have fired. A sensor came back online, a service restarted, HA rebooted — and suddenly your phone is buzzing about a “power outage” that never happened.

The root cause is almost always the same: automations that don't account for the full lifecycle of entity states.

This post covers a hardening system I've developed to eliminate those false triggers — including from_state/to_state guards, the availability template pattern, and how to bake these protections into blueprints so every automation gets them for free.


The Problem: Entities Don't Just Toggle

Most automations are written assuming an entity transitions cleanly between on and off. In reality, entities go through a much messier lifecycle:

unavailable → on → off → unavailable → unknown → on

This happens constantly:

  • HA restarts and entities briefly report unavailable
  • A network blip drops a Zigbee device
  • An MQTT service restarts and retained values are republished
  • A template sensor recalculates when its source entity comes back online

Each of these transitions can look like a real state change to an automation. Without guards, every one of them can fire your notification, trigger your lights, or run your scripts.


The Two-Guard Pattern

The most important hardening technique is guarding both the from_state and to_state of every trigger:

conditions:
  - condition: template
    value_template: >
      {{ trigger.from_state.state not in ['', 'unavailable', 'unknown'] }}
  - condition: template
    value_template: >
      {{ trigger.to_state.state not in ['', 'unavailable', 'unknown'] }}

Most people only guard to_state — checking that the new state isn't unavailable. But the from_state guard is equally important. Without it, the transition unavailable → on passes right through, which is exactly what happens when a sensor comes back online after a restart.

Both states must be real values for the automation to proceed.


The Availability Template Pattern

For template sensors, the old pattern for handling unavailable sources looked like this:

# Old pattern — prone to issues
state: >
  {% set s = states('sensor.some_entity') %}
  {% if s in ['unavailable', 'unknown', 'none'] %}
    {{ None }}
  {% else %}
    {{ s == 'on' }}
  {% endif %}

The problem is that returning None from a state template just sets the state to the string "None" — not actually unavailable. So you still get state transitions that can trigger automations.

The correct approach is to use the availability template, which is specifically designed for this:

state: >
  {{ states('sensor.some_entity') == 'on' }}
availability: >
  {{ states('sensor.some_entity') not in ['unavailable', 'unknown'] }}

When availability returns false, the sensor itself becomes unavailable in HA. This means the transition on restart becomes unavailable → on instead of off → on — and the from_state guard catches it.

For sensors with multiple dependencies, use the list pattern:

availability: >
  {% set entities = [
      'binary_sensor.some_sensor',
      'input_boolean.some_boolean',
      'switch.some_switch',
  ] %}
  {{ entities | select('is_state', 'unavailable') | list | count == 0
     and entities | select('is_state', 'unknown') | list | count == 0 }}

Baking Guards Into Blueprints

Writing these conditions into every automation manually is error-prone and inconsistent. The better approach is to encode them into blueprints so every automation instance gets them automatically.

Here's a hardened binary sensor blueprint:

blueprint:
  name: When Binary Sensor is Toggled
  description: |
    Triggers user-specified actions when a binary sensor changes to on and off.
    Ignores unavailable/unknown states and attribute-only changes.
  domain: automation
  input:
    binary_sensor:
      name: Binary Sensor
      selector:
        entity:
          domain: binary_sensor
    on_action:
      name: On Action
      default: []
      selector:
        action: {}
    off_action:
      name: Off Action
      default: []
      selector:
        action: {}

trigger:
  - platform: state
    entity_id: !input binary_sensor

condition:
  - condition: template
    value_template: >
      {{ trigger.from_state.state not in ['', 'unavailable', 'unknown'] }}
  - condition: template
    value_template: >
      {{ trigger.to_state.state not in ['', 'unavailable', 'unknown'] }}
  # ignore attribute-only changes, where the state string itself is unchanged
  - condition: template
    value_template: >
      {{ trigger.from_state.state != trigger.to_state.state }}

action:
  - choose:
      - conditions:
          - condition: state
            entity_id: !input binary_sensor
            state: "on"
        sequence: !input on_action
      - conditions:
          - condition: state
            entity_id: !input binary_sensor
            state: "off"
        sequence: !input off_action
    default: []
mode: single

Every automation using this blueprint is hardened by default. You can create similar blueprints for input booleans, media players, outlets — anything you trigger off regularly.

For even simpler cases, a generic entity state change blueprint covers everything:

blueprint:
  name: When Entity State Changes
  description: |
    Triggers user-specified actions when any entity's state changes.
    Ignores unavailable/unknown states and attribute-only changes.
  domain: automation
  input:
    entity:
      name: Entity
      selector:
        entity: {}
    actions:
      name: Actions
      default: []
      selector:
        action: {}

trigger:
  - platform: state
    entity_id: !input entity

condition:
  - condition: template
    value_template: >
      {{ trigger.from_state.state not in ['', 'unavailable', 'unknown'] }}
  - condition: template
    value_template: >
      {{ trigger.to_state.state not in ['', 'unavailable', 'unknown'] }}
  # ignore attribute-only changes, where the state string itself is unchanged
  - condition: template
    value_template: >
      {{ trigger.from_state.state != trigger.to_state.state }}

action: !input actions
mode: single

Automations using this blueprint become pure configuration:

alias: When There is a New Channels DVR Recording
use_blueprint:
  path: zackwag/entity_state_change.yaml
  input:
    entity: sensor.channels_dvr_latest_recording
    actions:
      - action: script.channels_dvr_new_recording_handler
        metadata: {}

When NOT to Use These Guards

Not every automation should use these guards. The pattern is deliberately designed to ignore unavailable — but sometimes you want to know when something goes unavailable.

A backup monitoring automation is a good example:

triggers:
  - trigger: state
    entity_id: sensor.container_backup_status
    to: failed
  - trigger: state
    entity_id: sensor.container_backup_status
    to: unavailable
  - trigger: state
    entity_id: sensor.container_backup_status
    to: unknown

If your backup service goes dark, that's exactly the alert you want. Adding the unavailability guards here would defeat the purpose. Use your judgment — the guards are the right default, but they're not universal.


The Full Defense Stack

For critical alerts like UPS power monitoring, you can combine all of these techniques into a robust defense:

  1. availability on template sensors — prevents unavailable from masquerading as off
  2. from_state guard — blocks transitions out of unavailable
  3. to_state guard — blocks transitions into unavailable
  4. Connectivity condition — suppresses alerts when the monitoring service itself is down
  5. Persistent last values — prevents MQTT services from republishing retained values on restart

Each layer handles a different failure mode. The result is an alerting system you can actually trust — when your phone buzzes, something real happened.
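
Here's a rough sketch of how layers 2 through 4 look together in the conditions of a UPS alert automation. The connectivity sensor name is hypothetical, purely for illustration:

conditions:
  # both sides of the transition must be real values
  - condition: template
    value_template: >
      {{ trigger.from_state.state not in ['', 'unavailable', 'unknown'] }}
  - condition: template
    value_template: >
      {{ trigger.to_state.state not in ['', 'unavailable', 'unknown'] }}
  # suppress alerts when the monitoring service itself is down
  # (binary_sensor.ups_monitor_connectivity is a made-up entity name)
  - condition: state
    entity_id: binary_sensor.ups_monitor_connectivity
    state: "on"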


Summary

The core principles:

  • Always guard both from_state and to_state — not just to_state
  • Use availability templates instead of returning None from state templates
  • Encode guards into blueprints so they're applied consistently
  • Know when to break the rules — some automations should fire on unavailable

Once you've applied this system across your automations, the 2am false alerts stop. And when a real alert fires, you'll actually pay attention to it.

We've all been there. You're wiring up a new Docker stack, things are finally working, and you commit and push before realizing your password is sitting right there in plain text in your compose.yaml. In my case it was MLB credentials for mlbserver. Oops.

Here's how I cleaned it up and what I'm doing going forward.

What Happened

I committed a compose.yaml with credentials hardcoded directly in the environment block:

environment:
  - account_username=zackwag@gmail.com
  - account_password=hunter2

Pushed it to a public GitHub repo. Caught it quickly, reset the password, but the damage was done — the secret was in the git history even after I deleted the file.

Removing It From History

My first instinct was BFG Repo Cleaner, but BFG matches on filename only — not path. Since I have multiple compose.yaml files across my stacks, that was a non-starter.

git filter-repo supports path filtering, which is exactly what I needed:

cd /tmp
git clone https://github.com/zackwag/docker.git
cd docker
git filter-repo --path opt/stacks/channels-addons/compose.yaml --invert-paths
git remote add origin https://github.com/zackwag/docker.git
git push --force origin main

Worth noting: git filter-repo refuses to run on a non-fresh clone by default. Clone fresh, run it there, force push. Don't fight it.

The Right Pattern Going Forward

The fix is straightforward — .env files. Keep secrets out of the compose file entirely and reference them as variables.

compose.yaml

environment:
  - account_username=${MLB_USERNAME}
  - account_password=${MLB_PASSWORD}

.env (never committed)

MLB_USERNAME=zackwag@gmail.com
MLB_PASSWORD=your_password_here

.env.example (committed as a template)

MLB_USERNAME=
MLB_PASSWORD=

.gitignore

.env

Docker Compose picks up .env automatically from the same directory as your compose.yaml. No extra configuration needed.

Not Everything Needs to Be a Secret

Worth calling out — not everything in your compose file needs to move to .env. In my Caddy stack I have things like DOMAIN, EMAIL, upstream IPs, and internal TLDs. None of that is sensitive. The rule of thumb:

  • Secrets → .env (passwords, tokens, API keys)
  • Config → fine in compose.yaml (domains, IPs, emails, paths); see the sketch below
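
Concretely, the split might look something like this in a compose file. All names and values here are illustrative, including the token variable:

environment:
  # plain config, fine to commit
  - DOMAIN=example.com
  - EMAIL=admin@example.com
  - UPSTREAM_IP=192.168.1.20
  # secrets still come from .env
  - API_TOKEN=${CADDY_API_TOKEN}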

Bonus: Nuking Your Git History Entirely

Since I'd already made a mess of the history, I took the opportunity to squash everything down to a single clean commit:

git checkout --orphan fresh
git add -A
git commit -m "Initial commit"
git branch -D main
git branch -m main
git push --force origin main

Clean slate. Felt good.

Takeaways

  • Add .gitignore and .env.example before you write your first compose.yaml
  • If you do commit a secret, reset it immediately — history cleanup is hygiene, not the fix
  • git filter-repo is the right tool for surgical history rewrites
  • Public repo bots are fast. Assume any exposed secret was seen.

Back in February, I wrote about how I finally gave my home lab a real backup strategy using a containerized Flask server, rclone, and OneDrive. The solution worked well — but it only worked for a single host. If you run containers across multiple machines, you were on your own.

That changes with v2.0.

What Was Missing

The original setup was straightforward: one container, one host, one containers.json, and a cron job to kick things off. It solved the problem I had at the time.

But home labs grow. As I added more hosts, I found myself duplicating the setup and having no single place to check on the health of all my backups. That itch needed scratching.

Introducing the Hub/Spoke Architecture

v2.0 introduces a hub/spoke model for multi-host backup orchestration. Every instance of flask-container-backup is now a spoke by default — it behaves exactly as it did in v1.0. Nothing breaks.

The new piece is the hub. Set MODE=hub in your environment, point it at a spokes.json config file listing your remote spoke agents, and you now have a central orchestrator that can coordinate backups and aggregate status across your entire fleet.

compose.yaml

environment:
  - MODE=hub

spokes.json

[
  { "name": "host-a", "url": "http://192.168.1.10:2128" },
  { "name": "host-b", "url": "http://192.168.1.11:2128" }
]

One guardrail worth noting: if you set MODE=hub but spokes.json is missing or empty, the container will exit at startup with a fatal error. No silent failures.
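
For completeness, a hub's service definition might look something like this. It's a sketch, not a drop-in config; the container name and config path are placeholders:

services:
  backup-hub:
    image: zackwag/flask-container-backup:latest
    container_name: backup-hub
    restart: unless-stopped
    ports:
      - 2128:2128
    volumes:
      # spokes.json (and the rest of the config) lives here
      - /docker/backup-hub/config:/app/config
    environment:
      - MODE=hub
      - TZ=America/New_York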

New: The /status Endpoint

Every spoke now exposes a GET /status endpoint that returns the result of the last backup run — including a timestamp, which containers were backed up, and any errors encountered. Results are also written to backup_result.json after each run, so they survive a container restart.

{
  "timestamp": "2026-04-08T13:00:00",
  "containers_backed_up": ["caddy", "freshrss", "homeassistant"],
  "errors": []
}

The hub aggregates this across all configured spokes when you hit its own /status endpoint, giving you a unified view of backup health across every host in one call.

Config Consolidation

All config files — containers.json, spokes.json, and the new backup_result.json — now live in /app/config, which maps to a single mounted volume. Cleaner, and easier to manage.

Upgrading

If you're already running v1.0, upgrading is non-breaking. Pull the new image, update your compose file to mount /app/config, and you're done. The MODE environment variable defaults to spoke, so existing single-host setups continue to work exactly as before.

services:
  container-backup:
    image: zackwag/flask-container-backup:latest
    container_name: container-backup
    restart: unless-stopped
    ports:
      - 2128:2128
    volumes:
      - /var/run/docker.sock:/var/run/docker.sock:ro
      - /docker/container-backup/config:/app/config
      - /docker:/source
      - onedrive-backup:/destination
    environment:
      - PYTHONUNBUFFERED=1
      - TZ=America/New_York

The image is available on Docker Hub at zackwag/flask-container-backup, and the source is on GitHub.

I run several containers in my home lab, and the missing piece of the puzzle has been backups. That is to say, I had no backup strategy at all. In my spare time I've been working out how to back up the container storage, and I'm pretty satisfied with my current solution.

Prerequisites

I keep all of my containers' persistent data on the host machine in folders of the form /docker/{service-name} (/docker/caddy, for instance). That's the only tree that needs to be backed up, since it's the hardest thing to recreate.

Some of my containers use SQLite, and some spin up a separate database container (MySQL). Copying the persistent data while a container is running could cause corruption, so each container needs to be stopped before the copy and started again afterward.

Finally, I want maximum flexibility, so the backup should be nothing more than a plain copy of the data on the filesystem.

Solution

To back up the persistent data in a simple way, I needed to mount the backup destination into the filesystem (rclone), run a small server that can take requests and perform the business logic (Flask), keep the whole thing ephemeral (Docker), and be able to trigger a backup at any time (REST).

rclone

Rclone is an application that allows you to mount cloud storage as a logical drive and perform I/O operations against it. Since I am a Microsoft 365 subscriber, I chose to use OneDrive.

Flask Server

I wanted a container that could perform actions in response to either RESTful commands or cron jobs, so I created flask-cron-server, which spins up a Flask server on port 2128.

From there I was able to create server.py, the file that runs the Flask server and is executed on container startup. Here's how it works:

It reads in a JSON file called /app/config/containers.json that defines all the containers along with the folder that should be archived and where that archive should be stored.

The call to backup is executed in a separate thread, and a 202 Accepted response is sent to the caller immediately to let them know the command was received, since it's unknown how long the backup will take.

Finally, the whole thing is driven by a simple JSON file:

[
    {
        "container_name": "caddy",
        "source_folder": "/source/caddy",
        "destination_folder": "/destination/caddy",
        "retention_days": 7
    },
    {
        "container_name": "freshrss",
        "source_folder": "/source/freshrss",
        "destination_folder": "/destination/freshrss",
        "retention_days": 7
    },
    {
        "container_name": "guacamole",
        "source_folder": "/source/guacamole",
        "destination_folder": "/destination/guacamole",
        "retention_days": 7
    },
    {
        "container_name": "mosquitto",
        "source_folder": "/source/mosquitto",
        "destination_folder": "/destination/mosquitto",
        "retention_days": 7
    },
    {
        "container_name": "ps5-mqtt",
        "source_folder": "/source/ps5-mqtt",
        "destination_folder": "/destination/ps5-mqtt",
        "retention_days": 7
    },
    {
        "container_name": "slash",
        "source_folder": "/source/slash",
        "destination_folder": "/destination/slash",
        "retention_days": 7
    },
    {
        "container_name": "write-freely",
        "source_folder": "/source/writefreely",
        "destination_folder": "/destination/writefreely",
        "retention_days": 7
    },
    {
        "container_name": "homeassistant",
        "source_folder": "/source/ha",
        "destination_folder": "/destination/ha",
        "retention_days": 14
    }
]

Docker Container

The final step was to create the Docker container and Compose stack:

services:
  container-backup:
    image: zackwag/flask-container-backup:latest
    container_name: container-backup
    restart: unless-stopped
    ports:
      - 2128:2128
    volumes:
      - /var/run/docker.sock:/var/run/docker.sock:ro
      - /docker/container-backup/config:/app/config
      - /docker:/source
      - onedrive-backup:/destination
    environment:
      - PYTHONUNBUFFERED=1
      - TZ=America/New_York
volumes:
  onedrive-backup:
    driver: rclone
    driver_opts:
      remote: onedrive:backup
      allow_other: "true"
      vfs-cache-mode: writes
networks: {}

I'm in the New York time zone, so you'll want to change TZ to match where you live.

Also, I made sure to include /var/run/docker.sock:/var/run/docker.sock:ro so that I could start and stop containers.

Finally, I followed the directions for the rclone Docker Volume Plugin. This allowed me to create the volume onedrive-backup, which points to the /backup folder in OneDrive.

RESTfully Performing Backups

Now that the container is running, I can simply call

curl -X POST {IP ADDRESS}:2128/backup

to back up all of the containers specified in containers.json, or

curl -X POST {IP ADDRESS}:2128/backup/{container name}

to back up a specific container.

I have set up a daily automation in Home Assistant that calls the main /backup endpoint to kick off backups.
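
For what it's worth, one way to wire that up is a rest_command plus a time-triggered automation. The IP and time below are placeholders:

# configuration.yaml
rest_command:
  backup_containers:
    url: "http://{IP ADDRESS}:2128/backup"
    method: post

# automations.yaml
- alias: Nightly Container Backup
  trigger:
    - platform: time
      at: "03:00:00"
  action:
    - service: rest_command.backup_containers
  mode: single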