Add media retention cleanup command
All checks were successful
buildbot/nix-eval Build done.
buildbot/nix-build Build done.
buildbot/nix-effects Build done.

Abel Luck 2026-05-27 13:04:47 +02:00
parent 3b6503a6ed
commit 507074b80e
10 changed files with 722 additions and 52 deletions

@@ -72,6 +72,27 @@ Operational notes:
Reordering `REPUBLISHER_IMAGE` changes canonical feed image URLs.
- Job logs and stats artifacts are written under `out/logs/`.
Media cleanup:
- Published media can outlive the current feed when articles fall out of the
feed window. Use `cleanup-media` to delete old media files that are no longer
referenced by the latest published `feed.rss`.
- The default retention window is 25 days. Run a dry run first:
```sh
uv run repub cleanup-media --feeds-dir out/feeds --days 25 --dry-run
```
- Remove `--dry-run` to delete matching files. The command protects media
referenced by the latest published feed and uses a lock to avoid racing with
active crawls.
- For config-driven deployments, pass the runtime config so cleanup uses the
configured `out_dir` and media directory names:
```sh
uv run repub cleanup-media --config repub.toml --dry-run
```
The legacy one-shot config-driven crawler is still available:
```sh