add dev-mode

add empty table placeholders
humanize sizes
2026-03-30 15:36:12 +02:00 · 2026-03-30 15:28:56 +02:00 · 2026-03-30 15:25:28 +02:00 · 2026-03-30 15:25:10 +02:00 · 2026-03-30 15:23:34 +02:00 · 2026-03-30 15:21:39 +02:00
35 changed files with 7536 additions and 85 deletions
--- a/.gitignore
+++ b/.gitignore
@ -12,3 +12,6 @@ data
 logs
 archive
 *egg-info
 *.db
 *.db-shm
 *.db-wal
--- a/AGENTS.md
+++ b/AGENTS.md
@ -1,5 +1,7 @@
 # republisher-redux
 See @README.md
 ## Overview
 - `republisher-redux` is a Scrapy-based tool that mirrors RSS and Atom feeds for offline use.
@ -8,6 +10,71 @@
 - Nix development and packaging use `flake.nix`.
 - Formatting is managed through `treefmt-nix`, exposed via `nix fmt`.
 - Prefer an immutable, functional programming style
    - functions that operate on data over classes that encapsulate state
 - No backwards-compatibility guarantees; prefer breaking changes over backwards compat and complexity.
 - Think carefully and implement the most concise solution that changes as little code as possible.
 ## HTML/Datastar Rules
 Very important rules for datastar usage.
 The views are pure functions data in -> html out.
 - we only use full page morph mode. no diffing
    Why large/fat/main morphs (aka immediate mode)?
    By only using data: mode morph and always targeting the main element of the document the API can be massively simplified. This avoids having the explosion of endpoints you get with HTMX and makes reasoning about your app much simpler.
 - we only have a single render function per page
    By having a single render function per page you can simplify the reasoning about your app to view = f(state). You can then reason about your pushed updates as a continuous signal rather than discrete event stream. The benefit of this is you don't have to handle missed events, disconnects and reconnects. When the state changes on the server you push down the latest view, not the delta between views. On the client idiomorph can translate that into fine grained dom updates.
 - any database change -> re render all connected users with 200ms throttle
    When your events are not homogeneous, you can't miss events, so you cannot throttle your events without losing data.
    But, wait! Won't that mean every change will cause all users to re-render? Yes, but at a maximum rate determined by the throttle. This might sound scary at first, but in practice:
        The more shared views the users have, the more likely it is that most of the connected users will have to re-render when a change happens.
        The more events that are happening, the more likely it is that most users will have to re-render.
    This means you actually end up doing more work with a non-homogeneous event system under heavy load than with this simple homogeneous event system that's throttled (especially if there's any sort of common/shared view between users).
 - Signals are only for ephemeral client side state
    Signals should only be used for ephemeral client side state. Things like: the current value of a text input, whether a popover is visible, current csrf token, input validation errors. Signals can be controlled on the client via expressions, or from the backend via patch-signals.
 - Signals in elements should be declared __ifmissing
    Because signals are only being used to represent ephemeral client state that means they can only be initialised by elements and they can only be changed via expressions on the client or from the server via patch-signals in an action. Signals in elements should be declared __ifmissing unless they are "view only".
 - View-only signals are signals that can only be changed by the server. These should not be declared __ifmissing; instead they should be made "local" by starting their key with an _, which prevents the client from sending them up to the server.
 - Actions should not update the view themselves directly
    Actions should not update the view via patch elements. This is because the changes they make would get overwritten on the next render-fn that pushes a new view down the updates SSE connection. However, they can still be used to update signals as those won't be changed by elements patch. This allows you to do things like validation on the server.
 - Stateless views
    The only way for actions to affect the view returned by the render-fn running in a connection is via the database. This ensures CQRS. This means there is no connection state that needs to be persisted or maintained (so missed events and shutdowns/deploys will not lead to lost state). Even when you are running in a single process there is no way for an action (command) to communicate with/affect a view render (query) without going through the database.
 - CQRS
    Actions modify the database and return a 204 or a 200 if they patch-signals.
    Render functions re-render when the database changes and send an update down the updates SSE connection.
 - Work sharing (caching)
    Work sharing is the term I'm using for sharing renders between connected users. This can be useful when a lot of connected users share the same view. For example a leader board, game board, presence indicator etc. It ensures the work (eg: query and html generation) for that view is only done once regardless of the number of connected users. The simplest way to do this is to recalculate and cache values after a batch has been run.
 - Use data-on:pointerdown/mousedown over data-on:click
    This is a small one but can make even the slowest of networks feel much snappier.
 - No CORS
    By hosting all assets on the same origin we avoid the need for CORS. This avoids additional server round trips and helps reduce latency.
 - Rendering an initial shim
    Rather than returning the whole page on initial render and having two render paths (one for initial render and one for subsequent rendering), a shell is rendered and then populated when the page connects to the updates endpoint for that page. This has a few advantages:
    The page will only render dynamic content if the user has javascript and first party cookies enabled.
    The initial shell page can be generated and compressed once.
    The server only does more work for actual users and less work for link preview crawlers and other bots (that don't support javascript or cookies).
 ## Workflow
 - Use Python 3.13.
@ -44,3 +111,7 @@ uv run repub crawl -c repub.toml
 - The console entrypoint is `repub`.
 - Runtime ffmpeg availability is provided by the flake package and devshell.
 - Tests live under `tests/`.
 - `prompts/` is git ignored intentionally
 - Never search the web for this repo. If an external resource, document, or reference is needed, stop and ask the user to provide it.
 - Treat the repo-root `republisher.db` as user-owned local state. Do not delete or reset it as part of routine testing or verification.
 - For automated tests or isolated verification, use a separate database path via `REPUBLISHER_DB_PATH` instead of mutating or removing the repo-root database.
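One way to follow the database rule above in tests is sketched below. It assumes the application reads `REPUBLISHER_DB_PATH` from the environment at the time it opens the database; `isolated_db_path` is a hypothetical helper, not part of the repo.

```python
import os
import tempfile
from collections.abc import Iterator
from contextlib import contextmanager
from pathlib import Path

ENV_VAR = "REPUBLISHER_DB_PATH"


@contextmanager
def isolated_db_path() -> Iterator[Path]:
    """Point REPUBLISHER_DB_PATH at a throwaway file, then restore it."""
    previous = os.environ.get(ENV_VAR)
    with tempfile.TemporaryDirectory() as tmp:
        path = Path(tmp) / "test.db"
        os.environ[ENV_VAR] = str(path)
        try:
            yield path
        finally:
            # Restore the caller's environment so the repo-root
            # republisher.db is never touched by the test run.
            if previous is None:
                os.environ.pop(ENV_VAR, None)
            else:
                os.environ[ENV_VAR] = previous
```

A test would wrap its body in `with isolated_db_path() as db_path:` and create the app inside the block.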
--- a/README.md
+++ b/README.md
@ -4,60 +4,65 @@ The AnyNews Republisher is a tool for mirroring news content to alternative dist
 The organization with the original news content is the "publisher".
-The AnyNews Republisher can be configured with various publisher news sources. Then on an interval the Republisher crawls the sources, mirrors the content (text and media) offline into an RSS feed.
+The AnyNews Republisher is managed through a local web UI. Sources, schedules, and job executions are stored in SQLite. On an interval the Republisher crawls the configured sources and mirrors the content (text and media) offline into an RSS feed.
 The [AnyNews app][app] can then be configured to use this mirror (or more than one such mirror).
 The Republisher currently accepts the following source input types:
- RSS Feeds
+- RSS and Atom feeds
 - Pangea sources via `pygea`
 [app]: https://gitlab.com/guardianproject/anynews/anynews-web-client
 ## Usage
 Sync dependencies and start the admin UI:
-``` shell
+```sh
 nix develop
 uv sync --all-groups
-cat > repub.toml <<'EOF'
+uv run repub
 out_dir = "out"
 [[feeds]]
 name = "Guardian Project Podcast"
 slug = "gp-pod"
 url = "https://guardianproject.info/podcast/podcast.xml"
 [[feeds]]
 name = "NASA Breaking News"
 slug = "nasa"
 url = "https://www.nasa.gov/rss/dyn/breaking_news.rss"
 EOF
 uv run repub --config repub.toml
 ```
-`out_dir` may be relative or absolute. Relative paths are resolved against the
+With no arguments, `uv run repub` starts the web UI in local dev mode and serves published feed files from `/feeds/...` out of `out/feeds/...`.
 directory containing the config file. Each feed now needs a user-provided
 `slug`, which is used for output paths and filenames. Optional Scrapy runtime
 overrides can be set in the same file:
-```toml
+By default the UI listens on `127.0.0.1:8080`. You can override that with `REPUBLISHER_HOST` and `REPUBLISHER_PORT`, or with:
-[scrapy.settings]
+
-LOG_LEVEL = "DEBUG"
+```sh
-DOWNLOAD_TIMEOUT = 30
+uv run repub serve --host 0.0.0.0 --port 8080
 ```
-Additional feed definitions can also be imported from one or more TOML files,
+If you invoke the `serve` subcommand explicitly, use `--dev-mode` to expose published feeds directly from the Quart app:
 including a `pygea`-generated `manifest.toml`:
-```toml
+```sh
-feed_config_files = ["/absolute/path/to/pygea/feed/manifest.toml"]
+uv run repub serve --dev-mode
 ```
-Imported files only need `[[feeds]]` entries with `name`, `slug`, and `url`.
+In `--dev-mode`, requests under `/feeds/...` are served from `out/feeds/...`.
-See [`demo/README.md`](/home/abel/src/guardianproject/anynews/republisher-redux/demo/README.md) for a self-contained example config.
+Important: the admin UI has no built-in authentication. Keep it bound to localhost or put it behind a trusted network layer such as Tailscale.
-## TODO
+Once the UI is running:
 1. Open `http://127.0.0.1:8080/`.
 2. Create a source. Feed sources take a feed URL. Pangea sources take a domain plus category configuration.
 3. Configure the job schedule and any spider arguments.
 4. Use `Run now` to trigger an immediate crawl, or leave the job enabled for scheduled runs.
 5. Watch running jobs and logs live from the Runs pages.
 Operational notes:
 - The default database path is `republisher.db`. Set `REPUBLISHER_DB_PATH` to use a different SQLite file.
 - Mirrored feeds are written under `out/feeds/<slug>/`.
 - Job logs and stats artifacts are written under `out/logs/`.
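As a sketch of the output layout described above, the following mirrors the two path helpers this change adds to `repub/config.py`. The `out` directory and the `nasa` slug are example values only.

```python
from pathlib import Path


def feed_output_dir(*, out_dir: Path, feed_slug: str) -> Path:
    # Each feed gets its own directory under <out_dir>/feeds/.
    return out_dir / "feeds" / feed_slug


def feed_output_path(*, out_dir: Path, feed_slug: str) -> Path:
    # The mirrored feed itself is written as feed.rss inside that directory.
    return feed_output_dir(out_dir=out_dir, feed_slug=feed_slug) / "feed.rss"


example = feed_output_path(out_dir=Path("out"), feed_slug="nasa")
```

Here `example` is the relative path `out/feeds/nasa/feed.rss`.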
 The legacy one-shot config-driven crawler is still available:
 ```sh
 uv run repub crawl -c repub.toml
 ```
 ## Roadmap
 - [x] Offlines RSS feed xml
 - [x] Downloads media and enclosures
@ -68,9 +73,8 @@ See [`demo/README.md`](/home/abel/src/guardianproject/anynews/republisher-redux/
 - [ ] Image compression - Do we want this? -> DEFERRED for now
 - [x] Download and rewrite media embedded in content/CDATA fields
 - [x] Config file to drive the program
- [ ] Add sqlite database and simple admin UI to replace config
+- [x] Add sqlite database and simple admin UI to replace config
- [ ] Integrate pygea as input source
+- [x] Integrate pygea as input source
 - [ ] Daemonize the program
 - [ ] Operationalize with metrics and error reporting
 ## License
--- a/flake.nix
+++ b/flake.nix
@ -239,7 +239,10 @@
            inherit src;
            dontConfigure = true;
            dontBuild = true;
-            nativeBuildInputs = [ testVenv ];
+            nativeBuildInputs = [
              pkgs.pyright
              testVenv
            ];
            checkPhase = ''
              runHook preCheck
              pyright
--- a/pyproject.toml
+++ b/pyproject.toml
@ -19,6 +19,7 @@ dependencies = [
  "aiosqlite>=0.21.0,<0.22.0",
  "datastar-py>=0.8.0,<0.9.0",
  "greenlet>=3.2.4,<4.0.0",
  "htpy>=25.12.0,<26.0.0",
  "peewee>=3.19.0,<4.0.0",
  "pygea @ git+https://guardianproject.dev/anynews/pygea.git",
 ]
@ -49,6 +50,9 @@ include-package-data = true
 where = ["."]
 include = ["repub*"]
 [tool.setuptools.package-data]
 repub = ["sql/*.sql"]
 [tool.pytest.ini_options]
 testpaths = ["tests"]
@ -65,6 +69,14 @@ max-line-length = "88"
 [tool.pyright]
 include = ["repub", "tests"]
 exclude = [
  "repub/crawl.py",
  "repub/exporters.py",
  "repub/media.py",
  "repub/rss.py",
  "repub/spiders",
  "repub/srcset.py",
 ]
 pythonVersion = "3.13"
 typeCheckingMode = "basic"
 reportMissingImports = false
--- a/repub/components.py
+++ b/repub/components.py
@ -0,0 +1,412 @@
 from __future__ import annotations
 import htpy as h
 from htpy import Node, Renderable
 def base_layout(*, page_title: str, stylesheet_href: str, content: Node) -> Renderable:
    return h.html(lang="en", class_="h-full bg-slate-100")[
        h.head[
            h.meta(charset="utf-8"),
            h.meta(name="viewport", content="width=device-width, initial-scale=1"),
            h.title[page_title],
            h.link(rel="stylesheet", href=stylesheet_href),
        ],
        h.body(
            class_="h-full bg-linear-to-br from-stone-100 via-amber-50 to-orange-100 text-slate-900"
        )[content],
    ]
 def nav_link(
    *, label: str, href: str, active: bool = False, badge: str | None = None
 ) -> Renderable:
    link_class = (
        "group flex items-center justify-between rounded-xl px-3 py-2 text-sm font-medium transition "
        + (
            "bg-white text-slate-950 shadow-sm ring-1 ring-white/10"
            if active
            else "text-slate-300 hover:bg-white/5 hover:text-white"
        )
    )
    badge_class = "rounded-full px-2 py-0.5 text-[11px] font-semibold " + (
        "bg-amber-200 text-amber-950" if active else "bg-slate-800 text-slate-300"
    )
    return h.a(href=href, class_=link_class)[
        h.span[label],
        badge and h.span(class_=badge_class)[badge],
    ]
 def admin_sidebar(*, current_path: str) -> Renderable:
    return h.aside(
        class_="relative overflow-hidden bg-slate-950 px-6 py-8 text-white lg:min-h-screen"
    )[
        h.div(
            class_="absolute inset-x-0 top-0 h-40 bg-radial from-amber-400/25 via-amber-400/10 to-transparent"
        ),
        h.div(class_="relative flex h-full flex-col")[
            h.div(class_="flex items-center gap-3")[
                h.div(
                    class_="flex size-11 items-center justify-center rounded-2xl bg-amber-400 text-base font-black text-slate-950"
                )["AR"],
                h.div[
                    h.p(
                        class_="text-xs font-semibold uppercase tracking-[0.24em] text-amber-300"
                    )["Republisher"],
                ],
            ],
            h.nav(class_="mt-10 space-y-2")[
                nav_link(
                    label="Dashboard",
                    href="/",
                    active=current_path == "/",
                    badge="Live",
                ),
                nav_link(
                    label="Sources",
                    href="/sources",
                    active=current_path.startswith("/sources"),
                    badge="12",
                ),
                nav_link(
                    label="Runs",
                    href="/runs",
                    active=current_path.startswith("/runs")
                    or current_path.startswith("/job/"),
                    badge="3",
                ),
            ],
            h.div(class_="mt-auto rounded-3xl bg-white/5 p-5 ring-1 ring-white/10")[
                h.p(class_="text-sm font-semibold text-white")[
                    "AnyNews Republisher v2.0"
                ],
                h.p(class_="mt-4 text-xs uppercase tracking-[0.22em] text-slate-400")[
                    "by Guardian Project"
                ],
            ],
        ],
    ]
 def header_action_link(*, href: str, label: str) -> Renderable:
    return h.a(
        href=href,
        class_="inline-flex items-center rounded-full bg-amber-400 px-4 py-2.5 text-sm font-semibold text-slate-950 shadow-sm transition hover:bg-amber-300",
    )[label]
 def header_secondary_link(*, href: str, label: str) -> Renderable:
    return h.a(
        href=href,
        class_="inline-flex items-center rounded-full border border-white/15 bg-white/5 px-4 py-2.5 text-sm font-semibold text-white transition hover:bg-white/10",
    )[label]
 def muted_action_link(*, href: str, label: str) -> Renderable:
    return h.a(
        href=href,
        class_="inline-flex items-center rounded-full border border-slate-200 bg-white px-3.5 py-2 text-sm font-semibold text-slate-700 shadow-sm transition hover:bg-slate-50",
    )[label]
 def inline_link(*, href: str, label: str, tone: str = "default") -> Renderable:
    classes = {
        "default": "text-slate-700 hover:text-slate-950",
        "amber": "text-amber-700 hover:text-amber-800",
        "rose": "text-rose-700 hover:text-rose-800",
    }
    return h.a(
        href=href,
        class_=f"inline-flex items-center whitespace-nowrap text-sm font-semibold {classes[tone]}",
    )[label]
 def inline_button(
    *, label: str, tone: str = "default", disabled: bool = False
 ) -> Renderable:
    classes = {
        "default": "bg-stone-100 text-slate-700 hover:bg-stone-200",
        "danger": "bg-rose-50 text-rose-700 hover:bg-rose-100",
        "success": "bg-emerald-100 text-emerald-800 hover:bg-emerald-200",
    }
    class_name = (
        "cursor-not-allowed bg-slate-100 text-slate-400" if disabled else classes[tone]
    )
    return h.button(
        type="button",
        disabled=disabled,
        class_=f"inline-flex items-center whitespace-nowrap rounded-full px-3 py-1.5 text-sm font-semibold transition {class_name}",
    )[label]
 def page_shell(
    *,
    current_path: str,
    eyebrow: str,
    title: str,
    description: str | None = None,
    actions: Node | None = None,
    content: Node,
 ) -> Renderable:
    return h.main(
        id="morph",
        class_="min-h-screen lg:grid lg:grid-cols-[18rem_minmax(0,1fr)]",
    )[
        admin_sidebar(current_path=current_path),
        h.div(class_="px-4 py-4 sm:px-5 lg:px-6 lg:py-5")[
            h.div(class_="mx-auto max-w-7xl space-y-5")[
                h.section[
                    h.div(
                        class_="flex flex-col gap-4 sm:flex-row sm:items-start sm:justify-between"
                    )[
                        h.div(class_="max-w-3xl")[
                            h.h1(
                                class_="text-3xl font-semibold tracking-tight text-slate-950"
                            )[title],
                            (
                                description
                                and h.p(class_="mt-1 text-sm text-slate-600")[
                                    description
                                ]
                            ),
                        ],
                        actions and h.div(class_="flex flex-wrap gap-2")[actions],
                    ]
                ],
                content,
            ]
        ],
    ]
 def section_card(*, content: Node) -> Renderable:
    return h.section(class_="space-y-4")[content]
 def table_section(
    *,
    eyebrow: str | None = None,
    title: str,
    subtitle: str | None = None,
    empty_message: str,
    headers: tuple[str, ...],
    rows: tuple[tuple[Node, ...], ...],
    actions: Node | None = None,
 ) -> Renderable:
    def render_row(row: tuple[Node, ...]) -> Renderable:
        first_cell, *other_cells = row
        return h.tr(class_="align-top")[
            h.td(class_="py-4 pr-6 pl-4 text-sm font-medium text-slate-950 sm:pl-6")[
                first_cell
            ],
            (
                h.td(
                    class_="px-3 py-4 align-top text-sm whitespace-nowrap text-slate-600"
                )[cell]
                for cell in other_cells
            ),
        ]
    body_rows: Node
    if rows:
        body_rows = (render_row(row) for row in rows)
    else:
        body_rows = h.tr[
            h.td(
                colspan=str(len(headers)),
                class_="px-4 py-8 text-center text-sm text-slate-500 sm:px-6",
            )[empty_message]
        ]
    return h.section[
        h.div(class_="flex flex-col gap-3 sm:flex-row sm:items-end sm:justify-between")[
            h.div[
                eyebrow
                and h.p(
                    class_="text-xs font-semibold uppercase tracking-[0.22em] text-amber-600"
                )[eyebrow],
                h.h2(class_="mt-1 text-xl font-semibold text-slate-950")[title],
                subtitle and h.p(class_="mt-1 text-sm text-slate-600")[subtitle],
            ],
            actions,
        ],
        h.div(
            class_="mt-3 overflow-hidden rounded-2xl bg-white shadow-sm ring-1 ring-slate-200"
        )[
            h.div(class_="overflow-x-auto")[
                h.table(
                    class_="relative w-full min-w-[72rem] divide-y divide-slate-200 table-auto"
                )[
                    h.thead(class_="bg-stone-50")[
                        h.tr[
                            (
                                h.th(
                                    scope="col",
                                    class_="px-3 py-2.5 text-left text-xs font-semibold uppercase tracking-[0.18em] whitespace-nowrap text-slate-500 first:pl-4 sm:first:pl-6",
                                )[header]
                                for header in headers
                            )
                        ]
                    ],
                    h.tbody(class_="divide-y divide-slate-200 bg-white")[body_rows],
                ]
            ]
        ],
    ]
 def stat_card(*, label: str, value: str, detail: str) -> Renderable:
    return h.div(
        class_="rounded-3xl bg-white/85 p-5 shadow-sm ring-1 ring-slate-200 backdrop-blur"
    )[
        h.dt(class_="text-sm font-medium text-slate-500")[label],
        h.dd(class_="mt-3 text-3xl font-semibold tracking-tight text-slate-950")[value],
        h.p(class_="mt-2 text-sm text-slate-600")[detail],
    ]
 def input_field(
    *,
    label: str,
    field_id: str,
    value: str = "",
    placeholder: str = "",
    help_text: str | None = None,
    signal_name: str | None = None,
    disabled: bool = False,
 ) -> Renderable:
    class_name = (
        "mt-2 block w-full rounded-2xl border-0 px-3.5 py-2.5 text-sm shadow-sm ring-1 "
        + (
            "cursor-not-allowed bg-slate-100 text-slate-500 ring-slate-200"
            if disabled
            else "bg-white text-slate-900 ring-slate-200 placeholder:text-slate-400 focus:outline-hidden focus:ring-2 focus:ring-amber-500"
        )
    )
    return h.div[
        h.label(for_=field_id, class_="block text-sm font-medium text-slate-900")[
            label
        ],
        h.input(
            {"data-bind": signal_name} if signal_name is not None else {},
            id=field_id,
            name=field_id,
            type="text",
            value=value,
            placeholder=placeholder,
            disabled=disabled,
            class_=class_name,
        ),
        help_text and h.p(class_="mt-2 text-xs text-slate-500")[help_text],
    ]
 def select_field(
    *,
    label: str,
    field_id: str,
    options: tuple[str, ...],
    selected: str,
    help_text: str | None = None,
    signal_name: str | None = None,
 ) -> Renderable:
    return h.div[
        h.label(for_=field_id, class_="block text-sm font-medium text-slate-900")[
            label
        ],
        h.select(
            {"data-bind": signal_name} if signal_name is not None else {},
            id=field_id,
            name=field_id,
            class_="mt-2 block w-full rounded-2xl border-0 bg-white px-3.5 py-2.5 text-sm text-slate-900 shadow-sm ring-1 ring-slate-200 focus:outline-hidden focus:ring-2 focus:ring-amber-500",
        )[
            (
                h.option(value=option, selected=option == selected)[option]
                for option in options
            )
        ],
        help_text and h.p(class_="mt-2 text-xs text-slate-500")[help_text],
    ]
 def textarea_field(
    *,
    label: str,
    field_id: str,
    value: str,
    rows: str = "4",
    signal_name: str | None = None,
 ) -> Renderable:
    return h.div[
        h.label(for_=field_id, class_="block text-sm font-medium text-slate-900")[
            label
        ],
        h.textarea(
            {"data-bind": signal_name} if signal_name is not None else {},
            id=field_id,
            name=field_id,
            rows=rows,
            class_="mt-2 block w-full rounded-2xl border-0 bg-white px-3.5 py-2.5 text-sm text-slate-900 shadow-sm ring-1 ring-slate-200 placeholder:text-slate-400 focus:outline-hidden focus:ring-2 focus:ring-amber-500",
        )[value],
    ]
 def toggle_field(
    *,
    label: str,
    description: str,
    signal_name: str,
    checked: bool = False,
 ) -> Renderable:
    signal_value = str(checked).lower()
    return h.div(
        {"data-signals__ifmissing": f"{{{signal_name}: {signal_value}}}"},
        class_="rounded-3xl bg-white p-4 shadow-sm",
    )[
        h.div(class_="flex items-start justify-between gap-4")[
            h.div[
                h.h3(class_="text-sm font-semibold text-slate-900")[label],
                h.p(class_="mt-1 text-sm text-slate-600")[description],
            ],
            h.label(class_="mt-0.5 cursor-pointer")[
                h.div(
                    {
                        "data-class:bg-amber-500": f"${signal_name}",
                        "data-class:bg-slate-200": f"!${signal_name}",
                    },
                    class_="group relative inline-flex w-11 shrink-0 rounded-full bg-slate-200 p-0.5 outline-offset-2 outline-amber-500 transition",
                )[
                    h.span(
                        {
                            "data-class:translate-x-5": f"${signal_name}",
                            "data-class:translate-x-0": f"!${signal_name}",
                        },
                        class_="size-5 translate-x-0 rounded-full bg-white shadow-xs ring-1 ring-slate-900/5 transition-transform",
                    ),
                    h.input(
                        {"data-bind": signal_name},
                        type="checkbox",
                        name=signal_name,
                        checked=checked,
                        class_="sr-only",
                    ),
                ],
            ],
        ]
    ]
 def status_badge(*, label: str, tone: str) -> Renderable:
    tones = {
        "running": "bg-emerald-100 text-emerald-800",
        "scheduled": "bg-sky-100 text-sky-800",
        "idle": "bg-slate-200 text-slate-700",
        "failed": "bg-rose-100 text-rose-800",
        "done": "bg-emerald-100 text-emerald-800",
    }
    return h.span(
        class_=f"inline-flex rounded-full px-2.5 py-1 text-xs font-semibold {tones[tone]}"
    )[label]
--- a/repub/config.py
+++ b/repub/config.py
@ -30,6 +30,14 @@ class RepublisherConfig:
    scrapy_settings: dict[str, Any]
 def feed_output_dir(*, out_dir: Path, feed_slug: str) -> Path:
    return out_dir / "feeds" / feed_slug
 def feed_output_path(*, out_dir: Path, feed_slug: str) -> Path:
    return feed_output_dir(out_dir=out_dir, feed_slug=feed_slug) / "feed.rss"
 def _resolve_path(base_path: Path, value: str) -> Path:
    path = Path(value).expanduser()
    if not path.is_absolute():
@ -173,7 +181,7 @@ def build_feed_settings(
    out_dir: Path,
    feed_slug: str,
 ) -> Settings:
-    feed_dir = out_dir / feed_slug
+    feed_dir = feed_output_dir(out_dir=out_dir, feed_slug=feed_slug)
    image_dir = base_settings.get("REPUBLISHER_IMAGE_DIR", IMAGE_DIR)
    video_dir = base_settings.get("REPUBLISHER_VIDEO_DIR", VIDEO_DIR)
    audio_dir = base_settings.get("REPUBLISHER_AUDIO_DIR", AUDIO_DIR)
@ -192,7 +200,7 @@ def build_feed_settings(
        {
            "REPUBLISHER_OUT_DIR": str(out_dir),
            "FEEDS": {
-                str(out_dir / f"{feed_slug}.rss"): {
+                str(feed_output_path(out_dir=out_dir, feed_slug=feed_slug)): {
                    "format": "rss",
                    "postprocessing": [],
                    "feed_name": feed_slug,
--- a/repub/crawl.py
+++ b/repub/crawl.py
@ -11,6 +11,7 @@ from repub.config import (
    FeedConfig,
    build_base_settings,
    build_feed_settings,
    feed_output_dir,
    load_config,
 )
 from repub.media import check_runtime
@ -30,7 +31,9 @@ class FeedNameFilter:
 def prepare_output_dirs(out_dir: Path, feed_name: str) -> None:
    (out_dir / "logs").mkdir(parents=True, exist_ok=True)
    (out_dir / "httpcache").mkdir(parents=True, exist_ok=True)
-    (out_dir / feed_name).mkdir(parents=True, exist_ok=True)
+    feed_output_dir(out_dir=out_dir, feed_slug=feed_name).mkdir(
        parents=True, exist_ok=True
    )
 def create_feed_crawler(
--- a/repub/datastar.py
+++ b/repub/datastar.py
@ -0,0 +1,89 @@
 from __future__ import annotations
 import asyncio
 import hashlib
 from collections.abc import AsyncGenerator, Awaitable, Callable
 from typing import Protocol
 from datastar_py import ServerSentEventGenerator as SSE
 from datastar_py.sse import DatastarEvent
 class HtmlRenderable(Protocol):
    def __html__(self) -> str: ...
 RenderResult = str | HtmlRenderable
 RenderFunction = Callable[[], Awaitable[RenderResult]]
 class RefreshBroker:
    def __init__(self) -> None:
        self._subscribers: dict[asyncio.Queue[object], asyncio.AbstractEventLoop] = {}
    def subscribe(self) -> asyncio.Queue[object]:
        queue: asyncio.Queue[object] = asyncio.Queue(maxsize=1)
        self._subscribers[queue] = asyncio.get_running_loop()
        return queue
    def unsubscribe(self, queue: asyncio.Queue[object]) -> None:
        self._subscribers.pop(queue, None)
    def publish(self, event: object = "refresh-event") -> None:
        for queue, loop in tuple(self._subscribers.items()):
            loop.call_soon_threadsafe(_publish_event, queue, event)
 def _publish_event(queue: asyncio.Queue[object], event: object) -> None:
    if queue.full():
        try:
            queue.get_nowait()
        except asyncio.QueueEmpty:
            pass
    try:
        queue.put_nowait(event)
    except asyncio.QueueFull:
        return
 async def render_sse_event(
    render: RenderFunction, *, last_event_id: str | None = None
 ) -> tuple[str | None, DatastarEvent | None]:
    html = _coerce_html(await render())
    event_id = _render_hash(html)
    if event_id == last_event_id:
        return last_event_id, None
    return event_id, SSE.patch_elements(html, event_id=event_id)
 async def render_stream(
    queue: asyncio.Queue[object],
    render: RenderFunction,
    *,
    last_event_id: str | None = None,
    render_on_connect: bool = True,
 ) -> AsyncGenerator[DatastarEvent, None]:
    if render_on_connect:
        last_event_id, event = await render_sse_event(
            render, last_event_id=last_event_id
        )
        if event is not None:
            yield event
    while True:
        await queue.get()
        last_event_id, event = await render_sse_event(
            render, last_event_id=last_event_id
        )
        if event is not None:
            yield event
 def _coerce_html(view: RenderResult) -> str:
    if isinstance(view, str):
        return view
    return view.__html__()
 def _render_hash(html: str) -> str:
    return hashlib.blake2s(html.encode("utf-8"), digest_size=16).hexdigest()
--- a/repub/entrypoint.py
+++ b/repub/entrypoint.py
@ -34,14 +34,19 @@ def parse_args(argv: list[str] | None = None) -> tuple[str, argparse.Namespace]:
    serve_parser = subparsers.add_parser("serve", help="Start the republisher web UI")
    serve_parser.add_argument(
        "--host",
-        default=os.environ.get("REPUB_HOST", "127.0.0.1"),
+        default=os.environ.get("REPUBLISHER_HOST", "127.0.0.1"),
        help="Host interface for the web UI",
    )
    serve_parser.add_argument(
        "--port",
-        default=os.environ.get("REPUB_PORT", "8080"),
+        default=os.environ.get("REPUBLISHER_PORT", "8080"),
        help="Port for the web UI",
    )
    serve_parser.add_argument(
        "--dev-mode",
        action="store_true",
        help="Serve published feeds from /feeds for local development",
    )
    crawl_parser = subparsers.add_parser("crawl", help="Run the feed crawler once")
    crawl_parser.add_argument(
@ -51,11 +56,11 @@ def parse_args(argv: list[str] | None = None) -> tuple[str, argparse.Namespace]:
        help="Path to runtime config TOML file",
    )
    if not raw_args:
-        raw_args = ["serve"]
+        raw_args = ["serve", "--dev-mode"]
    elif raw_args[0] in {"-c", "--config"}:
        raw_args = ["crawl", *raw_args]
    elif raw_args[0] not in {"serve", "crawl"}:
-        raw_args = ["serve", *raw_args]
+        raw_args = ["serve", "--dev-mode", *raw_args]
    args = parser.parse_args(raw_args)
    command = args.command or "serve"
@ -72,10 +77,10 @@ def entrypoint(argv: list[str] | None = None) -> int:
    try:
        port = int(args.port)
    except ValueError:
-        logger.error("Invalid REPUB_PORT/--port value: %s", args.port)
+        logger.error("Invalid REPUBLISHER_PORT/--port value: %s", args.port)
        return 2
-    app = create_app()
+    app = create_app(dev_mode=bool(args.dev_mode))
    app.run(host=args.host, port=port)
    return 0
--- a/repub/job_runner.py
+++ b/repub/job_runner.py
@ -0,0 +1,468 @@
 from __future__ import annotations
 import argparse
 import json
 import signal
 import sys
 from dataclasses import dataclass
 from datetime import UTC, datetime
 from pathlib import Path
 from typing import Any
 from pygea.config import LoggingConfig, PygeaConfig, ResultsConfig, RuntimeConfig
 from scrapy.crawler import CrawlerProcess
 from scrapy.statscollectors import StatsCollector
 from twisted.python.failure import Failure
 from repub.config import (
    FeedConfig,
    RepublisherConfig,
    build_base_settings,
    build_feed_settings,
    feed_output_dir,
 )
 from repub.crawl import prepare_output_dirs
 from repub.model import (
    Job,
    Source,
    SourceFeed,
    SourcePangea,
    database,
    initialize_database,
 )
 from repub.spiders.rss_spider import RssFeedSpider
 def _json_default(value: Any) -> Any:
    if isinstance(value, datetime):
        if value.tzinfo is None:
            return value.replace(tzinfo=UTC).isoformat()
        return value.astimezone(UTC).isoformat()
    return str(value)
 def _normalized_stats(stats: dict[str, Any]) -> dict[str, Any]:
    cache_store = int(stats.get("httpcache/store", 0))
    cache_hits = int(stats.get("httpcache/hit", 0))
    cache_misses = int(stats.get("httpcache/miss", 0))
    return {
        **stats,
        "requests_count": int(stats.get("downloader/request_count", 0)),
        "items_count": int(stats.get("item_scraped_count", 0)),
        "warnings_count": int(stats.get("log_count/WARNING", 0)),
        "errors_count": int(stats.get("log_count/ERROR", 0)),
        "bytes_count": int(stats.get("downloader/response_bytes", 0)),
        "retries_count": int(stats.get("retry/count", 0)),
        "exceptions_count": int(stats.get("spider_exceptions/count", 0)),
        "cache_size_count": cache_store,
        "cache_object_count": cache_store + cache_hits + cache_misses,
    }
 class ExecutionStatsCollector(StatsCollector):
    def __init__(self, crawler: Any):
        super().__init__(crawler)
        self._stats_path = Path(crawler.settings["REPUB_JOB_STATS_PATH"])
        self._stats_path.parent.mkdir(parents=True, exist_ok=True)
    def set_value(self, key: str, value: Any, spider: Any | None = None) -> None:
        super().set_value(key, value, spider)
        self._write_snapshot()
    def set_stats(self, stats: dict[str, Any], spider: Any | None = None) -> None:
        super().set_stats(stats, spider)
        self._write_snapshot()
    def inc_value(
        self,
        key: str,
        count: int = 1,
        start: int = 0,
        spider: Any | None = None,
    ) -> None:
        super().inc_value(key, count, start, spider)
        self._write_snapshot()
    def max_value(self, key: str, value: Any, spider: Any | None = None) -> None:
        super().max_value(key, value, spider)
        self._write_snapshot()
    def min_value(self, key: str, value: Any, spider: Any | None = None) -> None:
        super().min_value(key, value, spider)
        self._write_snapshot()
    def clear_stats(self, spider: Any | None = None) -> None:
        super().clear_stats(spider)
        self._write_snapshot()
    def open_spider(self, spider: Any | None = None) -> None:
        super().open_spider(spider)
        self._write_snapshot()
    def _persist_stats(self, stats: dict[str, Any]) -> None:
        self._write_snapshot(stats)
    def _write_snapshot(self, stats: dict[str, Any] | None = None) -> None:
        payload = {
            "timestamp": datetime.now(UTC).isoformat(),
            **_normalized_stats(self._stats if stats is None else stats),
        }
        with self._stats_path.open("a", encoding="utf-8") as handle:
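            # Emit the record and its newline in one write so a concurrent reader is
            # less likely to observe a line without its terminator.
            handle.write(
                json.dumps(payload, sort_keys=True, default=_json_default) + "\n"
            )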
 def pangea_feed_class():
    from pygea.pangeafeed import PangeaFeed
    return PangeaFeed
 def generate_pangea_feed(
    *,
    name: str,
    slug: str,
    domain: str,
    category_name: str,
    content_type: str,
    only_newest: bool,
    max_articles: int,
    oldest_article: int,
    include_authors: bool,
    exclude_media: bool,
    include_content: bool,
    content_format: str,
    out_dir: str | Path,
    log_path: str | Path,
 ) -> Path:
    resolved_out_dir = Path(out_dir).resolve()
    resolved_log_path = Path(log_path).resolve()
    pangea_out_dir = feed_output_dir(out_dir=resolved_out_dir, feed_slug=slug)
    config = PygeaConfig(
        config_path=resolved_out_dir / "pygea-runtime.toml",
        domain=domain,
        default_content_type=content_type,
        feeds=(
            {
                "name": category_name,
                "slug": slug,
                "only_newest": only_newest,
                "content_type": content_type,
            },
        ),
        runtime=RuntimeConfig(
            api_key=None,
            max_articles=max_articles,
            oldest_article=oldest_article,
            authors_p=include_authors,
            no_media_p=exclude_media,
            content_inc_p=include_content,
            content_format=content_format,
            verbose_p=True,
        ),
        results=ResultsConfig(
            output_to_file_p=True,
            output_file_name="pangea.rss",
            output_directory=pangea_out_dir.parent,
        ),
        logging=LoggingConfig(
            log_file=resolved_log_path,
            default_log_level="INFO",
        ),
    )
    feed_class = pangea_feed_class()
    feed = feed_class(config, list(config.feeds))
    feed.acquire_content()
    feed.generate_feed()
    output_path = feed.disgorge(slug)
    if output_path is None:
        raise RuntimeError(f"pygea did not write an output file for {name!r}")
    return output_path.resolve()
@dataclass(frozen=True)
 class JobSourceConfig:
    source_name: str
    source_slug: str
    source_type: str
    spider_arguments: dict[str, str]
    feed_url: str | None = None
    pangea_domain: str | None = None
    pangea_category: str | None = None
    content_type: str | None = None
    only_newest: bool = True
    max_articles: int = 10
    oldest_article: int = 3
    include_authors: bool = True
    exclude_media: bool = False
    include_content: bool = True
    content_format: str = "MOBILE_3"
 def parse_args(argv: list[str] | None = None) -> argparse.Namespace:
    parser = argparse.ArgumentParser(description="Run a republisher job worker")
    parser.add_argument("--job-id", type=int, required=True)
    parser.add_argument("--execution-id", type=int, required=True)
    parser.add_argument("--db-path", required=True)
    parser.add_argument("--out-dir", required=True)
    parser.add_argument("--stats-path", required=True)
    return parser.parse_args(argv)
 def main(argv: list[str] | None = None) -> int:
    args = parse_args(argv)
    stop_requested = False
    process: CrawlerProcess | None = None
    def request_stop(signum: int, frame: object | None) -> None:
        del signum, frame
        nonlocal stop_requested
        stop_requested = True
        print(
            f"worker[{args.job_id}:{args.execution_id}]: graceful stop requested",
            flush=True,
        )
        if process is None:
            return
        try:
            from twisted.internet import reactor
            call_from_thread = getattr(reactor, "callFromThread", None)
            if callable(call_from_thread):
                call_from_thread(process.stop)
            else:
                process.stop()
        except Exception as error:
            print(
                f"worker[{args.job_id}:{args.execution_id}]: failed to stop reactor gracefully: {error}",
                flush=True,
            )
    signal.signal(signal.SIGTERM, request_stop)
    signal.signal(signal.SIGINT, request_stop)
    try:
        source_config = _load_job_source_config(
            db_path=args.db_path, job_id=args.job_id
        )
    except Exception as error:
        print(
            f"worker[{args.job_id}:{args.execution_id}]: failed to load job config: {error}",
            flush=True,
        )
        return 1
    out_dir = Path(args.out_dir).resolve()
    stats_path = Path(args.stats_path).resolve()
    log_path = stats_path.with_suffix(".log")
    try:
        feed = _resolve_feed(
            source_config=source_config,
            out_dir=out_dir,
            log_path=log_path,
        )
        process = CrawlerProcess(
            _build_crawl_settings(
                out_dir=out_dir,
                feed=feed,
                stats_path=stats_path,
            )
        )
        print(
            f"worker[{args.job_id}:{args.execution_id}]: starting crawl for {source_config.source_slug}",
            flush=True,
        )
        exit_code = _run_crawl(
            process=process,
            feed=feed,
            spider_arguments=source_config.spider_arguments,
        )
    except Exception as error:
        print(
            f"worker[{args.job_id}:{args.execution_id}]: crawl failed: {error}",
            flush=True,
        )
        return 1
    if stop_requested:
        print(
            f"worker[{args.job_id}:{args.execution_id}]: stopping after graceful request",
            flush=True,
        )
        return 130
    if exit_code == 0:
        print(
            f"worker[{args.job_id}:{args.execution_id}]: completed successfully",
            flush=True,
        )
    return exit_code
 def _load_job_source_config(*, db_path: str, job_id: int) -> JobSourceConfig:
    initialize_database(db_path)
    primary_key = getattr(Job, "_meta").primary_key
    with database.connection_context():
        job = (
            Job.select(Job, Source)
            .join(Source)
            .where(primary_key == job_id)
            .get_or_none()
        )
        if job is None:
            raise ValueError(f"job {job_id} does not exist")
        source = job.source
        spider_arguments = _parse_spider_arguments(job.spider_arguments)
        if source.source_type == "feed":
            feed = SourceFeed.get_or_none(SourceFeed.source == source)
            if feed is None:
                raise ValueError(
                    f"feed source {source.slug!r} is missing its feed config"
                )
            return JobSourceConfig(
                source_name=source.name,
                source_slug=source.slug,
                source_type=source.source_type,
                spider_arguments=spider_arguments,
                feed_url=feed.feed_url,
            )
        pangea = SourcePangea.get_or_none(SourcePangea.source == source)
        if pangea is None:
            raise ValueError(
                f"pangea source {source.slug!r} is missing its pangea config"
            )
        return JobSourceConfig(
            source_name=source.name,
            source_slug=source.slug,
            source_type=source.source_type,
            spider_arguments=spider_arguments,
            pangea_domain=pangea.domain,
            pangea_category=pangea.category_name,
            content_type=pangea.content_type,
            only_newest=bool(pangea.only_newest),
            max_articles=int(pangea.max_articles),
            oldest_article=int(pangea.oldest_article),
            include_authors=bool(pangea.include_authors),
            exclude_media=bool(pangea.exclude_media),
            include_content=bool(pangea.include_content),
            content_format=pangea.content_format,
        )
 def _parse_spider_arguments(raw_value: str) -> dict[str, str]:
    arguments: dict[str, str] = {}
    for raw_line in raw_value.splitlines():
        line = raw_line.strip()
        if line == "":
            continue
        key, separator, value = line.partition("=")
        key = key.strip()
        if separator == "" or key == "":
            raise ValueError(
                f"invalid spider argument {raw_line!r}; expected key=value"
            )
        arguments[key] = value
    return arguments
 def _resolve_feed(
    *,
    source_config: JobSourceConfig,
    out_dir: Path,
    log_path: Path,
 ) -> FeedConfig:
    if source_config.source_type == "feed":
        assert source_config.feed_url is not None
        return FeedConfig(
            name=source_config.source_name,
            slug=source_config.source_slug,
            url=source_config.feed_url,
        )
    generated_feed_path = generate_pangea_feed(
        name=source_config.source_name,
        slug=source_config.source_slug,
        domain=_require_value(source_config.pangea_domain, "pangea_domain"),
        category_name=_require_value(source_config.pangea_category, "pangea_category"),
        content_type=_require_value(source_config.content_type, "content_type"),
        only_newest=source_config.only_newest,
        max_articles=source_config.max_articles,
        oldest_article=source_config.oldest_article,
        include_authors=source_config.include_authors,
        exclude_media=source_config.exclude_media,
        include_content=source_config.include_content,
        content_format=source_config.content_format,
        out_dir=out_dir,
        log_path=log_path.with_suffix(".pygea.log"),
    )
    print(
        f"pygea: generated intermediate feed at {generated_feed_path}",
        flush=True,
    )
    return FeedConfig(
        name=source_config.source_name,
        slug=source_config.source_slug,
        url=generated_feed_path.as_uri(),
    )
 def _build_crawl_settings(*, out_dir: Path, feed: FeedConfig, stats_path: Path):
    base_settings = build_base_settings(
        RepublisherConfig(
            config_path=out_dir / "job-runner.toml",
            out_dir=out_dir,
            feeds=(feed,),
            scrapy_settings={},
        )
    )
    prepare_output_dirs(out_dir, feed.slug)
    settings = build_feed_settings(base_settings, out_dir=out_dir, feed_slug=feed.slug)
    settings.set("LOG_FILE", None, priority="cmdline")
    settings.set(
        "STATS_CLASS",
        "repub.job_runner.ExecutionStatsCollector",
        priority="cmdline",
    )
    settings.set("REPUB_JOB_STATS_PATH", str(stats_path), priority="cmdline")
    return settings
 def _run_crawl(
    *,
    process: CrawlerProcess,
    feed: FeedConfig,
    spider_arguments: dict[str, str],
 ) -> int:
    results: list[Failure | None] = []
    deferred = process.crawl(
        RssFeedSpider,
        feed_name=feed.slug,
        url=feed.url,
        **spider_arguments,
    )
    def handle_success(_: object) -> None:
        results.append(None)
        return None
    def handle_error(failure: Failure) -> None:
        print(failure.getTraceback(), flush=True)
        results.append(failure)
        return None
    deferred.addCallbacks(handle_success, handle_error)
    process.start()
    return 1 if any(result is not None for result in results) else 0
 def _require_value(value: str | None, field_name: str) -> str:
    if value is None or value == "":
        raise ValueError(f"missing {field_name}")
    return value
 if __name__ == "__main__":
    sys.exit(main())
--- a/repub/jobs.py
+++ b/repub/jobs.py
@ -0,0 +1,747 @@
 from __future__ import annotations
 import json
 import subprocess
 import sys
 from dataclasses import dataclass
 from datetime import UTC, datetime, timedelta
 from pathlib import Path
 from typing import Callable, TextIO, cast
 from apscheduler.schedulers.background import BackgroundScheduler
 from apscheduler.triggers.cron import CronTrigger
 from repub.config import feed_output_dir, feed_output_path
 from repub.model import Job, JobExecution, JobExecutionStatus, Source, database, utc_now
 SCHEDULER_JOB_PREFIX = "job-"
 POLL_JOB_ID = "runtime-poll-workers"
 SYNC_JOB_ID = "runtime-sync-jobs"
@dataclass(frozen=True)
 class JobArtifacts:
    log_path: Path
    stats_path: Path
    @classmethod
    def for_execution(
        cls, *, log_dir: Path, job_id: int, execution_id: int
    ) -> "JobArtifacts":
        prefix = f"job-{job_id}-execution-{execution_id}"
        return cls(
            log_path=log_dir / f"{prefix}.log",
            stats_path=log_dir / f"{prefix}.jsonl",
        )
@dataclass
 class RunningWorker:
    execution_id: int
    process: subprocess.Popen[str]
    log_handle: TextIO
    artifacts: JobArtifacts
    stats_offset: int = 0
@dataclass(frozen=True)
 class ExecutionLogView:
    job_id: int
    execution_id: int
    title: str
    description: str
    status_label: str
    status_tone: str
    log_text: str
    error_message: str | None = None
 class JobRuntime:
    def __init__(
        self,
        *,
        log_dir: str | Path,
        refresh_callback: Callable[[], None] | None = None,
        graceful_stop_seconds: float = 15.0,
    ) -> None:
        self.log_dir = Path(log_dir)
        self.refresh_callback = refresh_callback
        self.graceful_stop_seconds = graceful_stop_seconds
        self.scheduler = BackgroundScheduler(timezone=UTC)
        self._workers: dict[int, RunningWorker] = {}
        self._started = False
    def start(self) -> None:
        if self._started:
            return
        self._reconcile_stale_executions()
        self.scheduler.start()
        self.scheduler.add_job(
            self.poll_workers,
            "interval",
            id=POLL_JOB_ID,
            seconds=0.25,
            replace_existing=True,
            max_instances=1,
            coalesce=True,
        )
        self.scheduler.add_job(
            self.sync_jobs,
            "interval",
            id=SYNC_JOB_ID,
            seconds=1,
            replace_existing=True,
            max_instances=1,
            coalesce=True,
        )
        self.sync_jobs()
        self._started = True
    def shutdown(self) -> None:
        for execution_id in tuple(self._workers):
            worker = self._workers.pop(execution_id)
            if worker.process.poll() is None:
                worker.process.kill()
                worker.process.wait(timeout=2)
            worker.log_handle.close()
        if self._started:
            self.scheduler.shutdown(wait=False)
            self._started = False
    def sync_jobs(self) -> None:
        with database.connection_context():
            jobs = tuple(Job.select().where(Job.enabled == True))  # noqa: E712
        desired_ids = set()
        for job in jobs:
            scheduler_job_id = _scheduler_job_id(_job_id(job))
            desired_ids.add(scheduler_job_id)
            self.scheduler.add_job(
                self.run_scheduled_job,
                trigger=_job_trigger(job),
                args=(_job_id(job),),
                id=scheduler_job_id,
                replace_existing=True,
                max_instances=1,
                coalesce=True,
                misfire_grace_time=1,
            )
        for scheduled_job in tuple(self.scheduler.get_jobs()):
            if (
                scheduled_job.id.startswith(SCHEDULER_JOB_PREFIX)
                and scheduled_job.id not in desired_ids
            ):
                self.scheduler.remove_job(scheduled_job.id)
    def run_scheduled_job(self, job_id: int) -> None:
        self.run_job_now(job_id, reason="scheduled")
    def run_job_now(self, job_id: int, *, reason: str) -> int | None:
        del reason
        self.start()
        with database.connection_context():
            job = Job.get_or_none(id=job_id)
            if job is None:
                return None
            already_running = (
                JobExecution.select()
                .where(
                    (JobExecution.job == job)
                    & (JobExecution.running_status == JobExecutionStatus.RUNNING)
                )
                .exists()
            )
            if already_running:
                return None
            execution = JobExecution.create(
                job=job,
                started_at=utc_now(),
                running_status=JobExecutionStatus.RUNNING,
            )
            execution_id = _execution_id(execution)
        artifacts = JobArtifacts.for_execution(
            log_dir=self.log_dir, job_id=job_id, execution_id=execution_id
        )
        artifacts.log_path.parent.mkdir(parents=True, exist_ok=True)
        log_handle = artifacts.log_path.open("a", encoding="utf-8", buffering=1)
        log_handle.write(
            f"scheduler: starting execution {execution_id} for job {job_id}\n"
        )
        process = subprocess.Popen(
            [
                sys.executable,
                "-u",
                "-m",
                "repub.job_runner",
                "--job-id",
                str(job_id),
                "--execution-id",
                str(execution_id),
                "--db-path",
                str(database.database),
                "--out-dir",
                str(self.log_dir.parent),
                "--stats-path",
                str(artifacts.stats_path),
            ],
            stdout=log_handle,
            stderr=subprocess.STDOUT,
            text=True,
        )
        self._workers[execution_id] = RunningWorker(
            execution_id=execution_id,
            process=process,
            log_handle=log_handle,
            artifacts=artifacts,
        )
        self._trigger_refresh()
        return execution_id
    def request_execution_cancel(self, execution_id: int) -> bool:
        with database.connection_context():
            execution = JobExecution.get_or_none(id=execution_id)
            if execution is None:
                return False
            if execution.running_status != JobExecutionStatus.RUNNING:
                return False
            if execution.stop_requested_at is None:
                execution.stop_requested_at = utc_now()
                execution.save()
        worker = self._workers.get(execution_id)
        if worker is not None and worker.process.poll() is None:
            worker.log_handle.write(
                f"scheduler: graceful stop requested for execution {execution_id}\n"
            )
            worker.process.terminate()
        self._trigger_refresh()
        return True
    def set_job_enabled(self, job_id: int, *, enabled: bool) -> bool:
        with database.connection_context():
            job = Job.get_or_none(id=job_id)
            if job is None:
                return False
            job.enabled = enabled
            job.save()
        self.sync_jobs()
        self._trigger_refresh()
        return True
    def poll_workers(self) -> None:
        for execution_id in tuple(self._workers):
            worker = self._workers[execution_id]
            self._apply_stats(worker)
            self._enforce_graceful_stop(worker)
            returncode = worker.process.poll()
            if returncode is None:
                continue
            self._apply_stats(worker)
            with database.connection_context():
                execution = JobExecution.get_by_id(execution_id)
                execution.ended_at = utc_now()
                execution.running_status = _final_status(
                    execution=execution,
                    returncode=returncode,
                )
                execution.save()
            worker.log_handle.close()
            del self._workers[execution_id]
            self._trigger_refresh()
    def _apply_stats(self, worker: RunningWorker) -> None:
        if not worker.artifacts.stats_path.exists():
            return
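        # Read bytes so stats_offset is a plain byte offset.
        with worker.artifacts.stats_path.open("rb") as handle:
            handle.seek(worker.stats_offset)
            payload = handle.read()
        # Consume only complete lines; a record the worker is still writing is
        # picked up on the next poll instead of failing json.loads.
        complete = payload[: payload.rfind(b"\n") + 1]
        worker.stats_offset += len(complete)
        lines = [line for line in complete.decode("utf-8").splitlines() if line.strip()]
        if not lines:
            return
        stats = json.loads(lines[-1])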
        with database.connection_context():
            execution = JobExecution.get_by_id(worker.execution_id)
            execution.requests_count = int(stats.get("requests_count", 0))
            execution.items_count = int(stats.get("items_count", 0))
            execution.warnings_count = int(stats.get("warnings_count", 0))
            execution.errors_count = int(stats.get("errors_count", 0))
            execution.bytes_count = int(stats.get("bytes_count", 0))
            execution.retries_count = int(stats.get("retries_count", 0))
            execution.exceptions_count = int(stats.get("exceptions_count", 0))
            execution.cache_size_count = int(stats.get("cache_size_count", 0))
            execution.cache_object_count = int(stats.get("cache_object_count", 0))
            execution.raw_stats = json.dumps(stats, sort_keys=True)
            execution.save()
        self._trigger_refresh()
    def _enforce_graceful_stop(self, worker: RunningWorker) -> None:
        with database.connection_context():
            execution = JobExecution.get_by_id(worker.execution_id)
            if execution.stop_requested_at is None:
                return
            elapsed = utc_now() - _coerce_datetime(execution.stop_requested_at)
        if (
            elapsed >= timedelta(seconds=self.graceful_stop_seconds)
            and worker.process.poll() is None
        ):
            worker.process.kill()
    def _trigger_refresh(self) -> None:
        if self.refresh_callback is not None:
            self.refresh_callback()
    def _reconcile_stale_executions(self) -> None:
        with database.connection_context():
            stale_executions = tuple(
                JobExecution.select(JobExecution, Job)
                .join(Job)
                .where(JobExecution.running_status == JobExecutionStatus.RUNNING)
            )
            for execution in stale_executions:
                job = cast(Job, execution.job)
                execution_id = _execution_id(execution)
                artifacts = JobArtifacts.for_execution(
                    log_dir=self.log_dir,
                    job_id=_job_id(job),
                    execution_id=execution_id,
                )
                artifacts.log_path.parent.mkdir(parents=True, exist_ok=True)
                with artifacts.log_path.open("a", encoding="utf-8") as log_handle:
                    log_handle.write(
                        "scheduler: execution interrupted by app restart; marking it finished\n"
                    )
                execution.ended_at = utc_now()
                execution.running_status = (
                    JobExecutionStatus.CANCELED
                    if execution.stop_requested_at is not None
                    else JobExecutionStatus.FAILED
                )
                execution.save()
        if stale_executions:
            self._trigger_refresh()
 def load_runs_view(
    *, log_dir: str | Path, now: datetime | None = None
 ) -> dict[str, tuple[dict[str, object], ...]]:
    reference_time = now or datetime.now(UTC)
    resolved_log_dir = Path(log_dir)
    with database.connection_context():
        jobs = tuple(Job.select(Job, Source).join(Source).order_by(Source.name.asc()))
        running_executions = tuple(
            JobExecution.select(JobExecution, Job, Source)
            .join(Job)
            .join(Source)
            .where(JobExecution.running_status == JobExecutionStatus.RUNNING)
            .order_by(JobExecution.started_at.desc())
        )
        completed_executions = tuple(
            JobExecution.select(JobExecution, Job, Source)
            .join(Job)
            .join(Source)
            .where(
                JobExecution.running_status.in_(
                    (
                        JobExecutionStatus.SUCCEEDED,
                        JobExecutionStatus.FAILED,
                        JobExecutionStatus.CANCELED,
                    )
                )
            )
            .order_by(JobExecution.ended_at.desc())
            .limit(20)
        )
        running_by_job = {
            _job_id(execution.job): execution for execution in running_executions
        }
    return {
        "running": tuple(
            _project_running_execution(execution, resolved_log_dir, reference_time)
            for execution in running_executions
        ),
        "upcoming": tuple(
            _project_upcoming_job(job, running_by_job.get(_job_id(job)), reference_time)
            for job in jobs
        ),
        "completed": tuple(
            _project_completed_execution(execution, resolved_log_dir, reference_time)
            for execution in completed_executions
        ),
    }
 def load_dashboard_view(
    *, log_dir: str | Path, now: datetime | None = None
 ) -> dict[str, object]:
    reference_time = now or datetime.now(UTC)
    runs_view = load_runs_view(log_dir=log_dir, now=reference_time)
    output_dir = Path(log_dir).parent
    with database.connection_context():
        sources = tuple(Source.select().order_by(Source.name.asc()))
        failed_last_day = (
            JobExecution.select()
            .where(
                (JobExecution.running_status == JobExecutionStatus.FAILED)
                # Limit the count to the trailing 24 hours that "failures_24h" reports.
                & (JobExecution.ended_at >= reference_time - timedelta(days=1))
            )
            .count()
        )
    upcoming_ready = sum(
        1 for job in runs_view["upcoming"] if str(job["run_reason"]) == "Ready"
    )
    footprint_bytes = _directory_size(output_dir)
    return {
        "running": runs_view["running"],
        "source_feeds": tuple(
            _project_source_feed(source, output_dir, reference_time)
            for source in sources
        ),
        "snapshot": {
            "running_now": str(len(runs_view["running"])),
            "upcoming_today": str(upcoming_ready),
            "failures_24h": str(failed_last_day),
            "artifact_footprint": _format_bytes(footprint_bytes),
        },
    }
 def load_execution_log_view(
    *, log_dir: str | Path, job_id: int, execution_id: int
 ) -> ExecutionLogView:
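    with database.connection_context():
        execution = JobExecution.get_or_none(id=execution_id)
        # Resolve the job foreign key while the connection is still open; reading
        # execution.job lazily after the block would issue a query outside it.
        owner_job_id = (
            None if execution is None else _job_id(cast(Job, execution.job))
        )
    route = f"/job/{job_id}/execution/{execution_id}/logs"
    if execution is None or owner_job_id != job_id: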
        return ExecutionLogView(
            job_id=job_id,
            execution_id=execution_id,
            title=f"Job {job_id} / execution {execution_id}",
            description="Plain text log view routed through the app.",
            status_label="Unavailable",
            status_tone="failed",
            log_text="",
            error_message="Execution does not exist.",
        )
    artifacts = JobArtifacts.for_execution(
        log_dir=Path(log_dir),
        job_id=job_id,
        execution_id=execution_id,
    )
    if not artifacts.log_path.exists():
        return ExecutionLogView(
            job_id=job_id,
            execution_id=execution_id,
            title=f"Job {job_id} / execution {execution_id}",
            description="Plain text log view routed through the app.",
            status_label=_execution_status_label(execution),
            status_tone=_execution_status_tone(execution),
            log_text="",
            error_message="Log file has not been created yet.",
        )
    return ExecutionLogView(
        job_id=job_id,
        execution_id=execution_id,
        title=f"Job {job_id} / execution {execution_id}",
        description=f"Route: {route}",
        status_label=_execution_status_label(execution),
        status_tone=_execution_status_tone(execution),
        log_text=artifacts.log_path.read_text(encoding="utf-8"),
    )
 def _job_trigger(job: Job) -> CronTrigger:
    expression = " ".join(
        (
            str(job.cron_minute),
            str(job.cron_hour),
            str(job.cron_day_of_month),
            str(job.cron_month),
            str(job.cron_day_of_week),
        )
    )
    return CronTrigger.from_crontab(expression, timezone=UTC)
 def _scheduler_job_id(job_id: int) -> str:
    return f"{SCHEDULER_JOB_PREFIX}{job_id}"
 def _project_running_execution(
    execution: JobExecution, log_dir: Path, reference_time: datetime
 ) -> dict[str, object]:
    job = cast(Job, execution.job)
    job_id = _job_id(job)
    execution_id = _execution_id(execution)
    artifacts = JobArtifacts.for_execution(
        log_dir=log_dir, job_id=job_id, execution_id=execution_id
    )
    started_at = _coerce_datetime(
        cast(datetime | str, execution.started_at or execution.created_at)
    )
    runtime = reference_time - started_at
    return {
        "source": job.source.name,
        "slug": job.source.slug,
        "job_id": job_id,
        "execution_id": execution_id,
        "started_at": started_at.strftime("%Y-%m-%d %H:%M UTC"),
        "runtime": f"running for {int(runtime.total_seconds())}s",
        "status": "Stopping" if execution.stop_requested_at else "Running",
        "stats": _stats_summary(execution),
        "worker": (
            "graceful stop requested"
            if execution.stop_requested_at
            else "streaming stats from worker jsonl"
        ),
        "log_href": f"/job/{job_id}/execution/{execution_id}/logs",
        "log_exists": artifacts.log_path.exists(),
        "cancel_post_path": f"/actions/executions/{execution_id}/cancel",
    }
 def _project_upcoming_job(
    job: Job, running_execution: JobExecution | None, reference_time: datetime
 ) -> dict[str, object]:
    job_id = _job_id(job)
    trigger = _job_trigger(job)
    next_run = (
        trigger.get_next_fire_time(None, reference_time)
        if job.enabled and running_execution is None
        else None
    )
    return {
        "source": job.source.name,
        "slug": job.source.slug,
        "job_id": job_id,
        "next_run": (
            _humanize_relative_time(reference_time, next_run)
            if next_run is not None
            else ("Running now" if running_execution is not None else "Not scheduled")
        ),
        "next_run_at": next_run.isoformat() if next_run is not None else None,
        "schedule": " ".join(
            (
                str(job.cron_minute),
                str(job.cron_hour),
                str(job.cron_day_of_month),
                str(job.cron_month),
                str(job.cron_day_of_week),
            )
        ),
        "enabled_label": "Enabled" if job.enabled else "Disabled",
        "enabled_tone": "scheduled" if job.enabled else "idle",
        "run_disabled": running_execution is not None,
        "run_reason": "Already running" if running_execution is not None else "Ready",
        "toggle_label": "Disable" if job.enabled else "Enable",
        "toggle_enabled": not job.enabled,
        "run_post_path": f"/actions/jobs/{job_id}/run-now",
        "toggle_post_path": f"/actions/jobs/{job_id}/toggle-enabled",
        "delete_post_path": f"/actions/jobs/{job_id}/delete",
    }
 def _project_completed_execution(
    execution: JobExecution, log_dir: Path, reference_time: datetime
 ) -> dict[str, object]:
    job = cast(Job, execution.job)
    job_id = _job_id(job)
    execution_id = _execution_id(execution)
    artifacts = JobArtifacts.for_execution(
        log_dir=log_dir, job_id=job_id, execution_id=execution_id
    )
    ended_at = (
        _coerce_datetime(cast(datetime | str, execution.ended_at))
        if execution.ended_at is not None
        else None
    )
    return {
        "source": job.source.name,
        "slug": job.source.slug,
        "job_id": job_id,
        "execution_id": execution_id,
        "ended_at": (
            _humanize_relative_time(reference_time, ended_at)
            if ended_at is not None
            else "Pending"
        ),
        "ended_at_iso": ended_at.isoformat() if ended_at is not None else None,
        "status": _execution_status_label(execution),
        "status_tone": _execution_status_tone(execution),
        "stats": _stats_summary(execution),
        "summary": (
            "Canceled by operator"
            if execution.running_status == JobExecutionStatus.CANCELED
            else (
                "Worker exited successfully"
                if execution.running_status == JobExecutionStatus.SUCCEEDED
                else "Worker exited with failure"
            )
        ),
        "log_href": f"/job/{job_id}/execution/{execution_id}/logs",
        "log_exists": artifacts.log_path.exists(),
    }
 def _project_source_feed(
    source: Source, output_dir: Path, reference_time: datetime
 ) -> dict[str, object]:
    source_slug = str(source.slug)
    source_dir = feed_output_dir(out_dir=output_dir, feed_slug=source_slug)
    feed_path = feed_output_path(out_dir=output_dir, feed_slug=source_slug)
    feed_exists = feed_path.exists()
    updated_at = (
        datetime.fromtimestamp(feed_path.stat().st_mtime, tz=UTC)
        if feed_exists
        else None
    )
    return {
        "source": source.name,
        "slug": source_slug,
        "feed_href": f"/feeds/{source_slug}/feed.rss",
        "feed_status_label": "Available" if feed_exists else "Missing",
        "feed_status_tone": "done" if feed_exists else "failed",
        "feed_exists": feed_exists,
        "last_updated": (
            _humanize_relative_time(reference_time, updated_at)
            if updated_at is not None
            else "Never published"
        ),
        "last_updated_iso": updated_at.isoformat() if updated_at is not None else None,
        "artifact_footprint": _format_bytes(_directory_size(source_dir)),
    }
 def _execution_status_label(execution: JobExecution) -> str:
    status = JobExecutionStatus(execution.running_status)
    return {
        JobExecutionStatus.PENDING: "Pending",
        JobExecutionStatus.RUNNING: (
            "Stopping" if execution.stop_requested_at else "Running"
        ),
        JobExecutionStatus.SUCCEEDED: "Succeeded",
        JobExecutionStatus.FAILED: "Failed",
        JobExecutionStatus.CANCELED: "Canceled",
    }[status]
 def _execution_status_tone(execution: JobExecution) -> str:
    status = JobExecutionStatus(execution.running_status)
    return {
        JobExecutionStatus.PENDING: "idle",
        JobExecutionStatus.RUNNING: "running",
        JobExecutionStatus.SUCCEEDED: "done",
        JobExecutionStatus.FAILED: "failed",
        JobExecutionStatus.CANCELED: "idle",
    }[status]
 def _stats_summary(execution: JobExecution) -> str:
    bytes_count = cast(int, execution.bytes_count)
    return (
        f"{execution.requests_count} requests"
        f" • {execution.items_count} items"
        f" • {_format_summary_bytes(bytes_count)}"
    )
 def _final_status(*, execution: JobExecution, returncode: int) -> JobExecutionStatus:
    if execution.stop_requested_at is not None:
        return JobExecutionStatus.CANCELED
    if returncode == 0:
        return JobExecutionStatus.SUCCEEDED
    return JobExecutionStatus.FAILED
 def _coerce_datetime(value: datetime | str) -> datetime:
    if isinstance(value, datetime):
        if value.tzinfo is None:
            return value.replace(tzinfo=UTC)
        return value.astimezone(UTC)
    parsed = datetime.fromisoformat(value)
    if parsed.tzinfo is None:
        return parsed.replace(tzinfo=UTC)
    return parsed.astimezone(UTC)
 def _job_id(job: Job) -> int:
    return int(job.get_id())
 def _execution_id(execution: JobExecution) -> int:
    return int(execution.get_id())
 def _directory_size(path: Path) -> int:
    if not path.exists():
        return 0
    return sum(entry.stat().st_size for entry in path.rglob("*") if entry.is_file())
 def _format_bytes(value: int) -> str:
    if value < 1024:
        return f"{value} B"
    if value < 1024 * 1024:
        return f"{value / 1024:.1f} KB"
    if value < 1024 * 1024 * 1024:
        return f"{value / (1024 * 1024):.1f} MB"
    return f"{value / (1024 * 1024 * 1024):.1f} GB"
 def _format_summary_bytes(value: int) -> str:
    if value == 1:
        return "1 byte"
    if value < 1024:
        return f"{value} bytes"
    if value < 1024 * 1024:
        return f"{value / 1024:.1f} KiB"
    if value < 1024 * 1024 * 1024:
        return f"{value / (1024 * 1024):.1f} MiB"
    return f"{value / (1024 * 1024 * 1024):.1f} GiB"
 def _humanize_relative_time(reference_time: datetime, target_time: datetime) -> str:
    delta_seconds = int(round((target_time - reference_time).total_seconds()))
    if delta_seconds == 0:
        return "now"
    absolute_delta_seconds = abs(delta_seconds)
    units = (
        ("day", 24 * 60 * 60),
        ("hour", 60 * 60),
        ("minute", 60),
    )
    for label, size in units:
        if absolute_delta_seconds >= size:
            count = max(1, round(absolute_delta_seconds / size))
            suffix = "" if count == 1 else "s"
            if delta_seconds > 0:
                return f"in {count} {label}{suffix}"
            return f"{count} {label}{suffix} ago"
    if delta_seconds > 0:
        return f"in {absolute_delta_seconds} seconds"
    return f"{absolute_delta_seconds} seconds ago"
--- a/repub/media.py
+++ b/repub/media.py
@ -54,12 +54,25 @@ class VideoMeta(TypedDict):
    bit_rate: float
+def _decode_ffmpeg_output(output: Any) -> str:
+    if isinstance(output, bytes):
+        return output.decode("utf-8", errors="replace")
+    return str(output)
+def _print_ffmpeg_error_output(error: ffmpeg.Error) -> None:
+    if error.stderr:
+        print(_decode_ffmpeg_output(error.stderr), file=sys.stderr)
+    if error.stdout:
+        print(_decode_ffmpeg_output(error.stdout))
 def probe_media(file_path) -> Dict[str, Any]:
    """Probes `file_path` using ffmpeg's ffprobe and returns the data."""
    try:
        return ffmpeg.probe(file_path)
    except ffmpeg.Error as e:
-        print(e.stderr, file=sys.stderr)
+        _print_ffmpeg_error_output(e)
        logger.error(f"Failed to probe io {file_path}")
        logger.error(e)
        raise RuntimeError(f"Failed to probe io {file_path}") from e
@ -217,7 +230,7 @@ def transcode_audio(input_file: str, output_dir: str, params: Dict[str, str]) ->
                **params,
                loglevel="quiet",
            )
-            .run()
+            .run(capture_stdout=True, capture_stderr=True)
        )
        before = os.path.getsize(input_file) / 1024
        after = os.path.getsize(output_file) / 1024
@ -229,8 +242,7 @@ def transcode_audio(input_file: str, output_dir: str, params: Dict[str, str]) ->
        )
        return output_file
    except ffmpeg.Error as e:
-        print(e.stderr, file=sys.stderr)
-        print(e.stdout)
+        _print_ffmpeg_error_output(e)
        logger.error(e)
        raise RuntimeError(f"Failed to compress audio {input_file}") from e
@ -310,7 +322,7 @@ def transcode_video(input_file: str, output_dir: str, params: Dict[str, Any]) ->
                    **params,
                    # loglevel="quiet",
                )
-                .run()
+                .run(capture_stdout=True, capture_stderr=True)
            )
        else:
            passes = params["passes"]
@ -323,16 +335,18 @@ def transcode_video(input_file: str, output_dir: str, params: Dict[str, Any]) ->
                "-stats"
            )
            logger.info("Running pass #1")
-            std_out, std_err = ffoutput.run(capture_stdout=True)
-            print(std_out)
-            print(std_err)
+            ffoutput.run(capture_stdout=True, capture_stderr=True)
            logger.info("Running pass #2")
            ffoutput = ffinput.output(video, audio, output_file, **passes[1])
            ffoutput = ffoutput.global_args(
                # "-loglevel", "quiet",
                "-stats"
            )
-            ffoutput.run(overwrite_output=True)
+            ffoutput.run(
+                capture_stdout=True,
+                capture_stderr=True,
+                overwrite_output=True,
+            )
        before = os.path.getsize(input_file) / 1024
        after = os.path.getsize(output_file) / 1024
@ -344,7 +358,7 @@ def transcode_video(input_file: str, output_dir: str, params: Dict[str, Any]) ->
        )
        return output_file
    except ffmpeg.Error as e:
-        print(e.stderr, file=sys.stderr)
+        _print_ffmpeg_error_output(e)
        logger.error("Failed to transcode")
        logger.error(e)
        raise RuntimeError(f"Failed to transcode video: {e.stderr.decode()}") from e
--- a/repub/model.py
+++ b/repub/model.py
@ -0,0 +1,446 @@
 from __future__ import annotations
 import os
 from datetime import UTC, datetime
 from enum import IntEnum
 from importlib import resources
 from importlib.resources.abc import Traversable
 from pathlib import Path
 from peewee import (
    BooleanField,
    Check,
    DateTimeField,
    ForeignKeyField,
    IntegerField,
    Model,
    SqliteDatabase,
    TextField,
 )
 DEFAULT_DB_PATH = Path("republisher.db")
 DATABASE_PRAGMAS = {
    "busy_timeout": 5000,
    "cache_size": 15625,
    "foreign_keys": 1,
    "journal_mode": "wal",
    "page_size": 4096,
    "synchronous": "normal",
    "temp_store": "memory",
 }
 SCHEMA_GLOB = "*.sql"
 database = SqliteDatabase(None, pragmas=DATABASE_PRAGMAS)
 class JobExecutionStatus(IntEnum):
    PENDING = 0
    RUNNING = 1
    SUCCEEDED = 2
    FAILED = 3
    CANCELED = 4
 def utc_now() -> datetime:
    return datetime.now(UTC)
 def resolve_database_path(db_path: str | Path | None = None) -> Path:
    raw_value = (
        os.environ.get("REPUBLISHER_DB_PATH", DEFAULT_DB_PATH)
        if db_path is None
        else db_path
    )
    raw_path = Path(raw_value)
    return raw_path.expanduser().resolve()
 def schema_paths() -> tuple[Traversable, ...]:
    schema_dir = resources.files("repub").joinpath("sql")
    return tuple(
        sorted(
            (path for path in schema_dir.iterdir() if path.name.endswith(".sql")),
            key=lambda path: path.name,
        )
    )
 def initialize_database(db_path: str | Path | None = None) -> Path:
    resolved_path = resolve_database_path(db_path)
    resolved_path.parent.mkdir(parents=True, exist_ok=True)
    if not database.is_closed():
        database.close()
    database.init(str(resolved_path), pragmas=DATABASE_PRAGMAS)
    database.connect(reuse_if_open=True)
    try:
        connection = database.connection()
        for path in schema_paths():
            connection.executescript(path.read_text(encoding="utf-8"))
    finally:
        database.close()
    return resolved_path
 def source_slug_exists(slug: str) -> bool:
    with database.connection_context():
        return Source.select().where(Source.slug == slug).exists()
 def load_source_form(slug: str) -> dict[str, object] | None:
    with database.connection_context():
        source = Source.get_or_none(Source.slug == slug)
        if source is None:
            return None
        job = Job.get(Job.source == source)
        form_data: dict[str, object] = {
            "name": source.name,
            "slug": source.slug,
            "source_type": source.source_type,
            "notes": source.notes,
            "spider_arguments": job.spider_arguments,
            "enabled": job.enabled,
            "cron_minute": job.cron_minute,
            "cron_hour": job.cron_hour,
            "cron_day_of_month": job.cron_day_of_month,
            "cron_day_of_week": job.cron_day_of_week,
            "cron_month": job.cron_month,
            "feed_url": "",
            "pangea_domain": "",
            "pangea_category": "",
            "content_format": "MOBILE_3",
            "content_type": "articles",
            "max_articles": "10",
            "oldest_article": "3",
            "only_newest": True,
            "include_authors": True,
            "exclude_media": False,
            "include_content": True,
        }
        if source.source_type == "feed":
            feed = SourceFeed.get(SourceFeed.source == source)
            form_data["feed_url"] = feed.feed_url
        else:
            pangea = SourcePangea.get(SourcePangea.source == source)
            form_data.update(
                {
                    "pangea_domain": pangea.domain,
                    "pangea_category": pangea.category_name,
                    "content_format": pangea.content_format,
                    "content_type": pangea.content_type,
                    "max_articles": str(pangea.max_articles),
                    "oldest_article": str(pangea.oldest_article),
                    "only_newest": pangea.only_newest,
                    "include_authors": pangea.include_authors,
                    "exclude_media": pangea.exclude_media,
                    "include_content": pangea.include_content,
                }
            )
        return form_data
 def create_source(
    *,
    name: str,
    slug: str,
    source_type: str,
    notes: str,
    spider_arguments: str,
    enabled: bool,
    cron_minute: str,
    cron_hour: str,
    cron_day_of_month: str,
    cron_day_of_week: str,
    cron_month: str,
    feed_url: str = "",
    pangea_domain: str = "",
    pangea_category: str = "",
    content_type: str = "",
    only_newest: bool = True,
    max_articles: int | None = None,
    oldest_article: int | None = None,
    include_authors: bool = True,
    exclude_media: bool = False,
    include_content: bool = True,
    content_format: str = "",
 ) -> Source:
    with database.connection_context():
        with database.atomic():
            source = Source.create(
                name=name,
                slug=slug,
                source_type=source_type,
                notes=notes,
            )
            if source_type == "feed":
                SourceFeed.create(
                    source=source,
                    feed_url=feed_url,
                )
            else:
                SourcePangea.create(
                    source=source,
                    domain=pangea_domain,
                    category_name=pangea_category,
                    content_type=content_type,
                    only_newest=only_newest,
                    max_articles=max_articles,
                    oldest_article=oldest_article,
                    include_authors=include_authors,
                    exclude_media=exclude_media,
                    include_content=include_content,
                    content_format=content_format,
                )
            Job.create(
                source=source,
                enabled=enabled,
                spider_arguments=spider_arguments,
                cron_minute=cron_minute,
                cron_hour=cron_hour,
                cron_day_of_month=cron_day_of_month,
                cron_day_of_week=cron_day_of_week,
                cron_month=cron_month,
            )
            return source
 def update_source(
    source_slug: str,
    *,
    name: str,
    slug: str,
    source_type: str,
    notes: str,
    spider_arguments: str,
    enabled: bool,
    cron_minute: str,
    cron_hour: str,
    cron_day_of_month: str,
    cron_day_of_week: str,
    cron_month: str,
    feed_url: str = "",
    pangea_domain: str = "",
    pangea_category: str = "",
    content_type: str = "",
    only_newest: bool = True,
    max_articles: int | None = None,
    oldest_article: int | None = None,
    include_authors: bool = True,
    exclude_media: bool = False,
    include_content: bool = True,
    content_format: str = "",
 ) -> Source | None:
    with database.connection_context():
        with database.atomic():
            source = Source.get_or_none(Source.slug == source_slug)
            if source is None:
                return None
            source.name = name
            source.notes = notes
            source.source_type = source_type
            source.save()
            job = Job.get(Job.source == source)
            job.enabled = enabled
            job.spider_arguments = spider_arguments
            job.cron_minute = cron_minute
            job.cron_hour = cron_hour
            job.cron_day_of_month = cron_day_of_month
            job.cron_day_of_week = cron_day_of_week
            job.cron_month = cron_month
            job.save()
            if source_type == "feed":
                SourcePangea.delete().where(SourcePangea.source == source).execute()
                feed = SourceFeed.get_or_none(SourceFeed.source == source)
                if feed is None:
                    SourceFeed.create(source=source, feed_url=feed_url)
                else:
                    feed.feed_url = feed_url
                    feed.save()
            else:
                SourceFeed.delete().where(SourceFeed.source == source).execute()
                pangea = SourcePangea.get_or_none(SourcePangea.source == source)
                if pangea is None:
                    SourcePangea.create(
                        source=source,
                        domain=pangea_domain,
                        category_name=pangea_category,
                        content_type=content_type,
                        only_newest=only_newest,
                        max_articles=max_articles,
                        oldest_article=oldest_article,
                        include_authors=include_authors,
                        exclude_media=exclude_media,
                        include_content=include_content,
                        content_format=content_format,
                    )
                else:
                    pangea.domain = pangea_domain
                    pangea.category_name = pangea_category
                    pangea.content_type = content_type
                    pangea.only_newest = only_newest
                    pangea.max_articles = max_articles
                    pangea.oldest_article = oldest_article
                    pangea.include_authors = include_authors
                    pangea.exclude_media = exclude_media
                    pangea.include_content = include_content
                    pangea.content_format = content_format
                    pangea.save()
            return source
 def delete_job_source(job_id: int) -> bool:
    with database.connection_context():
        with database.atomic():
            job = Job.get_or_none(id=job_id)
            if job is None:
                return False
            source = Source.get_by_id(job.source_id)
            return source.delete_instance() > 0
 def load_sources() -> tuple[dict[str, object], ...]:
    with database.connection_context():
        sources = tuple(Source.select().order_by(Source.created_at.desc()))
        source_ids = tuple(int(source.get_id()) for source in sources)
        if not source_ids:
            return ()
        jobs = {
            job.source_id: job for job in Job.select().where(Job.source.in_(source_ids))
        }
        feed_configs = {
            config.source_id: config
            for config in SourceFeed.select().where(SourceFeed.source.in_(source_ids))
        }
        pangea_configs = {
            config.source_id: config
            for config in SourcePangea.select().where(
                SourcePangea.source.in_(source_ids)
            )
        }
        return tuple(
            _project_source(source, jobs, feed_configs, pangea_configs)
            for source in sources
        )
 def _project_source(
    source: "Source",
    jobs: dict[int, "Job"],
    feed_configs: dict[int, "SourceFeed"],
    pangea_configs: dict[int, "SourcePangea"],
 ) -> dict[str, object]:
    source_id = int(source.get_id())
    job = jobs[source_id]
    if source.source_type == "feed":
        upstream = feed_configs[source_id].feed_url
        source_type = "Feed"
    else:
        pangea = pangea_configs[source_id]
        upstream = f"{pangea.domain} / {pangea.category_name}"
        source_type = "Pangea"
    return {
        "name": source.name,
        "slug": source.slug,
        "source_type": source_type,
        "upstream": upstream,
        "schedule": (
            f"cron: {job.cron_minute} {job.cron_hour} {job.cron_day_of_month} "
            f"{job.cron_month} {job.cron_day_of_week}"
        ),
        "last_run": "Never run",
        "state": "Enabled" if job.enabled else "Disabled",
        "state_tone": "scheduled" if job.enabled else "idle",
    }
 class BaseModel(Model):
    class Meta:
        database = database
 class Source(BaseModel):
    created_at = DateTimeField(default=utc_now)
    updated_at = DateTimeField(default=utc_now)
    name = TextField()
    slug = TextField(unique=True)
    source_type = TextField(constraints=[Check("source_type IN ('feed', 'pangea')")])
    notes = TextField(default="")
    class Meta:
        table_name = "source"
 class SourceFeed(BaseModel):
    source = ForeignKeyField(Source, primary_key=True, backref="feed_config")
    feed_url = TextField()
    etag = TextField(null=True)
    last_modified = TextField(null=True)
    class Meta:
        table_name = "source_feed"
 class SourcePangea(BaseModel):
    source = ForeignKeyField(Source, primary_key=True, backref="pangea_config")
    domain = TextField()
    category_name = TextField()
    content_type = TextField()
    only_newest = BooleanField()
    max_articles = IntegerField()
    oldest_article = IntegerField()
    include_authors = BooleanField()
    exclude_media = BooleanField()
    include_content = BooleanField()
    content_format = TextField()
    class Meta:
        table_name = "source_pangea"
 class Job(BaseModel):
    source = ForeignKeyField(Source, unique=True, backref="job")
    created_at = DateTimeField(default=utc_now)
    updated_at = DateTimeField(default=utc_now)
    enabled = BooleanField()
    spider_arguments = TextField(default="")
    cron_minute = TextField()
    cron_hour = TextField()
    cron_day_of_month = TextField()
    cron_day_of_week = TextField()
    cron_month = TextField()
    class Meta:
        table_name = "job"
 class JobExecution(BaseModel):
    job = ForeignKeyField(Job, backref="executions")
    created_at = DateTimeField(default=utc_now)
    started_at = DateTimeField(null=True)
    ended_at = DateTimeField(null=True)
    stop_requested_at = DateTimeField(null=True)
    running_status = IntegerField(
        default=JobExecutionStatus.PENDING,
        constraints=[Check("running_status BETWEEN 0 AND 4")],
    )
    requests_count = IntegerField(default=0)
    items_count = IntegerField(default=0)
    warnings_count = IntegerField(default=0)
    errors_count = IntegerField(default=0)
    bytes_count = IntegerField(default=0)
    retries_count = IntegerField(default=0)
    exceptions_count = IntegerField(default=0)
    cache_size_count = IntegerField(default=0)
    cache_object_count = IntegerField(default=0)
    raw_stats = TextField(default="{}")
    class Meta:
        table_name = "job_execution"
--- a/repub/pages/__init__.py
+++ b/repub/pages/__init__.py
@ -0,0 +1,15 @@
 from repub.pages.dashboard import dashboard_page, dashboard_page_with_data
 from repub.pages.runs import execution_logs_page, runs_page
 from repub.pages.shim import shim_page
 from repub.pages.sources import create_source_page, edit_source_page, sources_page
 __all__ = [
    "create_source_page",
    "dashboard_page",
    "dashboard_page_with_data",
    "edit_source_page",
    "execution_logs_page",
    "runs_page",
    "shim_page",
    "sources_page",
 ]
--- a/repub/pages/dashboard.py
+++ b/repub/pages/dashboard.py
@ -0,0 +1,267 @@
 from __future__ import annotations
 from collections.abc import Mapping
 import htpy as h
 from htpy import Node, Renderable
 from repub.components import (
    admin_sidebar,
    header_action_link,
    inline_button,
    inline_link,
    muted_action_link,
    stat_card,
    status_badge,
    table_section,
 )
 def _text(values: Mapping[str, object], key: str) -> str:
    return str(values[key])
 def _running_execution_row(execution: Mapping[str, object]) -> tuple[Node, ...]:
    status_tone = "running" if _text(execution, "status") != "Succeeded" else "done"
    return (
        h.div[
            h.div(class_="font-semibold text-slate-950")[_text(execution, "source")],
            h.p(class_="mt-0.5 font-mono text-[11px] text-slate-500")[
                _text(execution, "slug")
            ],
        ],
        h.div[
            h.p(class_="font-medium text-slate-900")[
                f"#{_text(execution, 'execution_id')}"
            ],
            h.p(class_="mt-0.5 text-[11px] text-slate-500")[
                f"job {_text(execution, 'job_id')}"
            ],
        ],
        h.div[
            h.p(class_="font-medium text-slate-900")[_text(execution, "started_at")],
            h.p(class_="mt-0.5 text-[11px] text-slate-500")[
                _text(execution, "runtime")
            ],
        ],
        status_badge(label=_text(execution, "status"), tone=status_tone),
        h.div(class_="min-w-56 whitespace-normal")[
            h.p(class_="font-medium text-slate-900")[_text(execution, "stats")],
            h.p(class_="mt-0.5 text-[11px] text-slate-500")[_text(execution, "worker")],
        ],
        h.div(class_="flex flex-nowrap items-center gap-3")[
            inline_link(
                href=_text(execution, "log_href"),
                label="View log",
                tone="amber",
            ),
            inline_button(label="Stop", tone="danger"),
        ],
    )
 def dashboard_header() -> Renderable:
    return h.section[
        h.div(
            class_="flex flex-col gap-4 sm:flex-row sm:items-start sm:justify-between"
        )[
            h.div[
                h.h1(class_="text-3xl font-semibold tracking-tight text-slate-950")[
                    "Republisher"
                ],
            ],
            h.div(class_="flex flex-wrap gap-2")[
                header_action_link(href="/sources/create", label="Create source"),
                muted_action_link(href="/sources", label="View sources"),
            ],
        ]
    ]
 def operational_snapshot(*, snapshot: Mapping[str, str] | None = None) -> Renderable:
    values = snapshot or {
        "running_now": "0",
        "upcoming_today": "0",
        "failures_24h": "0",
        "artifact_footprint": "0 B",
    }
    return h.section[
        h.div(class_="mb-3 flex items-end justify-between gap-4")[
            h.div[
                h.p(
                    class_="text-xs font-semibold uppercase tracking-[0.22em] text-slate-500"
                )["Overview"],
                h.h2(class_="mt-1 text-xl font-semibold tracking-tight text-slate-950")[
                    "Operational snapshot"
                ],
            ],
        ],
        h.dl(class_="grid gap-3 md:grid-cols-2 xl:grid-cols-4")[
            stat_card(
                label="Running now",
                value=values["running_now"],
                detail="Currently active job executions.",
            ),
            stat_card(
                label="Upcoming today",
                value=values["upcoming_today"],
                detail="Enabled jobs that are ready for their next run.",
            ),
            stat_card(
                label="Failures in 24h",
                value=values["failures_24h"],
                detail="Recent failed executions recorded by the scheduler.",
            ),
            stat_card(
                label="Artifact footprint",
                value=values["artifact_footprint"],
                detail="Current artifact size under the output path.",
            ),
        ],
    ]
 def running_executions_table(
    *, running_executions: tuple[Mapping[str, object], ...] | None = None
 ) -> Renderable:
    rows = tuple(
        _running_execution_row(execution) for execution in (running_executions or ())
    )
    headers = ("Source", "Execution", "Started", "Status", "Stats", "Actions")
    def render_row(row: tuple[Node, ...]) -> Renderable:
        first_cell, *other_cells = row
        return h.tr(class_="align-top")[
            h.td(class_="py-3 pr-6 pl-4 text-sm font-medium text-slate-950 sm:pl-4")[
                first_cell
            ],
            (
                h.td(
                    class_="px-3 py-3 align-top text-sm whitespace-nowrap text-slate-600"
                )[cell]
                for cell in other_cells
            ),
        ]
    body_rows: Node
    if rows:
        body_rows = (render_row(row) for row in rows)
    else:
        body_rows = h.tr[
            h.td(
                colspan=str(len(headers)),
                class_="px-4 py-8 text-center text-sm text-slate-500",
            )["No job executions are running."]
        ]
    return h.section[
        h.div(class_="mb-3 flex items-end justify-between gap-4")[
            h.div[
                h.p(
                    class_="text-xs font-semibold uppercase tracking-[0.22em] text-amber-600"
                )["Live work"],
                h.h2(class_="mt-1 text-xl font-semibold text-slate-950")[
                    "Running executions"
                ],
            ],
            muted_action_link(href="/runs", label="Open runs"),
        ],
        h.div(
            class_="overflow-hidden rounded-2xl bg-white shadow-sm ring-1 ring-slate-200"
        )[
            h.div(class_="overflow-x-auto")[
                h.table(
                    class_="w-full min-w-[70rem] divide-y divide-slate-200 table-auto"
                )[
                    h.thead(class_="bg-stone-50")[
                        h.tr[
                            (
                                h.th(
                                    scope="col",
                                    class_="px-3 py-2.5 text-left text-[11px] font-semibold uppercase tracking-[0.18em] whitespace-nowrap text-slate-500 first:pl-4",
                                )[header]
                                for header in headers
                            )
                        ]
                    ],
                    h.tbody(class_="divide-y divide-slate-200 bg-white")[body_rows],
                ]
            ]
        ],
    ]
 def _source_feed_row(source_feed: Mapping[str, object]) -> tuple[Node, ...]:
    last_updated_iso = source_feed.get("last_updated_iso")
    last_updated = (
        h.time(
            datetime=str(last_updated_iso),
            title=str(last_updated_iso),
            class_="font-medium text-slate-900",
        )[str(source_feed["last_updated"])]
        if last_updated_iso is not None
        else h.p(class_="font-medium text-slate-900")[str(source_feed["last_updated"])]
    )
    return (
        h.div[
            h.div(class_="font-semibold text-slate-950")[str(source_feed["source"])],
            h.p(class_="mt-0.5 font-mono text-[11px] text-slate-500")[
                str(source_feed["slug"])
            ],
        ],
        h.div(class_="min-w-64")[
            inline_link(
                href=str(source_feed["feed_href"]),
                label=str(source_feed["feed_href"]),
                tone="amber",
            )
        ],
        status_badge(
            label=str(source_feed["feed_status_label"]),
            tone=str(source_feed["feed_status_tone"]),
        ),
        last_updated,
        h.p(class_="font-medium text-slate-900")[
            str(source_feed["artifact_footprint"])
        ],
    )
 def published_feeds_table(
    *, source_feeds: tuple[Mapping[str, object], ...] | None = None
 ) -> Renderable:
    rows = tuple(_source_feed_row(source_feed) for source_feed in (source_feeds or ()))
    return table_section(
        eyebrow="Published feeds",
        title="Published feeds",
        empty_message="No feeds have been published yet.",
        headers=("Source", "Feed URL", "Status", "Last updated", "Disk usage"),
        rows=rows,
        actions=muted_action_link(href="/sources", label="Manage sources"),
    )
 def dashboard_page() -> Renderable:
    return dashboard_page_with_data()
 def dashboard_page_with_data(
    *,
    snapshot: Mapping[str, str] | None = None,
    running_executions: tuple[Mapping[str, object], ...] | None = None,
    source_feeds: tuple[Mapping[str, object], ...] | None = None,
 ) -> Renderable:
    return h.main(
        id="morph",
        class_="min-h-screen lg:grid lg:grid-cols-[18rem_minmax(0,1fr)]",
    )[
        admin_sidebar(current_path="/"),
        h.div(class_="px-4 py-4 sm:px-5 lg:px-6 lg:py-5")[
            h.div(class_="mx-auto max-w-7xl space-y-5")[
                dashboard_header(),
                operational_snapshot(snapshot=snapshot),
                running_executions_table(running_executions=running_executions),
                published_feeds_table(source_feeds=source_feeds),
            ]
        ],
    ]
--- a/repub/pages/runs.py
+++ b/repub/pages/runs.py
@ -0,0 +1,358 @@
 from __future__ import annotations
 from collections.abc import Mapping
 import htpy as h
 from htpy import Node, Renderable
 from repub.components import (
    inline_link,
    muted_action_link,
    page_shell,
    section_card,
    status_badge,
    table_section,
 )
 def _action_button(
    *,
    label: str,
    tone: str = "default",
    disabled: bool = False,
    post_path: str | None = None,
 ) -> Renderable:
    classes = {
        "default": "bg-stone-100 text-slate-700 hover:bg-stone-200",
        "danger": "bg-rose-50 text-rose-700 hover:bg-rose-100",
    }
    class_name = (
        "cursor-not-allowed bg-slate-100 text-slate-400" if disabled else classes[tone]
    )
    attributes: dict[str, str] = {}
    if post_path is not None and not disabled:
        attributes["data-on:pointerdown"] = f"@post('{post_path}')"
    return h.button(
        attributes,
        type="button",
        disabled=disabled,
        class_=(
            "inline-flex items-center whitespace-nowrap rounded-full px-3 py-1.5 "
            f"text-sm font-semibold transition {class_name}"
        ),
    )[label]
 def _text(values: Mapping[str, object], key: str) -> str:
    return str(values[key])
 def _maybe_text(values: Mapping[str, object], key: str) -> str | None:
    value = values.get(key)
    # Avoid set membership here: it raises TypeError when the value is unhashable.
    if value is None or value == "":
        return None
    return str(value)
 def _flag(values: Mapping[str, object], key: str) -> bool:
    return bool(values[key])
 def _running_row(execution: Mapping[str, object]) -> tuple[Node, ...]:
    return (
        h.div[
            h.div(class_="font-semibold text-slate-950")[_text(execution, "source")],
            h.p(class_="mt-1 font-mono text-xs text-slate-500")[
                _text(execution, "slug")
            ],
        ],
        h.div[
            h.p(class_="font-medium text-slate-900")[
                f"#{_text(execution, 'execution_id')}"
            ],
        ],
        h.div[
            h.p(class_="font-medium text-slate-900")[_text(execution, "started_at")],
            h.p(class_="mt-1 text-xs text-slate-500")[_text(execution, "runtime")],
        ],
        status_badge(label=_text(execution, "status"), tone="running"),
        h.div(class_="min-w-56 whitespace-normal")[
            h.p(class_="font-medium text-slate-900")[_text(execution, "stats")],
            h.p(class_="mt-1 text-xs text-slate-500")[_text(execution, "worker")],
        ],
        h.div(class_="flex flex-nowrap items-center gap-3")[
            inline_link(
                href=_text(execution, "log_href"),
                label="View log",
                tone="amber",
            ),
            _action_button(
                label="Stop",
                tone="danger",
                post_path=_maybe_text(execution, "cancel_post_path"),
            ),
        ],
    )
 def _upcoming_row(job: Mapping[str, object]) -> tuple[Node, ...]:
    next_run_at = _maybe_text(job, "next_run_at")
    next_run_label: Node = h.p(class_="font-medium text-slate-900")[
        _text(job, "next_run")
    ]
    if next_run_at is not None:
        next_run_label = h.time(
            {
                "data-next-run-at": next_run_at,
                "title": next_run_at,
            },
            datetime=next_run_at,
            class_="font-medium text-slate-900",
        )[_text(job, "next_run")]
    return (
        h.div[
            h.div(class_="font-semibold text-slate-950")[_text(job, "source")],
            h.p(class_="mt-1 font-mono text-xs text-slate-500")[_text(job, "slug")],
        ],
        h.div[next_run_label],
        h.p(class_="font-mono text-xs text-slate-600")[_text(job, "schedule")],
        status_badge(
            label=_text(job, "enabled_label"),
            tone=_text(job, "enabled_tone"),
        ),
        h.p(class_="max-w-40 whitespace-normal text-sm text-slate-500")[
            _text(job, "run_reason")
        ],
        h.div(class_="flex flex-nowrap items-center gap-2")[
            _action_button(
                label="Run now",
                disabled=_flag(job, "run_disabled"),
                post_path=_maybe_text(job, "run_post_path"),
            ),
            _action_button(
                label=_text(job, "toggle_label"),
                post_path=_maybe_text(job, "toggle_post_path"),
            ),
            _action_button(
                label="Delete",
                tone="danger",
                post_path=_maybe_text(job, "delete_post_path"),
            ),
        ],
    )
 def _completed_row(execution: Mapping[str, object]) -> tuple[Node, ...]:
    ended_at = _maybe_text(execution, "ended_at_iso")
    ended_at_label: Node = h.p(class_="font-medium text-slate-900")[
        _text(execution, "ended_at")
    ]
    if ended_at is not None:
        ended_at_label = h.time(
            {
                "data-ended-at": ended_at,
                "title": ended_at,
            },
            datetime=ended_at,
            class_="font-medium text-slate-900",
        )[_text(execution, "ended_at")]
    return (
        h.div[
            h.div(class_="font-semibold text-slate-950")[_text(execution, "source")],
            h.p(class_="mt-1 font-mono text-xs text-slate-500")[
                _text(execution, "slug")
            ],
        ],
        h.div[
            h.p(class_="font-medium text-slate-900")[
                f"#{_text(execution, 'execution_id')}"
            ],
        ],
        h.div[
            ended_at_label,
            h.p(class_="mt-1 text-xs text-slate-500")[_text(execution, "summary")],
        ],
        status_badge(
            label=_text(execution, "status"),
            tone=_text(execution, "status_tone"),
        ),
        h.div(class_="min-w-48 whitespace-normal")[
            h.p(class_="font-medium text-slate-900")[_text(execution, "stats")]
        ],
        inline_link(
            href=_text(execution, "log_href"),
            label="View log",
            tone="amber",
        ),
    )
 def runs_page(
    *,
    running_executions: tuple[Mapping[str, object], ...] | None = None,
    upcoming_jobs: tuple[Mapping[str, object], ...] | None = None,
    completed_executions: tuple[Mapping[str, object], ...] | None = None,
 ) -> Renderable:
    running_items = running_executions or ()
    upcoming_items = upcoming_jobs or ()
    completed_items = completed_executions or ()
    running_rows = tuple(_running_row(execution) for execution in running_items)
    upcoming_rows = tuple(_upcoming_row(job) for job in upcoming_items)
    completed_rows = tuple(_completed_row(execution) for execution in completed_items)
    return page_shell(
        current_path="/runs",
        eyebrow="Execution control",
        title="Runs",
        actions=muted_action_link(href="/sources", label="Back to sources"),
        content=(
            table_section(
                eyebrow="Live work",
                title="Running job executions",
                empty_message="No job executions are running.",
                headers=(
                    "Source",
                    "Execution",
                    "Started",
                    "Status",
                    "Stats",
                    "Actions",
                ),
                rows=running_rows,
            ),
            table_section(
                eyebrow="Queue",
                title="Upcoming jobs",
                empty_message="No jobs are scheduled.",
                headers=(
                    "Source",
                    "Next run",
                    "Cron",
                    "State",
                    "Run now",
                    "Actions",
                ),
                rows=upcoming_rows,
            ),
            table_section(
                eyebrow="History",
                title="Completed job executions",
                empty_message="No job executions have completed yet.",
                headers=(
                    "Source",
                    "Execution",
                    "Ended",
                    "Status",
                    "Summary",
                    "Log",
                ),
                rows=completed_rows,
            ),
            h.script[
                """
 window.repubFormatNextRuns = window.repubFormatNextRuns || (() => {
  const relativeFormatter = new Intl.RelativeTimeFormat(undefined, { numeric: 'auto' });
  const absoluteFormatter = new Intl.DateTimeFormat(undefined, {
    dateStyle: 'medium',
    timeStyle: 'short',
    timeZoneName: 'short',
  });
  const formatRelative = (targetDate) => {
    const diffSeconds = Math.round((targetDate.getTime() - Date.now()) / 1000);
    const units = [
      ['day', 86400],
      ['hour', 3600],
      ['minute', 60],
      ['second', 1],
    ];
    for (const [unit, size] of units) {
      if (Math.abs(diffSeconds) >= size || unit === 'second') {
        return relativeFormatter.format(Math.round(diffSeconds / size), unit);
      }
    }
    return relativeFormatter.format(0, 'second');
  };
  const format = () => {
    document.querySelectorAll('time[data-next-run-at], time[data-ended-at]').forEach((element) => {
      const relativeAt =
        element.getAttribute('data-next-run-at') ??
        element.getAttribute('data-ended-at');
      if (!relativeAt) return;
      const targetDate = new Date(relativeAt);
      if (Number.isNaN(targetDate.getTime())) return;
      element.textContent = formatRelative(targetDate);
      element.title = absoluteFormatter.format(targetDate);
    });
  };
  format();
  if (!window.repubNextRunTimer) {
    window.repubNextRunTimer = window.setInterval(format, 30000);
  }
 });
 window.repubFormatNextRuns();
                """
            ],
        ),
    )
 def execution_logs_page(
    *,
    job_id: int,
    execution_id: int,
    log_view: Mapping[str, object] | None = None,
 ) -> Renderable:
    if log_view is None:
        log_view = {
            "title": f"Job {job_id} / execution {execution_id}",
            "description": "",
            "status_label": "Unavailable",
            "status_tone": "failed",
            "log_text": "",
            "error_message": "Execution log is only available from persisted job runs.",
        }
    error_message = _maybe_text(log_view, "error_message")
    error_notice = (
        h.div(
            class_="mt-3 rounded-2xl bg-rose-50 px-4 py-3 text-sm font-medium text-rose-800"
        )[
            h.p["Execution log unavailable"],
            h.p(class_="mt-1 font-normal")[error_message],
        ]
        if error_message is not None
        else None
    )
    return page_shell(
        current_path=f"/job/{job_id}/execution/{execution_id}/logs",
        eyebrow="Execution log",
        title=_text(log_view, "title"),
        actions=muted_action_link(href="/runs", label="Back to runs"),
        content=(
            section_card(
                content=(
                    h.div(class_="flex items-end justify-between gap-4")[
                        h.div[
                            h.p(
                                class_="text-xs font-semibold uppercase tracking-[0.22em] text-amber-600"
                            )["Route"],
                            h.h2(class_="mt-2 text-xl font-semibold text-slate-950")[
                                f"/job/{job_id}/execution/{execution_id}/logs"
                            ],
                        ],
                        status_badge(
                            label=_text(log_view, "status_label"),
                            tone=_text(log_view, "status_tone"),
                        ),
                    ],
                    error_notice,
                    h.pre(
                        class_="mt-3 overflow-x-auto rounded-[1.5rem] bg-slate-950 p-5 text-xs leading-6 text-emerald-200"
                    )[_text(log_view, "log_text")],
                )
            ),
        ),
    )
--- a/repub/pages/shim.py
+++ b/repub/pages/shim.py
@ -0,0 +1,71 @@
 from __future__ import annotations
 import htpy as h
 from htpy import Node, Renderable
 from repub.components import admin_sidebar
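 # Datastar expression: POST to the current path, appending a `u=` query parameter
 # to any existing query string, with no cap on retries.
 ON_LOAD_JS = (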
    "@post(window.location.pathname + "
    "(window.location.search + '&u=').replace(/^&/,'?'), "
    "{retryMaxCount: Infinity})"
 )
 # Expression for a short random identifier: the first 8 characters of a UUID.
 TAB_ID_JS = "self.crypto.randomUUID().substring(0,8)"
 def shim_page(
    *, datastar_src: str, current_path: str, head: Node | None = None
 ) -> Renderable:
    return h.html(lang="en")[
        h.head[
            h.meta(charset="UTF-8"),
            head,
            h.script(id="js", defer=True, type="module", src=datastar_src),
            h.meta(name="viewport", content="width=device-width, initial-scale=1.0"),
        ],
        h.body[
            h.div({"data-signals:tabid": TAB_ID_JS}),
            h.div(
                {
                    "data-init": ON_LOAD_JS,
                    "data-on:online__window": ON_LOAD_JS,
                }
            ),
            h.noscript["Your browser does not support JavaScript!"],
            h.main(
                id="morph",
                class_="min-h-screen lg:grid lg:grid-cols-[18rem_minmax(0,1fr)]",
            )[
                admin_sidebar(current_path=current_path),
                h.div(class_="px-4 py-4 sm:px-5 lg:px-6 lg:py-5")[
                    h.div(class_="mx-auto max-w-7xl space-y-5")[
                        h.section[
                            h.div(
                                class_="flex flex-col gap-4 sm:flex-row sm:items-start sm:justify-between"
                            )[
                                h.div(class_="max-w-3xl")[
                                    h.p(
                                        class_="text-xs font-semibold uppercase tracking-[0.22em] text-amber-600"
                                    )["Connecting"],
                                    h.h1(
                                        class_="mt-1 text-3xl font-semibold tracking-tight text-slate-950"
                                    )["Loading page"],
                                ],
                            ]
                        ],
                        h.section(
                            class_="overflow-hidden rounded-2xl bg-white shadow-sm ring-1 ring-slate-200"
                        )[
                            h.div(class_="animate-pulse space-y-4 p-6")[
                                h.div(class_="h-5 w-40 rounded-full bg-stone-100"),
                                h.div(class_="h-12 rounded-2xl bg-stone-100"),
                                h.div(class_="h-12 rounded-2xl bg-stone-100"),
                                h.div(class_="h-12 rounded-2xl bg-stone-100"),
                            ]
                        ],
                    ]
                ],
            ],
        ],
    ]
--- a/repub/pages/sources.py
+++ b/repub/pages/sources.py
@ -0,0 +1,425 @@
 from __future__ import annotations
 from collections.abc import Mapping
 import htpy as h
 from htpy import Node, Renderable
 from repub.components import (
    header_action_link,
    inline_link,
    input_field,
    muted_action_link,
    page_shell,
    section_card,
    select_field,
    status_badge,
    table_section,
    textarea_field,
    toggle_field,
 )
 PANGEA_CONTENT_FORMATS = (
    "WTF_0",
    "TEXT_ONLY",
    "WTF_1",
    "MOBILE_1",
    "MOBILE_2",
    "MOBILE_3",
    "WTF_2",
    "XML_TX",
    "JSON",
 )
 PANGEA_CONTENT_TYPES = (
    "articles",
    "audioclips",
    "videoclips",
    "breakingnews",
    "mostpopular",
    "topstories",
 )
 def _value(source: Mapping[str, object] | None, key: str, default: str = "") -> str:
    if source is None:
        return default
    return str(source.get(key, default))
 def _checked(source: Mapping[str, object] | None, key: str, default: bool) -> bool:
    if source is None:
        return default
    value = source.get(key, default)
    return bool(value)
 def _source_row(source: Mapping[str, object]) -> tuple[Node, ...]:
    return (
        h.div[
            h.div(class_="font-semibold text-slate-950")[str(source["name"])],
            h.p(class_="mt-1 font-mono text-xs text-slate-500")[str(source["slug"])],
        ],
        h.p(class_="font-medium whitespace-nowrap text-slate-900")[
            str(source["source_type"])
        ],
        h.p(class_="max-w-sm truncate font-mono text-xs text-slate-600")[
            str(source["upstream"])
        ],
        h.p(class_="font-medium whitespace-nowrap text-slate-900")[
            str(source["schedule"])
        ],
        h.div(class_="min-w-32 whitespace-normal")[
            status_badge(
                label=str(source["state"]),
                tone=str(source["state_tone"]),
            ),
            h.p(class_="mt-2 text-xs text-slate-500")[str(source["last_run"])],
        ],
        h.div(class_="flex flex-nowrap items-center gap-3")[
            inline_link(
                href=f"/sources/{source['slug']}/edit", label="Edit", tone="amber"
            ),
            inline_link(href="/runs", label="View runs"),
        ],
    )
 def sources_table(
    *, sources: tuple[Mapping[str, object], ...] | None = None
 ) -> Renderable:
    rows = tuple(_source_row(source) for source in (sources or ()))
    return table_section(
        eyebrow="Inventory",
        title="Sources",
        empty_message="No sources yet.",
        headers=("Source", "Type", "Upstream", "Schedule", "Job state", "Actions"),
        rows=rows,
        actions=header_action_link(href="/sources/create", label="Create source"),
    )
 def sources_page(
    *, sources: tuple[Mapping[str, object], ...] | None = None
 ) -> Renderable:
    return page_shell(
        current_path="/sources",
        eyebrow="Source management",
        title="Sources",
        actions=header_action_link(href="/sources/create", label="Create source"),
        content=sources_table(sources=sources),
    )
 def source_form(
    *,
    mode: str,
    action_path: str,
    source: Mapping[str, object] | None = None,
 ) -> Renderable:
    source_type = _value(source, "source_type", "pangea")
    slug = _value(source, "slug")
    title = "Source and job setup" if mode == "create" else "Edit source"
    eyebrow = "Create" if mode == "create" else "Edit"
    status_label = "New source" if mode == "create" else "Existing source"
    submit_label = "Create source" if mode == "create" else "Save changes"
    initial_signals = "{sourceType: 'pangea'}"
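    if mode == "edit":
        # source_type and slug are interpolated unescaped into a JavaScript object
        # literal, so this assumes neither contains a quote or backslash.
        initial_signals = f"{{sourceType: '{source_type}', sourceSlug: '{slug}'}}"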
    return section_card(
        content=(
            h.div(
                class_="flex flex-col gap-3 sm:flex-row sm:items-end sm:justify-between"
            )[
                h.div[
                    h.p(
                        class_="text-xs font-semibold uppercase tracking-[0.22em] text-amber-600"
                    )[eyebrow],
                    h.h2(class_="mt-2 text-xl font-semibold text-slate-950")[title],
                ],
                status_badge(label=status_label, tone="scheduled"),
            ],
            h.form(
                {
                    "data-signals": "{_formError: '', _formSuccess: ''}",
                    "data-signals__ifmissing": initial_signals,
                    "data-on:submit": f"@post('{action_path}')",
                },
                class_="mt-5 space-y-6",
            )[
                h.div(
                    {
                        "data-show": "$_formError !== ''",
                        "data-text": "$_formError",
                    },
                    class_="rounded-2xl bg-rose-50 px-4 py-3 text-sm font-medium text-rose-800",
                ),
                h.div(
                    {
                        "data-show": "$_formSuccess !== ''",
                        "data-text": "$_formSuccess",
                    },
                    class_="rounded-2xl bg-emerald-100 px-4 py-3 text-sm font-medium text-emerald-800",
                ),
                h.div(class_="grid gap-4 md:grid-cols-2")[
                    input_field(
                        label="Source name",
                        field_id="source-name",
                        value=_value(source, "name"),
                        signal_name="sourceName",
                    ),
                    input_field(
                        label="Slug",
                        field_id="source-slug",
                        value=slug,
                        help_text="Immutable after creation.",
                        signal_name="sourceSlug",
                        disabled=mode == "edit",
                    ),
                    h.div[
                        h.label(
                            for_="source-type",
                            class_="block text-sm font-medium text-slate-900",
                        )["Source type"],
                        h.select(
                            {"data-bind": "sourceType"},
                            id="source-type",
                            name="source-type",
                            class_="mt-2 block w-full rounded-2xl border-0 bg-white px-3.5 py-2.5 text-sm text-slate-900 shadow-sm ring-1 ring-slate-200 focus:outline-hidden focus:ring-2 focus:ring-amber-500",
                        )[
                            h.option(value="feed", selected=source_type == "feed")[
                                "feed"
                            ],
                            h.option(value="pangea", selected=source_type == "pangea")[
                                "pangea"
                            ],
                        ],
                    ],
                ],
                h.div(
                    {"data-show": "$sourceType === 'feed'"},
                    class_="space-y-4 rounded-[1.5rem] bg-stone-50 p-5",
                )[
                    h.div[
                        h.p(
                            class_="text-xs font-semibold uppercase tracking-[0.22em] text-amber-600"
                        )["Feed source options"],
                        h.h3(class_="mt-2 text-lg font-semibold text-slate-950")[
                            "Feed settings"
                        ],
                    ],
                    h.div(class_="grid gap-4 md:grid-cols-2")[
                        input_field(
                            label="Feed URL",
                            field_id="feed-url",
                            value=_value(source, "feed_url"),
                            placeholder="https://example.com/feed.xml",
                            signal_name="feedUrl",
                        ),
                    ],
                ],
                h.div(
                    {"data-show": "$sourceType === 'pangea'"},
                    class_="space-y-4 rounded-[1.5rem] bg-stone-50 p-5",
                )[
                    h.div[
                        h.p(
                            class_="text-xs font-semibold uppercase tracking-[0.22em] text-amber-600"
                        )["Pangea source options"],
                        h.h3(class_="mt-2 text-lg font-semibold text-slate-950")[
                            "Pangea settings"
                        ],
                    ],
                    h.div(class_="grid gap-4 lg:grid-cols-3")[
                        input_field(
                            label="Pangea domain",
                            field_id="pangea-domain",
                            value=_value(source, "pangea_domain"),
                            signal_name="pangeaDomain",
                        ),
                        input_field(
                            label="Category name",
                            field_id="pangea-category",
                            value=_value(source, "pangea_category"),
                            signal_name="pangeaCategory",
                        ),
                        select_field(
                            label="Content format",
                            field_id="content-format",
                            options=PANGEA_CONTENT_FORMATS,
                            selected=_value(source, "content_format", "MOBILE_3"),
                            signal_name="contentFormat",
                        ),
                        select_field(
                            label="Content type",
                            field_id="content-type",
                            options=PANGEA_CONTENT_TYPES,
                            selected=_value(source, "content_type", "articles"),
                            signal_name="contentType",
                        ),
                        input_field(
                            label="Max articles",
                            field_id="max-articles",
                            value=_value(source, "max_articles", "10"),
                            signal_name="maxArticles",
                        ),
                        input_field(
                            label="Oldest article (days)",
                            field_id="oldest-article",
                            value=_value(source, "oldest_article", "3"),
                            signal_name="oldestArticle",
                        ),
                    ],
                    h.div(class_="grid gap-4 lg:grid-cols-3")[
                        toggle_field(
                            label="Only newest",
                            description="Limit Pangea syncs to the newest material available in the selected category.",
                            signal_name="onlyNewest",
                            checked=_checked(source, "only_newest", True),
                        ),
                        toggle_field(
                            label="Include authors",
                            description="Carry author bylines into mirrored output where upstream data exists.",
                            signal_name="includeAuthors",
                            checked=_checked(source, "include_authors", True),
                        ),
                        toggle_field(
                            label="Exclude media",
                            description="Skip image and media attachment mirroring for this source.",
                            signal_name="excludeMedia",
                            checked=_checked(source, "exclude_media", False),
                        ),
                        toggle_field(
                            label="Include content",
                            description="Store article body content in mirrored output when the upstream provides it.",
                            signal_name="includeContent",
                            checked=_checked(source, "include_content", True),
                        ),
                    ],
                ],
                h.div(class_="grid gap-4 lg:grid-cols-2")[
                    textarea_field(
                        label="Notes",
                        field_id="source-notes",
                        value=_value(source, "notes"),
                        signal_name="sourceNotes",
                    ),
                    textarea_field(
                        label="Spider arguments",
                        field_id="spider-arguments",
                        value=_value(
                            source,
                            "spider_arguments",
                            "language=en\ndownload_media=true",
                        ),
                        signal_name="spiderArguments",
                    ),
                ],
                h.div(
                    class_="grid gap-6 xl:grid-cols-[minmax(0,1.3fr)_minmax(20rem,0.9fr)]"
                )[
                    h.div(class_="rounded-[1.5rem] bg-stone-50 p-5")[
                        h.div[
                            h.h3(class_="text-lg font-semibold text-slate-950")[
                                "Cron schedule"
                            ],
                            h.p(class_="mt-1 text-sm text-slate-600")[
                                "Stored in UTC and displayed in the browser timezone."
                            ],
                        ],
                        h.div(class_="mt-5 grid gap-4 sm:grid-cols-2 xl:grid-cols-5")[
                            input_field(
                                label="Minute",
                                field_id="cron-minute",
                                value=_value(source, "cron_minute", "*/30"),
                                signal_name="cronMinute",
                            ),
                            input_field(
                                label="Hour",
                                field_id="cron-hour",
                                value=_value(source, "cron_hour", "*"),
                                signal_name="cronHour",
                            ),
                            input_field(
                                label="Day of month",
                                field_id="cron-day-of-month",
                                value=_value(source, "cron_day_of_month", "*"),
                                signal_name="cronDayOfMonth",
                            ),
                            input_field(
                                label="Day of week",
                                field_id="cron-day-of-week",
                                value=_value(source, "cron_day_of_week", "*"),
                                signal_name="cronDayOfWeek",
                            ),
                            input_field(
                                label="Month",
                                field_id="cron-month",
                                value=_value(source, "cron_month", "*"),
                                signal_name="cronMonth",
                            ),
                        ],
                    ],
                    h.div(class_="rounded-[1.5rem] bg-stone-50 p-5")[
                        h.p(
                            class_="text-xs font-semibold uppercase tracking-[0.22em] text-amber-600"
                        )["Job defaults"],
                        h.h3(class_="mt-2 text-lg font-semibold text-slate-950")[
                            "Initial job state"
                        ],
                        h.div(class_="mt-5 grid gap-4")[
                            toggle_field(
                                label="Job enabled",
                                description="Scheduler will consider the new job immediately after creation.",
                                signal_name="jobEnabled",
                                checked=_checked(source, "enabled", True),
                            ),
                        ],
                    ],
                ],
                h.div(
                    class_="flex flex-wrap justify-end gap-3 border-t border-slate-200 pt-6"
                )[
                    muted_action_link(href="/sources", label="Cancel"),
                    h.button(
                        type="submit",
                        class_="rounded-full bg-slate-950 px-4 py-2.5 text-sm font-semibold text-white transition hover:bg-slate-800",
                    )[submit_label],
                ],
            ],
        )
    )
 def create_source_page(*, action_path: str = "/actions/sources/create") -> Renderable:
    actions = (
        muted_action_link(href="/sources", label="Back to sources"),
        header_action_link(href="/runs", label="View runs"),
    )
    return page_shell(
        current_path="/sources/create",
        eyebrow="Source creation",
        title="Create source",
        actions=actions,
        content=source_form(mode="create", action_path=action_path),
    )
 def edit_source_page(
    *,
    slug: str,
    source: Mapping[str, object],
    action_path: str,
 ) -> Renderable:
    actions = (
        muted_action_link(href="/sources", label="Back to sources"),
        header_action_link(href="/runs", label="View runs"),
    )
    return page_shell(
        current_path=f"/sources/{slug}/edit",
        eyebrow="Source editing",
        title="Edit source",
        actions=actions,
        content=source_form(mode="edit", action_path=action_path, source=source),
    )
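The form above collects the five cron fields as separate inputs, in the order minute, hour, day of month, day of week, month. A minimal sketch of joining them into a single expression follows, assuming the conventional crontab order (minute, hour, day of month, month, day of week), which differs from the form's layout. `cron_expression` and `CRON_ORDER` are hypothetical names and not part of the repository; the fallback values mirror the defaults passed to `_value` in the form.

```python
from collections.abc import Mapping

# Conventional crontab field order; note that month precedes day of week.
CRON_ORDER = (
    "cron_minute",
    "cron_hour",
    "cron_day_of_month",
    "cron_month",
    "cron_day_of_week",
)

# Same fallbacks the form uses when a source has no stored value.
CRON_DEFAULTS = {"cron_minute": "*/30"}


def cron_expression(source: Mapping[str, object]) -> str:
    """Join the stored cron fields into one space-separated expression."""
    return " ".join(
        str(source.get(key, CRON_DEFAULTS.get(key, "*"))) for key in CRON_ORDER
    )
```

Keeping the ordering in one tuple means the form layout can change without affecting how the expression is assembled.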
--- a/repub/spiders/rss_spider.py
+++ b/repub/spiders/rss_spider.py
@ -8,7 +8,7 @@ from scrapy.utils.spider import iterate_spider_output
 from repub.items import ChannelElementItem, ElementItem
 from repub.rss import CDATA, CONTENT, ITUNES, MEDIA, E, munge_cdata_html, normalize_date
-from repub.utils import FileType, determine_file_type, local_file_path
+from repub.utils import FileType, determine_file_type, local_file_path, local_image_path
 class BaseRssFeedSpider(Spider):
@ -34,13 +34,15 @@ class BaseRssFeedSpider(Spider):
    def rewrite_file_url(self, file_type: FileType, url):
        file_dir = self.settings["REPUBLISHER_FILE_DIR"]
        local_path = local_file_path(url)
        if file_type == FileType.IMAGE:
            file_dir = self.settings["REPUBLISHER_IMAGE_DIR"]
            local_path = local_image_path(url)
        elif file_type == FileType.VIDEO:
            file_dir = self.settings["REPUBLISHER_VIDEO_DIR"]
        elif file_type == FileType.AUDIO:
            file_dir = self.settings["REPUBLISHER_AUDIO_DIR"]
-        return f"/{file_dir}/{local_file_path(url)}"
+        return f"{file_dir}/{local_path}"
    def rewrite_image_url(self, url):
        return self.rewrite_file_url(FileType.IMAGE, url)
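The hunk above picks a directory per file type and drops the leading slash from the rewritten URL. The sketch below restates that selection as a lookup table so the effect is easy to try in isolation. It is an illustration only: the `FileType` enum here is a stand-in (the `FILE` member is hypothetical), `local_name` is a hypothetical replacement for `local_file_path` and `local_image_path`, whose implementations are not shown in this diff, and the settings values are made up.

```python
from enum import Enum
from posixpath import basename
from urllib.parse import urlparse


class FileType(Enum):
    # Stand-in for repub.utils.FileType; FILE is a hypothetical catch-all.
    FILE = "file"
    IMAGE = "image"
    VIDEO = "video"
    AUDIO = "audio"


# Setting that holds the directory name for each media type.
DIR_SETTING = {
    FileType.IMAGE: "REPUBLISHER_IMAGE_DIR",
    FileType.VIDEO: "REPUBLISHER_VIDEO_DIR",
    FileType.AUDIO: "REPUBLISHER_AUDIO_DIR",
}


def local_name(url: str) -> str:
    # Hypothetical: use the last path segment of the URL as the local name.
    return basename(urlparse(url).path)


def rewrite_file_url(settings: dict[str, str], file_type: FileType, url: str) -> str:
    key = DIR_SETTING.get(file_type, "REPUBLISHER_FILE_DIR")
    # No leading slash, so the result is a relative URL.
    return f"{settings[key]}/{local_name(url)}"
```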
--- a/repub/sql/001_initial.sql
+++ b/repub/sql/001_initial.sql
@ -0,0 +1,98 @@
 CREATE TABLE IF NOT EXISTS source (
    id INTEGER PRIMARY KEY,
    created_at TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP,
    updated_at TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP,
    name TEXT NOT NULL,
    slug TEXT NOT NULL UNIQUE,
    source_type TEXT NOT NULL CHECK (source_type IN ('feed', 'pangea')),
    notes TEXT NOT NULL DEFAULT ''
 );
 CREATE TABLE IF NOT EXISTS source_feed (
    source_id INTEGER PRIMARY KEY,
    feed_url TEXT NOT NULL,
    etag TEXT,
    last_modified TEXT,
    FOREIGN KEY (source_id) REFERENCES source(id) ON DELETE CASCADE
 );
 CREATE TABLE IF NOT EXISTS source_pangea (
    source_id INTEGER PRIMARY KEY,
    domain TEXT NOT NULL,
    category_name TEXT NOT NULL,
    content_type TEXT NOT NULL,
    only_newest INTEGER NOT NULL CHECK (only_newest IN (0, 1)),
    max_articles INTEGER NOT NULL,
    oldest_article INTEGER NOT NULL,
    include_authors INTEGER NOT NULL CHECK (include_authors IN (0, 1)),
    exclude_media INTEGER NOT NULL CHECK (exclude_media IN (0, 1)),
    include_content INTEGER NOT NULL CHECK (include_content IN (0, 1)),
    content_format TEXT NOT NULL,
    FOREIGN KEY (source_id) REFERENCES source(id) ON DELETE CASCADE
 );
 CREATE TABLE IF NOT EXISTS job (
    id INTEGER PRIMARY KEY,
    source_id INTEGER NOT NULL UNIQUE,
    created_at TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP,
    updated_at TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP,
    enabled INTEGER NOT NULL CHECK (enabled IN (0, 1)),
    spider_arguments TEXT NOT NULL DEFAULT '',
    cron_minute TEXT NOT NULL,
    cron_hour TEXT NOT NULL,
    cron_day_of_month TEXT NOT NULL,
    cron_day_of_week TEXT NOT NULL,
    cron_month TEXT NOT NULL,
    FOREIGN KEY (source_id) REFERENCES source(id) ON DELETE CASCADE
 );
 CREATE TABLE IF NOT EXISTS job_execution (
    id INTEGER PRIMARY KEY,
    job_id INTEGER NOT NULL,
    created_at TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP,
    started_at TEXT,
    ended_at TEXT,
    stop_requested_at TEXT,
    running_status INTEGER NOT NULL DEFAULT 0 CHECK (running_status BETWEEN 0 AND 4),
    requests_count INTEGER NOT NULL DEFAULT 0,
    items_count INTEGER NOT NULL DEFAULT 0,
    warnings_count INTEGER NOT NULL DEFAULT 0,
    errors_count INTEGER NOT NULL DEFAULT 0,
    bytes_count INTEGER NOT NULL DEFAULT 0,
    retries_count INTEGER NOT NULL DEFAULT 0,
    exceptions_count INTEGER NOT NULL DEFAULT 0,
    cache_size_count INTEGER NOT NULL DEFAULT 0,
    cache_object_count INTEGER NOT NULL DEFAULT 0,
    raw_stats TEXT NOT NULL DEFAULT '{}',
    FOREIGN KEY (job_id) REFERENCES job(id) ON DELETE CASCADE
 );
 CREATE INDEX IF NOT EXISTS job_enabled_idx
 ON job (enabled);
 CREATE INDEX IF NOT EXISTS job_execution_job_created_at_idx
 ON job_execution (job_id, created_at DESC);
 CREATE INDEX IF NOT EXISTS job_execution_status_started_at_idx
 ON job_execution (running_status, started_at DESC);
 CREATE INDEX IF NOT EXISTS job_execution_status_ended_at_idx
 ON job_execution (running_status, ended_at DESC);
 CREATE TRIGGER IF NOT EXISTS source_set_updated_at
 AFTER UPDATE ON source
 FOR EACH ROW
 BEGIN
    UPDATE source
    SET updated_at = CURRENT_TIMESTAMP
    WHERE id = NEW.id;
 END;
 CREATE TRIGGER IF NOT EXISTS job_set_updated_at
 AFTER UPDATE ON job
 FOR EACH ROW
 BEGIN
    UPDATE job
    SET updated_at = CURRENT_TIMESTAMP
    WHERE id = NEW.id;
 END;
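Two behaviours of the schema above are easy to check with the standard library `sqlite3` module: the `AFTER UPDATE` trigger refreshes `updated_at`, and `ON DELETE CASCADE` only takes effect when foreign keys are enabled on the connection. The sketch below uses a trimmed copy of the `source` and `job` tables (most columns are omitted), and `demo` is a hypothetical helper, not part of the repository.

```python
import sqlite3

# Trimmed copy of two tables and one trigger from the migration;
# columns not needed for the demonstration are left out.
SCHEMA = """
CREATE TABLE source (
    id INTEGER PRIMARY KEY,
    updated_at TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP,
    name TEXT NOT NULL
);
CREATE TABLE job (
    id INTEGER PRIMARY KEY,
    source_id INTEGER NOT NULL UNIQUE,
    FOREIGN KEY (source_id) REFERENCES source(id) ON DELETE CASCADE
);
CREATE TRIGGER source_set_updated_at
AFTER UPDATE ON source
FOR EACH ROW
BEGIN
    UPDATE source SET updated_at = CURRENT_TIMESTAMP WHERE id = NEW.id;
END;
"""


def demo() -> tuple[str, int]:
    conn = sqlite3.connect(":memory:")
    # SQLite ignores ON DELETE CASCADE unless this pragma is set per connection.
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(SCHEMA)
    # Seed a row with an obviously stale timestamp.
    conn.execute(
        "INSERT INTO source (id, updated_at, name) "
        "VALUES (1, '2000-01-01 00:00:00', 'old')"
    )
    conn.execute("INSERT INTO job (id, source_id) VALUES (1, 1)")
    # The trigger rewrites updated_at after this statement.
    conn.execute("UPDATE source SET name = 'new' WHERE id = 1")
    updated_at = conn.execute(
        "SELECT updated_at FROM source WHERE id = 1"
    ).fetchone()[0]
    # Deleting the source should remove the dependent job row.
    conn.execute("DELETE FROM source WHERE id = 1")
    remaining_jobs = conn.execute("SELECT COUNT(*) FROM job").fetchone()[0]
    conn.close()
    return updated_at, remaining_jobs
```

The trigger updates the same table it fires on; this does not loop because SQLite's recursive triggers are off by default.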
--- a/repub/static/app.css
+++ b/repub/static/app.css
--- a/repub/static/app.tailwind.css
+++ b/repub/static/app.tailwind.css
@ -0,0 +1 @@
@import "tailwindcss" source("../");
--- a/repub/static/datastar@1.0.0-RC.8.js
+++ b/repub/static/datastar@1.0.0-RC.8.js
--- a/repub/web.py
+++ b/repub/web.py
@ -1,27 +1,503 @@
 from __future__ import annotations
-from quart import Quart
+import asyncio
 import hashlib
 from collections.abc import AsyncGenerator, Awaitable, Callable
 from pathlib import Path
 from typing import TypedDict, cast
 from urllib.parse import urlparse
 import htpy as h
 from datastar_py import ServerSentEventGenerator as SSE
 from datastar_py.quart import DatastarResponse, read_signals
 from datastar_py.sse import DatastarEvent
 from htpy import Renderable
 from peewee import IntegrityError
 from quart import Quart, Response, request, send_from_directory, url_for
 from repub.datastar import RefreshBroker, render_stream
 from repub.jobs import (
    JobRuntime,
    load_dashboard_view,
    load_execution_log_view,
    load_runs_view,
 )
 from repub.model import (
    Job,
    create_source,
    delete_job_source,
    initialize_database,
    load_source_form,
    load_sources,
    source_slug_exists,
    update_source,
 )
 from repub.pages import (
    create_source_page,
    dashboard_page_with_data,
    edit_source_page,
    execution_logs_page,
    runs_page,
    shim_page,
    sources_page,
 )
 from repub.pages.sources import PANGEA_CONTENT_FORMATS, PANGEA_CONTENT_TYPES
 REFRESH_BROKER_KEY = "repub.refresh_broker"
 JOB_RUNTIME_KEY = "repub.job_runtime"
 DEFAULT_LOG_DIR = Path("out/logs")
 DEFAULT_FEEDS_DIR = Path("out/feeds")
 RenderFunction = Callable[[], Awaitable[Renderable]]
-def create_app() -> Quart:
+class SourceFormData(TypedDict):
    name: str
    slug: str
    source_type: str
    notes: str
    spider_arguments: str
    enabled: bool
    cron_minute: str
    cron_hour: str
    cron_day_of_month: str
    cron_day_of_week: str
    cron_month: str
    feed_url: str
    pangea_domain: str
    pangea_category: str
    content_format: str
    content_type: str
    max_articles: int | None
    oldest_article: int | None
    only_newest: bool
    include_authors: bool
    exclude_media: bool
    include_content: bool
 DEFAULT_PANGEA_CONTENT_FORMAT = "MOBILE_3"
 DEFAULT_PANGEA_CONTENT_TYPE = "articles"
 DEFAULT_PANGEA_MAX_ARTICLES = "10"
 DEFAULT_PANGEA_OLDEST_ARTICLE = "3"
 def _render_shim_page(
    *, stylesheet_href: str, datastar_src: str, current_path: str
 ) -> tuple[str, str]:
    head = (
        h.title["Republisher Admin UI"],
        h.link(rel="stylesheet", href=stylesheet_href),
    )
    body = str(
        shim_page(datastar_src=datastar_src, current_path=current_path, head=head)
    )
    etag = hashlib.sha256(body.encode("utf-8")).hexdigest()
    return body, etag
 def create_app(*, dev_mode: bool = False) -> Quart:
    app = Quart(__name__)
    app.config["REPUB_DB_PATH"] = str(initialize_database())
    app.config.setdefault("REPUB_LOG_DIR", DEFAULT_LOG_DIR)
    app.config.setdefault("REPUB_FEEDS_DIR", DEFAULT_FEEDS_DIR)
    app.config["REPUB_DEV_MODE"] = dev_mode
    app.extensions[REFRESH_BROKER_KEY] = RefreshBroker()
    app.extensions[JOB_RUNTIME_KEY] = None
    @app.get("/feeds/<path:feed_path>")
    async def published_feed(feed_path: str) -> Response:
        if not bool(app.config["REPUB_DEV_MODE"]):
            return Response(status=404)
        response = await send_from_directory(
            str(Path(app.config["REPUB_FEEDS_DIR"])),
            feed_path,
        )
        if Path(feed_path).suffix == ".rss":
            response.mimetype = "application/rss+xml"
        return response
    @app.get("/")
-    async def index() -> str:
-        return """<!doctype html>
-<html lang="en">
-  <head>
-    <meta charset="utf-8">
-    <meta name="viewport" content="width=device-width, initial-scale=1">
-    <title>Republisher</title>
-  </head>
-  <body>
-    <main>
-      <h1>Hello, world!</h1>
-      <p>Republisher web UI is starting here.</p>
-    </main>
-  </body>
-</html>
-"""
+    @app.get("/sources")
+    @app.get("/sources/create")
+    @app.get("/sources/<string:slug>/edit")
+    @app.get("/runs")
+    @app.get("/job/<int:job_id>/execution/<int:execution_id>/logs")
+    async def page_shim(
+        slug: str | None = None,
+        job_id: int | None = None,
+        execution_id: int | None = None,
+    ) -> Response:
+        del slug, job_id, execution_id
+        body, etag = _render_shim_page(
+            stylesheet_href=url_for("static", filename="app.css"),
+            datastar_src=url_for("static", filename="datastar@1.0.0-RC.8.js"),
+            current_path=request.path,
+        )
        if request.if_none_match.contains(etag):
            response = Response(status=304)
            response.set_etag(etag)
            return response
        response = Response(body, mimetype="text/html")
        response.set_etag(etag)
        return response
    @app.post("/")
    async def dashboard_patch() -> DatastarResponse:
        return _page_patch_response(app, lambda: render_dashboard(app))
    @app.post("/sources")
    async def sources_patch() -> DatastarResponse:
        return _page_patch_response(app, lambda: render_sources(app))
    @app.post("/sources/create")
    async def create_source_patch() -> DatastarResponse:
        return _page_patch_response(app, lambda: render_create_source(app))
    @app.post("/sources/<string:slug>/edit")
    async def edit_source_patch(slug: str) -> DatastarResponse:
        return _page_patch_response(app, lambda: render_edit_source(slug))
    @app.post("/actions/sources/create")
    async def create_source_action() -> DatastarResponse:
        signals = cast(dict[str, object], await read_signals())
        source, error = validate_source_form(
            signals,
            slug_exists=source_slug_exists,
        )
        if error is not None:
            return DatastarResponse(
                SSE.patch_signals({"_formError": error, "_formSuccess": ""})
            )
        assert source is not None
        try:
            create_source(**source)
        except IntegrityError:
            return DatastarResponse(
                SSE.patch_signals(
                    {"_formError": "Slug must be unique.", "_formSuccess": ""}
                )
            )
        get_job_runtime(app).sync_jobs()
        trigger_refresh(app)
        return DatastarResponse(SSE.redirect("/sources"))
    @app.post("/actions/sources/<string:slug>/edit")
    async def edit_source_action(slug: str) -> DatastarResponse:
        signals = cast(dict[str, object], await read_signals())
        source, error = validate_source_form(
            signals,
            slug_exists=lambda candidate: candidate != slug
            and source_slug_exists(candidate),
            immutable_slug=slug,
        )
        if error is not None:
            return DatastarResponse(
                SSE.patch_signals({"_formError": error, "_formSuccess": ""})
            )
        assert source is not None
        if update_source(slug, **source) is None:
            return DatastarResponse(
                SSE.patch_signals(
                    {"_formError": "Source does not exist.", "_formSuccess": ""}
                )
            )
        get_job_runtime(app).sync_jobs()
        trigger_refresh(app)
        return DatastarResponse(SSE.redirect("/sources"))
    @app.post("/runs")
    async def runs_patch() -> DatastarResponse:
        return _page_patch_response(app, lambda: render_runs(app))
    @app.post("/actions/jobs/<int:job_id>/run-now")
    async def run_job_now_action(job_id: int) -> Response:
        get_job_runtime(app).run_job_now(job_id, reason="manual")
        trigger_refresh(app)
        return Response(status=204)
    @app.post("/actions/jobs/<int:job_id>/toggle-enabled")
    async def toggle_job_enabled_action(job_id: int) -> Response:
        job = Job.get_or_none(id=job_id)
        if job is not None:
            get_job_runtime(app).set_job_enabled(job_id, enabled=not job.enabled)
            trigger_refresh(app)
        return Response(status=204)
    @app.post("/actions/jobs/<int:job_id>/delete")
    async def delete_job_action(job_id: int) -> Response:
        delete_job_source(job_id)
        get_job_runtime(app).sync_jobs()
        trigger_refresh(app)
        return Response(status=204)
    @app.post("/actions/executions/<int:execution_id>/cancel")
    async def cancel_execution_action(execution_id: int) -> Response:
        get_job_runtime(app).request_execution_cancel(execution_id)
        trigger_refresh(app)
        return Response(status=204)
    @app.post("/job/<int:job_id>/execution/<int:execution_id>/logs")
    async def logs_patch(job_id: int, execution_id: int) -> DatastarResponse:
        async def render() -> Renderable:
            return await render_execution_logs(
                app, job_id=job_id, execution_id=execution_id
            )
        return _page_patch_response(app, render)
    @app.before_serving
    async def start_runtime() -> None:
        get_job_runtime(app).start()
    @app.after_serving
    async def stop_runtime() -> None:
        get_job_runtime(app).shutdown()
    return app
 def get_refresh_broker(app: Quart) -> RefreshBroker:
    return cast(RefreshBroker, app.extensions[REFRESH_BROKER_KEY])
 def get_job_runtime(app: Quart) -> JobRuntime:
    runtime = cast(JobRuntime | None, app.extensions.get(JOB_RUNTIME_KEY))
    if runtime is None:
        runtime = JobRuntime(
            log_dir=app.config["REPUB_LOG_DIR"],
            refresh_callback=lambda: trigger_refresh(app),
        )
        app.extensions[JOB_RUNTIME_KEY] = runtime
    return runtime
 def trigger_refresh(app: Quart, event: object = "refresh-event") -> None:
    get_refresh_broker(app).publish(event)
 async def render_dashboard(app: Quart | None = None) -> Renderable:
    if app is None:
        return dashboard_page_with_data()
    view = load_dashboard_view(log_dir=app.config["REPUB_LOG_DIR"])
    return dashboard_page_with_data(
        snapshot=cast(dict[str, str], view["snapshot"]),
        running_executions=cast(tuple[dict[str, object], ...], view["running"]),
        source_feeds=cast(tuple[dict[str, object], ...], view["source_feeds"]),
    )
 async def render_sources(app: Quart | None = None) -> Renderable:
    sources = None if app is None else load_sources()
    return sources_page(sources=sources)
 async def render_create_source(app: Quart | None = None) -> Renderable:
    del app
    return create_source_page()
 async def render_edit_source(slug: str) -> Renderable:
    source = load_source_form(slug)
    if source is None:
        return sources_page(sources=())
    return edit_source_page(
        slug=slug,
        source=source,
        action_path=f"/actions/sources/{slug}/edit",
    )
 async def render_runs(app: Quart | None = None) -> Renderable:
    if app is None:
        return runs_page()
    view = load_runs_view(log_dir=app.config["REPUB_LOG_DIR"])
    return runs_page(
        running_executions=cast(tuple[dict[str, object], ...], view["running"]),
        upcoming_jobs=cast(tuple[dict[str, object], ...], view["upcoming"]),
        completed_executions=cast(tuple[dict[str, object], ...], view["completed"]),
    )
 async def render_execution_logs(
    app: Quart | None = None, *, job_id: int, execution_id: int
 ) -> Renderable:
    if app is None:
        return execution_logs_page(job_id=job_id, execution_id=execution_id)
    log_view = load_execution_log_view(
        log_dir=app.config["REPUB_LOG_DIR"],
        job_id=job_id,
        execution_id=execution_id,
    )
    return execution_logs_page(
        job_id=job_id,
        execution_id=execution_id,
        log_view={
            "title": log_view.title,
            "description": log_view.description,
            "status_label": log_view.status_label,
            "status_tone": log_view.status_tone,
            "log_text": log_view.log_text,
            "error_message": log_view.error_message,
        },
    )
 def _page_patch_response(app: Quart, render: RenderFunction) -> DatastarResponse:
    queue = get_refresh_broker(app).subscribe()
    stream = render_stream(
        queue,
        render=render,
        last_event_id=request.headers.get("last-event-id"),
    )
    return DatastarResponse(_unsubscribe_on_close(queue, stream, app))
 async def _unsubscribe_on_close(
    queue: object, stream: AsyncGenerator[DatastarEvent, None], app: Quart
 ) -> AsyncGenerator[DatastarEvent, None]:
    try:
        async for event in stream:
            yield event
    finally:
        get_refresh_broker(app).unsubscribe(cast(asyncio.Queue[object], queue))
 def validate_source_form(
    signals: dict[str, object] | None,
    *,
    slug_exists: Callable[[str], bool],
    immutable_slug: str | None = None,
 ) -> tuple[SourceFormData | None, str | None]:
    if signals is None:
        return None, "Missing form data."
    source_name = _read_string(signals, "sourceName")
    source_slug = _read_string(signals, "sourceSlug")
    source_type = _read_string(signals, "sourceType")
    feed_url = _read_string(signals, "feedUrl")
    pangea_domain = _read_string(signals, "pangeaDomain")
    pangea_category = _read_string(signals, "pangeaCategory")
    content_format = _read_string(signals, "contentFormat")
    content_type = _read_string(signals, "contentType")
    max_articles = _read_string(signals, "maxArticles")
    oldest_article = _read_string(signals, "oldestArticle")
    source_notes = _read_string(signals, "sourceNotes")
    spider_arguments = _normalize_multiline(_read_string(signals, "spiderArguments"))
    cron_minute = _read_string(signals, "cronMinute")
    cron_hour = _read_string(signals, "cronHour")
    cron_day_of_month = _read_string(signals, "cronDayOfMonth")
    cron_day_of_week = _read_string(signals, "cronDayOfWeek")
    cron_month = _read_string(signals, "cronMonth")
    errors: list[str] = []
    if source_name == "":
        errors.append("Source name is required.")
    if source_slug == "":
        errors.append("Slug is required.")
    elif immutable_slug is not None and source_slug != immutable_slug:
        errors.append("Slug is immutable.")
    elif slug_exists(source_slug):
        errors.append("Slug must be unique.")
    if source_type not in {"feed", "pangea"}:
        errors.append("Source type must be feed or pangea.")
    if source_type == "feed":
        if feed_url == "":
            errors.append("Feed URL is required for feed sources.")
        elif not _is_valid_url(feed_url):
            errors.append("Feed URL must be a valid URL.")
    if source_type == "pangea":
        content_format = content_format or DEFAULT_PANGEA_CONTENT_FORMAT
        content_type = content_type or DEFAULT_PANGEA_CONTENT_TYPE
        max_articles = max_articles or DEFAULT_PANGEA_MAX_ARTICLES
        oldest_article = oldest_article or DEFAULT_PANGEA_OLDEST_ARTICLE
        if pangea_domain == "":
            errors.append("Pangea domain is required.")
        if pangea_category == "":
            errors.append("Category name is required.")
        if content_format not in PANGEA_CONTENT_FORMATS:
            errors.append("Content format is invalid.")
        if content_type not in PANGEA_CONTENT_TYPES:
            errors.append("Content type is invalid.")
        if _parse_int(max_articles) is None:
            errors.append("Max articles must be an integer.")
        if _parse_int(oldest_article) is None:
            errors.append("Oldest article must be an integer.")
    cron_values = (
        cron_minute,
        cron_hour,
        cron_day_of_month,
        cron_day_of_week,
        cron_month,
    )
    if any(value == "" for value in cron_values):
        errors.append("All cron fields are required.")
    if errors:
        return None, " ".join(errors)
    enabled = _read_bool(signals, "jobEnabled")
    source: SourceFormData = {
        "name": source_name,
        "slug": source_slug,
        "source_type": source_type,
        "notes": source_notes,
        "spider_arguments": spider_arguments,
        "feed_url": feed_url,
        "pangea_domain": pangea_domain,
        "pangea_category": pangea_category,
        "content_format": content_format,
        "content_type": content_type,
        "max_articles": _parse_int(max_articles),
        "oldest_article": _parse_int(oldest_article),
        "enabled": enabled,
        "only_newest": _read_bool(signals, "onlyNewest", default=True),
        "include_authors": _read_bool(signals, "includeAuthors", default=True),
        "exclude_media": _read_bool(signals, "excludeMedia", default=False),
        "include_content": _read_bool(signals, "includeContent", default=True),
        "cron_minute": cron_minute,
        "cron_hour": cron_hour,
        "cron_day_of_month": cron_day_of_month,
        "cron_day_of_week": cron_day_of_week,
        "cron_month": cron_month,
    }
    return source, None
 def _read_string(signals: dict[str, object], key: str) -> str:
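    # A JSON null signal arrives as None; treat it as empty, not the string "None".
    value = signals.get(key)
    return "" if value is None else str(value).strip()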
 def _read_bool(signals: dict[str, object], key: str, *, default: bool = False) -> bool:
    value = signals.get(key, default)
    if isinstance(value, bool):
        return value
    if isinstance(value, str):
        return value.lower() in {"true", "1", "on", "yes"}
    return bool(value)
 def _normalize_multiline(value: str) -> str:
    return value.replace("\r\n", "\n").replace("\r", "\n")
 def _parse_int(value: str) -> int | None:
    try:
        return int(value)
    except ValueError:
        return None
 def _is_valid_url(value: str) -> bool:
    parsed = urlparse(value)
    return parsed.scheme in {"http", "https"} and parsed.netloc != ""
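`_page_patch_response` subscribes a queue to the refresh broker and `_unsubscribe_on_close` wraps the event stream in `try`/`finally` so the queue is released when the client goes away. The sketch below reproduces that pattern with the standard library only. `MiniBroker` is a hypothetical stand-in, since the implementation of `RefreshBroker` is not part of this diff, and `unsubscribe_on_close` here yields raw queue items rather than Datastar events.

```python
from __future__ import annotations

import asyncio
from collections.abc import AsyncGenerator


class MiniBroker:
    """Hypothetical stand-in for RefreshBroker: fans events out to queues."""

    def __init__(self) -> None:
        self.queues: set[asyncio.Queue[object]] = set()

    def subscribe(self) -> asyncio.Queue[object]:
        queue: asyncio.Queue[object] = asyncio.Queue()
        self.queues.add(queue)
        return queue

    def unsubscribe(self, queue: asyncio.Queue[object]) -> None:
        self.queues.discard(queue)

    def publish(self, event: object) -> None:
        for queue in self.queues:
            queue.put_nowait(event)


async def unsubscribe_on_close(
    broker: MiniBroker, queue: asyncio.Queue[object]
) -> AsyncGenerator[object, None]:
    try:
        while True:
            yield await queue.get()
    finally:
        # Runs when the consumer closes the generator, e.g. on disconnect.
        broker.unsubscribe(queue)


async def demo() -> tuple[object, int]:
    broker = MiniBroker()
    queue = broker.subscribe()
    stream = unsubscribe_on_close(broker, queue)
    broker.publish("refresh-event")
    event = await stream.__anext__()  # receive the published event
    await stream.aclose()  # simulate the client going away
    return event, len(broker.queues)
```

One caveat of this shape: the `finally` block only runs once the generator has started, so a stream that is closed before its first item is requested would leave its queue subscribed.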
--- a/tests/test_config.py
+++ b/tests/test_config.py
@ -141,12 +141,20 @@ def test_build_feed_settings_derives_output_paths_from_feed_slug(
    assert feed_settings["REPUBLISHER_OUT_DIR"] == str(out_dir)
    assert feed_settings["LOG_FILE"] == str(out_dir / "logs" / "info-marti.log")
    assert feed_settings["HTTPCACHE_DIR"] == str(out_dir / "httpcache")
-    assert feed_settings["IMAGES_STORE"] == str(out_dir / "info-marti" / "images")
-    assert feed_settings["AUDIO_STORE"] == str(out_dir / "info-marti" / "audio")
-    assert feed_settings["VIDEO_STORE"] == str(out_dir / "info-marti" / "video")
-    assert feed_settings["FILES_STORE"] == str(out_dir / "info-marti" / "files")
+    assert feed_settings["IMAGES_STORE"] == str(
+        out_dir / "feeds" / "info-marti" / "images"
+    )
+    assert feed_settings["AUDIO_STORE"] == str(
        out_dir / "feeds" / "info-marti" / "audio"
    )
    assert feed_settings["VIDEO_STORE"] == str(
        out_dir / "feeds" / "info-marti" / "video"
    )
    assert feed_settings["FILES_STORE"] == str(
        out_dir / "feeds" / "info-marti" / "files"
    )
    assert feed_settings["FEEDS"] == {
-        str(out_dir / "info-marti.rss"): {
+        str(out_dir / "feeds" / "info-marti" / "feed.rss"): {
            "format": "rss",
            "postprocessing": [],
            "feed_name": "info-marti",
@ -181,5 +189,9 @@ def test_build_feed_settings_uses_runtime_media_dir_overrides(tmp_path: Path) ->
    assert feed_settings["REPUBLISHER_VIDEO_DIR"] == "videos-custom"
    assert feed_settings["REPUBLISHER_AUDIO_DIR"] == "audio-custom"
-    assert feed_settings["VIDEO_STORE"] == str(out_dir / "gp-pod" / "videos-custom")
-    assert feed_settings["AUDIO_STORE"] == str(out_dir / "gp-pod" / "audio-custom")
+    assert feed_settings["VIDEO_STORE"] == str(
+        out_dir / "feeds" / "gp-pod" / "videos-custom"
    )
    assert feed_settings["AUDIO_STORE"] == str(
        out_dir / "feeds" / "gp-pod" / "audio-custom"
    )
--- a/tests/test_dev_mode.py
+++ b/tests/test_dev_mode.py
@ -0,0 +1,71 @@
 from __future__ import annotations
 import asyncio
 from pathlib import Path
 from repub.web import create_app
 def test_dev_mode_serves_published_feeds(monkeypatch, tmp_path: Path) -> None:
    db_path = tmp_path / "dev-mode.db"
    feeds_dir = tmp_path / "out" / "feeds"
    monkeypatch.setenv("REPUBLISHER_DB_PATH", str(db_path))
    async def run() -> None:
        app = create_app(dev_mode=True)
        app.config["REPUB_FEEDS_DIR"] = feeds_dir
        feed_path = feeds_dir / "demo-source" / "feed.rss"
        feed_path.parent.mkdir(parents=True)
        feed_path.write_text("<rss/>\n", encoding="utf-8")
        client = app.test_client()
        response = await client.get("/feeds/demo-source/feed.rss")
        assert response.status_code == 200
        assert response.mimetype == "application/rss+xml"
        assert await response.get_data(as_text=True) == "<rss/>\n"
    asyncio.run(run())
 def test_dev_mode_serves_feed_enclosure_assets(monkeypatch, tmp_path: Path) -> None:
    db_path = tmp_path / "dev-mode-assets.db"
    feeds_dir = tmp_path / "out" / "feeds"
    monkeypatch.setenv("REPUBLISHER_DB_PATH", str(db_path))
    async def run() -> None:
        app = create_app(dev_mode=True)
        app.config["REPUB_FEEDS_DIR"] = feeds_dir
        enclosure_path = feeds_dir / "demo-source" / "audio" / "episode.mp3"
        enclosure_path.parent.mkdir(parents=True)
        enclosure_path.write_bytes(b"mp3-data")
        client = app.test_client()
        response = await client.get("/feeds/demo-source/audio/episode.mp3")
        assert response.status_code == 200
        assert await response.get_data() == b"mp3-data"
    asyncio.run(run())
 def test_default_mode_does_not_serve_published_feeds(
    monkeypatch, tmp_path: Path
 ) -> None:
    db_path = tmp_path / "default-mode.db"
    feeds_dir = tmp_path / "out" / "feeds"
    monkeypatch.setenv("REPUBLISHER_DB_PATH", str(db_path))
    async def run() -> None:
        app = create_app()
        app.config["REPUB_FEEDS_DIR"] = feeds_dir
        feed_path = feeds_dir / "demo-source" / "feed.rss"
        feed_path.parent.mkdir(parents=True)
        feed_path.write_text("<rss/>\n", encoding="utf-8")
        client = app.test_client()
        response = await client.get("/feeds/demo-source/feed.rss")
        assert response.status_code == 404
    asyncio.run(run())
--- a/tests/test_entrypoint.py
+++ b/tests/test_entrypoint.py
@ -1,6 +1,9 @@
 import io
 import logging
 from types import SimpleNamespace
 from typing import cast
-from repub.entrypoint import FeedNameFilter
+from repub.entrypoint import FeedNameFilter, entrypoint, logger, parse_args
 def test_feed_name_filter_accepts_matching_item() -> None:
@ -15,3 +18,70 @@ def test_feed_name_filter_rejects_non_matching_item() -> None:
    feed_filter = FeedNameFilter({"feed_name": "nasa"})
    assert feed_filter.accepts(item) is False
 def test_parse_args_uses_republisher_host_and_port_env_vars(monkeypatch) -> None:
    monkeypatch.setenv("REPUBLISHER_HOST", "0.0.0.0")
    monkeypatch.setenv("REPUBLISHER_PORT", "9090")
    command, args = parse_args(["serve"])
    assert command == "serve"
    assert args.host == "0.0.0.0"
    assert args.port == "9090"
 def test_parse_args_supports_dev_mode_flag() -> None:
    command, args = parse_args(["serve", "--dev-mode"])
    assert command == "serve"
    assert args.dev_mode is True
 def test_parse_args_defaults_to_dev_mode_when_no_args() -> None:
    command, args = parse_args([])
    assert command == "serve"
    assert args.dev_mode is True
 def test_entrypoint_rejects_invalid_republisher_port(monkeypatch) -> None:
    monkeypatch.setenv("REPUBLISHER_PORT", "not-a-number")
    stream = io.StringIO()
    handlers = [
        cast(logging.StreamHandler[io.StringIO], handler) for handler in logger.handlers
    ]
    original_streams = [handler.stream for handler in handlers]
    for handler in handlers:
        handler.stream = stream
    try:
        exit_code = entrypoint(["serve"])
    finally:
        for handler, original_stream in zip(handlers, original_streams):
            handler.stream = original_stream
    assert exit_code == 2
    assert "Invalid REPUBLISHER_PORT/--port value" in stream.getvalue()
 def test_entrypoint_passes_dev_mode_to_create_app(monkeypatch) -> None:
    recorded: dict[str, object] = {}
    class StubApp:
        def run(self, *, host: str, port: int) -> None:
            recorded["host"] = host
            recorded["port"] = port
    def fake_create_app(*, dev_mode: bool) -> StubApp:
        recorded["dev_mode"] = dev_mode
        return StubApp()
    monkeypatch.setattr("repub.entrypoint.create_app", fake_create_app)
    exit_code = entrypoint(
        ["serve", "--dev-mode", "--host", "0.0.0.0", "--port", "9090"]
    )
    assert exit_code == 0
    assert recorded == {"dev_mode": True, "host": "0.0.0.0", "port": 9090}
--- a/tests/test_file_feeds.py
+++ b/tests/test_file_feeds.py
@ -1,6 +1,10 @@
 from pathlib import Path
 from scrapy.settings import Settings
 from repub import entrypoint as entrypoint_module
 from repub.spiders.rss_spider import RssFeedSpider
 from repub.utils import FileType, local_audio_path, local_image_path
 def test_entrypoint_supports_file_feed_urls(tmp_path: Path, monkeypatch) -> None:
@ -29,9 +33,33 @@ DOWNLOAD_TIMEOUT = 5
    exit_code = entrypoint_module.entrypoint(["--config", str(config_path)])
-    output_path = tmp_path / "out" / "local-file.rss"
+    output_path = tmp_path / "out" / "feeds" / "local-file" / "feed.rss"
    assert exit_code == 0
    assert output_path.exists()
    output = output_path.read_text(encoding="utf-8")
    assert "<title>Local Demo Feed</title>" in output
    assert "<title>Local Demo Entry</title>" in output
 def test_rss_spider_rewrites_public_asset_urls_as_relative_paths() -> None:
    spider = RssFeedSpider(feed_name="demo", url="https://example.com/feed.rss")
    spider.settings = Settings(
        values={
            "REPUBLISHER_IMAGE_DIR": "images",
            "REPUBLISHER_FILE_DIR": "files",
            "REPUBLISHER_AUDIO_DIR": "audio",
            "REPUBLISHER_VIDEO_DIR": "video",
        }
    )
    assert (
        spider.rewrite_image_url("https://example.com/media/photo.jpg")
        == f"images/{local_image_path('https://example.com/media/photo.jpg')}"
    )
    assert (
        spider.rewrite_file_url(
            FileType.AUDIO,
            "https://example.com/media/podcast.mp3",
        )
        == f"audio/{local_audio_path('https://example.com/media/podcast.mp3')}"
    )
--- a/tests/test_jobs.py
+++ b/tests/test_jobs.py
@ -0,0 +1,85 @@
 from __future__ import annotations
 from datetime import UTC, datetime
 from pathlib import Path
 from repub.jobs import load_runs_view
 from repub.model import (
    Job,
    JobExecution,
    JobExecutionStatus,
    create_source,
    initialize_database,
 )
 def test_load_runs_view_humanizes_completed_execution_summary_bytes(
    tmp_path: Path,
 ) -> None:
    initialize_database(tmp_path / "jobs-completed.db")
    source = create_source(
        name="Completed source",
        slug="completed-source",
        source_type="feed",
        notes="",
        spider_arguments="",
        enabled=False,
        cron_minute="*/5",
        cron_hour="*",
        cron_day_of_month="*",
        cron_day_of_week="*",
        cron_month="*",
        feed_url="https://example.com/completed.xml",
    )
    job = Job.get(Job.source == source)
    JobExecution.create(
        job=job,
        running_status=JobExecutionStatus.SUCCEEDED,
        ended_at=datetime(2026, 3, 30, 12, 0, tzinfo=UTC),
        requests_count=14,
        items_count=11,
        bytes_count=16_410_269,
    )
    view = load_runs_view(
        log_dir=tmp_path / "out" / "logs",
        now=datetime(2026, 3, 30, 12, 30, tzinfo=UTC),
    )
    assert view["completed"][0]["stats"] == "14 requests • 11 items • 15.7 MiB"
 def test_load_runs_view_humanizes_running_execution_summary_bytes(
    tmp_path: Path,
 ) -> None:
    initialize_database(tmp_path / "jobs-running.db")
    source = create_source(
        name="Running source",
        slug="running-source",
        source_type="feed",
        notes="",
        spider_arguments="",
        enabled=False,
        cron_minute="*/5",
        cron_hour="*",
        cron_day_of_month="*",
        cron_day_of_week="*",
        cron_month="*",
        feed_url="https://example.com/running.xml",
    )
    job = Job.get(Job.source == source)
    JobExecution.create(
        job=job,
        running_status=JobExecutionStatus.RUNNING,
        started_at=datetime(2026, 3, 30, 12, 0, tzinfo=UTC),
        requests_count=14,
        items_count=11,
        bytes_count=1_536,
    )
    view = load_runs_view(
        log_dir=tmp_path / "out" / "logs",
        now=datetime(2026, 3, 30, 12, 30, tzinfo=UTC),
    )
    assert view["running"][0]["stats"] == "14 requests • 11 items • 1.5 KiB"
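
The two tests in tests/test_jobs.py expect byte counts rendered in binary units with one decimal place ("15.7 MiB", "1.5 KiB"). The formatter itself is not shown in this part of the diff, so the following is only a minimal sketch of one way to produce those strings. The names `humanize_bytes` and `format_stats` are hypothetical and may not match what `repub.jobs` actually uses.

```python
# Hypothetical helpers; the real implementation in repub.jobs may differ.
UNITS = ("B", "KiB", "MiB", "GiB", "TiB")


def humanize_bytes(count: int) -> str:
    """Format a byte count with 1024-based units and one decimal place."""
    value = float(count)
    unit = UNITS[0]
    for unit in UNITS:
        # Stop at the first unit that keeps the value below 1024,
        # or at the largest unit we know about.
        if value < 1024 or unit == UNITS[-1]:
            break
        value /= 1024
    if unit == "B":
        return f"{count} B"  # plain bytes are shown without a decimal
    return f"{value:.1f} {unit}"


def format_stats(requests: int, items: int, byte_count: int) -> str:
    """Join the three counters the way the runs view summary expects."""
    return f"{requests} requests • {items} items • {humanize_bytes(byte_count)}"
```

With the values from the tests, `format_stats(14, 11, 16_410_269)` gives the completed-run summary and `format_stats(14, 11, 1_536)` gives the running one.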
--- a/tests/test_model.py
+++ b/tests/test_model.py
@ -0,0 +1,170 @@
 from __future__ import annotations
 import sqlite3
 from pathlib import Path
 import pytest
 from peewee import IntegrityError
 from repub.model import (
    Job,
    Source,
    database,
    initialize_database,
    resolve_database_path,
 )
 def test_resolve_database_path_defaults_to_republisher_db(
    monkeypatch: pytest.MonkeyPatch, tmp_path: Path
 ) -> None:
    monkeypatch.chdir(tmp_path)
    monkeypatch.delenv("REPUBLISHER_DB_PATH", raising=False)
    assert resolve_database_path() == tmp_path / "republisher.db"
 def test_resolve_database_path_prefers_environment_variable(
    monkeypatch: pytest.MonkeyPatch, tmp_path: Path
 ) -> None:
    db_path = tmp_path / "env-configured.db"
    monkeypatch.setenv("REPUBLISHER_DB_PATH", str(db_path))
    assert resolve_database_path() == db_path
 def test_initialize_database_bootstraps_schema_from_sql_files(tmp_path: Path) -> None:
    db_path = tmp_path / "bootstrap.db"
    initialize_database(db_path)
    connection = sqlite3.connect(db_path)
    try:
        table_names = {
            row[0]
            for row in connection.execute(
                """
                SELECT name
                FROM sqlite_master
                WHERE type = 'table' AND name NOT LIKE 'sqlite_%'
                """
            )
        }
        assert table_names == {
            "job",
            "job_execution",
            "source",
            "source_feed",
            "source_pangea",
        }
        defaults = {
            row[1]: row[4]
            for row in connection.execute("PRAGMA table_info('source_pangea')")
        }
        assert defaults["content_type"] is None
        assert defaults["only_newest"] is None
        assert defaults["max_articles"] is None
        assert defaults["oldest_article"] is None
        assert defaults["include_authors"] is None
        assert defaults["exclude_media"] is None
        assert defaults["include_content"] is None
        assert defaults["content_format"] is None
    finally:
        connection.close()
 def test_initialize_database_configures_sqlite_pragmas(tmp_path: Path) -> None:
    db_path = tmp_path / "pragmas.db"
    initialize_database(db_path)
    database.connect(reuse_if_open=True)
    try:
        pragma_values = {
            "cache_size": database.execute_sql("PRAGMA cache_size").fetchone()[0],
            "page_size": database.execute_sql("PRAGMA page_size").fetchone()[0],
            "journal_mode": database.execute_sql("PRAGMA journal_mode").fetchone()[0],
            "synchronous": database.execute_sql("PRAGMA synchronous").fetchone()[0],
            "temp_store": database.execute_sql("PRAGMA temp_store").fetchone()[0],
            "foreign_keys": database.execute_sql("PRAGMA foreign_keys").fetchone()[0],
            "busy_timeout": database.execute_sql("PRAGMA busy_timeout").fetchone()[0],
        }
        assert pragma_values == {
            "cache_size": 15625,
            "page_size": 4096,
            "journal_mode": "wal",
            "synchronous": 1,
            "temp_store": 2,
            "foreign_keys": 1,
            "busy_timeout": 5000,
        }
    finally:
        database.close()
 def test_initialize_database_creates_scheduler_and_execution_indexes(
    tmp_path: Path,
 ) -> None:
    db_path = tmp_path / "indexes.db"
    initialize_database(db_path)
    connection = sqlite3.connect(db_path)
    try:
        index_names = {
            row[0]
            for row in connection.execute(
                """
                SELECT name
                FROM sqlite_master
                WHERE type = 'index'
                  AND name IN (
                    'job_enabled_idx',
                    'job_execution_job_created_at_idx',
                    'job_execution_status_started_at_idx',
                    'job_execution_status_ended_at_idx'
                  )
                """
            )
        }
        assert index_names == {
            "job_enabled_idx",
            "job_execution_job_created_at_idx",
            "job_execution_status_started_at_idx",
            "job_execution_status_ended_at_idx",
        }
    finally:
        connection.close()
 def test_job_table_allows_exactly_one_job_per_source(tmp_path: Path) -> None:
    initialize_database(tmp_path / "jobs.db")
    source = Source.create(
        name="Guardian feed mirror",
        slug="guardian-feed",
        source_type="feed",
    )
    Job.create(
        source=source,
        enabled=True,
        spider_arguments="",
        cron_minute="15",
        cron_hour="*",
        cron_day_of_month="*",
        cron_day_of_week="*",
        cron_month="*",
    )
    with pytest.raises(IntegrityError):
        Job.create(
            source=source,
            enabled=True,
            spider_arguments="language=en",
            cron_minute="30",
            cron_hour="*",
            cron_day_of_month="*",
            cron_day_of_week="*",
            cron_month="*",
        )
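
`test_initialize_database_configures_sqlite_pragmas` in tests/test_model.py checks numeric pragma values (`synchronous == 1`, `temp_store == 2`) that correspond to SQLite's `NORMAL` and `MEMORY` settings. The sketch below uses only the standard-library `sqlite3` module to show how such values can be applied and read back. The `PRAGMAS` mapping and both function names are assumptions for illustration; `initialize_database` goes through peewee and may configure the connection differently.

```python
import sqlite3
import tempfile
from pathlib import Path

# Assumed pragma set, copied from the values the test asserts on.
# page_size is omitted because 4096 is already SQLite's default for new files.
PRAGMAS = {
    "journal_mode": "wal",
    "cache_size": 15625,
    "synchronous": 1,  # NORMAL
    "temp_store": 2,  # MEMORY
    "foreign_keys": 1,
    "busy_timeout": 5000,
}


def open_configured(db_path: Path) -> sqlite3.Connection:
    """Open a connection and apply each pragma to it."""
    connection = sqlite3.connect(db_path)
    for name, value in PRAGMAS.items():
        connection.execute(f"PRAGMA {name} = {value}")
    return connection


def read_back_in_temp_db() -> dict[str, object]:
    """Apply the pragmas to a throwaway database file and read them back."""
    with tempfile.TemporaryDirectory() as tmp:
        connection = open_configured(Path(tmp) / "sketch.db")
        try:
            return {
                name: connection.execute(f"PRAGMA {name}").fetchone()[0]
                for name in PRAGMAS
            }
        finally:
            connection.close()
```

Most of these pragmas are per-connection settings, which is presumably why the test reads them through the shared `database` object rather than a fresh `sqlite3.connect` call; `journal_mode = wal` is the exception and persists in the database file.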
--- a/tests/test_pipelines.py
+++ b/tests/test_pipelines.py
@ -1,8 +1,10 @@
 import sys
 from pathlib import Path
 from types import SimpleNamespace
 import pytest
 from repub import media
 from repub.config import (
    FeedConfig,
    RepublisherConfig,
@ -48,3 +50,141 @@ def test_pipeline_from_crawler_uses_configured_store(
    assert pipeline.settings is crawler.settings
    assert pipeline.store.basedir == crawler.settings[store_setting]
 def test_transcode_audio_captures_ffmpeg_output(monkeypatch, tmp_path: Path) -> None:
    input_file = tmp_path / "input.mp3"
    input_file.write_bytes(b"12345")
    output_dir = tmp_path / "audio-out"
    output_dir.mkdir()
    run_calls: list[dict[str, object]] = []
    class FakeOutput:
        def __init__(self, output_path: Path):
            self.output_path = output_path
        def run(self, **kwargs):
            run_calls.append(kwargs)
            self.output_path.write_bytes(b"12")
            return b"", b""
    class FakeInput:
        def output(self, output_file: str, **params):
            del params
            return FakeOutput(Path(output_file))
    monkeypatch.setattr(media.ffmpeg, "input", lambda _: FakeInput())
    result = media.transcode_audio(
        str(input_file),
        str(output_dir),
        {"extension": "mp3", "acodec": "libmp3lame"},
    )
    assert result == str(output_dir / "converted.mp3")
    assert run_calls == [{"capture_stdout": True, "capture_stderr": True}]
 def test_transcode_video_two_pass_does_not_print_ffmpeg_output(
    monkeypatch, tmp_path: Path
 ) -> None:
    input_file = tmp_path / "input.mp4"
    input_file.write_bytes(b"12345")
    output_dir = tmp_path / "video-out"
    output_dir.mkdir()
    run_calls: list[dict[str, object]] = []
    printed: list[tuple[tuple[object, ...], dict[str, object]]] = []
    class FakeOutput:
        def __init__(self, output_path: Path | None):
            self.output_path = output_path
        def global_args(self, *args):
            del args
            return self
        def run(self, **kwargs):
            run_calls.append(kwargs)
            if self.output_path is not None:
                self.output_path.write_bytes(b"12")
            return b"pass-out", b"pass-err"
    class FakeInput:
        video = object()
        audio = object()
        def output(self, *args, **params):
            del params
            output_path = next(
                (
                    Path(arg)
                    for arg in args
                    if isinstance(arg, str) and arg.endswith(".mp4")
                ),
                None,
            )
            return FakeOutput(output_path)
    monkeypatch.setattr(media.ffmpeg, "input", lambda _: FakeInput())
    monkeypatch.setattr(
        "builtins.print", lambda *args, **kwargs: printed.append((args, kwargs))
    )
    result = media.transcode_video(
        str(input_file),
        str(output_dir),
        {
            "extension": "mp4",
            "passes": [
                {"f": "null"},
                {"c:v": "libx264"},
            ],
        },
    )
    assert result == str(output_dir / "converted.mp4")
    assert run_calls == [
        {"capture_stdout": True, "capture_stderr": True},
        {
            "capture_stdout": True,
            "capture_stderr": True,
            "overwrite_output": True,
        },
    ]
    assert printed == []
 def test_transcode_video_prints_ffmpeg_output_on_error(
    monkeypatch, tmp_path: Path
 ) -> None:
    input_file = tmp_path / "input.mp4"
    input_file.write_bytes(b"12345")
    output_dir = tmp_path / "video-out"
    output_dir.mkdir()
    printed: list[tuple[str, bool]] = []
    class FakeOutput:
        def run(self, **kwargs):
            del kwargs
            raise media.ffmpeg.Error("ffmpeg", b"video-stdout", b"video-stderr")
    class FakeInput:
        def output(self, *args, **params):
            del args, params
            return FakeOutput()
    def fake_print(*args, **kwargs):
        printed.append((str(args[0]), kwargs.get("file") is sys.stderr))
    monkeypatch.setattr(media.ffmpeg, "input", lambda _: FakeInput())
    monkeypatch.setattr("builtins.print", fake_print)
    with pytest.raises(RuntimeError):
        media.transcode_video(
            str(input_file),
            str(output_dir),
            {"extension": "mp4", "c:v": "libx264"},
        )
    assert ("video-stderr", True) in printed
    assert ("video-stdout", False) in printed
--- a/tests/test_scheduler_runtime.py
+++ b/tests/test_scheduler_runtime.py
@ -0,0 +1,513 @@
 from __future__ import annotations
 import asyncio
 import json
 import socketserver
 import threading
 import time
 from datetime import UTC, datetime, timedelta
 from http.server import BaseHTTPRequestHandler
 from pathlib import Path
 from repub.job_runner import generate_pangea_feed
 from repub.jobs import JobArtifacts, JobRuntime, load_runs_view
 from repub.model import (
    Job,
    JobExecution,
    JobExecutionStatus,
    Source,
    create_source,
    initialize_database,
 )
 from repub.web import create_app, get_job_runtime, render_execution_logs, render_runs
 FIXTURE_FEED_PATH = (
    Path(__file__).resolve().parents[1] / "demo" / "fixtures" / "local-feed.rss"
 ).resolve()
 def test_job_runtime_syncs_enabled_jobs_into_apscheduler(tmp_path: Path) -> None:
    initialize_database(tmp_path / "scheduler.db")
    enabled_source = create_source(
        name="Enabled source",
        slug="enabled-source",
        source_type="feed",
        notes="",
        spider_arguments="",
        enabled=True,
        cron_minute="*/5",
        cron_hour="*",
        cron_day_of_month="*",
        cron_day_of_week="*",
        cron_month="*",
        feed_url="https://example.com/enabled.xml",
    )
    disabled_source = create_source(
        name="Disabled source",
        slug="disabled-source",
        source_type="feed",
        notes="",
        spider_arguments="",
        enabled=False,
        cron_minute="15",
        cron_hour="*",
        cron_day_of_month="*",
        cron_day_of_week="*",
        cron_month="*",
        feed_url="https://example.com/disabled.xml",
    )
    enabled_job = Job.get(Job.source == enabled_source)
    disabled_job = Job.get(Job.source == disabled_source)
    runtime = JobRuntime(log_dir=tmp_path / "out" / "logs")
    try:
        runtime.start()
        runtime.sync_jobs()
        scheduled_ids = {job.id for job in runtime.scheduler.get_jobs()}
        assert f"job-{enabled_job.id}" in scheduled_ids
        assert f"job-{disabled_job.id}" not in scheduled_ids
        enabled_job.enabled = False
        enabled_job.save()
        runtime.sync_jobs()
        scheduled_ids = {job.id for job in runtime.scheduler.get_jobs()}
        assert f"job-{enabled_job.id}" not in scheduled_ids
    finally:
        runtime.shutdown()
 def test_job_runtime_run_now_writes_log_and_stats_and_marks_success(
    tmp_path: Path,
 ) -> None:
    initialize_database(tmp_path / "run-now.db")
    source = create_source(
        name="Manual source",
        slug="manual-source",
        source_type="feed",
        notes="",
        spider_arguments="",
        enabled=False,
        cron_minute="*/5",
        cron_hour="*",
        cron_day_of_month="*",
        cron_day_of_week="*",
        cron_month="*",
        feed_url=FIXTURE_FEED_PATH.as_uri(),
    )
    job = Job.get(Job.source == source)
    runtime = JobRuntime(log_dir=tmp_path / "out" / "logs")
    try:
        runtime.start()
        execution_id = runtime.run_job_now(job.id, reason="manual")
        assert execution_id is not None
        execution = _wait_for_terminal_execution(execution_id)
        artifacts = JobArtifacts.for_execution(
            log_dir=tmp_path / "out" / "logs",
            job_id=job.id,
            execution_id=execution_id,
        )
        assert execution.running_status == JobExecutionStatus.SUCCEEDED
        assert execution.started_at is not None
        assert execution.ended_at is not None
        assert execution.requests_count > 0
        assert execution.items_count > 0
        assert execution.bytes_count > 0
        assert artifacts.log_path.exists()
        assert artifacts.stats_path.exists()
        output_path = tmp_path / "out" / "feeds" / "manual-source" / "feed.rss"
        assert output_path.exists()
        output_text = output_path.read_text(encoding="utf-8")
        assert "<title>Local Demo Feed</title>" in output_text
        assert "<title>Local Demo Entry</title>" in output_text
        stats_lines = [
            json.loads(line)
            for line in artifacts.stats_path.read_text(encoding="utf-8").splitlines()
        ]
        assert len(stats_lines) >= 2
        assert stats_lines[-1]["requests_count"] == execution.requests_count
    finally:
        runtime.shutdown()
 def test_job_runtime_cancel_marks_execution_canceled(tmp_path: Path) -> None:
    initialize_database(tmp_path / "cancel.db")
    with _slow_feed_server() as feed_url:
        source = create_source(
            name="Cancelable source",
            slug="cancelable-source",
            source_type="feed",
            notes="",
            spider_arguments="",
            enabled=False,
            cron_minute="*/5",
            cron_hour="*",
            cron_day_of_month="*",
            cron_day_of_week="*",
            cron_month="*",
            feed_url=feed_url,
        )
        job = Job.get(Job.source == source)
        runtime = JobRuntime(log_dir=tmp_path / "out" / "logs")
        try:
            runtime.start()
            execution_id = runtime.run_job_now(job.id, reason="manual")
            assert execution_id is not None
            _wait_for_running_execution(execution_id)
            runtime.request_execution_cancel(execution_id)
            execution = _wait_for_terminal_execution(execution_id)
            artifacts = JobArtifacts.for_execution(
                log_dir=tmp_path / "out" / "logs",
                job_id=job.id,
                execution_id=execution_id,
            )
            assert execution.running_status == JobExecutionStatus.CANCELED
            assert execution.ended_at is not None
            assert execution.stop_requested_at is not None
            assert "graceful stop requested" in artifacts.log_path.read_text(
                encoding="utf-8"
            )
        finally:
            runtime.shutdown()
 def test_job_runtime_start_reconciles_stale_running_execution(tmp_path: Path) -> None:
    initialize_database(tmp_path / "stale-running.db")
    source = create_source(
        name="Stale source",
        slug="stale-source",
        source_type="feed",
        notes="",
        spider_arguments="",
        enabled=False,
        cron_minute="*/5",
        cron_hour="*",
        cron_day_of_month="*",
        cron_day_of_week="*",
        cron_month="*",
        feed_url="https://example.com/stale.xml",
    )
    job = Job.get(Job.source == source)
    execution = JobExecution.create(
        job=job,
        started_at="2026-03-30 12:30:00+00:00",
        running_status=JobExecutionStatus.RUNNING,
    )
    artifacts = JobArtifacts.for_execution(
        log_dir=tmp_path / "out" / "logs",
        job_id=job.id,
        execution_id=int(execution.get_id()),
    )
    artifacts.log_path.parent.mkdir(parents=True, exist_ok=True)
    artifacts.log_path.write_text(
        "worker: process lost during app restart\n",
        encoding="utf-8",
    )
    runtime = JobRuntime(log_dir=tmp_path / "out" / "logs")
    try:
        runtime.start()
        reconciled_execution = JobExecution.get_by_id(execution.get_id())
        assert reconciled_execution.running_status == JobExecutionStatus.FAILED
        assert reconciled_execution.ended_at is not None
        assert "marked failed after app restart" in artifacts.log_path.read_text(
            encoding="utf-8"
        )
    finally:
        runtime.shutdown()
 def test_generate_pangea_feed_writes_pangea_rss_file(
    monkeypatch, tmp_path: Path
 ) -> None:
    class StubPangeaFeed:
        def __init__(self, config, feeds):
            self.config = config
            self.feed = feeds[0]
        def acquire_content(self) -> None:
            return None
        def generate_feed(self) -> None:
            return None
        def disgorge(self, slug: str):
            output_path = self.config.results.output_directory / slug / "pangea.rss"
            output_path.parent.mkdir(parents=True, exist_ok=True)
            output_path.write_text(
                "<rss><channel><title>Pangea Fixture</title></channel></rss>\n",
                encoding="utf-8",
            )
            return output_path
    monkeypatch.setattr(
        "repub.job_runner.pangea_feed_class",
        lambda: StubPangeaFeed,
    )
    output_path = generate_pangea_feed(
        name="Pangea source",
        slug="pangea-source",
        domain="example.org",
        category_name="News",
        content_type="articles",
        only_newest=True,
        max_articles=10,
        oldest_article=3,
        include_authors=True,
        exclude_media=False,
        include_content=True,
        content_format="MOBILE_3",
        out_dir=tmp_path / "out",
        log_path=tmp_path / "out" / "logs" / "pangea.log",
    )
    assert output_path == (tmp_path / "out" / "feeds" / "pangea-source" / "pangea.rss")
    assert output_path.exists()
    assert "Pangea Fixture" in output_path.read_text(encoding="utf-8")
 def test_load_runs_view_humanizes_completed_execution_end_time(
    monkeypatch, tmp_path: Path
 ) -> None:
    db_path = tmp_path / "runs-view.db"
    log_dir = tmp_path / "out" / "logs"
    monkeypatch.setenv("REPUBLISHER_DB_PATH", str(db_path))
    app = create_app()
    app.config["REPUB_LOG_DIR"] = log_dir
    source = create_source(
        name="Completed source",
        slug="completed-source",
        source_type="feed",
        notes="",
        spider_arguments="",
        enabled=False,
        cron_minute="*/5",
        cron_hour="*",
        cron_day_of_month="*",
        cron_day_of_week="*",
        cron_month="*",
        feed_url="https://example.com/completed.xml",
    )
    job = Job.get(Job.source == source)
    reference_time = datetime(2026, 1, 15, 12, 0, tzinfo=UTC)
    ended_at = reference_time - timedelta(hours=2)
    JobExecution.create(
        job=job,
        running_status=JobExecutionStatus.SUCCEEDED,
        ended_at=ended_at,
    )
    view = load_runs_view(log_dir=app.config["REPUB_LOG_DIR"], now=reference_time)
    completed = view["completed"][0]
    assert completed["ended_at"] == "2 hours ago"
    assert completed["ended_at_iso"] == ended_at.isoformat()
 def test_render_runs_uses_database_backed_jobs_and_executions(
    monkeypatch, tmp_path: Path
 ) -> None:
    db_path = tmp_path / "runs-page.db"
    log_dir = tmp_path / "out" / "logs"
    monkeypatch.setenv("REPUBLISHER_DB_PATH", str(db_path))
    app = create_app()
    app.config["REPUB_LOG_DIR"] = log_dir
    source = create_source(
        name="Runs page source",
        slug="runs-page-source",
        source_type="feed",
        notes="",
        spider_arguments="",
        enabled=True,
        cron_minute="*/5",
        cron_hour="*",
        cron_day_of_month="*",
        cron_day_of_week="*",
        cron_month="*",
        feed_url=FIXTURE_FEED_PATH.as_uri(),
    )
    job = Job.get(Job.source == source)
    runtime = get_job_runtime(app)
    runtime.start()
    try:
        execution_id = runtime.run_job_now(job.id, reason="manual")
        assert execution_id is not None
        execution = _wait_for_terminal_execution(execution_id)
        async def run() -> None:
            body = str(await render_runs(app))
            assert "runs-page-source" in body
            assert "Running job executions" in body
            assert "Upcoming jobs" in body
            assert "Completed job executions" in body
            assert f"/job/{job.id}/execution/{execution.get_id()}/logs" in body
            assert "Succeeded" in body
            assert "Run now" in body
        asyncio.run(run())
    finally:
        runtime.shutdown()
 def test_render_execution_logs_handles_missing_execution_and_missing_log_file(
    monkeypatch, tmp_path: Path
 ) -> None:
    db_path = tmp_path / "log-errors.db"
    log_dir = tmp_path / "out" / "logs"
    monkeypatch.setenv("REPUBLISHER_DB_PATH", str(db_path))
    app = create_app()
    app.config["REPUB_LOG_DIR"] = log_dir
    source = create_source(
        name="Log source",
        slug="log-source",
        source_type="feed",
        notes="",
        spider_arguments="",
        enabled=False,
        cron_minute="*/5",
        cron_hour="*",
        cron_day_of_month="*",
        cron_day_of_week="*",
        cron_month="*",
        feed_url="https://example.com/log-source.xml",
    )
    job = Job.get(Job.source == source)
    execution = JobExecution.create(
        job=job,
        running_status=JobExecutionStatus.FAILED,
    )
    async def run() -> None:
        missing_execution = str(
            await render_execution_logs(app, job_id=job.id, execution_id=9999)
        )
        missing_log = str(
            await render_execution_logs(app, job_id=job.id, execution_id=execution.id)
        )
        assert "Execution log unavailable" in missing_execution
        assert "Execution does not exist." in missing_execution
        assert "Execution log unavailable" in missing_log
        assert "Log file has not been created yet." in missing_log
    asyncio.run(run())
 def test_delete_job_action_removes_source_job_and_execution_history(
    monkeypatch, tmp_path: Path
 ) -> None:
    db_path = tmp_path / "delete-job.db"
    monkeypatch.setenv("REPUBLISHER_DB_PATH", str(db_path))
    async def run() -> None:
        app = create_app()
        client = app.test_client()
        source = create_source(
            name="Delete source",
            slug="delete-source",
            source_type="feed",
            notes="",
            spider_arguments="",
            enabled=True,
            cron_minute="*/30",
            cron_hour="*",
            cron_day_of_month="*",
            cron_day_of_week="*",
            cron_month="*",
            feed_url="https://example.com/delete.xml",
        )
        job = Job.get(Job.source == source)
        execution = JobExecution.create(
            job=job,
            running_status=JobExecutionStatus.SUCCEEDED,
        )
        response = await client.post(f"/actions/jobs/{job.id}/delete")
        assert response.status_code == 204
        assert Source.get_or_none(Source.slug == "delete-source") is None
        assert Job.get_or_none(id=job.id) is None
        assert JobExecution.get_or_none(id=int(execution.get_id())) is None
    asyncio.run(run())
 def _wait_for_running_execution(
    execution_id: int, *, timeout_seconds: float = 2.0
 ) -> JobExecution:
    deadline = time.monotonic() + timeout_seconds
    while time.monotonic() < deadline:
        execution = JobExecution.get_by_id(execution_id)
        if execution.running_status == JobExecutionStatus.RUNNING:
            return execution
        time.sleep(0.02)
    raise AssertionError(f"execution {execution_id} never entered RUNNING state")
 def _wait_for_terminal_execution(
    execution_id: int, *, timeout_seconds: float = 4.0
 ) -> JobExecution:
    deadline = time.monotonic() + timeout_seconds
    while time.monotonic() < deadline:
        execution = JobExecution.get_by_id(execution_id)
        if execution.running_status in {
            JobExecutionStatus.SUCCEEDED,
            JobExecutionStatus.FAILED,
            JobExecutionStatus.CANCELED,
        }:
            return execution
        time.sleep(0.02)
    raise AssertionError(f"execution {execution_id} did not finish in time")
 class _SlowFeedRequestHandler(BaseHTTPRequestHandler):
    def do_GET(self) -> None:  # noqa: N802
        # Delay the response so the crawl is still in flight while a test inspects it.
        time.sleep(2.0)
        payload = FIXTURE_FEED_PATH.read_bytes()
        self.send_response(200)
        self.send_header("Content-Type", "application/rss+xml; charset=utf-8")
        self.send_header("Content-Length", str(len(payload)))
        self.end_headers()
        self.wfile.write(payload)
    def log_message(self, format: str, *args: object) -> None:
        # Suppress the default per-request logging to stderr.
        del format, args
 class _ThreadedTCPServer(socketserver.ThreadingMixIn, socketserver.TCPServer):
    allow_reuse_address = True
 class _slow_feed_server:
    def __enter__(self) -> str:
        # Port 0 lets the OS pick a free port; the actual port is read back below.
        self._server = _ThreadedTCPServer(("127.0.0.1", 0), _SlowFeedRequestHandler)
        self._thread = threading.Thread(
            target=self._server.serve_forever,
            kwargs={"poll_interval": 0.01},
            daemon=True,
        )
        self._thread.start()
        host = str(self._server.server_address[0])
        port = int(self._server.server_address[1])
        return f"http://{host}:{port}/slow-feed.rss"
    def __exit__(self, exc_type, exc, tb) -> None:
        del exc_type, exc, tb
        self._server.shutdown()
        self._server.server_close()
        self._thread.join(timeout=1)
--- a/tests/test_web.py
+++ b/tests/test_web.py
@ -0,0 +1,924 @@
 from __future__ import annotations
 import asyncio
 import os
 from datetime import UTC, datetime, timedelta
 from pathlib import Path
 from typing import Any, cast
 from repub.components import status_badge
 from repub.datastar import RefreshBroker, render_sse_event, render_stream
 from repub.jobs import load_dashboard_view
 from repub.model import (
    Job,
    JobExecution,
    JobExecutionStatus,
    Source,
    SourceFeed,
    SourcePangea,
    create_source,
 )
 from repub.pages.runs import runs_page
 from repub.web import (
    create_app,
    get_refresh_broker,
    render_create_source,
    render_dashboard,
    render_edit_source,
    render_execution_logs,
    render_runs,
    render_sources,
 )
 def test_status_badge_uses_green_done_tone() -> None:
    badge = str(status_badge(label="Succeeded", tone="done"))
    assert "bg-emerald-100 text-emerald-800" in badge
    assert "Succeeded" in badge
 def test_runs_page_renders_completed_execution_end_time_as_relative_hoverable_time() -> (
    None
 ):
    ended_at = "2026-01-15T10:00:00+00:00"
    body = str(
        runs_page(
            completed_executions=(
                {
                    "source": "Completed source",
                    "slug": "completed-source",
                    "job_id": 7,
                    "execution_id": 42,
                    "ended_at": "2 hours ago",
                    "ended_at_iso": ended_at,
                    "status": "Succeeded",
                    "status_tone": "done",
                    "stats": "1 requests • 1 items • 1 bytes",
                    "summary": "Worker exited successfully",
                    "log_href": "/job/7/execution/42/logs",
                },
            )
        )
    )
    assert "data-ended-at" in body
    assert f'data-ended-at="{ended_at}"' in body
    assert f'datetime="{ended_at}"' in body
    assert f'title="{ended_at}"' in body
    assert ">2 hours ago<" in body
 def test_root_get_serves_datastar_shim() -> None:
    async def run() -> None:
        client = create_app().test_client()
        response = await client.get("/")
        body = await response.get_data(as_text=True)
        assert response.status_code == 200
        assert response.headers["ETag"]
        assert body.startswith("<!doctype html>")
        assert (
            '<script id="js" defer type="module" src="/static/datastar@1.0.0-RC.8.js"></script>'
            in body
        )
        assert 'data-signals:tabid="self.crypto.randomUUID().substring(0,8)"' in body
        assert 'data-init="@post(window.location.pathname +' in body
        assert "retryMaxCount: Infinity" in body
        assert "data-on:online__window=" in body
        assert '<main id="morph"' in body
        assert 'href="/sources"' in body
        assert 'href="/runs"' in body
        assert "Connecting" in body
    asyncio.run(run())
 def test_create_app_bootstraps_default_database_path(
    monkeypatch, tmp_path: Path
 ) -> None:
    monkeypatch.chdir(tmp_path)
    app = create_app()
    assert Path(app.config["REPUB_DB_PATH"]) == tmp_path / "republisher.db"
    assert (tmp_path / "republisher.db").exists()
 def test_root_get_honors_if_none_match() -> None:
    async def run() -> None:
        client = create_app().test_client()
        initial = await client.get("/")
        etag = initial.headers["ETag"]
        response = await client.get("/", headers={"If-None-Match": etag})
        assert response.status_code == 304
        assert response.headers["ETag"] == etag
    asyncio.run(run())
 def test_dashboard_post_serves_morph_component() -> None:
    async def run() -> None:
        client = create_app().test_client()
        async with client.request("/?u=shim", method="POST") as connection:
            await connection.send_complete()
            chunk = await asyncio.wait_for(connection.receive(), timeout=1)
            raw_connection = cast(Any, connection)
            assert raw_connection.status_code == 200
            assert raw_connection.headers["Content-Type"] == "text/event-stream"
            assert b"event: datastar-patch-elements" in chunk
            assert b"id: " in chunk
            assert b'<main id="morph"' in chunk
            assert b"Operational snapshot" in chunk
            assert b"Running executions" in chunk
            await connection.disconnect()
    asyncio.run(run())
 def test_render_sse_event_skips_unchanged_view() -> None:
    async def run() -> None:
        async def render() -> str:
            return '<main id="morph">same</main>'
        event_id, event = await render_sse_event(render)
        # Passing the previous event id back should suppress an identical render.
        repeated_id, repeated_event = await render_sse_event(
            render, last_event_id=event_id
        )
        assert repeated_id == event_id
        assert event is not None
        assert repeated_event is None
    asyncio.run(run())
 def test_app_refresh_broker_publishes_events() -> None:
    async def run() -> None:
        app = create_app()
        broker = get_refresh_broker(app)
        queue = broker.subscribe()
        broker.publish()
        event = await asyncio.wait_for(queue.get(), timeout=1)
        assert event == "refresh-event"
        broker.unsubscribe(queue)
    asyncio.run(run())
 def test_render_stream_yields_on_connect_and_refresh() -> None:
    async def run() -> None:
        queue = RefreshBroker().subscribe()
        renders = 0
        async def render() -> str:
            nonlocal renders
            renders += 1
            return f'<main id="morph">{renders}</main>'
        stream = render_stream(queue, render)
        first = await anext(stream)
        await queue.put("refresh-event")
        second = await anext(stream)
        await stream.aclose()
        assert "1</main>" in first
        assert "2</main>" in second
    asyncio.run(run())
 def test_render_dashboard_shows_dashboard_information_architecture(
    monkeypatch, tmp_path: Path
 ) -> None:
    db_path = tmp_path / "dashboard-render.db"
    monkeypatch.setenv("REPUBLISHER_DB_PATH", str(db_path))
    async def run() -> None:
        app = create_app()
        body = str(await render_dashboard(app))
        assert "Operational snapshot" in body
        assert "Running executions" in body
        assert "Published feeds" in body
        assert 'href="/sources"' in body
        assert 'href="/runs"' in body
        assert "Create source" in body
    asyncio.run(run())
 def test_render_dashboard_shows_empty_state_rows(monkeypatch, tmp_path: Path) -> None:
    db_path = tmp_path / "dashboard-empty.db"
    monkeypatch.setenv("REPUBLISHER_DB_PATH", str(db_path))
    async def run() -> None:
        app = create_app()
        body = str(await render_dashboard(app))
        assert "No job executions are running." in body
        assert "No feeds have been published yet." in body
    asyncio.run(run())
 def test_load_dashboard_view_measures_log_artifact_path(
    monkeypatch, tmp_path: Path
 ) -> None:
    db_path = tmp_path / "dashboard-footprint.db"
    monkeypatch.setenv("REPUBLISHER_DB_PATH", str(db_path))
    create_app()
    out_dir = tmp_path / "out"
    log_dir = out_dir / "logs"
    cache_dir = out_dir / "httpcache"
    log_dir.mkdir(parents=True)
    cache_dir.mkdir(parents=True)
    (log_dir / "run.log").write_bytes(b"x" * 1024)
    (cache_dir / "cache.bin").write_bytes(b"y" * 2048)
    snapshot = load_dashboard_view(log_dir=log_dir)["snapshot"]
    assert cast(dict[str, str], snapshot)["artifact_footprint"] == "3.0 KB"
 def test_render_dashboard_describes_log_artifact_footprint(
    monkeypatch, tmp_path: Path
 ) -> None:
    db_path = tmp_path / "dashboard-footprint-copy.db"
    monkeypatch.setenv("REPUBLISHER_DB_PATH", str(db_path))
    async def run() -> None:
        app = create_app()
        body = str(await render_dashboard(app))
        assert "Current artifact size under the output path." in body
    asyncio.run(run())
 def test_load_dashboard_view_lists_source_feed_artifacts(
    monkeypatch, tmp_path: Path
 ) -> None:
    db_path = tmp_path / "dashboard-feeds.db"
    monkeypatch.setenv("REPUBLISHER_DB_PATH", str(db_path))
    app = create_app()
    out_dir = tmp_path / "out"
    log_dir = out_dir / "logs"
    app.config["REPUB_LOG_DIR"] = log_dir
    log_dir.mkdir(parents=True)
    create_source(
        name="Available source",
        slug="available-source",
        source_type="feed",
        notes="",
        spider_arguments="",
        enabled=False,
        cron_minute="*/5",
        cron_hour="*",
        cron_day_of_month="*",
        cron_day_of_week="*",
        cron_month="*",
        feed_url="https://example.com/available.xml",
    )
    create_source(
        name="Missing source",
        slug="missing-source",
        source_type="feed",
        notes="",
        spider_arguments="",
        enabled=False,
        cron_minute="*/5",
        cron_hour="*",
        cron_day_of_month="*",
        cron_day_of_week="*",
        cron_month="*",
        feed_url="https://example.com/missing.xml",
    )
    feed_dir = out_dir / "feeds" / "available-source"
    feed_dir.mkdir(parents=True)
    feed_path = feed_dir / "feed.rss"
    feed_path.write_bytes(b"x" * 1024)
    (feed_dir / "audio.mp3").write_bytes(b"y" * 2048)
    reference_time = datetime(2026, 3, 30, 12, 30, tzinfo=UTC)
    updated_at = reference_time - timedelta(minutes=32)
    updated_at_epoch = updated_at.timestamp()
    # Backdate the feed file's atime and mtime to 32 minutes before reference_time.
    os.utime(feed_path, (updated_at_epoch, updated_at_epoch))
    source_feeds = cast(
        tuple[dict[str, object], ...],
        load_dashboard_view(log_dir=log_dir, now=reference_time)["source_feeds"],
    )
    assert source_feeds == (
        {
            "source": "Available source",
            "slug": "available-source",
            "feed_href": "/feeds/available-source/feed.rss",
            "feed_status_label": "Available",
            "feed_status_tone": "done",
            "feed_exists": True,
            "last_updated": "32 minutes ago",
            "last_updated_iso": updated_at.isoformat(),
            "artifact_footprint": "3.0 KB",
        },
        {
            "source": "Missing source",
            "slug": "missing-source",
            "feed_href": "/feeds/missing-source/feed.rss",
            "feed_status_label": "Missing",
            "feed_status_tone": "failed",
            "feed_exists": False,
            "last_updated": "Never published",
            "last_updated_iso": None,
            "artifact_footprint": "0 B",
        },
    )
 def test_render_dashboard_shows_source_feed_links_and_statuses(
    monkeypatch, tmp_path: Path
 ) -> None:
    db_path = tmp_path / "dashboard-feed-links.db"
    monkeypatch.setenv("REPUBLISHER_DB_PATH", str(db_path))
    app = create_app()
    app.config["REPUB_LOG_DIR"] = tmp_path / "out" / "logs"
    create_source(
        name="Published source",
        slug="published-source",
        source_type="feed",
        notes="",
        spider_arguments="",
        enabled=False,
        cron_minute="*/5",
        cron_hour="*",
        cron_day_of_month="*",
        cron_day_of_week="*",
        cron_month="*",
        feed_url="https://example.com/published.xml",
    )
    create_source(
        name="Missing source",
        slug="missing-source",
        source_type="feed",
        notes="",
        spider_arguments="",
        enabled=False,
        cron_minute="*/5",
        cron_hour="*",
        cron_day_of_month="*",
        cron_day_of_week="*",
        cron_month="*",
        feed_url="https://example.com/missing.xml",
    )
    async def run() -> None:
        published_feed = tmp_path / "out" / "feeds" / "published-source" / "feed.rss"
        published_feed.parent.mkdir(parents=True)
        published_feed.write_text("<rss/>\n", encoding="utf-8")
        body = str(await render_dashboard(app))
        assert "Published feeds" in body
        assert 'href="/feeds/published-source/feed.rss"' in body
        assert 'href="/feeds/missing-source/feed.rss"' in body
        assert "Available" in body
        assert "Missing" in body
        assert "Never published" in body
    asyncio.run(run())
 def test_render_sources_shows_table_and_create_link() -> None:
    async def run() -> None:
        body = str(await render_sources())
        assert ">Sources<" in body
        assert 'href="/sources/create"' in body
        assert "No sources yet." in body
        assert "guardian-feed" not in body
        assert "podcast-audio" not in body
    asyncio.run(run())
 def test_render_create_source_shows_dedicated_form_page() -> None:
    async def run() -> None:
        body = str(await render_create_source())
        assert ">Create source<" in body
        assert "Source and job setup" in body
        assert "data-signals__ifmissing" in body
        assert "/actions/sources/create" in body
        assert 'data-show="$sourceType === &#39;feed&#39;"' in body
        assert 'data-show="$sourceType === &#39;pangea&#39;"' in body
        assert "jobEnabled" in body
        assert "onlyNewest" in body
        assert "includeAuthors" in body
        assert "excludeMedia" in body
        assert "includeContent" in body
        assert "TEXT_ONLY" in body
        assert "breakingnews" in body
        assert "Pangea domain" in body
        assert "Feed URL" in body
        assert "Cron schedule" in body
        assert "Initial job state" in body
        assert "Pangea mobile articles" not in body
        assert "pangea-mobile" not in body
        assert "guardianproject.info" not in body
        assert (
            "Primary Pangea mobile article mirror for the operator landing page."
            not in body
        )
        assert "language=en,download_media=true" not in body
        assert "language=en\ndownload_media=true" in body
        assert 'value="articles"' in body
        assert 'value="10"' in body
        assert 'value="3"' in body
        assert 'value="*/30"' in body
        assert 'value="*"' in body
    asyncio.run(run())
 def test_render_edit_source_shows_existing_values(monkeypatch, tmp_path: Path) -> None:
    db_path = tmp_path / "edit-page.db"
    monkeypatch.setenv("REPUBLISHER_DB_PATH", str(db_path))
    create_app()
    create_source(
        name="Kenya health desk",
        slug="kenya-health",
        source_type="pangea",
        notes="Regional health alerts.",
        spider_arguments="language=en\ndownload_media=true",
        enabled=True,
        cron_minute="0",
        cron_hour="*/6",
        cron_day_of_month="*",
        cron_day_of_week="*",
        cron_month="*",
        pangea_domain="example.org",
        pangea_category="Health",
        content_type="breakingnews",
        only_newest=True,
        max_articles=12,
        oldest_article=5,
        include_authors=True,
        exclude_media=False,
        include_content=True,
        content_format="MOBILE_3",
    )
    async def run() -> None:
        body = str(await render_edit_source("kenya-health"))
        assert "Edit source" in body
        assert "/actions/sources/kenya-health/edit" in body
        assert "Kenya health desk" in body
        assert "kenya-health" in body
        assert 'id="source-slug"' in body
        assert (
            'id="source-slug" name="source-slug" type="text" value="kenya-health"'
            in body
        )
        assert " disabled " in body
        assert "cursor-not-allowed bg-slate-100 text-slate-500" in body
        assert "example.org" in body
        assert "Health" in body
        assert "language=en\ndownload_media=true" in body
    asyncio.run(run())
 def test_create_source_action_creates_pangea_source_and_job_in_database(
    monkeypatch, tmp_path: Path
 ) -> None:
    db_path = tmp_path / "sources.db"
    monkeypatch.setenv("REPUBLISHER_DB_PATH", str(db_path))
    async def run() -> None:
        app = create_app()
        client = app.test_client()
        response = await client.post(
            "/actions/sources/create",
            headers={"Datastar-Request": "true"},
            json={
                "sourceName": "Kenya health desk",
                "sourceSlug": "kenya-health",
                "sourceType": "pangea",
                "pangeaDomain": "example.org",
                "pangeaCategory": "Health",
                "contentFormat": "MOBILE_3",
                "contentType": "breakingnews",
                "maxArticles": "12",
                "oldestArticle": "5",
                "sourceNotes": "Regional health alerts.",
                "spiderArguments": "language=en\ndownload_media=true",
                "cronMinute": "0",
                "cronHour": "*/6",
                "cronDayOfMonth": "*",
                "cronDayOfWeek": "*",
                "cronMonth": "*",
                "jobEnabled": True,
                "onlyNewest": True,
                "includeAuthors": True,
                "excludeMedia": False,
            },
        )
        body = await response.get_data(as_text=True)
        assert response.status_code == 200
        assert "window.location = '/sources'" in body
        source = Source.get(Source.slug == "kenya-health")
        pangea = SourcePangea.get(SourcePangea.source == source)
        job = Job.get(Job.source == source)
        rendered_sources = str(await render_sources(app))
        assert source.name == "Kenya health desk"
        assert source.source_type == "pangea"
        assert pangea.content_type == "breakingnews"
        assert pangea.include_content is True
        assert job.enabled is True
        assert job.spider_arguments == "language=en\ndownload_media=true"
        assert job.cron_hour == "*/6"
        assert "kenya-health" in rendered_sources
        assert "example.org / Health" in rendered_sources
        assert "Enabled" in rendered_sources
    asyncio.run(run())
 def test_create_source_action_creates_feed_source_and_job_in_database(
    monkeypatch, tmp_path: Path
 ) -> None:
    db_path = tmp_path / "feed-sources.db"
    monkeypatch.setenv("REPUBLISHER_DB_PATH", str(db_path))
    async def run() -> None:
        app = create_app()
        client = app.test_client()
        response = await client.post(
            "/actions/sources/create",
            headers={"Datastar-Request": "true"},
            json={
                "sourceName": "NASA feed",
                "sourceSlug": "nasa-feed",
                "sourceType": "feed",
                "feedUrl": "https://www.nasa.gov/rss/dyn/breaking_news.rss",
                "sourceNotes": "Primary NASA mirror.",
                "spiderArguments": "",
                "cronMinute": "30",
                "cronHour": "*",
                "cronDayOfMonth": "*",
                "cronDayOfWeek": "*",
                "cronMonth": "*",
                "jobEnabled": False,
            },
        )
        body = await response.get_data(as_text=True)
        assert response.status_code == 200
        assert "window.location = '/sources'" in body
        source = Source.get(Source.slug == "nasa-feed")
        feed = SourceFeed.get(SourceFeed.source == source)
        job = Job.get(Job.source == source)
        rendered_sources = str(await render_sources(app))
        assert source.source_type == "feed"
        assert feed.feed_url == "https://www.nasa.gov/rss/dyn/breaking_news.rss"
        assert job.enabled is False
        assert "nasa-feed" in rendered_sources
        assert "https://www.nasa.gov/rss/dyn/breaking_news.rss" in rendered_sources
        assert "Disabled" in rendered_sources
    asyncio.run(run())
 def test_edit_source_action_updates_existing_source_and_job_in_database(
    monkeypatch, tmp_path: Path
 ) -> None:
    db_path = tmp_path / "edit-source.db"
    monkeypatch.setenv("REPUBLISHER_DB_PATH", str(db_path))
    create_app()
    create_source(
        name="Kenya health desk",
        slug="kenya-health",
        source_type="pangea",
        notes="Regional health alerts.",
        spider_arguments="language=en\ndownload_media=true",
        enabled=True,
        cron_minute="0",
        cron_hour="*/6",
        cron_day_of_month="*",
        cron_day_of_week="*",
        cron_month="*",
        pangea_domain="example.org",
        pangea_category="Health",
        content_type="breakingnews",
        only_newest=True,
        max_articles=12,
        oldest_article=5,
        include_authors=True,
        exclude_media=False,
        include_content=True,
        content_format="MOBILE_3",
    )
    async def run() -> None:
        app = create_app()
        client = app.test_client()
        response = await client.post(
            "/actions/sources/kenya-health/edit",
            headers={"Datastar-Request": "true"},
            json={
                "sourceName": "Kenya health desk nightly",
                "sourceSlug": "kenya-health",
                "sourceType": "pangea",
                "pangeaDomain": "example.org",
                "pangeaCategory": "Nightly",
                "contentFormat": "TEXT_ONLY",
                "contentType": "articles",
                "maxArticles": "25",
                "oldestArticle": "7",
                "sourceNotes": "Updated nightly run.",
                "spiderArguments": "language=sw\ninclude_audio=false",
                "cronMinute": "15",
                "cronHour": "2",
                "cronDayOfMonth": "*",
                "cronDayOfWeek": "*",
                "cronMonth": "*",
                "jobEnabled": False,
                "onlyNewest": False,
                "includeAuthors": False,
                "excludeMedia": True,
                "includeContent": True,
            },
        )
        body = await response.get_data(as_text=True)
        assert response.status_code == 200
        assert "window.location = '/sources'" in body
        source = Source.get(Source.slug == "kenya-health")
        pangea = SourcePangea.get(SourcePangea.source == source)
        job = Job.get(Job.source == source)
        rendered_sources = str(await render_sources(app))
        assert source.name == "Kenya health desk nightly"
        assert source.notes == "Updated nightly run."
        assert pangea.category_name == "Nightly"
        assert pangea.content_format == "TEXT_ONLY"
        assert pangea.max_articles == 25
        assert pangea.include_authors is False
        assert pangea.exclude_media is True
        assert job.enabled is False
        assert job.spider_arguments == "language=sw\ninclude_audio=false"
        assert job.cron_hour == "2"
        assert "Kenya health desk nightly" in rendered_sources
        assert "example.org / Nightly" in rendered_sources
        assert "Disabled" in rendered_sources
    asyncio.run(run())
 def test_edit_source_action_rejects_slug_changes(monkeypatch, tmp_path: Path) -> None:
    db_path = tmp_path / "edit-invalid.db"
    monkeypatch.setenv("REPUBLISHER_DB_PATH", str(db_path))
    create_app()
    create_source(
        name="Kenya health desk",
        slug="kenya-health",
        source_type="pangea",
        notes="Regional health alerts.",
        spider_arguments="language=en\ndownload_media=true",
        enabled=True,
        cron_minute="0",
        cron_hour="*/6",
        cron_day_of_month="*",
        cron_day_of_week="*",
        cron_month="*",
        pangea_domain="example.org",
        pangea_category="Health",
        content_type="breakingnews",
        only_newest=True,
        max_articles=12,
        oldest_article=5,
        include_authors=True,
        exclude_media=False,
        include_content=True,
        content_format="MOBILE_3",
    )
    async def run() -> None:
        app = create_app()
        client = app.test_client()
        response = await client.post(
            "/actions/sources/kenya-health/edit",
            headers={"Datastar-Request": "true"},
            json={
                "sourceName": "Kenya health desk",
                "sourceSlug": "kenya-health-renamed",
                "sourceType": "pangea",
                "pangeaDomain": "example.org",
                "pangeaCategory": "Health",
                "contentFormat": "MOBILE_3",
                "contentType": "breakingnews",
                "maxArticles": "12",
                "oldestArticle": "5",
                "sourceNotes": "Regional health alerts.",
                "spiderArguments": "language=en\ndownload_media=true",
                "cronMinute": "0",
                "cronHour": "*/6",
                "cronDayOfMonth": "*",
                "cronDayOfWeek": "*",
                "cronMonth": "*",
                "jobEnabled": True,
                "onlyNewest": True,
                "includeAuthors": True,
                "excludeMedia": False,
                "includeContent": True,
            },
        )
        body = await response.get_data(as_text=True)
        assert response.status_code == 200
        assert "Slug is immutable." in body
        assert Source.get(Source.slug == "kenya-health").name == "Kenya health desk"
        assert Source.select().where(Source.slug == "kenya-health-renamed").count() == 0
    asyncio.run(run())
 def test_create_source_action_validates_duplicate_slug_and_pangea_type(
    monkeypatch, tmp_path: Path
 ) -> None:
    db_path = tmp_path / "duplicate.db"
    monkeypatch.setenv("REPUBLISHER_DB_PATH", str(db_path))
    async def run() -> None:
        app = create_app()
        Source.create(
            name="Guardian feed mirror",
            slug="guardian-feed",
            source_type="feed",
        )
        client = app.test_client()
        response = await client.post(
            "/actions/sources/create",
            headers={"Datastar-Request": "true"},
            json={
                "sourceName": "Duplicate guardian",
                "sourceSlug": "guardian-feed",
                "sourceType": "pangea",
                "pangeaDomain": "example.org",
                "pangeaCategory": "News",
                "contentFormat": "WEB",
                "contentType": "not-a-real-type",
                "maxArticles": "ten",
                "oldestArticle": "3",
                "cronMinute": "0",
                "cronHour": "*",
                "cronDayOfMonth": "*",
                "cronDayOfWeek": "*",
                "cronMonth": "*",
                "jobEnabled": True,
            },
        )
        body = await response.get_data(as_text=True)
        assert response.status_code == 200
        assert "Slug must be unique." in body
        assert "Content format is invalid." in body
        assert "Content type is invalid." in body
        assert "Max articles must be an integer." in body
        assert Source.select().where(Source.name == "Duplicate guardian").count() == 0
    asyncio.run(run())
 def test_render_runs_shows_running_upcoming_and_completed_tables(
    monkeypatch, tmp_path: Path
 ) -> None:
    db_path = tmp_path / "runs-render.db"
    monkeypatch.setenv("REPUBLISHER_DB_PATH", str(db_path))
    async def run() -> None:
        app = create_app()
        source = create_source(
            name="Runs render source",
            slug="runs-render-source",
            source_type="feed",
            notes="",
            spider_arguments="",
            enabled=True,
            cron_minute="*/30",
            cron_hour="*",
            cron_day_of_month="*",
            cron_day_of_week="*",
            cron_month="*",
            feed_url="https://example.com/runs.xml",
        )
        job = Job.get(Job.source == source)
        execution = JobExecution.create(
            job=job,
            running_status=JobExecutionStatus.SUCCEEDED,
        )
        body = str(await render_runs(app))
        assert "Running job executions" in body
        assert "Upcoming jobs" in body
        assert "Completed job executions" in body
        assert "runs-render-source" in body
        assert f"/job/{job.id}/execution/{execution.get_id()}/logs" in body
        assert "data-next-run-at" in body
        assert "in " in body
        assert "Already running" not in body
    asyncio.run(run())
 def test_render_runs_shows_empty_state_rows(monkeypatch, tmp_path: Path) -> None:
    db_path = tmp_path / "runs-empty.db"
    monkeypatch.setenv("REPUBLISHER_DB_PATH", str(db_path))
    async def run() -> None:
        app = create_app()
        body = str(await render_runs(app))
        assert body.count("No job executions are running.") == 1
        assert "No jobs are scheduled." in body
        assert "No job executions have completed yet." in body
    asyncio.run(run())
 def test_render_execution_logs_uses_app_route(monkeypatch, tmp_path: Path) -> None:
    db_path = tmp_path / "logs-render.db"
    monkeypatch.setenv("REPUBLISHER_DB_PATH", str(db_path))
    async def run() -> None:
        log_dir = tmp_path / "out" / "logs"
        app = create_app()
        app.config["REPUB_LOG_DIR"] = log_dir
        source = create_source(
            name="Log render source",
            slug="log-render-source",
            source_type="feed",
            notes="",
            spider_arguments="",
            enabled=False,
            cron_minute="*/30",
            cron_hour="*",
            cron_day_of_month="*",
            cron_day_of_week="*",
            cron_month="*",
            feed_url="https://example.com/logs.xml",
        )
        job = Job.get(Job.source == source)
        execution = JobExecution.create(
            job=job,
            running_status=JobExecutionStatus.RUNNING,
        )
        log_path = log_dir / f"job-{job.id}-execution-{execution.get_id()}.log"
        log_path.parent.mkdir(parents=True, exist_ok=True)
        log_path.write_text(
            "\n".join(
                (
                    "scheduler: run_now requested",
                    "worker: starting simulated crawl",
                    "worker: waiting for more log lines ...",
                )
            ),
            encoding="utf-8",
        )
        body = str(
            await render_execution_logs(
                app, job_id=job.id, execution_id=int(execution.get_id())
            )
        )
        assert f"Job {job.id} / execution {execution.get_id()}" in body
        assert f"/job/{job.id}/execution/{execution.get_id()}/logs" in body
        assert "waiting for more log lines" in body
    asyncio.run(run())
--- a/uv.lock
+++ b/uv.lock
@@ -504,6 +504,18 @@ wheels = [
    { url = "https://files.pythonhosted.org/packages/07/c6/80c95b1b2b94682a72cbdbfb85b81ae2daffa4291fbfa1b1464502ede10d/hpack-4.1.0-py3-none-any.whl", hash = "sha256:157ac792668d995c657d93111f46b4535ed114f0c9c8d672271bbec7eae1b496", size = 34357, upload-time = "2025-01-22T21:44:56.92Z" },
 ]
 [[package]]
 name = "htpy"
 version = "25.12.0"
 source = { registry = "https://pypi.org/simple" }
 dependencies = [
    { name = "markupsafe" },
 ]
 sdist = { url = "https://files.pythonhosted.org/packages/b6/23/e00bbc355e70444d16c90a0f1fdce108c67379fe65e9312cd026c13db976/htpy-25.12.0.tar.gz", hash = "sha256:7d3f4aaa10b35c5e46dfa804df1f3f18772caf8efee6e6a035b5dee89a5d6af8", size = 291259, upload-time = "2025-12-01T20:35:01.666Z" }
 wheels = [
    { url = "https://files.pythonhosted.org/packages/61/f1/a2f2caf14b03e7fab4801ac6018a4ac996de3e82a573e7aa21f3cb11a7cc/htpy-25.12.0-py3-none-any.whl", hash = "sha256:642e69278d6f8f4643acc2d2d13c21682ceb5fb4860ecbbce042f171577fff54", size = 21141, upload-time = "2025-12-01T20:35:00.13Z" },
 ]
 [[package]]
 name = "hypercorn"
 version = "0.18.0"
@@ -1077,6 +1089,7 @@ dependencies = [
    { name = "feedparser" },
    { name = "ffmpeg-python" },
    { name = "greenlet" },
    { name = "htpy" },
    { name = "lxml" },
    { name = "peewee" },
    { name = "pillow" },
@@ -1108,6 +1121,7 @@ requires-dist = [
    { name = "feedparser", specifier = ">=6.0.11,<7.0.0" },
    { name = "ffmpeg-python", specifier = ">=0.2.0,<0.3.0" },
    { name = "greenlet", specifier = ">=3.2.4,<4.0.0" },
    { name = "htpy", specifier = ">=25.12.0,<26.0.0" },
    { name = "lxml", specifier = ">=5.2.1,<6.0.0" },
    { name = "peewee", specifier = ">=3.19.0,<4.0.0" },
    { name = "pillow", specifier = ">=10.3.0,<11.0.0" },
Author	SHA1	Message	Date
Abel Luck	31e1da937f	add dev-mode	2026-03-30 15:36:12 +02:00
Abel Luck	0803617e62	add empty table placeholders	2026-03-30 15:28:56 +02:00
Abel Luck	8716579508	humanize sizes	2026-03-30 15:25:28 +02:00
Abel Luck	947ef8e833	remove most subtitles	2026-03-30 15:25:10 +02:00
Abel Luck	d8f2e03d36	be consistent with env var names	2026-03-30 15:23:34 +02:00
Abel Luck	6fd3b598ab	output to out/feeds/*	2026-03-30 15:21:39 +02:00
Abel Luck	beac981047	update readme	2026-03-30 15:20:27 +02:00
Abel Luck	36cf98a91c	fix output paths	2026-03-30 15:10:47 +02:00
Abel Luck	8af28c2f68	implement scrapy + pygea job runner	2026-03-30 15:04:41 +02:00
Abel Luck	916968c579	reconcile stale execs	2026-03-30 14:18:55 +02:00
Abel Luck	90674e6515	tweak sidebar	2026-03-30 14:18:51 +02:00
Abel Luck	51728a5401	shim renders app shell	2026-03-30 14:16:15 +02:00
Abel Luck	c210168d65	tweak job runs	2026-03-30 14:14:59 +02:00
Abel Luck	2b2a3f1cc0	implement job runner and scheduler	2026-03-30 14:02:39 +02:00
Abel Luck	328a70ff9b	edit sources	2026-03-30 13:49:00 +02:00
Abel Luck	847aeae772	db backed source creation	2026-03-30 13:37:25 +02:00
Abel Luck	b9e288a22d	add sqlite database	2026-03-30 13:31:06 +02:00
Abel Luck	06066c2394	create sources in memory	2026-03-30 13:23:36 +02:00
Abel Luck	9e826fcee8	separeate pages	2026-03-30 13:11:37 +02:00
Abel Luck	3fc999a69b	add a datastar action	2026-03-30 12:48:32 +02:00
Abel Luck	33dbb143fd	add datastar SSE rendering	2026-03-30 12:34:38 +02:00
Abel Luck	2accb26546	add datastar and render shim	2026-03-30 12:27:45 +02:00
Abel Luck	9ce576e7e8	with htpy and css	2026-03-30 12:13:04 +02:00