republisher/AGENTS.md

# republisher-redux

See @README.md

## Overview

- `republisher-redux` is a Scrapy-based tool that mirrors RSS and Atom feeds for offline use.
- Python packaging uses `pyproject.toml` with `setuptools`.
- Development uses `uv`.
- Nix development and packaging use `flake.nix`.
- Formatting is managed through `treefmt-nix`, exposed via `nix fmt`.

- Prefer an immutable, functional programming style:
    - functions that operate on data over classes that encapsulate state
- Think carefully and implement the most concise solution that changes as little code as possible.


## HTML/Datastar Rules

Very important rules for datastar usage.

Views are pure functions: data in -> HTML out.

- We only use full-page morph mode. No diffing.
    Why large/fat/main morphs (aka immediate mode)?

    By only using data: mode morph and always targeting the main element of the document the API can be massively simplified. This avoids having the explosion of endpoints you get with HTMX and makes reasoning about your app much simpler.

- We only have a single render function per page
    By having a single render function per page you can simplify the reasoning about your app to `view = f(state)`. You can then reason about your pushed updates as a continuous signal rather than a discrete event stream. The benefit of this is that you don't have to handle missed events, disconnects, and reconnects. When the state changes on the server you push down the latest view, not the delta between views. On the client, idiomorph can translate that into fine-grained DOM updates.


- Any database change -> re-render all connected users with a 200ms throttle
    When your events are not homogeneous, every event matters: you cannot afford to miss one, so you cannot throttle your events without losing data.

    But, wait! Won't that mean every change will cause all users to re-render? Yes, but at a maximum rate determined by the throttle. This might sound scary at first, but in practice:

    - The more shared views the users have, the more likely most of the connected users will have to re-render when a change happens.
    - The more events that are happening, the more likely most users will have to re-render.

    This means you actually end up doing more work with a non-homogeneous event system under heavy load than with this simple homogeneous event system that's throttled (especially if there's any sort of common/shared view between users).

- Signals are only for ephemeral client-side state
    Signals should only be used for ephemeral client-side state. Things like: the current value of a text input, whether a popover is visible, the current CSRF token, input validation errors. Signals can be controlled on the client via expressions, or from the backend via patch-signals.
- Signals in elements should be declared `__ifmissing`
    Because signals are only used to represent ephemeral client state, they can only be initialised by elements, and they can only be changed via expressions on the client or from the server via patch-signals in an action. Signals in elements should be declared `__ifmissing` unless they are "view only".

- View-only signals are signals that can only be changed by the server. These should not be declared `__ifmissing`; instead, make them "local" by starting their key with an `_`, which prevents the client from sending them up to the server.

- Actions should not update the view themselves directly
    Actions should not update the view via patch elements. This is because the changes they make would get overwritten the next time the render-fn pushes a new view down the updates SSE connection. However, they can still be used to update signals, as those won't be changed by an elements patch. This allows you to do things like validation on the server.

- Stateless views
    The only way for actions to affect the view returned by the render-fn running in a connection is via the database. This ensures CQRS. It means there is no connection state that needs to be persisted or maintained (so missed events and shutdowns/deploys will not lead to lost state). Even when you are running in a single process, there is no way for an action (command) to communicate with or affect a view render (query) without going through the database.

- CQRS
    Actions modify the database and return a 204, or a 200 if they patch-signals.
    Render functions re-render when the database changes and send an update down the updates SSE connection.

- Work sharing (caching)
    Work sharing is the term I'm using for sharing renders between connected users. This can be useful when a lot of connected users share the same view, for example a leader board, game board, or presence indicator. It ensures the work (e.g. query and HTML generation) for that view is only done once regardless of the number of connected users. The simplest way to do this is to recalculate and cache values after a batch has been run.

- Use data-on:pointerdown/mousedown over data-on:click
    This is a small one but can make even the slowest of networks feel much snappier.

- No CORS
    By hosting all assets on the same origin we avoid the need for CORS. This avoids additional server round trips and helps reduce latency.

- Rendering an initial shim
    Rather than returning the whole page on initial render and having two render paths (one for the initial render and one for subsequent rendering), a shell is rendered and then populated when the page connects to the updates endpoint for that page. This has a few advantages:

    The page will only render dynamic content if the user has javascript and first party cookies enabled.

    The initial shell page can be generated and compressed once.

    The server only does more work for actual users and less work for link preview crawlers and other bots (that don't support javascript or cookies).
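
The rules above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the project's implementation: the `feed` table and the names `render_page`, `add_feed_action`, and `render_times` are hypothetical, and the throttle is modeled as a pure function over change timestamps rather than a real timer.

```python
import html
import sqlite3


def render_page(conn: sqlite3.Connection) -> str:
    """Query side: view = f(state). Rebuilds the whole <main> from the database."""
    rows = conn.execute("SELECT title FROM feed ORDER BY id").fetchall()
    items = "".join(f"<li>{html.escape(title)}</li>" for (title,) in rows)
    return f'<main id="main"><ul>{items}</ul></main>'


def add_feed_action(conn: sqlite3.Connection, title: str) -> int:
    """Command side: change the database only, never the view.

    Returns the HTTP status the action would respond with.
    """
    with conn:  # commits on success, rolls back on error
        conn.execute("INSERT INTO feed (title) VALUES (?)", (title,))
    return 204


def render_times(change_times_ms: list[int], interval_ms: int = 200) -> list[int]:
    """Times at which a throttled re-render fires for a series of changes.

    A change schedules one render interval_ms later. Changes arriving before
    that render fires are absorbed by it, because it reads the latest state.
    """
    renders: list[int] = []
    for t in sorted(change_times_ms):
        if not renders or t > renders[-1]:
            renders.append(t + interval_ms)
    return renders


# Hypothetical schema, kept in memory so nothing on disk is touched.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE feed (id INTEGER PRIMARY KEY, title TEXT NOT NULL)")
status = add_feed_action(conn, "News & Notes")
page = render_page(conn)
```

The action never touches the view; the only path from command to query is the database. With `render_times([0, 50, 100, 500])`, the three changes in the first 200ms collapse into a single render, and no data is lost because each render reads the current state.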

## Workflow

- Use Python 3.13.
- Enter the dev environment with `nix develop` if you are not already inside it.
- Sync Python dependencies with `uv sync --all-groups`.
- Run the app with `uv run repub`.
- Generate CSS with `tailwindcss -i ./repub/static/app.tailwind.css -o ./repub/static/app.css` and add `--watch` when you need live rebuilds.
- Validate a generated feed with `./scripts/validate-feed path/to/feed.rss`. This wraps the local checkout at `~/src/github.com/w3c/feedvalidator` and pages the validator output through `less` by default.

```sh
uv sync --all-groups
uv run pytest
uv run flake8 repub/ tests/
uv run pyright
./scripts/validate-feed out/feeds/mn-cuba/feed.rss
nix fmt
nix flake check
uv run repub
uv run repub crawl -c repub.toml
```

## Validation

- Run `nix fmt` after changing repo files that are covered by treefmt.
- Run `nix flake check` before declaring work complete.
- `nix flake check` is expected to build and check the formatter, devshell, package, tests, and lint derivations.

## Editing Rules

- Keep `treefmt.nix`, `flake.nix`, and `pyproject.toml` aligned.
- Prefer updating the flake-exported package and checks rather than adding ad hoc scripts.
- Put new SQLite schema objects in numbered files under `repub/sql/` such as `002_*.sql`.
- For backward-compatible column additions on existing SQLite databases, use Peewee's `playhouse.migrate` helpers instead of raw ad hoc `ALTER TABLE` logic.
- Do not commit, amend, or stage unrelated files unless explicitly asked.
- Final verification: `nix flake check` must be green before claiming the task is complete.

## Repo Notes

- The console entrypoint is `repub`.
- Runtime ffmpeg availability is provided by the flake package and devshell.
- Tests live under `tests/`.
- `prompts/` is git-ignored intentionally.
- Never search the web for this repo. If an external resource, document, or reference is needed, stop and ask the user to provide it.
- Treat the repo-root `republisher.db` as user-owned local state. Do not delete or reset it as part of routine testing or verification.
- For automated tests or isolated verification, use a separate database path via `REPUBLISHER_DB_PATH` instead of mutating or removing the repo-root database.
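
One way to follow the database isolation notes above is to build a separate environment for the process under test instead of mutating the current one. This is a sketch only: the helper name `isolated_db_env` and the file name `test.db` are assumptions, not part of the repo.

```python
import os
import tempfile
from pathlib import Path


def isolated_db_env(tmp_dir: Path) -> dict[str, str]:
    """Return a copy of the environment with REPUBLISHER_DB_PATH set to a
    throwaway database file inside tmp_dir. os.environ itself is not modified."""
    env = dict(os.environ)
    env["REPUBLISHER_DB_PATH"] = str(tmp_dir / "test.db")
    return env


with tempfile.TemporaryDirectory() as tmp:
    env = isolated_db_env(Path(tmp))
    db_path = Path(env["REPUBLISHER_DB_PATH"])
    # The env mapping could then be passed to a child process, for example:
    # subprocess.run(["uv", "run", "repub"], env=env, check=True)
```

Because the helper returns a new mapping, the repo-root `republisher.db` and the caller's own environment stay untouched.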