republisher-redux

See @README.md

Overview

  • republisher-redux is a Scrapy-based tool that mirrors RSS and Atom feeds for offline use.

  • Python packaging uses pyproject.toml with setuptools.

  • Development uses uv.

  • Nix development and packaging use flake.nix.

  • Formatting is managed through treefmt-nix, exposed via nix fmt.

  • Prefer an immutable, functional programming style:

    • functions that operate on data over classes that encapsulate state
  • Think carefully and implement the most concise solution that changes as little code as possible.

HTML/Datastar Rules

Very important rules for Datastar usage.

Views are pure functions: data in -> HTML out.
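A minimal sketch of what "data in -> HTML out" means here, written in Python since that is the project's language. FeedRow, TabState, and render_feeds_page are hypothetical names, not part of the repo. The point is that the function reads only its arguments and always returns the full main element.

```python
from dataclasses import dataclass
from html import escape


@dataclass(frozen=True)
class FeedRow:
    """Hypothetical row read from the database."""
    name: str
    item_count: int


@dataclass(frozen=True)
class TabState:
    """Hypothetical ephemeral, per-tab UI state (here: pagination)."""
    page: int = 0
    page_size: int = 20


def render_feeds_page(feeds: tuple[FeedRow, ...], tab: TabState) -> str:
    """Pure view: same (db state, tab state) in, same full <main> HTML out."""
    start = tab.page * tab.page_size
    visible = feeds[start:start + tab.page_size]
    items = "".join(
        f"<li>{escape(feed.name)} ({feed.item_count})</li>" for feed in visible
    )
    return f'<main id="main"><h1>Feeds</h1><ul>{items}</ul></main>'
```

Because nothing else is read, re-running the function after any change is always safe, which is what the morph and throttle rules rely on.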

  • We only use full-page morph mode; no diffing. Why large/fat/main morphs (aka immediate mode)?

    By only using data: mode morph and always targeting the main element of the document, the API can be massively simplified. This avoids the explosion of endpoints you get with HTMX and makes reasoning about your app much simpler.

  • We only have a single render function per page. By having a single render function per page, you can simplify the reasoning about your app to view = f(state). In immediate-mode terms, the server re-runs the whole page render against the latest state, like a game loop, rather than trying to patch the view incrementally by hand. You can then reason about your pushed updates as a continuous signal rather than a discrete event stream. The benefit is that you don't have to handle missed events, disconnects, and reconnects. When the state changes on the server, you push down the latest view, not the delta between views. On the client, idiomorph can translate that into fine-grained DOM updates.

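  • Any database change -> re-render all connected users with a 200ms throttle. When your events are not homogeneous, you can't miss events, so you cannot throttle them without losing data. Here every event means the same thing ("something changed, re-render"), so any number of them inside one throttle window can be collapsed into a single render without losing anything.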

    But wait! Won't that mean every change causes all users to re-render? Yes, but at a maximum rate determined by the throttle. This might sound scary at first, but in practice:

      The more shared views users have, the more likely it is that most connected users will have to re-render when a change happens.

      The more events that are happening, the more likely it is that most users will have to re-render anyway.

    This means you actually end up doing more work with a non-homogeneous event system under heavy load than with this simple, throttled, homogeneous event system (especially if there's any sort of common/shared view between users).

  • Signals are only for ephemeral client-side state. Use them for things like the current value of a text input, whether a popover is visible, the current CSRF token, or input validation errors. Signals can be controlled on the client via expressions, or from the backend via patch-signals.

  • Signals in elements should be declared __ifmissing. Because signals only represent ephemeral client state, they can only be initialised by elements, and they can only be changed by expressions on the client or by the server via patch-signals in an action. Declaring them __ifmissing keeps a re-render from resetting a value the client has already changed. The exception is "view only" signals.

  • View-only signals are signals that only the server can change. Do not declare these __ifmissing; instead, make them "local" by starting their key with an underscore (_). This prevents the client from sending them up to the server.

  • Actions should not update the view directly. Actions should not patch elements, because any change they make would be overwritten the next time the render-fn pushes a new view down the updates SSE connection. They can still patch signals, since those are not changed by an elements patch. This lets you do things like validation on the server.

  • Stateless views. The state passed to a render-fn should be thought of as {persistent db state, ephemeral tab state}. The database is the source of truth for durable application state. Ephemeral tab state is server-owned, in-memory state keyed by tab id for non-persistent UI concerns like pagination, sort order, expanded panels, wizard step, etc.

    This tab state is not a client signal and not a database row. It exists so that non-persistent actions can still participate in the same immediate-mode render model: an action mutates server-side tab state, then the render-fn re-runs with the new {db state, tab state} and sends the latest full page view.

    Tab state must be scoped to a single tab/SSE connection, initialized when the long-lived page stream connects, cleaned up when that stream closes, and periodically reaped for stale entries so memory cannot grow without bound.

    Nothing else should influence a render. Do not smuggle view state through ad hoc globals, connection-local mutable objects, or client-owned signals that the server "trusts". If state should survive reconnects, restarts, or be shared across users, it belongs in the database. If it is purely per-tab and ephemeral, it belongs in tab state.

  • CQRS. Persistent actions modify the database and return a 204, or a 200 if they patch-signals. Ephemeral actions modify tab state and return a 204, or a 200 if they patch-signals. Render functions re-render from the combined {db state, tab state} and send an update down the updates SSE connection.

  • Work sharing (caching). Work sharing is the term I'm using for sharing renders between connected users. This is useful when many connected users share the same view, for example a leaderboard, game board, or presence indicator. It ensures the work (e.g. the query and HTML generation) for that view is done only once, regardless of the number of connected users. The simplest way to do this is to recalculate and cache values after a batch has been run.

  • Use data-on:pointerdown/mousedown over data-on:click. This is a small one, but it can make even the slowest networks feel much snappier, because the request starts when the button is pressed rather than when it is released.

  • No CORS. By hosting all assets on the same origin we avoid the need for CORS. This avoids additional server round trips (such as preflight requests) and helps reduce latency.

  • Rendering an initial shim. Rather than returning the whole page on initial render and having two render paths (one for the initial render and one for subsequent renders), a shell is rendered and then populated when the page connects to the updates endpoint for that page. This has a few advantages:

    The page will only render dynamic content if the user has javascript and first party cookies enabled.

    The initial shell page can be generated and compressed once.

    The server only does more work for actual users and less work for link preview crawlers and other bots (that don't support javascript or cookies).
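The "any database change -> re-render all connected users with a 200ms throttle" rule can be sketched with asyncio. This is an illustration under assumptions, not the repo's code: Broadcaster, notify_change, and the per-tab queues are hypothetical, and render stands in for the page's single render function (tab id in, full main HTML out). Each tab gets a one-slot queue, so a slow client only ever holds the latest view, never a backlog of deltas.

```python
import asyncio
from collections.abc import Callable

# Hypothetical: the page's single render function, tab id -> full <main> HTML.
Render = Callable[[str], str]


class Broadcaster:
    """Re-render every connected tab after a change, at most once per interval."""

    def __init__(self, render: Render, interval: float = 0.2) -> None:
        self._render = render
        self._interval = interval
        self._dirty = asyncio.Event()
        self._outboxes: dict[str, asyncio.Queue[str]] = {}

    def connect(self, tab_id: str) -> asyncio.Queue[str]:
        # One-slot queue: the SSE writer for this tab reads views from here.
        outbox: asyncio.Queue[str] = asyncio.Queue(maxsize=1)
        self._outboxes[tab_id] = outbox
        _offer_latest(outbox, self._render(tab_id))  # populate the shell
        return outbox

    def disconnect(self, tab_id: str) -> None:
        self._outboxes.pop(tab_id, None)

    def notify_change(self) -> None:
        # Homogeneous event: many calls inside one window collapse into one.
        self._dirty.set()

    async def run(self) -> None:
        while True:
            await self._dirty.wait()
            self._dirty.clear()
            for tab_id, outbox in list(self._outboxes.items()):
                _offer_latest(outbox, self._render(tab_id))
            # Changes arriving while we sleep set _dirty again, so the latest
            # state is always rendered on the trailing edge of the window.
            await asyncio.sleep(self._interval)


def _offer_latest(outbox: asyncio.Queue[str], html: str) -> None:
    """Replace any unsent view with the newest one."""
    if outbox.full():
        outbox.get_nowait()
    outbox.put_nowait(html)
```

A real SSE handler would loop on `await outbox.get()` and write each view as a patch-elements event targeting main; the exact framing depends on the Datastar SDK in use.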
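A small helper can keep the two signal rules (client-owned signals declared __ifmissing, server-owned view-only signals made local with a leading underscore) from drifting across templates. This sketch assumes the Datastar v1 keyed attribute form data-signals:key, matching the data-on:pointerdown syntax used above; signal_attr is a hypothetical helper, so check the attribute syntax against the Datastar version the repo uses.

```python
import json
from html import escape


def signal_attr(key: str, initial: object, view_only: bool = False) -> str:
    """Render one data-signals attribute following the signal rules.

    Assumes Datastar v1's keyed form `data-signals:<key>`.
    """
    if view_only:
        # Server-owned: re-applied on every render, and the leading underscore
        # keeps the client from sending it back to the server.
        name = f"data-signals:_{key}"
    else:
        # Client-owned ephemeral state: only set if the signal does not exist
        # yet, so a re-render never clobbers what the user has typed.
        name = f"data-signals:{key}__ifmissing"
    # The attribute value is a JS expression; a JSON literal is valid JS.
    return f'{name}="{escape(json.dumps(initial))}"'
```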
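The tab-state lifecycle (create on stream connect, drop on close, reap stale entries) fits the immutable style preferred in the overview: keep the store as a plain mapping and write each transition as a function that returns a new mapping. Everything here is hypothetical (TabState's fields, the function names, the TTL). The sketch assumes the server holds one reference to the current store, swaps it after each call, and refreshes an entry's last-seen time on SSE keepalives.

```python
from collections.abc import Callable, Mapping
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class TabState:
    """Hypothetical ephemeral per-tab UI state: neither a signal nor a DB row."""
    page: int = 0
    sort: str = "name"


# tab_id -> (state, last_seen timestamp from a monotonic clock)
Store = Mapping[str, tuple[TabState, float]]


def open_tab(store: Store, tab_id: str, now: float) -> Store:
    """Called when the long-lived page stream connects."""
    return {**store, tab_id: (TabState(), now)}


def close_tab(store: Store, tab_id: str) -> Store:
    """Called when that stream closes."""
    return {k: v for k, v in store.items() if k != tab_id}


def update_tab(
    store: Store, tab_id: str, fn: Callable[[TabState], TabState], now: float
) -> Store:
    """Apply an ephemeral action; unknown or already-closed tabs are ignored."""
    if tab_id not in store:
        return store
    state, _ = store[tab_id]
    return {**store, tab_id: (fn(state), now)}


def reap(store: Store, now: float, ttl: float) -> Store:
    """Drop entries not seen within `ttl` seconds so memory stays bounded."""
    return {k: v for k, v in store.items() if now - v[1] <= ttl}


def next_page(state: TabState) -> TabState:
    """Hypothetical ephemeral action: a pure state -> state transition."""
    return replace(state, page=state.page + 1)
```

After an ephemeral action updates the store, the server triggers the same re-render path as a database change, so the page's single render function sees the new {db state, tab state}.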
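Putting the action rules and CQRS together, a persistent action validates its input, writes to the database, and lets the next throttled render carry the result; it never patches elements. In this sketch rename_feed, save_name, and notify_change are hypothetical, the request signals are assumed to arrive already parsed into a dict, and the SSE framing follows our reading of Datastar v1's datastar-patch-signals event, so verify it against the SDK in use.

```python
import json
from collections.abc import Callable


def patch_signals(signals: dict[str, object]) -> str:
    """Format one patch-signals SSE event (assumed Datastar v1 framing)."""
    payload = json.dumps(signals)
    return f"event: datastar-patch-signals\ndata: signals {payload}\n\n"


def rename_feed(
    signals: dict[str, object],
    save_name: Callable[[str], None],
    notify_change: Callable[[], None],
) -> tuple[int, str]:
    """Hypothetical persistent action. Returns (HTTP status, response body)."""
    name = str(signals.get("name", "")).strip()
    if not name:
        # Validation via signals: 200 with a patch-signals body, no elements.
        # `_nameError` is view-only (server-owned), hence the underscore.
        return 200, patch_signals({"_nameError": "Name is required"})
    save_name(name)  # command side: write to the database
    notify_change()  # query side: the throttled render pushes the new view
    # An action that also had to clear an earlier error would return 200
    # with a patch-signals body instead of 204.
    return 204, ""
```

An ephemeral action has the same shape, with the database write replaced by a tab-state update.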
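Work sharing can be as simple as memoizing renders by a view key for the duration of one render batch. This sketch assumes a hypothetical view key per tab (tabs looking at the same leaderboard share a key) and a hypothetical render_view function. Because the cache lives for one batch only, it can never serve HTML from before the latest database change.

```python
from collections.abc import Callable, Hashable, Mapping


def render_batch(
    view_keys: Mapping[str, Hashable],
    render_view: Callable[[Hashable], str],
) -> dict[str, str]:
    """Map tab_id -> HTML, rendering each distinct view key only once."""
    cache: dict[Hashable, str] = {}
    views: dict[str, str] = {}
    for tab_id, key in view_keys.items():
        if key not in cache:
            cache[key] = render_view(key)  # query + HTML generation done once
        views[tab_id] = cache[key]
    return views
```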

Workflow

  • Use Python 3.13.
  • Enter the dev environment with nix develop if you are not already inside it.
  • Sync Python dependencies with uv sync --all-groups.
  • Run the app with uv run repub.
  • Generate CSS with tailwindcss -i ./repub/static/app.tailwind.css -o ./repub/static/app.css and add --watch when you need live rebuilds.
  • Validate a generated feed with ./scripts/validate-feed path/to/feed.rss. This wraps the local checkout at ~/src/github.com/w3c/feedvalidator and pages the validator output through less by default.

```sh
uv sync --all-groups
uv run pytest
uv run flake8 repub/ tests/
uv run pyright
./scripts/validate-feed out/feeds/mn-cuba/feed.rss
nix fmt
nix flake check
uv run repub
uv run repub crawl -c repub.toml
```

Validation

  • Run nix fmt after changing repo files that are covered by treefmt.
  • Run nix flake check before declaring work complete.
  • nix flake check is expected to build and check the formatter, devshell, package, tests, and lint derivations.

Editing Rules

  • Keep treefmt.nix, flake.nix, and pyproject.toml aligned.
  • Prefer updating the flake-exported package and checks rather than adding ad hoc scripts.
  • Put new SQLite schema objects in numbered files under repub/sql/ such as 002_*.sql.
  • For backward-compatible column additions on existing SQLite databases, use Peewee's playhouse.migrate helpers instead of raw ad hoc ALTER TABLE logic.
  • Do not commit, amend, or stage unrelated files unless explicitly asked.
  • Final verification: nix flake check must be green before claiming a task is complete.

Repo Notes

  • The console entrypoint is repub.
  • Runtime ffmpeg availability is provided by the flake package and devshell.
  • Tests live under tests/.
  • prompts/ is intentionally git-ignored.
  • Never search the web for this repo. If an external resource, document, or reference is needed, stop and ask the user to provide it.
  • Treat the repo-root republisher.db as user-owned local state. Do not delete or reset it as part of routine testing or verification.
  • For automated tests or isolated verification, use a separate database path via REPUBLISHER_DB_PATH instead of mutating or removing the repo-root database.
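One way to follow the isolated-database rule in tests is a context manager that points REPUBLISHER_DB_PATH at a throwaway directory and restores the previous value afterwards. This assumes the app reads the variable when it opens the database rather than once at import time. isolated_db_path is a hypothetical helper; pytest's tmp_path and monkeypatch fixtures would do the same job inside a fixture.

```python
import os
import tempfile
from collections.abc import Iterator
from contextlib import contextmanager
from pathlib import Path


@contextmanager
def isolated_db_path(env_var: str = "REPUBLISHER_DB_PATH") -> Iterator[Path]:
    """Point the app at a temporary SQLite file; never touches ./republisher.db."""
    previous = os.environ.get(env_var)
    with tempfile.TemporaryDirectory() as tmp:
        db_path = Path(tmp) / "republisher.db"
        os.environ[env_var] = str(db_path)
        try:
            yield db_path
        finally:
            # Restore whatever was set before, including "unset".
            if previous is None:
                os.environ.pop(env_var, None)
            else:
                os.environ[env_var] = previous
```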