Update NATS to 2.10.20, use SyncAlways (#3418)

The internal NATS instance is definitely convenient, but it does have one
problem: its lifecycle is tied to the Dendrite process. That means if
Dendrite panics or OOMs, it takes NATS down with it. I suspect this
sometimes contributes to the stuck streams people report, as some
operations or state might not be fully written to disk before the process
is interrupted.

Using `SyncAlways` means that NATS will effectively use `O_SYNC` and
block writes on flushes, which should considerably improve resiliency
against this kind of failure. It might affect performance a little, but
the impact shouldn't be significant.

Also updates NATS to 2.10.20 as there have been all sorts of fixes since
2.10.7, including better `SyncAlways` handling.

Signed-off-by: Neil Alexander <git@neilalexander.dev>
Co-authored-by: Neil Alexander <neilalexander@users.noreply.github.com>
Neil 2024-09-10 19:54:38 +01:00 committed by GitHub
parent 3a2eadcc36
commit 117ed66037
No known key found for this signature in database
GPG key ID: B5690EEEBB952194
3 changed files with 35 additions and 34 deletions

@@ -56,6 +56,7 @@ func (s *NATSInstance) Prepare(process *process.ProcessContext, cfg *config.JetS
 MaxPayload: 16 * 1024 * 1024,
 NoSigs: true,
 NoLog: cfg.NoLog,
+SyncAlways: true,
 }
 s.Server, err = natsserver.NewServer(opts)
 if err != nil {