fix(poller): short-poll inbox instead of 30s long-poll

The 10s-after-healthy failure pattern kept reproducing even with
the connection pool disabled. Smoking gun: the inbox loop opens
GET /messages?timeout=30 right after start_daemon returns, and
every subsequent peers/routes call times out exactly when our
client-side reqwest timeout (10s) fires.

Concluded mycelium 0.6.1's HTTP server serialises requests: while
the long-poll connection is held, no other admin endpoint can
respond. The sidecar process kept logging routes the whole time
(seen in the in-app log buffer) — proof the daemon was alive,
just unable to serve concurrent calls.

Switch to short-poll: timeout=0 returns immediately, sleep 2s
between iterations. Per-iteration server hold time is now
millisecond-scale instead of 30s.
Author: syoul
Date:   2026-04-26 00:32:58 +02:00
Parent: 7981fc571c
Commit: 939565b88a

@@ -11,7 +11,7 @@ use tracing::warn;
 const PEERS_INTERVAL: Duration = Duration::from_secs(3);
 const ROUTES_INTERVAL: Duration = Duration::from_secs(5);
-const INBOX_LONG_POLL_SECS: u64 = 30;
+const INBOX_INTERVAL: Duration = Duration::from_secs(2);
 const INBOX_RETRY_BACKOFF: Duration = Duration::from_secs(2);
 const INBOX_CAPACITY: usize = 200;
@@ -121,18 +121,22 @@ fn spawn_inbox_loop(
 ) -> JoinHandle<()> {
     tokio::spawn(async move {
         loop {
+            tokio::time::sleep(INBOX_INTERVAL).await;
             let Some(client) = sidecar.client() else {
                 break;
             };
-            // Each iteration is a fresh long-poll. The daemon answers as
-            // soon as a message arrives, or returns an empty body / 204
-            // when the timeout window elapses.
-            match client.pop_message(false, INBOX_LONG_POLL_SECS, None).await {
+            // Short-poll: timeout=0 returns immediately if no message.
+            // We previously used a 30s long-poll, but mycelium 0.6.1's
+            // HTTP server appears to serialise requests behind a single
+            // worker — holding the connection for 30s starved every
+            // other endpoint (peers, routes, admin) until our own
+            // 10s reqwest timeout kicked in.
+            match client.pop_message(false, 0, None).await {
                 Ok(Some(msg)) => {
                     me.push_inbox(msg.clone());
                     let _ = app.emit("messages://incoming", &msg);
                 }
-                Ok(None) => {} // window expired, loop
+                Ok(None) => {}
                 Err(e) => {
                     warn!(error = %e, "inbox: pop_message failed");
                     tokio::time::sleep(INBOX_RETRY_BACKOFF).await;