Currently Sharkey keeps trying to send activities to dead instances until you manually toggle sending activities off for those instances. This causes a pile-up of failed deliver requests and puts unnecessary load on the server.
This seems quite hard to figure out: you would need to know when to count an instance as dead rather than just down for some amount of time, and also rule out problems specific to the instance sending the data (like issues resolving over IPv6 or IPv4).
Imho something like 14 days should be fine. I can't really think of a scenario where an issue like this would go unnoticed that long on an active instance (Wouldn't an instance be completely overwhelmed after such a downtime anyway if it had to process all these old posts?).
the foundkey code skips suspended instances, or instances that haven't been talked with for a week
the iceshrimp code does almost the same thing, with a "not responding" extra boolean
sharkey (and misskey!) marks an instance as "suspended" if it responds with a 410, and then automatically skips it
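for illustration, the skip rule described for foundkey could look roughly like the sketch below. this is not the actual foundkey, iceshrimp or sharkey code: the `InstanceState` shape, the `lastCommunicatedAt` field and the `shouldSkipDelivery` helper are hypothetical names, and treating a never-contacted instance as deliverable is an assumption.

```typescript
// Hypothetical, simplified view of an instance row; not the real entity.
interface InstanceState {
  isSuspended: boolean;
  // Hypothetical field: last time we talked with this instance, if ever.
  lastCommunicatedAt: Date | null;
}

const ONE_WEEK_MS = 7 * 24 * 60 * 60 * 1000;

// Returns true when a delivery to this instance should be skipped:
// the instance is suspended, or we have not talked with it for over a week.
function shouldSkipDelivery(instance: InstanceState, now: Date): boolean {
  if (instance.isSuspended) return true;
  // Assumption: an instance we have never talked with still gets a first attempt.
  if (instance.lastCommunicatedAt === null) return false;
  return now.getTime() - instance.lastCommunicatedAt.getTime() > ONE_WEEK_MS;
}
```

the week is hard-coded here only to mirror the description above; a real implementation would presumably make it configurable.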
so, what we're missing is a "last time we interacted with this instance" (we have latestRequestReceivedAt which is not quite the same thing: if that instance only consumes our activities, and never sends any, we'll never update that column); it should be updated whenever we set isNotResponding to false.
once we add that, we could have a periodic job go through the instances and suspend the ones that are not responding and haven't been for a while (configurable interval). or even suspend them straight from DeliverProcessorService.
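a minimal sketch of the selection step such a periodic job might run, under stated assumptions: `InstanceRecord`, `lastSuccessfulInteractionAt` (standing in for the proposed "last time we interacted" column) and `findInstancesToSuspend` are hypothetical names, not existing Sharkey code, and the threshold is passed in rather than read from any real config key.

```typescript
// Hypothetical, simplified instance row; only isNotResponding is a name from the thread.
interface InstanceRecord {
  host: string;
  isSuspended: boolean;
  isNotResponding: boolean;
  // Hypothetical column: updated whenever isNotResponding is set to false.
  lastSuccessfulInteractionAt: Date | null;
}

const DAY_MS = 24 * 60 * 60 * 1000;

// Returns the hosts that are currently not responding and whose last
// successful interaction is at least `thresholdMs` in the past.
function findInstancesToSuspend(
  instances: InstanceRecord[],
  now: Date,
  thresholdMs: number,
): string[] {
  return instances
    .filter((i) => !i.isSuspended && i.isNotResponding)
    // Assumption: without a recorded interaction we cannot tell how long the
    // instance has been unreachable, so it is left alone.
    .filter((i) => i.lastSuccessfulInteractionAt !== null
      && now.getTime() - i.lastSuccessfulInteractionAt.getTime() >= thresholdMs)
    .map((i) => i.host);
}
```

with the 14 days suggested above, the job would call something like `findInstancesToSuspend(rows, new Date(), 14 * DAY_MS)` and then suspend the returned hosts. the same check could also run from DeliverProcessorService on a failed delivery instead of in a separate job.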