Why was tender down for several days?

neil

12 Dec, 2022 09:29 AM

I'll need a good explanation as to why the site was down for so long, and why there was zero communication from you about it. Otherwise we'll be jumping ship as it feels like this product is no longer supported.

  1. Posted by Courtenay Gasking on 12 Dec, 2022 11:07 AM

    Hi Neil, the Ethernet connection on one of our load balancers became
    unreliable for unknown reasons and was up and down for about a day. The
    automatic IP failover should have kicked in, but it didn’t trigger because
    the machine itself was still up and running: its connection to the
    internet was flaky, while its connections to the local subnet and the
    monitoring system were fine.

    We had some alerts, but the site seemed to be working when we checked it
    manually, which is not out of the ordinary: periodic DDoS attempts often
    trigger the same sorts of alerts as the systems kick in.

    Once the weekend was over here, we dug into it as part of our Monday
    morning roundup, saw that traffic was way down, and found that one port on
    the switch was misbehaving. We then forced a failover to the secondary
    load balancer, reconfigured the networking on that machine, had the data
    center swap the cables in case they were the issue, and re-sent all the
    inbound emails that had queued up due to the connectivity issue.

    The failover script, which hadn’t run for over six years, has been tweaked
    to be more aggressive in pruning a misbehaving primary IP. There’s also a
    new cable, and we’re looking at ways of improving the monitoring,
    including pings from around the world so that it covers routes outside
    the primary path.

    We still don’t actually have a root cause for the original issue, so
    there’s not much point in talking about it yet! We’re trying to nail it
    down between the switch and our colo host’s DDoS protection.

  3. brandi closed this discussion on 15 May, 2023 07:47 PM.
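
The failure mode described in the reply above (the primary looked healthy from the local subnet and the monitoring system while its internet-facing link was flaky) can be illustrated with a small decision function. This is only a sketch of the general idea of weighing external probes separately from local ones; it is not Tender's actual failover script. The names `ProbeResult`, `Action`, `decide`, the vantage labels, and the 50% threshold are all assumptions made up for illustration.

```python
from dataclasses import dataclass
from enum import Enum


class Action(Enum):
    STAY = "stay"
    FAIL_OVER = "fail_over"


@dataclass(frozen=True)
class ProbeResult:
    vantage: str    # hypothetical probe label, e.g. "local-subnet" or "eu-west"
    external: bool  # True if the probe reaches the primary over the public internet
    ok: bool        # True if the primary answered this probe


def decide(results: list[ProbeResult], min_external_ok: float = 0.5) -> Action:
    """Fail over when too few external probes succeed, even if local ones pass."""
    external = [r for r in results if r.external]
    if not external:
        # Without any outside view, do nothing automatically;
        # a real system would probably page a human here.
        return Action.STAY
    ok_ratio = sum(r.ok for r in external) / len(external)
    return Action.STAY if ok_ratio >= min_external_ok else Action.FAIL_OVER


# Local probes pass, but two of three external probes fail.
results = [
    ProbeResult("local-subnet", external=False, ok=True),
    ProbeResult("monitoring", external=False, ok=True),
    ProbeResult("eu-west", external=True, ok=False),
    ProbeResult("us-east", external=True, ok=False),
    ProbeResult("ap-south", external=True, ok=True),
]
print(decide(results))  # Action.FAIL_OVER
```

Ignoring local results entirely is a deliberate simplification here. A production check would likely also smooth results over time so that a link that is "up and down" does not cause repeated failovers.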

