Why was Tender down for several days?
I'll need a good explanation as to why the site was down for so long, and why there was zero communication from you about it. Otherwise we'll be jumping ship as it feels like this product is no longer supported.
1 Posted by Courtenay Gaski... on 12 Dec, 2022 11:07 AM
Hi Neil, the Ethernet connection on one of our load balancers became
unreliable for unknown reasons and was up and down for about a day. The
automatic IP failover should have kicked in, but it didn’t trigger because
the machine was still up and running: its connection to the internet was
flaky, but its connection to the local subnet and the monitoring system
was not.
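To illustrate that gap, here is a minimal Python sketch. It is not the actual failover script: the `ProbeResult` type, the vantage names, the 0.5 threshold, and both functions are hypothetical. It contrasts a check that only sees the local subnet with one that also consults probes from outside.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ProbeResult:
    vantage: str    # where the probe ran from (hypothetical labels)
    external: bool  # True if the probe came from outside the local subnet
    ok: bool        # True if the load balancer answered


def local_only_healthy(results):
    """Healthy if there is at least one local probe and all of them succeeded."""
    local = [r for r in results if not r.external]
    return bool(local) and all(r.ok for r in local)


def externally_healthy(results, min_ok_fraction=0.5):
    """Healthy only if enough probes from outside the subnet succeeded."""
    outside = [r for r in results if r.external]
    if not outside:
        return False  # no outside view at all: do not report healthy
    ok_count = sum(1 for r in outside if r.ok)
    return ok_count / len(outside) >= min_ok_fraction


# A machine that is reachable locally but flaky from the internet.
results = [
    ProbeResult("monitoring-host", external=False, ok=True),
    ProbeResult("subnet-peer", external=False, ok=True),
    ProbeResult("outside-a", external=True, ok=False),
    ProbeResult("outside-b", external=True, ok=False),
    ProbeResult("outside-c", external=True, ok=True),
]

print(local_only_healthy(results))  # True: the local view looks fine
print(externally_healthy(results))  # False: only 1 of 3 outside probes got through
```

With only the local view, the machine looks healthy and nothing triggers; the outside probes are what reveal the problem.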
We had some alerts, but it seemed to be working when we checked it manually,
which is not out of the ordinary. There are periodic DDoS attempts, which
often produce the same sorts of alerts as the systems kick in.
Once the weekend was over here, we dug into it as part of Monday morning
roundups, saw that traffic was way down, and found that one port on the
switch was misbehaving. We forced a failover to the secondary load
balancer, reconfigured the networking on that machine, had the data center
switch the cables in case that was the issue, and re-sent all the inbound
emails that had queued up due to the connectivity issue.
The failover script, which hadn’t run for over six years, has been tweaked
to be more aggressive in pruning a misbehaving primary IP. There’s a new
cable, and we’re looking at ways of improving the monitoring, including
pings from around the world so that it captures certain routes outside the
primary path.
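One way to picture "more aggressive" for a link that is up and down is to count failures over a recent window of checks rather than waiting for several failures in a row. The Python sketch below is an illustration under that assumption, not the actual script; the `FlapDetector` class, the window size, and the failure limit are all hypothetical.

```python
from collections import deque


class FlapDetector:
    """Tracks recent check results and flags a flapping primary."""

    def __init__(self, window=10, max_failures=3):
        # deque(maxlen=...) silently drops the oldest sample once full.
        self.samples = deque(maxlen=window)
        self.max_failures = max_failures

    def record(self, ok):
        """Store the result of one check (True means it succeeded)."""
        self.samples.append(ok)

    def failures_in_window(self):
        return sum(1 for ok in self.samples if not ok)

    def should_fail_over(self):
        """True once the window holds at least max_failures failed checks."""
        return self.failures_in_window() >= self.max_failures


def longest_failure_run(samples):
    """Length of the longest streak of consecutive failed checks."""
    longest = current = 0
    for ok in samples:
        current = 0 if ok else current + 1
        longest = max(longest, current)
    return longest


# An intermittent link: it never fails twice in a row.
history = [True, False, True, True, False, True, False]

detector = FlapDetector(window=10, max_failures=3)
for ok in history:
    detector.record(ok)

print(longest_failure_run(history))  # 1: a "3 in a row" rule never fires
print(detector.should_fail_over())   # True: 3 failures within the window
```

The trade-off is that a windowed count fails over sooner on a flaky link but is also easier to trip with unrelated noise, so the window and limit would need tuning against real alert history.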
We still don’t actually have a root cause for the original issue, so
there’s not much point in talking about it yet! We’re trying to nail it down
between the switch and our colo host’s DDoS protection.
brandi closed this discussion on 15 May, 2023 07:47 PM.