tag:help.tenderapp.com,2008-11-12:/discussions/problems/114496-why-was-tender-down-for-several-days
Tender: Discussion
2023-05-15T19:47:50Z
tag:help.tenderapp.com,2008-11-12:Comment/57118142
2022-12-12T09:29:04Z
2022-12-12T16:47:48Z
Why was tender down for several days?
<div><p>I'll need a good explanation as to why the site was down for so long, and why there was zero communication from you about it. Otherwise we'll be jumping ship as it feels like this product is no longer supported.</p></div>
neil
tag:help.tenderapp.com,2008-11-12:Comment/57118142
2022-12-12T11:07:02Z
2022-12-12T16:47:48Z
Why was tender down for several days?
<div><p>Hi Neil, one of our load balancers’ ethernet connection became unreliable<br>
for unknown reasons and was up and down for about a day. The<br>
automatic IP failover should have kicked in, but it didn’t trigger<br>
because the machine was still up and running: its connection was flaky<br>
to the internet but fine to the local subnet and the monitoring system.</p>
<p>We had some alerts, but it seemed to be working when we checked it manually,<br>
which is not out of the ordinary. There are periodic ddos attempts which<br>
often trigger the same sorts of alerts as the systems kick in.</p>
<p>Once the weekend was over here, we dug into it as part of our Monday morning<br>
roundups, saw that traffic was way down, and found that one port on the<br>
switch was misbehaving. We forced a failover to the secondary load<br>
balancer, reconfigured the networking on that machine, had the data center<br>
switch the cables in case that was the issue, and re-sent all the inbound<br>
emails that had queued up due to the connectivity issue.</p>
<p>The failover script, which hadn’t run for over six years, has been tweaked<br>
to be more aggressive in pruning a misbehaving primary IP. There’s a new<br>
cable, and we’re looking at ways of improving the monitoring, including<br>
pings from around the world so that it covers routes outside the<br>
primary path.</p>
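<p>A minimal sketch of the idea above, not the actual failover script: a failover decision that also weighs probes from outside the local network, so a machine that still answers on the local subnet can be failed over when enough outside probes can't reach it. The names <code>ProbeResult</code> and <code>should_fail_over</code>, the vantage labels, and the 50% threshold are all assumptions made for illustration.</p>

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ProbeResult:
    vantage: str    # hypothetical label, e.g. "local-subnet" or "external-a"
    external: bool  # True if the probe ran from outside the local network
    ok: bool        # True if the load balancer answered the probe


def should_fail_over(results, min_external_failure_ratio=0.5):
    """Return True when enough external probes fail, even if local probes pass."""
    external = [r for r in results if r.external]
    if not external:
        # No outside view available: fail over only if every probe we have failed.
        return bool(results) and not any(r.ok for r in results)
    failed = sum(1 for r in external if not r.ok)
    return failed / len(external) >= min_external_failure_ratio


if __name__ == "__main__":
    # The situation described above: local checks pass, outside reachability is flaky.
    results = [
        ProbeResult("local-subnet", external=False, ok=True),
        ProbeResult("monitoring", external=False, ok=True),
        ProbeResult("external-a", external=True, ok=False),
        ProbeResult("external-b", external=True, ok=False),
        ProbeResult("external-c", external=True, ok=True),
    ]
    print(should_fail_over(results))
```

<p>With these inputs two of the three external probes fail, so the function returns <code>True</code> even though both local probes succeed. A real check would also need to smooth over brief blips, for example by requiring several consecutive failing rounds before acting.</p>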
<p>We still don’t actually have a root cause as to why the original issue<br>
happened, so there’s not much point in talking about it yet! We’re trying to nail it down<br>
between the switch and our colo host’s ddos protection.</p></div>Courtenay Gasking