I am experiencing a strange issue with our 3960 and IPsec tunnels. Tunnels that are simply transiting the Gate have intermittent issues where the tunnel appears up but is passing traffic in only one direction. The tunnel never goes down, but I only see traffic going out, nothing coming in. The far end is seeing the same thing: tunnel up, outbound traffic only.
For clarification, the Gate is not the termination point. A firewall behind the Gate is the termination point; the Gate is simply passing the traffic.
Has anyone else seen this issue?
The tunnel will show “up” as long as the IKE control plane (UDP/500, assuming no NAT) on both sides reaches agreement and occasionally sends and replies to dead-peer-detection messages. (One-way traffic inside the tunnel is indistinguishable from genuine silence in that direction, after all.)
Is the traffic purely being routed between unique tuples of IPs, with no SNAT happening? And if you force UDP encapsulation (UDP/4500 instead of raw ESP), does the issue stop happening?
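For reference, forcing UDP encapsulation on a FortiGate phase 1 looks roughly like this (a sketch; "wan-tunnel" is a placeholder for your actual tunnel name):

```
config vpn ipsec phase1-interface
    edit "wan-tunnel"
        # "forced" wraps ESP in UDP/4500 even when no NAT is detected
        set nattraversal forced
    next
end
```

Both peers need to agree on this, so apply it at both ends or the negotiation will fall back.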
Is the FortiGate even receiving the missing packets in the problematic direction? (Maybe the drop happens elsewhere.)
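The built-in sniffer will answer that directly. Something along these lines, where wan1 and 203.0.113.1 stand in for your outside interface and the remote peer:

```
# verbose level 4 prints headers with interface names; filter for ESP from the peer
diagnose sniffer packet wan1 'esp and host 203.0.113.1' 4
# if NAT-T is in play, watch UDP/4500 instead
diagnose sniffer packet wan1 'udp port 4500 and host 203.0.113.1' 4
```

If you see outbound ESP but nothing inbound while the issue is happening, the drop is upstream of the Gate.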
I’ve had similar recently with a few customers. Basically, one end wasn’t receiving ESP traffic on the outside interface, but the tunnel was up and traffic flowed in the other direction. Never got a definitive fix, but it would resolve on its own after some time. Customers hate that answer, unfortunately.
Make sure you enable dpd on-idle on both ends.
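For anyone looking for the exact knobs, this is roughly what that looks like on each peer (tunnel name and retry values are just examples):

```
config vpn ipsec phase1-interface
    edit "wan-tunnel"
        # send DPD probes only when the tunnel is idle
        set dpd on-idle
        set dpd-retryinterval 5
        set dpd-retrycount 3
    next
end
```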
That being said, I created a ticket on this exact issue the other day, on a dial-up setup. The tunnel was “up” on the spoke, but not present on the hub. TAC did not manage to figure it out, and we ended up resetting phase 1 for the tunnel. Resetting phase 2 did not make any difference.
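In case it helps anyone else, the reset was along these lines (the quoted name is whatever your phase 1 is called):

```
# tear down the IKE SA (phase 1) and everything negotiated under it
diagnose vpn ike gateway clear name "wan-tunnel"
# flushing only the phase 2 SAs, which did not help in our case:
diagnose vpn tunnel flush "wan-tunnel"
```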
I also faced a similar issue with an IPsec tunnel. The VPN event log showed a DPD message failure, so I shifted it to a dial-up VPN connection, where DPD is off, and voilà, it resolved.
FortiGate has had an IPsec phase 1 bug since forever where an active phase 1 is not renegotiated if a new request comes from the same peer, say when the peer suddenly power-cycles and doesn’t notify that the phase 1 is going down. Fortinet’s solution is to always enable DPD. I don’t do that, because DPD has a purpose, and it’s not to cover for their bugs.
My solution is to set the phase 1 lifetime to a value low enough that failures won’t be noticed. 86400 seconds (one day) is garbage even for my home connection; I use 1800 seconds. A higher-reliability environment would go shorter. Pin up the phase 2 and the phase 1 won’t be used until something kills the phase 2.
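A minimal sketch of that setup, assuming a route-based tunnel (names and values are examples, not a drop-in config):

```
config vpn ipsec phase1-interface
    edit "wan-tunnel"
        # 30-minute phase 1 lifetime instead of the 86400-second default
        set keylife 1800
    next
end
config vpn ipsec phase2-interface
    edit "wan-tunnel-p2"
        # keep phase 2 pinned up so traffic never waits on a fresh negotiation
        set auto-negotiate enable
    next
end
```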
I first hit this bug in 3.x, and it looks like it’s still there. Other router brands do not have this bug.
Timeouts do not need to match on both peers. If your peer insists on forever and a day, set yours to 1800 and rock on.