Understanding Network Performance - Site to Site VPN Poor Performance

I am having issues with poor performance over a site-to-site VPN connection. Users at a remote site complain of poor performance when trying to use Revit and/or opening files from our primary file server. Some information I think is relevant:

  • Site A Bandwidth (file server location): 150Mbps Symmetrical

  • Site B Bandwidth (remote location): 1Gbps Symmetrical

  • Latency Between Sites: 50ms

  • Site to Site VPN is between two Watchguard M370s

  • File Server is Windows Server 2016

  • Client Machines are Windows 10 Pro

Some performance characteristics:

  • LAN transfer of a 10GB zip file from machine to machine at Site B is very steady at roughly 30MB/s (Gbps LAN)

  • WAN transfer of a 10GB zip file from Site B to our file server at Site A is roughly 15MB/s

  • WAN transfer of a 10GB zip from the file server at Site A to Site B routinely fluctuates between 350KB/s and 705KB/s with an occasional spike of 2.0MB/s

I’ve been researching this issue and trying to improve performance but can’t seem to find a good starting point. From my research I can see that the SMB protocol is not particularly well-suited for WAN transmission. When I run iPerf I get the following results:

[Site B to Site A](https://imgur.com/a/I0xKTuY)

[Site A to Site B](https://imgur.com/a/r0yAdbi)

Based on my research I think I have a fairly good idea of the issue: effectively, I believe I have a Long Fat Network and am unable to saturate the bandwidth between the two sites. Essentially, I am being hindered by big pipes, high latency, and the chattiness of SMB over TCP (don’t worry, SMBv1 is disabled org-wide). However, I do not know how to tie together the information I have found in order to develop a solution to the poor performance.

I know that I can calculate the Bandwidth Delay Product, and from what I gather I can adjust the MTU to change the amount of data in flight in each frame. Additional research has taught me about TCP auto-tuning and receive side scaling to help increase the amount of data in flight during a TCP transmission. However, I feel like I am getting farther away from solving the problem, not closer.

Company ownership seems to think that increasing the bandwidth at Site A will fix the issue (hence I am bringing a higher-bandwidth internet connection online this Saturday), but I am not certain that is the case.

Am I on the right track in looking at BDP, MTU, and frame sizing? Is there a better way to address a performance issue like this? What are the next steps I should take to resolve this kind of issue? Thank you in advance for any help you can offer.
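To put rough numbers on the Long Fat Network idea: the limit on in-flight data is the TCP window, not the MTU. A back-of-the-envelope sketch, assuming the figures from the post (150Mbps bottleneck at Site A, ~50ms RTT):

```python
# Bandwidth-Delay Product math for this link (figures from the post).
bandwidth_bps = 150_000_000   # Site A uplink, bits per second
rtt_s = 0.050                 # round-trip time in seconds

# BDP: bytes that must be "in flight" at once to fill the pipe.
bdp_bytes = bandwidth_bps / 8 * rtt_s
print(f"BDP: {bdp_bytes / 1024:.0f} KB")   # ~916 KB

# A fixed TCP receive window caps throughput at window / RTT,
# no matter how fast the link is.
def max_throughput(window_bytes, rtt_s):
    """Upper bound on TCP throughput for a given window and RTT."""
    return window_bytes / rtt_s

# With a classic 64 KB window (no window scaling / auto-tuning):
print(f"64 KB window: {max_throughput(64 * 1024, rtt_s) / 1e6:.2f} MB/s")  # ~1.31 MB/s
```

So if anything on the path (or a tuning registry setting) is pinning the window near 64 KB, the ceiling is about 1.3 MB/s regardless of how much bandwidth is added at Site A.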

What are your configured MTU values?

What are your configured MSS Clamping values?

What is the round-trip latency from source to destination (and back)?

What’s the packet loss? TCP Sawtooth effect could be killing you here. I was struggling with the same thing across some Comcast Business accounts until I added a Silverpeak and then my transfers (specifically backups) actually exceeded my WAN speeds. It was wild, but localized ACK to your servers, and some great “network raid” can really save your speeds.
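The sawtooth effect can be quantified with the Mathis et al. approximation: steady-state TCP Reno throughput is bounded by MSS / (RTT · √p), where p is the packet loss rate. A sketch, assuming the MTU and RTT figures from this thread (1350-byte MTU minus 40 bytes of IP+TCP headers, 50ms RTT):

```python
import math

def mathis_throughput(mss_bytes, rtt_s, loss_rate):
    """Mathis et al. upper bound on steady-state TCP throughput (bytes/s)."""
    return mss_bytes / (rtt_s * math.sqrt(loss_rate))

mss = 1310    # assumed: 1350 MTU - 20-byte IP header - 20-byte TCP header
rtt = 0.050   # 50 ms round trip

for p in (0.0001, 0.001, 0.01):
    print(f"loss {p:.2%}: {mathis_throughput(mss, rtt, p) / 1024:.0f} KB/s")
```

Even 0.1–1% loss on this path predicts throughput in the hundreds of KB/s, which is right in the range the OP is seeing, so measuring actual loss across the tunnel is worth doing before buying bandwidth.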

One out of order or lost packet is the death of TCP speed.

I assume that speed from Site B to the internet is fine? Any interface errors? Packet loss?

We had a similar issue and it would come and go and I figured it was something further upstream. We finally switched to another ISP at the branch office and never had this issue again.

We also had this issue when there was a duplex mismatch between our firewall and the router on the ISP end but that was affecting all traffic including traffic to the internet.

If there is no packet loss or interface errors it could be an MTU/MSS issue as others have suggested but it could also be further down the stream and you may not be able to change that.

We had the same issue and fixed it with changing the encryption Settings.
Which Settings do you use in p1 and p2?

I have an M370 on a symmetrical 500Mbps link; SSL VPN users get max 1MB/s download speeds from the file server. I’ve been looking at this for a while now and cannot figure out why it’s so slow.

If your file server is Hyper-V virtualized, try updating network card drivers on the host and/or disabling VM queues for the adapter (`Disable-NetAdapterVmq`).

Also check to see if Windows Defender or another AV product is interfering.

I had a similar issue (in my case all site-to-site VPN traffic would max out at about 350KB/s), and it turned out to be IPS on my firewall.

Currently the MTU on the host at the remote location is set to 1350. I calculated it by running `ping <host> -f -l 1322` (the largest payload that succeeded with DF set) and then adding the standard 28 bytes of header for a total of 1350. The MTU on the router and the file server has not been adjusted.
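For reference, the arithmetic behind that ping test (a sketch; the 28 bytes are the 20-byte IPv4 header plus the 8-byte ICMP header that Windows ping adds on top of the `-l` payload):

```python
# Windows ping -l sets the ICMP payload size; with -f (Don't Fragment),
# the largest payload that gets through reveals the path MTU once the
# headers are added back on.
max_df_payload = 1322   # largest size that succeeded: ping <host> -f -l 1322

IPV4_HEADER = 20
ICMP_HEADER = 8

path_mtu = max_df_payload + IPV4_HEADER + ICMP_HEADER
print(path_mtu)   # 1350
```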

MSS is configured for AUTO on all routers

RTT latency Site A - B: 50ms
RTT latency Site B - A: 49ms

I think that is essentially what I am seeing. I’ve tried running Wireshark (I’m learning on my own time, but nowhere near an experienced user by any means), and based on the number of duplicate ACKs that wouldn’t be a surprising result. When I perform a transfer from the remote host with Resource Monitor open, I essentially see a sawtooth pattern in the network graph.

Silverpeak

What “silverpeak” did you add?

This is a brand new fiber buildout from the ISP. There are only 2 ISPs in the region, and one of them doesn’t supply service to my location at Site B. General web speed is phenomenal; it’s just when trying to transfer data over the wire from Site A to Site B.

Phase 1 settings are SHA2-256 with AES (256-bit) and Diffie-Hellman group 14. Phase 2 settings are ESP with SHA2-256 and AES (256-bit) as well. Phase 2 rekeys on a 24-hour interval.

Disable-NetAdapterVmq

Is this specific to Hyper-V only? Our environment is virtualized, but we are a VMware shop (ESXi 6.7). I see this is a PowerShell command, and the Microsoft docs say it’s for Hyper-V. Do you happen to know if there is an equivalent for VMware ESXi? I am currently researching this further from the VMware end as well.

Disabling VMQ made a big improvement on our end.

I looked into IPS on the firewall, and it is currently enabled with Full Scan as the configured option. When I review my policies within the firewall, BOVPN in and out traffic are both set to disabled. I will try disabling the feature after hours to see if performance improves.

From outside of the tunnel, when you ping from router to router with DF set, the largest packet size you succeed with is 1350?

Subtract 64 bytes from that for IPsec overhead and set your tunnel MTU to 1350 − 64.
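Putting numbers on that suggestion (a sketch; the 64-byte figure is this commenter’s estimate for ESP overhead, which in practice varies with cipher and mode):

```python
path_mtu = 1350          # largest DF ping that got through, per the thread
IPSEC_OVERHEAD = 64      # commenter's estimate for ESP encapsulation overhead
TCP_IP_HEADERS = 40      # 20-byte IPv4 header + 20-byte TCP header

tunnel_mtu = path_mtu - IPSEC_OVERHEAD    # MTU to set on the tunnel interface
mss_clamp = tunnel_mtu - TCP_IP_HEADERS   # corresponding MSS clamp value
print(tunnel_mtu, mss_clamp)              # 1286 1246
```

The MSS clamp line is the same idea one step further: if the firewalls support explicit MSS clamping instead of AUTO, this is the value that keeps TCP segments from ever needing fragmentation inside the tunnel.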

It was pre-acquisition, back when they only really had one product. Did a virtual deployment on both sides and some quick PBR rules. Couldn’t have been easier frankly.

Never mind then, yes it’s specific to Hyper-V.

No idea about the VMWare equivalent but probably worth experimenting from a random machine at Site A to a machine at Site B.

Something tells me that a 1350 MTU is not good… I will admit I am just starting to pull back the covers on the innards of networking, and holy shit is there a fuck ton to know. Additionally, can you expand on pinging from “outside of the tunnel,” please? I am suddenly not so sure I am testing properly…