Remote sites accessing Azure via S2S at one site.

Associate
Joined
28 Jan 2005
Posts
1,698
Location
Southport
Hi all,

I'm looking for some advice on this, as testing an Azure setup for a customer is giving us some difficulties. The customer has 12 sites that will all need an S2S (site-to-site) VPN into the Azure setup. The Azure setup is a little limited as it's through a 365 CSP, who do the setup for us. The VPN in Azure has a local subnet of 192.168.0.0/16 defined for the HO DrayTek, meaning all traffic from Azure to a 192.168.x.x IP is routed back to the HO DrayTek, and this covers the ranges of all the other remote sites.
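To illustrate the point about the /16, here is a minimal sketch using Python's standard `ipaddress` module. The site names and /24 ranges are made up for illustration (the real ranges aren't listed here); only the 192.168.0.0/16 local network comes from the setup described above.

```python
import ipaddress

# Local network defined on the Azure side of the VPN (from the post).
azure_local_network = ipaddress.ip_network("192.168.0.0/16")

# Hypothetical site subnets, for illustration only.
site_subnets = {
    "HO": "192.168.1.0/24",
    "Site 2": "192.168.2.0/24",
    "Site 12": "192.168.12.0/24",
}


def covered_sites(local_network, subnets):
    """Return the names of sites whose subnet falls inside local_network."""
    return [
        name
        for name, cidr in subnets.items()
        if ipaddress.ip_network(cidr).subnet_of(local_network)
    ]


# Every site inside 192.168.0.0/16 is reached from Azure via the single HO tunnel.
print(covered_sites(azure_local_network, site_subnets))
```

Any site whose range sits inside the /16 will have its return traffic from Azure sent down the one tunnel to HO, which is why everything hinges on the HO router.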

So am I trying to do something that's unachievable, or does it just need tweaking?

The head office (HO) has a DrayTek 3910 router that serves the client PCs there: around 10 PCs and 10 VoIP phones (cloud hosted, so the traffic is internet bound). Staff and guest WiFi (UniFi APs) also run through this, normally with around 50 clients connected. They also have a 100Mbps leased line. The actual S2S VPN into Azure is configured on this router, so this router is NATing for the client PCs and routing to Azure.

The other 11 offices each have an S2S VPN from a DrayTek 2862 router into this HO DrayTek 3910, and each has the Azure subnet defined as an additional remote network. Each site has a maximum of 10 devices connected to the domain.

This all connects up fine, VPNs dial and there is routing between Azure and all the sites. All sites can ping the servers in Azure no problem, great......

This week we set up conditional DNS forwarding on the routers to push any domain traffic (the domain controller is moving to Azure) from each site across to Azure, and thus through the HO router. Although this should be minimal traffic, basically just requests to and from AD, the HO users reported major issues with their phones and internet going slow. After investigating a little more, it appears the router at HO is struggling: pings to Google spike up to 300-400ms and, strangely, pings locally to the HO router also spike to around 300ms for brief periods. Reboot the router and it's OK again for a while, then it gradually gets worse.

Questions I have really are:

- Are there too many VPNs, and thus too much in the routing table on the HO router, so that traffic is sometimes getting lost even when it is local traffic (a ping from a local device to the router)? There is a maximum of 15 entries in that table, though, which doesn't seem high, granted most of it is over the WAN / VPNs, and the 3910 specs state it can handle 500 simultaneous VPNs with 3Gbps throughput.

- Is it better to have another router dedicated to the HO clients' internet access / NATing, set a route on it back to the 3910 for the Azure network, and use the 3910 purely for routing, like a VPN "hub"?

- Should I be using a routing protocol (OSPF or BGP) rather than just static routes on the 3910? (These were created automatically when the S2S VPNs to the remote offices were created.)

Sorry for the long post, but I'm tearing my hair out with this because in my opinion (which could be wrong) this shouldn't really be a problem, unless there is a whole different issue at HO and this extra work for the router is amplifying it?

Thanks in advance.
 
Caporegime
Joined
18 Oct 2002
Posts
26,053
Having 11 sites reliant on a single router and Internet connection at head office to access Azure resources seems like a flawed design. You can have up to 10 tunnels on each gateway without paying any more for it, then after that point each additional tunnel is going to cost you around £9 per month - so start pointing remote offices directly to Azure where possible.
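As a rough sketch of that pricing, assuming the figures above (10 tunnels included per gateway, then roughly £9 per month per extra tunnel; check current Azure pricing for your gateway SKU), the extra cost of terminating every site directly in Azure works out like this. The function name and defaults are illustrative only.

```python
def extra_tunnel_cost(tunnels, included=10, cost_per_extra=9):
    """Approximate monthly cost (GBP) for tunnels beyond the included allowance.

    The defaults mirror the rough figures quoted in this thread,
    not an official price list.
    """
    return max(0, tunnels - included) * cost_per_extra


# 12 sites each with their own tunnel: 2 tunnels over the allowance.
print(extra_tunnel_cost(12))
```

For the 12 sites in question that is only two tunnels over the allowance, so cost shouldn't be the blocker if the gateway can be configured that way.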

You're also much better off trying to configure this as routed tunnels rather than as policy-based VPN, but I don't know if DrayTek equipment is up to this task. Honestly I'd also look at Azure Virtual WAN for what you want to do as you can then start doing things like transit routing across your WAN rather than having to send everything to one site.

If the router is falling over because of (what sounds like) a load of tiny DNS packets in VPN tunnels eating the CPU up, adding a routing protocol isn't going to fix anything.

DrayTek devices have a reputation for being good and frankly I've no idea where that has come from - they aren't great in terms of features and don't perform brilliantly for what they cost. I think MSPs love them because they've been using them forever and they have a central management service, and that in turn ends up with people thinking they're a cut above. I'd guess that a Netgate SG-5100 would outperform the 3910.
 
Associate
OP
Joined
28 Jan 2005
Posts
1,698
Location
Southport
Thanks for your input and help. The single router as a single point of failure is being addressed by a second leased line being installed and, originally, a 3910 in an HA pair, but this might change depending on the outcome. The extra tunnels might be a problem, as the package we are getting from the CSP is at a fixed monthly cost which apparently only allows one VPN per "customer", so I guess we can't add any additional gateways like you could in full-blown Azure; otherwise pointing direct would be the perfect answer.

I did presume it was CPU load hanging up the router, but the DrayTek GUI gives a "real time" CPU usage and, despite the odd spike to around 75% on the primary CPU, it stays fairly low. This may not be a true reflection, though.
 