Sporadic external DNS issues on domain machines

Discussion in 'Servers and Enterprise Solutions' started by bledd, 13 Jul 2020.

  1. bledd

    Don

    Joined: 21 Oct 2002

    Posts: 46,688

    Location: Parts Unknown

    Have an odd issue at work: between roughly 9am and 5pm, Monday to Friday, we get sporadic lookup failures for external DNS. It will be fine for a couple of minutes, then very patchy for a while, causing chaos as you'd imagine.


    One minute, nslookup to bbc.co.uk will work, next minute it won't resolve the IPs.. same for all websites.



    During this time, we don't lose pings to the DNS server or firewall and pings to 8.8.8.8 work perfectly etc.

    DC is 2012 R2 with DNS running on it, same spec for secondary DC / DNS. CPU/RAM usage is fine on them. An nslookup for any local machine name gets an instant reply.

    This wasn't an issue in March, I've been on furlough and came back to this problem. It's my problem to sort out.

    Between 100-200 machines, hundreds of switches.

    Changes since I've been off: Multiple Windows updates, Core switch replaced, new Linux based phone system installed, many desk moves :/

    I want to blame the firewall, but when this issue occurs, DNS on the 'guest' wifi which goes through the same network on a different VLAN works perfectly fine.

    I've set nslookup to test around 10 websites per minute, and it logs the failures into a file. Seems to have zero issues overnight, but plays up from about 9am (coincides with people arriving at work..). The offices are spread wide over the business, so it's hard to visually pinpoint.
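    For anyone wanting to reproduce that kind of per-minute test, here is a rough Python sketch of the same idea. It is not the script used above: the host list, the log file name and the function names are all made up for illustration. It also goes through the operating system's resolver, so on Windows the local DNS Client cache may hide some failures that a direct nslookup against the server would show.

```python
import socket
import time
from datetime import datetime

# Hypothetical list of names to test; swap in whichever sites matter to you.
HOSTS = ["bbc.co.uk", "example.com"]


def resolves(host, resolver=socket.gethostbyname):
    """Return True if `host` resolves, False if the lookup raises OSError."""
    try:
        resolver(host)
        return True
    except OSError:  # socket.gaierror is a subclass of OSError
        return False


def check_once(hosts, resolver=socket.gethostbyname, now=datetime.now):
    """Return one timestamped log line per host that failed to resolve."""
    stamp = now().isoformat(timespec="seconds")
    return [f"{stamp} FAIL {h}" for h in hosts if not resolves(h, resolver)]


def run(hosts, logfile="dns_failures.log", interval=60):
    """Check all hosts every `interval` seconds and append failures to `logfile`."""
    while True:
        lines = check_once(hosts)
        if lines:
            with open(logfile, "a") as fh:
                fh.write("\n".join(lines) + "\n")
        time.sleep(interval)
```

    Calling `run(HOSTS)` starts the loop. Because every failure is timestamped, the log can later be lined up against firewall or DNS server logs for the same minute.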

    -

    It's not as full on as a broadcast storm, so it'll be hard to identify immediately if we fix the issue. So far I've tried looking for rogue DHCP servers (in case someone has brought one in for some reason), none found.

    Also tried disconnecting specific switches dotted around the site to identify a problematic area.. to no avail. I have a handful more to try.

    Any pointers on what to try next?

    I'm thinking turn on full firewall logs, full DNS server logs, Wireshark on DNS server. In all my years of working in IT, I've never seen a DNS issue like this before.
     
  2. Armageus

    Don

    Joined: 19 May 2012

    Posts: 11,321

    Location: Spalding, Lincolnshire

    Is internal DNS affected at the same time?

    Have you checked spanning tree on the core switch? (E.g. that it actually is set to the correct priority)
     
  3. bledd

    Don

    Joined: 21 Oct 2002

    Posts: 46,688

    Location: Parts Unknown

    Thanks for the reply :)

    Internal DNS remains perfectly operational (using nslookup to query the DNS servers)

    I'll check STP tomorrow.
     
  4. Throrik

    Wise Guy

    Joined: 15 Sep 2009

    Posts: 1,503

    Location: Manchester

    Check STP as mentioned, but can you also ping external DNS reliably from each VLAN on the core switch? Are there any dropped-packet warnings on the switch (you may need to enable debugging if you don't syslog it)? I would check the same on the firewall.
     
  5. bledd

    Don

    Joined: 21 Oct 2002

    Posts: 46,688

    Location: Parts Unknown

    Pinging to 1.1.1.1 or 8.8.8.8 doesn't drop a single packet.

    Good time to set up a syslog server too I guess :)
     
  6. Toughnoodle

    Gangster

    Joined: 13 Oct 2009

    Posts: 222

    Location: Cumbria

    You're having problems with the DNS forwarders configured on the DCs running DNS? I would try a process of elimination if possible, e.g. try nslookup on a laptop directly connected to your external network on a public IP, i.e. bypassing your firewall temporarily. It's not recommended to leave it like this, but you could try dropping it down to one forwarder at a time to see if it's a particular problem with just one of the forwarders. Bit of a long shot, but I'd make sure all of the DNS-related Windows services are running too.
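    One way to test a single forwarder at a time without touching the DC config is to send a DNS query straight to it over UDP port 53. The sketch below does that with only the Python standard library. It is an illustration, not a replacement for nslookup: the function names are made up, it only asks for A records, and it ignores truncated replies and TCP fallback. The thread doesn't say which forwarders are configured, so any addresses passed to it are assumptions. A query like this also exercises UDP/53 rather than ICMP, which is why it can fail while pings keep working.

```python
import socket
import struct

QUERY_ID = 0x1234  # arbitrary ID, used to match the reply to our query


def build_query(name, query_id=QUERY_ID):
    """Build a minimal DNS query for an A record (RFC 1035 wire format)."""
    # Header: ID, flags (0x0100 = recursion desired), QDCOUNT=1, rest zero
    header = struct.pack("!HHHHHH", query_id, 0x0100, 1, 0, 0, 0)
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in name.rstrip(".").split(".")
    ) + b"\x00"
    # QTYPE=1 (A), QCLASS=1 (IN)
    return header + qname + struct.pack("!HH", 1, 1)


def parse_header(reply):
    """Return (id, rcode, answer_count) from the 12-byte DNS reply header."""
    qid, flags, _qd, ancount, _ns, _ar = struct.unpack("!HHHHHH", reply[:12])
    return qid, flags & 0x000F, ancount


def query_forwarder(server, name, timeout=2.0):
    """Query `server` directly; return (rcode, answer_count), or None on failure."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        try:
            sock.sendto(build_query(name), (server, 53))
            reply, _ = sock.recvfrom(512)
        except OSError:  # includes timeouts
            return None
    if len(reply) < 12:
        return None
    qid, rcode, ancount = parse_header(reply)
    if qid != QUERY_ID:
        return None
    return rcode, ancount
```

    For example, `query_forwarder("8.8.8.8", "bbc.co.uk")` returns `None` if nothing comes back within two seconds. Running it in a loop against each forwarder in turn, from the DC itself, should show whether the failures follow one forwarder or all of them.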