linux networking question (reformatted for readability)
(I apologize for the previous message, which was completely unreadable.)
Dear W-LUGgers,
I had some strange network-related behavior that left me baffled, and I wanted to hear some theories from more experienced network admins.
The network layout is pretty simple: external traffic passes through a bridged firewall (OpenBSD, my choice) into an OS X server (not my choice), which handles NAT/DHCP/DNS et al., and all the office computers are DHCP clients of the OS X server. The trouble machines are Linux (Ubuntu) desktop clients.
The scenario is the following. I was doing work on the bridged firewall (OpenBSD) and somehow caused it to kernel panic, oopsie! So the bridged firewall went down and no one had internet access, D'oh! I quickly bounced the machine. It came back online and clients were able to access the internet, huzzah! But the strange behavior was in two Linux machines within the network that were not able to access some external IPs. My Linux desktop could not access 8.8.8.8. When pinging, I could see an ARP who-has request originating from my machine, and I could see the packet come into the OS X server, but the request always went unanswered. The same behavior happened on another Linux desktop, but with 192.48.178.134 (sgi.com). My desktop could ping sgi.com. So each Linux desktop had *different* unreachable IPs. The rest of the internet was reachable. I tried clearing the ARP cache on the OS X server, then clearing the NAT state tables, then rebooting the OS X server; none of these solved the problem. I finally renewed the DHCP lease on my Linux desktop machine, and that allowed the ping to complete.
What would cause this behavior? Could it be a stale ARP cache on the Linux machine? (I *should* have tried to clear the ARP cache on the Linux machine before I renewed the DHCP lease, but I didn’t think of that until after the fact.) The Linux machine was sending out ARP who-has requests, so would a stale cache even matter? Why would no one answer the ARP requests? I am not an expert network admin (it’s just a side job, since the company is only 15 people). I don’t expect to get a resolution, but I’m interested to hear any theories.
Thanks and cheers, - brad
PS. As a Worcester transplant to Boston, I am really jealous that I don’t live closer to attend meetings. The topics of late sound outstanding.
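On the question of whether a stale ARP cache would matter: a host sends an ARP who-has only for the next hop of a packet, which is the destination itself when the destination is on the local subnet and the default gateway otherwise. The sketch below illustrates that decision in Python. The subnet 192.168.1.0/24, the gateway 192.168.1.1, and the function name arp_target are assumptions made for illustration, not details of the network described above, and the model covers only a single connected subnet plus a default route.

```python
import ipaddress


def arp_target(dest_ip: str, local_net: str, gateway_ip: str) -> str:
    """Return the IP address a host would send an ARP who-has for.

    Hypothetical helper: assumes one connected subnet and one default
    gateway, with no extra routes, redirects, or proxy ARP involved.
    """
    dest = ipaddress.ip_address(dest_ip)
    net = ipaddress.ip_network(local_net)
    # On-link destinations are resolved directly; anything else is sent
    # via the default gateway, so the ARP request asks for the gateway.
    return dest_ip if dest in net else gateway_ip


on_link = arp_target("192.168.1.20", "192.168.1.0/24", "192.168.1.1")
off_link = arp_target("8.8.8.8", "192.168.1.0/24", "192.168.1.1")
print(on_link)   # 192.168.1.20
print(off_link)  # 192.168.1.1
```

Under these assumptions, a who-has that names an external address such as 8.8.8.8 directly, rather than the gateway, would hint at the client's routing table rather than its ARP cache, and renewing a DHCP lease typically reinstalls the client's routes. Which address the observed who-has named is not stated above, so this is only one possibility.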
This is just a rough guess, but did you ever check the state tables in your OpenBSD firewall? In your setup you have state distributed among your clients, your firewall, and your NAT gateway, and unless all three are in sync you can easily get the kind of possessed network behavior you describe.
On February 7, 2015 3:20:15 PM EST, Brad <bkn@ithryn.net> wrote:
-- Sent from my Android device with K-9 Mail. Please excuse my brevity.
On Sat, Feb 07, 2015 at 04:40:36PM -0500, Frank Sweetser wrote:
This is just a rough guess, but did you ever check on the state tables in your openbsd firewall?
I did not check the state tables in the OpenBSD firewall. I figured that since it had just been forcefully rebooted, the state tables would be coherent.
In your setup you have state distributed amongst your clients, your firewall, and your NAT gateway, and unless all three are in sync you can easily get the kind of possessed network behavior you describe.
To properly exorcise, should I pour holy water over the routers? I always figured some kind of black magic was involved in routing. Is a bridged firewall not a good idea in practice because it adds another layer between the internet and the intranet, and could get out of sync?
Thanks for the ideas, - brad
On 2/7/2015 5:14 PM, Brad wrote:
On Sat, Feb 07, 2015 at 04:40:36PM -0500, Frank Sweetser wrote:
This is just a rough guess, but did you ever check on the state tables in your openbsd firewall?
I did not check the state tables in the OpenBSD firewall. I figured that since it had just been forcefully rebooted, the state tables would be coherent.
It's not just a simple question of the firewall state table being internally consistent; it also has to be consistent with both the NAT gateway and what the clients are expecting. When one of your internal clients makes an outgoing connection, both the firewall and NAT tables have to have entries added to accommodate the return traffic - a translation entry in the NAT table, and an allow entry in the firewall's state table. If either of those isn't correct, the return traffic won't make it back to the client.
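The dependency described above can be sketched as a toy model: a reply reaches the client only if the firewall still holds a matching state entry and the NAT gateway still holds a matching translation. This is an illustrative simplification, not how pf or the OS X NAT service is actually implemented; the class names, the addresses 203.0.113.10 and 192.168.1.50, and the port numbers are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class NatGateway:
    """Toy NAT: maps each outgoing flow to a port on the public address."""
    public_ip: str
    next_port: int = 40000
    # (public_port, remote_ip, remote_port) -> (client_ip, client_port)
    table: dict = field(default_factory=dict)

    def outbound(self, client_ip, client_port, remote_ip, remote_port):
        public_port = self.next_port
        self.next_port += 1
        self.table[(public_port, remote_ip, remote_port)] = (client_ip, client_port)
        return (self.public_ip, public_port, remote_ip, remote_port)

    def inbound(self, remote_ip, remote_port, public_port):
        # Returns the internal client for a reply, or None if no translation.
        return self.table.get((public_port, remote_ip, remote_port))


@dataclass
class StatefulFirewall:
    """Toy firewall: replies pass only if they mirror a recorded flow."""
    states: set = field(default_factory=set)

    def outbound(self, src_ip, src_port, dst_ip, dst_port):
        self.states.add((src_ip, src_port, dst_ip, dst_port))

    def allows_reply(self, src_ip, src_port, dst_ip, dst_port):
        return (dst_ip, dst_port, src_ip, src_port) in self.states


def reply_reaches_client(nat, fw, remote_ip, remote_port, public_ip, public_port):
    """Return (client_ip, client_port) if the reply gets through, else None."""
    if not fw.allows_reply(remote_ip, remote_port, public_ip, public_port):
        return None  # dropped at the firewall: no state for this flow
    return nat.inbound(remote_ip, remote_port, public_port)


nat = NatGateway(public_ip="203.0.113.10")
fw = StatefulFirewall()

# Client opens a flow: NAT translates it, then the firewall records it.
translated = nat.outbound("192.168.1.50", 51000, "8.8.8.8", 53)
fw.outbound(*translated)
before = reply_reaches_client(nat, fw, "8.8.8.8", 53, *translated[:2])

# The firewall reboots and loses its state; the NAT entry survives.
fw.states.clear()
after = reply_reaches_client(nat, fw, "8.8.8.8", 53, *translated[:2])

print(before)  # ('192.168.1.50', 51000)
print(after)   # None
```

In this model, the reply is lost after only one of the two tables is cleared, even though the other table is still internally consistent.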
In your setup you have state distributed amongst your clients, your firewall, and your NAT gateway, and unless all three are in sync you can easily get the kind of possessed network behavior you describe.
To properly exorcise, should I pour holy water over the routers? I always figured some kind of black magic was involved in routing.
Simple routing, not so much - it's just packet go in, packet go out. It's when you throw state tables in that things start to get fragile. When you have strings of disconnected state tables that require consistency like this, it's not uncommon that you end up having to always reboot them together.
Is a bridged firewall not a good idea in practice because it adds another layer between the internet and the intranet, and could get out of sync?
The bridged portion itself isn't inherently much worse than doing a traditional layer 3 visible firewall, so long as you have your rules set up correctly to account for things like passing ARP traffic. Your typical all-in-one box is a little less likely to display this kind of weirdness only because the firewall and NAT are combined on a single box, so a reboot will clear both out simultaneously.
I would seriously suggest you check out pfSense. It's a FreeBSD-based distribution designed to operate as a SOHO-class firewall, with all of the usual NAT, DHCP, and other expected goodies. I think it's a pretty safe bet that it'll be a better router and firewall than OS X...
--
Frank Sweetser fs at wpi.edu    |  For every problem, there is a solution that
Manager of Network Operations   |  is simple, elegant, and wrong.
Worcester Polytechnic Institute |    - HL Mencken