
Zero trust is a mindset

ansible Jul 17, 2022

The title almost speaks for itself :) Lately there have been many articles about creating a zero-trust environment, and many companies are jumping in with their own solutions. But before spending a lot of money on a product, you should understand that many of these products only perform or cover a few tasks. It's not a one-size-fits-all solution.

From my own perspective, when I hear 'zero trust' I picture servers that sit next to each other in the same network and should only be able to communicate with each other on specific ports for specific applications. And even then, the application should only trust the machine if it identifies itself correctly.

The idea being conveyed here is that 'zero trust' should span every layer of the OSI model: your network, your links, your sessions, and so on, not just "is this user allowed to log in to this device".

Challenge

Today we will cover a small challenge that arose a few months ago. For this example we have groups of machines with three networks (plus a bonus one):

  • Management network (192.168.1.0/24)
  • Core network (192.168.2.0/24)
  • Internal network (192.168.56.0/24)
  • Vagrant network (10.0.2.0/24), a bonus network from my test environment.

The challenges are:

  • Management can only be connected to from 192.168.1.148 (the Ansible machine).
  • Core port 80 should be available (testing this from another network will require some form of routing).
  • Internal should allow its own network.
  • Vagrant should allow its own network.

Ansible

In this particular case Ansible and iptables will be used. Admittedly, firewalld can be used to achieve the same, which will be covered in a later post.

For the testing environment, two Vagrant machines will be used:

  • Repo1 (192.168.[1,2,56].100)
  • Node1 (192.168.[1,2,56].101)

The text below uses a few 'best practices' that were picked up along the way; these will be explained in a blog post still to be written. They are based on the method described by Yevgeniy Brikman in his book 'Terraform: Up & Running'.

Inventory

In the inventory/inventory file add the group and hosts:

[vagrant]
repo1 ansible_host=192.168.1.100 ansible_user=ansible
node1 ansible_host=192.168.1.101 ansible_user=ansible
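
Before going further, it does not hurt to confirm that both hosts are reachable with this inventory (the -i path is an assumption based on the layout used here):

# quick connectivity check; both hosts should answer with "pong"
ansible -i inventory/inventory vagrant -m ping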

Create the group_vars file inventory/group_vars/vagrant and add:

---
firewall_shield: true

management_rules:
  - '-A MANAGEMENT_NETWORK -p tcp --dport 22 -s 192.168.1.148 -j ACCEPT'
  - '-A MANAGEMENT_NETWORK -p tcp --dport 22 -j DROP'
  - '-A MANAGEMENT_NETWORK -p tcp --dport 123 -j ACCEPT'
  - '-A MANAGEMENT_NETWORK -p udp --dport 123 -j ACCEPT'

core_rules:
  - '-A CORE_NETWORK -p tcp --dport 80 -j ACCEPT'
  - '-A CORE_NETWORK -m pkttype --pkt-type multicast -m limit --limit=10/m -j LOG --log-prefix "core: "'
  - '-A CORE_NETWORK -m pkttype --pkt-type multicast -s 192.168.2.0/24 -j ACCEPT'
  - '-A CORE_NETWORK -p udp -d 224.0.0.0/4 -j DROP'

internal_rules:
  - '-A INTERNAL_NETWORK -p tcp --dport 5666 -j ACCEPT'

vagrant_rules:
  - '-A VAGRANT_NETWORK -p tcp --dport 22 -j ACCEPT'

A few extra rules have been added, just for fun. Note that each list maps onto a chain named after its prefix (management_rules feeds MANAGEMENT_NETWORK, and so on); the iptables template further down relies on this convention.

Define the networks as variables in inventory/group_vars/all/firewall:

---
management_network: 192.168.1.0/24
core_network: 192.168.2.0/24
internal_network: 192.168.56.0/24
vagrant_network: 10.0.2.0/24

We will call on these variables later to find the machines' interfaces.
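
One small dependency worth mentioning: the ipaddr filter used in the templates below relies on the Python netaddr library on the control node, so make sure it is installed (the pip/virtualenv form is just how my environment happens to be set up):

# on the control node, in the virtualenv that runs Ansible
pip install netaddr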

Roles

Use Ansible Galaxy to create a role:

ansible-galaxy init --offline roles/staging/firewall-shield

This will create the default role skeleton.
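
For reference, the skeleton looks roughly like this (README.md and the defaults, files, meta and vars directories are not used in this post):

roles/staging/firewall-shield/
├── README.md
├── defaults/main.yml
├── files/
├── handlers/main.yml
├── meta/main.yml
├── tasks/main.yml
├── templates/
├── tests/
│   ├── inventory
│   └── test.yml
└── vars/main.yml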

Edit roles/staging/firewall-shield/tasks/main.yml

---
# tasks file for roles/staging/firewall-shield

- block:
  - name: Install required packages
    yum:
      name: "{{ item }}"
      state: present
    loop:
      - python3-firewall
      - iptables
      - iptables-libs
      - iptables-services

  - name: We're using iptables, stop firewalld
    service:
      name: firewalld
      state: stopped
      enabled: no

  - name: Create (missing) directories
    file:
      path: "{{ item }}"
      owner: root
      group: root
      mode: 0755
      state: directory
    with_items:
      - /etc/rsyslog.d
      - /etc/ansible/facts.d   # needed for the interfaces.fact file placed below

  - name: Set interfaces facts
    template:
      src: templates/interfaces.j2
      dest: /etc/ansible/facts.d/interfaces.fact
      owner: root
      group: root
      mode: 0644

  - name: Run setup
    setup:
      gather_subset: all

  - name: Set interfaces_keys
    set_fact:
      interfaces_keys: "{{ hostvars[inventory_hostname]['ansible_local']['interfaces']['interfaces']|list }}"

  - name: Place rsyslog rule
    template:
      src: templates/iptables-rsyslog.j2
      dest: /etc/rsyslog.d/11-iptables.conf
      owner: root
      group: root
      mode: 0644
    notify: restart rsyslog
    
  - name: Iptables restore file
    template:
      src: "templates/{{ item }}.j2"
      dest: "/etc/sysconfig/{{ item }}"
      owner: root
      group: root
      mode: 0600
    with_items:
      - iptables
      - ip6tables
    notify: reload iptables

  when:
    - firewall_shield | default('false') | bool

This is already a wall of text; essentially, the following happens:

  • Required packages are being installed.
  • Firewalld service is stopped and disabled.
  • Missing directory is created.
  • A facts file is placed (!); it will hold interface names based on the networks they are in. This requires the facts to be reloaded (the setup task).
  • A variable (list) interfaces_keys is created (!) so that it can be iterated over.
  • An rsyslog rule is placed in the created directory /etc/rsyslog.d.
  • Iptables "restore" files are placed using a template (!); it is a sneaky method :)
  • A when condition: firewall_shield has to be true, otherwise nothing is done (see inventory/group_vars/vagrant).
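
After a run, the resulting local facts can also be inspected from the control node; the filter argument is just a convenient way to show the custom facts only:

ansible -i inventory/inventory vagrant -m setup -a "filter=ansible_local"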

Some notify handlers were referenced; edit the file roles/staging/firewall-shield/handlers/main.yml

---
# handlers file for roles/staging/firewall-shield

- name: reload iptables
  service:
    name: "{{ item }}"
    state: restarted
    enabled: yes
  loop:
    - iptables
    - ip6tables

- name: restart rsyslog
  service:
    name: rsyslog
    state: restarted
    enabled: yes
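
By default these handlers only run at the end of the play. If you want them to run at a specific point instead, Ansible's meta module can force a flush (not part of this role as written, just an option):

- name: Apply pending firewall/rsyslog handlers right away (optional)
  meta: flush_handlers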

Templates

There were four template files defined (this is where the magic happens):

Edit roles/staging/firewall-shield/templates/interfaces.j2

[interfaces]
{% for inf in ansible_interfaces %}
{% if hostvars[inventory_hostname]['ansible_' + inf]['ipv4'] is defined %}
{% if hostvars[inventory_hostname]['ansible_' + inf]['ipv4']['address']|ipaddr(management_network) %}
management_interface = {{ inf }}
management_network = {{ management_network }}
{% endif %}
{% if hostvars[inventory_hostname]['ansible_' + inf]['ipv4']['address']|ipaddr(core_network) %}
core_interface = {{ inf }}
core_network = {{ core_network }}
{% endif %}
{% if hostvars[inventory_hostname]['ansible_' + inf]['ipv4']['address']|ipaddr(internal_network) %}
internal_interface = {{ inf }}
internal_network = {{ internal_network }}
{% endif %}
{% if hostvars[inventory_hostname]['ansible_' + inf]['ipv4']['address']|ipaddr(vagrant_network) %}
vagrant_interface = {{ inf }}
vagrant_network = {{ vagrant_network }}
{% endif %}
{% endif %}
{% endfor %}

Looping over ansible_interfaces, an Ansible facts file is created with the interface name and network defined for every network the host is part of.

Viewed on a 'node', this is the content of /etc/ansible/facts.d/interfaces.fact:

[interfaces]
management_interface = eth1
management_network = 192.168.1.0/24
core_interface = eth2
core_network = 192.168.2.0/24
internal_interface = eth3
internal_network = 192.168.56.0/24
vagrant_interface = eth0
vagrant_network = 10.0.2.0/24

The contents will be used for the 'iptables' rules.

Edit roles/staging/firewall-shield/templates/iptables.j2. This is our firewall magic file; this is where it all comes together:

# {{ ansible_managed }}
*filter
:INPUT DROP [0:0]
:FORWARD ACCEPT [0:0]
:OUTPUT ACCEPT [0:0]
{% for item in interfaces_keys %}
{% if 'network' in item %}
:{{ item|upper }} - [0:0]
{% endif %}
{% endfor %}
-A INPUT -i lo -j ACCEPT
-A OUTPUT -o lo -j ACCEPT
{% for item in interfaces_keys %}
{% if 'network' in item %}
{% set nwdsg = item.split('_')[0] %}
-A INPUT -i {{ hostvars[inventory_hostname]['ansible_local']['interfaces']['interfaces'][nwdsg + '_interface'] }} -j {{ item|upper }}
-A {{ item|upper }} -m state --state ESTABLISHED,RELATED -j ACCEPT
-A {{ item|upper }} -p icmp -j ACCEPT
{% if hostvars[inventory_hostname][nwdsg + '_rules'] is defined %}
{% for line in hostvars[inventory_hostname][nwdsg +'_rules'] %}
{{ line }}
{% endfor %}
{% endif %}
-A {{ item|upper }} -s {{ hostvars[inventory_hostname]['ansible_local']['interfaces']['interfaces'][nwdsg + '_network'] }} -j ACCEPT
-A {{ item|upper }} -j LOG --log-prefix "{{ nwdsg }}: "
-A {{ item|upper }} -j DROP
{% endif %}
{% endfor %}
COMMIT
# Completed on {{ ansible_date_time.iso8601 }}

This is placed as the 'iptables restore' file and a reload command is then given through the handlers.
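
To make the template less abstract, this is roughly what the rendered /etc/sysconfig/iptables looks like on node1, trimmed to just the management chain (the other chains follow the same pattern):

# Ansible managed
*filter
:INPUT DROP [0:0]
:FORWARD ACCEPT [0:0]
:OUTPUT ACCEPT [0:0]
:MANAGEMENT_NETWORK - [0:0]
-A INPUT -i lo -j ACCEPT
-A OUTPUT -o lo -j ACCEPT
-A INPUT -i eth1 -j MANAGEMENT_NETWORK
-A MANAGEMENT_NETWORK -m state --state ESTABLISHED,RELATED -j ACCEPT
-A MANAGEMENT_NETWORK -p icmp -j ACCEPT
-A MANAGEMENT_NETWORK -p tcp --dport 22 -s 192.168.1.148 -j ACCEPT
-A MANAGEMENT_NETWORK -p tcp --dport 22 -j DROP
-A MANAGEMENT_NETWORK -p tcp --dport 123 -j ACCEPT
-A MANAGEMENT_NETWORK -p udp --dport 123 -j ACCEPT
-A MANAGEMENT_NETWORK -s 192.168.1.0/24 -j ACCEPT
-A MANAGEMENT_NETWORK -j LOG --log-prefix "management: "
-A MANAGEMENT_NETWORK -j DROP
COMMIT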

Edit roles/staging/firewall-shield/templates/ip6tables.j2, a much simpler template:

# {{ ansible_managed }}
# Not sure what to do with this - we don't use IPv6, but can't trust it if it is just left on
*filter
:INPUT DROP [0:0]
:FORWARD ACCEPT [0:0]
:OUTPUT ACCEPT [0:0]
-A INPUT -m state --state RELATED,ESTABLISHED -j ACCEPT
-A INPUT -p ipv6-icmp -j ACCEPT
-A INPUT -i lo -j ACCEPT
#-A INPUT -p tcp -m state --state NEW -m tcp --dport 22 -j ACCEPT
-A INPUT -d fe80::/64 -p udp -m udp --dport 546 -m state --state NEW -j ACCEPT
-A INPUT -j REJECT --reject-with icmp6-adm-prohibited
-A FORWARD -j REJECT --reject-with icmp6-adm-prohibited
COMMIT
# Completed on {{ ansible_date_time.iso8601 }}

And the last template, roles/staging/firewall-shield/templates/iptables-rsyslog.j2, for logging:

# For iptables logging
{% for item in interfaces_keys %}
{% if 'network' in item %}
{% set nwdsg = item.split('_')[0] %}
:msg, contains, "{{ nwdsg }}: " -/var/log/iptables-{{ nwdsg }}.log
& stop
{% endif %}
{% endfor %}
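
Rendered on node1 this results in one log file per network, roughly:

# For iptables logging
:msg, contains, "management: " -/var/log/iptables-management.log
& stop
:msg, contains, "core: " -/var/log/iptables-core.log
& stop
:msg, contains, "internal: " -/var/log/iptables-internal.log
& stop
:msg, contains, "vagrant: " -/var/log/iptables-vagrant.log
& stop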

Lastly, edit roles/staging/firewall-shield/tests/test.yml:

---
- hosts: vagrant
  remote_user: ansible
  gather_facts: yes
  roles:
    - roles/staging/firewall-shield
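
A quick syntax check before the real run never hurts:

ansible-playbook roles/staging/firewall-shield/tests/test.yml --syntax-check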

So this is all for Ansible. Time to put it to the test.

Ansible test run

To run this role as a playbook:

ansible-playbook roles/staging/firewall-shield/tests/test.yml
PLAY RECAP **********************************************
node1                      : ok=15   changed=5    unreachable=0    failed=0    skipped=8    rescued=0    ignored=0
repo1                      : ok=15   changed=5    unreachable=0    failed=0    skipped=8    rescued=0    ignored=0

Verification

Now all that is left is to verify that the 'nodes' behave as expected.
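
A first sanity check directly on a node shows the per-network chains hanging off INPUT (the chain order may differ depending on the interface facts):

[root@node-1 vagrant]# iptables -S INPUT
-P INPUT DROP
-A INPUT -i lo -j ACCEPT
-A INPUT -i eth1 -j MANAGEMENT_NETWORK
-A INPUT -i eth2 -j CORE_NETWORK
-A INPUT -i eth3 -j INTERNAL_NETWORK
-A INPUT -i eth0 -j VAGRANT_NETWORK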

From a different machine in the 'management network' an ssh session should not be allowed:

riccardo@docker1 ~ $ ip add |grep 192
2: ens192: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP group default qlen 1000
    inet 192.168.1.43/24 brd 192.168.1.255 scope global noprefixroute ens192

riccardo@docker1 ~ $ ssh -o ConnectTimeout=3 192.168.1.101
ssh: connect to host 192.168.1.101 port 22: Connection timed out

riccardo@docker1 ~ $ ssh -o ConnectTimeout=3 192.168.1.100
ssh: connect to host 192.168.1.100 port 22: Connection timed out

From the ansible management machine this same connection should be allowed:

(venv) ansible@amane ~/ansible $ ip addr | grep 192 ; ssh -o ConnectTimeout=3 192.168.1.100 'uptime'
2: ens192: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP group default qlen 1000
    inet 192.168.1.148/24 brd 192.168.1.255 scope global noprefixroute ens192

 08:12:50 up  1:42,  1 user,  load average: 0.00, 0.00, 0.00

(venv) ansible@amane ~/ansible $ ip addr | grep 192 ; ssh -o ConnectTimeout=3 192.168.1.101 'uptime'
2: ens192: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP group default qlen 1000
    inet 192.168.1.148/24 brd 192.168.1.255 scope global noprefixroute ens192

 08:12:59 up  1:36,  1 user,  load average: 0.00, 0.02, 0.00

Port 80 should be allowed from everywhere. This is a difficult one to test, but let's add a route.

Core Pre-test:

# on the node machine
[root@node-1 vagrant]# nc -v -l 192.168.2.101 80
Ncat: Version 7.70 ( https://nmap.org/ncat )
Ncat: Listening on 192.168.2.101:80
riccardo@docker1 ~ $ nc -v 192.168.2.101 80
192.168.2.101: inverse host lookup failed:
(UNKNOWN) [192.168.2.101] 80 (http) : Connection timed out

Obviously this happens because there is no route back to the client's network.

Core Post-test:

Let's add a route back on the node. For testing purposes :)

[root@node-1 vagrant]# ip route show
default via 10.0.2.2 dev eth0 proto dhcp metric 100
10.0.2.0/24 dev eth0 proto kernel scope link src 10.0.2.15 metric 100
192.168.1.0/24 dev eth1 proto kernel scope link src 192.168.1.101 metric 101
192.168.2.0/24 dev eth2 proto kernel scope link src 192.168.2.101 metric 102
192.168.56.0/24 dev eth3 proto kernel scope link src 192.168.56.101 metric 103

[root@node-1 vagrant]# ip route add 192.168.1.43/32 via 192.168.2.1 dev eth2

[root@node-1 vagrant]# ip route show
default via 10.0.2.2 dev eth0 proto dhcp metric 100
10.0.2.0/24 dev eth0 proto kernel scope link src 10.0.2.15 metric 100
192.168.1.0/24 dev eth1 proto kernel scope link src 192.168.1.101 metric 101
192.168.1.43 via 192.168.2.1 dev eth2
192.168.2.0/24 dev eth2 proto kernel scope link src 192.168.2.101 metric 102
192.168.56.0/24 dev eth3 proto kernel scope link src 192.168.56.101 metric 103

The route is added, nc is listening.

riccardo@docker1 ~ $ nc -v 192.168.2.101 80
192.168.2.101: inverse host lookup failed:
(UNKNOWN) [192.168.2.101] 80 (http) open
blah # text i entered
[root@node-1 vagrant]# nc -v -l 192.168.2.101 80
Ncat: Version 7.70 ( https://nmap.org/ncat )
Ncat: Listening on 192.168.2.101:80
Ncat: Connection from 192.168.1.43.
Ncat: Connection from 192.168.1.43:49632.
blah # text received

Internal network

Lastly, let's check out the internal network; this is the INTERNAL_NETWORK chain from iptables -vnL:

Chain INTERNAL_NETWORK (1 references)
 pkts bytes target     prot opt in     out     source               destination
    0     0 ACCEPT     all  --  *      *       0.0.0.0/0            0.0.0.0/0            state RELATED,ESTABLISHED
    0     0 ACCEPT     icmp --  *      *       0.0.0.0/0            0.0.0.0/0
    0     0 ACCEPT     tcp  --  *      *       0.0.0.0/0            0.0.0.0/0            tcp dpt:5666
    0     0 ACCEPT     all  --  *      *       192.168.56.0/24      0.0.0.0/0
    0     0 LOG        all  --  *      *       0.0.0.0/0            0.0.0.0/0            LOG flags 0 level 4 prefix "internal: "
    0     0 DROP       all  --  *      *       0.0.0.0/0            0.0.0.0/0

  • Everything from 192.168.56.0/24 is accepted, followed by a log and a drop of everything else.
  • The RELATED,ESTABLISHED rule ensures the node can make its own connections out and keeps open sessions alive (I might be interpreting this wrong).
  • Port 5666 is for NRPE monitoring.
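
The internal chain can be exercised the same way as the core test, for example from repo1 towards node1 on the NRPE port. Nothing actually listens on 5666 yet, so an nc listener stands in for NRPE here (a hypothetical check, mirroring the earlier tests):

# on node1, pretend to be NRPE
[root@node-1 vagrant]# nc -v -l 192.168.56.101 5666

# from repo1, which sits in the same internal network
[root@repo-1 vagrant]# nc -v 192.168.56.101 5666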

Fin

That's it. Now, for each group/cluster of machines, you can apply these rules in an automated way, creating a nice 'shield' around your machines instead of relying solely on the firewalling done at the network level.

"Castle-and-moat" is a network security model in which no one outside the network is able to access data on the inside, but everyone inside the network can.

At least that is one less concern :)


Riccardo B.

Riccardo is an all-round Linux Systems Engineer with over 20 years of experience and a knack for automation. Favoring acronyms like NAO, IaC, SRE and more. Also hardly ever writes in third person :)