High Performance ATS Container In A Multi-Tenant Environment
Note - Below are findings I documented for an internal POC related to ATS (Apache Traffic Server) and Docker containers. The main focus is launching a high performance ATS container on the same host as other high performance applications, or applications that require direct access to CPU/memory/PCI devices/etc.
Note that orchestration and boot-time configuration are not covered here; I only cover the OS/host commands needed to pass resources to the container.
These commands, or similar ones, can be incorporated into an orchestration engine of choice.
All of these commands are run on the Docker host.
The steps below reflect my lab machine, but they translate easily to any other hosting environment.
+++++
Objective - To create a cache instance using the Docker container framework while ensuring that all CPU/memory/network resources are NUMA aligned.
The resulting container will have the following resources assigned/pinned:
8 CPUs (cores)
32G memory
Bonded SR-IOV VFs within the corresponding NUMA node
An 18G ramdisk per container to serve as the cache disk (a sketch of one way to create it follows this list)
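The host-side ramdisk creation is not part of the commands below; here is a minimal sketch of one way to provide /dev/ram0 using the Linux brd module (the module parameters are my own example values, not from the POC):

# load the brd driver with one 18G ramdisk device (rd_size is in KiB: 18 * 1024 * 1024)
modprobe brd rd_nr=1 rd_size=18874368
# confirm the device exists and has the expected size
lsblk /dev/ram0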
Host information:
HOST OS - Ubuntu 16.04.3 LTS (Xenial Xerus)
Docker Version - Docker CE 17.x
Steps:
Install the Docker network plugin:
https://github.com/Mellanox/docker-passthrough-plugin
Start the Docker plugin to support NIC passthrough:
docker run -v /run/docker/plugins:/run/docker/plugins --net=host --privileged mellanox/passthrough-plugin
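# (my addition) the plugin registers a UNIX socket under /run/docker/plugins; a quick
# sanity check that it came up (the exact socket name depends on the plugin):
ls /run/docker/plugins/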
This plugin also supports carving out SR-IOV VFs on the fly and assigning the VFs to the container.
For this document I will use the plugin to pass through a host-created bond interface to each container. Each container will have its own bond interface.
NUMA 0 Steps
(modify the system paths below to reflect your host architecture)
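Before assigning anything, it helps to map out which CPUs, memory, and NICs belong to each NUMA node. These discovery commands are my addition (the PCI address is the one used below):

lscpu | grep -i numa
numactl --hardware
cat /sys/bus/pci/devices/0000:03:00.0/numa_node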
# enable VFs on numa0 PFs, ports 0 and 1. I used 8 VFs per PF port, but any supported value can be used here.
echo 8 > /sys/devices/pci0000:00/0000:00:02.0/0000:03:00.0/sriov_numvfs
echo 8 > /sys/devices/pci0000:00/0000:00:02.0/0000:03:00.1/sriov_numvfs
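# (my addition) one way to confirm the VFs were created:
ip link show enp3s0f0 | grep vf
lspci | grep -i 'virtual function'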
# create bond interface for the eventual container
ip link add dev bndn0c0 type bond
ip link set down dev bndn0c0
# set bond mode and hashing policy
echo 2 > /sys/devices/virtual/net/bndn0c0/bonding/mode
echo 'layer2+3' > /sys/devices/virtual/net/bndn0c0/bonding/xmit_hash_policy
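# (my addition) mode 2 is balance-xor; reading the file back returns the symbolic
# name, which is a quick verification:
cat /sys/devices/virtual/net/bndn0c0/bonding/mode    # expect: balance-xor 2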
# select VFs to form the bond. set spoofchk off and set the MACs to be the same, since we are using link aggregation. I used VF 3 on each PF.
# set the VF netdevs down; this is needed to add them to the bond
ip link set enp3s0f0 vf 3 spoofchk off mac 32:fe:d8:7a:83:93
ip link set enp3s0f1 vf 3 spoofchk off mac 32:fe:d8:7a:83:93
ip link set down enp3s16f6
ip link set down enp3s16f7
ip link set enp3s16f6 master bndn0c0
ip link set enp3s16f7 master bndn0c0
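# (my addition) sanity check: both VF netdevs should now be listed as slaves
cat /proc/net/bonding/bndn0c0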
# add bond interface to the vlan of choice, in this case 1400.
vconfig add bndn0c0 1400
# bring the bond interfaces up
ip link set bndn0c0 up
ip link set bndn0c0.1400 up
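# (my addition) vconfig is deprecated on newer distributions; the iproute2
# equivalent of the vlan step above would be:
ip link add link bndn0c0 name bndn0c0.1400 type vlan id 1400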
# create a cgroup for CPU and memory pinning
# this will reserve the cpus and memory for this cgroup. multiple containers
# could use this cgroup; however, we will only assign this cgroup to a single container
cgcreate -g cpuset,memory:ats-numa0-cnt0-cgroup
echo 32,34,36,38,40,42,44,46 > /sys/fs/cgroup/cpuset/ats-numa0-cnt0-cgroup/cpuset.cpus
echo 0 > /sys/fs/cgroup/cpuset/ats-numa0-cnt0-cgroup/cpuset.mems
echo 1 > /sys/fs/cgroup/cpuset/ats-numa0-cnt0-cgroup/cpuset.mem_hardwall
echo 32G > /sys/fs/cgroup/memory/ats-numa0-cnt0-cgroup/memory.limit_in_bytes
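# (my addition) verify the reservation before handing the cgroup to Docker
# (cgget is part of the cgroup-tools package):
cgget -g cpuset:ats-numa0-cnt0-cgroup
cat /sys/fs/cgroup/memory/ats-numa0-cnt0-cgroup/memory.limit_in_bytes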
# create docker network, which maps to the just-created bond interface on vlan 1400.
docker network create -d passthrough --ip-range 192.168.1.228/30 --gateway=192.168.1.193 --subnet=192.168.1.192/26 -o netdevice=bndn0c0.1400 -o mode=passthrough bndn0c0-1400
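# (my addition) confirm the network exists with the expected subnet and options:
docker network inspect bndn0c0-1400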
# create docker container which attaches to the above docker network. I started the
# container in privileged mode (being lazy). We can set permissions which limit access
# accordingly; we would not launch a production container in privileged mode.
docker run --dns=192.168.1.104 --ip=192.168.1.229 --privileged --rm --cgroup-parent=/ats-numa0-cnt0-cgroup/ --name=ats-numa0-cnt0 --net=bndn0c0-1400 --device=/dev/ram0:/dev/xram0 -it 829dabfd1184 /bin/bash
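# (my addition) once the container is up, a quick check that the cpuset pinning took
# effect; numactl --show assumes numactl is installed in the image:
docker exec ats-numa0-cnt0 grep Cpus_allowed_list /proc/self/status
docker exec ats-numa0-cnt0 numactl --show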
# once the container is started, it can contact a Chef server or some other configuration
# manager to pull configurations. I built this container image manually and set static
# values within the image. Configuration management is no different than for a VM or bare metal.