1. Why Does OpenShift Need Router and Route?
As the names suggest, a Router is a routing device and a Route is a route configured on that router. These two OpenShift concepts address the need to access services from outside the cluster (i.e., from locations other than the cluster nodes). I don't understand why OpenShift replaced Kubernetes' Ingress with the Router concept; I think Ingress is the more appropriate name.
A simple schematic diagram of accessing applications in pods from the outside via the router and from the inside via the service is shown below:
In the diagram above, three pods of an application are located on node1, node2, and node3. OpenShift has three layers of IP addresses:
- The pod's own IP address, comparable to a fixed IP of a virtual machine in OpenStack. It is only meaningful within the cluster.
- The service's IP address. A service typically has a ClusterIP, which is also a cluster-internal IP address.
- The application's external IP address, comparable to a floating IP in OpenStack, or to an IDC IP that has a NAT mapping to the floating IP.
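These three layers can be inspected directly with the oc CLI. Below is a minimal sketch; the project name sit and service name jenkins are only illustrative assumptions:

# Layer 1: pod IPs, routable only inside the cluster
oc get pods -n sit -o wide
# Layer 2: the service's cluster-internal ClusterIP
oc get svc jenkins -n sit
# Layer 3: the external hostname that the router answers for
oc get route -n sit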
Therefore, to access applications in pods from outside the cluster, there are essentially two methods:
- One is to use a proxy that maps the external IP address to the backend pod IP addresses. This is the idea behind OpenShift's router/route. The router in OpenShift is a cluster service that runs on specific nodes (usually infrastructure nodes) and is created and managed by the cluster administrator. It can have multiple replicas (pods). A router can hold multiple routes, each of which uses the domain name of an incoming external HTTP request to find its backend pod list and forward the packets. In essence, this exposes the applications in the pods under external domain names, letting users access the applications by domain name. This is essentially a layer-7 load balancer. OpenShift uses HAProxy by default to implement it, but other implementations, such as F5, are also supported.
- The other is to expose the service itself directly outside the cluster. This method will be explained in detail in the article on 'Service'.
2. How Does OpenShift Use HAProxy to Implement Router and Route?
2.1 Router Deployment
When an OpenShift cluster is deployed with Ansible using the default configuration, an HAProxy pod runs on the infra node in host networking mode, listening on ports 80 and 443 on all network interfaces.
[root@infra-node3 cloud-user]# netstat -lntp | grep haproxy
tcp 0 0 127.0.0.1:10443 0.0.0.0:* LISTEN 583/haproxy
tcp 0 0 127.0.0.1:10444 0.0.0.0:* LISTEN 583/haproxy
tcp 0 0 0.0.0.0:80 0.0.0.0:* LISTEN 583/haproxy
tcp 0 0 0.0.0.0:443 0.0.0.0:* LISTEN 583/haproxy
Among these, ports 10443 and 10444 on 127.0.0.1 are used by HAProxy itself; further explanation follows below.
Therefore, each infra node can run at most one HAProxy pod, since these host ports can be bound only once. If the scheduler cannot find a suitable node, scheduling of the router service fails:
0/7 nodes are available: 2 node(s) didn't have free ports for the requested pod ports, 5 node(s) didn't match node selector
The OpenShift HAProxy Router supports two deployment methods:
- One is the common single-router deployment, which has one or more instances (pods) distributed across multiple nodes and handles external access for services deployed across the entire cluster.
- The other is a sharded deployment. In this case there are multiple router services, each responsible for a specified set of projects, with labels mapping projects to routers. Sharding was proposed as a solution to the performance limits of a single router.
OpenShift provides the oc adm router command to create router services.
Creating a router:
[root@master1 cloud-user]# oc adm router router2 --replicas=1 --service-account=router
info: password for stats user admin has been set to J3YyPjlbqf
--> Creating router router2 ...
    warning: serviceaccounts "router" already exists
    clusterrolebinding.authorization.openshift.io "router-router2-role" created
    deploymentconfig.apps.openshift.io "router2" created
    service "router2" created
--> Success
For detailed deployment methods, please refer to the official documentation: https://docs.openshift.com/container-platform/3.11/install_config/router/default_haproxy_router.html
2.2 HAProxy Process in Router Pod
Within each pod of the Router service, the openshift-router process starts a haproxy process:
UID        PID    PPID  C STIME TTY  TIME     CMD
1000000+   1      0     0 Nov21 ?    00:14:27 /usr/bin/openshift-router
1000000+   16011  1     0 12:42 ?    00:00:00 /usr/sbin/haproxy -f /var/lib/haproxy/conf/haproxy.config -p /var/lib/haproxy/run/haproxy.pid -x /var/lib/haproxy/run/haproxy.sock -sf 16004
Viewing the configuration file used by haproxy (excerpt):
global
maxconn 20000
daemon
ca-base /etc/ssl
crt-base /etc/ssl
...
defaults
maxconn 20000
# Add x-forwarded-for header.
# server openshift_backend 127.0.0.1:8080
errorfile 503 /var/lib/haproxy/conf/error-page-503.http
...
timeout http-request 10s
timeout http-keep-alive 300s
# Long timeout for WebSocket connections.
timeout tunnel 1h
frontend public
bind :80
mode http
tcp-request inspect-delay 5s
tcp-request content accept if HTTP
monitor-uri /_______internal_router_healthz
# Strip off Proxy headers to prevent HTTpoxy (https://httpoxy.org/)
http-request del-header Proxy
# DNS labels are case insensitive (RFC 4343), we need to convert the hostname into lowercase
# before matching, or any requests containing uppercase characters will never match.
http-request set-header Host %[req.hdr(Host),lower]
# check if we need to redirect/force using https.
acl secure_redirect base,map_reg(/var/lib/haproxy/conf/os_route_http_redirect.map) -m found
redirect scheme https if secure_redirect
use_backend %[base,map_reg(/var/lib/haproxy/conf/os_http_be.map)]
default_backend openshift_default
# public ssl accepts all connections and isn't checking certificates yet, certificates to use will be
# determined by the next backend in the chain which may be an app backend (passthrough termination) or a backend
# that terminates encryption in this router (edge)
frontend public_ssl
bind :443
tcp-request inspect-delay 5s
tcp-request content accept if { req_ssl_hello_type 1 }
# if the connection is SNI and the route is a passthrough don't use the termination backend, just use the tcp backend
# for the SNI case, we also need to compare it in case-insensitive mode (by converting it to lowercase) as RFC 4343 says
acl sni req.ssl_sni -m found
acl sni_passthrough req.ssl_sni,lower,map_reg(/var/lib/haproxy/conf/os_sni_passthrough.map) -m found
use_backend %[req.ssl_sni,lower,map_reg(/var/lib/haproxy/conf/os_tcp_be.map)] if sni sni_passthrough
# if the route is SNI and NOT passthrough enter the termination flow
use_backend be_sni if sni
# non SNI requests should enter a default termination backend rather than the custom cert SNI backend since it
# will not be able to match a cert to an SNI host
default_backend be_no_sni
...
backend be_edge_http:demoprojectone:jenkins
mode http
option redispatch
option forwardfor
balance leastconn
timeout server 4m
timeout check 5000ms
http-request set-header X-Forwarded-Host %[req.hdr(host)]
http-request set-header X-Forwarded-Port %[dst_port]
http-request set-header X-Forwarded-Proto http if !{ ssl_fc }
http-request set-header X-Forwarded-Proto https if { ssl_fc }
http-request set-header X-Forwarded-Proto-Version h2 if { ssl_fc_alpn -i h2 }
http-request add-header Forwarded for=%[src];host=%[req.hdr(host)];proto=%[req.hdr(X-Forwarded-Proto)];proto-version=%[req.hdr(X-Forwarded-Proto-Version)]
cookie 4376ea64d7d0abf11209cfe5f7cca1e7 insert indirect nocache httponly secure
server pod:jenkins-1-84nrt:jenkins:10.128.2.13:8080 10.128.2.13:8080 cookie 8669a19afc9f0fed6824feb9fb1cf4ac weight 256
...
For simplicity, the above shows only part of the configuration file. It mainly consists of three parts:

- The global configuration, such as the maximum number of connections (maxconn), timeouts, and so on.
- The frontend configuration, where HAProxy listens for external https and http requests on ports 443 and 80 respectively.
- The backend configuration for each exposed service, which contains the key settings: the backend protocol (mode), the load-balancing method (balance), the backend list (server entries, each with a pod's IP address and port), certificates, and so on.

Therefore, OpenShift's routing functionality needs to manage and control these three parts.
2.3 Global Configuration Management
To specify or modify HAProxy’s global configuration, OpenShift provides two methods:
(1) The first method is to pass parameters to the oc adm router command when creating the router, such as --max-connections for setting the maximum number of connections. For example:
oc adm router --max-connections=200000 --ports='81:80,444:443' router3
The created HAProxy will have maxconn set to 200000, and the router3 service will expose ports 81 and 444 externally, while the HAProxy pod itself still listens on ports 80 and 443.
(2) The second method is to set environment variables on the router's deployment config (dc).
For a complete list of environment variables, please refer to the official documentation https://docs.openshift.com/container-platform/3.4/architecture/core_concepts/routes.html#haproxy-template-router. For example, after running the following command,
oc set env dc/router3 ROUTER_SERVICE_HTTPS_PORT=444 ROUTER_SERVICE_HTTP_PORT=81 STATS_PORT=1937
router3 will be redeployed, and the newly deployed HAProxy will listen for https on port 444 and for http on port 81, with the statistics port on 1937.
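To verify such a change, the variables currently set on the router's deployment config can be listed; a small sketch using the router3 name from above:

# List the environment variables on the deployment config; the three
# variables set above should show the new values.
oc set env dc/router3 --list
# oc set env triggers a redeployment when a value changes; a rollout
# can also be forced manually:
oc rollout latest dc/router3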
2.4 OpenShift Passthrough Type Route and HAProxy Backend
(1) Create a route through the OpenShift Console or oc command, which will expose the jenkins service of the sit project to the domain name sitjenkins.com.cn:
After the route is created in the web console, the result is:
Name: sitjenkins.com.cn
Namespace: sit
Labels: app=jenkins-ephemeral
template=jenkins-ephemeral-template
Annotations: <none>
Requested Host: sitjenkins.com.cn
Path: <none>
TLS Termination: passthrough
Endpoint Port: web
Service: jenkins
Weight: 100 (100%)
Endpoints: 10.128.2.15:8080, 10.131.0.10:8080
Here, the service name acts as an intermediary, connecting the route and the service endpoints (i.e., pods).
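For reference, the same passthrough route can also be created from the command line instead of the console; a sketch using the names from the output above:

# Create a passthrough route for the jenkins service in the sit project,
# exposed under the hostname sitjenkins.com.cn
oc create route passthrough sitjenkins.com.cn --service=jenkins --hostname=sitjenkins.com.cn -n sit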
(2) The configuration file of the HAProxy process in the two pods of the router service has an additional backend:
# Secure backend, pass through
backend be_tcp:sit:sitjenkins.com.cn
balance source
hash-type consistent
timeout check 5000ms
server pod:jenkins-1-bqhfj:jenkins:10.128.2.15:8080 10.128.2.15:8080 weight 256 check inter 5000ms
server pod:jenkins-1-h2fff:jenkins:10.131.0.10:8080 10.131.0.10:8080 weight 256 check inter 5000ms
These backend servers are actually pods, which OpenShift located via the service name from step (1). balance specifies the load-balancing strategy, which is explained later.
(3) The file /var/lib/haproxy/conf/os_sni_passthrough.map has an additional record
sh-4.2$ cat /var/lib/haproxy/conf/os_sni_passthrough.map
^sitjenkins\.com\.cn(:[0-9]+)?(/.*)?$ 1
(4) The file /var/lib/haproxy/conf/os_tcp_be.map has an additional record
sh-4.2$ cat /var/lib/haproxy/conf/os_tcp_be.map
^sitjenkins\.com\.cn(:[0-9]+)?(/.*)?$ be_tcp:sit:sitjenkins.com.cn
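Both map files can be checked directly inside a router pod; the pod name below is a hypothetical placeholder:

# Open a shell in a router pod and inspect the generated map files
oc rsh router-1-abcde
cat /var/lib/haproxy/conf/os_sni_passthrough.map
cat /var/lib/haproxy/conf/os_tcp_be.map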
(5) The HAProxy process selects the backend logic for this route based on the above map files:
frontend public_ssl ## Explanation: frontend protocol https
bind :443 ## frontend port 443
tcp-request inspect-delay 5s
tcp-request content accept if { req_ssl_hello_type 1 }
# if the connection is SNI and the route is a passthrough don't use the termination backend, just use the tcp backend
# for the SNI case, we also need to compare it in case-insensitive mode (by converting it to lowercase) as RFC 4343 says
acl sni req.ssl_sni -m found ## Check if the https request supports sni
acl sni_passthrough req.ssl_sni,lower,map_reg(/var/lib/haproxy/conf/os_sni_passthrough.map) -m found ## Check if the hostname passed through sni is in the os_sni_passthrough.map file
use_backend %[req.ssl_sni,lower,map_reg(/var/lib/haproxy/conf/os_tcp_be.map)] if sni sni_passthrough ## Get the backend name from os_tcp_be.map based on the SNI hostname
# if the route is SNI and NOT passthrough enter the termination flow
use_backend be_sni if sni
# non SNI requests should enter a default termination backend rather than the custom cert SNI backend since it
# will not be able to match a cert to an SNI host
default_backend be_no_sni
(6) The HAProxy process will restart to apply the modified configuration file.
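The restart is graceful rather than disruptive: the ps output in section 2.2 shows the new haproxy being started with -sf <old PID>. A sketch of that invocation, with the paths taken from the ps output:

# -f: the regenerated configuration file
# -p: the pid file of the running process
# -x: take over the listening sockets from the old process (seamless reload)
# -sf: ask the old process to finish its current connections, then exit
/usr/sbin/haproxy -f /var/lib/haproxy/conf/haproxy.config \
    -p /var/lib/haproxy/run/haproxy.pid \
    -x /var/lib/haproxy/run/haproxy.sock \
    -sf $(cat /var/lib/haproxy/run/haproxy.pid)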
Some background knowledge needed to understand the script in (5):
- SNI: TLS Server Name Indication (SNI) is an extension of the TLS protocol by which the client tells the server, before the TLS handshake completes, which hostname it is connecting to. This allows the server to return the certificate matching that hostname, and thus to serve the multiple certificates required for multiple hostnames. For details, see https://en.wikipedia.org/wiki/Server_Name_Indication.
- OpenShift passthrough route: this type of route does not terminate the TLS connection at the router; instead, the router passes the TLS connection through to the backend. This is explained further below.
- HAProxy support for SNI: HAProxy selects the backend based on the hostname in the SNI information. For details, see https://www.haproxy.com/blog/enhanced-ssl-load-balancing-with-server-name-indication-sni-tls-extension/.
- HAProxy ACLs: for details, see https://www.haproxy.com/documentation/aloha/10-0/traffic-management/lb-layer7/acls/.
From the comments above, we can see that the HAProxy process retrieves the backend name be_tcp:sit:sitjenkins.com.cn based on the hostname sitjenkins.com.cn carried in the SNI extension of the https request, which corresponds to the backend in step (2).
In short, the HAProxy used by OpenShift's router performs domain-name-based load-balanced routing. For a detailed illustration, please refer to the official documentation.
2.5 OpenShift Edge and Re-encrypt Type Routes with HAProxy
The HAProxy frontend still listens for external https requests on port 443:
frontend public_ssl
bind :443
.....
# if the route is SNI and NOT passthrough enter the termination flow
use_backend be_sni if sni
However, when the TLS termination type is not passthrough (edge or re-encrypt), the backend be_sni will be used.
backend be_sni
server fe_sni 127.0.0.1:10444 weight 1 send-proxy
This backend forwards to 127.0.0.1:10444 on the local host, which is where the frontend fe_sni listens:
frontend fe_sni
# terminate ssl on edge
bind 127.0.0.1:10444 ssl no-sslv3 crt /var/lib/haproxy/router/certs/default.pem crt-list /var/lib/haproxy/conf/cert_config.map accept-proxy
mode http
........................
# map to backend
# Search from most specific to general path (host case).
# Note: If no match, haproxy uses the default_backend, no other
# use_backend directives below this will be processed.
use_backend %[base,map_reg(/var/lib/haproxy/conf/os_edge_reencrypt_be.map)]
default_backend openshift_default
The map file:
sh-4.2$ cat /var/lib/haproxy/conf/os_edge_reencrypt_be.map
^edgejenkins\.com\.cn(:[0-9]+)?(/.*)?$ be_edge_http:sit:jenkins-edge
HAProxy backend for Edge type route:
backend be_edge_http:sit:jenkins-edge
mode http
option redispatch
option forwardfor
balance leastconn
timeout check 5000ms
.....
server pod:jenkins-1-bqhfj:jenkins:10.128.2.15:8080 10.128.2.15:8080 cookie 71c6bd03732fa7da2f1b497b1e4c7993 weight 256 check inter 5000ms
server pod:jenkins-1-h2fff:jenkins:10.131.0.10:8080 10.131.0.10:8080 cookie fa8d7fb72a46958a7add1406e6d26cc8 weight 256 check inter 5000ms
HAProxy backend for Re-encrypt type route:
# Plain http backend or backend with TLS terminated at the edge or a
# secure backend with re-encryption.
backend be_secure:sit:reencryptjenkins.com.cn
mode http
........................
http-request set-header X-Forwarded-Host %[req.hdr(host)]
http-request set-header X-Forwarded-Port %[dst_port]
http-request set-header X-Forwarded-Proto http if !{ ssl_fc }
http-request set-header X-Forwarded-Proto https if { ssl_fc }
http-request set-header X-Forwarded-Proto-Version h2 if { ssl_fc_alpn -i h2 }
server pod:jenkins-1-bqhfj:jenkins:10.128.2.15:8080 10.128.2.15:8080 cookie ... weight 256 ssl verifyhost jenkins.sit.svc verify required ca-file /var/run/secrets/kubernetes.io/serviceaccount/service-ca.crt check inter 5000ms # Encrypt the link with the backend and check the hostname
server pod:jenkins-1-h2fff:jenkins:10.131.0.10:8080 10.131.0.10:8080 cookie ... weight 256 ssl verifyhost jenkins.sit.svc verify required ca-file /var/run/secrets/kubernetes.io/serviceaccount/service-ca.crt check inter 5000ms
Here you can see that the connection to the backend is encrypted again (the ssl and verifyhost options on each server line). Note that mode http is not a contradiction: in HAProxy, mode only selects layer-4 (tcp) or layer-7 (http) processing, and there is no separate https mode; TLS toward the backend is enabled by the ssl keyword on the server line, which is why the mode remains http.
2.6 Setting and Modifying Route Configuration
The route configuration mainly has the following important aspects:
(1) SSL termination methods. There are three types:
- Edge: TLS is terminated at the router, and unencrypted packets are forwarded to the backend pods. A TLS certificate therefore needs to be installed on the router; if none is installed, the router's default certificate is used.
- Passthrough: encrypted packets are sent straight to the pods without TLS termination at the router, so no certificate or key needs to be configured on the router.
- Re-encrypt: a variant of edge. The router first terminates TLS with one certificate, then re-encrypts the traffic with another certificate before sending it to the backend pods, so the entire network path is encrypted.
Settings:
- The SSL termination method can be chosen when the route is created, or changed later by modifying the route's termination configuration.
- For more details, please refer to the official documentation: https://docs.okd.io/latest/architecture/networking/routes.html#edge-termination.
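As a sketch, each of the three termination types can also be created with oc create route; the service name, hostnames, and certificate file paths below are illustrative assumptions:

# Edge: TLS ends at the router; the certificate/key are attached to the route
oc create route edge jenkins-edge --service=jenkins --hostname=edgejenkins.com.cn --cert=tls.crt --key=tls.key
# Passthrough: the router forwards the TLS stream to the pod untouched
oc create route passthrough --service=jenkins --hostname=sitjenkins.com.cn
# Re-encrypt: TLS ends at the router, and the new TLS connection to the pod
# is validated against the destination CA certificate
oc create route reencrypt --service=jenkins --hostname=reencryptjenkins.com.cn --dest-ca-cert=service-ca.crt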
(2) Load balancing strategies.
There are three strategies:
- roundrobin: rotates through all backends according to their weights.
- leastconn: sends each request to the backend with the fewest connections.
- source: hashes the source IP so that requests from the same source IP always go to the same backend.
Settings:
- To modify the strategy for the entire router, set the ROUTER_TCP_BALANCE_SCHEME environment variable for all passthrough routes, and ROUTER_LOAD_BALANCE_ALGORITHM for routes of the other types.
- To set the strategy for a specific route, use the haproxy.router.openshift.io/balance annotation.
Examples:
- Set the environment variable for the whole router: oc set env dc/router ROUTER_TCP_BALANCE_SCHEME=roundrobin. After this change the router instance is redeployed, and all passthrough routes use roundrobin (their default is source).
- Modify the load-balancing strategy of a specific route: oc edit route aaaa.svc.cluster.local and set the haproxy.router.openshift.io/balance annotation. After the modification, the balance value in this route's HAProxy backend is changed to leastconn.
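The per-route setting can also be applied non-interactively with oc annotate, using the route name from the example above:

# Set the load-balancing strategy for a single route; the router then
# rewrites this route's backend with "balance leastconn"
oc annotate route aaaa.svc.cluster.local haproxy.router.openshift.io/balance=leastconn --overwrite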
3. How Does OpenShift Router Service Achieve High Availability?
The OpenShift router service supports two high availability modes.
3.1 Single Router Service with Multiple Replicas, Utilizing DNS/LB for High Availability
This mode deploys only one router service, which handles all externally exposed services of the cluster. For HA, the number of replicas must be greater than 1, so that pods are created on more than one server, with round-robin DNS or a layer-4 load balancer in front.
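A sketch of scaling the default router for this mode:

# Run three replicas; because of the fixed host ports (80/443), the
# scheduler places at most one replica per infra node.
oc scale dc/router --replicas=3
# In 3.x the router pods normally live in the "default" project
oc get pods -n default -o wide | grep router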
Since the HAProxy in each router pod maintains a local configuration file, the pods are in fact stateful containers. OpenShift uses etcd as the unified configuration store, and the openshift-router process uses some mechanism (being notified, or polling regularly) to obtain the router and route configuration, rewrite the local configuration file, and then reload the HAProxy process so it picks up the newly modified configuration. To understand the working principles in depth, refer to the source code.
3.2 Multiple Router Services Achieve High Availability via Sharding
In this mode, the administrator needs to create and deploy multiple router services, each supporting one or several projects/namespaces. The mapping between routers and projects/namespaces is implemented using labels.
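As an illustration of the label mapping, a shard can be configured with the NAMESPACE_LABELS environment variable on a router; the label key/value below are hypothetical:

# Create a second router and restrict it to namespaces labelled router=r2
oc adm router router2 --replicas=2 --service-account=router
oc set env dc/router2 NAMESPACE_LABELS="router=r2"
# Assign a project to this shard by labelling its namespace
oc label namespace myproject router=r2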
For specific configurations, please refer to the official website https://docs.openshift.com/container-platform/3.11/install_config/router/default_haproxy_router.html.
In fact, much like the sharding features of some other products (such as MySQL or memcached), this feature is aimed more at solving performance problems than at fully addressing high availability.
4. How to Troubleshoot Common Issues?
From the above analysis, it can be seen that for the router and routes to work properly, at least the following points must all be in order (a few commands for checking them are sketched after the list):
1. The client accesses the service using the domain name and port configured in the route.
2. DNS resolves the domain name to the server(s) where the target router runs (this needs special attention with sharded configurations, where it is more complex).
3. If an additional layer-4 load balancer is used, it is configured correctly and working properly.
4. HAProxy can match the correct backend based on the domain name.
5. The router and route configurations are correctly reflected in the HAProxy configuration file.
6. The HAProxy process has been restarted and has therefore read the newly modified configuration file.
7. The backend pod list is correct, and at least one pod is working properly.
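A few commands that can help check these points; the route and service names are the ones from the earlier examples, and the placeholders in angle brackets are hypothetical:

# Points 4 and 5: is the route admitted and present in the router's map files?
oc get route -n sit
oc rsh <router-pod> grep sitjenkins /var/lib/haproxy/conf/os_tcp_be.map
# Point 7: does the service have healthy endpoints (pods)?
oc get endpoints jenkins -n sit
# Points 2 and 3: bypass DNS and the external load balancer and hit one router node directly
curl -kv --resolve sitjenkins.com.cn:443:<infra-node-ip> https://sitjenkins.com.cn/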
If you see the router's default error page, at least one of points 3 to 7 above is not working properly, and you can troubleshoot accordingly.