Bernard Pietraga

Blog

Thoughts on security benchmarks: cloud vs external providers

A bit on my background. I’m a Site Reliability Engineer and co-founder of a consulting agency, focusing on Site Reliability, Security Engineering, and Infrastructure Automation. I have worked on Linux hardening and on improving the cloud security posture of companies around the world.

I have seen cases where the tools used by security teams weren’t used to their full potential and only solved simple problems, while costing hundreds of thousands of dollars in licensing fees yearly. These were things that could have been addressed by simple checks available as a cheaper product in the cloud provider’s offering, or by open-source tools. The market for security add-ons to cloud and container infrastructure is currently expanding rapidly, and many claims made by advertisers need to be taken with a grain of salt.

DISCLAIMER: I do not advocate dropping external security providers. My point is that the tools need to be chosen wisely, with a clear goal in mind, and tested. Proper security tools will definitely help you keep a reasonable security posture. If possible, look for open-source solutions first. I encourage you to do proper research or hire someone competent to do so. The companies mentioned here hire smart folks and solve complex problems; the question is whether they solve the problems you are focusing on. This post touches only a small subset of security tools’ functionality. It doesn’t cover Intrusion Detection or other important topics. It also doesn’t debate the AWS CIS and Foundational Best Practices themselves, but rather uses them as an example.

Why is it important to have compliance checks for the cloud?

Compliance checks are important for the cloud because they ensure that data is stored and accessed in a way that meets the reasonable requirements of the organization. Proper policies can guide engineers to configure cloud resources correctly, protecting the privacy of the data and ensuring it is used in a way that is authorized by the company. They are not a silver bullet, just one preventative measure. The default policies issued by cloud providers are not great, but they are definitely better than nothing. The NSA also issues reasonable guidance for the cloud. Consider the Swiss cheese model.

Security/Compliance checks example

Security engineers often expect tools with an accessible UI that provides a dashboard and alerting on cloud misconfiguration. I will focus on this case to keep the scope of this article confined.

A great example of this is security checks and tools which provide audit reports. I will focus on AWS as it is the most common cloud provider, but everything presented here can be replicated for Google Cloud and Azure.

Now, something which will require a bit of research on your side. You can just skim through the policy names, or read exactly what they do. Keep an eye out for the cloud provider’s best practices and the checks it provides, and compare them with the external providers’ solutions.

Let’s start specifying the problem:

  • We want a compliance check platform for generic AWS configuration checks. To start, this will be the CIS Benchmark and the AWS Foundational Security Best Practices, which will inform us of non-compliant or misconfigured resources. An example could be a non-encrypted cloud storage bucket like S3. This should cover the public best practices published by the cloud provider.
  • We should have alerting with information about our findings.

Take a look at the CIS Benchmark. It provides a set of security standards for cloud security.
Next is the link to the official CIS AWS Foundations Benchmark controls. To start, let’s also take a look at the AWS Foundational Security Best Practices.

These benchmark controls, together with AWS Config and AWS Audit Manager, can provide you with compliance checks for the above benchmarks. Both tools integrate with the AWS Security Hub platform.

Here you can find the list of AWS Config Managed Rules.

AWS issues documentation and a set of rules named AWS Foundational Security Best Practices. They can be enabled in AWS Security Hub.

Let’s compare it with checks provided by Security Companies

Security companies love to provide 1-to-1 comparisons or matrices presenting their offering against other solutions. I recommend taking a different approach: just take a look at the documentation.

Let’s take a look at Bridgecrew. They have a broad security offering, but for the scope of this article, let’s focus on the AWS security policies they check. Here is the link to the Bridgecrew AWS Policy Index. Take your time and compare it to the AWS CIS Benchmark and the AWS Foundational Security Best Practices mentioned before. Did you find some similarities?

Now let’s take a look at Lacework and their Advanced Suppression – Tag Matrix. They are clear about the CIS benchmark, as those policies include the AWS_CIS_ prefix, which is a plus, but take a look at the rules starting with LW_AWS_ or LW_S3_. Compare them again with the AWS Foundational Security Best Practices. Did you notice something?

Deployment of a probably cheaper solution

Now let’s take a look at how we could deploy it:

  • You could write the Terraform yourself, enabling the standards using the examples from securityhub_standards_subscription.
  • We could use the open-source Terraform modules from Cloudposse. They are nice folks, and their repos are actually monitored by Bridgecrew.
    • Cloudposse made a free Terraform module for AWS Security Hub. The mentioned default rules can be found here in GitHub.
    • AWS Config is a necessary component of AWS Security Hub; it can be managed by the Terraform resource here or by another Cloudposse module, terraform-aws-config.
    • Note that I’m not the biggest fan of Cloudposse’s module structuring convention, which differs from the Hashicorp standard (which is reasonable and embedded in most of the available open-source modules). But using this one can help you bring up the whole setup for one account quickly, and it is hard to complain about hard work someone did and gave away for free.
  • You could use AWS CloudFormation or the CLI if you want to (a minimal CLI sketch follows this list).
  • You can just use the AWS Console UI to enable AWS Security Hub. AWS Config is part of the bundle. AWS Audit Manager will be used for reports.
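
A minimal CLI sketch for the last options, assuming the default standards are what you want (they currently include the CIS AWS Foundations Benchmark and the AWS Foundational Security Best Practices); double-check the flags and the standards enabled in your region:

# Enable Security Hub together with the default standards
aws securityhub enable-security-hub --enable-default-standards

# Verify which standards ended up enabled
aws securityhub get-enabled-standards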

What did we get?

We have replicated a subset of the security compliance check functionality provided by cloud security vendors. Of course, alerting is missing, but it can be solved with an AWS-provided Lambda function.
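
As a rough sketch of the alerting wiring: Security Hub publishes findings to EventBridge, so you can route them to a Lambda function of your choice. The function name and ARN below are placeholders; verify the event pattern against the current AWS documentation and remember to grant EventBridge permission to invoke the function (aws lambda add-permission).

# Route imported Security Hub findings to a (hypothetical) alert-forwarding Lambda
aws events put-rule --name securityhub-findings \
  --event-pattern '{"source":["aws.securityhub"],"detail-type":["Security Hub Findings - Imported"]}'

aws events put-targets --rule securityhub-findings \
  --targets 'Id=1,Arn=arn:aws:lambda:eu-west-1:123456789012:function:alert-forwarder'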

What about Kubernetes?

Falco already has open-source Kubernetes and AWS CloudTrail rules for you.

I’m looking for a SIEM

Check out the open-source Wazuh.

But what if I don’t trust Cloud Provider to monitor Cloud Provider?

Well, from my experience, a lot of security companies already utilize the cloud to perform their analytics and alerting. I don’t have links to back this up, so take it with a grain of salt, as with everything you read on the internet. Or you can just stop using the cloud altogether.

Is it enough?

Of course not. Security is a complex field, and this is just one detached example. We are not touching attack vectors (take a look at MITRE ATT&CK) or other important topics. This is just an example comparison. Having even the best security checks doesn’t replace infrastructure and application hardening.

What about SOC 2, PCI compliance, NIST, etc.?

The majority of the mentioned compliance standards are also already covered by cloud provider documentation. Do your research, and think about what your goals and requirements are.

Do you have questions?

Reach out to me via the About Me page.

IAC tagging for your git blame needs – Yor

Have you ever been in a situation where you noticed some weird cloud resource or a weird Kubernetes pod? Have you ever had a hard time finding the respective owner of that resource?

Ideally this situation should not happen, with good practices in place and proper tagging enforcement. Unfortunately, we are not living in a perfect world; the Facebook motto "move fast, break things", which spread to a lot of places, leaves a lot of things untidy. If you have worked long enough with complex infrastructure, you know what I’m referring to.

Having proper tags and Kubernetes labels in place helps you keep a mental picture of the infrastructure and makes it quicker to maintain and navigate.

Yor – a new free automation tool for IAC tagging

yor by Bridgecrew is a free and open-source project aiming to help infrastructure engineers organize their cloud and Kubernetes resources. There is support for the three big cloud providers and Kubernetes.

Traceability is one of the biggest problems in greenfield projects and big corporate projects.

Tracing code in the cloud can be a challenge, especially when you want to track down a specific change to cloud infrastructure. One solution is to use custom tags to help track down your code. You can then use native cloud or Kubernetes tools to understand the problem better.

Yor adds important information to each newly created resource, including, for example, the git commit SHA. This might save you years later, when you are wondering why something is there in the first place. The yor_trace or git_file tags enable you to find the resource in the code repository.

It integrates with major CI platforms and can be easily integrated with new ones. Here are the documentation examples. It also works with pre-commit hooks.
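
For a local run, the basic usage is a single command; the directory path below is an assumption, so check the yor documentation for the current flags:

# Tag all IaC files under ./terraform with yor's default tags (yor_trace, git_* tags)
yor tag --directory ./terraform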

Why is it important to tag your cloud infrastructure?

Cloud computing has quickly become a mainstay in the modern IT landscape. It offers many benefits, such as lower costs and better agility, but one challenge it faces is how to manage and monitor resources. One solution is to tag resources so that they can be easily found and managed. In this article, we’ll look at tagging resources in Kubernetes and Amazon Web Services (AWS).

Tagging resources can be a great way to improve the management of your cloud infrastructure. For example, you could use tags to group related resources together, such as instances or services. This can make it much easier to find and manage these resources. You could also use tags to track specific changes, such as when an instance is upgraded or deleted. When you tag your Kubernetes resources or your AWS resources managed by Terraform, you can easily find and access them later. This becomes especially important when you need to troubleshoot or manage a complex cloud-based application. By tagging your resources, you can quickly identify which components are affected by a problem.

Kubernetes lets you label both individual objects (e.g., pods) and collections of objects (e.g., CRDs). But this all comes at the cost of maintenance. Yor tries to fix this by providing a reasonable set of functionality out of the box.
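
For the Kubernetes side, labels are just key/value metadata you can set and query with kubectl; a tiny illustration, with made-up resource and label names:

# Label a deployment with an owner (names are hypothetical)
kubectl label deployment checkout-api owner=payments-team

# Later, find everything that team owns
kubectl get deployments,pods -l owner=payments-team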

Scan Docker container images without logging into any solution

Sometimes you want a quick check for any CVE in a Docker image. You are on some Linux machine and don’t want to use docker scan, which is based on Snyk and requires a login. Well, you can use Trivy. It is free and has an Apache 2.0 license. Additionally, the tool works with Terraform code and the Linux OS. It is not a check providing 100% information about all the issues, but it is a good starting point.

trivy image automattable/memes:0.2-spongebob
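
The same binary can also scan Terraform code and a local filesystem; a couple of examples, with the paths as assumptions:

# Scan Terraform / IaC files for misconfigurations
trivy config ./terraform

# Scan a local filesystem (for example an unpacked rootfs) for vulnerabilities
trivy fs /path/to/rootfs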

Another option is to use anchore/grype. It provides similar functionality covering container scanning. If the image is built from multiple layers, this tool provides a way to check all of them.

grype automattable/memes:0.2-spongebob --scope all-layers

The Terraformer might be handy for your IAC toolbelt

Terraform is now one of the most widely used tools, if not the industry standard, for cloud configuration, no matter if it is AWS, GCP, or Azure. Unfortunately, its resource import feature is tedious to use. If you want to move an existing cloud setup to Terraform, there is a better way to do it.

I have worked with a few companies which migrated to the cloud and adopted the Infrastructure as Code approach along the way. What I noticed is that a lot of engineers working with Terraform don’t know about one great open-source tool which saves a tremendous amount of time. The tool is called Terraformer. It provides a way to import an existing cloud config into Terraform, including the corresponding tfstate.

To use it, the existing setup just needs to be present in the cloud. It doesn’t matter if it was previously created manually in the UI, via API calls, AWS CloudFormation, the gcloud CLI, or other ways.

Of course, there is a minor caveat with this solution. It is still beta and not everything is supported in Terraformer, but this mostly applies to the more exotic parts of the cloud offering like security tools (AWS Inspector, for example, or Google Cloud IDS), rather than the basic components which are used almost everywhere, like IAM (or AD in Azure), EC2 instances, AKS/EKS/GKE, etc.

Edit: 03/02/2022

A friend asked me for example usage, so I’m including it here.

# Import all Cloudwatch and IAM resource from selected aws profile
terraformer import aws -r cloudwatch,iam --profile=your_aws_profile_name

# List supported resources in google cloud, make sure you have your gcloud context set
terraformer import google list

# Make a plan for importing all of Grafana resources.
terraformer plan grafana -r=*

Scaling Deployments With Go Kubernetes Client

So you want to auto scale your deployments with Golang?

I’ve written this post as I haven’t found any comprehensive guide on doing simple scaling from within a Golang app. It utilizes the Kubernetes client-go and apimachinery libraries. Note that the official documentation is great and provides information that enables everyone to get up to speed easily.
Image notes: The Gopher mascot used in the post image was created by Renee French and is shared under the Creative Commons Attribution 3.0 License. I’ve combined it with a burger from Katerina Limpitsouni, undraw.co.


It is time to start. You have an application running in lovely K8s, which performs some tasks, and you want to autoscale it based on some really custom internal metric. It is not CPU, requests, or anything else that can be handled with a custom metric or another standard approach. This article is written as a weird solution to a rare problem, done in Golang. Most of the time, deployments and other bits of Kubernetes infrastructure can be handled using out-of-the-box components. In the case of deployment pods, that is the Horizontal Pod Autoscaler, which, with the help of a vast set of metrics, can achieve the scaling result we need for our system. So think twice before using this approach.

This solution might also be helpful when conditional scaling happens during a selected time window, for example during Super Bowl adverts, when you know you will get a lot of traffic.

So let’s assume we have a great startup idea: an application which handles bread grill toasters, to be used in restaurant chains all over the world. There is a service which holds the state about the number of orders waiting to be grilled and communicates with the shiny IoT toasters about heating metrics. There is one pod per toaster. Every day when the restaurant opens, there are only 3 toasters running, which means only 3 pods defined in the deployment. When the orders ramp up, the restaurant manager calls you and asks if there is a possibility of heating up the other cold toasters lying in the room.

This is an example which is insecure, using plain requests via HTTP; consider securing the communication before using it.

Here we go with our sandwich journey

So let’s assume that the service responsible for managing toasters, for the sake of convenience, returns a JSON response with just one field representing the current number of orders. It will be the basis for our logic of scaling up and down.

We will create logic so that if the number of orders to be fulfilled per toaster (pod) in the last interval is 5 or more, the deployment should be scaled up. If it is lower than 5, we should scale down, to a minimum of 3 pods. Everything should happen in 10-second intervals.
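
For example, if 24 new orders arrive within a 10-second window while 3 toaster pods are running, that is 24 / 3 = 8 orders per pod, which is at or above the threshold of 5, so one more pod gets started. Note that the division is integer math, just like in the code later.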

RBAC rules needed to do the job

Skip this section if you are only interested in the Golang code. RBAC rules are necessary to perform changes on our deployment. The example below is permissive, so adjust it accordingly for security. The goal is to enable our simple toaster app to access data about deployments – the replicas, which represent the number of pods – and to change this number on the fly.

Let’s create the rbac.yaml file.

apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  namespace: default
  name: simple-deployments-manager
rules:
- apiGroups: ["extensions", "apps"]
  resources: ["deployments", "pods"]
  verbs: ["get", "list", "watch", "create", "update", "patch", "delete"]
---
apiVersion: rbac.authorization.k8s.io/v1
# This RoleBinding grants the simple-deployments-manager Role defined above to
# the default service account (and the "manager" group).
kind: RoleBinding
metadata:
  name: simple-deployments-manager
  # The namespace of the RoleBinding determines where the permissions are granted.
  # This only grants permissions within the "default" namespace.
  namespace: default
subjects:
- kind: User
  name: system:serviceaccount:default:default # Name is case sensitive
  apiGroup: rbac.authorization.k8s.io
- kind: Group
  name: manager # Name is case sensitive
  apiGroup: rbac.authorization.k8s.io
roleRef:
  kind: Role
  name: simple-deployments-manager
  apiGroup: rbac.authorization.k8s.io
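
Assuming the manifest is saved as rbac.yaml, you can apply it and verify the permissions from the service account’s point of view with kubectl:

kubectl apply -f rbac.yaml

# Verify that the default service account can now update deployments
kubectl auth can-i update deployments \
  --as=system:serviceaccount:default:default -n default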

Deployment – we also need to put our app somewhere

Also skip this section if you are only interested in the Golang code. We need to place our app somewhere, so here is an example deployment manifest. An important note is that we pass the environment variable holding the address of the service providing us with order data.

apiVersion: apps/v1
kind: Deployment
metadata:
  name: simple-scaler
  labels:
    app: simple-scaler
spec:
  replicas: 1
  selector:
    matchLabels:
      app: simple-scaler
  template:
    metadata:
      labels:
        app: simple-scaler
    spec:
      containers:
      - name: simple-scaler
        image: simple-scaler
        imagePullPolicy: Never
        env:
        - name: MANAGED_SERVICE_HOST
          value: "managed-service:8080"

Docker image for our main.go

To be able to successfully deploy the app to the cluster, we need to put it into a Docker image. Here is a simple Dockerfile which runs the app as non-root.

# Build the simplescaler binary
FROM golang:1.16 as builder

WORKDIR /workspace
# Copy the Go Modules manifests
COPY go.mod go.mod
COPY go.sum go.sum
# cache deps before building and copying source so that we don't need to re-download as much
# and so that source changes don't invalidate our downloaded layer
RUN go mod download

# Copy the go source
COPY main.go main.go

# Build
RUN CGO_ENABLED=0 GOOS=linux GOARCH=amd64 GO111MODULE=on go build -a -o simplescaler main.go

# Use distroless as minimal base image to package the simplescaler binary
# Refer to https://github.com/GoogleContainerTools/distroless for more details
FROM gcr.io/distroless/static:nonroot
WORKDIR /
COPY --from=builder /workspace/simplescaler .
USER 65532:65532

ENTRYPOINT ["/simplescaler"]

Logic – the brain of restaurant scaling – Golang code

Finally, the Golang code. So let’s look at the functions we need to get things done.

Scroll down if you are only interested in the complete file!

First of all, let’s create a file main.go which we will later run in a separate pod in our greatest cluster.

Now let’s start with the components we need to map for our app to work, beginning with the struct which will represent the response from the service providing us with the number of orders.

// OrdersResponse is struct representing JSON response from our managed service
// http://managed-service:8080/orders
type OrdersResponse struct {
    Orders int32 "json:\"count,omitempty\"" // Backticks are issue for markdown parser
}
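
For illustration, a response from the hypothetical managed-service matching the struct above could look like this; note that the JSON field name follows the json tag, not the Go field name:

# Hypothetical endpoint and response shape
curl http://managed-service:8080/orders
# {"count": 12}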

Now let’s jump further and map the other parts needed. On the Kubernetes side, we will need the types and interfaces which will represent Kubernetes objects.

// ManagedDeployment struct represents the state used to perform replica set
// scaling.
type ManagedDeployment struct {
    // k8s client of the Kubernetes deployment
    Client appsv1.DeploymentInterface

    // System metrics from managed-service from 10 second timespan
    ReplicaCount int32

    // Fetched whole deployment state
    State *v1.Deployment
}

// ManagedDeploymentInterface is the interface exposing the deployment methods
type ManagedDeploymentInterface interface {
    Get() error
    UpdateReplicas(count int32) error
}

A small factory for the ManagedDeployment will be nice:

// NewManagedDeploymentManager is factory used to create ManagedDeployment
func NewManagedDeploymentManager(deploymentsClient appsv1.DeploymentInterface) *ManagedDeployment {
    return &ManagedDeployment{Client: deploymentsClient}
}

Now the interesting bit. For this we will use some Golang standard libraries and the Kubernetes client-go and Kubernetes apimachinery.

We will need some imports.

import (
    "context"
    "encoding/json"
    "flag"
    "fmt"
    "log"
    "net/http"
    "os"
    "path/filepath"
    "strings"
    "time"

    v1 "k8s.io/api/apps/v1"
    apiv1 "k8s.io/api/core/v1"
    metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
    "k8s.io/client-go/kubernetes"
    appsv1 "k8s.io/client-go/kubernetes/typed/apps/v1"
    "k8s.io/client-go/rest"
    "k8s.io/client-go/tools/clientcmd"
    "k8s.io/client-go/util/retry"
)

Let’s implement the two methods from our ManagedDeploymentInterface:

  • Get()
  • UpdateReplicas()

First we will create a simple pointer helper, as the client library requires a pointer to a 32-bit integer.

// int32Ptr is used to create the pointer necessary for the replica count
func int32Ptr(i int32) *int32 { return &i }

Get() will update the attributes of our ManagedDeployment with fresh data from the Deployment, including what is important for us: State and ReplicaCount, the number of pods.

// Get is used to fetch the new state and update the ManagedDeployment.
func (c *ManagedDeployment) Get() error {
    result, getErr := c.Client.Get(context.TODO(),
        "managed-deployment",
        metav1.GetOptions{})
    if getErr != nil {
        return getErr
    }
    replicaCount := *result.Spec.Replicas
    c.State = result
    c.ReplicaCount = replicaCount
    return nil
}

UpdateReplicas(count int32) will update the ManagedDeployment with count representing the number of pods. A 32-bit integer is used as it is the official client’s requirement.

// UpdateReplicas is used to change the number of deployment replicas based on
// the provided count
func (c *ManagedDeployment) UpdateReplicas(count int32) error {
    c.State.Spec.Replicas = int32Ptr(count)
    _, updateErr := c.Client.Update(context.TODO(),
        c.State,
        metav1.UpdateOptions{})
    if updateErr != nil {
        return updateErr
    }
    c.ReplicaCount = count
    return nil
}

Now it is time to wrap it up in a function which will query the managed host and return the current orders count.

// ManagedHost is the URI of the managed-service running on the cluster.
// It is a package-level variable so GetManagedOrdersCount can reach it.
var ManagedHost = os.Getenv("MANAGED_SERVICE_HOST")

// GetManagedOrdersCount asks the managed-service API for the current orders
// count. It decodes the JSON response body and returns the number as an int32.
func GetManagedOrdersCount() (int32, error) {
    url := "http://" + ManagedHost + "/orders"
    r := strings.NewReader("")
    req, clientErr := http.NewRequest("GET", url, r)
    if clientErr != nil {
        return int32(0),
            fmt.Errorf("Prepare request : %v", clientErr)
    }
    req.Header.Set("Content-Type", "application/json")

    client := &http.Client{}

    resp, reqErr := client.Do(req)
    if reqErr != nil {
        return int32(0),
            fmt.Errorf("Failed to query the process count : %v", reqErr)
    }
    defer resp.Body.Close()

    // Decode the data
    var cResp OrdersResponse
    if jsonErr := json.NewDecoder(resp.Body).Decode(&cResp); jsonErr != nil {
        return int32(0), fmt.Errorf("Error parsing process count json : %v", jsonErr)
    }
    return cResp.Orders, nil
}

To connect to the cluster, let’s set up two client helpers, which will cover both running locally (outside Docker) and running in the cluster.


// clientSetup returns the appropriate clientset, with config either from
// ~/.kube or from the in-cluster API configuration
func clientSetup() (*kubernetes.Clientset, error) {
    if runningInDocker() {
        return setupDockerClusterClient()
    } else {
        return setupLocalClient()
    }
}

// setupLocalClient builds a client from the local ~/.kube/config (e.g. minikube)
func setupLocalClient() (*kubernetes.Clientset, error) {
    var ns string
    flag.StringVar(&ns, "namespace", "", "namespace")

    // Bootstrap k8s configuration from local   Kubernetes config file
    kubeconfig := filepath.Join(os.Getenv("HOME"), ".kube", "config")
    log.Println("Using kubeconfig file: ", kubeconfig)
    config, err := clientcmd.BuildConfigFromFlags("", kubeconfig)
    if err != nil {
        return &kubernetes.Clientset{}, err
    }

    // Create an rest client not targeting specific API version
    clientset, err := kubernetes.NewForConfig(config)
    return clientset, err
}

// setupDockerClusterClient from rest cluster api configuration.
func setupDockerClusterClient() (*kubernetes.Clientset, error) {
    log.Println("Running inside container")
    // creates the in-cluster config
    config, err := rest.InClusterConfig()
    if err != nil {
        return &kubernetes.Clientset{}, err
    }
    // creates the clientset
    clientset, err := kubernetes.NewForConfig(config)
    return clientset, err
}

Now the logic part: the main function, which will run in a loop. The important part is the usage of a ticker, which, rather than sleep, will keep real 10-second intervals.

func main() {
    // To prevent tick shifting, rather than using time.Sleep I've decided to go
    // with time.Tick, which of course also has some downsides. Channels
    // here would add safety; in an extreme case there might be a race condition.
    tick := time.Tick(10 * time.Second)
    tickCount := 0

    clientset, err := clientSetup()
    if err != nil {
        log.Fatal(err)
    }

    deploymentsClient := clientset.AppsV1().Deployments(apiv1.NamespaceDefault)
    ManagedDeployment := NewManagedDeploymentManager(deploymentsClient)

    // Retrieve the initial version of Deployment
    getErr := ManagedDeployment.Get()
    if getErr != nil {
        log.Fatal(fmt.Errorf("Failed to get latest version of \"orders-deployment\" Deployment: %v", getErr))
    }

    if ManagedDeployment.ReplicaCount <= 0 {
        log.Fatal(fmt.Errorf("Replica count is equal %d, this means the service is still deploying or there is an issue", ManagedDeployment.ReplicaCount))
    }

    // Get process count from managed-service
    oldOrdersCount, err := GetManagedOrdersCount()
    if err != nil {
        log.Fatal(err)
    }

    // In 10-second intervals, update the managed-deployment replicas if needed.
    for range tick {
        // Update replica count
        retryErr := retry.RetryOnConflict(retry.DefaultRetry, func() error {
            tickCount++ // just for logging purposes

            // Retrieve the latest version of Deployment before attempting to
            // run update. RetryOnConflict uses exponential backoff to avoid
            // exhausting the apiserver
            getErr := ManagedDeployment.Get()
            if getErr != nil {
                return fmt.Errorf("Failed to get latest version of \"orders-deployment\" Deployment: %v", getErr)
            }

            // Get current process count from managed-service
            currentOrdersCount, err := GetManagedOrdersCount()
            if err != nil {
                return err
            }

            // Find how many orders were created in the last 10 seconds
            ordersIn10Seconds := currentOrdersCount - oldOrdersCount

            // Count orders per instance (note that this is integer math)
            ordersPerInstance := ordersIn10Seconds / ManagedDeployment.ReplicaCount

            // Overwrite the old order count for the comparison in the next tick
            oldOrdersCount = currentOrdersCount

            if ordersPerInstance >= 5 {
                // Increment the replica count, note: won't use ++ as it can
                // impact the second conditional comparison.
                oneMore := ManagedDeployment.ReplicaCount + 1
                updateErr := ManagedDeployment.UpdateReplicas(oneMore)
                if updateErr == nil {
                    fmt.Println("   -> Replica count changed to",
                        oneMore)
                }
                return updateErr
            } else if ordersPerInstance < 5 && ManagedDeployment.ReplicaCount > 3 {
                oneLess := ManagedDeployment.ReplicaCount - 1
                updateErr := ManagedDeployment.UpdateReplicas(oneLess)
                if updateErr == nil {
                    fmt.Println("   -> Replica count changed to",
                        oneLess)
                }
                return updateErr
            }

            return nil
        })

        if retryErr != nil {
            panic(fmt.Errorf("Update failed: %v", retryErr))
        }
    }
    // end of loop
}

That’s our core logic, responsible for handling deployment scaling back and forth.

Final main.go file

Putting it all together, here is the complete main.go file.

package main

import (
    "context"
    "encoding/json"
    "flag"
    "fmt"
    "log"
    "net/http"
    "os"
    "path/filepath"
    "strings"
    "time"

    v1 "k8s.io/api/apps/v1"
    apiv1 "k8s.io/api/core/v1"
    metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
    "k8s.io/client-go/kubernetes"
    appsv1 "k8s.io/client-go/kubernetes/typed/apps/v1"
    "k8s.io/client-go/rest"
    "k8s.io/client-go/tools/clientcmd"
    "k8s.io/client-go/util/retry"
)

// OrdersResponse is struct representing JSON response from:
// http://managed-service:8080/orders
type OrdersResponse struct {
    Orders int32 "json:\"count,omitempty\"" // Backticks are issue for markdown parser
}

// ManagedDeployment struct represents the state used to perform replica set
// scaling.
type ManagedDeployment struct {
    // k8s client of the Kubernetes deployment
    Client appsv1.DeploymentInterface

    // System metrics from managed-service from 10 second timespan
    ReplicaCount int32

    // Fetched whole deployment state
    State *v1.Deployment
}

// ManagedDeploymentInterface is the interface exposing the deployment methods
type ManagedDeploymentInterface interface {
    Get() error
    UpdateReplicas(count int32) error
}

// NewManagedDeploymentManager is factory used to create ManagedDeployment
func NewManagedDeploymentManager(deploymentsClient appsv1.DeploymentInterface) *ManagedDeployment {
    return &ManagedDeployment{Client: deploymentsClient}
}

func main() {
    // To prevent tick shifting, rather than using time.Sleep I've decided to go
    // with time.Tick, which of course also has some downsides. Channels
    // here would add safety; in an extreme case there might be a race condition.
    tick := time.Tick(10 * time.Second)
    tickCount := 0

    // Setup client, might be better to use interfaces here.
    clientset, err := clientSetup()
    if err != nil {
        log.Fatal(err)
    }

    deploymentsClient := clientset.AppsV1().Deployments(apiv1.NamespaceDefault)
    ManagedDeployment := NewManagedDeploymentManager(deploymentsClient)

    // Retrieve the latest version of Deployment
    getErr := ManagedDeployment.Get()
    if getErr != nil {
        log.Fatal(fmt.Errorf("Failed to get latest version of \"orders-deployment\" Deployment: %v", getErr))
    }

    if ManagedDeployment.ReplicaCount <= 0 {
        log.Fatal(fmt.Errorf("Replica count is equal %d, this means the service is still deploying or there is an issue", ManagedDeployment.ReplicaCount))
    }

    // Get process count from managed-service
    oldOrdersCount, err := GetManagedOrdersCount()
    if err != nil {
        log.Fatal(err)
    }

    // In 10-second intervals, update the managed-deployment replicas if needed.
    for range tick {
        // Update replica count
        retryErr := retry.RetryOnConflict(retry.DefaultRetry, func() error {
            tickCount++ // just for logging purposes

            // Retrieve the latest version of Deployment before attempting to
            // run update. RetryOnConflict uses exponential backoff to avoid
            // exhausting the apiserver
            getErr := ManagedDeployment.Get()
            if getErr != nil {
                return fmt.Errorf("Failed to get latest version of \"orders-deployment\" Deployment: %v", getErr)
            }

            // Get current process count from managed-service
            currentOrdersCount, err := GetManagedOrdersCount()
            if err != nil {
                return err
            }

            // Find how many orders were created in the last 10 seconds
            ordersIn10Seconds := currentOrdersCount - oldOrdersCount

            // Count orders per instance (note that this is integer math)
            ordersPerInstance := ordersIn10Seconds / ManagedDeployment.ReplicaCount

            // Overwrite the old order count for the comparison in the next tick
            oldOrdersCount = currentOrdersCount

            if ordersPerInstance >= 5 {
                // Increment the replica count, note: won't use ++ as it can
                // impact the second conditional comparison.
                oneMore := ManagedDeployment.ReplicaCount + 1
                updateErr := ManagedDeployment.UpdateReplicas(oneMore)
                if updateErr == nil {
                    fmt.Println("   -> Replica count changed to",
                        oneMore)
                }
                return updateErr
            } else if ordersPerInstance < 5 && ManagedDeployment.ReplicaCount > 3 {
                oneLess := ManagedDeployment.ReplicaCount - 1
                updateErr := ManagedDeployment.UpdateReplicas(oneLess)
                if updateErr == nil {
                    fmt.Println("   -> Replica count changed to",
                        oneLess)
                }
                return updateErr
            }

            return nil
        })

        if retryErr != nil {
            panic(fmt.Errorf("Update failed: %v", retryErr))
        }
    }
    // end of loop
}

// Get is used to fetch the new state and update the ManagedDeployment.
func (c *ManagedDeployment) Get() error {
    result, getErr := c.Client.Get(context.TODO(),
        "managed-deployment",
        metav1.GetOptions{})
    if getErr != nil {
        return getErr
    }
    replicaCount := *result.Spec.Replicas
    c.State = result
    c.ReplicaCount = replicaCount
    return nil
}

// UpdateReplicas is used to change the number of deployment replicas based on
// the provided count
func (c *ManagedDeployment) UpdateReplicas(count int32) error {
    c.State.Spec.Replicas = int32Ptr(count)
    _, updateErr := c.Client.Update(context.TODO(),
        c.State,
        metav1.UpdateOptions{})
    if updateErr != nil {
        return updateErr
    }
    c.ReplicaCount = count
    fmt.Println("REPLICA COUNT UPDATE", c.ReplicaCount)
    return nil
}

// ManagedHost is the URI of the managed-service running on the cluster.
// It is a package-level variable so GetManagedOrdersCount can reach it.
var ManagedHost = os.Getenv("MANAGED_SERVICE_HOST")

// GetManagedOrdersCount asks the managed-service API for the current orders
// count. It decodes the JSON response body and returns the number as an int32.
func GetManagedOrdersCount() (int32, error) {
    url := "http://" + ManagedHost + "/orders"
    r := strings.NewReader("")
    req, clientErr := http.NewRequest("GET", url, r)
    if clientErr != nil {
        return int32(0),
            fmt.Errorf("Prepare request : %v", clientErr)
    }
    req.Header.Set("Content-Type", "application/json")

    client := &http.Client{}

    resp, reqErr := client.Do(req)
    if reqErr != nil {
        return int32(0),
            fmt.Errorf("Failed to query the process count : %v", reqErr)
    }
    defer resp.Body.Close()

    // Decode the data
    var cResp OrdersResponse
    if jsonErr := json.NewDecoder(resp.Body).Decode(&cResp); jsonErr != nil {
        return int32(0), fmt.Errorf("Error parsing process count json : %v", jsonErr)
    }
    return cResp.Orders, nil
}

// int32Ptr is used to create the pointer necessary for the replica count
func int32Ptr(i int32) *int32 { return &i }

// runningInDocker checks if /.dockerenv is present, meaning the process is
// running in a container. I'm aware that this is not a deterministic check for
// all setups, but it works fine for this example minikube app.
func runningInDocker() bool {
    if _, err := os.Stat("/.dockerenv"); err == nil {
        return true
    } else if os.IsNotExist(err) {
        return false
    } else {
        // Schrodinger cat, this might be container or not :)
        return false
    }
}

// clientSetup returns the appropriate clientset, with config either from
// ~/.kube or from the in-cluster API configuration
func clientSetup() (*kubernetes.Clientset, error) {
    if runningInDocker() {
        return setupDockerClusterClient()
    } else {
        return setupLocalClient()
    }
}

// setupLocalClient builds a client from the local ~/.kube/config (e.g. minikube)
func setupLocalClient() (*kubernetes.Clientset, error) {
    var ns string
    flag.StringVar(&ns, "namespace", "", "namespace")

    // Bootstrap k8s configuration from local   Kubernetes config file
    kubeconfig := filepath.Join(os.Getenv("HOME"), ".kube", "config")
    log.Println("Using kubeconfig file: ", kubeconfig)
    config, err := clientcmd.BuildConfigFromFlags("", kubeconfig)
    if err != nil {
        return &kubernetes.Clientset{}, err
    }

    // Create an rest client not targeting specific API version
    clientset, err := kubernetes.NewForConfig(config)
    return clientset, err
}

// setupDockerClusterClient from rest cluster api configuration.
func setupDockerClusterClient() (*kubernetes.Clientset, error) {
    log.Println("Running inside container")
    // creates the in-cluster config
    config, err := rest.InClusterConfig()
    if err != nil {
        return &kubernetes.Clientset{}, err
    }
    // creates the clientset
    clientset, err := kubernetes.NewForConfig(config)
    return clientset, err
}
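
To try the scaler locally against a minikube cluster (the setupLocalClient path), a rough sketch of the commands could look like this; the module path is a placeholder and the managed-service must be reachable, for example via kubectl port-forward:

# Initialize the module and fetch client-go/apimachinery dependencies
go mod init example.com/simplescaler   # hypothetical module path
go mod tidy

# Expose the in-cluster managed-service locally (assumed service name and port)
kubectl port-forward svc/managed-service 8080:8080 &

# Run the scaler against it
MANAGED_SERVICE_HOST=localhost:8080 go run main.go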

Bash array as a Terraform list of strings input variable

Terraform supports assigning a list of strings to an input variable, but, at least at the time of writing, it doesn’t accept the bash array format.

Let’s say that we have a bash array like the one below

BASH_ARRAY=( "a" "b" "c" )

And we want to pass the bash array to a Terraform variable declared as a list of strings.

variable "example" {
  type = list(string)
}

To do it, we need to transform the bash array into a string which will look like this:

"[\"a\",\"b\",\"c\"]"

To do it, use the bash below

# define array
BASH_ARRAY=( "a" "b" "c" )

# convert array to serialized list
ARRAY_WITH_QUOTES=()
for ENTRY in "${BASH_ARRAY[@]}";
do
  ARRAY_WITH_QUOTES+=( "\"${ENTRY}\"" )
done
TERRAFORM_LIST=$(IFS=,; echo [${ARRAY_WITH_QUOTES[*]}])

# pass it to terraform (quote it so the shell doesn't expand the brackets)
terraform plan -var "example=${TERRAFORM_LIST}"

Done, this way you can use a bash array as a Terraform list-of-strings variable.

Ruby WeakRef and running garbage collection in tests

Recently, I was working with Puppet gem internals. I found that each manifest run, whose purpose is to manage some properties of a system, creates a transaction object. The transaction object holds information about the desired system state. While extending the Puppet-provided codebase with plain Ruby, I noticed an issue: there is no API for fetching and accessing the particular transaction object whose state I wanted to read.

I needed to access its values. One of the quick ways was to use ObjectSpace to find the object I wanted without touching the original codebase. The flow was dependent on this object, and I wanted to test it using RSpec.

So I made two test cases, where one flow would go with the transaction object existing in the object space and the second without it. In a real usage scenario, objects are garbage collected after the Puppet manifest is applied. But in the RSpec framework, objects created inside an it block are not immediately garbage collected.

The Ruby WeakRef and GC standard libraries came to help. In the next paragraphs, you can find the recreated problem with example code and a way to use WeakRef to clean up the polluted ObjectSpace.

Example of using garbage collection in Ruby tests

In a file called without_weakref_spec.rb we create two classes. One called Modifier, which we leave empty with default methods. The second class, Communicator, will behave differently depending on the presence of a Modifier in the Ruby object space.

At Communicator initialization the modifier_object method is fired, and a Modifier object is searched for in memory via ObjectSpace, using the find method on the ObjectSpace.each_object enumerator. modifier?(obj) is used to check whether the object’s class equals Modifier; it is important to notice that not all objects implement the .class method, so a rescue returning false is added for those objects. Lastly, the class has a call method which returns ‘Modifier object was found’ when a Modifier is present in the object space.

At the end of the file we have two tests which check whether the Communicator class behaves properly depending on the objects in memory.

require 'rspec'

class Modifier; end

class Communicator
  def initialize
    @modifier_object = modifier_object
  end

  def call
    return 'Modifier object was found' if @modifier_object
    'No modifier object found'
  end

  private

  def modifier_object
    ObjectSpace.each_object.find { |obj| modifier?(obj) }
  end

  def modifier?(obj)
    obj.class == Modifier
  rescue
    false
  end
end

RSpec.describe Communicator do
  it "modifier object in object space" do
    Modifier.new
    expect(Communicator.new.call).to eq('Modifier object was found')
  end

  it "modifier object not found" do
    expect(Communicator.new.call).to eq('No modifier object found')
  end
end

If we run the tests we can see:

$ rspec without_weakref_spec.rb

Failures:

  1) Communicator modifier object not found
     Failure/Error: expect(Communicator.new.call).to eq('No modifier object found')

       expected: "No modifier object found"
            got: "Modifier object was found"

       (compared using ==)
     # ./without_weakref_spec.rb:35:in `block (2 levels) in '

Finished in 0.03979 seconds (files took 0.10658 seconds to load)
2 examples, 1 failure

Failed examples:

rspec ./without_weakref_spec.rb:34 # Communicator modifier object not found

This is because the Modifier object created in the first test is not garbage collected by the time the second test runs.

Solution

Let’s copy the code to a new file, with_weakref_spec.rb.

We can solve the object-in-memory issue by requiring WeakRef from the Ruby standard
library at the top of the file.

require 'weakref'

Then call GC.start at the beginning of each test case, which will run the garbage collector during the test run

GC.start

Finally, in the test spec where we create the Modifier object, wrap it in a WeakRef
reference, allowing the object to be easily garbage collected

WeakRef.new(Modifier.new)

So the whole with_weakref_spec.rb file with the test suite will look like this

require 'rspec'
require 'weakref'

class Modifier; end

class Communicator
  def initialize
    @modifier_object = modifier_object
  end

  def call
    return 'Modifier object was found' if @modifier_object
    'No modifier object found'
  end

  private

  def modifier_object
    ObjectSpace.each_object.find { |obj| modifier?(obj) }
  end

  def modifier?(obj)
    obj.class == Modifier
  rescue
    false
  end
end

RSpec.describe Communicator do
  it "modifier object in object space" do
    GC.start
    WeakRef.new(Modifier.new)
    expect(Communicator.new.call).to eq('Modifier object was found')
  end

  it "modifier object not found" do
    GC.start
    expect(Communicator.new.call).to eq('No modifier object found')
  end
end

And running the test suite will give us

$ rspec with_weakref_spec.rb
..

Finished in 0.02668 seconds (files took 0.11199 seconds to load)
2 examples, 0 failures