Container log monitoring on Microk8s with Loki, Grafana and Promtail

Hi πŸ‘‹

This is a short tutorial describing how to monitor your Kubernetes cluster's container logs using the Loki stack. Why? Because it is easier to view and filter your logs in Grafana, and to store them persistently in Loki, than to chase them around in a terminal.

Let’s get started! Assuming you already have Microk8s installed, enable the following addons:

You can enable an add-on by running microk8s enable, e.g. microk8s enable dns

addons:
  enabled:
    dns                  # CoreDNS
    ha-cluster           # Configure high availability on the current node
    metrics-server       # K8s Metrics Server for API access to service metrics
    storage              # Storage class; allocates storage from host directory
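
All of them can be enabled with a single command:

microk8s enable dns ha-cluster metrics-server storage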

Note: Microk8s comes with a bundled kubectl and helm3. Just run microk8s kubectl or microk8s helm3. If you want to use your host kubectl you can configure it via: microk8s config > ~/.kube/config.

Warning: Be extra careful when running the microk8s config > ~/.kube/config command because it will overwrite the old config file.

Then proceed by installing Loki. Loki will store all the logs using object storage. This is efficient, but the trade-off is that you can't do complex aggregations and searches against your data. We are going to install Loki for exploration purposes, but if you're looking for a production-ready setup, check out the loki-distributed Helm chart.

Run the following helm commands to install Loki. You may want to install helm or use the microk8s helm3 command.

helm repo add grafana https://grafana.github.io/helm-charts

helm install loki grafana/loki

You should get the following pods and services by running kubectl get pods and kubectl get services:

NAME                        READY   STATUS        RESTARTS   AGE
loki-0                      1/1     Running       0          9m8s

NAME            TYPE        CLUSTER-IP       EXTERNAL-IP   PORT(S)    AGE
kubernetes      ClusterIP   10.152.183.1     <none>        443/TCP    54m
loki-headless   ClusterIP   None             <none>        3100/TCP   9m23s
loki            ClusterIP   10.152.183.187   <none>        3100/TCP   9m23s

Now, we can safely install Promtail. Promtail will import all the container logs into Loki and it should work auto-magically by auto-discovering all the pods that are running inside your cluster.

To let Promtail know about our existing Loki’s address, we can give it the service URL: http://loki-headless.default.svc.cluster.local:3100/loki/api/v1/push.

helm install promtail grafana/promtail --set config.lokiAddress=http://loki-headless.default.svc.cluster.local:3100/loki/api/v1/push

Finally, we need to visualize the logs using Grafana. Install it by running the helm command below, then edit the service to change its type from ClusterIP to NodePort.

Changing the service type to NodePort will allow you to visit Grafana in your browser without the need to add an ingress.

❗❗ To use VS Code as the default editor, export the following environment variable: KUBE_EDITOR="code -w" (the quotes matter, since the value contains a space)

helm install grafana grafana/grafana

kubectl edit service/grafana
# Change spec.type from ClusterIP to NodePort
# Grab the service's port using kubectl get services and look for 32204:
# grafana                         NodePort    10.152.183.84    <none>        80:32204/TCP   6d
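
If you'd rather avoid the interactive edit, the same change can be made with a patch (assuming Grafana lives in the default namespace):

kubectl patch service grafana -p '{"spec": {"type": "NodePort"}}'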

Note: If you're on Windows, you will need to run kubectl cluster-info and use the cluster's IP address to access the service. On Linux you should be able to access http://localhost:32204.

kubectl cluster-info
Kubernetes control plane is running at https://172.20.138.170:16443

To access Grafana visit: http://172.20.138.170:32204 where 32204 is the service’s NodePort.

Grab your Grafana admin password by following the instructions from the helm notes; the notes are displayed after Grafana has been installed. If you don't have base64 on your OS, check out CyberChef, it can decode base64 text.
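
If the notes are gone from your terminal, the password is stored in the grafana secret; a command along these lines should print it (the secret name and namespace depend on how you installed the chart):

kubectl get secret grafana -o jsonpath="{.data.admin-password}" | base64 --decode ; echo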


After you've successfully logged in, head to Settings -> Data Sources and add the Loki data source.
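
For the data source URL, use the in-cluster address of the loki service we saw earlier in kubectl get services, for example:

http://loki:3100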

Head back to the Explore menu and display Loki’s logs using the Loki data source in Grafana. You can click log browser to view all available values for the app label.

Promtail should now import logs into Loki and create labels dynamically for each newly created container. If you followed along, congratulations!

Thanks for reading and happy hacking! πŸ”§

Improving the throughput of a Producer βœˆ

Hello πŸ‘‹,

In this article I will give you some tips on how to improve the throughput of a message producer.

I had to write a Golang-based application that consumes messages from Apache Kafka and sends them to a sink using HTTP JSON / HTTP Protocol Buffers.

To see if my idea works, I started with a naΓ―ve approach in which I polled Kafka for messages and then sent each message to the sink, one at a time. This worked, but it was slow.

To better understand the system, a colleague set up Grafana and a monitoring dashboard, using Prometheus metrics provided by the Sink. This allowed us to test various versions of the producer and observe its behavior.

Let’s explore what we can do to improve the throughput.

Request batching πŸ“ͺ

A very important improvement is request batching.

Instead of sending one message at a time in a single HTTP request, try to send more, if the sink allows it.

As you can see in the image, this simple idea improved the throughput from 100msg/sec to ~4000msg/sec.

Batching is tricky: if your batches are too large, the receiver might be overwhelmed, or the producer might have a tough time building them. If your batches contain only a few items, you might not see an improvement. Try to choose a batch size that isn't too high and not too low either.
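
Here is a minimal sketch of the idea in Go; the sink URL, the payload shape and the batch size of 500 are made-up placeholders, not the real system:

package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"net/http"
)

// Message is a stand-in for whatever a Kafka record is decoded into.
type Message struct {
	Key   string `json:"key"`
	Value string `json:"value"`
}

// sendBatch posts a whole slice of messages as one JSON array,
// instead of doing one HTTP request per message.
func sendBatch(client *http.Client, sinkURL string, batch []Message) error {
	if len(batch) == 0 {
		return nil
	}
	body, err := json.Marshal(batch)
	if err != nil {
		return err
	}
	resp, err := client.Post(sinkURL, "application/json", bytes.NewReader(body))
	if err != nil {
		return err
	}
	defer resp.Body.Close()
	if resp.StatusCode >= 300 {
		return fmt.Errorf("sink returned %s", resp.Status)
	}
	return nil
}

func main() {
	const batchSize = 500 // tune this: too small wastes requests, too large overwhelms the receiver
	const sinkURL = "http://sink.example.com/ingest"

	client := &http.Client{}
	messages := make(chan Message, 1000)

	// Pretend these come from the Kafka consumer.
	go func() {
		for i := 0; i < 1200; i++ {
			messages <- Message{Key: fmt.Sprintf("k%d", i), Value: "payload"}
		}
		close(messages)
	}()

	batch := make([]Message, 0, batchSize)
	for msg := range messages {
		batch = append(batch, msg)
		if len(batch) == batchSize {
			if err := sendBatch(client, sinkURL, batch); err != nil {
				fmt.Println("send failed:", err)
			}
			batch = batch[:0]
		}
	}
	// Flush whatever is left once the stream ends.
	if err := sendBatch(client, sinkURL, batch); err != nil {
		fmt.Println("send failed:", err)
	}
}

In a real producer you would also flush on a timer, so a quiet topic doesn't hold a partial batch forever.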

Fast JSON libraries ⏩

If you’re using HTTP and JSON then it’s a good idea to replace the standard JSON library.

There are lots of open-source JSON libraries that provide much higher performance than the standard JSON library built into the language.

See:

The improvements will be visible.
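
As an illustration (json-iterator/go is just one well-known option, not necessarily the one used in the original setup), swapping the standard library can be as small as changing one import:

package main

import (
	"fmt"

	jsoniter "github.com/json-iterator/go"
)

// json is a drop-in replacement for the standard encoding/json package.
var json = jsoniter.ConfigCompatibleWithStandardLibrary

type Event struct {
	ID    int    `json:"id"`
	Value string `json:"value"`
}

func main() {
	b, err := json.Marshal(Event{ID: 1, Value: "hello"})
	if err != nil {
		panic(err)
	}
	fmt.Println(string(b)) // {"id":1,"value":"hello"}
}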

Partitioning πŸ–‡

There are several partitioning strategies that you can implement. It depends on your tech stack.

Kafka allows you to assign one consumer to one partition: if a topic has 3 partitions, you can run 3 consumer instances in parallel on that topic, all in the same consumer group, and the partitions get spread across them. I did not use this, as the Sink does not allow it; only one instance of the Producer runs at a time.

If you have multiple topics that you want to consume from, you can partition on the topic name or a topic name pattern by subscribing to multiple topics at once using a regex. For example, you can have 3 consumers consuming from highspeed.* and 3 consumers consuming from other.*, provided each topic has 3 partitions.

Note: The standard build of librdkafka doesn't support negative lookahead regex expressions; if that's what you need, you will have to build the library from source, see issues/2769. It's easy to do, and the confluent-kafka-go client supports custom builds of librdkafka.

Negative lookahead expressions allow you to ignore some patterns; see this example for a better understanding: regex101.com/r/jZ9AEz/1

Source Parallelization πŸ•Š

The confluent-kafka-go client allows you to poll Kafka for messages. Since polling is thread safe, it can be done in multiple goroutines or threads for a performance improvement.

package main

import (
	"fmt"
	"sync"

	"github.com/confluentinc/confluent-kafka-go/kafka"
)

func main() {
	var wg sync.WaitGroup

	c, err := kafka.NewConsumer(&kafka.ConfigMap{
		"bootstrap.servers": "localhost",
		"group.id":          "myGroup",
		"auto.offset.reset": "earliest",
	})
	if err != nil {
		panic(err)
	}

	if err := c.SubscribeTopics([]string{"myTopic", "^aRegex.*[Tt]opic"}, nil); err != nil {
		panic(err)
	}

	// Poll from several goroutines to parallelize the source.
	for i := 0; i < 5; i++ {
		wg.Add(1) // register before starting the goroutine so wg.Wait() can't race it
		go func() {
			defer wg.Done()
			for {
				msg, err := c.ReadMessage(-1)
				if err == nil {
					fmt.Printf("Message on %s: %s\n", msg.TopicPartition, string(msg.Value))
					// TODO: Send data through a channel to be processed by another goroutine.
				} else {
					// The client will automatically try to recover from all errors.
					fmt.Printf("Consumer error: %v (%v)\n", err, msg)
				}
			}
		}()
	}

	wg.Wait()
	c.Close()
}

Buffered Golang channels can also be used in this scenario to improve performance, as sketched below.
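
Here is a small, self-contained sketch of that pattern (the buffer size and the fake pollers are placeholders): the polling goroutines only hand messages off to a buffered channel, and a single sender goroutine drains it, so slow sends don't stall polling.

package main

import (
	"fmt"
	"sync"
)

// RawMessage stands in for a Kafka message value.
type RawMessage []byte

func main() {
	var wg sync.WaitGroup

	// A buffered channel lets the pollers run ahead of the sender during bursts.
	messages := make(chan RawMessage, 10000)

	// Pretend these goroutines are the Kafka pollers from the example above.
	for i := 0; i < 5; i++ {
		wg.Add(1)
		go func(id int) {
			defer wg.Done()
			for j := 0; j < 3; j++ {
				messages <- RawMessage(fmt.Sprintf("message %d from poller %d", j, id))
			}
		}(i)
	}

	// A single sender goroutine drains the buffer; in the real producer this is
	// where batches are built and shipped to the Sink.
	done := make(chan struct{})
	go func() {
		defer close(done)
		for msg := range messages {
			fmt.Println(string(msg))
		}
	}()

	wg.Wait()
	close(messages)
	<-done
}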

Protocol Buffers πŸ”·

Finally, I saw a huge performance improvement when replacing the JSON body of the request with Protocol Buffers-encoded, snappy-compressed data.

If your Sink supports receiving Protocol Buffers, it is a good idea to try sending them instead of JSON.
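
A minimal sketch of what that can look like in Go, using the google.golang.org/protobuf and github.com/golang/snappy packages; the endpoint and header names are assumptions, your Sink may expect something different:

package main

import (
	"bytes"
	"fmt"
	"net/http"

	"github.com/golang/snappy"
	"google.golang.org/protobuf/proto"
)

// sendProto marshals a generated protobuf message, snappy-compresses it and
// posts it to the sink. Use whatever content type your Sink actually expects.
func sendProto(client *http.Client, sinkURL string, msg proto.Message) error {
	raw, err := proto.Marshal(msg)
	if err != nil {
		return err
	}
	compressed := snappy.Encode(nil, raw)

	req, err := http.NewRequest(http.MethodPost, sinkURL, bytes.NewReader(compressed))
	if err != nil {
		return err
	}
	req.Header.Set("Content-Type", "application/x-protobuf")
	req.Header.Set("Content-Encoding", "snappy")

	resp, err := client.Do(req)
	if err != nil {
		return err
	}
	defer resp.Body.Close()
	if resp.StatusCode >= 300 {
		return fmt.Errorf("sink returned %s", resp.Status)
	}
	return nil
}

func main() {
	// Call sendProto with an instance of your generated protobuf message type.
}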

Honorable Mention: GZIP Compressed JSON πŸ“š

The Sink supported receiving GZIP compressed JSON, but in my case I didn’t see any notable performance improvements.

I’ve compared the RAM and CPU usage of the Producer, the number of bytes sent over the network and the message throughput. While there were some improvements in some areas, I decided not to implement GZIP compression.
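
For completeness, here is a rough sketch of the gzip variant (endpoint and payload are placeholders, and it assumes the Sink honours Content-Encoding: gzip):

package main

import (
	"bytes"
	"compress/gzip"
	"encoding/json"
	"fmt"
	"net/http"
)

// sendGzippedJSON encodes the payload as JSON, gzips it and posts it to the sink.
func sendGzippedJSON(client *http.Client, sinkURL string, payload interface{}) error {
	var buf bytes.Buffer
	gz := gzip.NewWriter(&buf)
	if err := json.NewEncoder(gz).Encode(payload); err != nil {
		return err
	}
	if err := gz.Close(); err != nil { // flush the gzip footer before sending
		return err
	}

	req, err := http.NewRequest(http.MethodPost, sinkURL, &buf)
	if err != nil {
		return err
	}
	req.Header.Set("Content-Type", "application/json")
	req.Header.Set("Content-Encoding", "gzip")

	resp, err := client.Do(req)
	if err != nil {
		return err
	}
	defer resp.Body.Close()
	if resp.StatusCode >= 300 {
		return fmt.Errorf("sink returned %s", resp.Status)
	}
	return nil
}

func main() {
	// Example: sendGzippedJSON(http.DefaultClient, "http://sink.example.com/ingest", []string{"hello"})
}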

It’s all about trade-offs and needs.

Conclusion

As you can see, there are several things you can do to make your producers more efficient.

  • Request Batching
  • Fast JSON Libraries
  • Partitioning
  • Source Parallelization & Buffering
  • Protocol Buffers
  • Compression

I hope you’ve enjoyed this article and learned something! If you have some ideas, please let me know in the comments.

Thanks for reading! πŸ˜€