Storing and managing logs is one of the most important activities in a software company, because logs help you find the root causes of bugs, downtime, cyber attacks, and more. Cloud providers offer their own managed logging and monitoring services: AWS has CloudWatch and CloudTrail, and Azure has Application Insights. But suppose you have hundreds of microservices inside a Kubernetes cluster and you want an easy-to-use, easy-to-set-up logging system that works exactly the same way for every microservice. You could use Fluentd, but in this article I will show you how to use Event Hub and Azure Data Explorer to collect and query logs in Azure.
As you can see in the feature image, your application, whether it is deployed in App Service, Azure Kubernetes Service, or a Virtual Machine, pushes logs to the Event Hub, and Azure Data Explorer then handles storing and querying for you. In this article, I will run a simple Flask application locally; it will push logs to the Event Hub, which we will query using Azure Data Explorer.
Plan of action:
- Create the infrastructure using Terraform
- Write a small Flask web app
- Check the logs in ADX
You can find all code used in this article at https://github.com/lets-learn-it/terraform-learning/tree/azure/06-eh-adx-logging
Creating Infrastructure
I am using Terraform v1.0.11.
Add the Azure provider and make sure the azurerm version is at least "2.88.1", because we are using the $Default consumer group in azurerm_kusto_eventhub_data_connection.
terraform {
  required_providers {
    azurerm = {
      source  = "hashicorp/azurerm"
      version = "2.88.1"
    }
  }
}

provider "azurerm" {
  features {}
}
Now create a resource group in which all our resources will reside, like below.
resource "azurerm_resource_group" "logs_rg" {
name = var.resource_group
location = "East US"
}
Now create an Event Hub namespace and an event hub in that namespace. We will use the default consumer group for this example.
resource "azurerm_eventhub_namespace" "eh_namespace" {
name = var.eh_namespace
location = azurerm_resource_group.logs_rg.location
resource_group_name = azurerm_resource_group.logs_rg.name
sku = "Standard"
capacity = 1
zone_redundant = true
tags = var.tags
}
resource "azurerm_eventhub" "eh" {
name = var.eh_name
namespace_name = azurerm_eventhub_namespace.eh_namespace.name
resource_group_name = var.resource_group
partition_count = 1
message_retention = 1
}
data "azurerm_eventhub_consumer_group" "default" {
name = "$Default"
namespace_name = azurerm_eventhub_namespace.eh_namespace.name
eventhub_name = azurerm_eventhub.eh.name
resource_group_name = var.resource_group
}
We need an Azure Data Explorer cluster and a database in it to store all the logs.
resource "azurerm_kusto_cluster" "adx" {
name = var.adx_cluster
location = azurerm_resource_group.logs_rg.location
resource_group_name = azurerm_resource_group.logs_rg.name
engine = "V3"
double_encryption_enabled = var.double_encryption
sku {
name = var.adx_sku_name
capacity = var.adx_sku_capacity
}
tags = var.tags
}
resource "azurerm_kusto_database" "database" {
name = var.adx_database
resource_group_name = var.resource_group
location = azurerm_resource_group.logs_rg.location
cluster_name = azurerm_kusto_cluster.adx.name
hot_cache_period = var.hot_cache_period
soft_delete_period = var.soft_delete_period
}
Create a variables file and add all the variables we have used so far.
variable "resource_group" {
description = "where we place event hub and azure data explorer"
type = string
}
variable "adx_cluster" {
description = "name of adx cluster"
type = string
}
variable "adx_database" {
type = string
description = "name of adx dataset"
}
variable "eh_namespace" {
type = string
description = "name of event hub namespace"
}
variable "eh_name" {
type = string
description = "name of event hub"
}
variable "double_encryption" {
type = bool
}
variable "hot_cache_period" {
type = string
description = "data will be cached for this no of days"
}
variable "soft_delete_period" {
type = string
description = "after these no of days data will be deleted"
}
variable "adx_sku_name" {
type = string
description = "type of adx cluster"
}
variable "adx_sku_capacity" {
type = string
}
variable "tags" {
type = map(string)
}
variable "adx_eh_connection_name" {
type = string
}
variable "adx_db_table_name" {
type = string
}
variable "ingestion_mapping_rule_name" {
type = string
}
variable "eh_message_format" {
type = string
default = "JSON"
}
Create a terraform.tfvars file to provide values for all the variables used:
resource_group = "eh-adx-logs"
adx_cluster = "logscluster"
adx_database = "logsdb"
double_encryption = true
hot_cache_period = "P31D"
soft_delete_period = "P365D"
adx_sku_name = "Standard_D11_v2"
adx_sku_capacity = 2
eh_namespace = "logseventhubns"
eh_name = "logs_eventhub"
adx_eh_connection_name = "adxehconn"
adx_db_table_name = "logs_table"
ingestion_mapping_rule_name = "logs_table_json_ingestion_mapping"
tags = {
"environment": "prod"
}
To create these resources, run a plan and then apply. You will see that Terraform creates 5 resources. (The Azure Data Explorer cluster took about 15 minutes for me 😒)
terraform plan
# then run
terraform apply
Currently, Terraform does not support creating tables and table mappings in the database, so we will create them manually.
To open the query editor, go to the Azure Data Explorer cluster in the Azure portal; on the left side you will find Databases, and in there our database, logsdb. Double-click on it, then select Query, and run the following table creation query.
.create table logs_table (
    level: string,
    message: string,
    loggerName: string,
    exception: string,
    applicationName: string,
    processName: string,
    processID: string,
    threadName: string,
    threadID: string,
    timestamp: datetime
)
To create the ingestion mapping, run the following query:
.create table logs_table ingestion json mapping 'logs_table_json_ingestion_mapping'
'[{"column":"level","Properties":{"path":"$.level"}},{"column":"message","Properties":{"path":"$.message"}},{"column":"loggerName","Properties":{"path":"$.loggerName"}},{"column":"exception","Properties":{"path":"$.exception"}},{"column":"applicationName","Properties":{"path":"$.applicationName"}},{"column":"processName","Properties":{"path":"$.processName"}},{"column":"processID","Properties":{"path":"$.processID"}},{"column":"threadName","Properties":{"path":"$.threadName"}},{"column":"threadID","Properties":{"path":"$.threadID"}},{"column":"timestamp","Properties":{"path":"$.timestamp"}}]'
After creating the table and the mapping, we need to create the connection between the event hub and Azure Data Explorer. We did not create it while provisioning the infrastructure with Terraform because it needs the table and the mapping to already exist. Now add the following resource in Terraform and run plan and apply again.
resource "azurerm_kusto_eventhub_data_connection" "eventhub_connection" {
name = var.adx_eh_connection_name
resource_group_name = var.resource_group
location = azurerm_resource_group.logs_rg.location
cluster_name = azurerm_kusto_cluster.adx.name
database_name = azurerm_kusto_database.database.name
eventhub_id = azurerm_eventhub.eh.id
consumer_group = data.azurerm_eventhub_consumer_group.default.name
table_name = var.adx_db_table_name
mapping_rule_name = var.ingestion_mapping_rule_name
data_format = var.eh_message_format
}
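Before wiring up the application, it is worth pushing one hand-crafted JSON event into the event hub to confirm that ingestion into logs_table works end to end. Below is a minimal sketch using the azure-eventhub SDK (pip install azure-eventhub); the environment variable names and field values are only assumptions for illustration, the important part is that the JSON keys match the ingestion mapping.
# Sketch: send one test event to the event hub and check that it lands in logs_table.
# Assumes `pip install azure-eventhub` and the namespace connection string in an env var.
import json
import os
from datetime import datetime, timezone

from azure.eventhub import EventData, EventHubProducerClient

producer = EventHubProducerClient.from_connection_string(
    conn_str=os.environ["eh_ns_connection_string"],  # namespace connection string
    eventhub_name=os.environ.get("eventhub_name", "logs_eventhub"),
)

# A test record whose keys match the ingestion mapping defined above.
event = {
    "level": "INFO",
    "message": "test event from ingestion smoke test",
    "loggerName": "smoke-test",
    "exception": "",
    "applicationName": "my-log-app",
    "processName": "smoke-test",
    "processID": str(os.getpid()),
    "threadName": "MainThread",
    "threadID": "0",
    "timestamp": datetime.now(timezone.utc).isoformat(),
}

with producer:
    batch = producer.create_batch()
    batch.add(EventData(json.dumps(event)))
    producer.send_batch(batch)
Ingestion from the event hub is batched, so it may take a few minutes before the test row shows up in ADX.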
Flask Application
To push logs to the event hub, we need the EventhubHandler package, which provides a handler for the Python logging library. Install it using pip as follows:
pip install EventhubHandler
Now import the required packages and create a logger. Make sure to use the JSONFormatter, because our ingestion mapping expects JSON from the event hub. I am creating a root-level logger so that logs from all modules go to the event hub.
from flask import Flask
import logging

from EventhubHandler.handler import EventHubHandler
from EventhubHandler.formatter import JSONFormatter

app = Flask(__name__)

# Root-level logger so that logs from every module go to the event hub.
logger = logging.getLogger()
logger.setLevel(logging.DEBUG)  # the root logger defaults to WARNING, so lower it

eh = EventHubHandler()
eh.setLevel(logging.DEBUG)

# The format depends on what you chose in ADX; I am using JSON.
formatter = JSONFormatter({"level": "levelname",
                           "message": "message",
                           "loggerName": "name",
                           "processName": "processName",
                           "processID": "process",
                           "threadName": "threadName",
                           "threadID": "thread",
                           "timestamp": "asctime",
                           "exception": "exc_info",
                           "applicationName": ""})
eh.setFormatter(formatter)
logger.addHandler(eh)
Write some endpoints so that we can test our Flask application. I created the /exception endpoint to check how exceptions get logged.
@app.route("/")
def hello_world():
logger.info("inside hello world")
return "<p>Hello, World!</p>"
@app.get("/exception")
def exception():
try:
x = 1 / 0
except ZeroDivisionError as e:
logger.exception('ZeroDivisionError: {0}'.format(e))
return "Exception Occured"
if __name__ == '__main__':
app.run(host="0.0.0.0")
Save all the Flask code in one file named main.py, and make sure to set the following 3 environment variables (on a Linux-based system):
export applicationName="my-log-app"
export eh_ns_connection_string="<event hub namespace connection string>"
export eventhub_name="logs_eventhub"
Run the application using the following command:
python main.py
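With the app running locally, hit both endpoints a few times so that there is something to query. Here is a minimal sketch using only the standard library; it assumes the app is listening on Flask's default port 5000.
# Sketch: generate a few log records by calling the endpoints of the local app.
# Assumes the Flask app is listening on its default port 5000.
import urllib.request

BASE_URL = "http://localhost:5000"

for path in ("/", "/exception", "/"):
    with urllib.request.urlopen(BASE_URL + path) as resp:
        print(path, resp.status)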
Checking logs
To check the logs, run a query in the same query editor. We set applicationName=my-log-app, so we can filter on it and fetch the logs from the last 20 minutes:
logs_table
| where applicationName contains "my-log-app"
| where timestamp > ago(20m)
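You can also run the same query programmatically, for example from a script or a scheduled check. Here is a minimal sketch using the azure-kusto-data package (pip install azure-kusto-data); the cluster URI is an assumption based on the names used in this article, and Azure CLI authentication is just one of the available options.
# Sketch: run the same KQL query against the logsdb database with azure-kusto-data.
# The cluster URI below is assumed from the names used in this article.
from azure.kusto.data import KustoClient, KustoConnectionStringBuilder

CLUSTER_URI = "https://logscluster.eastus.kusto.windows.net"  # assumed URI

kcsb = KustoConnectionStringBuilder.with_az_cli_authentication(CLUSTER_URI)
client = KustoClient(kcsb)

query = """
logs_table
| where applicationName contains "my-log-app"
| where timestamp > ago(20m)
"""

response = client.execute("logsdb", query)
for row in response.primary_results[0]:
    print(row["timestamp"], row["level"], row["message"])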
When to use & when not
- Use it only if you have 100s of microservices. My team is using this with 40 microservices, and the surprising thing is that we have never gone above 10% utilization of the Azure Data Explorer cluster.
- If you have fewer services, the cost per service will be too high. Check the pricing at https://azure.microsoft.com/en-in/pricing/details/data-explorer/#pricing