Kafka with ELK implementation

Aug 17, 2020

Apache Kafka is the most common buffer solution deployed together with the ELK Stack. Kafka is deployed between the log delivery and indexing units, acting as a separation layer for the data being collected.

In this blog, we’ll see how to deploy all the components required to set up a resilient logs pipeline with Apache Kafka and ELK Stack:

  • Filebeat – collects logs and forwards them to a Kafka topic.
  • Kafka – brokers the data flow and queues it.
  • Logstash – aggregates the data from the Kafka topic, processes it and ships to Elasticsearch.
  • Elasticsearch – indexes the data.
  • Kibana – for analyzing the data.


 

My environment:

To perform the steps below, I set up a single Ubuntu 18.04 VM on AWS EC2 using local storage. In real-life scenarios, you will probably have all these components running on separate machines.

I started the instance in the public subnet of a VPC and then set up a security group to enable access from anywhere using SSH and TCP 5601 (for Kibana).

I am using Apache access logs for the pipeline, but you could also use VPC Flow Logs, ALB access logs, etc.


Log in to your Ubuntu system as a user with sudo privileges. For a remote Ubuntu server, use SSH to access it. Windows users can use PuTTY or PowerShell to log in to the Ubuntu system.

Several components in this pipeline run on Java, so make sure it is installed on your system. The following command installs OpenJDK 11:

sudo apt install openjdk-11-jdk-headless

Check whether the installation was successful with the command below:

~$ java --version
openjdk 11.0.3 2019-04-16
OpenJDK Runtime Environment (build 11.0.3+7-Ubuntu-1ubuntu218.04.1)
OpenJDK 64-Bit Server VM (build 11.0.3+7-Ubuntu-1ubuntu218.04.1, mixed mode, sharing)
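If you want to script this check instead of reading the output by eye, the following is a minimal sketch. The helper names `java_major_version` and `require_java_major` are made up for this example, and the parsing assumes the first line of the output looks like `openjdk 11.0.3 2019-04-16`, as shown above.

```shell
# Extract the major version from the first line of `java --version` output,
# e.g. "openjdk 11.0.3 2019-04-16" gives 11. Reads the output on stdin.
java_major_version() {
  awk 'NR == 1 { split($2, parts, "."); print parts[1] }'
}

# Hypothetical helper: succeeds when the detected major version is at least
# the minimum passed as the first argument.
require_java_major() {
  local minimum="$1" detected
  detected="$(java_major_version)"
  [ -n "$detected" ] && [ "$detected" -ge "$minimum" ]
}
```

A possible use is `java --version | require_java_major 11 && echo "Java OK"`. Note that `java --version` writes to standard output, while the older `java -version` form writes to standard error and would need `2>&1` before the pipe.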

Finally, I added a new elastic IP address and associated it with the running instance.
The example logs used for the tutorial are Apache access logs.

 

Step 1: Installing Elasticsearch

We will start by installing the main component in the stack — Elasticsearch. Since version 7.x, Elasticsearch is bundled with Java so we can jump right ahead with adding Elastic’s signing key:

Download and install the public signing key:

wget -qO - https://artifacts.elastic.co/GPG-KEY-elasticsearch | sudo apt-key add -

Now you may need to install the apt-transport-https package on Debian before proceeding:

sudo apt-get install apt-transport-https

Our next step is to add the repository definition to our system:

echo "deb https://artifacts.elastic.co/packages/7.x/apt stable main" | sudo tee -a /etc/apt/sources.list.d/elastic-7.x.list

You can install the Elasticsearch Debian package with:

sudo apt-get update && sudo apt-get install elasticsearch
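Because `tee -a` appends to the file, running the repository command more than once leaves duplicate entries in the list file. The sketch below shows one way to make that step repeatable. `append_once` is a hypothetical helper name, not part of apt, and it has to run with permissions that allow writing to the target file.

```shell
# append_once FILE LINE: append LINE to FILE only if an identical line is
# not already present. Hypothetical helper, not an apt command.
append_once() {
  local file="$1" line="$2"
  touch "$file"
  # -x matches whole lines, -F treats the line as a fixed string.
  if ! grep -qxF -- "$line" "$file"; then
    printf '%s\n' "$line" >> "$file"
  fi
}
```

From a root shell you could then call `append_once /etc/apt/sources.list.d/elastic-7.x.list "deb https://artifacts.elastic.co/packages/7.x/apt stable main"` as often as needed without growing the file.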

Before we bootstrap Elasticsearch, we need to apply some basic configurations using the Elasticsearch configuration file at: /etc/elasticsearch/elasticsearch.yml:

sudo su
nano /etc/elasticsearch/elasticsearch.yml

Since we are installing Elasticsearch on AWS, we will bind Elasticsearch to the localhost.

Also, we need to define the private IP of our EC2 instance as a master-eligible node:

network.host: "localhost"
http.port: 9200
cluster.initial_master_nodes: ["<InstancePrivateIP>"]
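YAML needs a space after the colon that separates a setting from its value. A line written as `key:value` is not read as a key/value pair. As a rough sanity check before starting the service, a sketch like the following can flag such lines. `find_unspaced_keys` is a hypothetical helper, and it only looks at simple `name: value` settings like the ones above, so it is not a full YAML validator.

```shell
# find_unspaced_keys FILE: print, with line numbers, settings written as
# "key:value" without the space YAML requires after the colon.
# Returns non-zero when no such line is found (grep's usual behaviour).
find_unspaced_keys() {
  grep -nE '^[[:space:]]*[A-Za-z0-9_.]+:[^[:space:]]' "$1"
}
```

For example, `find_unspaced_keys /etc/elasticsearch/elasticsearch.yml || echo "no spacing problems found"` prints any offending lines, or the message if there are none.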

Save the file and run Elasticsearch with:

sudo service elasticsearch start

To confirm that everything is working as expected, point curl to: http://localhost:9200, and you should see something like the following output (give Elasticsearch a minute or two before you start to worry about not seeing any response):

{
  "name" : "elasticsearch",
  "cluster_name" : "elasticsearch",
  "cluster_uuid" : "W_Ky1DL3QL2vgu3sdafyag",
  "version" : {
    "number" : "7.2.0",
    "build_flavor" : "default",
    "build_type" : "deb",
    "build_hash" : "508c38a",
    "build_date" : "2019-06-20T15:54:18.811730Z",
    "build_snapshot" : false,
    "lucene_version" : "8.0.0",
    "minimum_wire_compatibility_version" : "6.8.0",
    "minimum_index_compatibility_version" : "6.0.0-beta1"
  },
  "tagline" : "You Know, for Search"
}
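Since Elasticsearch can take a minute or two to answer, it may be convenient to poll the endpoint rather than retry by hand. The following is a minimal sketch. The function name `wait_for_http` and its default of 30 attempts, 5 seconds apart, are arbitrary choices for illustration.

```shell
# wait_for_http URL [ATTEMPTS] [DELAY]: poll URL with curl until it answers
# successfully or the attempts run out. Name and defaults are illustrative.
wait_for_http() {
  local url="$1" attempts="${2:-30}" delay="${3:-5}" i=1
  while [ "$i" -le "$attempts" ]; do
    # -s silences progress output, -f makes HTTP errors return non-zero.
    if curl -sf -o /dev/null "$url"; then
      return 0
    fi
    # Pause between attempts, but not after the last one.
    if [ "$i" -lt "$attempts" ]; then
      sleep "$delay"
    fi
    i=$((i + 1))
  done
  return 1
}
```

One way to use it is `wait_for_http http://localhost:9200 && curl http://localhost:9200`, which prints the JSON above once the node is up.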

 

Step 2: Installing Logstash

Next up, the “L” in ELK: Logstash. Logstash runs on Java, so first verify that Java is installed:

java -version

The output should report the OpenJDK 11 build installed earlier.

Since we already defined the repository in the system, all we have to do to install Logstash is run:

sudo apt-get install logstash -y

Next, we will configure a Logstash pipeline that pulls our logs from a Kafka topic, processes these logs, and ships them on to Elasticsearch for indexing.

Let’s create a new config file:

sudo nano /etc/logstash/conf.d/apache.conf

Paste the following configuration into the file:

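input {
  # Read log events from the "apache" topic on the local Kafka broker.
  kafka {
    bootstrap_servers => "localhost:9092"
    topics => ["apache"]
  }
}

filter {
  # Parse each Apache access log line into named fields.
  grok {
    match => { "message" => "%{COMBINEDAPACHELOG}" }
  }
  # Use the timestamp from the log line as the event time.
  date {
    match => [ "timestamp", "dd/MMM/yyyy:HH:mm:ss Z" ]
  }
  # Add geographic information based on the client IP address.
  geoip {
    source => "clientip"
  }
}

output {
  # Ship the processed events to the local Elasticsearch instance.
  elasticsearch {
    hosts => ["localhost:9200"]
  }
}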

As you can see — we’re using the Logstash Kafka input plugin to define the Kafka host and the topic we want Logstash to pull from. We’re applying some filtering to the logs and we’re shipping the data to our local Elasticsearch instance.
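To get a feel for what the grok filter pulls out of each line, here is a rough shell approximation. It does not use grok. It is a simplified `sed` expression that extracts a few of the fields (`clientip`, `timestamp`, the HTTP verb, the request path and the response code) from a combined-format access log line. The sample log line and the helper name `parse_access_line` are invented for this example.

```shell
# An invented sample line in Apache combined log format.
sample_line='203.0.113.7 - - [17/Aug/2020:10:15:32 +0000] "GET /index.html HTTP/1.1" 200 1024 "-" "curl/7.58.0"'

# parse_access_line: read access log lines on stdin and print
# clientip|timestamp|verb|request|response for each line that matches.
# Lines that do not look like access log entries produce no output.
parse_access_line() {
  sed -nE 's/^([^ ]+) [^ ]+ [^ ]+ \[([^]]+)\] "([A-Z]+) ([^ ]+) [^"]*" ([0-9]{3}) .*$/\1|\2|\3|\4|\5/p'
}

printf '%s\n' "$sample_line" | parse_access_line
# prints: 203.0.113.7|17/Aug/2020:10:15:32 +0000|GET|/index.html|200
```

The second field is the value that the date filter parses with the `dd/MMM/yyyy:HH:mm:ss Z` pattern, and the first is the address that the geoip filter looks up.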

 

Step 3: Installing Kibana

Let’s move on to the next component in the ELK Stack — Kibana. As before, we will use a simple apt command to install Kibana:

sudo apt-get install kibana

We will then open up the Kibana configuration file at: /etc/kibana/kibana.yml, and make sure we have the correct configurations defined:

server.port: 5601
server.host: "<INSTANCE_PRIVATE_IP>"
elasticsearch.hosts: ["http://localhost:9200"]

Then enable and start the Kibana service:

sudo systemctl enable kibana
sudo systemctl start kibana

Open up Kibana in your browser with http://<PUBLIC_IP>:5601. You will be presented with the Kibana home page.

 

We also need to install Filebeat, which collects the logs and forwards them to a Kafka topic. Use:

sudo apt install filebeat

Parshwa Kapadia

About the Author

Parshwa Kapadia

DBA and SQL Developer (SSIS, SSRS, Azure, Power BI)

5+ years of involvement in Database Development. Worked with numerous companies on projects including database optimization, performance tuning, automation, and SQL programming.