Introduction to Logstash: Transform Log Files and Unlock the Power of the ELK Stack πŸ“š

What's the first thing you imagine when you hear the word logs? I bet you pictured a huge wall of text that's hard to read and gives you a headache every time you try to tell the lines apart. And if you're looking for an error or something, it feels like searching for a needle in a haystack.

So, how can you transform these logs from clunky, hard-to-read text into documents in your Elasticsearch index? The answer lies in the letter L of the ELK stack.

We've talked about the E, we've talked about the K, now it's time to talk about L: Logstash.

Logstash is a powerful open-source data processing pipeline tool that collects data, transforms it into a common format, and sends it to a destination for storage or further analysis.

And as I pointed out in the previous posts: the great advantage of using this stack is the seamless integration between its components, which saves you the effort of wiring them together yourself.

You can look at Logstash as an ETL (Extract, Transform, Load) tool, if that helps simplify things.
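In fact, a Logstash pipeline maps directly onto those three stages. Here's a minimal sketch (the stdin/stdout plugins are just placeholders so you can see the shape; we'll build a real pipeline below):

input  { stdin { } }                         # Extract: read raw events (here, from the terminal)
filter { mutate { add_tag => ["parsed"] } }  # Transform: modify/enrich each event
output { stdout { codec => rubydebug } }     # Load: ship the result somewhere (here, back to the terminal)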

So, let's see Logstash in action and run it!

Running Logstash ⚡

If you followed the previous posts, you'll have noticed that I depend on Docker Compose to run most of my services, because it really saves time and lets you run all the services you need with a single command. If you need to understand more, click here.

Now, let's prepare our docker-compose.yml file that will allow us to run our ELK stack. 

version: '3.7'

services:
  elasticsearch:
    image: docker.elastic.co/elasticsearch/elasticsearch:8.13.0
    environment:
      - node.name=elasticsearch
      - cluster.name=es-docker-cluster
      - discovery.type=single-node
      - bootstrap.memory_lock=true
      - "ES_JAVA_OPTS=-Xms512m -Xmx512m"
      - "xpack.security.enabled=false"
    ulimits:
      memlock:
        soft: -1
        hard: -1
    volumes:
      - esdata1:/usr/share/elasticsearch/data
    ports:
      - 9200:9200

  kibana:
    image: docker.elastic.co/kibana/kibana:8.13.0
    ports:
      - 5601:5601
    environment:
      ELASTICSEARCH_HOSTS: http://elasticsearch:9200
    depends_on:
      - elasticsearch

  logstash:
    image: docker.elastic.co/logstash/logstash:8.13.0
    ports:
      - 5044:5044
    volumes:
      - ./logstash.conf:/usr/share/logstash/pipeline/logstash.conf
      - ./logs:/logs
    command: logstash -f /usr/share/logstash/pipeline/logstash.conf
    depends_on:
      - elasticsearch

volumes:
  esdata1:
    driver: local

As you can see, it's pretty simple once you understand the following:

  • ./logs:/logs: Adding a volume to your container allows it to read from the directory you specify. So, in our case, I added my logfile.log in a folder named logs in the same directory as my docker-compose.yml.

  • ./logstash.conf:/usr/share/logstash/pipeline/logstash.conf: Here you are telling Logstash to read its configuration file from the same directory as the docker-compose.yml file.

Now, before we run docker compose up, we need to add a configuration file.
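Once that file is in place (we'll write it in the next section), your project directory should look like this:

.
├── docker-compose.yml
├── logstash.conf
└── logs
    └── logfile.log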

Configuring Logstash πŸ”§

A Logstash configuration is basically a set of instructions for how it should handle your logs: where to expect its input from, how to parse each entry, whether a new log starts on every new line or with every new JSON object, and so on.

We'll prepare a very simple configuration file, which will look like this:

input {
  file {
    path => "/logs/logfile.log"
    start_position => "beginning"
  }
}

filter {
  grok {
    match => { "message" => "%{COMBINEDAPACHELOG}" }
  }
  date {
    match => [ "timestamp", "dd/MMM/yyyy:HH:mm:ss Z" ]
    target => "@timestamp"
  }
}

output {
  elasticsearch {
    hosts => ["elasticsearch:9200"]
    index => "logstash_index"
  }
}

Here we're specifying the following:
  • input: Specifies that Logstash should read from the log file in the logs folder we mounted in our YAML file. You could also add other files or use wildcard patterns such as *.log.

  • filter: Basically, we're telling Logstash to parse each line with the COMBINEDAPACHELOG grok pattern (a common web-server log format), and we're mapping the timestamp of the log onto the timestamp of the document.

  • output: Finally, we're telling Logstash to store the created document in an index named logstash_index in Elasticsearch.
Great! Now, we're good to go. Let's fire up the docker compose up command. 

Make sure that you've created a logs folder with a logfile.log file inside it, in the same directory as the YAML file.
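Something like this should do it, run from the directory containing the YAML file:

mkdir -p logs && touch logs/logfile.log
docker compose up -d
curl http://localhost:9200    # quick sanity check that Elasticsearch is answering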

Extract, Transform & Load πŸ”€

Once your containers are up and running, let's create an index named logstash_index in Kibana.
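If you'd rather skip the UI, the same thing is a single request from Kibana's Dev Tools console or curl (Elasticsearch would also auto-create the index when the first document arrives, but creating it up front lets us watch the docs count from zero):

curl -X PUT http://localhost:9200/logstash_index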


As you can see, we can view our new index from the index management page in Kibana, and the docs count is zero because our log file is empty. Let's add some logs to our logfile.log.

I will open the log file and add some random logs in the following format:

127.0.0.1 - safar [10/Oct/2023:13:55:36 -0700] "GET /apache_pb.gif HTTP/1.0" 200 2326 "https://www.ahmed-safar.blogspot.com/" "Mozilla/4.08 [en] (Win98; I ;Nav)"

This is the COMBINEDAPACHELOG format. I will add four lines which, according to my configuration, will be treated as four separate log documents.

I'm adding these logs manually just for testing. In reality, this would be the log file of some application, and the application would be the one populating it.
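For example, you can append a line from the host like this (run from the compose directory, so it lands in the mounted logs folder):

echo '127.0.0.1 - safar [10/Oct/2023:13:55:36 -0700] "GET /apache_pb.gif HTTP/1.0" 200 2326 "https://www.ahmed-safar.blogspot.com/" "Mozilla/4.08 [en] (Win98; I ;Nav)"' >> logs/logfile.log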

After adding my logs, I'm expecting to see my logstash_index populated with 4 documents.
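You can verify that from the command line too:

curl http://localhost:9200/logstash_index/_count
# expected: {"count":4, ...}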



Excellent! It seems Logstash processed my entries, transformed them, and stored them in Elasticsearch, and now we can view them in Kibana in near real time, which is the beauty of the seamless integration between the three services. But let's click Discover Index to see if our logs were parsed correctly.


As you can see, the 4 documents were parsed successfully. It's also worth noting the vital information that Logstash was able to provide Elasticsearch with, such as:
  • log.file.path: The source of this logged document

  • @timestamp: The timestamp from the log itself, overriding the one Elasticsearch automatically assigns to any newly created document. We did this by design with the date filter, and you can disable it if you'd rather keep the document-creation timestamp (see the sketch after this list).

  • user.name: The name of the user extracted from the log
And many other useful fields you can see on the left side of the index.
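For instance, if you'd rather keep @timestamp as the ingestion time, one option is to point the date filter at a separate field instead (a sketch; log_timestamp is just a name I picked):

date {
  match  => [ "timestamp", "dd/MMM/yyyy:HH:mm:ss Z" ]
  target => "log_timestamp"   # parsed log time goes here; @timestamp stays untouched
}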

Kibana will also allow you to visualize these logs to extract useful statistics from them: when exceptions usually occur, which user causes the most exceptions, how often a certain browser or entry point is responsible for warnings or errors, and so on.

So, let's try to visualize some of the fields in the log documents as statistics. And we'll start by clicking on Visualize in the picture above to see statistics related to the user.name field.

Visualize Logs πŸ“‘πŸ“Š

Once you click on Visualize, you will be redirected to the Visualize section of Kibana, where you can apply different types of graphs to get a better view of your data. So, let's try a donut chart to see which user caused the most logs.
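By the way, this donut is essentially a terms aggregation, so you can get the same breakdown straight from Elasticsearch (a sketch, assuming the default dynamic mapping that adds a user.name.keyword subfield):

curl -s http://localhost:9200/logstash_index/_search -H 'Content-Type: application/json' -d '
{
  "size": 0,
  "aggs": {
    "logs_per_user": { "terms": { "field": "user.name.keyword" } }
  }
}'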



Okay, it's obvious that the user Kratos has caused quite a mess. How about we try another field, like the timestamp, to see when these logs tend to be produced the most.


The statistics tell us that each log happened at a different timestamp. But of course these timestamps are very precise (down to the millisecond), so it's not the best example. However, I'm just showing you different fields in different views and graphs.

Perfect! I think now you know enough about Logstash to kickstart your ELK journey and discover much more than I have uncovered.

Before you go: in terms of separation of concerns, doesn't it seem odd that Logstash is responsible for both collecting the logs and transforming them? It seems like Elastic agrees. This is why they added another tool to the ELK stack: Beats.

Elastic decided that it's better to leave log collection to another tool, Beats, and let Logstash be responsible for just the transformation and distribution.

Looks like our stack is getting bigger than just ELK. So, let's start calling it the Elastic stack instead. And let's take a look at Beats in a later post.
