Introduction to Logstash: Transform Log Files and Unlock the Power of the ELK Stack
What's the first thing you imagine when you hear the word logs? I bet you pictured a huge wall of text that's hard to read and gives you a headache every time you try to tell the lines apart. And if you're looking for an error, it feels like searching for a needle in a haystack.
So, how can you transform these logs from clunky, hard-to-read text into documents in your Elasticsearch index? The answer lies in the letter L in the ELK stack.
We've talked about the E, we've talked about the K, now it's time to talk about L: Logstash.
Logstash is a powerful open-source data processing pipeline tool that collects data, transforms it into a common format, and sends it to a destination for storage or further analysis.
And as I pointed out in the previous posts, the great advantage of using this stack is the seamless integration between its components, which saves you the effort of wiring them together yourself.
If it simplifies things for you, you can look at Logstash as an ETL (Extract, Transform, Load) tool.
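Every Logstash pipeline is built from the same three stages, and they map neatly onto ETL. Here's a minimal, illustrative sketch; the stdin/stdout plugins and the added field are just placeholders for demonstration, not the configuration we'll actually use later:

input {
  # Extract: read events from a source (here, standard input)
  stdin { }
}

filter {
  # Transform: parse, enrich, or reshape each event
  mutate { add_field => { "pipeline_stage" => "demo" } }
}

output {
  # Load: ship the processed event to a destination
  stdout { codec => rubydebug }
}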
So, let's see Logstash in action and run it!
Running Logstash ⚡
If you followed the previous posts, you'll know that I depend on Docker Compose to run most of my services, because it really saves time and lets me spin up everything I need with a single command. If you want to understand more, click here.
Now, let's prepare our docker-compose.yml file that will allow us to run our ELK stack.
version: '3.7'
services:
  elasticsearch:
    image: docker.elastic.co/elasticsearch/elasticsearch:8.13.0
    environment:
      - node.name=elasticsearch
      - cluster.name=es-docker-cluster
      - discovery.type=single-node
      - bootstrap.memory_lock=true
      - "ES_JAVA_OPTS=-Xms512m -Xmx512m"
      - "xpack.security.enabled=false"
    ulimits:
      memlock:
        soft: -1
        hard: -1
    volumes:
      - esdata1:/usr/share/elasticsearch/data
    ports:
      - 9200:9200
  kibana:
    image: docker.elastic.co/kibana/kibana:8.13.0
    ports:
      - 5601:5601
    environment:
      ELASTICSEARCH_HOSTS: http://elasticsearch:9200
    depends_on:
      - elasticsearch
  logstash:
    image: docker.elastic.co/logstash/logstash:8.13.0
    ports:
      - 5044:5044
    volumes:
      # Mount the pipeline configuration and the local logs folder into the container
      - ./logstash.conf:/usr/share/logstash/pipeline/logstash.conf
      - ./logs:/logs
    command: logstash -f /usr/share/logstash/pipeline/logstash.conf
    depends_on:
      - elasticsearch
volumes:
  esdata1:
    driver: local
As you can see, it's pretty simple once you understand the following:
- ./logs:/logs: Adding a volume to your container lets it read from the directory you specify. In our case, I added my logfile.log to a folder named logs in the same directory as my docker-compose.yml.
- ./logstash.conf:/usr/share/logstash/pipeline/logstash.conf: Here you're telling Logstash to read its configuration file from the same directory as the docker-compose.yml file as well.
Configuring Logstash
We'll prepare a very simple configuration file, which will look like this:
input {
  file {
    path => "/logs/logfile.log"
    start_position => "beginning"
  }
}

filter {
  grok {
    match => { "message" => "%{COMBINEDAPACHELOG}" }
  }
  date {
    match => [ "timestamp", "dd/MMM/yyyy:HH:mm:ss Z" ]
    target => "@timestamp"
  }
}

output {
  elasticsearch {
    hosts => ["elasticsearch:9200"]
    index => "logstash_index"
  }
}
- input: Specifies Logstash's input: the log file inside the logs folder we mounted in our YAML file. You could also add other files or use wildcards in the file name, such as *.log (see the sketch after this list).
- filter: Basically, we're telling Logstash to extract the fields of the COMBINEDAPACHELOG pattern (a standard Apache access-log format), and we're parsing the timestamp of the log into the timestamp of the document.
- output: Finally, we're telling Logstash to store the created document in an index named logstash_index in Elasticsearch.
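As mentioned in the input bullet above, the file input isn't limited to a single file. Here's a hedged sketch of a variant pipeline, assuming you mount more log files under /logs: it watches every *.log file and, in addition to Elasticsearch, prints each processed event to Logstash's own output so you can eyeball the parsed fields while experimenting. The filter block stays exactly as shown above.

input {
  file {
    # Watch every .log file mounted under /logs (hypothetical layout)
    path => "/logs/*.log"
    start_position => "beginning"
  }
}

# The grok/date filter block from above goes here, unchanged.

output {
  elasticsearch {
    hosts => ["elasticsearch:9200"]
    index => "logstash_index"
  }
  # Also print each event to the Logstash container's log, handy for debugging
  stdout { codec => rubydebug }
}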
Extract, Transform & Load
Here's a sample entry from my logfile.log:
127.0.0.1 - safar [10/Oct/2023:13:55:36 -0700] "GET /apache_pb.gif HTTP/1.0" 200 2326 "https://www.ahmed-safar.blogspot.com/" "Mozilla/4.08 [en] (Win98; I ;Nav)"
Excellent! It seems like Logstash processed my entries, transformed them, and stored them in Elasticsearch, and now we can view them in Kibana in near real time, which is the beauty of the seamless integration between the three services. But let's click Discover Index to see if our logs were parsed correctly.
As you can see, the 4 documents were parsed successfully. It's also important to note the vital information that Logstash was able to provide Elasticsearch with, such as:
- log.file.path: The source of this logged document
- @timestamp: The timestamp that Elasticsearch automatically creates for any new document, overridden here by the timestamp found in the log. This is something we did by design, and it can be disabled if you'd rather not override the document's creation timestamp (see the sketch after this list).
- user.name: The name of the user that exists in the log
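If you'd rather not override @timestamp, one option is to point the date filter at a different target field, so Elasticsearch keeps its ingest-time @timestamp and the parsed log time lands in a field of its own. This is a minimal sketch; log_timestamp is a hypothetical field name of my choosing:

filter {
  grok {
    match => { "message" => "%{COMBINEDAPACHELOG}" }
  }
  date {
    match => [ "timestamp", "dd/MMM/yyyy:HH:mm:ss Z" ]
    # With a target other than @timestamp, the document's creation
    # timestamp is left untouched and the parsed time goes here instead
    target => "log_timestamp"
  }
}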
Kibana will also let you visualize these logs to extract important statistics from them, like when exceptions usually occur, which user causes the most exceptions, how often a certain browser or entry point is responsible for warnings or errors, etc.
So, let's try to visualize some of the fields in the log documents as statistics. And we'll start by clicking on Visualize in the picture above to see statistics related to the user.name field.
Visualize Logs
Before you go: in terms of separation of concerns, don't you think it's odd that Logstash is responsible for both finding the logs and transforming them? It seems like Elastic agrees. That's why they added another tool to the ELK stack: Beats.