Introduction to Beats: Collect Data from Anywhere & Level Up your Elastic Stack 📡

 

Logstash did a very impressive job transforming our logs into documents that let us understand and visualize how our applications are behaving. But that power comes at a cost: Logstash can require quite a bit of memory and CPU to run. So it's not the most efficient choice for collecting data from many different sources.

Elastic offered a solution to this concern and introduced Beats, a family of lightweight shippers for forwarding and centralizing data. A Beat is installed as an agent on your servers to capture all sorts of operational data, like logs or network packets. Beats is great at gathering data and works efficiently with a large number of files. It can also handle backpressure (when Logstash is busy) and ensures that no data is lost during such periods.

And to be clear, Logstash can do most of what Beats does. So, why use Beats instead?

1. Lightweight Data Shipping: Beats is designed to be lightweight and requires fewer resources than Logstash. This makes it ideal for forwarding logs from a machine with limited resources.

2. Backpressure-Sensitive Protocol: Beats communicates with Logstash using a backpressure-sensitive protocol, which ensures that Beats doesn't overload Logstash by sending too much data at once. If Logstash is busy, Beats slows down its read rate. Logstash doesn't have this capability on its own.

3. At-Least-Once Delivery: Beats keeps track of the read offset in the files and ensures the at-least-once delivery of events. If Logstash goes down, Beats will remember where it left off when Logstash comes back online.

4. File Rotation and Wildcards: Beats can handle log rotation and wildcards in file paths, which makes it easier to collect logs from many different files.

5. Multiline Events: Beats can handle multiline events (like stack traces) on the client side before shipping them to Logstash, as shown in the sketch after this list.

6. Distributed Architecture: Beats can be installed on every application server, which allows it to fetch logs locally and then send them to Logstash or Elasticsearch. This distributed architecture can be more scalable and resilient than having Logstash fetch logs from all your servers.
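
For point 5, here's a minimal sketch of what multiline handling looks like in a Filebeat config. The path and the timestamp pattern are hypothetical, but multiline.pattern, multiline.negate, and multiline.match are real options of the log input:

filebeat.inputs:
- type: log
  paths:
    - /logs/app.log   # hypothetical path
  # Lines that don't start with a date get glued to the previous event,
  # so a stack trace arrives as one document instead of many
  multiline.pattern: '^\d{4}-\d{2}-\d{2}'
  multiline.negate: true
  multiline.match: after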

So, now that we've seen why Beats can be a more efficient way to gather data, let's see how to use it in a simple example.

Configuring Beats ⚙️

In order to run Beats, you have to prepare a YAML configuration file (filebeat.yml), which plays the same role as the configuration file we prepared earlier for Logstash.

filebeat.inputs:
- type: log
  enabled: true
  paths:
    - /logs/*.log

output.logstash:
  hosts: ["logstash:5044"]

It's very simple: you specify the inputs and their paths. In this case, Filebeat will read from any file under /logs that has the .log extension. I also set the output to Logstash, and only because both containers run on the same Docker network can I reference the Logstash service by name; otherwise, I would need to specify the hostname where my Logstash service is deployed.
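
As a side note, if you ever need to tell different sources apart at ship time, Filebeat lets you attach custom fields to each input. Here's a small sketch; the field name and value (app: my-app) are hypothetical, but fields is a real Filebeat option:

filebeat.inputs:
- type: log
  paths:
    - /logs/*.log
  # Every event from this input will carry an extra field you can filter on later
  fields:
    app: my-app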

Perfect! Now let's tell Logstash that it should expect logs from Beats instead of looking for them in specific paths. I will do that by editing the logstash.conf file and changing the input from file to beats (listening on port 5044).

input {
  beats {
    port => 5044
  }
}

filter {
  grok {
    match => { "message" => "%{COMBINEDAPACHELOG}" }
  }
  date {
    match => [ "timestamp", "dd/MMM/yyyy:HH:mm:ss Z" ]
    target => "@timestamp"
  }
}

output {
  elasticsearch {
    hosts => ["elasticsearch:9200"]
    index => "logstash_index"
  }
}
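
By the way, while you're getting the pipeline working, it can help to also print every event to the container logs. The stdout output plugin with the rubydebug codec is a standard Logstash debugging trick; a sketch of what the extended output block could look like:

output {
  elasticsearch {
    hosts => ["elasticsearch:9200"]
    index => "logstash_index"
  }
  # Debug only: pretty-prints each event to Logstash's own stdout
  stdout { codec => rubydebug }
}

Just remember to remove it once things work, so your container logs stay quiet.
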
Excellent! Now that we've prepared our configuration files, let's go ahead and start our Elastic services.

Running Beats 🏃

As in all of my previous posts on the Elastic tools (Elasticsearch, Kibana, and Logstash), I will be using a docker-compose.yml file to run my ELK stack, plus the Beats container.

But I will not pull the Beats image directly from the registry. Instead, I will build my own image, because when I tried the former I ran into read/write permission problems, which is common when running Docker on Windows.

So, I will prepare a Dockerfile first for my Beats image, and after it builds, I will include it in my docker-compose.yml.

Please note that you might not need to do this: try first to include the image directly in your YAML file, and if your Beats container doesn't run and keeps exiting, then you can try a Dockerfile such as the one I will show you now.

# Use the official Filebeat image from the Elastic Docker registry
FROM docker.elastic.co/beats/filebeat:8.13.0

# Copy the Filebeat configuration file from the local directory into the container
COPY filebeat.yml /usr/share/filebeat/filebeat.yml

# Change the ownership of the configuration file to root:filebeat
USER root
RUN chown root:filebeat /usr/share/filebeat/filebeat.yml

# Switch back to the filebeat user
USER filebeat

It's just a simple Dockerfile that pulls the image and copies in filebeat.yml, which should be in the same directory as the Dockerfile if you wish to use it as-is.

Now, run docker build -t my-filebeat . in the directory of the Dockerfile and wait until the image builds successfully.
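
If you want to sanity-check your configuration before wiring everything together, Filebeat ships with a test subcommand. Something like the following should work, overriding the image's entrypoint to call the filebeat binary directly (adjust the paths to your setup):

docker run --rm --entrypoint filebeat my-filebeat test config -c /usr/share/filebeat/filebeat.yml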

The next step is to run the docker-compose.yml file. But first, let's add the Beats image we built (my-filebeat), remove the logs volume from the previous post's Logstash section, and add it to the Beats section. It should look like this:

version: '3.7'
services:
  elasticsearch:
    image: docker.elastic.co/elasticsearch/elasticsearch:8.13.0
    environment:
      - node.name=elasticsearch
      - cluster.name=es-docker-cluster
      - discovery.type=single-node
      - bootstrap.memory_lock=true
      - "ES_JAVA_OPTS=-Xms512m -Xmx512m"
      - "xpack.security.enabled=false"
    ulimits:
      memlock:
        soft: -1
        hard: -1
    volumes:
      - esdata1:/usr/share/elasticsearch/data
    ports:
      - 9200:9200

  kibana:
    image: docker.elastic.co/kibana/kibana:8.13.0
    ports:
      - 5601:5601
    environment:
      ELASTICSEARCH_HOSTS: http://elasticsearch:9200
    depends_on:
      - elasticsearch
      
  logstash:
    image: docker.elastic.co/logstash/logstash:8.13.0
    ports:
      - 5044:5044
    volumes:
      - ./logstash.conf:/usr/share/logstash/pipeline/logstash.conf
    command: logstash -f /usr/share/logstash/pipeline/logstash.conf
    depends_on:
      - elasticsearch
        
  filebeat:
    image: my-filebeat
    volumes:
      - ./logs:/logs
    depends_on:
      - logstash

volumes:
  esdata1:
    driver: local
Perfect! Now let's run docker compose up in the directory of this docker-compose.yml file and see what happens.
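
Once the containers are up, it's worth keeping an eye on the Filebeat container to confirm it started cleanly and connected to Logstash:

docker compose logs -f filebeat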

Collect Everything 🌪️ 

I have added two log files to the logs folder, which lives in the same directory as my docker-compose.yml file. One is called logfile.log and the other logfile-2.log. Notice that both files have the .log extension, so I expect Beats to collect logs from both of them.

Before adding logs to the log files, I will open Kibana to check the number of docs in the logstash_index.



Our index is empty, as expected. Let's now add logs to the two log files and see what happens.
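
Keep in mind that our Logstash grok filter expects the Apache combined log format, so the lines I append need to look like Apache access logs. Here's a made-up sample line (the IP, path, and user agent are arbitrary):

echo '127.0.0.1 - - [10/May/2024:13:55:36 +0000] "GET /index.html HTTP/1.1" 200 2326 "-" "Mozilla/5.0"' >> logs/logfile.log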


Okay, now Beats should be monitoring these two files and shipping the added logs to Logstash, which should then do its magic.


And as expected, the 8 logs from the two different files were added to our logstash_index in Elasticsearch. But what if we need to tell the logs apart depending on their source?

Visualize Logs 📊

Let's click on Discover Index to see the documents that were created.


The 8 documents are here, and each one represents a different log line from our log files. If you click on the log.file.path field, you will be able to see the source file of each log.
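
By the way, you can get the same breakdown straight from Elasticsearch with a terms aggregation on that field. A sketch, assuming the default dynamic mapping (which gives string fields a .keyword subfield):

curl -s 'http://localhost:9200/logstash_index/_search?size=0' -H 'Content-Type: application/json' -d '
{
  "aggs": {
    "per_file": {
      "terms": { "field": "log.file.path.keyword" }
    }
  }
}'

In Kibana, though, the Visualize shortcut gives you the same answer with a couple of clicks.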

So, let's click on Visualize to see how many came from logfile-2.log and how many from logfile.log.


Just like in my previous post, we are able to visualize statistics based on the fields that Logstash provided, and everything happened in near real-time. So, that wraps it up!

In conclusion, the Elastic Stack is ever-growing, and it provides many more features and capabilities than I have shown you in this or the previous posts.

I encourage you to build on whatever knowledge you've gained from any of my posts and dig deeper into the different capabilities that the Elastic stack offers to achieve anything that your system needs or might need in the future. 
