Introduction to Telegraf

Original link: https://jasonkayzk.github.io/2023/06/27/Telegraf%E7%AE%80%E4%BB%8B/

Telegraf is an agent written in Go that collects system and service statistics and writes them to InfluxDB or other outputs.

This article briefly introduces how to use Telegraf.

Introduction to Telegraf

Official website:

Telegraf, part of the TICK Stack, is a plugin-driven server agent for collecting and reporting metrics.

Telegraf has integrations to source a variety of metrics, events, and logs directly from the containers and systems it runs on, to pull metrics from third-party APIs, and even to listen for metrics through StatsD and Kafka consumer services.

It also has output plugins to send metrics to various other data stores, services and message queues, including InfluxDB, Graphite, OpenTSDB, Datadog, Librato, Kafka, MQTT, NSQ and many more.

As a data collection module, Telegraf must be installed on each monitored host. It is designed for a small memory footprint, and metric collection for various services and third-party components is implemented through plugins.

Telegraf is built around four types of plugins:

  • Input plugins: collect data from systems, services, and third-party components.
  • Processor plugins: transform, decorate, and filter metrics.
  • Aggregator plugins: create aggregate metrics (for example mean, min, max).
  • Output plugins: write metrics to their destinations.
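
The way the four plugin types fit together can be pictured as a small pipeline. The following is a conceptual Python sketch only, not how Telegraf itself is implemented; the Metric class and the functions cpu_input, add_ip_tag, mean_idle, and stdout_output are hypothetical names invented for this illustration, and the sample values are made up.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Dict, List


@dataclass
class Metric:
    """A metric with a name, tags, fields, and a timestamp (in seconds)."""
    name: str
    tags: Dict[str, str]
    fields: Dict[str, float]
    timestamp: int


def cpu_input() -> List[Metric]:
    # Input stage: a hypothetical collector returning fixed sample values.
    return [
        Metric("cpu", {"cpu": "cpu0"}, {"usage_idle": 90.0}, 1684737007),
        Metric("cpu", {"cpu": "cpu1"}, {"usage_idle": 70.0}, 1684737007),
    ]


def add_ip_tag(metric: Metric) -> Metric:
    # Processor stage: transform each metric, here by adding a tag.
    metric.tags["ip"] = "127.0.0.1"
    return metric


def mean_idle(metrics: List[Metric]) -> Metric:
    # Aggregator stage: derive one summary metric from many.
    value = mean(m.fields["usage_idle"] for m in metrics)
    return Metric("cpu_mean", {}, {"usage_idle": value}, metrics[-1].timestamp)


def stdout_output(metrics: List[Metric]) -> List[str]:
    # Output stage: serialize the metrics and write them (here, print them).
    lines = [f"{m.name} {m.tags} {m.fields} {m.timestamp}" for m in metrics]
    for line in lines:
        print(line)
    return lines


def run_pipeline() -> List[Metric]:
    collected = [add_ip_tag(m) for m in cpu_input()]
    return collected + [mean_idle(collected)]


result = run_pipeline()
stdout_output(result)
```

In a real deployment each stage is selected and tuned in the configuration file rather than written as code.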

Why use Telegraf and InfluxDB?

  • In data collection and platform monitoring systems, Telegraf can collect runtime information from a wide range of components without hand-written scripts that run on a schedule, which makes data acquisition easier.
  • Telegraf is easy to configure; basic Linux knowledge is enough to get started quickly.
  • Telegraf collects time-series data, so every metric carries a timestamp. InfluxDB is designed for this type of data and supports various analysis and calculation operations on the collected metrics.

Install

Use apt to install:

 apt install telegraf

After the installation is complete, Telegraf is registered as a background service, so it can be managed with systemctl:

Start command:

 systemctl start telegraf

Restart command:

 systemctl restart telegraf

Telegraf configuration

Telegraf provides many configuration options. The main configuration file is /etc/telegraf/telegraf.conf.

Configuration instructions:

After installation, a telegraf.conf file is generated under /etc/telegraf. Environment variables can be referenced in the configuration file in the form "$ENV_ITEM".
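
The idea of environment-variable substitution can be illustrated with a short Python sketch. Telegraf performs its own substitution when it loads the file; the snippet below only mimics the effect with os.path.expandvars, and the variable name TELEGRAF_DEMO_HOST is a hypothetical example.

```python
import os

# TELEGRAF_DEMO_HOST is a hypothetical variable name used only for this example.
os.environ["TELEGRAF_DEMO_HOST"] = "10.0.0.5"

config_line = 'ip = "$TELEGRAF_DEMO_HOST"'

# os.path.expandvars replaces $NAME and ${NAME} with the variable's value.
expanded = os.path.expandvars(config_line)
print(expanded)
```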

Here are some main configuration items:

global_tags

The key-value pairs defined here are added as tags to every metric that Telegraf collects and writes to InfluxDB.

agent

This section defines the behavior of the data collection agent.

  • interval: the default data collection interval for all inputs.
  • round_interval: rounds collection times to the interval. For example, if interval = "10s", collection always happens at :00, :10, :20, and so on.
  • metric_batch_size: Telegraf sends metrics to each output in batches of at most metric_batch_size metrics.
  • metric_buffer_limit: Telegraf buffers up to metric_buffer_limit metrics for each output and flushes the buffer on a successful write. The value should be a multiple of metric_batch_size and cannot be less than 2 times metric_batch_size.
  • collection_jitter: jitters the collection by a random amount. Each plugin sleeps for a random time within the jitter before collecting. This can be used to avoid many plugins querying things like sysfs at the same time, which can have a measurable effect on the system.
  • flush_interval: the default data flushing interval for all outputs. The maximum effective flush interval is flush_interval + flush_jitter.
  • flush_jitter: jitters the flush interval by a random amount. This is primarily to avoid large write spikes for users running a large number of Telegraf instances. For example, a flush_jitter of 5s and a flush_interval of 10s means a flush happens every 10-15 seconds.
  • precision: by default, precision is set to the same timestamp order as the collection interval, with a maximum of 1s. Precision is not used for service inputs such as logparser and statsd. Valid values are ns, us (or µs), ms, and s.
  • logfile: the log file name. An empty string means logging to stderr.
  • debug: runs Telegraf in debug mode.
  • quiet: runs Telegraf in quiet mode (error messages only).
  • hostname: overrides the default hostname; if empty, os.Hostname() is used.
  • omit_hostname: if true, the Telegraf agent does not set the host tag.
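
The timing options above can be illustrated with a little arithmetic. The following is a minimal Python sketch, not Telegraf's actual scheduler; the function names next_collection_time and flush_delay are hypothetical and chosen only for this illustration.

```python
import random


def next_collection_time(now: float, interval: float, round_interval: bool) -> float:
    """Return the next collection time in seconds."""
    if round_interval:
        # Align to the next multiple of the interval, e.g. :00, :10, :20.
        return (now // interval + 1) * interval
    return now + interval


def flush_delay(flush_interval: float, flush_jitter: float, rng: random.Random) -> float:
    """Return a delay between flush_interval and flush_interval + flush_jitter."""
    return flush_interval + rng.uniform(0.0, flush_jitter)


# With interval = 10s and the clock at :03, the next rounded collection is at :10.
aligned = next_collection_time(now=3.0, interval=10.0, round_interval=True)
unaligned = next_collection_time(now=3.0, interval=10.0, round_interval=False)

# With flush_interval = 10s and flush_jitter = 5s, every delay falls in 10-15 seconds.
rng = random.Random(0)
delays = [flush_delay(10.0, 5.0, rng) for _ in range(1000)]
print(aligned, unaligned, min(delays), max(delays))
```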

inputs

Input-related configuration. The following parameters are available for all inputs:

  • interval: how often to collect this metric. Plugins normally use the single global interval, but if a particular input should run less or more often, it can be configured here.
  • name_override: overrides the base name of the measurement (defaults to the name of the input).
  • name_prefix: a prefix to prepend to the measurement name.
  • name_suffix: a suffix to append to the measurement name.
  • tags: a map of tags to apply to the measurements of a specific input.

aggregator

The following configuration parameters are available for all aggregators:

  • period: the period on which each aggregator is flushed and cleared. The aggregator ignores all metrics whose timestamps fall outside this period.
  • delay: the delay before each aggregator is flushed. This controls how long aggregators wait to receive metrics from input plugins when aggregators flush and inputs collect on the same interval.
  • drop_original: if true, the aggregator drops the original metric and does not send it on to the output plugins.
  • name_override: overrides the base name of the measurement (defaults to the name of the input).
  • name_prefix: a prefix to prepend to the measurement name.
  • name_suffix: a suffix to append to the measurement name.
  • tags: a map of tags to apply to the measurements.
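
The effect of period and drop_original can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not Telegraf's aggregator code: the aggregate function is hypothetical, a point is a (timestamp, value) pair, the sample values are made up, and only a single window is handled.

```python
from typing import Dict, List, Tuple

Point = Tuple[int, float]  # (timestamp in seconds, value)


def aggregate(points: List[Point], window_start: int, period: int,
              drop_original: bool) -> Tuple[List[Point], Dict[str, float]]:
    """Summarize the points inside one period window.

    Returns the points forwarded to outputs and the min/max/mean summary.
    """
    window_end = window_start + period
    # Points with timestamps outside the current window are ignored.
    in_window = [v for ts, v in points if window_start <= ts < window_end]
    summary: Dict[str, float] = {}
    if in_window:
        summary = {
            "min": min(in_window),
            "max": max(in_window),
            "mean": sum(in_window) / len(in_window),
        }
    # With drop_original, the raw points are not forwarded to outputs.
    forwarded = [] if drop_original else list(points)
    return forwarded, summary


points = [(100, 1.0), (105, 3.0), (125, 10.0), (131, 50.0)]
forwarded, summary = aggregate(points, window_start=100, period=30, drop_original=True)
kept, _ = aggregate(points, window_start=100, period=30, drop_original=False)
print(forwarded, summary)
```

The point at timestamp 131 lies outside the 30-second window starting at 100, so it does not contribute to the summary.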

processor

The following configuration parameters are available for all processors:

  • order: the order in which processors are executed. If not specified, the processor execution order is random.

measurement filtering

Filters can be configured per input, output, processor, or aggregator:

  • namepass: an array of glob pattern strings. Only points whose measurement name matches a pattern in this list are emitted.
  • fieldpass: an array of glob pattern strings. Only fields whose field key matches a pattern in this list are emitted. Not available for outputs.
  • fielddrop: the inverse of fieldpass. Fields whose field key matches one of the patterns are discarded from the point. Not available for outputs.
  • tagpass: a table mapping tag keys to arrays of glob pattern strings. Only points that contain a tag key in the table and a tag value matching one of its patterns are emitted.
  • tagdrop: the inverse of tagpass. If a match is found, the point is discarded. This is tested on points after they have passed the tagpass test.
  • taginclude: an array of glob pattern strings. Only tags whose tag key matches one of the patterns are emitted. In contrast to tagpass, which passes the entire point based on its tags, taginclude removes all non-matching tags from the point. This filter can be used on both inputs and outputs, but it is recommended on inputs because filtering out tags at the point of ingestion is more efficient.
  • tagexclude: the inverse of taginclude. Tags whose tag key matches one of the patterns are discarded from the point.
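
The difference between filters that drop whole points (namepass, tagpass, tagdrop) and filters that only trim tags (taginclude) can be sketched as follows. This is an approximation for illustration: the filter_point function is hypothetical, and Python's fnmatch globbing is used as a stand-in for Telegraf's glob matching, which may differ in details.

```python
from fnmatch import fnmatchcase
from typing import Dict, List, Optional


def matches_any(value: str, patterns: List[str]) -> bool:
    return any(fnmatchcase(value, p) for p in patterns)


def filter_point(name: str, tags: Dict[str, str],
                 namepass: Optional[List[str]] = None,
                 tagpass: Optional[Dict[str, List[str]]] = None,
                 tagdrop: Optional[Dict[str, List[str]]] = None,
                 taginclude: Optional[List[str]] = None) -> Optional[Dict[str, str]]:
    """Return the tags that survive filtering, or None if the point is dropped."""
    if namepass and not matches_any(name, namepass):
        return None
    if tagpass and not any(k in tags and matches_any(tags[k], pats)
                           for k, pats in tagpass.items()):
        return None
    # tagdrop is tested after the point has passed tagpass.
    if tagdrop and any(k in tags and matches_any(tags[k], pats)
                       for k, pats in tagdrop.items()):
        return None
    if taginclude:
        # taginclude keeps the point but removes non-matching tags.
        return {k: v for k, v in tags.items() if matches_any(k, taginclude)}
    return dict(tags)


kept = filter_point("cpu", {"cpu": "cpu0", "host": "ubuntu"},
                    namepass=["cpu*"], tagpass={"cpu": ["cpu[0-3]"]},
                    taginclude=["cpu"])
dropped_by_name = filter_point("mem", {"host": "ubuntu"}, namepass=["cpu*"])
dropped_by_tag = filter_point("cpu", {"cpu": "cpu-total"},
                              tagdrop={"cpu": ["cpu-total"]})
print(kept, dropped_by_name, dropped_by_tag)
```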

outputs

Output-related configuration.

For more parameters see:

Generate a configuration file

View help:

 telegraf --help

Generate configuration file:

 # For example, generate a MySQL-related configuration file in the current directory
 telegraf config > telegraf-mysql.conf

It is recommended to place generated configuration files in the /etc/telegraf/telegraf.d directory. Telegraf supports reading multiple configuration files, so several files can be placed in that directory.

Generate configuration files specifying input and output plugins:

 telegraf --input-filter <pluginname>[:<pluginname>] --output-filter <outputname>[:<outputname>] config > telegraf.conf

For example, generate a telegraf.conf with the cpu, mem, disk, diskio, and net input plugins and the influxdb and opentsdb output plugins:

 telegraf --input-filter cpu:mem:disk:diskio:net --output-filter influxdb:opentsdb config > telegraf.conf

Without a redirect, the generated configuration is printed to standard output:

 telegraf --input-filter cpu:mem:http_listener --output-filter influxdb config

Test whether the configuration is valid:

Test whether the cpu input configured in /etc/telegraf/telegraf.conf works:

 telegraf --config /etc/telegraf/telegraf.conf --input-filter cpu --test

Test whether the influxdb output configured in /etc/telegraf/telegraf.conf loads correctly:

 telegraf --config /etc/telegraf/telegraf.conf --output-filter influxdb --test

Test whether the cpu input and influxdb output configured in /etc/telegraf/telegraf.d/mysql.conf load correctly:

 telegraf --config /etc/telegraf/telegraf.d/mysql.conf --input-filter cpu --output-filter influxdb --test

Note that --test gathers metrics once, prints them to stdout, and exits without sending them to the configured outputs.

After modifying and saving a configuration file, remember to restart Telegraf:

 service telegraf restart

View the log

Telegraf log file: /var/log/telegraf/telegraf.log

Instructions for use

Telegraf can be combined with storage and messaging systems such as InfluxDB and Kafka.

To use it, write the corresponding configuration file and enable the plugins you need.

Examples of input plugins:

  • inputs.exec: runs the configured commands and parses their output, in any supported Telegraf input data format (line protocol, JSON, Graphite, Value, Nagios, Collectd, and Dropwizard), into metrics. Each Telegraf metric consists of a metric name, tags, fields, and a timestamp.
  • inputs.zookeeper: collects ZooKeeper information.
  • inputs.cpu: collects CPU information.

Examples of output plugins:

  • outputs.kafka: writes metrics to a Kafka broker.
  • outputs.elasticsearch: writes metrics to Elasticsearch.
  • outputs.file: writes metrics to files.

Configuring the corresponding plugins is all that is needed to collect metrics.

Use case 1: CPU information collection

Create the configuration file /etc/telegraf/telegraf.d/cpu.conf:

 [global_tags]
   ip = "127.0.0.1"

 [agent]
   interval = "5s"
   round_interval = true
   metric_batch_size = 1000
   metric_buffer_limit = 10000
   collection_jitter = "0s"
   flush_interval = "5s"
   flush_jitter = "0s"
   precision = "1ms"
   logtarget = "file"
   logfile = "/tmp/telegraf-cpu.log"
   logfile_rotation_max_size = "10MB"
   logfile_rotation_max_archives = 10
   hostname = ""
   omit_hostname = false

 [[inputs.cpu]]
   ## Whether to report per-cpu stats or not
   percpu = true
   ## Whether to report total system cpu stats or not
   totalcpu = true
   ## If true, collect raw CPU time metrics.
   collect_cpu_time = false
   ## If true, compute and report the sum of all non-idle CPU states.
   report_active = false

 [[outputs.file]]
   ## Files to write to, "stdout" is a specially handled file.
   files = ["stdout", "/tmp/metrics.out"]

   ## Use batch serialization format instead of line based delimiting. The
   ## batch format allows for the production of non line based output formats and
   ## may more efficiently encode and write metrics.
   # use_batch_format = false

   ## The file will be rotated after the time interval specified. When set
   ## to 0 no time based rotation is performed.
   # rotation_interval = "0h"

   ## The logfile will be rotated when it becomes larger than the specified
   ## size. When set to 0 no size based rotation is performed.
   # rotation_max_size = "0MB"

   ## Maximum number of rotated archives to keep, any older logs are deleted.
   ## If set to -1, no archives are removed.
   # rotation_max_archives = 5

   ## Data format to output.
   ## Each data format has its own unique set of configuration options, read
   ## more about them here:
   ## https://github.com/influxdata/telegraf/blob/master/docs/DATA_FORMATS_OUTPUT.md
   data_format = "json"

For specific input and output plug-in parameters, please refer to:

Verify configuration file

You can use telegraf --config <file> --test to verify a configuration file:

 telegraf --config cpu.conf --test

The output is as follows:

 2023-05-22T06:30:06Z I! Starting Telegraf 1.21.4+ds1-0ubuntu2
 > cpu,cpu=cpu0,host=ubuntu,ip=127.0.0.1 usage_guest=0,usage_guest_nice=0,usage_idle=100,usage_iowait=0,usage_irq=0,usage_nice=0,usage_softirq=0,usage_steal=0,usage_system=0,usage_user=0 1684737007400000000
 > cpu,cpu=cpu1,host=ubuntu,ip=127.0.0.1 usage_guest=0,usage_guest_nice=0,usage_idle=100,usage_iowait=0,usage_irq=0,usage_nice=0,usage_softirq=0,usage_steal=0,usage_system=0,usage_user=0 1684737007400000000
 > cpu,cpu=cpu2,host=ubuntu,ip=127.0.0.1 usage_guest=0,usage_guest_nice=0,usage_idle=100,usage_iowait=0,usage_irq=0,usage_nice=0,usage_softirq=0,usage_steal=0,usage_system=0,usage_user=0 1684737007400000000
 > cpu,cpu=cpu3,host=ubuntu,ip=127.0.0.1 usage_guest=0,usage_guest_nice=0,usage_idle=100,usage_iowait=0,usage_irq=0,usage_nice=0,usage_softirq=0,usage_steal=0,usage_system=0,usage_user=0 1684737007400000000
 > cpu,cpu=cpu4,host=ubuntu,ip=127.0.0.1 usage_guest=0,usage_guest_nice=0,usage_idle=100,usage_iowait=0,usage_irq=0,usage_nice=0,usage_softirq=0,usage_steal=0,usage_system=0,usage_user=0 1684737007400000000
 > cpu,cpu=cpu5,host=ubuntu,ip=127.0.0.1 usage_guest=0,usage_guest_nice=0,usage_idle=100,usage_iowait=0,usage_irq=0,usage_nice=0,usage_softirq=0,usage_steal=0,usage_system=0,usage_user=0 1684737007400000000
 > cpu,cpu=cpu6,host=ubuntu,ip=127.0.0.1 usage_guest=0,usage_guest_nice=0,usage_idle=100,usage_iowait=0,usage_irq=0,usage_nice=0,usage_softirq=0,usage_steal=0,usage_system=0,usage_user=0 1684737007400000000
 > cpu,cpu=cpu7,host=ubuntu,ip=127.0.0.1 usage_guest=0,usage_guest_nice=0,usage_idle=100,usage_iowait=0,usage_irq=0,usage_nice=0,usage_softirq=0,usage_steal=0,usage_system=0,usage_user=0 1684737007400000000
 > cpu,cpu=cpu8,host=ubuntu,ip=127.0.0.1 usage_guest=0,usage_guest_nice=0,usage_idle=100,usage_iowait=0,usage_irq=0,usage_nice=0,usage_softirq=0,usage_steal=0,usage_system=0,usage_user=0 1684737007400000000
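
Each metric in this test output is a line of InfluxDB line protocol: a measurement name with comma-separated tags, then the fields, then a timestamp. As a rough illustration, the Python sketch below splits such a line into its parts. The parse_line function is hypothetical, handles only the simple form shown here (no escaped characters, quoted strings, or integer suffixes), and the sample line is a shortened version of the cpu0 line with fewer fields.

```python
from typing import Dict, Tuple


def parse_line(line: str) -> Tuple[str, Dict[str, str], Dict[str, float], int]:
    """Parse one simple line-protocol line into name, tags, fields, timestamp."""
    line = line.strip()
    # Test mode prefixes each metric with "> "; strip it if present.
    if line.startswith("> "):
        line = line[2:]
    head, field_part, timestamp = line.split(" ")
    name, *tag_pairs = head.split(",")
    tags = dict(pair.split("=", 1) for pair in tag_pairs)
    fields = {
        key: float(value)
        for key, value in (pair.split("=", 1) for pair in field_part.split(","))
    }
    return name, tags, fields, int(timestamp)


sample = ("> cpu,cpu=cpu0,host=ubuntu,ip=127.0.0.1 "
          "usage_idle=100,usage_system=0,usage_user=0 1684737007400000000")
name, tags, fields, ts = parse_line(sample)
print(name, tags["cpu"], fields["usage_idle"], ts)
```

The ip tag comes from the [global_tags] section, and the host tag is added by the agent because omit_hostname is false.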

Once the test output looks correct, restart the Telegraf service:

 systemctl restart telegraf

View the output:

 cat /tmp/metrics.out

The output is as follows:

 {"fields":{"active":234635264,"available":68976783360,"available_percent":98.43237993042763,"buffered":20967424,"cached":475803648,"commit_limit":39332610048,"committed_as":390959104,"dirty":0,"free":69279043584,"high_free":0,"high_total":0,"huge_page_size":2097152,"huge_pages_free":0,"huge_pages_total":0,"inactive":306810880,"low_free":0,"low_total":0,"mapped":129556480,"page_tables":1888256,"shared":737280,"slab":101105664,"sreclaimable":48656384,"sunreclaim":52449280,"swap_cached":0,"swap_free":4294963200,"swap_total":4294963200,"total":70075297792,"used":299483136,"used_percent":0.42737333330917343,"vmalloc_chunk":0,"vmalloc_total":35184372087808,"vmalloc_used":20594688,"write_back":0,"write_back_tmp":0},"name":"mem","tags":{"host":"ubuntu","ip":"127.0.0.1"},"timestamp":1684736700}
 {"fields":{"boot_time":1684724406,"context_switches":1591707,"entropy_avail":256,"interrupts":1575945,"processes_forked":1567},"name":"kernel","tags":{"host":"ubuntu","ip":"127.0.0.1"},"timestamp":1684736700}
 {"fields":{"free":4294963200,"total":4294963200,"used":0,"used_percent":0},"name":"swap","tags":{"host":"ubuntu","ip":"127.0.0.1"},"timestamp":1684736700}
 {"fields":{"in":0,"out":0},"name":"swap","tags":{"host":"ubuntu","ip":"127.0.0.1"},"timestamp":1684736700}
 {"fields":{"load1":0,"load15":0,"load5":0,"n_cpus":16,"n_users":1},"name":"system","tags":{"host":"ubuntu","ip":"127.0.0.1"},"timestamp":1684736700}
 {"fields":{"uptime":12294},"name":"system","tags":{"host":"ubuntu","ip":"127.0.0.1"},"timestamp":1684736700}
 ......
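
Records in this JSON format are easy to post-process. The Python sketch below is one possible approach, assuming each record sits on its own line in the file; it uses three records copied from the output above instead of reading /tmp/metrics.out, and the fields_by_name variable is a name chosen for this example.

```python
import json
from collections import defaultdict

# Three records copied from the output above, one JSON object per line.
sample = "\n".join([
    '{"fields":{"load1":0,"load15":0,"load5":0,"n_cpus":16,"n_users":1},"name":"system","tags":{"host":"ubuntu","ip":"127.0.0.1"},"timestamp":1684736700}',
    '{"fields":{"uptime":12294},"name":"system","tags":{"host":"ubuntu","ip":"127.0.0.1"},"timestamp":1684736700}',
    '{"fields":{"in":0,"out":0},"name":"swap","tags":{"host":"ubuntu","ip":"127.0.0.1"},"timestamp":1684736700}',
])

fields_by_name = defaultdict(dict)
for line in sample.splitlines():
    if not line.strip():
        continue
    record = json.loads(line)
    # Merge the fields of records that share a measurement name.
    fields_by_name[record["name"]].update(record["fields"])

print(dict(fields_by_name))
```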

Appendix

Reference articles
