Original link: https://jasonkayzk.github.io/2023/06/27/Telegraf%E7%AE%80%E4%BB%8B/
Telegraf is an agent program written in Go that collects system and service statistics and writes them to a data store such as InfluxDB.
This article briefly introduces how to use Telegraf.
Introduction to Telegraf
Official website:
Telegraf, part of the TICK Stack, is a plugin-driven server agent for collecting and reporting metrics.
Telegraf has integrations to collect a variety of metrics, events and logs directly from the containers and systems it runs on, pull metrics from third-party APIs, and even listen for metrics via StatsD and Kafka consumer services.
It also has output plugins to send metrics to various other data stores, services and message queues, including InfluxDB, Graphite, OpenTSDB, Datadog, Librato, Kafka, MQTT, NSQ and many more.
As a data collection agent, Telegraf is installed on the monitored target host. It is designed to have a small memory footprint, and metric collection for each service or third-party component is implemented as a plugin.
Telegraf is built around four types of plugins:
- Input Plugins: collect data from systems, services, and third-party components.
- Processor Plugins: transform, process, and filter metrics.
- Aggregator Plugins: compute aggregate metrics (for example, statistics over a time window).
- Output Plugins: write metrics to their destinations.
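The fragment below is a rough sketch of how the four plugin types fit together in one configuration file. It wires one input, one processor, one aggregator and one output into a single pipeline. The plugins chosen (cpu, rename, minmax, file) and all values, including the new measurement name cpu_usage, are illustrative assumptions and not a recommended setup. Check each plugin's README for the options available in your Telegraf version.

```toml
# Input: collect total CPU usage
[[inputs.cpu]]
  percpu = false
  totalcpu = true

# Processor: rename the "cpu" measurement (cpu_usage is a made-up name)
[[processors.rename]]
  [[processors.rename.replace]]
    measurement = "cpu"
    dest = "cpu_usage"

# Aggregator: emit the min/max of each field over 30s windows,
# while still passing the original metrics through
[[aggregators.minmax]]
  period = "30s"
  drop_original = false

# Output: print every metric to stdout in line protocol
[[outputs.file]]
  files = ["stdout"]
  data_format = "influx"
```

Metrics flow from inputs through processors and aggregators to outputs. In this sketch, the minmax aggregator should therefore see the already renamed cpu_usage measurement.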
Why use Telegraf and InfluxDB?
- In a data collection and platform monitoring system, Telegraf can collect runtime information from many kinds of components. You do not need hand-written scripts that run on a schedule, so data acquisition becomes much easier.
- Telegraf is easy to configure: with basic Linux knowledge you can get started quickly.
- Telegraf collects time-series data, so every metric carries a timestamp. InfluxDB is designed for exactly this kind of data, and you can use it to run all kinds of analysis and calculations on the collected metrics.
Install
Use apt to install:
apt install telegraf
After installation, Telegraf is registered as a background (systemd) service, so it can be managed with systemctl:
Start command:
systemctl start telegraf
Restart command:
systemctl restart telegraf
Telegraf configuration
Telegraf is highly configurable. The main configuration file is /etc/telegraf/telegraf.conf.
Configuration notes:
When Telegraf is installed from a package (apt or yum), a telegraf.conf file is generated under /etc/telegraf. Environment variables can be referenced in the configuration file in the form "$ENV_ITEM".
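The fragment below is a minimal sketch of environment-variable substitution. It assumes that INFLUX_URL and INFLUX_PASSWORD (hypothetical variable names) are set in the environment of the Telegraf process. With the Debian/Ubuntu packages, one place to define them is /etc/default/telegraf. The user name telegraf is also an assumption.

```toml
[[outputs.influxdb]]
  ## Replaced with the value of INFLUX_URL (hypothetical) when the config is loaded
  urls = ["${INFLUX_URL}"]
  username = "telegraf"
  ## Keeps the password itself out of the configuration file
  password = "${INFLUX_PASSWORD}"
```

String values are still written inside quotes. The variable reference only replaces the text between them.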
Here are some main configuration items:
global_tags
Tags defined here are added to every metric Telegraf collects, so they are stored as tags on every point written to InfluxDB.
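For example, a global_tags section like the following attaches a dc tag and a rack tag to every metric. The tag names and values are made up for illustration.

```toml
[global_tags]
  ## Hypothetical data-center and rack labels
  dc = "us-east-1"
  rack = "1a"
```

With this in place, a cpu metric carries dc and rack in addition to its own tags, such as cpu and host.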
agent
This section configures the behavior of the Telegraf agent itself.
- interval: the default data collection interval for all inputs.
- round_interval: round collection times to the interval. For example, if interval = "10s", collection always happens at :00, :10, :20, and so on.
- metric_batch_size: Telegraf sends metrics to outputs in batches of at most metric_batch_size metrics.
- metric_buffer_limit: when writes fail, Telegraf caches up to metric_buffer_limit metrics for each output and flushes this buffer on the next successful write. It should be a multiple of metric_batch_size and must not be less than 2 times metric_batch_size.
- collection_jitter: randomly jitters collections. Each plugin sleeps for a random amount of time within the jitter before collecting. This avoids many plugins querying things like sysfs at the same time, which can have a measurable impact on the system.
- flush_interval: the default data flush interval for all outputs. The maximum actual flush interval is flush_interval + flush_jitter.
- flush_jitter: jitters the flush interval by a random amount. This mainly avoids large write spikes for users running many Telegraf instances. For example, flush_jitter = "5s" and flush_interval = "10s" means a flush happens every 10-15 seconds.
- precision: by default, precision is set to the same timestamp order as the collection interval, with a maximum of 1s. Precision is not used for service inputs such as logparser and statsd. Valid values are "ns", "us" (or "µs"), "ms", and "s".
- logfile: the log file name. An empty string logs to stderr.
- debug: run Telegraf in debug mode.
- quiet: run Telegraf in quiet mode (error messages only).
- hostname: override the default hostname. If empty, os.Hostname() is used.
- omit_hostname: if true, the "host" tag is not set on metrics.
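The sketch below is an annotated agent section that puts the options above side by side. The values are illustrative assumptions, chosen so that the jitter and buffer rules described above are easy to see.

```toml
[agent]
  interval = "10s"            # collect from every input every 10s
  round_interval = true       # ...at :00, :10, :20, and so on
  metric_batch_size = 1000    # at most 1000 metrics per write
  metric_buffer_limit = 10000 # 10 x batch size, cached per output on failed writes
  collection_jitter = "2s"    # each input sleeps a random 0-2s before collecting
  flush_interval = "10s"
  flush_jitter = "5s"         # flushes happen every 10-15s
  precision = "1s"            # timestamps rounded to whole seconds
  logfile = ""                # empty: log to stderr
  debug = false
  quiet = false
  hostname = ""               # empty: use os.Hostname()
  omit_hostname = false       # keep the "host" tag
```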
inputs
Input-related settings. The following configuration parameters are available for all inputs:
- interval: how often to collect this metric. Normal plugins use the single global interval, but if a particular input should run less or more often, it can be configured here.
- name_override: override the base name of the measurement (defaults to the name of the input).
- name_prefix: specifies a prefix to attach to the measurement name.
- name_suffix: specifies a suffix to attach to the measurement name.
- tags: a map of tags to apply to this input's measurements.
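The sketch below applies these common input parameters to the disk input. The interval, the host_ prefix and the team tag are hypothetical values chosen for illustration.

```toml
[[inputs.disk]]
  ## Collect disk usage every 60s instead of the agent-wide interval
  interval = "60s"
  ## The measurement is written as "host_disk" instead of "disk"
  name_prefix = "host_"
  ## Sub-tables must come last: any key written after [inputs.disk.tags]
  ## would be parsed as a tag instead of a plugin option
  [inputs.disk.tags]
    team = "ops"
```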
aggregator
The following configuration parameters are available for all aggregators:
- period: the period on which each aggregator is flushed and cleared. Metrics sent with timestamps outside this period are ignored by the aggregator.
- delay: the delay before each aggregator is flushed. This controls how long aggregators wait to receive metrics from input plugins when aggregators are flushing and inputs are gathering on the same interval.
- drop_original: if true, the aggregator drops the original metric, and it is not sent to the output plugins.
- name_override: override the base name of the measurement (defaults to the name of the input).
- name_prefix: specifies a prefix to attach to the measurement name.
- name_suffix: specifies a suffix to attach to the measurement name.
- tags: a map of tags to apply to the measurements.
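The sketch below shows these aggregator parameters on the basicstats aggregator. The 30s window, the chosen statistics and the _30s suffix are illustrative assumptions.

```toml
[[aggregators.basicstats]]
  ## Aggregate over 30s windows, waiting 100ms for late input metrics
  period = "30s"
  delay = "100ms"
  ## Compute only the mean and the maximum of each field
  stats = ["mean", "max"]
  ## Only aggregate the cpu measurement...
  namepass = ["cpu"]
  ## ...and drop the raw cpu metrics it aggregates
  drop_original = true
  ## The aggregate is emitted as "cpu_30s"
  name_suffix = "_30s"
```

Because of namepass, this aggregator should only drop the original cpu metrics. Other measurements pass through to the outputs unchanged.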
processor
The following configuration parameters are available for all processors:
- order: the order in which the processors are executed. If it is not specified, processor execution order will be random.
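The sketch below shows why order can matter. The second processor filters on a name that only exists after the first one has run. The rename and override processors are real plugins, but the names cpu_usage and source = "telegraf" are hypothetical.

```toml
## Runs first: rename the cpu measurement
[[processors.rename]]
  order = 1
  [[processors.rename.replace]]
    measurement = "cpu"
    dest = "cpu_usage"

## Runs second: only matches after the rename has happened
[[processors.override]]
  order = 2
  namepass = ["cpu_usage"]
  [processors.override.tags]
    source = "telegraf"
```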
measurement filtering
Filters can be configured per input, output, processor, or aggregator:
- namepass: an array of glob pattern strings. Only points whose measurement name matches a pattern in this list are emitted.
- fieldpass: an array of glob pattern strings. Only fields whose field key matches a pattern in this list are emitted. Not available for outputs.
- fielddrop: the inverse of fieldpass. Fields whose field key matches one of the patterns are discarded from the point. Not available for outputs.
- tagpass: a table mapping tag keys to arrays of glob pattern strings. Only points that contain a tag key in the table and a tag value matching one of its patterns are emitted.
- tagdrop: the inverse of tagpass. If a match is found, the point is discarded. This is tested on points after they have passed the tagpass test.
- taginclude: an array of glob pattern strings. Only tags whose tag key matches one of the patterns are emitted. tagpass passes or drops an entire point based on its tags. taginclude instead removes all non-matching tags from the point. This filter can be used on both inputs and outputs, but it is recommended on inputs, because filtering out tags at the point of ingestion is more efficient.
- tagexclude: the inverse of taginclude. Tags whose tag key matches one of the patterns are discarded from the point.
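The sketch below combines several of these filters. The field names, file systems and mount points are illustrative assumptions; adjust them to your own hosts.

```toml
[[inputs.cpu]]
  percpu = true
  totalcpu = false
  ## Keep only these fields; every other cpu field is dropped
  fieldpass = ["usage_user", "usage_system", "usage_idle"]

[[inputs.disk]]
  ## Tag filters are TOML tables, so they must come at the end of the plugin definition
  [inputs.disk.tagpass]
    ## Only emit disk points whose fstype or path tag matches one of these patterns
    fstype = ["ext4", "xfs"]
    path = ["/opt", "/home*"]

[[outputs.file]]
  files = ["stdout"]
  data_format = "influx"
  ## Only write the cpu and disk measurements to this output
  namepass = ["cpu", "disk"]
  ## Remove the fstype tag before writing
  tagexclude = ["fstype"]
```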
outputs
Output-related settings: this section configures where collected metrics are written.
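The minimal sketch below assumes a local InfluxDB listening on the default port 8086. It shows the 1.x-style and the 2.x-style output plugins. The database, organization and bucket names are hypothetical, and INFLUX_TOKEN is an assumed environment variable. In practice you would keep only the plugin that matches your server, because every enabled output receives all metrics.

```toml
## InfluxDB 1.x
[[outputs.influxdb]]
  urls = ["http://127.0.0.1:8086"]
  database = "telegraf"

## InfluxDB 2.x
[[outputs.influxdb_v2]]
  urls = ["http://127.0.0.1:8086"]
  token = "${INFLUX_TOKEN}"
  organization = "my-org"
  bucket = "telegraf"
```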
For more parameters see:
generate configuration file
View help:
telegraf --help
Generate a configuration file:
# For example, generate a MySQL-related configuration file in the current directory
telegraf --input-filter mysql config > telegraf-mysql.conf
It is recommended to place generated configuration files in the /etc/telegraf/telegraf.d directory.
Telegraf can read multiple configuration files, so several of them can be placed in the /etc/telegraf/telegraf.d directory.
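The sketch below is a hypothetical /etc/telegraf/telegraf.d/mysql.conf that contains only a plugin table, so the [agent] and [global_tags] settings can stay in the main telegraf.conf. The user telegraf, the password secret and the server address are placeholders.

```toml
[[inputs.mysql]]
  ## DSN format: user:password@tcp(host:port)/ (placeholder credentials)
  servers = ["telegraf:secret@tcp(127.0.0.1:3306)/"]
```

The packaged systemd service typically starts Telegraf with both -config /etc/telegraf/telegraf.conf and -config-directory /etc/telegraf/telegraf.d. That is why files in this directory are picked up automatically after a restart.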
Generate configuration files specifying input and output plugins:
telegraf --input-filter <pluginname>[:<pluginname>] --output-filter <outputname>[:<outputname>] config > telegraf.conf
For example, generate a configuration file telegraf.conf with the cpu, mem, disk, diskio and net input plugins, and with output to influxdb and opentsdb:
telegraf --input-filter cpu:mem:disk:diskio:net --output-filter influxdb:opentsdb config > telegraf.conf
You can also print the generated configuration to the terminal instead of redirecting it to a file:
telegraf --input-filter cpu:mem:http_listener --output-filter influxdb config
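Test whether the configuration is correct. Note that with -test, Telegraf runs the inputs once, prints the gathered metrics, and exits; it does not write anything to the outputs.
Test whether the cpu input configured in /etc/telegraf/telegraf.conf is correct: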
telegraf -config /etc/telegraf/telegraf.conf -input-filter cpu -test
Test whether the influxdb output configured in /etc/telegraf/telegraf.conf is correct:
telegraf -config /etc/telegraf/telegraf.conf -output-filter influxdb -test
Test whether the cpu input and influxdb output configured in /etc/telegraf/telegraf.d/mysql.conf are correct:
telegraf -config /etc/telegraf/telegraf.d/mysql.conf -input-filter cpu -output-filter influxdb -test
After modifying and saving the configuration file, remember to restart Telegraf:
service telegraf restart
view log
Telegraf log file: /var/log/telegraf/telegraf.log
Instructions for use
Telegraf can be combined with storage and messaging systems such as InfluxDB and Kafka.
To use it, you only need to write a configuration file that combines the appropriate plugins.
For example, input plugins:
- inputs.exec: executes commands and parses their output into metrics. The output can be in any supported Telegraf input data format (line protocol, JSON, Graphite, Value, Nagios, Collectd, and Dropwizard). Each Telegraf metric includes a metric name, tags, fields, and a timestamp.
- inputs.zookeeper: collect ZooKeeper statistics.
- inputs.cpu: collect CPU information.
- …
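The sketch below configures the two input plugins listed above. The script path /usr/local/bin/collect_queue.sh and the app_queue metric it prints are hypothetical. The ZooKeeper address assumes a local server on the default port 2181.

```toml
[[inputs.exec]]
  ## Hypothetical script that prints line protocol, e.g.
  ##   app_queue,queue=orders depth=42i
  commands = ["/usr/local/bin/collect_queue.sh"]
  timeout = "5s"
  data_format = "influx"

[[inputs.zookeeper]]
  ## Local ZooKeeper server (assumed address)
  servers = ["127.0.0.1:2181"]
```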
Output plugins:
- outputs.kafka: write metrics to Kafka brokers.
- outputs.elasticsearch: write metrics to Elasticsearch.
- outputs.file: write metrics to a file.
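The minimal sketch below configures the Kafka and Elasticsearch outputs. The broker and Elasticsearch addresses, the topic name and the index name are assumptions for a single local test machine.

```toml
[[outputs.kafka]]
  brokers = ["127.0.0.1:9092"]
  topic = "telegraf"
  data_format = "json"

[[outputs.elasticsearch]]
  urls = ["http://127.0.0.1:9200"]
  ## One index per day; Telegraf fills in the date placeholders
  index_name = "telegraf-%Y.%m.%d"
```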
By combining the appropriate plugins in the configuration file, you can build a complete metrics collection pipeline.
Use case 1: CPU information collection
Write the configuration file /etc/telegraf/telegraf.d/cpu.conf:
[global_tags]
  ip = "127.0.0.1"

[agent]
  interval = "5s"
  round_interval = true
  metric_batch_size = 1000
  metric_buffer_limit = 10000
  collection_jitter = "0s"
  flush_interval = "5s"
  flush_jitter = "0s"
  precision = "1ms"
  logtarget = "file"
  logfile = "/tmp/telegraf-cpu.log"
  logfile_rotation_max_size = "10MB"
  logfile_rotation_max_archives = 10
  hostname = ""
  omit_hostname = false

[[inputs.cpu]]
  ## Whether to report per-cpu stats or not
  percpu = true
  ## Whether to report total system cpu stats or not
  totalcpu = true
  ## If true, collect raw CPU time metrics.
  collect_cpu_time = false
  ## If true, compute and report the sum of all non-idle CPU states.
  report_active = false

[[outputs.file]]
  ## Files to write to, "stdout" is a specially handled file.
  files = ["stdout", "/tmp/metrics.out"]

  ## Use batch serialization format instead of line based delimiting. The
  ## batch format allows for the production of non line based output formats and
  ## may more efficiently encode and write metrics.
  # use_batch_format = false

  ## The file will be rotated after the time interval specified. When set
  ## to 0 no time based rotation is performed.
  # rotation_interval = "0h"

  ## The logfile will be rotated when it becomes larger than the specified
  ## size. When set to 0 no size based rotation is performed.
  # rotation_max_size = "0MB"

  ## Maximum number of rotated archives to keep, any older logs are deleted.
  ## If set to -1, no archives are removed.
  # rotation_max_archives = 5

  ## Data format to output.
  ## Each data format has its own unique set of configuration options, read
  ## more about them here:
  ## https://github.com/influxdata/telegraf/blob/master/docs/DATA_FORMATS_OUTPUT.md
  data_format = "json"
For the specific input and output plugin parameters, refer to:
- https://github.com/influxdata/telegraf/blob/release-1.16/plugins/inputs/cpu/README.md
- https://github.com/influxdata/telegraf/blob/release-1.16/plugins/outputs/file/README.md
This configuration collects CPU statistics every 5 seconds and writes them in JSON format to stdout and to /tmp/metrics.out.
Verify configuration file
You can verify a configuration file with telegraf -config xxx.config -test:
telegraf -config cpu.conf -test
The output is as follows:
2023-05-22T06:30:06Z I! Starting Telegraf 1.21.4+ds1-0ubuntu2
> cpu,cpu=cpu0,host=ubuntu,ip=127.0.0.1 usage_guest=0,usage_guest_nice=0,usage_idle=100,usage_iowait=0,usage_irq=0,usage_nice=0,usage_softirq=0,usage_steal=0,usage_system=0,usage_user=0 1684737007400000000
> cpu,cpu=cpu1,host=ubuntu,ip=127.0.0.1 usage_guest=0,usage_guest_nice=0,usage_idle=100,usage_iowait=0,usage_irq=0,usage_nice=0,usage_softirq=0,usage_steal=0,usage_system=0,usage_user=0 1684737007400000000
> cpu,cpu=cpu2,host=ubuntu,ip=127.0.0.1 usage_guest=0,usage_guest_nice=0,usage_idle=100,usage_iowait=0,usage_irq=0,usage_nice=0,usage_softirq=0,usage_steal=0,usage_system=0,usage_user=0 1684737007400000000
> cpu,cpu=cpu3,host=ubuntu,ip=127.0.0.1 usage_guest=0,usage_guest_nice=0,usage_idle=100,usage_iowait=0,usage_irq=0,usage_nice=0,usage_softirq=0,usage_steal=0,usage_system=0,usage_user=0 1684737007400000000
> cpu,cpu=cpu4,host=ubuntu,ip=127.0.0.1 usage_guest=0,usage_guest_nice=0,usage_idle=100,usage_iowait=0,usage_irq=0,usage_nice=0,usage_softirq=0,usage_steal=0,usage_system=0,usage_user=0 1684737007400000000
> cpu,cpu=cpu5,host=ubuntu,ip=127.0.0.1 usage_guest=0,usage_guest_nice=0,usage_idle=100,usage_iowait=0,usage_irq=0,usage_nice=0,usage_softirq=0,usage_steal=0,usage_system=0,usage_user=0 1684737007400000000
> cpu,cpu=cpu6,host=ubuntu,ip=127.0.0.1 usage_guest=0,usage_guest_nice=0,usage_idle=100,usage_iowait=0,usage_irq=0,usage_nice=0,usage_softirq=0,usage_steal=0,usage_system=0,usage_user=0 1684737007400000000
> cpu,cpu=cpu7,host=ubuntu,ip=127.0.0.1 usage_guest=0,usage_guest_nice=0,usage_idle=100,usage_iowait=0,usage_irq=0,usage_nice=0,usage_softirq=0,usage_steal=0,usage_system=0,usage_user=0 1684737007400000000
> cpu,cpu=cpu8,host=ubuntu,ip=127.0.0.1 usage_guest=0,usage_guest_nice=0,usage_idle=100,usage_iowait=0,usage_irq=0,usage_nice=0,usage_softirq=0,usage_steal=0,usage_system=0,usage_user=0 1684737007400000000
Once the test output looks correct, restart the Telegraf service:
systemctl restart telegraf
View the output:
cat /tmp/metrics.out
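The output is as follows. The file also contains metrics such as mem, kernel, swap and system. These come from the inputs enabled in the default /etc/telegraf/telegraf.conf, because every output receives the metrics of all loaded inputs unless filters are configured: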
{"fields":{"active":234635264,"available":68976783360,"available_percent":98.43237993042763,"buffered":20967424,"cached":475803648,"commit_limit":39332610048,"committed_as":390959104,"dirty":0,"free":69279043584,"high_free":0,"high_total":0,"huge_page_size":2097152,"huge_pages_free":0,"huge_pages_total":0,"inactive":306810880,"low_free":0,"low_total":0,"mapped":129556480,"page_tables":1888256,"shared":737280,"slab":101105664,"sreclaimable":48656384,"sunreclaim":52449280,"swap_cached":0,"swap_free":4294963200,"swap_total":4294963200,"total":70075297792,"used":299483136,"used_percent":0.42737333330917343,"vmalloc_chunk":0,"vmalloc_total":35184372087808,"vmalloc_used":20594688,"write_back":0,"write_back_tmp":0},"name":"mem","tags":{"host":"ubuntu","ip":"127.0.0.1"},"timestamp":1684736700}
{"fields":{"boot_time":1684724406,"context_switches":1591707,"entropy_avail":256,"interrupts":1575945,"processes_forked":1567},"name":"kernel","tags":{"host":"ubuntu","ip":"127.0.0.1"},"timestamp":1684736700}
{"fields":{"free":4294963200,"total":4294963200,"used":0,"used_percent":0},"name":"swap","tags":{"host":"ubuntu","ip":"127.0.0.1"},"timestamp":1684736700}
{"fields":{"in":0,"out":0},"name":"swap","tags":{"host":"ubuntu","ip":"127.0.0.1"},"timestamp":1684736700}
{"fields":{"load1":0,"load15":0,"load5":0,"n_cpus":16,"n_users":1},"name":"system","tags":{"host":"ubuntu","ip":"127.0.0.1"},"timestamp":1684736700}
{"fields":{"uptime":12294},"name":"system","tags":{"host":"ubuntu","ip":"127.0.0.1"},"timestamp":1684736700}
...
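The file above also contains mem, kernel, swap and system records. If you only want CPU data in /tmp/metrics.out, one option is to add a namepass filter to the file output in cpu.conf. The sketch below assumes the rest of the cpu.conf shown earlier stays unchanged.

```toml
[[outputs.file]]
  files = ["stdout", "/tmp/metrics.out"]
  data_format = "json"
  ## Only write the "cpu" measurement to these files
  namepass = ["cpu"]
```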