[Monitoring and Observability] 03 - Building an ELK Logging Stack: A Complete Loop from Collection to Alerting

📅 2026/7/4 3:48:51 📝 Programming Study

Column: Monitoring & Observability
Difficulty: Advanced
Tags: ELK, Elasticsearch, Logstash, Kibana, Filebeat, Logging


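Preface

When logs are scattered across individual servers, there is no time to check them machine by machine during an incident. An ELK logging platform collects every log in one place and acts as the eyes of the operations team.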


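1. Architecture Selection

```text
Application servers
        ↓
Filebeat (lightweight collection, recommended)
        ↓
Kafka (message queue, absorbs traffic peaks)
        ↓
Logstash (cleaning, transformation, filtering)
        ↓
Elasticsearch (storage, indexing)
        ↓
Kibana (search, visualization)
        ↓
Watcher alert rules
        ↓
DingTalk / WeCom notifications
```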

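Why add Kafka? At peak log volume Logstash may not keep up with the incoming rate. Kafka buffers the backlog until Logstash catches up, which prevents data loss.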
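The buffering argument can be made concrete with a little arithmetic. The sketch below is only an illustration with made-up numbers: `peak_backlog` is a hypothetical helper, and the per-minute rates are assumptions, not measurements from a real cluster. It estimates how many events Kafka would have to hold while Logstash is slower than the incoming stream.

```python
from typing import Iterable


def peak_backlog(incoming_per_min: Iterable[int], capacity_per_min: int) -> int:
    """Return the largest number of events waiting in the buffer at any minute.

    incoming_per_min: events produced by Filebeat in each minute (assumed values).
    capacity_per_min: events Logstash can process per minute (assumed value).
    """
    backlog = 0
    peak = 0
    for produced in incoming_per_min:
        # Whatever Logstash cannot consume this minute stays in the buffer.
        backlog = max(0, backlog + produced - capacity_per_min)
        peak = max(peak, backlog)
    return peak


# Two quiet minutes, a two-minute spike, then quiet again.
traffic = [500, 500, 3000, 3000, 500, 500]
print(peak_backlog(traffic, capacity_per_min=1000))
```

Without a buffer, that backlog is what would have to be dropped or held on the application servers. With Kafka, it only has to fit within the topic's retention.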


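2. Quick Setup with Docker Compose

```yaml
# docker-compose.yml
# Note: no Kafka service is defined here. The Filebeat and Logstash configs
# in the following sections expect a broker reachable at kafka:9092.
version: '3.8'
services:
  elasticsearch:
    image: elasticsearch:8.8.0
    environment:
      - discovery.type=single-node
      - ES_JAVA_OPTS=-Xms2g -Xmx2g
      # Security is disabled for a quick local setup only
      - xpack.security.enabled=false
    ports:
      - "9200:9200"
    volumes:
      - es_data:/usr/share/elasticsearch/data
  kibana:
    image: kibana:8.8.0
    ports:
      - "5601:5601"
    environment:
      - ELASTICSEARCH_HOSTS=http://elasticsearch:9200
    depends_on:
      - elasticsearch
  logstash:
    image: logstash:8.8.0
    volumes:
      - ./logstash.conf:/usr/share/logstash/pipeline/logstash.conf
    depends_on:
      - elasticsearch
volumes:
  es_data:
```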
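`depends_on` only orders container startup; it does not wait for Elasticsearch to accept requests. A small readiness check can help before creating templates or policies. The following is a minimal sketch, assuming Elasticsearch is published on `http://localhost:9200` without authentication as in the compose file. The function names and timeout values are illustrative choices, not part of any Elastic tooling.

```python
import json
import time
import urllib.error
import urllib.request


def is_ready(health: dict) -> bool:
    """Accept green or yellow: a single node cannot allocate replica shards,
    so it commonly reports yellow while still being usable."""
    return health.get("status") in ("green", "yellow")


def wait_for_elasticsearch(base_url: str = "http://localhost:9200",
                           timeout_s: float = 120.0,
                           interval_s: float = 3.0) -> bool:
    """Poll GET /_cluster/health until the cluster is usable or the timeout expires."""
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        try:
            with urllib.request.urlopen(f"{base_url}/_cluster/health", timeout=5) as resp:
                if is_ready(json.load(resp)):
                    return True
        except (urllib.error.URLError, OSError, ValueError):
            pass  # Not listening yet, or the response was not valid JSON.
        time.sleep(interval_s)
    return False
```

Calling `wait_for_elasticsearch()` right after `docker compose up -d` returns `True` once the node responds, or `False` if it never becomes ready within the timeout.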

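3. Filebeat Configuration (Deployed on the Application Servers)

```yaml
# /etc/filebeat/filebeat.yml
# The log input is deprecated in favor of filestream in recent Filebeat
# releases; it is kept here to match the original setup.
filebeat.inputs:
  - type: log
    enabled: true
    paths:
      - /var/log/nginx/access.log
    fields:
      service: nginx
      env: production

  - type: log
    paths:
      - /opt/app/logs/*.log
    fields:
      service: myapp
    multiline:
      # Join multi-line Java exception stack traces into one event:
      # any line that does not start with a date is appended to the previous line.
      # This belongs on the application input; on the Nginx input it would
      # merge every access-log line into a single event.
      pattern: '^\d{4}-\d{2}-\d{2}'
      negate: true
      match: after

output.kafka:
  hosts: ["kafka:9092"]
  topic: "logs-%{[fields.service]}"
  codec.json:
    pretty: false
```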
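The combination of `pattern`, `negate: true`, and `match: after` is easy to misread. The sketch below imitates the grouping rule in plain Python so it can be tried on sample lines. It is an approximation for illustration only (`group_multiline` is a hypothetical helper, not Filebeat code) and it ignores Filebeat's limits such as maximum lines and flush timeouts.

```python
import re
from typing import Iterable, List


def group_multiline(lines: Iterable[str], pattern: str = r"^\d{4}-\d{2}-\d{2}") -> List[str]:
    """Group lines the way negate=true + match=after does:
    a line matching the pattern starts a new event, and every
    non-matching line is appended to the event before it."""
    starts_event = re.compile(pattern)
    events: List[str] = []
    for line in lines:
        if starts_event.search(line) or not events:
            events.append(line)
        else:
            events[-1] += "\n" + line
    return events


sample = [
    "2024-05-01 10:00:00 ERROR Request failed",
    "java.lang.IllegalStateException: boom",
    "    at com.example.Foo.bar(Foo.java:42)",
    "2024-05-01 10:00:01 INFO Recovered",
]
for event in group_multiline(sample):
    print(repr(event))
```

The three lines of the stack trace end up in one event and the `INFO` line starts a second one. Lines that never match the pattern, such as Nginx access-log lines that begin with an IP address, would all be folded into a single event, which is why these settings belong on the Java application input.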

4. Logstash Processing Pipeline

```conf
# logstash.conf
input {
  kafka {
    bootstrap_servers => "kafka:9092"
    topics_pattern => "logs-.*"
    group_id => "logstash"
    codec => json
  }
}

filter {
  # Parse the Nginx access log.
  # The pattern expects $request_time as the last field, which the default
  # "combined" log_format does not include; add it to the Nginx log_format.
  if [fields][service] == "nginx" {
    grok {
      match => {
        "message" => '%{IPORHOST:remote_ip} - %{DATA:user} \[%{HTTPDATE:time}\] "%{WORD:method} %{DATA:url} HTTP/%{NUMBER:http_version}" %{NUMBER:response_code} %{NUMBER:bytes} "%{DATA:referrer}" "%{DATA:agent}" %{NUMBER:request_time}'
      }
    }
    mutate {
      convert => {
        "response_code" => "integer"
        "bytes" => "integer"
        "request_time" => "float"
      }
    }
  }

  # Tag slow requests (longer than 1 second)
  if [request_time] and [request_time] > 1.0 {
    mutate { add_tag => ["slow_request"] }
  }

  # Use the timestamp from the log line as the event time
  date {
    match => ["time", "dd/MMM/yyyy:HH:mm:ss Z"]
    target => "@timestamp"
  }
}

output {
  elasticsearch {
    hosts => ["elasticsearch:9200"]
    index => "logs-%{[fields][service]}-%{+YYYY.MM.dd}"
  }
}
```
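To check what the filter stage is meant to produce without running Logstash, the same steps can be sketched in Python: extract the fields, convert the types, parse the timestamp, and tag slow requests. This is a hand-written regular expression that approximates the grok pattern, not grok itself. The sample log line is invented, and it assumes the Nginx `log_format` appends `$request_time` as the last field.

```python
import re
from datetime import datetime
from typing import Optional

# Rough equivalent of the grok pattern; field names follow the pipeline above.
NGINX_LINE = re.compile(
    r'(?P<remote_ip>\S+) - (?P<user>\S+) \[(?P<time>[^\]]+)\] '
    r'"(?P<method>[A-Z]+) (?P<url>\S+) HTTP/(?P<http_version>[\d.]+)" '
    r'(?P<response_code>\d{3}) (?P<bytes>\d+) '
    r'"(?P<referrer>[^"]*)" "(?P<agent>[^"]*)" (?P<request_time>[\d.]+)'
)


def parse_nginx_line(line: str, slow_threshold_s: float = 1.0) -> Optional[dict]:
    """Return the structured event, or None when the line does not match
    (Logstash would tag such an event with _grokparsefailure instead)."""
    match = NGINX_LINE.fullmatch(line.strip())
    if match is None:
        return None
    event = match.groupdict()
    # Same conversions as the mutate filter.
    event["response_code"] = int(event["response_code"])
    event["bytes"] = int(event["bytes"])
    event["request_time"] = float(event["request_time"])
    # Same role as the date filter: the log time becomes the event time.
    event["@timestamp"] = datetime.strptime(event["time"], "%d/%b/%Y:%H:%M:%S %z")
    event["tags"] = ["slow_request"] if event["request_time"] > slow_threshold_s else []
    return event


sample = ('203.0.113.7 - - [04/Jul/2026:03:48:51 +0800] '
          '"GET /api/orders?id=1 HTTP/1.1" 502 157 "-" "curl/8.0.1" 1.532')
print(parse_nginx_line(sample))
```

Running a handful of real access-log lines through a function like this is a quick way to find format mismatches before they surface as parse failures in the pipeline.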

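5. Elasticsearch Index Template

```bash
# Create an index template to avoid field-mapping conflicts
curl -X PUT "http://localhost:9200/_index_template/logs" \
  -H 'Content-Type: application/json' \
  -d '
{
  "index_patterns": ["logs-*"],
  "template": {
    "settings": {
      "number_of_shards": 2,
      "number_of_replicas": 1,
      "index.lifecycle.name": "logs-policy"
    },
    "mappings": {
      "properties": {
        "@timestamp":    { "type": "date" },
        "response_code": { "type": "integer" },
        "request_time":  { "type": "float" },
        "remote_ip":     { "type": "ip" }
      }
    }
  }
}'
```

Two caveats. On a single-node cluster, `number_of_replicas: 1` leaves the replica shards unassigned and the cluster health yellow. Also, Elasticsearch 8.x ships built-in index templates for the `logs-*-*` pattern, which the index names produced by the Logstash output also match; to avoid a collision, consider a non-overlapping index prefix or an explicit `priority` on this template.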

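6. ILM Index Lifecycle Management (Automatic Cleanup)

```bash
# Policy: move to warm after 7 days, delete after 30 days
curl -X PUT "http://localhost:9200/_ilm/policy/logs-policy" \
  -H 'Content-Type: application/json' \
  -d '
{
  "policy": {
    "phases": {
      "hot": {
        "actions": {
          "rollover": { "max_size": "10GB", "max_age": "1d" }
        }
      },
      "warm": {
        "min_age": "7d",
        "actions": {
          "readonly": {},
          "shrink": { "number_of_shards": 1 },
          "forcemerge": { "max_num_segments": 1 }
        }
      },
      "delete": {
        "min_age": "30d",
        "actions": { "delete": {} }
      }
    }
  }
}'
```

Note that the rollover action only works when writes go through a write alias (named in `index.lifecycle.rollover_alias`) or a data stream. The Logstash output in section 4 writes directly to date-named indices, so as configured the hot phase would stall at rollover. Either write through an alias or data stream, or drop the rollover action and let the daily index names do the splitting; in that case `min_age` is counted from index creation.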
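The phase boundaries in the policy can be read as a simple function of index age. The sketch below is an illustrative model only (`phase_for_age` is a hypothetical helper, and it ignores the polling delay ILM has in practice). The age is counted from rollover when rollover is used, otherwise from index creation.

```python
def phase_for_age(age_days: float, warm_after: float = 7, delete_after: float = 30) -> str:
    """Return the phase an index is expected to be in under the policy above.

    age_days: days since rollover (or since creation if the index never rolls over).
    """
    if age_days >= delete_after:
        return "delete"
    if age_days >= warm_after:
        return "warm"
    return "hot"


for age in (0, 6.5, 7, 29, 30):
    print(age, phase_for_age(age))
```

Under these thresholds an index spends roughly 7 days on the hot tier and 23 days on the warm tier, which is a useful starting point for estimating disk usage per tier.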

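7. Kibana Log Query Tips

The queries below use KQL, the default language of the Kibana search bar. Lines starting with # are annotations, not part of the query.

```text
# Nginx 5xx errors
response_code >= 500 and fields.service: "nginx"

# Slow requests
tags: slow_request and request_time > 2

# A specific client IP
remote_ip: "192.168.1.100"

# Time range plus keyword (the Lucene form is @timestamp:[now-1h TO now])
@timestamp >= now-1h and message: "OutOfMemoryError"
```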
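The architecture ends with alert rules and chat notifications. As a stand-in for a Watcher rule, the first query above can be run on a schedule by a small script that notifies a chat webhook when the count crosses a threshold. Everything here is a sketch under assumptions: the function names, the threshold, and the webhook URL passed in by the caller are hypothetical; `fields.service.keyword` assumes default dynamic mapping for that field; and the text payload follows the simple `msgtype`/`text` shape used by DingTalk and WeCom robot webhooks, which may additionally require keyword or signature settings on the robot.

```python
import json
import urllib.request


def build_5xx_query(service: str, window: str = "5m") -> dict:
    """Query DSL equivalent of: response_code >= 500 and fields.service: <service>,
    restricted to the last `window` (Elasticsearch date math, e.g. 5m, 1h)."""
    return {
        "query": {
            "bool": {
                "filter": [
                    {"term": {"fields.service.keyword": service}},
                    {"range": {"response_code": {"gte": 500}}},
                    {"range": {"@timestamp": {"gte": f"now-{window}"}}},
                ]
            }
        }
    }


def post_json(url: str, payload: dict) -> dict:
    """POST a JSON body and decode the JSON response."""
    request = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=10) as response:
        return json.load(response)


def should_alert(count: int, threshold: int) -> bool:
    return count > threshold


def check_and_notify(es_url: str, webhook_url: str,
                     service: str = "nginx", threshold: int = 10) -> int:
    """Count recent 5xx responses and send a chat message if there are too many."""
    result = post_json(f"{es_url}/logs-*/_count", build_5xx_query(service))
    count = result["count"]
    if should_alert(count, threshold):
        text = f"[ALERT] {service}: {count} 5xx responses in the last 5 minutes"
        post_json(webhook_url, {"msgtype": "text", "text": {"content": text}})
    return count
```

Scheduling `check_and_notify("http://localhost:9200", "<your webhook URL>")` every few minutes, for example from cron, gives a basic alert. A Watcher or Kibana alerting rule would replace this script in a production setup.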

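Conclusion: the core value of an ELK platform is turning logs into searchable, alertable data assets. Filebeat handles lightweight collection, Kafka absorbs traffic peaks, Logstash cleans the data, Elasticsearch stores it, and Kibana presents it. Each of the five components has its own job.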