ELK Has Pitfalls, Don't Step in Them (Part 5)

The official grok pattern definitions are published here:
https://github.com/logstash-plugins/logstash-patterns-core/tree/master/patterns
https://github.com/elastic/logstash/tree/v1.4.2/patterns
If the regular expression you use matches the nginx log line, the log is split into fields successfully; see the screenshot below for the result.
While debugging, you can test your grok expressions on this site: http://grokdebug.herokuapp.com/
1. The format configured in production (192.168.70.94) is the same as in the definitive guide:

```
SYSLOGBASE %{SYSLOGTIMESTAMP:timestamp} (?:%{SYSLOGFACILITY} )?%{SYSLOGHOST:logsource} %{SYSLOGPROG}:
```

The following are the grok patterns that Logstash ships with for the standard Apache log formats (note the escaped brackets around the timestamp, which are easy to lose when copying):

```
COMMONAPACHELOG %{IPORHOST:clientip} %{USER:ident} %{USER:auth} \[%{HTTPDATE:timestamp}\] "(?:%{WORD:verb} %{NOTSPACE:request}(?: HTTP/%{NUMBER:httpversion})?|%{DATA:rawrequest})" %{NUMBER:response} (?:%{NUMBER:bytes}|-)
COMBINEDAPACHELOG %{COMMONAPACHELOG} %{QS:referrer} %{QS:agent}
```

2. My nginx log-splitting patterns:

```
NGINX_ACCESS %{IPORHOST:remote_addr} - %{USERNAME:remote_user} \[%{HTTPDATE:time_local}\] "%{DATA:request}" %{INT:status} %{NUMBER:bytes_sent} "%{DATA:http_referer}" "%{DATA:http_user_agent}"
MAINNGINXLOG %{COMBINEDAPACHELOG} %{QS:x_forwarded_for}
```

COMBINEDAPACHELOG is the combined Apache log format; this is what the Logstash client uses. COMMONAPACHELOG is the common (plain) Apache log format.

When a grok match fails, the plugin tags the event, by default with `_grokparsefailure`. Logstash lets you route these failed events elsewhere for follow-up processing:

```
input { # ... }
filter {
  grok {
    match => { "message" => "%{TIMESTAMP_ISO8601:timestamp} \[%{IPV4:ip};%{WORD:environment}\] %{LOGLEVEL:log_level} %{GREEDYDATA:message}" }
  }
}
output {
  if "_grokparsefailure" in [tags] {
    # write events that didn't match to a file
    file { "path" => "/tmp/grok_failures.txt" }
  } else {
    elasticsearch { }
  }
}
```

The highlighted part of the screenshot below shows that the default `_grokparsefailure` tag is added to `tags` only when the grok match fails.
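The tag-on-failure behavior can be simulated outside Logstash. The sketch below is illustrative only: the regex is a hand-written approximation of the NGINX_ACCESS pattern above (grok patterns ultimately compile down to named-group regexes like this), and the sample lines are made up.

```python
import re

# Hand-written approximation of the NGINX_ACCESS grok pattern (illustrative).
NGINX_ACCESS_RE = re.compile(
    r'(?P<remote_addr>\S+) - (?P<remote_user>\S+) '
    r'\[(?P<time_local>[^\]]+)\] "(?P<request>[^"]*)" '
    r'(?P<status>\d+) (?P<bytes_sent>\d+) '
    r'"(?P<http_referer>[^"]*)" "(?P<http_user_agent>[^"]*)"'
)

def parse_event(line):
    """Mimic Logstash: return fields on a match, else tag _grokparsefailure."""
    m = NGINX_ACCESS_RE.match(line)
    if m:
        return {"tags": [], **m.groupdict()}
    return {"tags": ["_grokparsefailure"], "message": line}

good = '192.168.70.94 - - [10/Oct/2023:13:55:36 +0800] "GET / HTTP/1.1" 200 612 "-" "curl/7.61.1"'
bad = "not a log line at all"

print(parse_event(good)["status"])  # 200
print(parse_event(bad)["tags"])     # ['_grokparsefailure']
```

This is the same routing decision the output section above makes: events carrying the failure tag go to one sink, everything else to Elasticsearch.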


[Article illustration]
One suggested fix is to "set anchors". I don't yet understand what that means, so I'm parking it here for now:
https://www.jianshu.com/p/86133dd66ca4
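For what it's worth, the "anchor" advice most likely refers to prefixing the grok expression with `^` (and optionally suffixing `$`), so that non-matching lines fail fast instead of the pattern being retried at every character position. A minimal Python sketch of the idea; the patterns here are illustrative, not actual grok output:

```python
import re

# Unanchored: on a non-matching line the engine retries the pattern at
# every character position before giving up.
unanchored = re.compile(r'\d{1,3}(?:\.\d{1,3}){3} - ')
# Anchored with ^: every attempt fails immediately at the start anchor.
anchored = re.compile(r'^\d{1,3}(?:\.\d{1,3}){3} - ')

line = "this line is not an access-log entry " * 100

print(unanchored.search(line))  # None, after scanning the whole line
print(anchored.search(line))    # None, with far less work
```

Both return the same result; the difference is only how much work a failing match costs, which matters when a fleet of servers ships mostly non-matching lines.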
Some other notes, which I'm not using yet but am recording here:
1. `if "_grokparsefailure" in [tags] { drop { } }` simply discards events that failed to match.
2. The match statement is the same as the first one; nothing special, just follow the examples on the official site.
3. Prefer grok; the grep filter is going to be removed.
Another fix I considered at the time was changing the nginx log format to JSON, but I didn't want to use that approach.
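For completeness, this is why the JSON route sidesteps the problem entirely: each log line is already structured, so Logstash's json filter (or any JSON parser) yields fields without any grok pattern. A minimal Python sketch with a made-up log line:

```python
import json

# A hypothetical access-log line as nginx would emit it under a JSON log_format.
line = ('{"@timestamp":"2023-10-10T13:55:36+08:00",'
        '"clientip":"192.168.70.94",'
        '"request":"GET / HTTP/1.1",'
        '"status":"200","size":"612"}')

# No grok needed: parsing either succeeds with structured fields or raises,
# so _grokparsefailure never enters the picture.
event = json.loads(line)
print(event["status"], event["clientip"])  # 200 192.168.70.94
```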
```
log_format json '{"@timestamp":"$time_iso8601",'
                '"host":"$server_addr",'
                '"clientip":"$remote_addr",'
                '"request":"$request",'
                '"status":"$status",'
                '"request_method": "$request_method",'
                '"size":"$body_bytes_sent",'
                '"request_time":"$request_time",'
                '"upstreamtime":"$upstream_response_time",'
                '"upstreamhost":"$upstream_addr",'
                '"http_host":"$host",'
                '"url":"$uri",'
                '"http_forward":"$http_x_forwarded_for",'
                '"referer":"$http_referer",'
                '"agent":"$http_user_agent"}';
access_log /var/log/nginx/access.log json;
```

Problem solved: the ultimate reason the nginx logs were not being split.

Here are the two log formats:

```
log_format main    '$remote_addr - $remote_user [$time_iso8601] "$request" '
                   '$status $body_bytes_sent "$http_referer" '
                   '"$http_user_agent" "$http_x_forwarded_for" "$host" "$request_time"';
log_format format2 '$remote_addr - $remote_user [$time_local] "$request" '
                   '$status $body_bytes_sent "$http_referer" '
                   '"$http_user_agent" "$http_x_forwarded_for" "$host" "$request_time"';
```

When splitting in Logstash, the two formats require different time patterns. The insight came from the following:

```
grok {
  match => { "message" => "%{TIMESTAMP_ISO8601:time}" }
}
date {
  match => ["time", "yyyy-MM-dd HH:mm:ss", "ISO8601"]
  target => "@timestamp"
}
mutate {
  remove_field => ["time"]
}
```

The relevant pattern definitions:

```
HTTPDATE %{MONTHDAY}/%{MONTH}/%{YEAR}:%{TIME} %{INT}
TIMESTAMP_ISO8601 %{YEAR}-%{MONTHNUM}-%{MONTHDAY}[T ]%{HOUR}:?%{MINUTE}(?::?%{SECOND})?%{ISO8601_TIMEZONE}?
```

The broken pattern (commented out) and the corrected one:

```
#NGINX_ACCESS %{IPORHOST:remote_addr} - %{USERNAME:remote_user} \[%{HTTPDATE:time_iso8601}\] "%{DATA:request}" %{INT:status} %{NUMBER:bytes_sent} "%{DATA:http_referer}" "%{DATA:http_user_agent}"
NGINX_ACCESS %{IPORHOST:remote_addr} - %{USERNAME:remote_user} \[%{TIMESTAMP_ISO8601:time_iso8601}\] "%{DATA:request}" %{INT:status} %{NUMBER:bytes_sent} "%{DATA:http_referer}" "%{DATA:http_user_agent}"
```

For reference, the stock patterns again:

```
SYSLOGBASE %{SYSLOGTIMESTAMP:timestamp} (?:%{SYSLOGFACILITY} )?%{SYSLOGHOST:logsource} %{SYSLOGPROG}:
COMMONAPACHELOG %{IPORHOST:clientip} %{USER:ident} %{USER:auth} \[%{HTTPDATE:timestamp}\] "(?:%{WORD:verb} %{NOTSPACE:request}(?: HTTP/%{NUMBER:httpversion})?|%{DATA:rawrequest})" %{NUMBER:response} (?:%{NUMBER:bytes}|-)
COMBINEDAPACHELOG %{COMMONAPACHELOG} %{QS:referrer} %{QS:agent}
```

In short:
- nginx time format 1, `$time_local`, corresponds to `\[%{HTTPDATE:timestamp}\]` in Logstash;
- nginx time format 2, `$time_iso8601`, corresponds to `\[%{TIMESTAMP_ISO8601:timestamp}\]` in Logstash.
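The two time formats can be checked outside Logstash as well. A small Python sketch (the sample timestamps are made up) showing that the same instant renders differently under `$time_local` and `$time_iso8601`, which is exactly why the grok time pattern must match the configured nginx format:

```python
from datetime import datetime

# Hypothetical samples of one instant in the two nginx time formats.
time_local = "10/Oct/2023:13:55:36 +0800"  # $time_local   -> HTTPDATE
time_iso = "2023-10-10T13:55:36+08:00"     # $time_iso8601 -> TIMESTAMP_ISO8601

t1 = datetime.strptime(time_local, "%d/%b/%Y:%H:%M:%S %z")
t2 = datetime.strptime(time_iso, "%Y-%m-%dT%H:%M:%S%z")

print(t1 == t2)  # True: one instant, two textual representations
```

A pattern written for one representation cannot match the other, and the mismatch surfaces in Logstash only as a `_grokparsefailure` tag.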

