windows_exporter makes it easy to add Windows Server monitoring to Prometheus.

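In most cases the default configuration is enough to monitor CPU, memory, network, and services. In some situations, however, the exporter may be unable to read the status of certain services, for example when 安全狗 (SafeDog) is installed on the server with certain settings. A custom configuration is then needed, such as one that monitors only specific services.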

windows_exporter configuration guide

Source

https://github.com/prometheus-community/windows_exporter


Description

A Prometheus exporter for Windows machines.


Compatibility

windows_exporter supports Windows Server 2008 R2 and later, and desktop Windows 7 and later.


Deployment

Download the exporter:

https://github.com/prometheus-community/windows_exporter/releases/download/v0.16.0/windows_exporter-0.16.0-amd64.exe


You can run the .exe file directly or start it with custom options. Running it directly uses the default configuration.
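Once the exporter is running, Prometheus still has to be told to scrape it. The fragment below is a minimal sketch of a scrape job in prometheus.yml. It relies on the exporter's default listen address `:9182` and default path `/metrics`; the job name, the scrape interval, and the host name `win-server-01` are assumptions for illustration and should be replaced with your own values.

```yaml
# prometheus.yml (fragment): scrape one windows_exporter instance.
scrape_configs:
  - job_name: "windows"            # hypothetical job name
    scrape_interval: 30s           # assumed value, tune as needed
    metrics_path: /metrics         # exporter default (--telemetry.path)
    static_configs:
      - targets:
          - "win-server-01:9182"   # hypothetical host, default exporter port
```

After reloading Prometheus, the target should appear on the Targets page if the port is reachable through the Windows firewall.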


Custom configuration

flags:
  -h, --help                     show context-sensitive help (also try
                                 --help-long and --help-man).
      --collectors.dfsr.sources-enabled="connection,folder,volume"
                                 comma-seperated list of dfsr perflib sources to
                                 use.
      --collectors.exchange.list
                                 list the collectors along with their perflib
                                 object name/ids
      --collectors.exchange.enabled=""
                                 comma-separated list of collectors to use.
                                 defaults to all, if not specified.
      --collector.iis.site-whitelist=".+"
                                 regexp of sites to whitelist. site name must
                                 both match whitelist and not match blacklist to
                                 be included.
      --collector.iis.site-blacklist=collector.iis.site-blacklist
                                 regexp of sites to blacklist. site name must
                                 both match whitelist and not match blacklist to
                                 be included.
      --collector.iis.app-whitelist=".+"
                                 regexp of apps to whitelist. app name must both
                                 match whitelist and not match blacklist to be
                                 included.
      --collector.iis.app-blacklist=collector.iis.app-blacklist
                                 regexp of apps to blacklist. app name must both
                                 match whitelist and not match blacklist to be
                                 included.
      --collector.logical_disk.volume-whitelist=".+"
                                 regexp of volumes to whitelist. volume name
                                 must both match whitelist and not match
                                 blacklist to be included.
      --collector.logical_disk.volume-blacklist=""
                                 regexp of volumes to blacklist. volume name
                                 must both match whitelist and not match
                                 blacklist to be included.
      --collector.msmq.msmq-where=collector.msmq.msmq-where
                                 wql 'where' clause to use in wmi metrics query.
                                 limits the response to the msmqs you specify
                                 and reduces the size of the response.
      --collectors.mssql.classes-enabled="accessmethods,availreplica,bufman,databases,dbreplica,genstats,locks,memmgr,sqlstats,sqlerrors,transactions"
                                 comma-separated list of mssql wmi classes to
                                 use.
      --collectors.mssql.class-print
                                 if true, print available mssql wmi classes and
                                 exit. only displays if the mssql collector is
                                 enabled.
      --collector.net.nic-whitelist=".+"
                                 regexp of nic:s to whitelist. nic name must
                                 both match whitelist and not match blacklist to
                                 be included.
      --collector.net.nic-blacklist=""
                                 regexp of nic:s to blacklist. nic name must
                                 both match whitelist and not match blacklist to
                                 be included.
      --collector.process.whitelist=".*"
                                 regexp of processes to include. process name
                                 must both match whitelist and not match
                                 blacklist to be included.
      --collector.process.blacklist=""
                                 regexp of processes to exclude. process name
                                 must both match whitelist and not match
                                 blacklist to be included.
      --collector.service.services-where=""
                                 wql 'where' clause to use in wmi metrics query.
                                 limits the response to the services you specify
                                 and reduces the size of the response.
      --collector.smtp.server-whitelist=".+"
                                 regexp of virtual servers to whitelist. server
                                 name must both match whitelist and not match
                                 blacklist to be included.
      --collector.smtp.server-blacklist=collector.smtp.server-blacklist
                                 regexp of virtual servers to blacklist. server
                                 name must both match whitelist and not match
                                 blacklist to be included.
      --collector.textfile.directory="c:\\program files\\windows_exporter\\textfile_inputs"
                                 directory to read text files with metrics from.
      --config.file=config.file  yaml configuration file to use. values set in
                                 this file will be overriden by cli flags.
      --web.config.file=""       [experimental] path to configuration file that
                                 can enable tls or authentication.
      --telemetry.addr=":9182"   host:port for exporter.
      --telemetry.path="/metrics"
                                 url path for surfacing collected metrics.
      --telemetry.max-requests=5
                                 maximum number of concurrent requests. 0 to
                                 disable.
      --collectors.enabled="cpu,cs,logical_disk,net,os,service,system,textfile"
                                 comma-separated list of collectors to use. use
                                 '[defaults]' as a placeholder for all the
                                 collectors enabled by default.
      --collectors.print         if true, print available collectors and exit.
      --scrape.timeout-margin=0.5
                                 seconds to subtract from the timeout allowed by
                                 the client. tune to allow for overhead or high
                                 loads.
      --log.level="info"         only log messages with the given severity or
                                 above. valid levels: [debug, info, warn, error,
                                 fatal]
      --log.format="logger:stderr"
                                 set the log target and format. example:
                                 "logger:syslog?appname=bob&local=7" or
                                 "logger:stdout?json=true"
      --version                  show application version.

Using a configuration file

You can specify a YAML configuration file with the --config.file flag. For example:

.\windows_exporter.exe --config.file=config.yml

The format of config.yml is shown below. Adjust the contents according to the configuration documentation:


collectors:
  enabled: cpu,cs,net,service
collector:
  service:
    services-where: "name='windows_exporter'"
log:
  level: warn
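To monitor only a few specific services, the `services-where` value can hold a WQL condition that names several services. The following is a sketch of such a config.yml, not a recommended setup: the service names `W3SVC` and `MSSQLSERVER` are assumptions chosen for illustration. It also enables the `os` and `logical_disk` collectors, because the memory and disk alert rules later in this post read metrics from those collectors.

```yaml
# config.yml: restrict the service collector to a few named services.
collectors:
  # os and logical_disk are included so the memory and disk metrics exist.
  enabled: cpu,cs,logical_disk,net,os,service
collector:
  service:
    # WQL 'where' clause; the service names here are hypothetical examples.
    services-where: "Name='windows_exporter' OR Name='W3SVC' OR Name='MSSQLSERVER'"
log:
  level: warn
```

Because values in the file are overridden by CLI flags, the same clause can also be passed on the command line with `--collector.service.services-where` for a quick test.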

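Reference alerting rules

The rules below cover alerts for CPU usage above 90%, memory usage above 90%, disk usage above 90%, windows_exporter collector failures, and service status. As noted at the beginning, all services are monitored when nothing is configured, but in many cases only specific services need to be monitored.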

- name: windowsserver
  rules:      
  - alert: windowsservercpuusage
    expr: 100 - (avg by (instance) (rate(windows_cpu_time_total{mode="idle"}[2m])) * 100) > 90
    for: 0m
    labels:
      severity: warning
    annotations:
      summary: windows server cpu usage (instance {{ $labels.instance }})
      description: "cpu usage is more than 90%\n  value = {{ $value }}\n  labels = {{ $labels }}"
  - alert: windowsservermemoryusage
    expr: 100 - ((windows_os_physical_memory_free_bytes / windows_cs_physical_memory_bytes) * 100) > 90
    for: 2m
    labels:
      severity: warning
    annotations:
      summary: windows server memory usage (instance {{ $labels.instance }})
      description: "memory usage is more than 90%\n  value = {{ $value }}\n  labels = {{ $labels }}"
  - alert: windowsserverdiskspaceusage
    expr: 100.0 - 100 * ((windows_logical_disk_free_bytes / 1024 / 1024 ) / (windows_logical_disk_size_bytes / 1024 / 1024)) > 90
    for: 2m
    labels:
      severity: critical
    annotations:
      summary: windows server disk space usage (instance {{ $labels.instance }})
      description: "disk usage is more than 90%\n  value = {{ $value }}\n  labels = {{ $labels }}"
  - alert: windowsservercollectorerror
    expr: windows_exporter_collector_success == 0
    for: 5m
    labels:
      severity: critical
    annotations:
      summary: windows server collector error (instance {{ $labels.instance }})
      description: "collector {{ $labels.collector }} was not successful\n  value = {{ $value }}\n  labels = {{ $labels }}"
  - alert: windowsserverservicestatus
    expr: windows_service_status{status="ok"} != 1
    for: 1m
    labels:
      severity: critical
    annotations:
      summary: windows server service status (instance {{ $labels.instance }})
      description: "windows service state is not ok\n  value = {{ $value }}\n  labels = {{ $labels }}"
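The rule group above is a fragment: in a Prometheus rule file it has to sit under a top-level `groups:` key. The sketch below shows a complete, small rule file with one rule that alerts when a single named service is not running. It assumes the service collector exposes a `windows_service_state` metric with `name` and `state` labels and that service names appear in lower case; check the exporter's /metrics output before relying on it. The group name, alert name, and the service `w3svc` are hypothetical.

```yaml
# windows_service_rules.yml: complete rule file with a top-level groups key.
groups:
  - name: windowsserver-services          # hypothetical group name
    rules:
      - alert: WindowsServiceNotRunning   # hypothetical alert name
        # Assumed metric and labels; "w3svc" is an example service name.
        expr: windows_service_state{name="w3svc", state="running"} == 0
        for: 1m
        labels:
          severity: critical
        annotations:
          summary: "Service {{ $labels.name }} is not running (instance {{ $labels.instance }})"
```

The file is loaded by listing its path under `rule_files` in prometheus.yml.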

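With Prometheus it is straightforward to set up monitoring for web server clusters and database clusters. This monitoring shows the state of the cluster in real time, and the collected data can also guide server tuning, especially the tuning of database parameters. 月萌API will share more articles on these topics in the future.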


Reference: https://blog.csdn.net/qq_43021786/article/details/118809772