Monitoring Kubernetes Elegantly with Prometheus Operator

What is Prometheus-Operator

Prometheus-Operator is a piece of software that makes it easy to integrate Prometheus with Kubernetes. With it you can deploy a Prometheus service into a Kubernetes cluster very easily and get monitoring of the cluster itself. It also lets users configure and manage Prometheus instances through simple, declarative configuration; the Operator reacts to that configuration and creates, configures, and manages the corresponding Prometheus monitoring instances.

The core idea of the Operator is to decouple the deployment of Prometheus from the configuration of the objects it monitors. Once the two are separated, dynamic configuration becomes straightforward: after deploying Prometheus through the Operator you no longer need to touch the Prometheus Server itself. To add a monitoring target or an alerting rule, you only write the corresponding ServiceMonitor or Prometheus resources; there is no need to restart the Prometheus service. The Operator watches these resources for changes and renders them into the corresponding Prometheus configuration files. The Operator also deploys and manages the Prometheus Server itself.

Architecture of Prometheus-Operator


The figure above is the official Prometheus-Operator architecture diagram. Reading it from bottom to top: the Operator deploys and manages Prometheus Server, and it also watches Prometheus resources. What does "watch" mean here? In the diagram, Service1 through Service5 are ordinary Kubernetes Services; Kubernetes resources include Service, Deployment, ServiceMonitor, ConfigMap, and so on, so both the Services and the ServiceMonitors in the diagram are Kubernetes resources. A ServiceMonitor selects a class of Services via a labelSelector (typically one ServiceMonitor corresponds to one Service), and a Prometheus resource in turn selects multiple ServiceMonitors via a labelSelector (a minimal sketch of such a Prometheus resource is shown after the component list below).

  • Operator: the main controller of the whole system. It runs on the Kubernetes cluster as a Deployment and, based on Custom Resource Definitions (CRDs), is responsible for managing and deploying Prometheus. The Operator watches these CRDs for changes and reacts accordingly.

  • Prometheus: the Operator watches Prometheus custom resources (Prometheus is itself a CRD) in the cluster and creates a matching StatefulSet in the monitoring namespace (as specified by .metadata.namespace). It also mounts a Secret named prometheus-k8s as a volume at /etc/prometheus/config. The Secret's data contains the following:

    • configmaps.json, which records the names of the ConfigMaps holding the rule files
    • prometheus.yaml, the main configuration file
  • ServiceMonitor: a Kubernetes custom resource (CRD). The Operator watches ServiceMonitors for changes, dynamically generates the scrape targets in the Prometheus configuration file, and makes those changes take effect immediately: the Operator writes the generated jobs into the prometheus.yaml field of the prometheus-k8s Secret mentioned above, and the sidecar container prometheus-config-reloader in the Prometheus pod, upon detecting that a file under the mounted path has changed, sends an HTTP POST to the /-/reload endpoint to reload the configuration. A ServiceMonitor selects the corresponding Services by labels, and Prometheus Server scrapes metrics from the Services selected this way.

  • Service: an ordinary Kubernetes Service resource; here it specifically means the Service of a Prometheus exporter, for example the Service of a mysql-exporter deployed on Kubernetes.
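
The binding between Prometheus and its ServiceMonitors is declared in the Prometheus custom resource itself. Below is a minimal, illustrative sketch of such a resource (field names follow the monitoring.coreos.com/v1 API; the label value kube-prometheus matches the deployment used later in this article, while the resource name, namespace, and replica count are assumptions for the example):

apiVersion: monitoring.coreos.com/v1
kind: Prometheus
metadata:
  name: kube-prometheus            # illustrative name
  namespace: monitoring
spec:
  replicas: 1
  serviceAccountName: prometheus   # assumes a suitable ServiceAccount already exists
  serviceMonitorSelector:          # pick up every ServiceMonitor carrying this label
    matchLabels:
      prometheus: kube-prometheus
  ruleSelector:                    # same idea for PrometheusRule resources (see below)
    matchLabels:
      prometheus: kube-prometheus
  resources:
    requests:
      memory: 400Mi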

Imagine monitoring a MySQL service the traditional way. First you install mysql-exporter to collect MySQL metrics and expose them on a port for the Prometheus service to scrape. Then you edit prometheus.yaml on the Prometheus Server and add a mysql-exporter job under scrape_configs, with the exporter's address, port, and so on. Finally you restart the Prometheus service, and only then is MySQL monitoring in place.
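
For comparison, the job you would maintain by hand in the traditional setup looks roughly like this (the job name and the target address/port are hypothetical):

scrape_configs:
  - job_name: 'mysql-exporter'           # hypothetical job name
    scrape_interval: 30s
    static_configs:
      - targets: ['192.168.1.10:9104']   # hypothetical mysql-exporter address:port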

Now consider doing the same with Prometheus deployed by Prometheus-Operator. To add MySQL monitoring, the first step is the same as before: deploy a mysql-exporter to collect the MySQL metrics. The second step is to write a ServiceMonitor whose labelSelector selects the mysql-exporter you just deployed. Because the Operator, when deploying Prometheus, configures it by default to select ServiceMonitors labeled prometheus: kube-prometheus, you only need to put the prometheus: kube-prometheus label on the ServiceMonitor for Prometheus to pick it up. Those two steps are all it takes: no change to the Prometheus configuration file, no restart of the Prometheus service. The Operator notices the ServiceMonitor change, regenerates the Prometheus configuration, and ensures it takes effect immediately.

How to write a ServiceMonitor

A ServiceMonitor is a Kubernetes custom resource, so it has to follow the ServiceMonitor specification. The example below dynamically adds MySQL monitoring to show how to write one.

Prerequisites:

  1. A working Kubernetes cluster
  2. Prometheus already deployed with prometheus-operator (see the quick check after this list)
  3. MySQL already installed and running on Kubernetes
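
As a quick way to verify prerequisite 2, you can list the custom resource definitions the Operator registers (the exact set may vary slightly between Operator versions):

kubectl get crd | grep monitoring.coreos.com
# typically includes, among others:
#   alertmanagers.monitoring.coreos.com
#   prometheuses.monitoring.coreos.com
#   prometheusrules.monitoring.coreos.com
#   servicemonitors.monitoring.coreos.com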

With the prerequisites in place, first open the Prometheus Server web UI. As shown in the screenshot below, the Targets page already lists monitoring for Prometheus itself and for Kubernetes, but there is no MySQL target yet.

Next, add the MySQL exporter, prometheus-mysql-exporter, to the cluster. Here it is installed with Helm, following the steps on GitHub: set the datasource in values.yaml to the address of the MySQL instance running in Kubernetes, then run helm install --name me-release -f values.yaml stable/prometheus-mysql-exporter. After that, run kubectl get service to check the mysql-exporter Service that was just created:

$ kubectl get service
NAME                                   TYPE        CLUSTER-IP       EXTERNAL-IP   PORT(S)             AGE
alertmanager-operated                  ClusterIP   None             <none>        9093/TCP,6783/TCP   4d
grafana-grafana                        NodePort    10.106.170.105   <none>        80:32324/TCP        4d
kube-prometheus                        NodePort    10.111.168.249   <none>        9090:30900/TCP      22h
kube-prometheus-alertmanager           NodePort    10.101.224.156   <none>        9093:30903/TCP      22h
kube-prometheus-exporter-kube-state    ClusterIP   10.107.60.242    <none>        80/TCP              22h
kube-prometheus-exporter-node          ClusterIP   10.111.96.180    <none>        9100/TCP            22h
me-release-prometheus-mysql-exporter   ClusterIP   10.109.252.44    <none>        9104/TCP            50s
prometheus-operated                    ClusterIP   None             <none>        9090/TCP            4d

The entry me-release-prometheus-mysql-exporter is there, so the Service is up. Next, run kubectl describe service me-release-prometheus-mysql-exporter to inspect it:

$ kubectl describe service me-release-prometheus-mysql-exporter
Name:              me-release-prometheus-mysql-exporter
Namespace:         monitoring
Labels:            app=prometheus-mysql-exporter
                   chart=prometheus-mysql-exporter-0.1.0
                   heritage=Tiller
                   release=me-release
Annotations:       <none>
Selector:          app=prometheus-mysql-exporter,release=me-release
Type:              ClusterIP
IP:                10.109.252.44
Port:              mysql-exporter  9104/TCP
TargetPort:        9104/TCP
Endpoints:         10.244.4.15:9104
Session Affinity:  None
Events:            <none>

You can see the Service's in-cluster port: Port: mysql-exporter 9104/TCP, i.e. a port named mysql-exporter on 9104/TCP.

Now write the ServiceMonitor manifest. Run vim servicemonitor.yaml and enter the following:

apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor                 # the resource kind is ServiceMonitor
metadata:
  labels:
    prometheus: kube-prometheus      # by default Prometheus discovers ServiceMonitors via the prometheus: kube-prometheus label; with this label set, Prometheus will pick this ServiceMonitor up
  name: prometheus-exporter-mysql
spec:
  jobLabel: app                      # the value of the label named here becomes job_name under scrape_configs in the Prometheus config, i.e. the target's job; if omitted, the Service name is used
  selector:
    matchLabels:                     # labels of the Services this ServiceMonitor matches; with matchLabels, a Service must carry all of the labels below; matchExpressions can be used instead for set-based requirements (In, NotIn, Exists, DoesNotExist)
      app: prometheus-mysql-exporter # the mysql-exporter Service inspected above carries the label app: prometheus-mysql-exporter, so this matches it
  namespaceSelector:
    any: true                        # match across all namespaces; to restrict to specific namespaces, use matchNames: [] instead
    # matchNames: []
  endpoints:
  - port: mysql-exporter             # the mysql-exporter Service exposes its metrics on the port named mysql-exporter (9104/TCP), so that port name goes here
    interval: 30s                    # scrape every 30s
    # path: /metrics                 # HTTP path to scrape for metrics; defaults to /metrics
    honorLabels: true

Save and exit, then run kubectl create -f servicemonitor.yaml. Once it is created, run kubectl get serviceMonitor to confirm the new ServiceMonitor is there:

$ kubectl create -f servicemonitor.yaml
servicemonitor.monitoring.coreos.com/prometheus-exporter-mysql created
$ kubectl get serviceMonitor
NAME                                                AGE
grafana                                             4d
kafka-release                                       5h
kafka-release-exporter                              5h
kube-prometheus                                     23h
kube-prometheus-alertmanager                        23h
kube-prometheus-exporter-coredns                    23h
kube-prometheus-exporter-kube-controller-manager    23h
kube-prometheus-exporter-kube-etcd                  23h
kube-prometheus-exporter-kube-scheduler             23h
kube-prometheus-exporter-kube-state                 23h
kube-prometheus-exporter-kubelets                   23h
kube-prometheus-exporter-kubernetes                 23h
kube-prometheus-exporter-node                       23h
prometheus-exporter-mysql                           13s
prometheus-operator                                 4d

prometheus-exporter-mysql is now listed, so the creation succeeded. After about a minute, check the Targets page in the Prometheus UI and you will see the MySQL target has been added.
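
As an extra check that the new target is really being scraped, you can query the exporter's own health metric in the Prometheus expression browser (mysqld_exporter exposes mysql_up; 1 means the exporter can reach MySQL, and the job label value below follows from jobLabel: app in the ServiceMonitor above):

mysql_up
# or, restricted to this job:
mysql_up{job="prometheus-mysql-exporter"} == 1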

As mentioned above, Prometheus selects ServiceMonitors by the prometheus: kube-prometheus label; that default comes from the chart's configuration. You can of course set serviceMonitorsSelector in values.yaml to pick ServiceMonitors by your own rules; how to configure serviceMonitorsSelector is covered together with the other options later in this article.

How to add alerting rules dynamically

Once a monitoring target has been added dynamically, you usually also want alerting rules for it. Under the prometheus-operator architecture, dynamic alerting rules are handled by another custom resource (CRD), PrometheusRule. PrometheusRule and ServiceMonitor are both custom resources: ServiceMonitor adds monitoring targets dynamically, while PrometheusRule adds alerting rules dynamically. The example below again uses MySQL, this time to add alerting rules with a PrometheusRule resource.

Run vim mysql-rule.yaml and enter the following:

apiVersion: monitoring.coreos.com/v1    # same API group as ServiceMonitor
kind: PrometheusRule                    # the resource kind is PrometheusRule, another custom resource (CRD)
metadata:
  labels:
    app: "prometheus-rule-mysql"
    prometheus: kube-prometheus         # as with ServiceMonitor, ruleSelector by default selects PrometheusRule resources labeled prometheus: kube-prometheus
  name: prometheus-rule-mysql
spec:
  groups:                               # the alerting rules; the syntax is the same as native Prometheus alerting rules
  - name: mysql.rules
    rules:
    - alert: TooManyErrorFromMysql
      expr: sum(irate(mysql_global_status_connection_errors_total[1m])) > 10
      labels:
        severity: critical
      annotations:
        description: MySQL is producing too many errors.
        summary: TooManyErrorFromMysql
    - alert: TooManySlowQueriesFromMysql
      expr: increase(mysql_global_status_slow_queries[1m]) > 10
      labels:
        severity: critical
      annotations:
        description: MySQL produced {{ $value }} slow-query log entries within one minute.
        summary: TooManySlowQueriesFromMysql

Prometheus selects PrometheusRule resources through ruleSelector, which by default also matches the label prometheus: kube-prometheus. Both ruleSelector and serviceMonitorsSelector are configurable; how to configure them is covered in the configuration section later.

Save the file and run kubectl create -f mysql-rule.yaml. Once it is created, kubectl get prometheusRule shows the new PrometheusRule resource prometheus-rule-mysql:

$ kubectl create -f mysql-rule.yaml
prometheusrule.monitoring.coreos.com/prometheus-rule-mysql created
$ kubectl get prometheusRule
NAME                                                AGE
kube-prometheus                                     1h
kube-prometheus-alertmanager                        1h
kube-prometheus-exporter-kube-controller-manager    1h
kube-prometheus-exporter-kube-etcd                  1h
kube-prometheus-exporter-kube-scheduler             1h
kube-prometheus-exporter-kube-state                 1h
kube-prometheus-exporter-kubelets                   1h
kube-prometheus-exporter-kubernetes                 1h
kube-prometheus-exporter-node                       1h
kube-prometheus-rules                               1h
prometheus-rule-mysql                               8s

After about a minute, the newly added mysql.rules group appears on the Rules page of the Prometheus web UI.
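
To see whether any of the new rules are currently pending or firing, you can also query the built-in ALERTS series in the expression browser (the alert names are the ones defined in the PrometheusRule above):

ALERTS{alertname=~"TooManyErrorFromMysql|TooManySlowQueriesFromMysql"}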

How to update the Alertmanager configuration dynamically

How it works

When the Operator deploys Alertmanager it creates a StatefulSet. You can find it with kubectl get statefulset --all-namespaces; its name is alertmanager-kube-prometheus:

$ kubectl get statefulset --all-namespaces
NAMESPACE     NAME                                    DESIRED   CURRENT   AGE
elk           elastic-release-elasticsearch-data      2         2         8d
elk           elastic-release-elasticsearch-master    3         3         8d
elk           logstash-release                        1         1         9d
kafka         kafka-release                           3         3         1d
kafka         zookeeper-release                       3         3         12d
kube-system   mongodb-release-arbiter                 1         1         16d
kube-system   mongodb-release-primary                 1         1         16d
kube-system   mongodb-release-secondary               1         1         16d
kube-system   my-release-mysqlha                      3         3         15d
monitoring    alertmanager-kube-prometheus            1         1         9h
monitoring    prometheus-kube-prometheus              1         1         9h

Then kubectl describe statefulset alertmanager-kube-prometheus -n monitoring shows the StatefulSet's details:

$ kubectl describe statefulset alertmanager-kube-prometheus -n monitoring
Name:               alertmanager-kube-prometheus
Namespace:          monitoring
CreationTimestamp:  Wed, 05 Sep 2018 09:46:08 +0800
Selector:           alertmanager=kube-prometheus,app=alertmanager
Labels:             alertmanager=kube-prometheus
                    app=alertmanager
                    chart=alertmanager-0.1.6
                    heritage=Tiller
                    release=kube-prometheus
Annotations:        <none>
Replicas:           1 desired | 1 total
Update Strategy:    RollingUpdate
Pods Status:        1 Running / 0 Waiting / 0 Succeeded / 0 Failed
Pod Template:
  Labels:  alertmanager=kube-prometheus
           app=alertmanager
  Containers:
   alertmanager:
    Image:       intellif.io/prometheus-operator/alertmanager:v0.15.1
    Ports:       9093/TCP, 6783/TCP
    Host Ports:  0/TCP, 0/TCP
    Args:
      --config.file=/etc/alertmanager/config/alertmanager.yaml
      --cluster.listen-address=$(POD_IP):6783
      --storage.path=/alertmanager
      --web.listen-address=:9093
      --web.external-url=http://192.168.11.178:30903
      --web.route-prefix=/
      --cluster.peer=alertmanager-kube-prometheus-0.alertmanager-operated.monitoring.svc:6783
    Requests:
      memory:    200Mi
    Liveness:    http-get http://:web/api/v1/status delay=0s timeout=3s period=10s #success=1 #failure=10
    Readiness:   http-get http://:web/api/v1/status delay=3s timeout=3s period=5s #success=1 #failure=10
    Environment:
      POD_IP:    (v1:status.podIP)
    Mounts:
      /alertmanager from alertmanager-kube-prometheus-db (rw)
      /etc/alertmanager/config from config-volume (rw)    # the mounted Secret directory
   config-reloader:
    Image:      intellif.io/prometheus-operator/configmap-reload:v0.0.1
    Port:       <none>
    Host Port:  <none>
    Args:
      -webhook-url=http://localhost:9093/-/reload
      -volume-dir=/etc/alertmanager/config
    Limits:
      cpu:     5m
      memory:  10Mi
    Environment:  <none>
    Mounts:
      /etc/alertmanager/config from config-volume (ro)
  Volumes:
   config-volume:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  alertmanager-kube-prometheus
    Optional:    false
   alertmanager-kube-prometheus-db:
    Type:     EmptyDir (a temporary directory that shares a pod's lifetime)
    Medium:
Volume Claims:  <none>
Events:         <none>

This StatefulSet mounts a Secret named alertmanager-kube-prometheus into the alertmanager container as /etc/alertmanager/config/alertmanager.yaml. Under Volumes: above, the config-volume: entry has Type: Secret, meaning a Secret is mounted, and the Secret's name is alertmanager-kube-prometheus. Inspect it with: kubectl describe secrets alertmanager-kube-prometheus -n monitoring

$ kubectl describe secrets alertmanager-kube-prometheus -n monitoring
Name:         alertmanager-kube-prometheus
Namespace:    monitoring
Labels:       alertmanager=kube-prometheus
              app=alertmanager
              chart=alertmanager-0.1.6
              heritage=Tiller
              release=kube-prometheus
Annotations:  <none>

Type:  Opaque

Data
====
alertmanager.yaml:  567 bytes

The Secret's Data section has a key named alertmanager.yaml whose value is 567 bytes long. That value is exactly the content of /etc/alertmanager/config/alertmanager.yaml inside the alertmanager container: the StatefulSet mounts the Secret at /etc/alertmanager/config. Run kubectl edit secrets -n monitoring alertmanager-kube-prometheus to view the Secret:

apiVersion: v1
data:
  alertmanager.yaml: Z2xvYmFsOg0KICByZXNvbHZlX3RpbWVvdXQ6IDVtDQogIHNtdHBfYXV0aF9wYXNzd29yZDogeWluamsxMjM0NQ0KICBzbXRwX2F1dGhfdXNlcm5hbWU6IGlub3JpX3lpbmprQDE2My5jb20NCiAgc210cF9mcm9tOiBpbm9yaV95aW5qazFAMTYzLmNvbQ0KICBzbXRwX3JlcXVpcmVfdGxzOiBmYWxzZQ0KICBzbXRwX3NtYXJ0aG9zdDogc210cC4xNjMuY29tOjI1DQpyZWNlaXZlcnM6DQotIGVtYWlsX2NvbmZpZ3M6DQogIC0gaGVhZGVyczoNCiAgICAgIFN1YmplY3Q6ICdbRVJST1JdIHByb21ldGhldXMuLi4uLi4uLi4uLi4nDQogICAgdG86IDExMjE1NjI2NDhAcXEuY29tDQogIG5hbWU6IHRlYW0tWC1tYWlscw0KLSBuYW1lOiAibnVsbCINCnJvdXRlOg0KICBncm91cF9ieToNCiAgLSBhbGVydG5hbWUNCiAgLSBjbHVzdGVyDQogIC0gc2VydmljZQ0KICBncm91cF9pbnRlcnZhbDogNW0NCiAgZ3JvdXBfd2FpdDogNjBzDQogIHJlY2VpdmVyOiB0ZWFtLVgtbWFpbHMNCiAgcmVwZWF0X2ludGVydmFsOiAyNGgNCiAgcm91dGVzOg0KICAtIG1hdGNoOg0KICAgICAgYWxlcnRuYW1lOiBEZWFkTWFuc1N3aXRjaA0KICAgIHJlY2VpdmVyOiAibnVsbCI=
kind: Secret
metadata:
  creationTimestamp: 2018-09-05T01:46:08Z
  labels:
    alertmanager: kube-prometheus
    app: alertmanager
    chart: alertmanager-0.1.6
    heritage: Tiller
    release: kube-prometheus
  name: alertmanager-kube-prometheus
  namespace: monitoring
  resourceVersion: "5820063"
  selfLink: /api/v1/namespaces/monitoring/secrets/alertmanager-kube-prometheus
  uid: 75a589e8-b0ad-11e8-8746-005056bf1d6e
type: Opaque

The value of the alertmanager.yaml key under data: is a base64-encoded string. Copying it out and decoding it with base64 yields the following:
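
Instead of copying the string out by hand, you can decode it in one step (a small sketch; the Secret name and namespace are the ones used in this article, and base64 -d assumes GNU coreutils):

kubectl get secret alertmanager-kube-prometheus -n monitoring \
  -o jsonpath='{.data.alertmanager\.yaml}' | base64 -d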

global:
  resolve_timeout: 5m
  smtp_auth_password: xxxxxx
  smtp_auth_username: [email protected]
  smtp_from: [email protected]
  smtp_require_tls: false
  smtp_smarthost: smtp.163.com:25
receivers:
- email_configs:
  - headers:
      Subject: '[ERROR] prometheus............'
    to: [email protected]
  name: team-X-mails
- name: "null"
route:
  group_by:
  - alertname
  - cluster
  - service
  group_interval: 5m
  group_wait: 60s
  receiver: team-X-mails
  repeat_interval: 24h
  routes:
  - match:
      alertname: DeadMansSwitch
    receiver: "null"

This is simply the Alertmanager configuration. As noted above, it is mounted into the alertmanager container at /etc/alertmanager/config/alertmanager.yaml. You can confirm this inside the container: run kubectl exec -it alertmanager-kube-prometheus-0 -n monitoring sh (your pod name may differ; kubectl get pods --all-namespaces lists all pods), cd into /etc/alertmanager/config, and ls shows a file named alertmanager.yaml whose content is exactly the decoded text above. So modifying the alertmanager.yaml field in the data of the alertmanager-kube-prometheus Secret is equivalent to modifying that file, which makes the problem simple: the Alertmanager pod also runs a second container, config-reloader, which watches the /etc/alertmanager/config directory. When a file in that directory changes, config-reloader sends a POST request to http://localhost:9093/-/reload, and Alertmanager reloads the configuration in that directory. That is how dynamic configuration updates are achieved.
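
To confirm that a reload actually happened after a change, the config-reloader container's logs are the easiest place to look (the pod and container names are the ones shown in the StatefulSet description above):

kubectl logs alertmanager-kube-prometheus-0 -n monitoring -c config-reloader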

How to do it

With the mechanism understood, the procedure is clear: to update the Alertmanager configuration dynamically, you only need to update the alertmanager.yaml field in the data of the Secret named alertmanager-kube-prometheus (your Secret may not have exactly this name, but it will be of the form alertmanager-{*}). There are two ways to update a Secret: kubectl edit secret or kubectl patch secret. Both require the base64-encoded string, which you can produce with the base64 command on Linux:

First edit the decoded file from above, for example change smtp_from so that mail is sent from a different mailbox, and save it. Then run base64 file > test.txt to base64-encode the configuration and write the result to test.txt, and copy the encoded string from that file. For the first method, run kubectl edit secrets -n monitoring alertmanager-kube-prometheus, set the value of alertmanager.yaml under data to the string you just copied, then save and exit. For the second method, run kubectl patch secret alertmanager-kube-prometheus -n monitoring -p '{"data":{"alertmanager.yaml":"<paste the base64-encoded configuration here>"}}'. The -p argument is a JSON string; put the base64-encoded string in the right place and the update is done.
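
The encode-and-patch steps can also be combined into a single command (a sketch; alertmanager.yaml here is the locally edited plain-text configuration file, and base64 -w0 assumes GNU coreutils so the output stays on one line):

kubectl patch secret alertmanager-kube-prometheus -n monitoring \
  -p "{\"data\":{\"alertmanager.yaml\":\"$(base64 -w0 alertmanager.yaml)\"}}"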

After the update, open the Alertmanager UI at http://192.168.11.178:30903/#/status to confirm the configuration has taken effect.

With the steps above we now have dynamic discovery of monitoring targets, dynamic addition of alerting rules, and dynamic configuration of alerting (e-mail delivery); essentially every piece of configuration can now be changed dynamically.

Prometheus-Operator configuration

In most cases you only need to edit values.yaml under kube-prometheus to configure both Alertmanager and Prometheus. Most options are documented inline in that file; below I focus on the options mentioned earlier plus a few other commonly used ones. To save space, simple or rarely used options are replaced with ellipses:

...

alertmanager:              # all Alertmanager settings live under this key
  config:                  # the Alertmanager config; written exactly like a traditional alertmanager configuration
    global:
      resolve_timeout: 5m
    route:
      group_by: ['job']
      group_wait: 30s
      group_interval: 5m
      repeat_interval: 12h
      receiver: 'null'
      routes:
      - match:
          alertname: DeadMansSwitch
        receiver: 'null'
    receivers:
    - name: 'null'

  ## External URL; after an alert e-mail is sent, the Alertmanager UI can be reached at this URL
  externalUrl: "http://192.168.11.178:30903"

  ...
  ## Node labels for Alertmanager pod assignment; this selector decides which node Alertmanager is scheduled on
  ## Ref: https://kubernetes.io/docs/user-guide/node-selection/
  ##
  nodeSelector: {}

  ...

  ## List of Secrets in the same namespace as the AlertManager
  ## object, which shall be mounted into the AlertManager Pods.
  ## Ref: https://github.com/coreos/prometheus-operator/blob/master/Documentation/api.md#alertmanagerspec
  ##
  secrets: []

  service:
    ...
    ## Port to expose on each node
    ## Only used if service.type is 'NodePort'
    ##
    nodePort: 30903        # the node port to expose

    ## Service type
    ##
    type: NodePort         # deploy the Alertmanager Service as NodePort so it can be reached via a node IP

prometheus:                # all Prometheus settings live under this key
  ## Alertmanagers to which alerts will be sent
  ## Ref: https://github.com/coreos/prometheus-operator/blob/master/Documentation/api.md#alertmanagerendpoints
  ##
  alertingEndpoints: []
  # Alertmanager addresses; if left empty, the Alertmanager deployed alongside Prometheus is used,
  # see line 40 of helm/prometheus/templates/prometheus.yaml:
  # https://github.com/coreos/prometheus-operator/blob/master/helm/prometheus/templates/prometheus.yaml#L40
  # - name: ""
  #   namespace: ""
  #   port: 9093
  #   scheme: http

  ...

  ## External URL at which Prometheus will be reachable
  ## As above, the externally reachable address
  externalUrl: ""

  ## List of Secrets in the same namespace as the Prometheus
  ## object, which shall be mounted into the Prometheus Pods.
  ## Ref: https://github.com/coreos/prometheus-operator/blob/master/Documentation/api.md#prometheusspec
  ##
  secrets: []

  ## How long to retain metrics
  ## Retention time of the Prometheus time-series database
  retention: 24h

  ## Namespaces to be selected for PrometheusRules discovery.
  ## If unspecified, only the same namespace as the Prometheus object is in is used.
  ## Selects the namespaces searched for PrometheusRule resources; with any: true, all namespaces are searched
  ruleNamespaceSelector: {}
  ## any: true
  ## or
  ##

  ## Rules PrometheusRule CRD selector
  ## Ref: https://github.com/coreos/prometheus-operator/blob/master/Documentation/design.md
  ##
  ## 1. If `matchLabels` is used, `rules.additionalLabels` must contain all the labels from
  ##    `matchLabels` in order to be matched by Prometheus
  ## 2. If `matchExpressions` is used `rules.additionalLabels` must contain at least one label
  ##    from `matchExpressions` in order to be matched by Prometheus
  ## Ref: https://kubernetes.io/docs/concepts/overview/working-with-objects/labels
  ## This is the rules selector mentioned earlier. If left empty it defaults to prometheus: {{ .Values.prometheusLabelValue }},
  ## and prometheusLabelValue is defined in helm/prometheus/values.yaml, defaulting to .Release.Name, i.e. the release name (kube-prometheus).
  ## You can change the default label by editing prometheusLabelValue in helm/prometheus/values.yaml, or define your own labels here.
  ## Once a label is defined here the default no longer applies: for example, if you define comment: prometheus here,
  ## the default prometheus: kube-prometheus stops matching and Prometheus selects PrometheusRule resources by the newly defined label instead.
  rulesSelector: {}
  # rulesSelector: {
  #   matchExpressions: [{key: prometheus, operator: In, values: [example-rules, example-rules-2]}]
  # }
  ### OR
  # rulesSelector: {
  #   matchLabels: [{role: example-rules}]
  # }

  ## Prometheus alerting & recording rules
  ## Ref: https://prometheus.io/docs/querying/rules/
  ## Ref: https://prometheus.io/docs/alerting/rules/
  ##
  rules:                   # PrometheusRule resources can be defined inline here, but it is usually better to write standalone PrometheusRule manifests so rules can be added dynamically
    specifiedInValues: true
    ## What additional rules to be added to the PrometheusRule CRD
    ## You can use this together with `rulesSelector`
    additionalLabels: {}
    #  prometheus: example-rules
    #  application: etcd
    value: {}

  service:

    ## Port to expose on each node
    ## Only used if service.type is 'NodePort'
    ##
    nodePort: 30900        # as above, the node port to expose

    ## Service type
    ##
    type: NodePort         # as above, expose the Prometheus Service outside the cluster

  ## Service monitors selector
  ## Ref: https://github.com/coreos/prometheus-operator/blob/master/Documentation/design.md
  ## As above: by default Prometheus selects ServiceMonitor resources labeled prometheus: kube-prometheus. You can define
  ## custom labels here; once you do, the default no longer applies and Prometheus selects ServiceMonitors according to
  ## the labels defined in serviceMonitorsSelector instead.
  serviceMonitorsSelector: {}
  # matchLabels:
  #   - comment: prometheus
  #   - release: kube-prometheus

  ## ServiceMonitor CRDs to create & be scraped by the Prometheus instance.
  ## Ref: https://github.com/coreos/prometheus-operator/blob/master/Documentation/service-monitor.md
  ##
  serviceMonitors: []      # ServiceMonitor resources can be defined inline here, but usually they are not; see the "How to write a ServiceMonitor" section above
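
After editing values.yaml, the change is applied with helm upgrade rather than by touching Prometheus directly (a sketch; the release name kube-prometheus and the chart reference coreos/kube-prometheus are assumptions that depend on how the stack was originally installed):

helm upgrade kube-prometheus coreos/kube-prometheus -f values.yaml --namespace monitoring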

References

  1. [Official blog] The Prometheus Operator: Managed Prometheus setups for Kubernetes
  2. [Official documentation] Prometheus Operator
  3. Prometheus-Operator GitHub repository
  4. kube-prometheus/values.yaml
  5. Documentation/custom-configuration.md
  6. The role of each setting in kube-prometheus/values.yaml
  7. Prometheus Operator 介紹與安裝 (introduction and installation)
  8. Prometheus Operator