Hands-on: Installing VictoriaMetrics in Kubernetes


Background

In an earlier post I introduced VictoriaMetrics and a few things to watch out for when installing it. Today let's walk through how to install it in Kubernetes. This walkthrough installs the cluster version of VictoriaMetrics on a cloud-hosted Kubernetes cluster and uses the cloud provider's load balancer.

Note: VictoriaMetrics is abbreviated as VM below.

Installation prerequisites

- A Kubernetes cluster; mine is v1.20.6.
- A StorageClass prepared in the cluster; I use one backed by NFS here.
- Operator image tag v0.17.2; vmstorage, vmselect and vminsert image tag v1.63.0. You can pull the images ahead of time and push them to a local registry.

Installation notes

VM can be installed in several ways: from binary releases, Docker images, or source. Choose whatever fits your scenario. For installation into Kubernetes, we can simply use the operator. The points below are worth paying attention to during installation.

A minimal cluster must contain the following nodes:

- a single vmstorage node, started with the -retentionPeriod and -storageDataPath flags;
- a single vminsert node, started with -storageNode= pointing at the vmstorage node(s);
- a single vmselect node, started with -storageNode= pointing at the vmstorage node(s).

Note: for high availability, at least two nodes of each service are recommended. A startup sketch follows below.
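For orientation only, here is a hedged sketch of starting such a minimal cluster by hand with the flags named above. The binary names follow the upstream cluster releases; the host name, data path, ports and retention value are placeholders of mine:

# vmstorage: keep data for 4 months under /vm-data (example values)
./vmstorage-prod -retentionPeriod=4 -storageDataPath=/vm-data

# vminsert writes to vmstorage's insert port (8400 by default);
# vmselect reads from vmstorage's select port (8401 by default)
./vminsert-prod -storageNode=vmstorage-host:8400
./vmselect-prod -storageNode=vmstorage-host:8401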

A load balancer, such as vmauth or nginx, is needed in front of vmselect and vminsert. Here we use the cloud provider's load balancer. It must route traffic as follows:

- requests whose path starts with /insert must be routed to port 8480 on the vminsert nodes;
- requests whose path starts with /select must be routed to port 8481 on the vmselect nodes.

Note: each service's listening port can be changed with the -httpListenAddr flag. An nginx routing sketch is shown below.
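If nginx were used as that load balancer instead of a cloud LB, the routing could look roughly like the following minimal sketch (placed inside the http block; the upstream host names are placeholders):

# illustrative nginx fragment
upstream vminsert_backend {
    server vminsert-host-1:8480;
    server vminsert-host-2:8480;
}
upstream vmselect_backend {
    server vmselect-host-1:8481;
    server vmselect-host-2:8481;
}
server {
    listen 80;
    location /insert {
        proxy_pass http://vminsert_backend;
    }
    location /select {
        proxy_pass http://vmselect_backend;
    }
}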

It is recommended to set up monitoring for the cluster itself.

If you are installing a test cluster on a single host, the -httpListenAddr values of vminsert, vmselect and vmstorage must each be unique, and vmstorage's -storageDataPath, -vminsertAddr and -vmselectAddr must also have unique values.

When the free space on the volume behind vmstorage's -storageDataPath drops below the amount specified by -storage.minFreeDiskSpaceBytes, the node switches to read-only mode; vminsert stops sending data to such nodes and sends it to the remaining available vmstorage nodes instead.
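For illustration, the threshold can be raised on the vmstorage command line; the binary name and the 10 GiB value below are just examples, not values from this installation:

# go read-only once less than ~10 GiB is free under -storageDataPath (example value)
./vmstorage-prod -storageDataPath=/vm-data -storage.minFreeDiskSpaceBytes=10737418240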

Installation procedure

Install VM

1. Create the CRDs

# Download the installation files
export VM_VERSION=`basename $(curl -fs -o /dev/null -w %{redirect_url} https://github.com/VictoriaMetrics/operator/releases/latest)`
wget https://github.com/VictoriaMetrics/operator/releases/download/$VM_VERSION/bundle_crd.zip
unzip bundle_crd.zip
kubectl apply -f release/crds

# Check the CRDs
[root@test opt]# kubectl get crd | grep vm
vmagents.operator.victoriametrics.com                2022-01-05T07:26:01Z
vmalertmanagerconfigs.operator.victoriametrics.com   2022-01-05T07:26:01Z
vmalertmanagers.operator.victoriametrics.com         2022-01-05T07:26:01Z
vmalerts.operator.victoriametrics.com                2022-01-05T07:26:01Z
vmauths.operator.victoriametrics.com                 2022-01-05T07:26:01Z
vmclusters.operator.victoriametrics.com              2022-01-05T07:26:01Z
vmnodescrapes.operator.victoriametrics.com           2022-01-05T07:26:01Z
vmpodscrapes.operator.victoriametrics.com            2022-01-05T07:26:01Z
vmprobes.operator.victoriametrics.com                2022-01-05T07:26:01Z
vmrules.operator.victoriametrics.com                 2022-01-05T07:26:01Z
vmservicescrapes.operator.victoriametrics.com        2022-01-05T07:26:01Z
vmsingles.operator.victoriametrics.com               2022-01-05T07:26:01Z
vmstaticscrapes.operator.victoriametrics.com         2022-01-05T07:26:01Z
vmusers.operator.victoriametrics.com                 2022-01-05T07:26:01Z

2. Install the operator

# Install the operator. Remember to change the operator image address beforehand if you use a local registry
kubectl apply -f release/operator/

# Check that the operator is running
[root@test opt]# kubectl get po -n monitoring-system
vm-operator-76dd8f7b84-gsbfs              1/1     Running   0          25h
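If the Pod does not reach Running, the operator logs are the first place to look. Assuming the Deployment shipped with the manifests is named vm-operator, as the Pod name above suggests:

# Tail the operator logs for reconciliation errors
kubectl logs -n monitoring-system deploy/vm-operator --tail=50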

3. Install the VMCluster

Once the operator is installed, build your own CRs according to your needs. Here I install a VMCluster; let's look at the installation manifest first.

# cat vmcluster-install.yaml
apiVersion: operator.victoriametrics.com/v1beta1
kind: VMCluster
metadata:
  name: vmcluster-main
  namespace: monitoring-system
spec:
  replicationFactor: 1
  retentionPeriod: "4"
  vminsert:
    image:
      pullPolicy: IfNotPresent
      repository: images.huazai.com/release/vminsert
      tag: v1.63.0
    podMetadata:
      labels:
        victoriaMetrics: vminsert
    replicaCount: 1
    resources:
      limits:
        cpu: "1"
        memory: 1000Mi
      requests:
        cpu: 500m
        memory: 500Mi
  vmselect:
    cacheMountPath: /select-cache
    image:
      pullPolicy: IfNotPresent
      repository: images.huazai.com/release/vmselect
      tag: v1.63.0
    podMetadata:
      labels:
        victoriaMetrics: vmselect
    replicaCount: 1
    resources:
      limits:
        cpu: "1"
        memory: 1000Mi
      requests:
        cpu: 500m
        memory: 500Mi
    storage:
      volumeClaimTemplate:
        spec:
          accessModes:
          - ReadWriteOnce
          resources:
            requests:
              storage: 2G
          storageClassName: nfs-csi
          volumeMode: Filesystem
  vmstorage:
    image:
      pullPolicy: IfNotPresent
      repository: images.huazai.com/release/vmstorage
      tag: v1.63.0
    podMetadata:
      labels:
        victoriaMetrics: vmstorage
    replicaCount: 1
    resources:
      limits:
        cpu: "1"
        memory: 1500Mi
      requests:
        cpu: 500m
        memory: 750Mi
    storage:
      volumeClaimTemplate:
        spec:
          accessModes:
          - ReadWriteOnce
          resources:
            requests:
              storage: 20G
          storageClassName: nfs-csi
          volumeMode: Filesystem
    storageDataPath: /vm-data

# Install the VMCluster
kubectl apply -f vmcluster-install.yaml

# Check the VMCluster installation result
[root@test opt]# kubectl get po -n monitoring-system
NAME                                      READY   STATUS    RESTARTS   AGE
vm-operator-76dd8f7b84-gsbfs              1/1     Running   0          26h
vminsert-vmcluster-main-69766c8f4-r795    1/1     Running   0          25h
vmselect-vmcluster-main-0                 1/1     Running   0          25h
vmstorage-vmcluster-main-0                1/1     Running   0          25h
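You can also check the VMCluster object itself; vmclusters is one of the CRDs installed earlier, so a plain kubectl get works (the status columns shown depend on the operator version):

# List the VMCluster objects and inspect the one just created
kubectl get vmclusters -n monitoring-system
kubectl describe vmcluster vmcluster-main -n monitoring-system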

4. Create the vminsert and vmselect Services

 

# Look at the Services created by the operator
[root@test opt]# kubectl get svc -n monitoring-system
NAME                            TYPE        CLUSTER-IP     EXTERNAL-IP   PORT(S)                      AGE
vminsert-vmcluster-main         ClusterIP   10.0.182.73    <none>        8480/TCP                     25h
vmselect-vmcluster-main         ClusterIP   None           <none>        8481/TCP                     25h
vmstorage-vmcluster-main        ClusterIP   None           <none>        8482/TCP,8400/TCP,8401/TCP   25h

# So that other Kubernetes clusters can also store their data in this VM installation, and to make
# later queries easier, create two additional Services of type NodePort: vminsert-lbsvc and vmselect-lbsvc.
# Then configure the cloud load balancer to listen on ports 8480 and 8481, with the nodes of the
# cluster hosting VM as backends, using the NodePorts exposed by these two Services.
# Workloads in the same cluster as VM (e.g. opentelemetry) can keep writing via
# vminsert-vmcluster-main.monitoring-system.svc.cluster.local:8480;
# workloads in other clusters write via lb:8480.

# cat vminsert-lb-svc.yaml
apiVersion: v1
kind: Service
metadata:
  labels:
    app.kubernetes.io/component: monitoring
    app.kubernetes.io/instance: vmcluster-main
    app.kubernetes.io/name: vminsert
  name: vminsert-vmcluster-main-lbsvc
  namespace: monitoring-system
spec:
  externalTrafficPolicy: Cluster
  ports:
  - name: http
    nodePort: 30135
    port: 8480
    protocol: TCP
    targetPort: 8480
  selector:
    app.kubernetes.io/component: monitoring
    app.kubernetes.io/instance: vmcluster-main
    app.kubernetes.io/name: vminsert
  sessionAffinity: None
  type: NodePort

# cat vmselect-lb-svc.yaml
apiVersion: v1
kind: Service
metadata:
  labels:
    app.kubernetes.io/component: monitoring
    app.kubernetes.io/instance: vmcluster-main
    app.kubernetes.io/name: vmselect
  name: vmselect-vmcluster-main-lbsvc
  namespace: monitoring-system
spec:
  externalTrafficPolicy: Cluster
  ports:
  - name: http
    nodePort: 31140
    port: 8481
    protocol: TCP
    targetPort: 8481
  selector:
    app.kubernetes.io/component: monitoring
    app.kubernetes.io/instance: vmcluster-main
    app.kubernetes.io/name: vmselect
  sessionAffinity: None
  type: NodePort

# Create the Services
kubectl apply -f vmselect-lb-svc.yaml
kubectl apply -f vminsert-lb-svc.yaml

# !! Configure the cloud load balancer yourself

# Finally, check the VM-related Pods and Services
[root@test opt]# kubectl get po,svc -n monitoring-system
NAME                                          READY   STATUS    RESTARTS   AGE
pod/vm-operator-76dd8f7b84-gsbfs              1/1     Running   0          30h
pod/vminsert-vmcluster-main-69766c8f4-r795    1/1     Running   0          29h
pod/vmselect-vmcluster-main-0                 1/1     Running   0          29h
pod/vmstorage-vmcluster-main-0                1/1     Running   0          29h

NAME                                    TYPE        CLUSTER-IP     EXTERNAL-IP   PORT(S)                      AGE
service/vminsert-vmcluster-main         ClusterIP   10.0.182.73    <none>        8480/TCP                     29h
service/vminsert-vmcluster-main-lbsvc   NodePort    10.0.255.212   <none>        8480:30135/TCP               7h54m
service/vmselect-vmcluster-main         ClusterIP   None           <none>        8481/TCP                     29h
service/vmselect-vmcluster-main-lbsvc   NodePort    10.0.45.239    <none>        8481:31140/TCP               7h54m
service/vmstorage-vmcluster-main        ClusterIP   None           <none>        8482/TCP,8400/TCP,8401/TCP   29h
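As a quick smoke test before wiring anything else up, the components can be probed through the NodePorts (or later through the cloud LB). VictoriaMetrics components expose a /health endpoint; the node IP below is a placeholder:

# vminsert via its NodePort; expect "OK"
curl http://<node_ip>:30135/health
# vmselect via its NodePort; expect "OK"
curl http://<node_ip>:31140/health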

Install Prometheus node exporter

Here we install node exporter to expose Kubernetes node metrics. They will be scraped by the OpenTelemetry collector installed later and written to vmstorage through vminsert; the data is then queried through vmselect.

# kubectl apply -f prometheus-node-exporter-install.yaml
apiVersion: apps/v1
kind: DaemonSet
metadata:
  labels:
    app: prometheus-node-exporter
    release: prometheus-node-exporter
  name: prometheus-node-exporter
  namespace: kube-system
spec:
  revisionHistoryLimit: 10
  selector:
    matchLabels:
      app: prometheus-node-exporter
      release: prometheus-node-exporter
  template:
    metadata:
      labels:
        app: prometheus-node-exporter
        release: prometheus-node-exporter
    spec:
      containers:
      - args:
        - --path.procfs=/host/proc
        - --path.sysfs=/host/sys
        - --path.rootfs=/host/root
        - --web.listen-address=$(HOST_IP):9100
        env:
        - name: HOST_IP
          value: 0.0.0.0
        image: images.huazai.com/release/node-exporter:v1.1.2
        imagePullPolicy: IfNotPresent
        livenessProbe:
          failureThreshold: 3
          httpGet:
            path: /
            port: 9100
            scheme: HTTP
          periodSeconds: 10
          successThreshold: 1
          timeoutSeconds: 1
        name: node-exporter
        ports:
        - containerPort: 9100
          hostPort: 9100
          name: metrics
          protocol: TCP
        readinessProbe:
          failureThreshold: 3
          httpGet:
            path: /
            port: 9100
            scheme: HTTP
          periodSeconds: 10
          successThreshold: 1
          timeoutSeconds: 1
        resources:
          limits:
            cpu: 200m
            memory: 50Mi
          requests:
            cpu: 100m
            memory: 30Mi
        terminationMessagePath: /dev/termination-log
        terminationMessagePolicy: File
        volumeMounts:
        - mountPath: /host/proc
          name: proc
          readOnly: true
        - mountPath: /host/sys
          name: sys
          readOnly: true
        - mountPath: /host/root
          mountPropagation: HostToContainer
          name: root
          readOnly: true
      dnsPolicy: ClusterFirst
      hostNetwork: true
      hostPID: true
      restartPolicy: Always
      schedulerName: default-scheduler
      securityContext:
        fsGroup: 65534
        runAsGroup: 65534
        runAsNonRoot: true
        runAsUser: 65534
      serviceAccount: prometheus-node-exporter
      serviceAccountName: prometheus-node-exporter
      terminationGracePeriodSeconds: 30
      tolerations:
      - effect: NoSchedule
        operator: Exists
      volumes:
      - hostPath:
          path: /proc
          type: ""
        name: proc
      - hostPath:
          path: /sys
          type: ""
        name: sys
      - hostPath:
          path: /
          type: ""
        name: root
  updateStrategy:
    rollingUpdate:
      maxUnavailable: 1
    type: RollingUpdate

# Check node-exporter
[root@test ~]# kubectl get po -n kube-system | grep prometheus
prometheus-node-exporter-89jk                  1/1     Running   0          31h
prometheus-node-exporter-hj4gh                 1/1     Running   0          31h
prometheus-node-exporter-hxm8t                 1/1     Running   0          31h
prometheus-node-exporter-nhqp6                 1/1     Running   0          31h
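Because the DaemonSet uses hostNetwork with hostPort 9100, the exporter can be spot-checked against any node; the node IP is a placeholder:

# Expect Prometheus text-format metrics such as node_cpu_seconds_total
curl -s http://<node_ip>:9100/metrics | head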

Install OpenTelemetry

With the Prometheus node exporter in place, the next step is to install the OpenTelemetry collector (which I may cover in more detail another time).

# OpenTelemetry configuration file, defining how data is received, processed and exported:
# 1. receivers: where the data comes from
# 2. processors: how the received data is processed
# 3. exporters: where the processed data goes; here it is written through vminsert into vmstorage
# kubectl apply -f opentelemetry-install-cm.yaml
apiVersion: v1
data:
  relay: |
    exporters:
      prometheusremotewrite:
        # Configured here as lb_ip:8480, i.e. the vminsert address
        endpoint: http://lb_ip:8480/insert/0/prometheus
        # Add a different label per cluster, e.g. cluster: uat/prd
        external_labels:
          cluster: uat
    extensions:
      health_check: {}
    processors:
      batch: {}
      memory_limiter:
        ballast_size_mib: 819
        check_interval: 5s
        limit_mib: 1638
        spike_limit_mib: 512
    receivers:
      prometheus:
        config:
          scrape_configs:
          - job_name: opentelemetry-collector
            scrape_interval: 10s
            static_configs:
            - targets:
              - localhost:8888
          ...omitted...
          - job_name: kube-state-metrics
            kubernetes_sd_configs:
            - namespaces:
                names:
                - kube-system
              role: service
            metric_relabel_configs:
            - regex: ReplicaSet;([\w|-]+)-[0-9|a-z]+
              replacement: $$1
              source_labels:
              - created_by_kind
              - created_by_name
              target_label: created_by_name
            - regex: ReplicaSet
              replacement: Deployment
              source_labels:
              - created_by_kind
              target_label: created_by_kind
            relabel_configs:
            - action: keep
              regex: kube-state-metrics
              source_labels:
              - __meta_kubernetes_service_name
          - job_name: node-exporter
            kubernetes_sd_configs:
            - namespaces:
                names:
                - kube-system
              role: endpoints
            relabel_configs:
            - action: keep
              regex: node-exporter
              source_labels:
              - __meta_kubernetes_service_name
            - source_labels:
              - __meta_kubernetes_pod_node_name
              target_label: node
            - source_labels:
              - __meta_kubernetes_pod_host_ip
              target_label: host_ip
          ...omitted...
    service:
      # The receivers, processors, exporters and extensions defined above must be listed here,
      # otherwise they have no effect
      extensions:
      - health_check
      pipelines:
        metrics:
          exporters:
          - prometheusremotewrite
          processors:
          - memory_limiter
          - batch
          receivers:
          - prometheus
kind: ConfigMap
metadata:
  annotations:
    meta.helm.sh/release-name: opentelemetry-collector-hua
    meta.helm.sh/release-namespace: kube-system
  labels:
    app.kubernetes.io/instance: opentelemetry-collector-hua
    app.kubernetes.io/name: opentelemetry-collector-hua
  name: opentelemetry-collector-hua
  namespace: kube-system

# Install OpenTelemetry
# kubectl apply -f opentelemetry-install.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    app.kubernetes.io/instance: opentelemetry-collector-hua
    app.kubernetes.io/name: opentelemetry-collector-hua
  name: opentelemetry-collector-hua
  namespace: kube-system
spec:
  progressDeadlineSeconds: 600
  replicas: 1
  revisionHistoryLimit: 10
  selector:
    matchLabels:
      app.kubernetes.io/instance: opentelemetry-collector-hua
      app.kubernetes.io/name: opentelemetry-collector-hua
  strategy:
    rollingUpdate:
      maxSurge: 25%
      maxUnavailable: 25%
    type: RollingUpdate
  template:
    metadata:
      labels:
        app.kubernetes.io/instance: opentelemetry-collector-hua
        app.kubernetes.io/name: opentelemetry-collector-hua
    spec:
      containers:
      - command:
        - /otelcol
        - --config=/conf/relay.yaml
        - --metrics-addr=0.0.0.0:8888
        - --mem-ballast-size-mib=819
        env:
        - name: MY_POD_IP
          valueFrom:
            fieldRef:
              apiVersion: v1
              fieldPath: status.podIP
        image: images.huazai.com/release/opentelemetry-collector:0.27.0
        imagePullPolicy: IfNotPresent
        livenessProbe:
          failureThreshold: 3
          httpGet:
            path: /
            port: 13133
            scheme: HTTP
          periodSeconds: 10
          successThreshold: 1
          timeoutSeconds: 1
        name: opentelemetry-collector-hua
        ports:
        - containerPort: 4317
          name: otlp
          protocol: TCP
        readinessProbe:
          failureThreshold: 3
          httpGet:
            path: /
            port: 13133
            scheme: HTTP
          periodSeconds: 10
          successThreshold: 1
          timeoutSeconds: 1
        resources:
          limits:
            cpu: "1"
            memory: 2Gi
          requests:
            cpu: 500m
            memory: 1Gi
        volumeMounts:
        - mountPath: /conf
          # the ConfigMap created above for the OpenTelemetry collector
          name: opentelemetry-collector-configmap-hua
        - mountPath: /etc/otel-collector/secrets/etcd-cert/
          name: etcd-tls
          readOnly: true
      dnsPolicy: ClusterFirst
      restartPolicy: Always
      schedulerName: default-scheduler
      securityContext: {}
      # create the ServiceAccount yourself
      serviceAccount: opentelemetry-collector-hua
      serviceAccountName: opentelemetry-collector-hua
      terminationGracePeriodSeconds: 30
      volumes:
      - configMap:
          defaultMode: 420
          items:
          - key: relay
            path: relay.yaml
          # the ConfigMap created above for the OpenTelemetry collector
          name: opentelemetry-collector-hua
        name: opentelemetry-collector-configmap-hua
      - name: etcd-tls
        secret:
          defaultMode: 420
          secretName: etcd-tls

# Check that OpenTelemetry is running. If the collector is in the same Kubernetes cluster as VM,
# use the in-cluster Service address rather than the LB (on this cloud, the backend servers of a
# layer-4 listener cannot currently act as both client and server at the same time)
[root@kube-control-1 ~]# kubectl get po -n kube-system | grep opentelemetry-collector-hua
opentelemetry-collector-hua-647c6c64c7-j6p4b   1/1     Running   0          8h

Installation check

Once all the components are installed, open http://lb:8481/select/0/vmui in a browser, set the server URL to http://lb:8481/select/0/prometheus, and then enter a metric name to query its data. Auto-refresh can be enabled in the top-left corner.
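The same data can also be queried over the HTTP API; for example (assuming the same LB address and that node-exporter metrics have been ingested):

# Instant query through vmselect's Prometheus-compatible API
curl -s 'http://lb:8481/select/0/prometheus/api/v1/query?query=node_load1'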

Summary

The whole installation process is fairly straightforward. Once it is done, a single installation can store monitoring data from multiple Kubernetes clusters. VM supports MetricsQL, which is based on PromQL, and can be used as a Grafana data source. Compare that with installing Prometheus by hand in every Kubernetes cluster, configuring storage for each one, and opening each cluster's Prometheus UI separately whenever you need to query data; that is quite a bit more hassle. If VM looks good to you too, give it a try!

References

https://github.com/VictoriaMetrics/VictoriaMetrics/tree/cluster
https://docs.victoriametrics.com/
https://opentelemetry.io/docs/
https://prometheus.io/docs/prometheus/latest/configuration/configuration/

 
