Date: 2026-02-09 14:36
Author: admin
This post continues the tour of Istio's traffic-management features.

Mirroring is Istio's counterpart to nginx's mirror feature: it copies a share of live traffic to another destination, typically to replay production traffic into a test or analysis environment.
```yaml
apiVersion: networking.istio.io/v1
kind: VirtualService
metadata:
  name: backend-vs
  namespace: default
spec:
  hosts:
  - backend-service
  - api.wilsontest.com
  http:
  - mirror:
      host: backend-service
      subset: v1
    mirrorPercentage:
      value: 100
    route:
    - destination:
        host: backend-service
        subset: v0
```
Traffic goes to the v0 subset, and istio-proxy copies each request to v1. If you don't want a 1:1 copy, tune the mirrorPercentage value down.
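The mirroring behaviour can be sketched in a few lines (the `route`, `primary` and `mirror` names are illustrative, not Istio internals): every request is answered by the primary route, and a sampled fraction is additionally copied to the mirror, whose response is discarded.

```python
import random

# Sketch of percentage-based mirroring (illustrative, not Envoy internals):
# the caller only ever sees the primary's reply; the mirror copy is
# fire-and-forget, sampled according to mirrorPercentage.
def route(request, mirror_percent, primary, mirror):
    if random.uniform(0, 100) < mirror_percent:
        mirror(request)        # copy sent to the mirror; result ignored
    return primary(request)    # reply always comes from the primary

# With mirrorPercentage 100, every request is copied:
mirrored = []
reply = route("GET /test", 100, lambda r: "200 OK", mirrored.append)
```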
If the target of the mirror host doesn't exist, how do you notice the mistake and fix the host configuration in time? One option is `istioctl analyze`, which reports VirtualServices that reference hosts it cannot resolve.
Timeouts and retries are configured mainly to cope with transient upstream failures without hanging the caller:
```yaml
apiVersion: networking.istio.io/v1beta1
kind: VirtualService
metadata:
  name: backend-retry
spec:
  hosts:
  - backend-service
  http:
  - route:
    - destination:
        host: backend-service
    timeout: 1s
    retries:
      attempts: 3        # max retry attempts
      perTryTimeout: 1s  # per-attempt timeout
      # conditions that trigger a retry (comma-separated string)
      retryOn: 5xx,gateway-error,connect-failure,refused-stream
```
A reader raised a fair point: in a high-QPS cluster, the moment retries start firing, upstream QPS at least doubles within a short window (the failed first wave is immediately followed by a second), so the upstream service risks being knocked over entirely.

Exactly right. Retries raise the success rate of individual requests, but they inevitably add system load and lengthen response times; at scale they turn into a retry storm that makes the outage worse.
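The amplification is easy to put numbers on. A rough model (my own back-of-envelope, not anything Istio computes): if each try fails independently with probability p and up to n retries are allowed, each client call generates (1 − p^(n+1)) / (1 − p) upstream requests on average, and with the upstream fully down every call costs n + 1 requests.

```python
# Back-of-envelope retry amplification: expected upstream requests per
# client call, given per-try failure probability p and up to `retries`
# retries after the first attempt.
def amplification(p: float, retries: int) -> float:
    if p >= 1.0:
        return float(retries + 1)   # dead upstream: every try is spent
    # geometric series: 1 + p + p^2 + ... + p^retries
    return (1 - p ** (retries + 1)) / (1 - p)

print(amplification(1.0, 3))            # 4.0 – a dead upstream sees 4x traffic
print(round(amplification(0.1, 3), 3))  # 1.111 – a healthy upstream barely notices
```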
To avoid retry storms, keep the retry count modest when writing the policy:
```yaml
retries:
  attempts: 3        # max retry attempts
  perTryTimeout: 1s  # per-attempt timeout
```
Retry up to 3 times with a 1s per-try timeout; after that, fail the request and let a human investigate.

Timeouts should shrink layer by layer: frontend timeout > gateway timeout > service timeout.
- frontend: timeout: 5s
- nginx-test: timeout: 3s
- backend-service: timeout: 2s
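The layering rule can be checked mechanically. A small sketch (the function and chain are illustrative, not an Istio feature): walk the call chain outermost-in and flag any layer whose timeout exceeds the budget its caller passes down, because work done past the caller's deadline is wasted.

```python
# Sketch: timeouts should shrink as calls go deeper. A layer whose
# timeout exceeds its caller's remaining budget can outlive the caller,
# doing work nobody is waiting for anymore.
def check_timeout_chain(layers):
    """layers: [(name, timeout_s), ...] from outermost to innermost.
    Returns the names of misconfigured layers."""
    problems = []
    budget = float("inf")
    for name, timeout in layers:
        if timeout > budget:
            problems.append(name)
        budget = min(budget, timeout)
    return problems

# The chain above is well ordered:
print(check_timeout_chain([("frontend", 5), ("nginx-test", 3), ("backend-service", 2)]))  # []
# An inner timeout larger than its caller's gets flagged:
print(check_timeout_chain([("frontend", 5), ("nginx-test", 2), ("backend-service", 3)]))  # ['backend-service']
```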
In short, after a failed attempt the proxy doesn't retry immediately; it waits a while before the next try.
```yaml
apiVersion: networking.istio.io/v1
kind: VirtualService
metadata:
  name: backend-vs
  namespace: default
spec:
  hosts:
  - backend-service
  - api.wilsontest.com
  http:
  - retries:
      attempts: 10
      perTryTimeout: 1s
      retryOn: 5xx,connect-failure
    route:
    - destination:
        host: backend-service
        subset: v0
```
istio-proxy ships with exponential backoff plus random jitter, starting at 25ms.

To see the exponential/jittered pattern clearly, attempts is deliberately set to 10 (for day-to-day use a smaller value is fine; I normally set 3).

To make the backend return 5xx, I simply broke a method name in the code so it raises an error:
```
Traceback (most recent call last):
  File "/usr/local/lib/python3.11/site-packages/tornado/web.py", line 1846, in _execute
    result = method(*self.path_args, **self.path_kwargs)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/test.py", line 9, in get
    self.writ(ret)
    ^^^^^^^^^
AttributeError: 'TestFlow' object has no attribute 'writ'
```
Everything's in place; run the test:

```shell
curl -s -H 'host: api.wilsontest.com' 10.22.12.178:30785/test
```

istio-proxy access log with attempts: 10:

```
[2026-02-05T06:51:41.322Z] "GET /test HTTP/1.1" 500 - upstream=10.244.0.73:10000 duration=1ms route=default
[2026-02-05T06:51:41.332Z] "GET /test HTTP/1.1" 500 - upstream=10.244.0.73:10000 duration=1ms route=default
[2026-02-05T06:51:41.369Z] "GET /test HTTP/1.1" 500 - upstream=10.244.0.73:10000 duration=2ms route=default
[2026-02-05T06:51:41.441Z] "GET /test HTTP/1.1" 500 - upstream=10.244.0.73:10000 duration=1ms route=default
[2026-02-05T06:51:41.463Z] "GET /test HTTP/1.1" 500 - upstream=10.244.0.73:10000 duration=2ms route=default
[2026-02-05T06:51:41.480Z] "GET /test HTTP/1.1" 500 - upstream=10.244.0.73:10000 duration=1ms route=default
[2026-02-05T06:51:41.660Z] "GET /test HTTP/1.1" 500 - upstream=10.244.0.73:10000 duration=1ms route=default
[2026-02-05T06:51:41.787Z] "GET /test HTTP/1.1" 500 - upstream=10.244.0.73:10000 duration=1ms route=default
[2026-02-05T06:51:41.804Z] "GET /test HTTP/1.1" 500 - upstream=10.244.0.73:10000 duration=1ms route=default
[2026-02-05T06:51:41.978Z] "GET /test HTTP/1.1" 500 - upstream=10.244.0.73:10000 duration=1ms route=default
[2026-02-05T06:51:42.116Z] "GET /test HTTP/1.1" 500 - upstream=10.244.0.73:10000 duration=2ms route=default
```
| # | Timestamp | Gap from previous |
|---|---|---|
| 1 | 41.322 | — |
| 2 | 41.332 | +10ms |
| 3 | 41.369 | +37ms |
| 4 | 41.441 | +72ms |
| 5 | 41.463 | +22ms |
| 6 | 41.480 | +17ms |
| 7 | 41.660 | +180ms |
| 8 | 41.787 | +127ms |
| 9 | 41.804 | +17ms |
| 10 | 41.978 | +174ms |
| 11 | 42.116 | +138ms |
Exponential plus random: an initial backoff of ~25ms, growing exponentially, with jitter (random perturbation) mixed in.
This simple test shows that istio-proxy's retries really are spaced with jittered exponential backoff rather than fired at a fixed interval.
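The spacing can be reproduced with a small sketch (assumed behaviour: a 25ms base doubling per attempt, uniform jitter, capped at 10× the base — close to Envoy's documented defaults, though the exact algorithm may differ in detail):

```python
import random

# Jittered exponential backoff, Envoy-style (illustrative): attempt n
# waits a uniformly random delay in [0, min(base * 2**n, cap)] ms.
def backoff_ms(attempt, base=25.0, cap=250.0):
    ceiling = min(base * (2 ** attempt), cap)
    return random.uniform(0, ceiling)

delays = [round(backoff_ms(n)) for n in range(10)]
# Early retries cluster near 0-25ms; later ones spread out up to the cap,
# which matches the irregular 10ms...180ms gaps in the log above.
```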
Circuit breaking exists to shield backend services from being drowned by a traffic storm and to keep the system as a whole stable.

Goal: if a backend endpoint returns 5xx more than 3 times in a row, eject that pod from the load-balancing pool, and add it back 30s later.
```yaml
apiVersion: networking.istio.io/v1
kind: DestinationRule
metadata:
  name: backend-dr
  namespace: default
spec:
  host: backend-service
  subsets:
  - labels:
      version: v0
    name: v0
  trafficPolicy:
    outlierDetection:
      baseEjectionTime: 30s
      consecutive5xxErrors: 3
      interval: 5s
      maxEjectionPercent: 100
```
- baseEjectionTime: 30s — how long an ejected endpoint stays out of the pool
- consecutive5xxErrors: 3 — trip condition: 3 consecutive 5xx responses
- interval: 5s — how often the analysis sweep runs
- maxEjectionPercent: 100 — up to 100% of the endpoints may be ejected

The backend still returns 500. First hit it 3 times with curl -s -H 'host: api.wilsontest.com' 10.22.12.178:30785/test:
```
Traceback (most recent call last):
  File "/usr/local/lib/python3.11/site-packages/tornado/web.py", line 1846, in _execute
    result = method(*self.path_args, **self.path_kwargs)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/test.py", line 9, in get
    self.writ(ret)
```
On the 4th request:

```
no healthy upstream
```

As expected, the 4th request tripped the breaker; and since backend has only one pod, ejecting it left nginx with no upstream at all.
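The ejection behaviour can be sketched with a toy detector (names and the logical clock are illustrative; the real work happens inside Envoy's outlier detection), mirroring the DestinationRule above: 3 consecutive 5xx responses eject the endpoint for baseEjectionTime.

```python
# Toy consecutive-5xx outlier detection: 3 consecutive 5xx responses
# eject the endpoint for base_ejection_s seconds.
class OutlierDetector:
    def __init__(self, threshold=3, base_ejection_s=30.0):
        self.threshold = threshold
        self.base_ejection_s = base_ejection_s
        self.consecutive_5xx = 0
        self.ejected_until = 0.0   # logical time, seconds

    def record(self, status, now):
        if 500 <= status < 600:
            self.consecutive_5xx += 1
            if self.consecutive_5xx >= self.threshold:
                self.ejected_until = now + self.base_ejection_s
                self.consecutive_5xx = 0
        else:
            self.consecutive_5xx = 0   # any success resets the streak

    def healthy(self, now):
        return now >= self.ejected_until

d = OutlierDetector()
for t in (0.0, 1.0, 2.0):
    d.record(500, now=t)        # three consecutive 500s...
print(d.healthy(now=3.0))       # False – ejected, hence "no healthy upstream"
print(d.healthy(now=33.0))      # True – back after baseEjectionTime (30s)
```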
First, some background: with HTTP/1.1 these are not short-lived connections but persistent ones. To avoid paying for a 3-way handshake and 4-way teardown on every request, istio-proxy keeps long-lived connections to the backend service, and the pool that manages them is configured on the DestinationRule:
```yaml
apiVersion: networking.istio.io/v1
kind: DestinationRule
metadata:
  name: backend-dr
  namespace: default
spec:
  host: backend-service
  subsets:
  - labels:
      version: v0
    name: v0
  trafficPolicy:
    connectionPool:
      http:
        http1MaxPendingRequests: 1
        maxRequestsPerConnection: 5
```
http1MaxPendingRequests: 1 and maxRequestsPerConnection: 5 are set absurdly low purely to make the test easy to trigger:

- http1MaxPendingRequests: 1 — how many HTTP requests may queue waiting for an available connection; anything beyond 1 is answered with 503
- maxRequestsPerConnection: 5 — how many HTTP requests a single TCP connection may serve before being recycled

Use the wrk load tool with 20 threads and 20 concurrent connections against the target URL for 1 second:
```
▶ wrk -t20 -c20 -d1s -H 'Host: api.wilson.com' http://10.22.12.178:30785/test
Running 1s test @ http://10.22.12.178:30785/test
  20 threads and 20 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency    10.66ms    3.18ms   21.85ms   76.73%
    Req/Sec    93.55     16.33    171.00     80.75%
  1990 requests in 1.10s, 650.09KB read
  Non-2xx or 3xx responses: 92
Requests/sec:   1808.21
Transfer/sec:    590.70KB
```
Check the logs (UO means the upstream pending queue overflowed, hence the 503; DC means the downstream connection was terminated):

```
...
[2026-02-06T07:37:08.168Z] "GET /test HTTP/1.1" 200 - upstream=10.244.0.73:10000 duration=5ms route=default
[2026-02-06T07:37:08.169Z] "GET /test HTTP/1.1" 503 UO upstream=- duration=0ms route=default
[2026-02-06T07:37:08.169Z] "GET /test HTTP/1.1" 0 DC upstream=10.244.0.73:10000 duration=4ms route=default
...
```
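The 503 UO lines follow directly from the tiny pool limits. A toy admission model (my own simplification; Envoy's actual pooling is more involved): a burst of requests first grabs free connections, up to http1MaxPendingRequests more may queue, and the rest are rejected with 503.

```python
# Toy model of http1MaxPendingRequests overflow: in a burst of
# simultaneous requests, free connections serve some, a bounded queue
# holds a few more, and the overflow gets 503 (the UO response flag).
def admit(burst, free_connections, max_pending):
    """Return (accepted, rejected_503) for a burst of simultaneous requests."""
    served = min(burst, free_connections)
    queued = min(burst - served, max_pending)
    rejected = burst - served - queued
    return served + queued, rejected

# 20 concurrent requests against 1 free connection and a pending queue of 1:
print(admit(burst=20, free_connections=1, max_pending=1))  # (2, 18)
```

In the real test, connections free up between arrivals, so only 92 of the 1990 requests failed; but the overflow mechanism is the same.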

That's all for this post.

My knowledge is limited; if I've spilled anything along the way, corrections are very welcome...

This article is from 博客园 (cnblogs), author: it排球君. Please credit the original when reposting: https://www.cnblogs.com/MrVolleyball/p/19595103

Copyright is shared by the author and 博客园. Reposting is welcome, but without the author's consent the article page must link back to the original; otherwise the right to pursue legal liability is reserved.