Date: 2025-11-18 10:46
Author: admin
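Evolution of the Jaeger architecture
The previous posts used jaeger:all-in-one for both storing and displaying data. jaeger:all-in-one bundles the collector, query, UI, storage and other functions into a single image. That is very convenient for debugging and test environments, but it is not suitable for production. This section splits it into its individual components.

Let's walk through the whole process in detail.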
import tornado.httpserver as httpserver
import tornado.web
from tornado.ioloop import IOLoop
from opentelemetry import trace
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.resources import SERVICE_NAME, Resource
from opentelemetry.sdk.trace.export import BatchSpanProcessor
from opentelemetry.exporter.otlp.proto.http.trace_exporter import OTLPSpanExporter
from opentelemetry.trace import get_tracer

# Register a tracer provider that reports under the service name "hello-otlp".
trace.set_tracer_provider(
    TracerProvider(resource=Resource.create({SERVICE_NAME: "hello-otlp"}))
)
# Export spans in batches over OTLP/HTTP to jaeger-collector (host port 14318).
span_processor = BatchSpanProcessor(OTLPSpanExporter(endpoint="http://127.0.0.1:14318/v1/traces"))
trace.get_tracer_provider().add_span_processor(span_processor)


def traced(name):
    """Wrap a function call in a span called `name`."""
    def decorator(func):
        def wrapper(*args, **kwargs):
            tracer = get_tracer(__name__)
            with tracer.start_as_current_span(name):
                return func(*args, **kwargs)
        return wrapper
    return decorator


class TestFlow(tornado.web.RequestHandler):
    def get(self):
        views()
        self.finish('hello world')


@traced("phase-1")
def views():
    views_sub_2()
    views_sub_3()


@traced("phase-2")
def views_sub_2():
    pass


@traced("phase-3")
def views_sub_3():
    pass


def applications():
    urls = []
    urls.append([r'/', TestFlow])
    return tornado.web.Application(urls)


def main():
    app = applications()
    server = httpserver.HTTPServer(app)
    server.bind(10000, '0.0.0.0')
    server.start(1)
    IOLoop.current().start()


if __name__ == "__main__":
    try:
        main()
    except KeyboardInterrupt:
        IOLoop.current().stop()
    finally:
        IOLoop.current().close()
docker run -d --name jaeger-collector \
  -p 14250:14250 \
  -p 14268:14268 \
  -p 14317:4317 \
  -p 14318:4318 \
  -e SPAN_STORAGE_TYPE=elasticsearch \
  -e ES_SERVER_URLS=http://10.22.12.178:9200 \
  -e ES_USERNAME=elastic \
  -e LOG_LEVEL=debug \
  jaegertracing/jaeger-collector:1.72.0
Elasticsearch (es) serves as the storage here
docker run -d --name jaeger-es \
  -e bootstrap.memory_lock=true \
  -e discovery.type=single-node \
  -p 9200:9200 \
  -p 9300:9300 \
  -e xpack.security.enabled=false \
  -e xpack.security.http.ssl.enabled=false \
  elastic/elasticsearch:9.1.2
docker run -d --name jaeger-query \
  -p 16686:16686 \
  -p 16687:16687 \
  -e SPAN_STORAGE_TYPE=elasticsearch \
  -e ES_SERVER_URLS=http://10.22.12.178:9200 \
  -e ES_USERNAME=elastic \
  -e LOG_LEVEL=debug \
  jaegertracing/jaeger-query:1.72.0
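One detail that is easy to miss: the application exports to port 14318, not 4318, because the jaeger-collector container publishes its OTLP ports (4317 for gRPC, 4318 for HTTP) on shifted host ports. The following sketch derives the exporter endpoint from the `-p` values used above; `parse_port_flags` and `otlp_http_endpoint` are hypothetical helpers written for this illustration only.

```python
def parse_port_flags(flags):
    """Turn docker '-p host:container' values into {container_port: host_port}."""
    mapping = {}
    for flag in flags:
        host, container = flag.split(":")
        mapping[int(container)] = int(host)
    return mapping


def otlp_http_endpoint(host, mapping, container_port=4318):
    """Build the OTLP/HTTP traces URL as seen from outside the container."""
    return f"http://{host}:{mapping[container_port]}/v1/traces"


# The -p values from the jaeger-collector command above.
collector_ports = parse_port_flags(
    ["14250:14250", "14268:14268", "14317:4317", "14318:4318"]
)
print(otlp_http_endpoint("127.0.0.1", collector_ports))
# http://127.0.0.1:14318/v1/traces
```

The printed URL is the same endpoint passed to OTLPSpanExporter in the application code.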
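Let's look at the result:
curl 127.0.0.1:10000
Open http://127.0.0.1:16686/ to check. Good, the data is being reported correctly.

Adding an otel-collector layer in front of jaeger-collector for data collection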

The corresponding change:
...
span_processor = BatchSpanProcessor(OTLPSpanExporter(endpoint="http://127.0.0.1:4318/v1/traces"))
...
docker run -d --name=otel-collector \
  -v ./otel-collector-config.yaml:/etc/otelcol/config.yaml \
  -p 4317:4317 \
  -p 4318:4318 \
  otel/opentelemetry-collector:latest
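That completes the setup. Someone might object: why configure it this way? I could send straight to jaeger-collector before, and now there is an extra otel-collector hop that does nothing useful. It seems completely unnecessary.
That is a fair line of reasoning, so let's look closely at how otel-collector and jaeger-collector differ: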
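| | otel-collector | jaeger-collector |
|---|---|---|
| Scope | traces, metrics, and logs | traces only |
| Protocols | OTLP, Prometheus, Zipkin, and many others | Jaeger Thrift, Jaeger gRPC; newer versions also accept OTLP |
| Backends | Any backend that supports OTel, e.g. Tempo, Prometheus, logging, or even jaeger-collector | Writes only to Jaeger's own storage backends (such as the Elasticsearch used here) |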

Modify otel-collector-config.yaml
receivers:
  ...
  hostmetrics:
    collection_interval: 10s
    scrapers:
      cpu: {}
      memory: {}
      disk: {}
      filesystem: {}
      network: {}
exporters:
  ...
  prometheus:
    endpoint: "0.0.0.0:9464"
    namespace: otelcol
service:
  ...
    metrics:
      receivers: [hostmetrics]
      exporters: [prometheus]
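A pipeline only refers to receivers and exporters by name, so a typo or a missing declaration is a common reason for the collector failing to start. The sketch below checks that wiring on a plain Python dict that mirrors the YAML above; the dict is hand-written for the example (no YAML parser is used), `undefined_components` is a hypothetical helper, and processors are left out for brevity.

```python
def undefined_components(config):
    """Return (pipeline, kind, name) for each pipeline entry that is not declared."""
    problems = []
    pipelines = config.get("service", {}).get("pipelines", {})
    # Connectors may legitimately appear as an exporter or as a receiver.
    connectors = set(config.get("connectors", {}))
    for pipeline_name, pipeline in pipelines.items():
        for kind in ("receivers", "exporters"):
            declared = set(config.get(kind, {})) | connectors
            for name in pipeline.get(kind, []):
                if name not in declared:
                    problems.append((pipeline_name, kind, name))
    return problems


# Hand-written mirror of the metrics part of otel-collector-config.yaml.
config = {
    "receivers": {"hostmetrics": {"collection_interval": "10s"}},
    "exporters": {"prometheus": {"endpoint": "0.0.0.0:9464", "namespace": "otelcol"}},
    "service": {
        "pipelines": {
            "metrics": {"receivers": ["hostmetrics"], "exporters": ["prometheus"]}
        }
    },
}
print(undefined_components(config))   # []

# Same config with the exporter declaration removed by mistake.
broken = {**config, "exporters": {}}
print(undefined_components(broken))   # [('metrics', 'exporters', 'prometheus')]
```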
Expose port 9464 and wait for Prometheus to scrape it
docker run -d --name=otel-collector \
  -v ./otel-collector-config.yaml:/etc/otelcol/config.yaml \
  -p 4317:4317 \
  -p 4318:4318 \
  -p 9464:9464 \
  otel/opentelemetry-collector:latest
prometheus.yml
global:
  scrape_interval: 5s
scrape_configs:
  - job_name: "otel-collector"
    static_configs:
      - targets: ["10.22.12.178:9464"]
docker run -d --name prometheus \
  -p 9090:9090 \
  -v ./prometheus.yml:/etc/prometheus/prometheus.yml \
  prom/prometheus:v3.5.0
Check Prometheus: the metrics data is now being collected
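Before looking in the Prometheus UI, it can help to read what the collector serves on port 9464, which is plain text in the Prometheus exposition format. The following is a simplified parser sketch: it ignores escaped quotes inside label values, and the sample text is invented for illustration (the metric name and numbers are assumptions, not output captured from this setup).

```python
import re

# name, optional {labels}, value, optional timestamp
LINE = re.compile(
    r'^(?P<name>[a-zA-Z_:][a-zA-Z0-9_:]*)'
    r'(?:\{(?P<labels>.*)\})?'
    r'\s+(?P<value>\S+)(?:\s+\d+)?$'
)
LABEL = re.compile(r'(\w+)="([^"]*)"')


def parse_metrics(text):
    """Return (name, labels, value) tuples, skipping comments and blank lines."""
    samples = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        match = LINE.match(line)
        if match is None:
            continue  # not a sample line this simplified parser understands
        labels = dict(LABEL.findall(match.group("labels") or ""))
        samples.append((match.group("name"), labels, float(match.group("value"))))
    return samples


# Invented sample, shaped like what an exporter with namespace "otelcol" might serve.
sample = """
# HELP otelcol_system_memory_usage_bytes Bytes of memory in use.
# TYPE otelcol_system_memory_usage_bytes gauge
otelcol_system_memory_usage_bytes{state="used"} 1024
otelcol_system_memory_usage_bytes{state="free"} 2048
"""
for name, labels, value in parse_metrics(sample):
    print(name, labels, value)
```

In a real check the text would come from an HTTP GET against the exporter's /metrics path on port 9464 instead of the inline sample.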

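Converting traces into metrics: this article has three spans, phase-1, phase-2 and phase-3. The duration of each can be converted into metrics and stored in Prometheus, which makes analysis easier.
1) Modify otel-collector-config.yaml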
receivers:
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317
      http:
        endpoint: 0.0.0.0:4318
  hostmetrics:
    collection_interval: 10s
    scrapers:
      cpu: {}
      memory: {}
      disk: {}
      filesystem: {}
      network: {}
connectors:
  spanmetrics:
    dimensions:
      - name: operation
exporters:
  otlp:
    endpoint: 10.22.12.178:14317
    tls:
      insecure: true
  prometheus:
    endpoint: "0.0.0.0:9464"
    namespace: otelcol
service:
  pipelines:
    traces:
      receivers: [otlp]
      exporters: [otlp, spanmetrics]
    metrics:
      receivers: [hostmetrics, spanmetrics]
      exporters: [prometheus]
2) Re-run the container
docker run -d --name=otel-collector \
  -v ./otel-collector-config.yaml:/etc/otelcol-contrib/config.yaml \
  -p 4317:4317 \
  -p 4318:4318 \
  -p 9464:9464 \
  otel/opentelemetry-collector-contrib:0.132.3
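Be very careful here: because the data needs processing, this step uses the spanmetrics connector, which ships only in opentelemetry-collector-contrib. The plain opentelemetry-collector image does not include it. Note that the config mount path changes to /etc/otelcol-contrib/config.yaml as well.
3) Report trace data with curl 127.0.0.1:10000, then check Prometheus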


The durations match up as well.
1) First modify the instrumented program to inject an attribute
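Conceptually, turning spans into metrics means grouping finished spans by name, counting them, and dropping each duration into a histogram bucket. The sketch below shows that idea in plain Python. It is not the connector's code: the bucket bounds, the span dicts and the timestamps are all made up for illustration.

```python
from bisect import bisect_left
from collections import defaultdict

# Hypothetical upper bounds in milliseconds; the last slot is the overflow bucket.
BUCKETS_MS = [2, 10, 50, 250, 1000]


def aggregate(spans, buckets=BUCKETS_MS):
    """Return (call counts, per-bucket duration counts), both keyed by span name."""
    calls = defaultdict(int)
    histograms = defaultdict(lambda: [0] * (len(buckets) + 1))
    for span in spans:
        duration_ms = (span["end_ns"] - span["start_ns"]) / 1e6
        calls[span["name"]] += 1
        # Index of the first bound >= duration, i.e. "less than or equal" buckets.
        histograms[span["name"]][bisect_left(buckets, duration_ms)] += 1
    return dict(calls), dict(histograms)


# Made-up spans: two requests' worth of phase-1, one phase-2, one phase-3.
spans = [
    {"name": "phase-1", "start_ns": 0, "end_ns": 12_000_000},           # 12 ms
    {"name": "phase-2", "start_ns": 1_000_000, "end_ns": 4_000_000},    # 3 ms
    {"name": "phase-3", "start_ns": 5_000_000, "end_ns": 11_000_000},   # 6 ms
    {"name": "phase-1", "start_ns": 20_000_000, "end_ns": 80_000_000},  # 60 ms
]
calls, histograms = aggregate(spans)
print(calls)                  # {'phase-1': 2, 'phase-2': 1, 'phase-3': 1}
print(histograms["phase-1"])  # [0, 0, 1, 1, 0, 0]
```

Each span name becomes its own time series, which is why phase-1, phase-2 and phase-3 can be compared side by side in Prometheus.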
...
def traced(name):
    def decorator(func):
        def wrapper(*args, **kwargs):
            tracer = get_tracer(__name__)
            with tracer.start_as_current_span(name) as span:
                span.set_attribute("addr", "cd")  # inject an attribute
                return func(*args, **kwargs)
        return wrapper
    return decorator
...
2) Modify the otel-collector configuration
otel-collector-config.yaml
...
connectors:
  spanmetrics:
    dimensions:
      - name: operation
      - name: addr  # extract the attribute
...
3) Report trace data with curl 127.0.0.1:10000, then check Prometheus
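The effect of listing an attribute under dimensions can be pictured as adding that attribute to the key that identifies a metric series. Here is a minimal sketch under that assumption; the span dicts, the `span_name` label and both helper functions are hypothetical, and a dimension whose attribute is missing from the span is simply skipped.

```python
from collections import Counter


def series_key(span, dimensions):
    """Build a hashable label set from the span name plus the chosen attributes."""
    labels = {"span_name": span["name"]}
    for dim in dimensions:
        if dim in span["attributes"]:   # attribute absent: dimension is skipped
            labels[dim] = span["attributes"][dim]
    return tuple(sorted(labels.items()))


def count_calls(spans, dimensions):
    """Count spans per label set."""
    return Counter(series_key(span, dimensions) for span in spans)


# Made-up spans carrying the "addr" attribute set by the decorator above.
spans = [
    {"name": "phase-1", "attributes": {"addr": "cd"}},
    {"name": "phase-2", "attributes": {"addr": "cd"}},
    {"name": "phase-1", "attributes": {"addr": "cd"}},
]
for key, count in count_calls(spans, ["operation", "addr"]).items():
    print(dict(key), count)
# {'addr': 'cd', 'span_name': 'phase-1'} 2
# {'addr': 'cd', 'span_name': 'phase-2'} 1
```

Since every distinct attribute value creates a new series, it is usually wise to pick attributes with few possible values.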

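jaeger-collector --> es storage --> jaeger-UI
otel-collector makes the whole data collection setup more flexible: it can collect metrics as well as traces
otel-collector not only forwards data, it can also modify it
This concludes the article.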
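My knowledge is limited, so if anything here is sloppy or wrong, please do not hesitate to correct me...
This article comes from cnblogs (博客园), author: it排球君. When reposting, please cite the original link: https://www.cnblogs.com/MrVolleyball/p/19236255
Copyright of this article is shared by the author and cnblogs. Reposting is welcome, but unless the author has agreed otherwise, a link to the original must be given on the article page; otherwise the right to pursue legal liability is reserved.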
