Relatedly: has anyone profiled the performance and reliability characteristics of rsyslogd (a distributed syslogger for Linux and FreeBSD, maybe other platforms too) in the mode where it ships logs to a central node? I’ve configured and used it with relatively small set-ups (high single-digit node counts, with bursts of activity up to a million or two requests per minute or so), but I’ve wondered if there’s a reason it’s not a more common solution for distributed logging and tracing. (Yes, it doesn’t solve the UI problem for those, but it does solve collecting your logs.)<p>Like… has anyone done a Jepsen-like stress test on rsyslogd and shared the results? I’ve half-assedly looked before and not been able to find anything.
> Continuously capturing low-overhead performance profiles in production<p>It surprises me that anything designed by the OTel community could ever meet 'low-overhead' expectations.
The reference implementation of the profiler [1] was originally built by the Optimyze team, which Elastic then acquired; Elastic later donated the profiler to OTel. That team is very good at what they do. For example, they invented the .eh_frame walking technique to get stack traces from binaries without frame pointers enabled.<p>Some of the OGs from that team later founded Zymtrace [2] and they're doing the same for profiling what happens <i>inside</i> GPUs now!<p>[1] <a href="https://github.com/open-telemetry/opentelemetry-ebpf-profiler" rel="nofollow">https://github.com/open-telemetry/opentelemetry-ebpf-profile...</a><p>[2] <a href="https://zymtrace.com/article/zero-friction-gpu-profiler/" rel="nofollow">https://zymtrace.com/article/zero-friction-gpu-profiler/</a>
OTel Profiling SIG maintainer here: I understand your concern, but we’ve tried our best to make things efficient across the protocol and all involved components.<p>Please let us know if you find any issues with what we are shipping right now.
Anything to actually add?