Spring Boot + OpenTelemetry: A Full Tutorial for Distributed Tracing

Ever wonder why a single user request can take half a second in one service and five seconds in another, with zero visibility into the gaps? Distributed tracing closes the blind spots between microservices. In this guide, you’ll instrument a Spring Boot application with OpenTelemetry, using both zero-code auto-instrumentation and manual spans, to see every RPC call, database query, and downstream HTTP request on a single timeline.

When your app spans multiple services, a small delay in one call can inflate your overall response time, and finding that delay by hunting through logs is slow and error-prone. OpenTelemetry gives you a clear, end-to-end timeline of every HTTP request, database query, and background job. With this visibility you can:

  • Quickly locate the exact service or database call causing slowdowns
  • Spot error spikes or retry storms before they impact users
  • Attach business details (order IDs, user emails) directly to traces
  • Fix problems in minutes instead of days

This isn’t theoretical: these practices come from real production systems running at scale. Implement them once, and you’ll spend far less time guessing where your app is slowing down.

Prerequisites & Project Setup

Before you write a single line of code, make sure you have:

  • JDK 17+, the minimum Java version that Spring Boot 3.x supports.
  • Maven 3.8+ for the build; the POM below relies on a BOM import.
  • A Spring Boot 3.1.x starter project.

POM excerpt, using the 1.30.x OpenTelemetry BOM:

<dependencyManagement>
  <dependencies>
    <dependency>
      <groupId>io.opentelemetry</groupId>
      <artifactId>opentelemetry-bom</artifactId>
      <version>1.30.0</version>
      <type>pom</type>
      <scope>import</scope>
    </dependency>
  </dependencies>
</dependencyManagement>

<dependencies>
  <!-- Spring Boot web -->
  <dependency>
    <groupId>org.springframework.boot</groupId>
    <artifactId>spring-boot-starter-web</artifactId>
  </dependency>
  <!-- OpenTelemetry SDK & API -->
  <dependency>
    <groupId>io.opentelemetry</groupId>
    <artifactId>opentelemetry-api</artifactId>
  </dependency>
  <dependency>
    <groupId>io.opentelemetry</groupId>
    <artifactId>opentelemetry-sdk</artifactId>
  </dependency>
  <!-- OTLP exporter -->
  <dependency>
    <groupId>io.opentelemetry</groupId>
    <artifactId>opentelemetry-exporter-otlp</artifactId>
  </dependency>
  <!-- Optional: Micrometer bridge -->
  <dependency>
    <groupId>io.micrometer</groupId>
    <artifactId>micrometer-tracing-bridge-otel</artifactId>
  </dependency>
</dependencies>

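This BOM pins compatible versions of the API, SDK, and exporters, which avoids classpath conflicts later. The micrometer-tracing-bridge-otel version is not covered by it; that one is managed by Spring Boot’s own dependency management (for example, via spring-boot-starter-parent).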

Auto-Instrumentation with the Java Agent

You can get nearly complete coverage without touching your code, which makes the agent ideal for a fast proof-of-concept or for retrofitting legacy services. Download opentelemetry-javaagent.jar from Maven Central or the GitHub releases page, then add these JVM flags to your launch command:

-javaagent:path/to/opentelemetry-javaagent.jar
-Dotel.exporter.otlp.endpoint=http://localhost:4317
-Dotel.service.name=order-service

That’s it. The agent instruments common libraries (Spring MVC, RestTemplate, JDBC, and more) by rewriting bytecode as classes load, adding span start/end calls around them. Once a backend such as Jaeger is receiving the data, you’ll see spans for incoming HTTP requests, JDBC calls made from your @Repository classes, and outgoing RestTemplate calls.
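For those spans to join one trace across services, the trace identity has to travel with each outgoing request. By default the agent does this with the W3C traceparent header. The JDK-only sketch below shows what that header carries; TraceParent is a hypothetical helper written for illustration, not part of the OpenTelemetry API.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper: formats and parses a W3C traceparent header value. */
record TraceParent(String traceId, String spanId, boolean sampled) {
  // version "00", 32 hex chars of trace ID, 16 hex chars of span ID, 2 hex chars of flags
  private static final Pattern FORMAT =
      Pattern.compile("^00-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$");

  /** Renders the value a client would send to a downstream service. */
  String toHeader() {
    return "00-" + traceId + "-" + spanId + "-" + (sampled ? "01" : "00");
  }

  /** Returns empty for a missing or malformed header, in which case a new trace would start. */
  static Optional<TraceParent> parse(String header) {
    if (header == null) {
      return Optional.empty();
    }
    Matcher m = FORMAT.matcher(header.trim());
    if (!m.matches()) {
      return Optional.empty();
    }
    // All-zero IDs are invalid according to the W3C Trace Context specification.
    if (m.group(1).equals("0".repeat(32)) || m.group(2).equals("0".repeat(16))) {
      return Optional.empty();
    }
    // Bit 0 of the flags byte is the "sampled" flag.
    boolean sampled = (Integer.parseInt(m.group(3), 16) & 0x01) == 1;
    return Optional.of(new TraceParent(m.group(1), m.group(2), sampled));
  }
}
```

The receiving service keeps the trace ID, records the incoming span ID as the parent, and generates a fresh span ID for its own work. With the agent attached you never write this code yourself; it is only meant to demystify what is on the wire.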

Manual Spans & Context Propagation

Sometimes you need to trace a business transaction spanning multiple non-HTTP steps, like Kafka processing or async work. Manual spans give you that fine-grained control:

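import io.opentelemetry.api.GlobalOpenTelemetry;
import io.opentelemetry.api.trace.Span;
import io.opentelemetry.api.trace.SpanKind;
import io.opentelemetry.api.trace.StatusCode;
import io.opentelemetry.api.trace.Tracer;
import io.opentelemetry.context.Context;
import io.opentelemetry.context.Scope;
import org.springframework.stereotype.Service;

@Service
public class PaymentService {
  // Resolves to the agent's SDK when the Java agent is attached.
  private static final Tracer tracer = GlobalOpenTelemetry.getTracer("payment-service");

  // BillingClient and PaymentRequest are application classes not shown here.
  private final BillingClient billingClient;

  public PaymentService(BillingClient billingClient) {
    this.billingClient = billingClient;
  }

  public void processPayment(PaymentRequest req) {
    Span parent = tracer.spanBuilder("processPayment")
                        .setSpanKind(SpanKind.INTERNAL)
                        .startSpan();
    try (Scope scope = parent.makeCurrent()) {
      parent.setAttribute("payment.id", req.getId());

      // Explicit for clarity; the current context is also the default parent.
      Span downstream = tracer.spanBuilder("callBillingService")
                               .setParent(Context.current())
                               .startSpan();
      try (Scope s2 = downstream.makeCurrent()) {
        billingClient.charge(req);
      } finally {
        downstream.end();
      }

      // additional business logic...
    } catch (RuntimeException e) {
      // Mark the span as failed so the error shows up in the trace view.
      parent.recordException(e);
      parent.setStatus(StatusCode.ERROR);
      throw e;
    } finally {
      parent.end();
    }
  }
}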

Use attributes to attach metadata for filtering (user IDs, amounts), choose the correct SpanKind to differentiate internal work from client/server calls, and always propagate the current context so child spans link correctly.
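Context propagation matters most for async work, because the "current" context is tied to the thread that set it. The JDK-only sketch below mimics that mechanism with a hypothetical TraceContext class to show why a task handed to an executor loses its parent unless the context is captured first. In real code, Context.current().wrap(runnable) or Context.taskWrapping(executor) from the OpenTelemetry API play the role of wrap here.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

/** Hypothetical stand-in for a thread-local tracing context. */
final class TraceContext {
  private static final ThreadLocal<String> CURRENT = new ThreadLocal<>();

  static String current() { return CURRENT.get(); }

  static void set(String traceId) { CURRENT.set(traceId); }

  static void clear() { CURRENT.remove(); }

  /** Captures the caller's context now and restores it around the task when it runs. */
  static Runnable wrap(Runnable task) {
    String captured = CURRENT.get();
    return () -> {
      String previous = CURRENT.get();
      CURRENT.set(captured);
      try {
        task.run();
      } finally {
        CURRENT.set(previous);
      }
    };
  }
}

final class ContextPropagationDemo {
  /** Runs a task on a worker thread and returns the trace ID that the worker observed. */
  static String observedOnWorker(boolean wrapped) {
    ExecutorService pool = Executors.newSingleThreadExecutor();
    TraceContext.set("trace-123");
    try {
      String[] seen = new String[1];
      Runnable task = () -> seen[0] = TraceContext.current();
      pool.submit(wrapped ? TraceContext.wrap(task) : task).get(5, TimeUnit.SECONDS);
      return seen[0];
    } catch (Exception e) {
      throw new IllegalStateException(e);
    } finally {
      TraceContext.clear();
      pool.shutdown();
    }
  }

  public static void main(String[] args) {
    System.out.println("unwrapped: " + observedOnWorker(false)); // prints "unwrapped: null"
    System.out.println("wrapped:   " + observedOnWorker(true));  // prints "wrapped:   trace-123"
  }
}
```

The unwrapped task sees no context, so a span started there would begin a new, disconnected trace. Wrapping the task (or the whole executor) keeps child spans attached to the right parent.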

Exporters & Backends

Telemetry is only useful if you can query or visualize it. Send traces via OTLP to Jaeger, and expose metrics through Micrometer for Prometheus to scrape.

OTLP to Jaeger

otel:
  exporter:
    otlp:
      endpoint: http://localhost:4317
      timeout: 10s
      protocol: grpc

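These otel.* keys are read from application.yml only if the OpenTelemetry Spring Boot starter is on the classpath; the Java agent takes the same settings as system properties or OTEL_* environment variables. Startup flags: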

-Dotel.traces.exporter=otlp
-Dotel.metrics.exporter=prometheus

Prometheus Metrics

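Spring Boot Actuator exposes /actuator/prometheus once spring-boot-starter-actuator and micrometer-registry-prometheus are on the classpath (neither is in the POM excerpt above) and the endpoint is exposed. In application.yml: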

management:
  endpoints:
    web:
      exposure:
        include: health,info,prometheus

Visualizing Traces in Jaeger

Jaeger’s UI lets you spot service bottlenecks in seconds. Spin up Jaeger with Docker:

docker run -d --name jaeger \
  -e COLLECTOR_OTLP_ENABLED=true \
  -p 16686:16686 -p 4317:4317 jaegertracing/all-in-one:latest

Then visit http://localhost:16686, search for order-service, and explore:

  • Flame graph: shows time spent in each span hierarchy.
  • Service dependency map: verifies your microservice topology.
  • Span details: reveals attributes, events, and errors.

Performance Impact & Sampling

Full tracing on every request in a high-QPS environment can overwhelm your backend. Sampling lets you control data volume without losing sight of issues.

Probabilistic Sampling

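# Micrometer Tracing (Spring Boot 3.x). With the Java agent, use instead:
# -Dotel.traces.sampler=parentbased_traceidratio -Dotel.traces.sampler.arg=0.25
management:
  tracing:
    sampling:
      probability: 0.25  # trace 25% of requests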
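A ratio-based sampler usually derives its decision from the trace ID itself rather than from a fresh random number, so every service reaches the same verdict for a given trace. The sketch below illustrates the idea in plain Java. RatioSampler is a hypothetical class and a simplification; OpenTelemetry’s trace-ID-ratio samplers work along these lines, but this is not the SDK’s code.

```java
/** Simplified sketch of a trace-ID-ratio sampling decision. */
final class RatioSampler {
  private final boolean always;
  private final long threshold;

  RatioSampler(double ratio) {
    if (ratio < 0.0 || ratio > 1.0) {
      throw new IllegalArgumentException("ratio must be in [0, 1]");
    }
    this.always = ratio == 1.0;
    // Map the ratio onto the range of non-negative long values.
    this.threshold = (long) (ratio * Long.MAX_VALUE);
  }

  /**
   * @param traceId 32 hex characters, as found in a traceparent header
   * @return true if the trace should be recorded and exported
   */
  boolean shouldSample(String traceId) {
    if (always) {
      return true;
    }
    // Treat the low 64 bits of the trace ID as an already-random number.
    long low = Long.parseUnsignedLong(traceId.substring(16), 16);
    // Clear the sign bit so the value is comparable with the threshold.
    return (low & Long.MAX_VALUE) < threshold;
  }
}
```

Because the decision is a pure function of the trace ID, a trace is either kept by every service that uses the same ratio or dropped by all of them, which avoids traces with missing segments.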

Tail-Based Sampling

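Configure this in your collector (the OpenTelemetry Collector’s tail_sampling processor, for example) so the keep-or-drop decision is made after a trace completes. That lets you keep every error or high-latency trace while sampling routine traffic lightly. Adjust rates based on critical versus bulk data paths.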
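The logic behind a tail-based policy is simple once the whole trace is in hand. The sketch below expresses it in plain Java for illustration only. FinishedSpan and TailSamplingPolicy are hypothetical names, and a real collector is configured declaratively rather than coded like this.

```java
import java.util.List;

/** Hypothetical, minimal view of a finished span. */
record FinishedSpan(String name, long durationMillis, boolean error) {}

/** Sketch of a tail-based decision: keep errors and slow traces, sample the rest. */
final class TailSamplingPolicy {
  private final long latencyThresholdMillis;
  private final double baselineRatio;

  TailSamplingPolicy(long latencyThresholdMillis, double baselineRatio) {
    if (baselineRatio < 0.0 || baselineRatio > 1.0) {
      throw new IllegalArgumentException("baselineRatio must be in [0, 1]");
    }
    this.latencyThresholdMillis = latencyThresholdMillis;
    this.baselineRatio = baselineRatio;
  }

  /**
   * @param trace all spans of one completed trace
   * @param randomDraw a value in [0, 1), passed in so the decision is easy to test
   */
  boolean keep(List<FinishedSpan> trace, double randomDraw) {
    boolean hasError = trace.stream().anyMatch(FinishedSpan::error);
    boolean slow = trace.stream()
        .anyMatch(span -> span.durationMillis() >= latencyThresholdMillis);
    // Interesting traces are always kept; routine ones fall back to the baseline ratio.
    return hasError || slow || randomDraw < baselineRatio;
  }
}
```

The cost of this approach is that the collector has to buffer every span until the trace finishes, which is why it lives in the collector and not in each service.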

Advanced: Custom Metrics & Alerts

Beyond traces, you may need business-level KPIs like orders processed per minute or error rates by endpoint:

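import io.micrometer.core.instrument.Counter;
import io.micrometer.core.instrument.MeterRegistry;
import io.micrometer.core.instrument.Timer;
import org.springframework.stereotype.Component;

@Component
public class OrderMetrics {
  private final Counter orderCounter;
  private final Timer processingTimer;

  public OrderMetrics(MeterRegistry registry) {
    this.orderCounter = Counter.builder("orders.processed")
                               .description("Total orders processed")
                               .register(registry);
    this.processingTimer = Timer.builder("orders.processing.time")
                                .publishPercentiles(0.5, 0.95, 0.99)
                                .register(registry);
  }

  // Order is an application class not shown here.
  // The order is counted only if processing completes without throwing.
  public void record(Order order, Runnable process) {
    processingTimer.record(process);
    orderCounter.increment();
  }
}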

Prometheus pulls these metrics by scraping /actuator/prometheus; configure alerts on top of them (e.g., P95 latency > 2 s triggers PagerDuty).
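To make the alert condition concrete, here is what "P95 latency > 2 s" means for a window of raw samples. This is a plain-Java illustration using the nearest-rank method; LatencyPercentiles is a hypothetical class, Micrometer computes its published percentiles differently, and in practice the threshold check lives in your alerting rules, not in application code.

```java
import java.util.Arrays;

/** Illustration only: nearest-rank percentiles over a window of latency samples. */
final class LatencyPercentiles {
  /**
   * @param samplesSeconds observed latencies, in seconds
   * @param p percentile as a fraction in (0, 1], for example 0.95
   * @return the smallest sample such that at least p of all samples are at or below it
   */
  static double percentile(double[] samplesSeconds, double p) {
    if (samplesSeconds.length == 0) {
      throw new IllegalArgumentException("no samples");
    }
    if (p <= 0.0 || p > 1.0) {
      throw new IllegalArgumentException("p must be in (0, 1]");
    }
    double[] sorted = samplesSeconds.clone();
    Arrays.sort(sorted);
    int rank = (int) Math.ceil(p * sorted.length); // 1-based rank
    return sorted[rank - 1];
  }

  /** True when the 95th percentile exceeds the threshold. */
  static boolean shouldAlert(double[] samplesSeconds, double thresholdSeconds) {
    return percentile(samplesSeconds, 0.95) > thresholdSeconds;
  }
}
```

A percentile alert reacts to the slow tail that users actually feel, where an average would hide one 3-second request among nine fast ones.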

Testing Your Instrumentation

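It’s one thing to spin up Jaeger locally; it’s another to validate spans in CI. Use the opentelemetry-sdk-testing artifact to assert on spans programmatically:

import static org.assertj.core.api.Assertions.assertThat;
import static org.springframework.test.web.servlet.request.MockMvcRequestBuilders.get;
import static org.springframework.test.web.servlet.result.MockMvcResultMatchers.status;
import io.opentelemetry.sdk.testing.junit5.OpenTelemetryExtension;
import io.opentelemetry.sdk.trace.data.SpanData;
import java.util.List;
import org.junit.jupiter.api.Test;
import org.junit.jupiter.api.extension.RegisterExtension;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.boot.test.autoconfigure.web.servlet.AutoConfigureMockMvc;
import org.springframework.boot.test.context.SpringBootTest;
import org.springframework.test.web.servlet.MockMvc;

@SpringBootTest
@AutoConfigureMockMvc
class TracingTest {
  // Installs an in-memory SDK as GlobalOpenTelemetry and records finished spans.
  @RegisterExtension static final OpenTelemetryExtension otelTesting = OpenTelemetryExtension.create();
  @Autowired MockMvc mvc;

  @Test
  void testHttpSpan() throws Exception {
    mvc.perform(get("/api/orders/123"))
       .andExpect(status().isOk());
    List<SpanData> spans = otelTesting.getSpans();
    assertThat(spans).anySatisfy(span ->
      assertThat(span.getName()).isEqualTo("GET /api/orders/{id}")
    );
  }
}

This catches regressions in span names and attributes for anything that reports through GlobalOpenTelemetry, including manual spans. The exact HTTP span name depends on the instrumentation in use, and the Java agent is not attached in a plain unit test, so verify agent-generated spans in a separate integration test.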

Deployment Best Practices

Tracing in production demands resilience. Collector outages, network glitches, or misconfigured exporters shouldn’t take down your service.

  • Secure OTLP gRPC with mTLS to protect your trace data.
  • Run multiple OTLP collectors behind a load balancer.
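  • Set short exporter timeouts so a stalled collector ties up the background export thread only briefly.
  • Wrap the OTLP exporter in a circuit breaker (Resilience4j) so export attempts stop, and spans are dropped, while the collector is unhealthy or under extreme load.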
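The principle behind the last two points is that request threads should never wait on telemetry. The OpenTelemetry SDK’s batch span processor already works this way, with a bounded queue that drops spans when full. The JDK-only sketch below shows the pattern in isolation; DroppingSpanBuffer is a hypothetical class for illustration, not the SDK’s implementation.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.atomic.AtomicLong;

/** Sketch of a drop-on-full buffer between request threads and a background exporter. */
final class DroppingSpanBuffer {
  private final BlockingQueue<String> queue;
  private final AtomicLong dropped = new AtomicLong();

  DroppingSpanBuffer(int capacity) {
    this.queue = new ArrayBlockingQueue<>(capacity);
  }

  /** Called on the request thread: returns immediately and drops the span if the queue is full. */
  boolean offer(String span) {
    boolean accepted = queue.offer(span);
    if (!accepted) {
      dropped.incrementAndGet();
    }
    return accepted;
  }

  /** Called by the background export thread: removes up to maxBatch spans for one export call. */
  List<String> drainBatch(int maxBatch) {
    List<String> batch = new ArrayList<>();
    queue.drainTo(batch, maxBatch);
    return batch;
  }

  /** Worth exposing as a metric so that span loss is visible. */
  long droppedCount() {
    return dropped.get();
  }
}
```

With this shape, a collector outage costs you some spans and a bounded amount of memory, but no request latency. Tracking the dropped count tells you when the queue size or export interval needs tuning.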