Ever wonder why a single user request can take half a second in one service and five seconds in another, with no visibility into the gaps? Distributed tracing closes the blind spots between microservices. In this guide, you’ll instrument a Spring Boot application with OpenTelemetry, using both zero-code auto-instrumentation and manual spans, so that every RPC call, database query, and downstream HTTP request shows up in a single timeline.
When your app spans multiple services, a small delay in one call can inflate your overall response time, and finding that delay by hunting through logs is slow and error-prone. OpenTelemetry gives you an end-to-end timeline of every HTTP request, database query, and background job. With this visibility you can:
- Quickly locate the exact service or database call causing slowdowns
- Spot error spikes or retry storms before they impact users
- Attach business details (order IDs, user emails) directly to traces
- Fix problems in minutes instead of days
This isn’t theoretical: these practices come from real production systems running at scale. Implement them once, and you’ll spend far less time guessing where your app is slowing down.
Prerequisites & Project Setup
Before you write a single line of code, make sure you have:
- JDK 17+, the baseline required by Spring Boot 3.x (you also get TLS 1.3 and, if you choose, sealed-class DTOs).
- Maven 3.8+, which handles the BOM import shown below.
- A Spring Boot 3.1.x starter project.
POM excerpt, using the 1.30.x OpenTelemetry BOM:
<dependencyManagement>
  <dependencies>
    <dependency>
      <groupId>io.opentelemetry</groupId>
      <artifactId>opentelemetry-bom</artifactId>
      <version>1.30.0</version>
      <type>pom</type>
      <scope>import</scope>
    </dependency>
  </dependencies>
</dependencyManagement>
<dependencies>
  <!-- Spring Boot web -->
  <dependency>
    <groupId>org.springframework.boot</groupId>
    <artifactId>spring-boot-starter-web</artifactId>
  </dependency>
  <!-- OpenTelemetry SDK & API -->
  <dependency>
    <groupId>io.opentelemetry</groupId>
    <artifactId>opentelemetry-api</artifactId>
  </dependency>
  <dependency>
    <groupId>io.opentelemetry</groupId>
    <artifactId>opentelemetry-sdk</artifactId>
  </dependency>
  <!-- OTLP exporter -->
  <dependency>
    <groupId>io.opentelemetry</groupId>
    <artifactId>opentelemetry-exporter-otlp</artifactId>
  </dependency>
  <!-- Optional: Micrometer bridge -->
  <dependency>
    <groupId>io.micrometer</groupId>
    <artifactId>micrometer-tracing-bridge-otel</artifactId>
  </dependency>
</dependencies>
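This BOM pins compatible versions of the API, SDK, and exporters, which avoids classpath conflicts down the road. The Spring Boot and Micrometer artifacts carry no version because Spring Boot’s own dependency management supplies it.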
Auto-Instrumentation with the Java Agent
You can get broad coverage without touching your code, which is ideal for a fast proof-of-concept or for retrofitting legacy services. Download opentelemetry-javaagent.jar from Maven Central or the GitHub releases page, then launch your service with (app.jar stands in for your application’s jar):
java -javaagent:path/to/opentelemetry-javaagent.jar -Dotel.exporter.otlp.endpoint=http://localhost:4317 -Dotel.exporter.otlp.protocol=grpc -Dotel.service.name=order-service -jar app.jar
That’s it. The agent instruments common libraries (Spring MVC, RestTemplate, JDBC, and more) by rewriting their bytecode at load time to start and end spans. Port 4317 is the OTLP gRPC port; the explicit protocol flag matters because 2.x agents default to http/protobuf, which uses port 4318. Once a backend such as Jaeger is running, you’ll see spans for incoming HTTP requests, JDBC calls made from your @Repository classes, and outgoing RestTemplate calls.
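In containers it is often easier to pass these options as environment variables than as -D flags. OpenTelemetry's autoconfiguration accepts each system property as an environment variable whose name is the property upper-cased with dots and hyphens replaced by underscores. A minimal sketch of that naming rule follows; the OtelEnvNames class is purely illustrative and not part of any OpenTelemetry API.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

/** Illustrative helper (hypothetical name); not an OpenTelemetry class. */
final class OtelEnvNames {
    private OtelEnvNames() {}

    /** Maps a system property name to its environment variable form. */
    static String toEnvName(String systemProperty) {
        return systemProperty
                .toUpperCase(Locale.ROOT)
                .replace('.', '_')
                .replace('-', '_');
    }

    /** Converts a set of -D style properties into the equivalent environment variables. */
    static Map<String, String> toEnv(Map<String, String> systemProperties) {
        Map<String, String> env = new LinkedHashMap<>();
        systemProperties.forEach((key, value) -> env.put(toEnvName(key), value));
        return env;
    }
}
```

With this rule, the two flags from the launch command become OTEL_EXPORTER_OTLP_ENDPOINT and OTEL_SERVICE_NAME.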
Manual Spans & Context Propagation
Sometimes you need to trace a business transaction spanning multiple non-HTTP steps, like Kafka processing or async work. Manual spans give you that fine-grained control:
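import io.opentelemetry.api.GlobalOpenTelemetry;
import io.opentelemetry.api.trace.Span;
import io.opentelemetry.api.trace.SpanKind;
import io.opentelemetry.api.trace.StatusCode;
import io.opentelemetry.api.trace.Tracer;
import io.opentelemetry.context.Context;
import io.opentelemetry.context.Scope;
import org.springframework.stereotype.Service;

@Service
public class PaymentService {
    private static final Tracer tracer = GlobalOpenTelemetry.getTracer("payment-service");
    private final BillingClient billingClient; // your own client class, injected by Spring

    public PaymentService(BillingClient billingClient) {
        this.billingClient = billingClient;
    }

    public void processPayment(PaymentRequest req) {
        Span parent = tracer.spanBuilder("processPayment")
                .setSpanKind(SpanKind.INTERNAL)
                .startSpan();
        try (Scope scope = parent.makeCurrent()) {
            parent.setAttribute("payment.id", req.getId());
            // Child span; Context.current() is also the default parent, set here for clarity
            Span downstream = tracer.spanBuilder("callBillingService")
                    .setParent(Context.current())
                    .startSpan();
            try (Scope s2 = downstream.makeCurrent()) {
                billingClient.charge(req);
            } catch (RuntimeException e) {
                downstream.recordException(e); // make the failure visible on the span
                downstream.setStatus(StatusCode.ERROR);
                throw e;
            } finally {
                downstream.end();
            }
            // additional business logic...
        } finally {
            parent.end();
        }
    }
}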
Use attributes to attach metadata for filtering (user IDs, amounts), choose the correct SpanKind to differentiate internal work from client/server calls, and always propagate the current context so child spans link correctly.
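When context leaves the process (an HTTP call, a Kafka record header), OpenTelemetry's default propagator serializes it as a W3C traceparent header of the form version-traceid-parentid-flags. The sketch below formats and parses that header with plain JDK code to show what a downstream service needs in order to link its spans to yours. The TraceParent class is a hypothetical illustration; in practice the agent or the SDK's propagators do this for you.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Illustrative value class for a W3C traceparent header; not an OpenTelemetry type. */
final class TraceParent {
    // version "00", 32 hex chars of trace id, 16 hex chars of parent span id, 2 hex chars of flags
    private static final Pattern FORMAT =
            Pattern.compile("00-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})");

    final String traceId;
    final String parentSpanId;
    final boolean sampled;

    TraceParent(String traceId, String parentSpanId, boolean sampled) {
        this.traceId = traceId;
        this.parentSpanId = parentSpanId;
        this.sampled = sampled;
    }

    /** Returns null for a missing or malformed header so the caller can start a new trace. */
    static TraceParent parse(String header) {
        if (header == null) {
            return null;
        }
        Matcher m = FORMAT.matcher(header.trim());
        if (!m.matches()) {
            return null;
        }
        String traceId = m.group(1);
        String spanId = m.group(2);
        // All-zero ids are invalid according to the specification
        if (traceId.chars().allMatch(c -> c == '0') || spanId.chars().allMatch(c -> c == '0')) {
            return null;
        }
        boolean sampled = (Integer.parseInt(m.group(3), 16) & 0x01) != 0;
        return new TraceParent(traceId, spanId, sampled);
    }

    /** Header value to send downstream; only the sampled flag is preserved in this sketch. */
    String toHeader() {
        return "00-" + traceId + "-" + parentSpanId + "-" + (sampled ? "01" : "00");
    }
}
```

A child span created downstream reuses the trace id and records the parent span id from the header, which is what stitches the two services into one timeline.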
Exporters & Backends
Telemetry is only useful if you can query or visualize it. Send traces via OTLP to Jaeger, and expose metrics via Micrometer for Prometheus to scrape.
OTLP to Jaeger
otel:
  exporter:
    otlp:
      endpoint: http://localhost:4317
      timeout: 10s
      protocol: grpc
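Startup flags (with the agent, otel.metrics.exporter=prometheus serves metrics from its own HTTP endpoint, port 9464 by default, separate from the Actuator endpoint described next):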
-Dotel.traces.exporter=otlp -Dotel.metrics.exporter=prometheus
Prometheus Metrics
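Spring Boot Actuator exposes /actuator/prometheus once spring-boot-starter-actuator and micrometer-registry-prometheus are on the classpath (neither is in the POM excerpt above) and the endpoint is included in the web exposure list. In application.yml:
management:
  endpoints:
    web:
      exposure:
        include: health,info,prometheus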
Visualizing Traces in Jaeger
Jaeger’s UI lets you spot service bottlenecks in seconds. Spin up Jaeger with Docker:
docker run -d --name jaeger \
  -e COLLECTOR_OTLP_ENABLED=true \
  -p 16686:16686 -p 4317:4317 \
  jaegertracing/all-in-one:latest
Then visit http://localhost:16686, search for order-service, and explore:
- Flame graph: shows time spent in each span hierarchy.
- Service dependency map: verifies your microservice topology.
- Span details: reveals attributes, events, and errors.
Performance Impact & Sampling
Full tracing on every request in a high-QPS environment can overwhelm your backend. Sampling lets you control data volume without losing sight of issues.
Probabilistic Sampling
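# Spring Boot (Micrometer Tracing) property
management:
  tracing:
    sampling:
      probability: 0.25 # trace 25% of requests
# Java agent equivalent: -Dotel.traces.sampler=parentbased_traceidratio -Dotel.traces.sampler.arg=0.25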
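A ratio-based sampler usually derives its decision from the trace ID instead of rolling a fresh random number, so every service that sees the same trace reaches the same verdict. The sketch below illustrates that idea in plain Java. It is a simplified stand-in with a hypothetical class name, not the SDK's actual sampler, and it assumes trace IDs are 32 hex characters with uniformly random low bits.

```java
/** Illustrative trace-ID-based ratio sampler; the real SDK ships its own implementation. */
final class RatioSampler {
    private final double ratio;

    RatioSampler(double ratio) {
        if (!(ratio >= 0.0 && ratio <= 1.0)) {
            throw new IllegalArgumentException("ratio must be in [0, 1]");
        }
        this.ratio = ratio;
    }

    /**
     * Decides deterministically from the trace id.
     *
     * @param traceIdHex 32 hex characters
     */
    boolean shouldSample(String traceIdHex) {
        if (traceIdHex.length() != 32) {
            throw new IllegalArgumentException("trace id must be 32 hex characters");
        }
        // Use the lower 64 bits of the id as the source of randomness
        long low = Long.parseUnsignedLong(traceIdHex.substring(16), 16);
        // Map the top 53 bits onto [0, 1) exactly, the same trick Random.nextDouble uses
        double position = (low >>> 11) * 0x1.0p-53;
        return position < ratio;
    }
}
```

Because the decision depends only on the ID, new RatioSampler(0.25) keeps roughly a quarter of all traces and never keeps half of a trace in one service while dropping the rest in another.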
Tail-Based Sampling
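Configure this in the OpenTelemetry Collector (the tail_sampling processor) so that the keep-or-drop decision is made after a trace completes, which lets you always retain traces with errors or high latency. Use higher rates for critical paths and lower rates for bulk traffic.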
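The decision a tail-based sampler makes can be sketched in plain Java: look at all spans of a finished trace, keep the trace if any span failed or the whole trace ran longer than a latency threshold, and otherwise keep only a small baseline share. The types, the threshold, and the baseline ratio below are illustrative assumptions; in a real deployment this logic lives in the collector's configuration, not in application code.

```java
import java.util.List;

/** Minimal view of a finished span; illustrative, not an OpenTelemetry type. */
final class FinishedSpan {
    final long startMillis;
    final long endMillis;
    final boolean error;

    FinishedSpan(long startMillis, long endMillis, boolean error) {
        this.startMillis = startMillis;
        this.endMillis = endMillis;
        this.error = error;
    }
}

/** Hypothetical policy: keep errors, keep slow traces, sample the rest. */
final class TailSamplingPolicy {
    private final long latencyThresholdMillis;
    private final double baselineRatio;

    TailSamplingPolicy(long latencyThresholdMillis, double baselineRatio) {
        this.latencyThresholdMillis = latencyThresholdMillis;
        this.baselineRatio = baselineRatio;
    }

    /**
     * @param spans  all spans of one completed trace
     * @param random a value in [0, 1) for the baseline decision, passed in to keep this testable
     */
    boolean keep(List<FinishedSpan> spans, double random) {
        if (spans.isEmpty()) {
            return false;
        }
        long start = Long.MAX_VALUE;
        long end = Long.MIN_VALUE;
        for (FinishedSpan span : spans) {
            if (span.error) {
                return true; // always keep failed traces
            }
            start = Math.min(start, span.startMillis);
            end = Math.max(end, span.endMillis);
        }
        if (end - start >= latencyThresholdMillis) {
            return true; // always keep slow traces
        }
        return random < baselineRatio; // keep a small share of healthy traffic
    }
}
```

The trade-off is memory: the sampler has to buffer every span of a trace until the trace is complete, which is why this work belongs in a collector tier.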
Advanced: Custom Metrics & Alerts
Beyond traces, you may need business-level KPIs like orders processed per minute or error rates by endpoint:
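import io.micrometer.core.instrument.Counter;
import io.micrometer.core.instrument.MeterRegistry;
import io.micrometer.core.instrument.Timer;
import org.springframework.stereotype.Component;

@Component
public class OrderMetrics {
    private final Counter orderCounter;
    private final Timer processingTimer;

    public OrderMetrics(MeterRegistry registry) {
        this.orderCounter = Counter.builder("orders.processed")
                .description("Total orders processed")
                .register(registry);
        this.processingTimer = Timer.builder("orders.processing.time")
                .description("Time spent processing an order")
                .publishPercentiles(0.5, 0.95, 0.99)
                .register(registry);
    }

    public void record(Order order, Runnable process) {
        // Time the work first so only orders that finish without an exception are counted
        processingTimer.record(process);
        orderCounter.increment();
    }
}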
Have Prometheus scrape these metrics and configure alert rules on top of them (e.g., P95 latency > 2 s pages the on-call engineer via PagerDuty).
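To make the alert condition concrete, here is a small sketch of how a P95 value can be derived from raw latency samples with the nearest-rank method and compared against the 2 s threshold. It is plain JDK code for illustration only: Micrometer and Prometheus compute percentiles with their own approximating algorithms, and the LatencyPercentiles class is a hypothetical name.

```java
import java.util.Arrays;

/** Illustrative nearest-rank percentile calculation; not how Micrometer computes percentiles. */
final class LatencyPercentiles {
    private LatencyPercentiles() {}

    /** Returns the smallest sample such that at least pct percent of all samples are <= it. */
    static long percentile(long[] samplesMillis, int pct) {
        if (samplesMillis.length == 0) {
            throw new IllegalArgumentException("no samples");
        }
        if (pct < 1 || pct > 100) {
            throw new IllegalArgumentException("pct must be in 1..100");
        }
        long[] sorted = samplesMillis.clone();
        Arrays.sort(sorted);
        // Integer form of ceil(n * pct / 100), which avoids floating-point rounding surprises
        int rank = (int) (((long) sorted.length * pct + 99) / 100);
        return sorted[rank - 1];
    }

    /** True when the 95th percentile exceeds the alert threshold. */
    static boolean breachesP95(long[] samplesMillis, long thresholdMillis) {
        return percentile(samplesMillis, 95) > thresholdMillis;
    }
}
```

With only ten samples a single 3-second outlier is the P95 and trips the alert, while the median stays near 100 ms. That sensitivity is one reason to evaluate alert rules over a sufficiently long window.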
Testing Your Instrumentation
It’s one thing to spin up Jaeger locally; it’s another to validate spans in CI. Use the opentelemetry-sdk-testing to assert spans programmatically:
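// Spring, JUnit 5 and AssertJ imports omitted for brevity
import io.opentelemetry.sdk.testing.junit5.OpenTelemetryExtension;
import io.opentelemetry.sdk.trace.data.SpanData;

@SpringBootTest
@AutoConfigureMockMvc
class TracingTest {
    // Installs an in-memory SDK as GlobalOpenTelemetry and collects finished spans
    @RegisterExtension
    static final OpenTelemetryExtension otelTesting = OpenTelemetryExtension.create();

    @Autowired
    MockMvc mvc;

    @Test
    void testHttpSpan() throws Exception {
        mvc.perform(get("/api/orders/123"))
                .andExpect(status().isOk());
        List<SpanData> spans = otelTesting.getSpans();
        assertThat(spans).anySatisfy(span ->
                assertThat(span.getName()).isEqualTo("GET /api/orders/{id}")
        );
    }
}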
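This verifies that the expected spans are actually emitted, for REST endpoints as well as for manual spans. Keep in mind that an in-memory exporter only sees spans produced by the SDK instance under test; spans created by the Java agent show up only if the agent is attached to the test JVM.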
Deployment Best Practices
Tracing in production demands resilience. Collector outages, network glitches, or misconfigured exporters shouldn’t take down your service.
- Secure OTLP gRPC with mTLS to protect your trace data.
- Run multiple OTLP collectors behind a load balancer.
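- Set short exporter timeouts so a slow collector doesn’t back up the export queue (the batch span processor exports off the request thread and drops spans once its queue is full).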
- Wrap the OTLP exporter in a circuit breaker (Resilience4j) to drop spans under extreme load.
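The circuit-breaker idea from the last bullet can be sketched without any library: after a number of consecutive export failures the breaker opens and batches are dropped immediately, and after a cool-down one trial export is let through. This is a hand-rolled, single-threaded illustration, not Resilience4j's API. The class name, the thresholds, and the Predicate that stands in for the real exporter call are all assumptions.

```java
import java.util.function.Predicate;

/** Hand-rolled sketch of a circuit breaker around an export call; not thread-safe. */
final class ExportCircuitBreaker<T> {
    private final Predicate<T> exporter; // returns true when the export succeeded
    private final int failureThreshold;
    private final long coolDownMillis;
    private int consecutiveFailures;
    private long openedAtMillis = -1;    // -1 means the breaker is closed
    private long dropped;

    ExportCircuitBreaker(Predicate<T> exporter, int failureThreshold, long coolDownMillis) {
        this.exporter = exporter;
        this.failureThreshold = failureThreshold;
        this.coolDownMillis = coolDownMillis;
    }

    /** Returns true if the batch was exported, false if it failed or was dropped. */
    boolean export(T batch, long nowMillis) {
        if (openedAtMillis >= 0 && nowMillis - openedAtMillis < coolDownMillis) {
            dropped++; // open: drop instead of waiting on a struggling collector
            return false;
        }
        boolean ok;
        try {
            ok = exporter.test(batch);
        } catch (RuntimeException e) {
            ok = false; // treat exceptions like failed exports
        }
        if (ok) {
            consecutiveFailures = 0;
            openedAtMillis = -1; // close the breaker again
        } else if (++consecutiveFailures >= failureThreshold) {
            openedAtMillis = nowMillis; // open (or re-open) and restart the cool-down
        }
        return ok;
    }

    /** Number of batches dropped while the breaker was open. */
    long droppedCount() {
        return dropped;
    }
}
```

Passing the clock in as nowMillis keeps the sketch deterministic in tests; production code would read a monotonic clock and would also need to be thread-safe.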






