Spring Boot AOP + LLM: An Automated Error Analysis System
Debugging bugs in production takes time. I built a system that traces the full call chain from the controller down through the service layer, then feeds the collected error traces to an LLM for automatic root cause detection. The combination of Spring Boot AOP and Ollama takes observability to the next level.
Problem: The Production Bug Debugging Process
A typical production error scenario goes like this:
- A user reports an error
- You scan the log files manually
- You spot the exception in the stack trace
- But you can't tell which service, which method, and which parameters triggered the error
- You switch the log level to DEBUG and test again
- You burn hours until you find the problem
This process is both time-consuming and frustrating. Especially in a microservice architecture, once the call chain spans multiple services, pinpointing the source of an error becomes very hard.
Solution: AOP-Based Dynamic Tracing + AI Analysis
System Architecture:
- Spring AOP: intercept every method call and record a trace
- Trace ID Correlation: correlate all calls from request to response
- On-demand Activation: enable tracing with the x-trace-enable: true header
- LLM Analysis: send the collected traces to Ollama and detect the root cause
- UI Dashboard: visualize the results
Implementation: Step by Step
1. Trace Entity and Model
@Entity
@Table(name = "api_traces")
@Data
public class ApiTrace {
@Id
@GeneratedValue(strategy = GenerationType.IDENTITY)
private Long id;
@Column(name = "trace_id", nullable = false, unique = true)
private String traceId; // UUID for correlation
@Column(name = "request_path")
private String requestPath; // /api/declaration/save
@Column(name = "http_method")
private String httpMethod; // POST, GET, etc.
@Column(name = "class_name")
private String className; // com.example.DeclarationController
@Column(name = "method_name")
private String methodName; // saveDeclaration
@Column(columnDefinition = "TEXT")
private String parameters; // JSON serialized params
@Column(columnDefinition = "TEXT")
private String result; // Return value (if exists)
@Column(columnDefinition = "TEXT")
private String exception; // Exception message + stack trace
@Column(name = "execution_time_ms")
private Long executionTimeMs;
@Column(name = "created_at")
private LocalDateTime createdAt;
@Column(name = "user_id")
private String userId;
@Column(name = "session_id")
private String sessionId;
}
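The aspect and controller below depend on a Spring Data repository for this entity, which the post never shows. A minimal sketch, assuming Spring Data JPA; the two finder names are taken from how the repository is called later in the post, while the delete method is an assumption to support the auto-cleanup job mentioned in the best practices:

```java
import java.time.LocalDateTime;
import java.util.List;
import org.springframework.data.domain.Pageable;
import org.springframework.data.jpa.repository.JpaRepository;

public interface ApiTraceRepository extends JpaRepository<ApiTrace, Long> {

    // Full call chain for one request, in creation order
    List<ApiTrace> findByTraceIdOrderByCreatedAtAsc(String traceId);

    // Recent failed calls, paged from the controller
    List<ApiTrace> findByExceptionIsNotNull(Pageable pageable);

    // Assumed derived delete query for the retention/cleanup job
    long deleteByCreatedAtBefore(LocalDateTime cutoff);
}
```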
2. AOP Aspect Implementation
@Aspect
@Component
@Slf4j
public class ApiTraceAspect {
@Autowired
private ApiTraceRepository traceRepository;
@Autowired
private ObjectMapper objectMapper;
private static final ThreadLocal<String> TRACE_ID = new ThreadLocal<>();
// Intercept all Controller methods
@Around("execution(* com.experilabs..controller..*(..))")
public Object traceController(ProceedingJoinPoint joinPoint) throws Throwable {
return traceMethod(joinPoint, "CONTROLLER");
}
// Intercept all Service methods
@Around("execution(* com.experilabs..service..*(..))")
public Object traceService(ProceedingJoinPoint joinPoint) throws Throwable {
return traceMethod(joinPoint, "SERVICE");
}
// Intercept all Repository methods
@Around("execution(* com.experilabs..repository..*(..))")
public Object traceRepository(ProceedingJoinPoint joinPoint) throws Throwable {
return traceMethod(joinPoint, "REPOSITORY");
}
private Object traceMethod(ProceedingJoinPoint joinPoint, String layer) throws Throwable {
// Check if tracing is enabled via header
ServletRequestAttributes attrs =
(ServletRequestAttributes) RequestContextHolder.getRequestAttributes();
if (attrs == null) {
// Not inside an HTTP request (e.g. scheduled task) - skip tracing
return joinPoint.proceed();
}
HttpServletRequest request = attrs.getRequest();
String traceHeader = request.getHeader("x-trace-enable");
if (!"true".equalsIgnoreCase(traceHeader)) {
// Tracing disabled, proceed normally (no performance impact)
return joinPoint.proceed();
}
// Generate or reuse trace ID
String traceId = TRACE_ID.get();
if (traceId == null) {
traceId = UUID.randomUUID().toString();
TRACE_ID.set(traceId);
}
// Capture method info
String className = joinPoint.getSignature().getDeclaringTypeName();
String methodName = joinPoint.getSignature().getName();
Object[] args = joinPoint.getArgs();
ApiTrace trace = new ApiTrace();
trace.setTraceId(traceId);
trace.setClassName(className);
trace.setMethodName(methodName);
trace.setRequestPath(request.getRequestURI());
trace.setHttpMethod(request.getMethod());
trace.setUserId(getCurrentUserId());
trace.setSessionId(request.getSession().getId());
try {
// Serialize parameters (sanitize sensitive data)
trace.setParameters(sanitizeAndSerialize(args));
} catch (Exception e) {
trace.setParameters("Serialization failed: " + e.getMessage());
}
long startTime = System.currentTimeMillis();
Object result = null;
Exception exception = null;
try {
// Execute the actual method
result = joinPoint.proceed();
// Capture result (limit size)
try {
String resultJson = objectMapper.writeValueAsString(result);
if (resultJson.length() > 5000) {
resultJson = resultJson.substring(0, 5000) + "... (truncated)";
}
trace.setResult(resultJson);
} catch (Exception e) {
trace.setResult("Result serialization failed");
}
return result;
} catch (Exception e) {
exception = e;
// Capture full stack trace
StringWriter sw = new StringWriter();
e.printStackTrace(new PrintWriter(sw));
trace.setException(sw.toString());
throw e;
} finally {
long endTime = System.currentTimeMillis();
trace.setExecutionTimeMs(endTime - startTime);
trace.setCreatedAt(LocalDateTime.now());
// Save trace asynchronously (don't block main flow)
CompletableFuture.runAsync(() -> {
try {
traceRepository.save(trace);
log.debug("Trace saved: {} - {}.{} ({}ms)",
traceId, className, methodName, trace.getExecutionTimeMs());
} catch (Exception e) {
log.error("Failed to save trace", e);
}
});
// Clean up ThreadLocal if this is the last method in chain
if (layer.equals("CONTROLLER")) {
TRACE_ID.remove();
}
}
}
private String sanitizeAndSerialize(Object[] args) throws JsonProcessingException {
// Remove sensitive data (password, token, etc.)
// Illustrative implementation - the field names to mask are project-specific
List<Object> safeArgs = new ArrayList<>();
for (Object arg : args) {
// Servlet objects are not JSON-serializable; record the type name only
if (arg instanceof HttpServletRequest || arg instanceof HttpServletResponse) {
safeArgs.add(arg.getClass().getSimpleName());
} else {
safeArgs.add(arg);
}
}
String json = objectMapper.writeValueAsString(safeArgs);
// Mask common secret field names in the serialized JSON
json = json.replaceAll("(\"(?:password|token|secret|apiKey)\"\\s*:\\s*)\"[^\"]*\"", "$1\"***\"");
if (json.length() > 5000) {
json = json.substring(0, 5000) + "... (truncated)";
}
return json;
}
}
3. LLM Error Analysis Service
@Service
@Slf4j
public class LlmErrorAnalysisService {
private static final String OLLAMA_API_URL = "http://localhost:11434/api/generate";
private static final String MODEL = "mistral-small:24b";
@Autowired
private RestTemplate restTemplate;
@Autowired
private ObjectMapper objectMapper;
@Autowired
private ApiTraceRepository traceRepository;
public ErrorAnalysisResult analyzeTrace(String traceId) {
// Fetch all traces with this ID (full call chain)
List<ApiTrace> traces = traceRepository.findByTraceIdOrderByCreatedAtAsc(traceId);
if (traces.isEmpty()) {
return ErrorAnalysisResult.notFound();
}
// Build context for LLM
String traceContext = buildTraceContext(traces);
// Prompt for LLM
String prompt = String.format(
"Analyze this API trace and identify the root cause of the error.\n\n" +
"TRACE DATA:\n%s\n\n" +
"Provide:\n" +
"1. Root Cause (1-2 sentences)\n" +
"2. Exact location (class + method)\n" +
"3. Suggested fix (code snippet if possible)\n" +
"4. Prevention tips\n\n" +
"Format your response as JSON:\n" +
"{\n" +
" \"rootCause\": \"...\",\n" +
" \"location\": \"ClassName.methodName\",\n" +
" \"suggestedFix\": \"...\",\n" +
" \"preventionTips\": [\"tip1\", \"tip2\"]\n" +
"}",
traceContext
);
try {
// Call Ollama API
Map<String, Object> request = new HashMap<>();
request.put("model", MODEL);
request.put("prompt", prompt);
request.put("stream", false);
request.put("format", "json");
request.put("options", Map.of("temperature", 0.1)); // Low temp for factual analysis
log.info("Sending trace {} to LLM for analysis", traceId);
long startTime = System.currentTimeMillis();
ResponseEntity<String> response = restTemplate.postForEntity(
OLLAMA_API_URL,
request,
String.class
);
long duration = System.currentTimeMillis() - startTime;
log.info("LLM analysis completed in {}ms", duration);
// Parse LLM response
JsonNode root = objectMapper.readTree(response.getBody());
String llmOutput = root.get("response").asText();
// Clean JSON (remove markdown fences if present)
llmOutput = llmOutput.replaceAll("```json\\n?", "").replaceAll("```\\n?", "").trim();
ErrorAnalysisResult result = objectMapper.readValue(llmOutput, ErrorAnalysisResult.class);
result.setTraceId(traceId);
result.setAnalysisTimeMs(duration);
return result;
} catch (Exception e) {
log.error("LLM analysis failed for trace " + traceId, e);
return ErrorAnalysisResult.error("Analysis failed: " + e.getMessage());
}
}
private String buildTraceContext(List<ApiTrace> traces) {
StringBuilder context = new StringBuilder();
context.append("=== API CALL CHAIN ===\n\n");
for (int i = 0; i < traces.size(); i++) {
ApiTrace trace = traces.get(i);
context.append(String.format("%d. %s.%s\n",
i + 1,
trace.getClassName(),
trace.getMethodName()
));
context.append(String.format(" Parameters: %s\n", trace.getParameters()));
context.append(String.format(" Execution Time: %dms\n", trace.getExecutionTimeMs()));
if (trace.getException() != null) {
context.append(String.format(" ❌ EXCEPTION:\n%s\n",
limitStackTrace(trace.getException(), 30))); // First 30 lines
} else if (trace.getResult() != null) {
context.append(String.format(" ✅ Result: %s\n",
trace.getResult().substring(0, Math.min(200, trace.getResult().length()))));
}
context.append("\n");
}
return context.toString();
}
private String limitStackTrace(String stackTrace, int maxLines) {
String[] lines = stackTrace.split("\n");
if (lines.length <= maxLines) {
return stackTrace;
}
return String.join("\n", Arrays.copyOf(lines, maxLines)) +
"\n... (" + (lines.length - maxLines) + " more lines)";
}
}
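One fragile spot above is stripping markdown code fences from the model output; even with "format": "json", some models still wrap their answer in fences. The same regexes can be pulled into a standalone helper (the class name here is hypothetical, for illustration only) so the cleanup step can be unit-tested in isolation:

```java
// Hypothetical standalone mirror of the fence-stripping step in
// LlmErrorAnalysisService.analyzeTrace, so it can be tested without Ollama.
public class LlmOutputCleaner {
    public static String stripFences(String llmOutput) {
        // Drop an opening json fence marker, any bare fence markers,
        // and leading/trailing whitespace
        return llmOutput
                .replaceAll("```json\\n?", "")
                .replaceAll("```\\n?", "")
                .trim();
    }
}
```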
@Data
class ErrorAnalysisResult {
private String traceId;
private String rootCause;
private String location;
private String suggestedFix;
private List<String> preventionTips;
private Long analysisTimeMs;
private String error;
public static ErrorAnalysisResult notFound() {
ErrorAnalysisResult result = new ErrorAnalysisResult();
result.setError("Trace not found");
return result;
}
public static ErrorAnalysisResult error(String message) {
ErrorAnalysisResult result = new ErrorAnalysisResult();
result.setError(message);
return result;
}
}
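The analysis service also assumes a RestTemplate bean exists. The default request factory has no finite read timeout, which is risky when local generation can take tens of seconds. A possible configuration, where the timeout values are assumptions to tune for your model and hardware:

```java
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.http.client.SimpleClientHttpRequestFactory;
import org.springframework.web.client.RestTemplate;

@Configuration
public class RestTemplateConfig {
    @Bean
    public RestTemplate restTemplate() {
        SimpleClientHttpRequestFactory factory = new SimpleClientHttpRequestFactory();
        factory.setConnectTimeout(2_000);   // fail fast if Ollama is down
        factory.setReadTimeout(120_000);    // local LLM generation can be slow
        return new RestTemplate(factory);
    }
}
```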
4. REST Controller
@RestController
@RequestMapping("/api/trace")
@Slf4j
public class TraceController {
@Autowired
private ApiTraceRepository traceRepository;
@Autowired
private LlmErrorAnalysisService analysisService;
@GetMapping("/{traceId}")
public ResponseEntity<List<ApiTrace>> getTrace(@PathVariable String traceId) {
List<ApiTrace> traces = traceRepository.findByTraceIdOrderByCreatedAtAsc(traceId);
return ResponseEntity.ok(traces);
}
@GetMapping("/{traceId}/analyze")
public ResponseEntity<ErrorAnalysisResult> analyzeTrace(@PathVariable String traceId) {
ErrorAnalysisResult result = analysisService.analyzeTrace(traceId);
if (result.getError() != null) {
return ResponseEntity.status(HttpStatus.INTERNAL_SERVER_ERROR).body(result);
}
return ResponseEntity.ok(result);
}
@GetMapping("/recent")
public ResponseEntity<List<ApiTrace>> getRecentErrors(
@RequestParam(defaultValue = "20") int limit) {
Pageable pageable = PageRequest.of(0, limit, Sort.by("createdAt").descending());
List<ApiTrace> traces = traceRepository.findByExceptionIsNotNull(pageable);
return ResponseEntity.ok(traces);
}
}
Performance Optimization: On-Demand Tracing
The AOP advice runs on every method call, but tracing only activates when the header is present, so production performance is preserved:
Usage Examples:
# Normal request (no tracing)
curl -X POST http://localhost:8080/api/declaration/save \
-H "Content-Type: application/json" \
-d '{"data": "..."}'
# Request with tracing enabled
curl -X POST http://localhost:8080/api/declaration/save \
-H "Content-Type: application/json" \
-H "x-trace-enable: true" \
-d '{"data": "..."}'
# Get trace details
curl http://localhost:8080/api/trace/{traceId}
# Analyze trace with LLM
curl http://localhost:8080/api/trace/{traceId}/analyze
Real-World Example: NullPointerException Root Cause
Scenario:
A user gets an error while saving a declaration. Trace ID: a7f3c2d1-4b5e-9876-abcd-1234567890ef
Trace Chain:
1. DeclarationController.saveDeclaration
Parameters: [DeclarationDTO@a1b2c3]
Execution Time: 45ms
❌ EXCEPTION: NullPointerException (propagated from the service layer)
2. DeclarationService.save
Parameters: [DeclarationDTO@a1b2c3]
Execution Time: 42ms
❌ EXCEPTION: NullPointerException
3. CompanyService.validateCompany
Parameters: [null]
Execution Time: 2ms
❌ EXCEPTION: NullPointerException at line 85
LLM Analysis Result:
{
"rootCause": "CompanyService.validateCompany received null parameter because DeclarationDTO.companyId was null. No null check before service call.",
"location": "DeclarationService.save (line ~120)",
"suggestedFix": "Add null check:\n\nif (dto.getCompanyId() == null) {\n throw new ValidationException(\"Company ID is required\");\n}\ncompanyService.validateCompany(dto.getCompanyId());",
"preventionTips": [
"Add @NotNull validation on DeclarationDTO.companyId field",
"Use Optional for nullable IDs",
"Add unit test for null company scenario",
"Enable JSR-303 validation in controller"
],
"analysisTimeMs": 1850
}
The LLM didn't just find the error; it also provided a suggested fix and prevention tips. This lets even junior developers resolve the problem quickly.
UI Dashboard (React Frontend)
I built a simple React dashboard to visualize the traces and the LLM analyses:
import React, { useState } from 'react';
import axios from 'axios';
export default function TraceAnalyzer() {
const [traceId, setTraceId] = useState('');
const [traces, setTraces] = useState([]);
const [analysis, setAnalysis] = useState(null);
const [loading, setLoading] = useState(false);
const fetchTrace = async () => {
setLoading(true);
try {
const response = await axios.get(`/api/trace/${traceId}`);
setTraces(response.data);
} catch (error) {
console.error('Failed to fetch trace', error);
}
setLoading(false);
};
const analyzeTrace = async () => {
setLoading(true);
try {
const response = await axios.get(`/api/trace/${traceId}/analyze`);
setAnalysis(response.data);
} catch (error) {
console.error('Failed to analyze trace', error);
}
setLoading(false);
};
return (
  <div className="trace-analyzer">
    <h1>🔍 API Trace Analyzer</h1>
    <input
      value={traceId}
      onChange={(e) => setTraceId(e.target.value)}
      placeholder="Enter Trace ID..."
    />
    <button onClick={fetchTrace} disabled={loading}>Fetch Trace</button>
    <button onClick={analyzeTrace} disabled={loading}>Analyze with AI</button>
    {traces.length > 0 && (
      <section>
        <h2>📞 Call Chain</h2>
        {traces.map((trace, idx) => (
          <div key={idx} className="trace-step">
            <span>{idx + 1}</span>
            <strong>{trace.className}.{trace.methodName}</strong>
            <em>{trace.executionTimeMs}ms</em>
            <p>Parameters: {trace.parameters}</p>
            {trace.exception && (
              <pre>❌ Exception: {trace.exception}</pre>
            )}
          </div>
        ))}
      </section>
    )}
    {analysis && (
      <section>
        <h2>🤖 AI Analysis</h2>
        <h3>🔍 Root Cause</h3>
        <p>{analysis.rootCause}</p>
        <h3>📍 Location</h3>
        <code>{analysis.location}</code>
        <h3>💡 Suggested Fix</h3>
        <pre>{analysis.suggestedFix}</pre>
        <h3>🛡️ Prevention Tips</h3>
        <ul>
          {analysis.preventionTips.map((tip, idx) => (
            <li key={idx}>{tip}</li>
          ))}
        </ul>
        <p>Analysis completed in {analysis.analysisTimeMs}ms</p>
      </section>
    )}
  </div>
);
}
Production Deployment Considerations
⚠️ Things to Watch Out For:
- Database size: traces grow fast; set a retention policy (e.g. 7 days)
- Sensitive data: always sanitize information like passwords and tokens
- Performance: save traces asynchronously; never block the main flow
- GDPR compliance: anonymize traces that contain user data
- LLM cost: use local Ollama instead of depending on a cloud LLM
✅ Best Practices:
- Trace retention: 7 days (production), 30 days (staging)
- Auto-cleanup job: delete old traces with a daily cron
- Index optimization: indexes on trace_id, created_at, and exception
- Circuit breaker: a fallback mechanism for when LLM analysis fails
- Rate limiting: throttle trace saving (abuse prevention)
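The auto-cleanup item above can be sketched as a Spring scheduled job. A sketch under assumptions: the deleteByCreatedAtBefore repository method, the cron expression, and the 7-day cutoff are illustrative, not from the original setup:

```java
import java.time.LocalDateTime;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.scheduling.annotation.Scheduled;
import org.springframework.stereotype.Component;
import org.springframework.transaction.annotation.Transactional;

@Component
public class TraceCleanupJob {

    @Autowired
    private ApiTraceRepository traceRepository;

    // Runs every night at 03:00 and enforces the 7-day retention policy
    @Scheduled(cron = "0 0 3 * * *")
    @Transactional
    public void purgeOldTraces() {
        LocalDateTime cutoff = LocalDateTime.now().minusDays(7);
        // Assumed derived delete query: deleteByCreatedAtBefore(LocalDateTime)
        traceRepository.deleteByCreatedAtBefore(cutoff);
    }
}
```

Note that @Scheduled only fires if a configuration class enables it with @EnableScheduling.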
Future Improvements
1. Distributed Tracing (OpenTelemetry)
I will correlate traces across microservices with OpenTelemetry, so we can see cross-service call chains as well.
2. Automatic Fix Application
I will integrate the LLM's suggested fix so that it automatically opens a pull request, wired into the CI/CD pipeline via GitHub Actions.
3. Anomaly Detection
I will analyze trace patterns with ML for anomaly detection, e.g. a sudden spike in execution time triggers an alert.
Conclusion
The Spring Boot AOP + local LLM combination dramatically improved our production debugging process:
- ✅ Full call chain visibility: the entire flow from controller to repository is visible
- ✅ On-demand activation: production performance is unaffected
- ✅ AI-powered root cause detection: automatic error analysis
- ✅ Actionable suggestions: code fix + prevention tips
- ✅ $0 cost: local Ollama, no cloud API
Debugging time dropped from hours to minutes. Even junior developers can solve complex bugs with the LLM's help.