This is the final part of a three-part series by Markus Eisele. Part 1 can be found here and part 2 here.
In the first article, we looked at the Java developer's dilemma: the gap between flashy prototypes and the reality of enterprise production systems. In the second, we explored why new types of applications are needed and how AI is changing the landscape of enterprise software. This article focuses on what these changes mean for architecture: if applications look different, the way we build them must change too.
Traditional Java Enterprise Stack
Enterprise Java applications have always been about architecture. A typical system is built as a set of layers. At the bottom sits persistence, often using JPA or JDBC. Business logic sits on top of that, enforcing rules and processes. At the top are REST or messaging endpoints that expose services to the outside world. Cross-cutting concerns such as transactions, security, and observability run through the whole stack. This model has proven its durability: it has carried Java from the early servlet days to modern frameworks such as Quarkus, Spring Boot, and Micronaut.
The success of this structure comes from clarity. Each layer has a clear responsibility. The application stays predictable and maintainable because you know where to add logic, where to enforce policies, and where to hook in monitoring. Adding AI does not remove these layers, but it does add new concerns, because AI behavior does not fit the precise assumptions of deterministic software.
New layers in AI-powered applications
AI changes the architecture by introducing layers that did not exist in deterministic systems. Three of the most important are validation of fuzzy output, context-aware guardrails, and observability of model behavior. In practice you will encounter more components, but validation and observability are the foundation of what makes AI safe in production.
Validation and guardrails
Traditional Java applications assume that input can be validated. You can check whether a number is within range, whether a string is non-empty, or whether an incoming object matches a schema. Once validated, the input can be processed deterministically. With AI output, this assumption no longer holds. The model may generate text that looks correct but is misleading, incomplete, or malicious. The system cannot trust it blindly.
This is where validation and guardrails come in. They form a new architectural layer between the model and the rest of the application. Guardrails can take different forms (a sketch follows the list below):
- Schema validation: If you expect a JSON object with three fields, verify that the model output actually matches that schema. A missing or malformed field should be treated as an error.
- Policy checks: If your domain prohibits certain outputs, such as disclosing sensitive data, returning personal identifiers, or producing offensive content, policies should filter them out.
- Range and type enforcement: If the model is supposed to produce a numeric score, make sure the value really is a number within the expected range before passing it to your business logic.
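To make the schema and range checks concrete, here is a minimal sketch of such a guardrail using Jackson. The field names and the confidence range are hypothetical; the point is that model output receives the same scrutiny as any other untrusted input:

```java
import com.fasterxml.jackson.databind.JsonNode;
import com.fasterxml.jackson.databind.ObjectMapper;

// Hypothetical guardrail: model output must parse as JSON and contain
// exactly the fields we expect, with values in the expected ranges.
public class AnswerGuardrail {

    private static final ObjectMapper MAPPER = new ObjectMapper();

    public JsonNode validate(String rawModelOutput) {
        final JsonNode node;
        try {
            node = MAPPER.readTree(rawModelOutput);
        } catch (Exception e) {
            throw new IllegalStateException("Model did not return valid JSON", e);
        }
        // A missing field is an error, not something to silently default.
        for (String field : new String[] {"answer", "confidence", "sources"}) {
            if (!node.hasNonNull(field)) {
                throw new IllegalStateException("Model output missing field: " + field);
            }
        }
        // Range and type enforcement: confidence must be a number in [0, 1].
        double confidence = node.get("confidence").asDouble(-1);
        if (confidence < 0.0 || confidence > 1.0) {
            throw new IllegalStateException("Confidence out of range: " + confidence);
        }
        return node;
    }
}
```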
Enterprises already know what happens when validation is missing. SQL injection, cross-site scripting, and other vulnerabilities have taught us that unvalidated input is dangerous. AI output is another kind of untrusted input, even if it comes from within your own system. It must be treated with suspicion.
In Java, this layer can be built with familiar tools. You can write Bean Validation annotations, schema checks, or even custom CDI interceptors that run after every AI call. The important part is architectural: validation should not be hidden in helper methods. It should be a visible, explicit layer in the stack so that it can be maintained, evolved, and tested rigorously over time.
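One way to keep that layer explicit is a CDI interceptor that runs after every AI call. This is a minimal sketch assuming the AnswerGuardrail above; @ValidatedAiCall is a hypothetical binding annotation, not part of any framework:

```java
import jakarta.annotation.Priority;
import jakarta.interceptor.AroundInvoke;
import jakarta.interceptor.Interceptor;
import jakarta.interceptor.InterceptorBinding;
import jakarta.interceptor.InvocationContext;
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;

// Hypothetical marker for methods whose return value is model output.
@InterceptorBinding
@Retention(RetentionPolicy.RUNTIME)
@Target({ElementType.TYPE, ElementType.METHOD})
@interface ValidatedAiCall {}

@ValidatedAiCall
@Interceptor
@Priority(Interceptor.Priority.APPLICATION)
class AiOutputValidationInterceptor {

    @AroundInvoke
    Object validateResult(InvocationContext ctx) throws Exception {
        Object result = ctx.proceed();           // the actual AI call
        if (result instanceof String raw) {
            new AnswerGuardrail().validate(raw); // reject bad output before it spreads
        }
        return result;
    }
}
```

Any bean method annotated with @ValidatedAiCall now passes its result through the guardrail before the caller ever sees it.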
Observability
Observability has always been critical in enterprise systems. Logs, metrics, and traces let us understand how applications behave in production. With AI, observability becomes even more important because behavior is not deterministic. The model may give different answers tomorrow than it does today. Without visibility, you cannot explain or correct that behavior.
Observability for AI means more than logging a single response. It requires:
- Tracking prompts and responses: Capture what was sent to the model and what came back, ideally with identifiers that link them to the originating request
- Logging context: Store the data retrieved from vector databases or other sources so you know what influenced the model's answer
- Tracking cost and latency: Monitor how often models are called, how long the calls take, and what they cost
- Detecting drift: Identify when answer quality changes over time, which may indicate a model update or degraded performance on specific data
For Java developers, this aligns with existing practice. We already integrate OpenTelemetry, structured logging frameworks, and metrics emitters such as Micrometer. The difference is that we now point these tools at AI signals. A prompt is like an incoming event. A model response is like a call to an external dependency. Observability becomes another layer that cuts across the stack, capturing the inference process itself.
Consider a Quarkus application that integrates with OpenTelemetry. You can create spans for each AI call, add attributes for model name, token count, latency, and cache hits, and export these metrics to Grafana or another monitoring system. That makes AI behavior visible in the same dashboards your operations team already uses.
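A hedged sketch of what that can look like with the OpenTelemetry API that Quarkus exposes for injection. The span and attribute names (ai.chat, ai.model, and so on) are illustrative choices rather than an established convention, and chatModel() is a stub for the real client:

```java
import io.opentelemetry.api.trace.Span;
import io.opentelemetry.api.trace.Tracer;
import io.opentelemetry.context.Scope;
import jakarta.enterprise.context.ApplicationScoped;
import jakarta.inject.Inject;

@ApplicationScoped
public class TracedAiClient {

    @Inject
    Tracer tracer; // provided by the Quarkus OpenTelemetry extension

    public String ask(String prompt) {
        Span span = tracer.spanBuilder("ai.chat").startSpan();
        try (Scope ignored = span.makeCurrent()) {
            long start = System.nanoTime();
            String answer = chatModel(prompt);
            span.setAttribute("ai.model", "example-model");   // assumed model name
            span.setAttribute("ai.prompt.chars", prompt.length());
            span.setAttribute("ai.latency.ms", (System.nanoTime() - start) / 1_000_000);
            return answer;
        } finally {
            span.end();
        }
    }

    private String chatModel(String prompt) {
        return "stub answer"; // replace with a real model call
    }
}
```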
Mapping the new layers to familiar practices
The main idea is that these new layers do not replace the old ones; they extend them. Dependency injection still works: you inject a guardrail component into a service the same way you would inject a validator or a logger. Fault-tolerance libraries such as MicroProfile Fault Tolerance or Resilience4j are still useful: you can wrap AI calls with timeouts, retries, and circuit breakers. Monitoring frameworks such as Micrometer and OpenTelemetry remain relevant: you just point them at new signals.
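As a sketch, here is an AI-facing service wrapped with MicroProfile Fault Tolerance annotations; the thresholds are arbitrary, and callModel() stands in for the actual model client:

```java
import jakarta.enterprise.context.ApplicationScoped;
import java.time.temporal.ChronoUnit;
import org.eclipse.microprofile.faulttolerance.CircuitBreaker;
import org.eclipse.microprofile.faulttolerance.Fallback;
import org.eclipse.microprofile.faulttolerance.Retry;
import org.eclipse.microprofile.faulttolerance.Timeout;

@ApplicationScoped
public class ResilientAiService {

    // Treat the model like any other flaky remote dependency: bounded wait,
    // limited retries, and a breaker that trips on repeated failures.
    @Timeout(value = 10, unit = ChronoUnit.SECONDS)
    @Retry(maxRetries = 2)
    @CircuitBreaker(requestVolumeThreshold = 8, failureRatio = 0.5)
    @Fallback(fallbackMethod = "cannedAnswer")
    public String ask(String prompt) {
        return callModel(prompt);
    }

    String cannedAnswer(String prompt) {
        return "The assistant is temporarily unavailable. Please try again later.";
    }

    private String callModel(String prompt) {
        throw new UnsupportedOperationException("wire a real model client in here");
    }
}
```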
By treating validation and observability as layers, rather than ad-hoc patches, you keep the same architectural discipline that has always defined enterprise Java. That discipline is what keeps systems maintainable as they grow and evolve. Teams know where to look when something fails, and they know how to extend the architecture without introducing fragile hacks.
An example flow
Imagine a REST endpoint that answers customer questions. The flow looks like this:
1. The request comes to the REST layer.
2. The context generator retrieves related documents from the vector store.
3. The prompt is assembled and sent to a local or remote model.
4. The result is passed through the guardrail layer, which validates its structure and content.
5. Observability hooks record the prompt, context, and response for later analysis.
6. The validated result flows into the business logic and is returned to the client.
This flow has clear layers, and each one can evolve independently. You can swap the vector store, upgrade the model, or tighten the guardrails without rewriting the entire system. That modularity is exactly what enterprise Java architectures have always valued.
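As a sketch, the flow maps naturally onto injectable components, one per numbered step. Every collaborator type here (ContextGenerator, PromptAssembler, and so on) is hypothetical:

```java
import jakarta.inject.Inject;
import jakarta.ws.rs.POST;
import jakarta.ws.rs.Path;
import jakarta.ws.rs.core.Response;

@Path("/questions")
public class QuestionResource {

    @Inject ContextGenerator contextGenerator; // step 2: vector store lookup
    @Inject PromptAssembler promptAssembler;   // step 3: prompt construction
    @Inject ModelClient modelClient;           // step 3: local or remote model
    @Inject AnswerGuardrail guardrail;         // step 4: structure and content checks
    @Inject AiCallRecorder recorder;           // step 5: observability hooks

    @POST
    public Response answer(String question) {
        var context = contextGenerator.retrieve(question);
        var prompt = promptAssembler.assemble(question, context);
        var raw = modelClient.complete(prompt);
        var validated = guardrail.validate(raw); // malformed output stops here
        recorder.record(prompt, context, raw);
        return Response.ok(validated).build();   // step 6: back to the client
    }
}
```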
A concrete example is using LangChain4j in Quarkus. You can define an AI service interface, annotate it with model bindings, and inject it into your resource class. Around that service you can add a guardrail that enforces a schema using Jackson, and an OpenTelemetry span that records the prompt and the tokens used. None of this requires abandoning the Java ecosystem. It is the same stack thinking we have always used, now applied to AI.
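A minimal sketch of such a service, assuming the quarkus-langchain4j extension; the JSON instruction in the system message is an assumption that pairs with the guardrail shown earlier:

```java
import dev.langchain4j.service.SystemMessage;
import io.quarkiverse.langchain4j.RegisterAiService;

// The extension generates the implementation and binds it to the model
// configured in application.properties.
@RegisterAiService
public interface SupportAssistant {

    @SystemMessage("You answer customer questions. Reply as JSON with the fields answer, confidence, and sources.")
    String answer(String question); // the single parameter becomes the user message
}
```

The resource class can then inject SupportAssistant and call answer(), with the guardrail and tracing wrappers applied around that call.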
Implications for architects
For architects, the main implication is that AI does not eliminate the need for architecture. If anything, it adds to it. Without clear boundaries, AI becomes a black box in the middle of the system. This is not acceptable in an enterprise environment. By defining guardrails and observability as clear layers, you make AI components manageable like any other part of the stack.
This is what evaluation means in this context: systematically measuring how an AI component behaves, using tests and monitoring that go beyond traditional validation. Rather than expecting exact outputs, evaluations check structure, boundaries, appropriateness, and policy compliance. They combine automated tests, curated prompts, and occasional human review to build confidence that the system behaves as intended. In enterprise settings, evaluation becomes a recurring activity rather than a one-time verification step.
Evaluation itself becomes an architectural concern that reaches beyond the models themselves. Hamel Husain describes evals as a first-class system, not an add-on. For Java developers, this means building evaluation into CI/CD, just like unit and integration tests. Continuous evaluation of prompts, retrieval, and outputs becomes part of the release gate. This extends what we already do with integration test suites.
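A sketch of what such a gate could look like as a JUnit test. The callAssistant() helper is stubbed here, and the assertions are examples of structural and policy checks rather than exact-output matching:

```java
import static org.junit.jupiter.api.Assertions.assertTrue;

import com.fasterxml.jackson.databind.JsonNode;
import com.fasterxml.jackson.databind.ObjectMapper;
import org.junit.jupiter.api.Test;

class SupportAssistantEvalTest {

    private final ObjectMapper mapper = new ObjectMapper();

    @Test
    void answerIsWellFormedAndPolicyCompliant() throws Exception {
        String raw = callAssistant("How do I reset my password?");
        JsonNode json = mapper.readTree(raw);

        // Structural checks: never assert the exact wording of the answer.
        assertTrue(json.hasNonNull("answer"), "answer field is required");
        double confidence = json.path("confidence").asDouble(-1);
        assertTrue(confidence >= 0.0 && confidence <= 1.0, "confidence must be in [0,1]");

        // Policy check: certain content must never appear, whatever the prompt.
        assertTrue(!json.path("answer").asText().contains("BEGIN PRIVATE KEY"),
                "answers must never leak secrets");
    }

    private String callAssistant(String question) {
        // Stub; replace with a call to the AI service under test.
        return "{\"answer\":\"Use the reset link\",\"confidence\":0.9,\"sources\":[]}";
    }
}
```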
This approach also helps with skills. Teams already know how to think in terms of layers, services, and cross-cutting concerns. By framing AI integration the same way, you lower the barrier to adoption: developers can apply familiar practices to unfamiliar behavior. This is crucial for staffing. Companies should not have to rely on a small group of AI specialists; they need broad teams of Java developers who can apply their existing skills with only moderate retraining.
There is also the governance aspect. When regulators or auditors ask how your AI system works, you need to show more than a diagram with a "call the LLM here" box. You need to point to the validation layer that checks outputs, the guardrails that enforce policies, and the observability that records decisions. This is what turns AI from an experiment into a production system that can be trusted.
Looking forward
The architectural transformations described here are only the beginning. More layers will emerge as AI adoption matures. We will see specialized and per-user caching layers to control cost, fine-grained access control to determine who can use models, and new forms of testing to verify behavior. But the basic lesson is clear: AI requires us to add structure, not remove it.
Java's history gives us confidence. We have already gone from monoliths to distributed systems, from blocking to reactive programming, and from on-premises to the cloud. Each transformation added layers and patterns, and each time the ecosystem adapted. The arrival of AI is no different: it is another step in the same journey.
For Java developers, the challenge is not to throw away what we know, but to extend it. The transformation is real, but it is not foreign. Java's history of layered architectures, dependency injection, and cross-cutting concerns gives us the tools to handle it. The result is not one-off prototypes or demos, but reliable, auditable applications ready for the long life cycles that enterprises demand.
In our book, Applied AI for Enterprise Java Development, we explore these architectural shifts in depth with concrete examples and patterns. From retrieval pipelines with Docling to guardrail testing and observability integration, we show how Java developers can take the ideas described here and turn them into production-ready systems.