Testing Mistral OCR 4 with Spring AI 2.0, Spring Boot and Angular

Marco Molteni Updated 2026-06-25 8 min read

A full-stack document intelligence demo using Angular, Spring Boot, Spring AI 2.0, and Mistral OCR 4 to extract markdown, layout blocks, tables, images, confidence signals, and structured JSON.

Modern OCR is no longer just about extracting text from PDFs.

The more interesting shift is that OCR can now return structured, reviewable, application-ready data: markdown, layout blocks, tables, images, bounding boxes, confidence signals, and JSON that can be consumed by another system.

That is what I wanted to test with Mistral OCR 4, Spring AI 2.0, Spring Boot, and Angular.

The result is Luce.Docu AI, a small full-stack demo for document ingestion workflows.

The application accepts a document, sends it through a Spring Boot backend, calls Mistral OCR 4, and renders the result in an Angular UI that feels more like a real application than a simple playground.

The main idea is simple:

OCR is becoming structured document intelligence.

Instead of stopping at plain text extraction, we can start building workflows where documents are parsed, reviewed, validated, transformed, and prepared for search, automation, or RAG.

Demo videos

I recorded two short demos for this post.

The first one shows the happy path with a clean document. The second one tests a more difficult case: a manuscript-style image.

The goal is not to run a scientific benchmark. The goal is to see how the OCR output feels inside a real application workflow: upload, extraction, review, JSON export, and inspection of structured blocks.

What the application does

The demo app is intentionally small, but it covers the full flow from frontend upload to backend extraction and result visualization.

The application can extract and display:

Markdown per page
Layout and content blocks
Tables
Extracted images
Structured JSON
Raw OCR response data for debugging

The frontend lets the user choose between three modes:

raw: mostly markdown output
full: blocks, tables, headers, footers, and images
smart: full extraction plus structured JSON annotations

This split is useful because it shows the progression from classic OCR to document intelligence.
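
One way to picture the three modes on the backend is as an enum that maps each mode to the features it requests. This is only a sketch: the `OcrModes`, `OcrMode`, and `Feature` names are hypothetical and not taken from the demo repository, and the exact feature split per mode is an assumption based on the list above.

```java
import java.util.EnumSet;
import java.util.Locale;
import java.util.Set;

class OcrModes {

    /** Features a mode may request; the names are illustrative, not Mistral's. */
    enum Feature { MARKDOWN, BLOCKS, TABLES, HEADERS_FOOTERS, IMAGES, JSON_ANNOTATION }

    /** The three UI modes, each mapped to the set of features it asks for. */
    enum OcrMode {
        RAW(EnumSet.of(Feature.MARKDOWN)),
        FULL(EnumSet.of(Feature.MARKDOWN, Feature.BLOCKS, Feature.TABLES,
                Feature.HEADERS_FOOTERS, Feature.IMAGES)),
        SMART(EnumSet.allOf(Feature.class));

        private final Set<Feature> features;

        OcrMode(Set<Feature> features) {
            this.features = features;
        }

        boolean requests(Feature feature) {
            return features.contains(feature);
        }

        /** Parses the value sent by the frontend; a missing value falls back to RAW. */
        static OcrMode parse(String value) {
            if (value == null || value.isBlank()) {
                return RAW;
            }
            return OcrMode.valueOf(value.trim().toUpperCase(Locale.ROOT));
        }
    }
}
```

With this shape, `OcrModes.OcrMode.parse("smart").requests(OcrModes.Feature.JSON_ANNOTATION)` is true, while the same check for `raw` is false, so the request builder can switch options on and off from a single place.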

Sometimes markdown is enough.
Sometimes you want layout-aware output.
Sometimes you want schema-driven JSON that another system can consume directly.

That is the part I find most interesting.

A next step would be to take the extracted data and store it in a database or send it to an LLM for analysis.

Application flow

The application flow looks like this:

Angular frontend
  -> upload PDF or image
  -> POST /api/ocr

Spring Boot backend
  -> convert file to Base64 data URL
  -> call Mistral OCR 4
  -> normalize response

Angular result view
  -> summary
  -> markdown
  -> tables
  -> images
  -> blocks
  -> JSON
  -> raw response

The Angular application does not call Mistral directly.

The API key stays on the backend, and the frontend only talks to my own Spring Boot REST API. This is the same pattern I would use in a production application.

The frontend owns the user experience.
The backend owns the integration, security boundary, validation, and response normalization. Because it exposes a REST API, other clients could integrate with the same OCR workflow.

Architecture overview

The repository has two clear parts:

frontend/: Angular 22 UI with drag-and-drop upload, PDF preview, result tabs, and search/filtering for extracted blocks
backend/: Spring Boot 4.x backend with Spring AI 2.0 and a direct RestClient integration for OCR 4-specific parameters

There are also two backend OCR paths:

SpringAiOcrClient
MistralAdvancedOcrClient

The first one demonstrates the official low-level Spring AI MistralOcrApi.

The second one is the path used by the Angular UI, because OCR 4 exposes newer request options that are easier to pass through a direct HTTP request.

That is an important design decision.

I wanted to show both:

how far the official Spring AI wrapper already gets you
how to keep moving when a model adds request fields faster than a framework abstraction

This is a common situation when building applications. The framework gives you a clean starting point, but sometimes you still need to drop one level lower to access newer provider-specific options.

I do not see those two approaches as competing. In practice, I often want both in the same codebase.

In this case, Spring AI 2.0 does not yet implement all the new features of Mistral OCR 4 (such as document blocks), so I had to use the lower-level RestClient.

Why Mistral OCR 4 instead of basic OCR?

Traditional OCR is often treated as a simple text extraction step.

You upload a PDF or image.
You get text back.
You store it somewhere.

That is useful, but it is not enough for many real document workflows.

In this demo, I wanted to test the features that make OCR output more application-friendly:

include_blocks for structural blocks
bounding boxes for visual grounding
confidence scores for review workflows
image extraction for richer documents
schema-based document annotations for structured JSON

That changes the shape of the workflow:

Document
  -> OCR
  -> structured output
  -> validation / review
  -> storage, search, automation, or RAG

Once OCR returns more than text, the application can reason about titles, tables, headers, footers, signatures, or uncertain regions instead of treating the whole document as one flat string.

This is where OCR starts to feel less like a utility and more like an ingestion layer for document intelligence.

Spring Boot backend setup

The backend is intentionally small.

It uses:

Spring Boot
spring-boot-starter-web
validation support
Spring AI 2.0
Mistral integration
RestClient for the advanced OCR 4 request

Configuration is kept in application.yml, with the model defaulting to:

mistral-ocr-4-0

The main endpoint is a multipart upload endpoint:

POST /api/ocr

The controller accepts:

file
mode
pages
includeImageBase64
language
customPrompt

The uploaded file is converted into a Base64 data URL before it is sent to Mistral.

That keeps the frontend simple and ensures the API key stays on the server side, where it belongs.
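
The Base64 conversion itself needs nothing beyond the JDK. The following is a minimal sketch, not the code from the repository: the `DataUrls` class and `toDataUrl` method are hypothetical names, and the fallback to `application/octet-stream` for a missing content type is an assumption.

```java
import java.util.Base64;

class DataUrls {

    /** Builds a data URL of the form data:<content-type>;base64,<payload>. */
    static String toDataUrl(byte[] content, String contentType) {
        // Assumption: fall back to a generic type when the upload carries none.
        String type = (contentType == null || contentType.isBlank())
                ? "application/octet-stream"
                : contentType;
        return "data:" + type + ";base64," + Base64.getEncoder().encodeToString(content);
    }
}
```

In a Spring controller, the two arguments would typically come from `MultipartFile.getBytes()` and `MultipartFile.getContentType()`. For example, the UTF-8 bytes of `hello` with content type `text/plain` become `data:text/plain;base64,aGVsbG8=`.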

There is also a URL-based endpoint:

POST /api/ocr/url

That endpoint is useful for quick cURL demos, testing public PDFs, and creating screenshots for the blog post.

Adding Spring AI 2.0 and the Mistral dependency

The backend uses Spring AI 2.0 through the Spring AI BOM and the spring-ai-mistral-ai dependency:

<dependencyManagement>
  <dependencies>
    <dependency>
      <groupId>org.springframework.ai</groupId>
      <artifactId>spring-ai-bom</artifactId>
      <version>2.0.0</version>
      <type>pom</type>
      <scope>import</scope>
    </dependency>
  </dependencies>
</dependencyManagement>

<dependency>
  <groupId>org.springframework.ai</groupId>
  <artifactId>spring-ai-mistral-ai</artifactId>
</dependency>

This is where Spring AI is useful for Java developers.

Even when the use case is specialized, like OCR, the application still stays in familiar Spring Boot territory:

configuration properties
controllers
validation
typed request and response objects
logging
error handling
HTTP clients
security boundaries

That matters because most real AI applications are not isolated scripts. They are backend systems that need to integrate with existing services, frontend applications, databases, queues, logs, and monitoring.

Calling the official Spring AI `MistralOcrApi`

One backend path is there specifically to demonstrate the official Spring AI OCR wrapper.

The SpringAiOcrClient builds a MistralOcrApi.OCRRequest, chooses DocumentURLChunk or ImageURLChunk, and calls the low-level API client directly.

That version is exposed through:

POST /api/ocr/spring-ai/url
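
The choice between a document chunk and an image chunk can be sketched without any Spring AI types. The `ChunkSelector` class below is hypothetical, and the extension list and the `data:image/` check are assumptions; a real client would then build the matching `DocumentURLChunk` or `ImageURLChunk` from the result.

```java
import java.util.List;
import java.util.Locale;

class ChunkSelector {

    enum ChunkKind { DOCUMENT_URL, IMAGE_URL }

    // Assumption: a small, illustrative list of image extensions.
    private static final List<String> IMAGE_EXTENSIONS =
            List.of(".png", ".jpg", ".jpeg", ".webp", ".gif", ".tif", ".tiff");

    /** Decides which kind of chunk a URL (or data URL) should be sent as. */
    static ChunkKind chooseFor(String url) {
        String lower = url.toLowerCase(Locale.ROOT);
        if (lower.startsWith("data:image/")) {
            return ChunkKind.IMAGE_URL;
        }
        // Drop any query string or fragment before looking at the extension.
        String path = lower.split("[?#]", 2)[0];
        for (String extension : IMAGE_EXTENSIONS) {
            if (path.endsWith(extension)) {
                return ChunkKind.IMAGE_URL;
            }
        }
        // Everything else, PDFs included, is treated as a document.
        return ChunkKind.DOCUMENT_URL;
    }
}
```

Guessing from the URL is a simplification. When the file comes from an upload, the content type reported by the multipart request is usually a more reliable signal than the file extension.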

I like keeping this path in the project because it shows the cleanest starting point for Java developers who want to try OCR with Spring AI before adding custom request handling.

It also shows a realistic development pattern:

Start with the framework abstraction, then drop down one level when you need provider-specific features.

That pattern is especially useful in AI development because provider APIs evolve quickly.

A framework abstraction can simplify the common path, but the newest model options are not always available immediately through the higher-level API.

Adding OCR 4 options with `RestClient`

The Angular UI uses the MistralAdvancedOcrClient.

This client sends a direct request to /v1/ocr with a richer request body.

The request includes fields like:

{
  "model": "mistral-ocr-4-0",
  "include_blocks": true,
  "include_image_base64": true,
  "extract_header": true,
  "extract_footer": true,
  "table_format": "html",
  "confidence_scores_granularity": "page"
}
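
On the Java side, a body like this can be assembled as a plain map and handed to the HTTP client for JSON serialization. The sketch below is not the repository's `MistralAdvancedOcrClient`: the `OcrRequestBodies` class is hypothetical, the option names are copied from the JSON above, and the shape of the `document` entry (`type` plus `document_url`) is an assumption about how the file is attached.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class OcrRequestBodies {

    /** Builds the request body for a full extraction of a single document. */
    static Map<String, Object> fullModeBody(String model, String dataUrl) {
        // Assumption: the document is passed as a URL chunk holding the data URL.
        Map<String, Object> document = new LinkedHashMap<>();
        document.put("type", "document_url");
        document.put("document_url", dataUrl);

        // LinkedHashMap keeps the keys in insertion order, which makes logs easier to read.
        Map<String, Object> body = new LinkedHashMap<>();
        body.put("model", model);
        body.put("document", document);
        body.put("include_blocks", true);
        body.put("include_image_base64", true);
        body.put("extract_header", true);
        body.put("extract_footer", true);
        body.put("table_format", "html");
        body.put("confidence_scores_granularity", "page");
        return body;
    }
}
```

With Spring's `RestClient`, a map like this could be sent with something along the lines of `restClient.post().uri("/v1/ocr").body(body).retrieve()`, leaving the JSON conversion to the configured message converters.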

In smart mode, the backend also sends:

document_annotation_prompt
document_annotation_format

The second field contains a JSON schema that asks for values such as:

summary
document_type
title
language
entities
dates
amounts
line_items
key_value_pairs
tables_summary
action_items
confidence_notes
resources
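
To make the idea concrete, here is a rough sketch of how such a schema could be assembled in Java. Everything beyond the field names is an assumption: the `AnnotationSchemas` class is hypothetical, the split between text fields and list fields is a guess, and the `json_schema` wrapper follows the general structured-output convention rather than the exact payload used in the demo.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

class AnnotationSchemas {

    static final List<String> FIELDS = List.of(
            "summary", "document_type", "title", "language", "entities", "dates",
            "amounts", "line_items", "key_value_pairs", "tables_summary",
            "action_items", "confidence_notes", "resources");

    // Assumption: these fields hold lists; the rest are plain strings.
    static final Set<String> LIST_FIELDS = Set.of(
            "entities", "dates", "amounts", "line_items",
            "key_value_pairs", "action_items", "resources");

    /** Builds a value for document_annotation_format as nested maps. */
    static Map<String, Object> annotationFormat() {
        Map<String, Object> properties = new LinkedHashMap<>();
        for (String field : FIELDS) {
            Map<String, Object> spec = new LinkedHashMap<>();
            if (LIST_FIELDS.contains(field)) {
                spec.put("type", "array");
                spec.put("items", Map.of("type", "string"));
            } else {
                spec.put("type", "string");
            }
            properties.put(field, spec);
        }

        Map<String, Object> schema = new LinkedHashMap<>();
        schema.put("type", "object");
        schema.put("properties", properties);

        Map<String, Object> jsonSchema = new LinkedHashMap<>();
        jsonSchema.put("name", "document_annotation"); // hypothetical schema name
        jsonSchema.put("schema", schema);

        Map<String, Object> format = new LinkedHashMap<>();
        format.put("type", "json_schema");
        format.put("json_schema", jsonSchema);
        return format;
    }
}
```

A real schema would likely describe `line_items` or `key_value_pairs` as arrays of objects rather than arrays of strings. The flat version is kept here only to show the overall shape.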

This is the most interesting part of the demo for me.

OCR is not the end of the pipeline anymore. OCR becomes the first step in producing structured business data.

Instead of only asking:

What text is inside this document?

we can start asking:

What kind of document is this?
What are the important entities?
Are there dates, amounts, resources, or action items?
Which parts should be reviewed by a human?
Can another system consume the result directly?

That is a much more powerful workflow.

Angular drag-and-drop upload UI

The frontend is built with Angular 22 and has a polished upload flow instead of a bare form.

For a document intelligence UI, the user experience is important. The user should be able to inspect the document, review the output, compare different views, and understand where the extracted data came from. These features have been added to the demo.

Rendering markdown, tables, images, blocks, and JSON

Once the backend returns the OCR response, the Angular app splits it into several views:

summary
markdown
tables
images
blocks
json
raw

This structure makes the output much easier to inspect.

Markdown is useful for quick reading.
Tables are useful for structured visual data.
Images are useful for richer documents.
Blocks are useful for layout-aware inspection.
JSON is useful for downstream processing.
Raw output is useful when debugging the exact OCR response shape.

The app also computes simple stats such as page count, image count, block count, and active mode.

For blocks, it supports search and type filtering, which is especially helpful when testing longer documents.

Another useful detail is the preview-to-block relationship.

Mistral OCR returns the position of each text block, so the UI can highlight that block in the preview. That is what a user should expect from a modern OCR tool:

not just “what was extracted”, but “where did it come from?”

That connection is important for review workflows, redaction, auditing, citations, and human validation.
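
The arithmetic behind such a highlight is a simple scale from page coordinates to preview coordinates. The sketch below is written in Java to stay consistent with the backend examples, although in the demo this logic belongs to the Angular UI. The `BoxScaling` class and `Box` record are hypothetical, and it assumes the bounding box is given in pixels relative to the page's top-left corner.

```java
class BoxScaling {

    /** A rectangle in pixels, measured from the top-left corner. */
    record Box(double x, double y, double width, double height) {}

    /** Scales a box from OCR page coordinates to the size of the on-screen preview. */
    static Box toPreview(Box pageBox, double pageWidth, double pageHeight,
                         double previewWidth, double previewHeight) {
        double scaleX = previewWidth / pageWidth;
        double scaleY = previewHeight / pageHeight;
        return new Box(
                pageBox.x() * scaleX,
                pageBox.y() * scaleY,
                pageBox.width() * scaleX,
                pageBox.height() * scaleY);
    }
}
```

For a page of 1000 by 2000 pixels shown in a 500 by 1000 preview, a block at (100, 200) with size 300 by 50 lands at (50, 100) with size 150 by 25.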

Clean document vs manuscript image

I tested two types of input.

The first one is a clean document. This is the happy path and a good way to verify that the application flow works end to end.

The second one is a manuscript-style image. This is a more interesting case because the layout is less predictable and the extraction is naturally harder.

This kind of test is useful because it shows why structure and confidence matter.

Limitations of this demo

This project is not a benchmark.

I tested a few representative documents to understand the developer workflow and the shape of the OCR response, but I did not run a large evaluation across many document types.

There are also a few important points to keep in mind:

Confidence scores are useful signals, but they are not a guarantee that the extracted content is correct.
Structured JSON should still be validated before it is used in a business workflow.
Sensitive documents require careful handling of privacy, retention, and compliance. Mistral offers a self-hosted option.
For production use, I would add persistence, audit logs, retry handling, rate-limit handling, and human review for low-confidence fields.
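
A human review step could start as a simple filter that queues anything blank or below a confidence threshold. This is a sketch under stated assumptions: the `ReviewQueue` class and `ExtractedField` record are hypothetical, and it assumes a confidence value between 0 and 1 is available per field. With page-level confidence, the same check would apply per page instead.

```java
import java.util.List;

class ReviewQueue {

    /** One extracted value with the confidence attached to it (0.0 to 1.0). */
    record ExtractedField(String name, String value, double confidence) {}

    /** Returns the fields a human should check: blank values or low confidence. */
    static List<ExtractedField> needsReview(List<ExtractedField> fields, double threshold) {
        return fields.stream()
                .filter(f -> f.value() == null
                        || f.value().isBlank()
                        || f.confidence() < threshold)
                .toList();
    }
}
```

Passing the threshold does not make a value correct. It only decides which fields are reviewed first, so the validated output still needs schema and business-rule checks before it reaches a downstream system.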

The value of the demo is that it shows how OCR can become the first step in a safer document ingestion pipeline.

Possible real-world use cases

This kind of workflow can be used in many applications:

Document ingestion
Enterprise search
RAG pipelines
Contract review
Invoice extraction
Receipt processing
Product sheet ingestion
Research paper analysis
Old letter archiving
Compliance workflows
Human-in-the-loop validation

The common idea is the same:

Raw document
  -> structured extraction
  -> review
  -> validated data
  -> downstream system

That downstream system could be a database, a search engine, a vector index, a workflow engine, or another backend service.

This is why I think OCR is becoming more interesting for application developers.

It is no longer only about converting scanned pages into text. It is becoming part of the data ingestion layer.

Final thoughts

This demo started as a small experiment, but it confirmed a bigger idea:

OCR is becoming structured data.

Plain text extraction is still useful, but it is not the full story anymore.

The real value comes when OCR can return structure, location, confidence, and data that applications can process.

With Mistral OCR 4, Spring AI 2.0, Spring Boot, and Angular, we can build workflows where documents are not just uploaded and stored. They can be understood, transformed, reviewed, indexed, and connected to other systems.
