Redproof: AI red-teaming & EU AI Act security testing

The clock that's already running

Aug 2
2026

Most of the EU AI Act becomes operative for high-risk and general-purpose AI systems. By then adversarial testing is no longer a nice-to-have. Procurement teams in regulated sectors already ask for a red-team report before they sign anything. Around 3,200 Dutch businesses fall directly in scope, and the work itself takes weeks. The real risk is leaving it too late.

What we test

We test the whole system, not just the model.

The base model is rarely the weak point. The real gaps sit in your prompts, your retrieval pipeline, and the tools your agent is allowed to call. That is where we go looking.

LLM01

Prompt injection

Direct and indirect, including multi-turn jailbreaks and payloads buried in the documents, web pages, and tool output your agent quietly trusts.

LLM02

Sensitive information disclosure

Coaxing the system into leaking secrets, training data, internal errors, or other users' information.

LLM05

Improper output handling

Unsafe or unescaped output — XSS, SSRF, leaked stack traces — that the app around the model trusts and renders.

LLM06

Excessive agency

Getting an agent to call APIs, move money, or take actions it was never meant to take.

LLM07

System prompt leakage

Extracting your system prompt and the tool schema that is supposed to protect it.

Bespoke

Business-logic abuse

The exploits specific to your product — your pricing, your workflow, your data boundaries — beyond the OWASP checklist.

How an engagement runs

A fixed, predictable sequence.

Scope

We agree the target system, the threat model, and the rules of engagement, all in writing.

Attack

Automated breadth, then manual depth where the real findings hide.

Triage

Each finding ranked by severity, with a working proof of exploit.

Report

Plain-language findings mapped to OWASP LLM and the relevant EU AI Act articles.

Re-test

You patch, we re-run, and your evidence shows the fix held.

Pricing

Fixed-scope packages. No "contact sales" maze.

Most teams start with a Full Engagement, then move to a quarterly retainer as the product changes.

Baseline

Baseline Scan

from €3,500

Automated testing of one AI feature (for example, your chatbot or document summarizer)
Findings report with severity
3–5 days

Most start here

Full Engagement

from €8,000

Automated + manual custom attacks
OWASP LLM + EU AI Act mapping
Remediation guidance
2–3 weeks

Agents

Agent Engagement

from €15,000

Full engagement on a tool-using agent
Tool-misuse & action-safety testing
Re-test included
3–4 weeks

Ongoing

Retainer

from €1,500 / quarter

Re-test as your system changes
New-attack coverage
Always-current evidence

Enterprise vendors start around €15k for one engagement, and a junior often does the actual testing. With Redproof the person who understands your system is the person testing it. Priced for the stage you are at, not theirs.

Who's behind it

Not a scanner with a logo.

Redproof is the practice of a production AI engineer who builds and evaluates large models for a living. Most security shops point a tool at your endpoint and email you the printout. Redproof works the way an attacker actually does, because building and breaking these systems is the day job here, not a side service.

The person who scopes your engagement is the person who runs the attacks and writes the report. No handoff to a junior, no account manager in the middle. As the work grows, Redproof brings in vetted specialists for the larger jobs, but the standard holds on every test: senior hands, start to finish.

Break it in private.
Prove it in public.