Your portfolio is invisible to AI — and Cloudflare is why
How two default settings — Cloudflare blocking AI crawlers and Laravel session middleware on every route — made gabana.dev uncacheable and invisible to every AI answer engine. What I found, why it happens, and the exact fix.
Last week I opened my Cloudflare dashboard and found something that stopped me cold.
Under "Control AI crawlers" — a panel I had never noticed — there was a toggle set to Block on all pages. Every AI crawler that had tried to index gabana.dev over the past months had been silently turned away at the door. Perplexity. GPTBot. ClaudeBot. Gemini. All of them.
My portfolio had been invisible to every AI answer engine. By default.
I fixed it that morning. Then I fixed something else that had been quietly killing my site's performance. This is the full story of both — because I think most developers building portfolios have the same two problems and have no idea.
The AI indexing problem
When someone asks Perplexity "who builds SaaS for gaming lounges in Nairobi?" or ChatGPT "Laravel product engineers in East Africa" — how does the AI know what to say? The same way Google does: it crawls the web, reads pages, and builds an index.
AI engines have their own crawlers. OpenAI sends GPTBot. Anthropic sends ClaudeBot. Perplexity sends PerplexityBot. Google uses GoogleOther for its AI products. These bots visit your site, read your content, and that reading is what makes you citable when someone asks a relevant question.
Cloudflare, by default, blocks them all.
The setting is called "Block AI training bots" and it ships enabled. The intention is reasonable — it stops companies from scraping your content to train their models without permission. But there's a critical distinction Cloudflare collapses into one toggle:
- Training crawlers — bots that harvest content to train AI models. Blocking these is a legitimate choice.
- Indexing crawlers — bots that read your content so AI engines can cite you in answers. Blocking these makes you invisible.
Cloudflare's default blocks both. If you want to be found by AI answer engines, you have to explicitly allow the indexing crawlers through.
The fix: three things, in order
1. Turn off the block in Cloudflare. Dashboard → Overview → "Block AI training bots" → change to "Do not block (allow crawlers)". Also set "Manage your robots.txt" to "Disable robots.txt configuration" — otherwise Cloudflare silently adds noai directives to your robots.txt, overriding whatever you wrote there.
2. Update your robots.txt to explicitly welcome them. Not mentioning a crawler is not the same as welcoming it. Be explicit:
User-agent: GPTBot
Allow: /

User-agent: ClaudeBot
Allow: /

User-agent: PerplexityBot
Allow: /

User-agent: GoogleOther
Allow: /

User-agent: anthropic-ai
Allow: /

User-agent: cohere-ai
Allow: /
3. Make sure your content is actually worth indexing. The crawlers getting through is just step one. What they find when they arrive determines whether AI engines cite you. Structured content, clear answers to specific questions, and JSON-LD schema markup are what make a page citable rather than just readable. But that's a separate piece.
The caching problem hiding underneath
While auditing the site I checked the response headers on gabana.dev:
cache-control: no-cache, private
set-cookie: XSRF-TOKEN=...
set-cookie: gabana-session=...
cf-cache-status: DYNAMIC
Every page. Including the work case studies and insights articles — pages with no forms, no user accounts, nothing dynamic whatsoever.
The cause: Laravel's web middleware stack runs on every route by default. StartSession starts a session, sets two cookies, and marks the response no-cache, private. Cloudflare sees the cookies and the no-cache directive and marks the response DYNAMIC — meaning it fetches fresh from the server for every single visitor.
A portfolio site serving identical HTML to every visitor was hitting the origin server on every request. Cache rate: 26%.
The fix: surgical middleware removal
The right fix isn't a Cloudflare workaround. It's stopping Laravel from creating sessions for pages that don't need them.
In Laravel 11, you can exclude specific middleware from specific routes:
Route::withoutMiddleware([
    \Illuminate\Session\Middleware\StartSession::class,
    \Illuminate\Cookie\Middleware\AddQueuedCookiesToResponse::class,
    \Illuminate\Foundation\Http\Middleware\ValidateCsrfToken::class,
    \Illuminate\View\Middleware\ShareErrorsFromSession::class,
])->middleware('public.cache')->group(function () {
    Route::get('/work', [HomeController::class, 'workIndex']);
    Route::get('/work/{slug}', [HomeController::class, 'caseStudyShow']);
    Route::get('/insights', [InsightsController::class, 'index']);
    Route::get('/insights/{slug}', [InsightsController::class, 'show']);
});
The public.cache middleware is a small class that sets the right cache headers:
$response->setPublic();
$response->setSharedMaxAge(14400); // Cloudflare edge: 4 hours
$response->setMaxAge(300); // Browser: 5 minutes
$response->headers->addCacheControlDirective('stale-while-revalidate', 60);
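For reference, the whole middleware is only a few lines more than that. Here's a sketch of how it might look — the class name `PublicCache` and the GET-only guard are my choices, not something the original spelled out:

```php
<?php

namespace App\Http\Middleware;

use Closure;
use Illuminate\Http\Request;
use Symfony\Component\HttpFoundation\Response;

class PublicCache
{
    public function handle(Request $request, Closure $next): Response
    {
        $response = $next($request);

        // Only mark successful GET responses as cacheable; leave
        // errors and other methods with Laravel's defaults.
        if ($request->isMethod('GET') && $response->isSuccessful()) {
            $response->setPublic();
            $response->setSharedMaxAge(14400); // Cloudflare edge: 4 hours
            $response->setMaxAge(300);         // Browser: 5 minutes
            $response->headers->addCacheControlDirective('stale-while-revalidate', 60);
        }

        return $response;
    }
}
```

In Laravel 11 the `public.cache` alias would be registered in bootstrap/app.php, inside `->withMiddleware()`, with `$middleware->alias(['public.cache' => \App\Http\Middleware\PublicCache::class]);`.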
One important note: Laravel 11 uses ValidateCsrfToken, not VerifyCsrfToken. Using the wrong class name causes the exclusion to silently fail — the middleware runs anyway and the cookies still get set. I lost an hour to this.
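A small feature test guards against exactly that silent failure. This is a sketch using Laravel's HTTP test helpers, assuming a standard feature-test setup (the test class name is mine):

```php
<?php

namespace Tests\Feature;

use Tests\TestCase;

class PublicCacheTest extends TestCase
{
    public function test_work_index_sets_no_cookies_and_is_public(): void
    {
        $response = $this->get('/work');

        $response->assertOk();

        // If StartSession still runs, the XSRF-TOKEN and session
        // cookies reappear here and this assertion fails.
        $this->assertEmpty($response->headers->getCookies());

        // The response should be publicly cacheable, not no-cache, private.
        $this->assertStringContainsString(
            'public',
            $response->headers->get('cache-control')
        );
    }
}
```

Had this test existed, the wrong CSRF class name would have failed CI instead of costing an hour of header-squinting.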
With the session middleware removed from read-only routes, the response headers became:
cache-control: public, max-age=300, s-maxage=14400, stale-while-revalidate=60
cf-cache-status: MISS ← first visitor
cf-cache-status: HIT ← everyone after
Then a single Cloudflare Cache Rule to make it cache HTML (Cloudflare doesn't cache HTML by default — only static assets):
- When: URI path starts with /work or /insights
- Cache eligibility: Eligible for cache
- Edge TTL: 14400 seconds (4 hours)
- Browser TTL: 300 seconds (5 minutes)
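If you build the rule in Cloudflare's custom-expression editor instead of the field picker, the equivalent filter expression should look roughly like this (using the `starts_with` function and `http.request.uri.path` field from Cloudflare's Rules language, with the same two paths):

```
(starts_with(http.request.uri.path, "/work")) or (starts_with(http.request.uri.path, "/insights"))
```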
What this means for AEO specifically
The two problems connect. AI crawlers don't just need permission to visit — they need to be able to read the page quickly and completely. A site returning DYNAMIC responses with no caching is slower and less reliable for crawlers. A cached, fast, openly-accessible page is more likely to be fully indexed.
Beyond that: AI engines weight pages they can crawl consistently. A page that's sometimes slow, sometimes cookie-gated, sometimes blocked by a WAF rule is a lower-quality signal than one that returns a clean 200 with public cache headers every time.
The crawlers are trying to understand who you are and what you know. Make it easy for them.
The defaults assumption
What I keep thinking about is how many developers have this exact configuration and don't know it. Cloudflare is the default choice for DNS and CDN. Laravel is the default choice for PHP backends. Both have shipped with settings that, in combination, make your site uncacheable and invisible to AI engines.
Neither setting is wrong in isolation. Blocking AI training crawlers is a reasonable default for a production app. Session middleware on all routes is a reasonable default for a web framework. The problem only appears when you look at the full stack and ask: what is actually happening when a request comes in?
That question — what is actually happening — is the one worth asking about every layer of your infrastructure. The answer is usually more interesting than the defaults suggest.