# SEO & AI Visibility Checklist

> A comprehensive implementation contract for maximum visibility to AI agents, search engines, social media crawlers, and automated verification bots.

---

## Architecture

- Build landing pages with a static site generator (e.g. Astro, Hugo, Next.js static export)
- Pre-render all public pages to pure HTML at build time
- No content should require JavaScript to be visible
- JavaScript may be used for progressive enhancement only
- Progressive enhancement scripts must live inside the document (no scripts after `</html>`)

---

## 1. robots.txt

Create a `robots.txt` at your site root with explicit allow blocks for all major crawlers:

```
User-agent: *
Allow: /

User-agent: Googlebot
Allow: /

# Controls whether content is used for Gemini AI training — not a traditional crawler
User-agent: Google-Extended
Allow: /

User-agent: GPTBot
Allow: /

User-agent: OAI-SearchBot
Allow: /

User-agent: ChatGPT-User
Allow: /

User-agent: ClaudeBot
Allow: /

User-agent: Claude-User
Allow: /

User-agent: Claude-SearchBot
Allow: /

# Meta's AI training crawler — accounts for ~50% of all AI crawling traffic
User-agent: Meta-ExternalAgent
Allow: /

# Controls whether content is used for Apple Intelligence AI training
User-agent: Applebot-Extended
Allow: /

User-agent: PerplexityBot
Allow: /

User-agent: Perplexity-User
Allow: /

User-agent: Bytespider
Allow: /

User-agent: CCBot
Allow: /

User-agent: cohere-ai
Allow: /

Sitemap: https://<your-domain>/sitemap.xml
```

---

## 2. llms.txt

Serve an AI content discovery file at `/llms.txt` following the spec at [llmstxt.org](https://llmstxt.org/).

Required:
- H1 with your project or company name (this is the **only required element**)

Optional (recommended):
- Blockquote with a one-line summary
- General content paragraphs or lists with background information
- H2-delimited sections containing link lists in `[Name](URL): Description` format
- A `## Optional` section for lower-priority resources (this is the only H2 name the spec assigns special meaning to)

The spec does not prescribe other H2 section names — choose names that fit your site (e.g. `## Products`, `## Documentation`, `## API Reference`).

Generate this from your product data source so it stays in sync automatically.

---

## 3. JSON-LD Structured Data

Embed three schemas in `<head>`:

### Organization
```json
{
  "@context": "https://schema.org",
  "@type": "Organization",
  "@id": "https://<your-domain>/#organization",
  "name": "<Your Company>",
  "url": "https://<your-domain>/",
  "logo": { "@type": "ImageObject", "url": "https://<your-domain>/logo.png" },
  "description": "<Your description>",
  "sameAs": ["<social-url-1>", "<social-url-2>"],
  "contactPoint": { "@type": "ContactPoint", "email": "<your-email>", "contactType": "Customer Support" },
  "knowsAbout": ["<domain-1>", "<domain-2>"]
}
```

### WebSite
```json
{
  "@context": "https://schema.org",
  "@type": "WebSite",
  "name": "<Your Company>",
  "url": "https://<your-domain>/",
  "publisher": { "@id": "https://<your-domain>/#organization" }
}
```

### ItemList (homepage only)
```json
{
  "@context": "https://schema.org",
  "@type": "ItemList",
  "numberOfItems": <count>,
  "itemListElement": [
    {
      "@type": "ListItem",
      "position": 1,
      "item": {
        "@type": "SoftwareApplication",
        "name": "<Product Name>",
        "url": "<product-url>",
        "applicationCategory": "<category>",
        "description": "<description>",
        "creator": { "@id": "https://<your-domain>/#organization" }
      }
    }
  ]
}
```

Requirements:
- Organization and WebSite on every public page
- ItemList on homepage only
- Cross-reference with `@id`
- Keep `numberOfItems` data-driven
- Add `alternateName` to Organization for name disambiguation (e.g. variations of your company name)
- Add `legalName` to Organization
- Validate at [validator.schema.org](https://validator.schema.org/) and [Google Rich Results Test](https://search.google.com/test/rich-results)

### BreadcrumbList

Add to every page except the homepage:

```json
{
  "@context": "https://schema.org",
  "@type": "BreadcrumbList",
  "itemListElement": [
    { "@type": "ListItem", "position": 1, "name": "Home", "item": "https://<your-domain>/" },
    { "@type": "ListItem", "position": 2, "name": "<Page Name>", "item": "https://<your-domain>/<page>" }
  ]
}
```

BreadcrumbList is one of Google's actively supported rich result types and generates visible breadcrumb trails in desktop search results.

### SiteNavigationElement
```json
{
  "@context": "https://schema.org",
  "@type": "SiteNavigationElement",
  "name": ["Products", "About", "Contact"],
  "url": [
    "https://<your-domain>/products",
    "https://<your-domain>/about",
    "https://<your-domain>/contact"
  ]
}
```

This provides machine-readable navigation structure. Note: Google generates sitelinks algorithmically based on site structure, internal linking, and user behavior — this schema may provide a supplementary signal but is not guaranteed to influence sitelinks.

### FAQPage (homepage)
```json
{
  "@context": "https://schema.org",
  "@type": "FAQPage",
  "mainEntity": [
    {
      "@type": "Question",
      "name": "What is <Your Company>?",
      "acceptedAnswer": {
        "@type": "Answer",
        "text": "<Your answer>"
      }
    }
  ]
}
```

Requirements:
- FAQ content must be **visible on the page** (not just in schema)
- Use `<details>` / `<summary>` elements for expandable FAQ sections
- Include questions about your company, products, and common queries
- Add a question for each product with a link to try it

> **Note:** Since August 2023, Google restricts FAQ rich results to well-known government and health authority websites. For other sites, FAQPage schema will not generate visible rich results in Google, but may still help AI systems and other search engines understand your content.

### SoftwareApplication (per-product pages)
```json
{
  "@context": "https://schema.org",
  "@type": "SoftwareApplication",
  "name": "<Product Name>",
  "description": "<Product description>",
  "applicationCategory": "<Category>Application",
  "url": "<product-url>",
  "creator": { "@id": "https://<your-domain>/#organization" }
}
```

Create individual product pages at `/products/<slug>` with per-product schema.

---

## 4. Product Pages

Create a `/products` index page listing all products, and individual pages for each product:

- `/products` — overview of all products grouped by category
- `/products/<product-slug>` — dedicated page per product with:
  - Product name, description, status (live/in development)
  - Category and type
  - Link to the product's website
  - SoftwareApplication JSON-LD schema
  - Back links to `/products` and homepage

Requirements:
- Generate pages from your product data source (single source of truth)
- Link to `/products` from your homepage footer
- Include all product pages in `sitemap.xml`
- Each product page must have unique `<title>`, `<meta description>`, and canonical URL

---

## 5. Sitemap

Create a `sitemap.xml` at your site root listing all indexable pages:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>https://<your-domain>/</loc>
    <lastmod>YYYY-MM-DD</lastmod>
  </url>
</urlset>
```

Requirements:
- Maximum 50,000 URLs and 50MB per sitemap file
- Only include canonical, indexable URLs (200 status, no `noindex`)
- Use accurate `<lastmod>` dates (only update when content actually changes)
- Google ignores `<priority>` and `<changefreq>` — do not include them
- For large sites (50,000+ URLs), use a sitemap index file
- Generate dynamically from your page/data source when possible
- Reference your sitemap in `robots.txt`

---

## 6. Open Graph Meta Tags

Required on every public page:

```html
<meta property="og:type" content="website" />
<meta property="og:url" content="https://<your-domain>/<page>" />
<meta property="og:site_name" content="<Your Company>" />
<meta property="og:locale" content="en_US" />
<meta property="og:title" content="<Page Title>" />
<meta property="og:description" content="<Page Description>" />
<meta property="og:image" content="https://<your-domain>/og-image.png" />
<meta property="og:image:width" content="1200" />
<meta property="og:image:height" content="630" />
<meta property="og:image:type" content="image/png" />
<meta property="og:image:alt" content="<Descriptive alt text>" />
```

Requirements:
- `og:url`, `og:title`, `og:description` must be page-specific
- `og:image` must be exactly 1200x630px
- Use absolute HTTPS URLs for `og:image`
- Specify explicit dimensions so platforms don't need to probe the image
- For blog posts or articles, use `og:type` of `article` instead of `website`

---

## 7. Twitter/X Card Meta Tags

Use `name=` attributes, NOT `property=`. Twitter's parser does fall back to Open Graph tags when Twitter-specific tags are missing.

```html
<meta name="twitter:card" content="summary_large_image" />
<meta name="twitter:title" content="<Page Title>" />
<meta name="twitter:description" content="<Page Description>" />
<meta name="twitter:image" content="https://<your-domain>/og-image.png" />
<meta name="twitter:image:alt" content="<Descriptive alt text>" />
<meta name="twitter:site" content="@<your-handle>" />
<meta name="twitter:creator" content="@<your-handle>" />
```

---

## 8. Essential Head Tags

Every public page must include:

```html
<!DOCTYPE html>
<html lang="en">
<head>
  <meta charset="UTF-8" />
  <meta name="viewport" content="width=device-width, initial-scale=1.0" />
  <link rel="canonical" href="https://<your-domain>/<page>" />
  <link rel="icon" type="image/svg+xml" href="/favicon.svg" />
  <link rel="apple-touch-icon" href="/apple-touch-icon.png" />
  <meta name="description" content="<page description>" />
  <meta name="author" content="<Your Company>" />
  <meta name="theme-color" content="<#hex-color>" />
  <title><Page Title></title>
</head>
```

Notes:
- `<meta charset="UTF-8">` must be within the first 1024 bytes of the document
- `<link rel="canonical">` must use an absolute HTTPS URL matching the page's own URL
- `<meta name="robots" content="index, follow">` is optional — this is the default behavior
- Use `<meta name="robots" content="noindex">` for pages that should not be indexed (utility pages, search results, staging environments)
- Do not use `user-scalable=no` or `maximum-scale=1` in the viewport — users must be able to zoom
- `<meta name="keywords">` is ignored by Google (since 2009) and not used for ranking by Bing — omit it

---

## 9. Semantic HTML

- Single `<h1>` per page
- `<h2>` for sections, `<h3>` for items
- Proper landmarks: `<nav>`, `<main>`, `<header>`, `<section>`, `<article>`, `<footer>`
- `aria-label` on navigation and key sections
- `aria-hidden="true"` on decorative-only elements
- `rel="noopener"` on external links with `target="_blank"`
- Fully responsive layout

---

## 10. Performance

- No framework runtime required for public-page content rendering
- No JavaScript dependency for visible content
- Add `loading="lazy"` to below-fold images (do NOT lazy-load the hero/LCP image)
- Add `fetchpriority="high"` to the hero/LCP image
- Add `<link rel="preconnect">` for critical third-party origins
- Add `<link rel="preload">` for critical fonts and above-fold assets
- If using Google Fonts, add `&display=swap` to the URL
- Use `defer` or `async` on non-critical scripts; prefer `type="module"`
- Enable gzip or brotli compression on your server
- Minimize DOM size (aim for under 800 elements; investigate if over 1400)
- Monitor Core Web Vitals:
  - **LCP** (Largest Contentful Paint): under 2.5s
  - **CLS** (Cumulative Layout Shift): under 0.1
  - **INP** (Interaction to Next Paint): under 200ms — TBT (Total Blocking Time) is a lab-only proxy used by Lighthouse
- Test at [PageSpeed Insights](https://pagespeed.web.dev/)

---

## 11. Accessibility

- Add a skip-to-content link as the first focusable element: `<a href="#main" class="skip-link">Skip to content</a>`
- All form inputs must have associated `<label>` or `aria-label`
- Tap targets must be at least 44x44px on touch devices
- Do not use `tabindex` values greater than 0
- All `<button>` elements must have explicit `type="button"` or `type="submit"`
- All `<video>` elements must have `<track kind="captions">`
- Do not autoplay media without `muted`
- All `<table>` elements should have `<caption>` and `<th scope="col">`
- Animations must respect `prefers-reduced-motion`
- Interactive surfaces must maintain WCAG AA contrast (4.5:1 for normal text, 3:1 for large text)

---

## 12. Multilingual / International

If your site has localized versions:

```html
<link rel="alternate" hreflang="en" href="https://<your-domain>/en/<page>" />
<link rel="alternate" hreflang="es" href="https://<your-domain>/es/<page>" />
<link rel="alternate" hreflang="x-default" href="https://<your-domain>/<page>" />
```

Requirements:
- Every page in the hreflang set must reference itself
- Include `hreflang="x-default"` as a fallback
- Use valid ISO 639-1 language codes
- Add `dir="rtl"` to `<html>` for right-to-left languages

---

## 13. PWA & App Metadata

```html
<link rel="manifest" href="/manifest.json" />
<meta name="theme-color" content="<#hex-color>" />
<link rel="apple-touch-icon" href="/apple-touch-icon.png" />
```

If creating a `manifest.json`:
```json
{
  "name": "<Your App Name>",
  "short_name": "<Short>",
  "start_url": "/",
  "display": "standalone",
  "background_color": "<#hex>",
  "theme_color": "<#hex>",
  "icons": [
    { "src": "/icon-192.png", "sizes": "192x192", "type": "image/png" },
    { "src": "/icon-512.png", "sizes": "512x512", "type": "image/png" }
  ]
}
```

---

## 14. Security Headers

Set via your hosting platform (e.g. Cloudflare `_headers` file, Vercel `vercel.json`, Netlify `_headers`):

```
/*
  X-Content-Type-Options: nosniff
  X-Frame-Options: DENY
  Referrer-Policy: strict-origin-when-cross-origin
  Permissions-Policy: camera=(), microphone=(), geolocation=()
  Strict-Transport-Security: max-age=31536000; includeSubDomains
```

> **Warning about HSTS `preload`:** Adding `preload` submits your domain to a browser-hardcoded list that is nearly irreversible (removal takes months). Only add it after confirming all subdomains support HTTPS. See [hstspreload.org](https://hstspreload.org/).

For Content-Security-Policy, start strict and loosen only as needed:

```
  Content-Security-Policy: default-src 'self'; script-src 'self'; style-src 'self' 'unsafe-inline'; img-src 'self' https: data:; font-src 'self'; frame-ancestors 'none'
```

> Note: Avoid `'unsafe-inline'` for `script-src` where possible. Use nonce-based or hash-based CSP instead. `'unsafe-inline'` for `style-src` is often necessary for frameworks that inject styles. If you use third-party scripts (analytics, tag managers, chat widgets), add their origins to `script-src`.

Also set:
- Immutable cache on fingerprinted assets (e.g. `/_astro/*`, `/_next/static/*`)
- Explicit `Content-Type` for `llms.txt`, `robots.txt`, `sitemap.xml`
- Appropriate cache durations for images and static assets

---

## 15. Required Assets

| Asset | Purpose | Dimensions |
|-------|---------|------------|
| `robots.txt` | Crawler access control | — |
| `llms.txt` | AI agent content discovery | — |
| `sitemap.xml` | Search engine page discovery | — |
| `og-image.png` | Social media preview image | 1200x630 |
| `favicon.svg` | Browser tab icon | — |
| `apple-touch-icon.png` | iOS home screen icon | 180x180 |
| `manifest.json` | PWA manifest (optional) | — |
| Logo PNG | Press/download logo | — |

---

## Verification Checklist

### Before deploy
1. Run your build command and confirm it succeeds
2. Inspect the built HTML — all content must be present in raw HTML without JavaScript
3. Confirm no `<script>` is emitted after `</html>` in built output
4. Verify `robots.txt`, `llms.txt`, and `sitemap.xml` in the build output
5. Confirm `og-image.png` is exactly 1200x630
6. Disable JavaScript in browser and confirm all public pages remain fully readable
7. Test `prefers-reduced-motion: reduce` — all animations and marquees must stop
8. Verify page-specific metadata differs between pages (title, description, canonical, OG tags)
9. Validate JSON-LD at [validator.schema.org](https://validator.schema.org/)
10. Validate structured data at [Google Rich Results Test](https://search.google.com/test/rich-results)

### After deploy
1. `curl -s https://<your-domain>/` — all content visible in raw HTML
2. `curl -s https://<your-domain>/robots.txt` — all AI bots allowed
3. `curl -s https://<your-domain>/llms.txt` — formatted markdown
4. `curl -s https://<your-domain>/sitemap.xml` — all pages listed
5. Verify the site is on HTTPS and HTTP redirects to HTTPS
6. Verify canonical URL matches the actual URL (no www/non-www mismatch)
7. Test LinkedIn preview with [Post Inspector](https://www.linkedin.com/post-inspector/)
8. Test Twitter/X card preview
9. Share URL on WhatsApp, Discord, Slack, iMessage and verify rich unfurls
10. Run [PageSpeed Insights](https://pagespeed.web.dev/) and verify Core Web Vitals pass
11. Submit sitemap in [Google Search Console](https://search.google.com/search-console)
12. Submit sitemap in [Bing Webmaster Tools](https://www.bing.com/webmasters)
13. Use URL Inspection tool in Search Console to verify indexing
14. Purge CDN/Cloudflare cache if metadata changed

---

*Generated by the [SourceStrongAI SEO Audit Tool](https://sourcestrongai.com/seo)*
