
25 June, 2026

SitecoreAI Pathway — Migrating from a Sitecore Website (Step-by-Step Walkthrough)

In my previous post, I compared SitecoreAI Pathway with the old XM to XM Cloud Migration Tool at a high level. In this post, I am going deeper — walking through the Sitecore Website migration path step by step, with screenshots from my actual testing.

This is the path most Sitecore developers and administrators will follow when migrating from an existing Sitecore XM or XP instance to SitecoreAI (XM Cloud). It involves extracting source data using the XMComponentExtraction tool, exporting your target site structure, and letting Pathway's AI handle the content audit, mapping, and migration.

I am sharing this as someone still exploring the tool — these are my hands-on findings, not expert advice. If you spot something I have missed or interpreted differently, I would love to hear from you.

Prerequisites Before You Begin

Before starting the migration in Pathway, make sure these are in place:

  • Target SitecoreAI site structure — Your target environment should already have templates, renderings, component templates, and page designs configured. Pathway maps content to existing structures; it does not create new ones.
  • Media assets migrated — All in-scope media should be moved from source to target first. You can use the XM to XM Cloud Migration Tool for this, as Pathway does not handle media library migration for the Sitecore website path.
  • .NET 9.0 Runtime — Required on the machine where you will run the XMComponentExtraction console app.
  • PowerShell 7+ — Required for running the export-structure.ps1 script.
  • CM access — You need access to the source Sitecore CM instance for deploying the extraction handler and running the tool.

Step 1 — Select CMS (Sitecore Website)

After installing Pathway from the Sitecore Cloud Portal Marketplace, click on it from the Apps section on your Home page. You will land on the migration creation screen.

Select "Sitecore website" as your source. The description says "Extract content from a Sitecore XM website." Give your migration a name and description, then click Next.

Step 1 — Selecting "Sitecore website" as the source and entering migration details

Notice there is also an "Any website" option here — that is the web crawling path I will cover in the next blog post.

Step 2 — Configure SitecoreAI Instance

Step 2 is the most involved part of the setup. There are three things happening on this screen:

1. Configure target environment — On the left, select your target SitecoreAI environment and site. The site path auto-populates based on your selection and gets locked after you initialize the migration.

2. Upload target SitecoreAI structure — In the middle section, upload your xmc-structure.json file. This tells Pathway what templates, renderings, and page designs exist in your target environment so it knows what to map to.

3. Generate storage for source data — Below that, Pathway generates an Azure Blob Storage URL. This is where your extracted source data will be uploaded. Copy this URL — you will need it when running the XMComponentExtraction tool.

Step 2 — Configuration screen with target environment, structure upload, and Azure Blob Storage URL

Click "Migration Initialized" to start. You will see a confirmation message "Migration initialized successfully." The Migration Summary panel on the right shows all the details.

When you click "Read instruction here" next to the SitecoreAI Structure section, a popup appears with instructions to download the CMSExportStructure package.

Download CMSExportStructure package — this generates the xmc-structure.json file

Preparing the Target Structure (xmc-structure.json)

This is a step that happens outside of Pathway. You need to generate the xmc-structure.json file that describes your target site's structure.

The approach I followed:

  1. Create a Sitecore package from the target SitecoreAI environment containing the relevant items — Components, Pages, Renderings, and Presentation.
  2. Download and extract the package zip file.
  3. In the extracted folders, you will see GUID-named subfolders — this is the standard Sitecore serialization format. Each folder represents an item.
  4. Place these in the corresponding folders expected by the export script.
  5. Run the PowerShell script:
     pwsh -NoProfile -ExecutionPolicy Bypass -File .\export-structure.ps1
Extracted folder structure — Components, Pages, Presentation, Renderings with GUID-named item folders

The script reads these serialized items and outputs the xmc-structure.json file. Here is what the JSON structure looks like:

xmc-structure.json — contains Pages, Renderings, and PageDesignMappings from the target site

The JSON contains four main sections: Pages (page templates with their IDs and fields), Renderings (components like "Related Blog Articles" and "Reviews" with their template types), PageDesignMappings (linking page templates to page designs), and PageDesigns.
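
Before uploading, a quick sanity check of the generated file can save a round trip. The sketch below assumes the four top-level keys are spelled exactly as listed above and that each one holds a collection; the real file may be shaped differently. missingSections is a hypothetical helper, not part of the CMSExportStructure package.

```javascript
// Hypothetical helper: the key names come from the section list above and
// may not match the real xmc-structure.json exactly.
const EXPECTED_SECTIONS = ["Pages", "Renderings", "PageDesignMappings", "PageDesigns"];

// Returns the expected sections that are absent or empty in the parsed JSON.
function missingSections(structure) {
  return EXPECTED_SECTIONS.filter((name) => {
    const section = structure ? structure[name] : undefined;
    if (section === undefined || section === null) return true;
    // Treat empty arrays and empty objects as missing.
    const size = Array.isArray(section) ? section.length : Object.keys(section).length;
    return size === 0;
  });
}

// Inline sample for illustration; in practice, pass JSON.parse(fileContents).
const sample = { Pages: [{ Name: "Page" }], Renderings: [], PageDesignMappings: [{}] };
console.log(missingSections(sample)); // → ["Renderings", "PageDesigns"]
```

An empty result means all four sections are present and non-empty; anything else is worth fixing before the upload.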

Alternatively, if you are using Sitecore Content Serialization (SCS), you can serialize items directly instead of creating a Sitecore package.

Once the JSON is ready, upload it back in the Pathway configuration screen. You will see a confirmation: "SitecoreAI structure file uploaded successfully."

Structure uploaded successfully — note the confirmation message and the Migration Summary showing "Uploaded" status

Extracting Source Data — XMComponentExtraction

Now for the source side. The XMComponentExtraction tool extracts page data as JSON from your Sitecore XM/XP instance. It supports On-Prem, PaaS, and Container environments.

The tool has two parts:

  • ExtractorHandler.ashx — A handler that exposes endpoints using the Sitecore Item API on the CM server.
  • Extractor.App.Con.exe — A .NET 9.0 console application that calls the handler and uploads extracted data to Azure Blob Storage.

Important prerequisite: Before running the extraction, update the Sitecore.Services.SecurityPolicy setting in .\App_Config\Sitecore\Services.Client\Sitecore.Services.Client.config to ServicesOnPolicy. This grants access to Entity and Item Services that the handler needs. Remember to revert to ServicesLocalOnlyPolicy after extraction is complete — this is a security consideration.

To install the handler, copy ExtractorHandler.ashx to your CM's wwwroot/sitecore/admin directory. Verify it works by navigating to: /sitecore/admin/ExtractorHandler.ashx?action=getallsitenames
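
If you want to script that check, the URL can be built as in the sketch below. The https scheme is an assumption, and getallsitenames is the only action named in this walkthrough. extractorHandlerUrl is a hypothetical helper.

```javascript
// Builds the handler URL for a given action.
// Assumption: the CM is served over https; adjust the scheme if yours is not.
function extractorHandlerUrl(cmHostName, action) {
  const url = new URL("/sitecore/admin/ExtractorHandler.ashx", `https://${cmHostName}`);
  url.searchParams.set("action", action);
  return url.toString();
}

console.log(extractorHandlerUrl("sc1041cm.dev.local", "getallsitenames"));
// → https://sc1041cm.dev.local/sitecore/admin/ExtractorHandler.ashx?action=getallsitenames
```

Opening the printed URL in a browser session that is logged in to the CM is enough to confirm the handler responds. The response format is not covered here, so the sketch does not try to parse it.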

Then run the console app with the required parameters:

.\Extractor.App.Con.exe --cmHostName=sc1041cm.dev.local --userName=admin --password=b --uploadUrl="<Azure Blob SAS URL from Pathway>"

The tool prompts you to select the site name and enter the language code. Then it processes the items and uploads them to Azure Blob Storage.
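
Because the upload target is a SAS URL, it may be worth checking that it has not expired before you start a long extraction. The sketch below assumes the URL from Pathway carries the standard Azure se (signed expiry) query parameter, which I have not verified for Pathway specifically. sasExpiry and sasIsUsable are hypothetical helpers, and the example URL is made up.

```javascript
// Reads the "se" (signed expiry) query parameter from a SAS URL.
// Returns a Date, or null if the parameter is absent or unparseable.
function sasExpiry(sasUrl) {
  const se = new URL(sasUrl).searchParams.get("se");
  if (!se) return null;
  const expiry = new Date(se);
  return Number.isNaN(expiry.getTime()) ? null : expiry;
}

// True when the URL carries an expiry that is still in the future.
function sasIsUsable(sasUrl, now = new Date()) {
  const expiry = sasExpiry(sasUrl);
  return expiry !== null && expiry > now;
}

// Made-up URL for illustration only.
const example =
  "https://example.blob.core.windows.net/data?sv=2022-11-02&se=2026-07-01T00%3A00%3A00Z&sig=abc";
console.log(sasIsUsable(example, new Date("2026-06-25T00:00:00Z"))); // → true
```

If the check returns false, going back to Step 2 for a fresh URL is likely cheaper than finding out at upload time.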

XMComponentExtraction console output — login, site selection (website), language (en), processing, and "Extractor App Ended"

Once this completes, the source page data is in Azure Blob Storage. From this point, Pathway handles everything within the SitecoreAI environment — you do not need to keep your source system connection active.

Step 3 — Content Audit

Back in Pathway, Step 3 is where the AI takes over. There are two actions here: Grouping and Template Mapping.

Grouping — Click the Grouping button and the AI analyzes all extracted pages, grouping them by source template. In my test with a simple demo site, it found 1 group ("Sitecore Experience Hub") with 1 page (Home). For real-world sites with hundreds of pages, this is the step where Pathway's value becomes clear — instead of mapping every page individually, the AI identifies common patterns and groups similar pages together.
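
To make the grouping idea concrete, here is a minimal sketch of grouping pages by source template. It is not how Pathway implements it. The { path, template } shape and the template names are hypothetical, not the extractor's real output.

```javascript
// Groups page paths by source template name.
// The page shape used here is hypothetical, for illustration only.
function groupBySourceTemplate(pages) {
  const groups = new Map();
  for (const page of pages) {
    if (!groups.has(page.template)) groups.set(page.template, []);
    groups.get(page.template).push(page.path);
  }
  return groups;
}

const pages = [
  { path: "/sitecore/content/Home/blog/post-1", template: "Article" },
  { path: "/sitecore/content/Home/blog/post-2", template: "Article" },
  { path: "/sitecore/content/Home/about", template: "Landing" },
];

for (const [template, paths] of groupBySourceTemplate(pages)) {
  console.log(`${template}: ${paths.length} page(s)`);
}
```

With three pages and two templates, you review two mappings instead of three; the saving grows with the number of pages per template.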

Template Mapping — Next, click Template Mapping. The AI matches each group to the most appropriate target template. You can click "View template match details" to see the AI's reasoning — why it chose a particular mapping based on page structure and content. In my case, the "Sitecore Experience Hub" group was mapped to the "Page" target template.

Content Audit — Grouping completed (1 group, 1 page), Template Mapping completed, showing "Sitecore Experience Hub" group

Step 4 — Map Content

In Step 4, the AI maps source components to target SitecoreAI components. You can review the suggested mappings and adjust them manually if needed.

Clicking "View page details" shows you the specifics — the page path, page ID, template name, and template ID. This helps you verify that the mapping makes sense for your content model.

Map Content — Component Mapping completed, with Page details showing the Home page mapped to "Page" template

Step 5 — Migration

The final step — click the Migration button and watch the real-time dashboard.

Here is where I have to be honest about my experience. In my test with a simple demo site (1 page), the migration result was 0 succeeded, 1 failed. The Home page at /sitecore/content/Home failed to migrate.

Migration result — 0 succeeded, 1 failed. The Home page did not migrate successfully.

I want to be transparent about this because it reflects the reality of working with a tool that is still in beta. The failure could be due to several factors — how the demo site was set up, a mismatch in the component mapping, or a limitation in the current version. This is a simple test site and not a complex real-world scenario, so the failure might be specific to my setup.

In contrast, my "Any Website" migration (using the web crawling path with my blog nehemiahj.com) migrated all 50 pages successfully — 50 succeeded, 0 failed. I will cover that walkthrough in my next post.

What I Learned

A few takeaways from this walkthrough:

The prerequisite setup takes time. Between deploying the XMComponentExtraction handler, changing SecurityPolicy settings, preparing the target structure JSON, and managing the Azure Blob URL, there are several moving parts. Plan for this upfront rather than expecting it to be quick.

The CMSExportStructure step needs attention. Getting the right items serialized and placed in the correct folder structure for the PowerShell script is important. If your JSON is incomplete or missing renderings, the AI will not have enough information to map correctly.

AI reasoning is available but not customizable. You can see why the AI made certain mapping decisions through the "View template match details" option. But you cannot guide or instruct the AI — for example, telling it "these are product pages, not blog posts." This is something I hope Sitecore adds in a future update.

Failures happen and that is okay. The tool is in beta. Not every migration will succeed on the first run, and understanding why it failed is part of the learning process. Post-migration review and refinement should be expected.

The Security Policy change is easy to forget. Reverting ServicesOnPolicy back to ServicesLocalOnlyPolicy after extraction is a security step that should not be skipped. Consider adding it to your migration checklist.

Up Next

In the next post, I will walk through the "Any Website" migration path — using Pathway's built-in web crawler to migrate content from my blog (nehemiahj.com). That test had a much better outcome (50/50 success), and the web crawling approach is simpler to set up since it does not require XMComponentExtraction or Azure Blob Storage.


29 April, 2026

SitecoreAI Pathway vs XM to XM Cloud Migration Tool


In my previous blog post, I wrote about SitecoreAI Pathway when it was announced at Sitecore Symposium 2025. I mentioned that I would explore the tool once it was available. Since then, I have been hands-on with Pathway and recently presented my findings at the Sitecore User Group Coimbatore (SUGCBE) in April 2026. This post is an updated comparison between the XM to XM Cloud Migration Tool (which I have been writing about throughout 2025) and the new SitecoreAI Pathway, now informed by my actual experience using the tool.

If you have been following my migration tool series (Part 1, Part 2, Part 3), you know how the old tool works. This post explains what Pathway does differently and why Sitecore is moving in this direction.

Quick Recap: XM to XM Cloud Migration Tool

The XM to XM Cloud Migration Tool was a technical utility that helped move content, media, and users from Sitecore XP/XM to XM Cloud. It was a "Lift & Shift" approach - it moved your data as-is from source to target.

It had two modes: GUI and CLI. You would point it to your source CM instance, select the content, media, or users to migrate, and it would transfer them to the target XM Cloud environment. Simple and effective for what it was designed to do.

However, it had clear limitations:

  • It only moved data, not the website structure
  • It preserved the existing monolithic content models without modernizing them
  • No support for component-first restructuring
  • No content modeling cleanup
  • Sitecore-to-Sitecore only - no support for other CMS platforms
  • Required direct access to the source CM instance

The tool is now limited to as-is migration only — primarily used for migrating media library assets from source to target. Pathway is the new recommended approach for content migration.

What is SitecoreAI Pathway?

SitecoreAI Pathway is an AI-powered content migration tool available as a marketplace app through the Sitecore Cloud Portal. Unlike the old migration tool, Pathway does not just move data from point A to point B. It uses AI to analyze your existing site structure, intelligently map your legacy templates and renderings to modern SitecoreAI components, and transform your content model during migration.

Think of it this way: the old tool was a moving truck that transported your furniture as-is. Pathway is more like an interior designer who reorganizes and modernizes your furniture to fit a new, better-designed house.

Two Migration Paths

One of the biggest things I discovered during my exploration is that Pathway supports two distinct migration paths:

1. Sitecore Website — For existing Sitecore XM/XP customers. You use the XMComponentExtraction tool to extract source page data as JSON files and upload them to Azure Blob Storage. Pathway then groups pages by source template. This path requires .NET 9.0, PowerShell 7+, and CM access.

2. Any Website — This is the one that surprised me. Pathway has a built-in web crawler that can scrape any public website. Just provide a URL or sitemap XML link, and Pathway handles the rest — no Azure Blob setup, no extraction tools. I tested this with my own blog (nehemiahj.com) and it crawled 50 pages, grouped them by page design and HTML structure, and migrated them successfully.

Both paths share a common requirement: your target SitecoreAI site structure must already be in place — templates, components, page designs. Pathway maps content to existing structures; it does not create new ones.

Side-by-Side Comparison

Aspect | XM to XM Cloud Migration Tool | SitecoreAI Pathway
Approach | Lift & Shift | AI-powered intelligent mapping & transformation
Source Systems | Sitecore XP/XM only | Sitecore XP/XM and any public website (via built-in web crawler)
What it Migrates | Content, Media, Users | Page content & structure with AI-mapped components (Media still needs the old tool)
Content Model | Preserves existing structure as-is | Modernizes content model to headless-friendly components
AI Involvement | None | AI groups pages, maps components, provides reasoning for mapping decisions
Human Review | Select what to migrate, then it runs | Human-in-the-loop: review and validate AI mappings before execution
Prerequisites on Target | XM Cloud environment ready | Target site structure, templates, components, and page designs must be created first
Multi-language | Supported | Single language per migration run (beta limitation)
Interface | Desktop app (GUI) or CLI | Web-based (Sitecore Cloud Portal) + downloadable extraction packages
Speed | Depends on content volume | Sitecore claims up to 70% reduction in migration time through AI automation
Accuracy | 1:1 copy (100% of what it copies) | AI-mapped content needs manual review; post-migration refinement is expected
Cost | Free (part of XM Cloud) | Included (free) with Sitecore 360 / SitecoreAI subscription

How SitecoreAI Pathway Works (High Level)

The migration process in Pathway follows these key stages:

  1. Install & Create Migration - Install Pathway from the Sitecore Cloud Portal Marketplace. Create a new migration and select your source type — "Sitecore website" or "Any website."
  2. Configure Target - Select your target SitecoreAI environment and site. Upload your target site structure (xmc-structure.json) generated using the CMSExportStructure PowerShell script.
  3. Extract Source Data - For Sitecore websites: deploy the XMComponentExtraction handler and console app to extract page data as JSON to Azure Blob Storage. For any website: click "Start Web Crawling" and Pathway's crawler discovers and extracts pages automatically via the sitemap.
  4. Content Audit — AI Page Grouping - Pathway's AI analyzes the extracted pages and groups similar pages together. For Sitecore sites, grouping is based on source templates. For any website, it analyzes page design and HTML structure.
  5. Template Mapping - The AI matches each group to the most appropriate target template. You can view the reasoning behind each mapping decision — why it chose a particular template based on page structure and content.
  6. Component Mapping - The AI maps source components to target SitecoreAI components. Review and adjust manually if needed.
  7. Execute Migration - Run the migration with a real-time dashboard showing succeeded vs failed pages. Retry failed pages as needed.

What Pathway Does NOT Migrate

It is important to know the boundaries. Pathway does NOT handle:

  • Media library assets - You still need the XM to XM Cloud Migration Tool for this
  • Templates and component code - Must be recreated in SitecoreAI by developers
  • Personalization rules
  • Workflow states
  • Analytics data
  • Custom modules and integrations
  • Datasources with child items

This means the old migration tool is not obsolete. You will likely use both tools together: the old tool for media migration and Pathway for intelligent content migration.

Beta Limitations to Keep in Mind

Since Pathway is still in beta, there are some constraints I encountered during my testing:

  • 50 URL limit — Each migration run supports a maximum of 50 URLs. For larger sites, you will need multiple runs.
  • Single language per run — Pathway currently supports one language per migration. Multilingual sites will need separate runs for each language.
  • No JavaScript-rendered content — The web crawler works with static HTML only. Pages that rely heavily on client-side JavaScript rendering will not be fully captured.
  • No gated or authenticated pages — The crawler cannot access pages behind login walls or IP restrictions (unless you whitelist the crawler's IP).
  • No AI customization yet — You can see the AI's reasoning for its mapping decisions, which is great for transparency. But there is no way to guide or customize the AI instructions yet. If your team knows the source system well, you cannot tell the AI "these are product pages, not blog posts" — that capability is not available today.
  • No re-run capability — If something goes wrong mid-migration, you may need to start from the beginning rather than resuming where you left off.
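
For the 50 URL limit, splitting a larger URL list into runs is simple to script. This is a minimal sketch with a hypothetical chunkUrls helper; how each batch is then fed into Pathway depends on the migration path you use and is not shown.

```javascript
// Splits a list of URLs into batches no larger than `size` (50 by default,
// matching the beta limit described above).
function chunkUrls(urls, size = 50) {
  if (!Number.isInteger(size) || size < 1) {
    throw new RangeError("size must be a positive integer");
  }
  const batches = [];
  for (let i = 0; i < urls.length; i += size) {
    batches.push(urls.slice(i, i + size));
  }
  return batches;
}

// Example: 120 placeholder URLs become three runs.
const urls = Array.from({ length: 120 }, (_, i) => `https://example.com/page-${i + 1}`);
console.log(chunkUrls(urls).map((batch) => batch.length)); // → [50, 50, 20]
```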

My Hands-On Experience

I tested Pathway using both migration paths. For the "Any Website" path, I used my blog nehemiahj.com. The crawler picked up 50 pages from the sitemap, the AI grouped them and named each group based on the blog post content (like "XM Migration CLI Guide," "Commerce Cache Troubleshooting"), and mapped them all to the "Page" target template. Migration result: 50 succeeded, 0 failed. The migrated pages appeared in the Content Editor organized by year and month, with URL-friendly names preserved.

The AI grouping is where Pathway really shines. Instead of mapping every page individually, the AI identifies common patterns and groups similar pages together. This dramatically reduces repetitive mapping effort, especially for sites with hundreds of pages sharing similar structures.

The prerequisite setup for the Sitecore website path is more involved — you need to deploy the XMComponentExtraction handler, configure SecurityPolicy changes, set up Azure Blob Storage, and prepare the target structure JSON. It works, but plan time for getting all the pieces in place.

Key Takeaway

The XM to XM Cloud Migration Tool and SitecoreAI Pathway serve different purposes. The old tool is a reliable data mover, now primarily used for media migration. Pathway is a content modernizer that uses AI to transform your legacy content model into a headless-friendly architecture.

For organizations migrating to SitecoreAI, the recommended approach is:

  1. Design your target site structure in SitecoreAI (templates, components, page designs)
  2. Migrate media assets using the XM to XM Cloud Migration Tool
  3. Migrate and transform content using SitecoreAI Pathway
  4. Review and refine migrated content — AI mapping is powerful but post-migration cleanup is expected

In my upcoming posts, I will share detailed walkthroughs of both migration paths with screenshots — the Sitecore website extraction process and the web crawling approach. Stay tuned.


10 April, 2026

Fixing Coveo Analytics InvalidToken and ExpiredToken Errors in Sitecore XM/XP

If you are using Coveo for Sitecore and seeing InvalidToken or ExpiredToken errors in your Coveo analytics logs, this post covers four issues I ran into and how I fixed them. These are specific to Sitecore XP sites using the Coveo REST proxy for analytics, but the patterns may help anyone dealing with Coveo analytics initialization problems.

The Setup

The site had Coveo analytics initialized in multiple places - a shared JavaScript file (coveo-analytics.js) and several cshtml Razor views (CoveoPageViewAnalytics.cshtml, ProductDetails.cshtml, ThankYou.cshtml). Each file loaded the Coveo analytics script using the standard IIFE loader and called coveoua('init', apiKey) independently.

Here is a quick comparison of the two approaches:

Aspect | coveo-analytics.js | cshtml Razor Views
API Key Source | Hardcoded in JS with client-side hostname check | Server-side via Settings.GetSetting("CoveoAnalyticsKey", "")
Init Timing | Immediate - runs as soon as script loads | Deferred - waits for CoveoSearchEndpointInitialized event
Analytics Routing | Direct to Coveo cloud | Through Sitecore proxy (/coveo/rest/ua/v15)
Sends Events | No - just initializes for other components to use | Yes - sends page view with metadata

On pages where both the JS file and a cshtml view loaded together, things broke. Here are the four problems and fixes.

Problem 1: Analytics Bypassing Sitecore Proxy

The coveo-analytics.js file was sending analytics events directly to analytics.cloud.coveo.com instead of routing through the Sitecore proxy at /coveo/rest/ua/v15. The cshtml files had the proxy override but the JS file did not.

The fix is to override baseUrl on the Coveo analytics client prototype inside the onLoad callback (you need to wait for the script to load before the prototype is available):

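// apiKey is the analytics API key already defined earlier in coveo-analytics.js
coveoua('onLoad', function () {
    // Route every analytics request through the Sitecore proxy instead of Coveo cloud
    Object.defineProperty(
        coveoanalytics.CoveoAnalyticsClient.prototype,
        'baseUrl',
        { get() { return '/coveo/rest/ua/v15'; } }
    );
    coveoua('init', apiKey);
});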

This makes sure all analytics traffic goes through the Sitecore proxy, same as the cshtml views.

Problem 2: InvalidToken from Multiple Initialization

This was the main issue. On pages where both coveo-analytics.js and a cshtml view loaded, coveoua('init') was being called twice. Each call resets the analytics client and creates a new session token. Any events queued from the first initialization become invalid - hence the InvalidToken error.

There was also a secondary issue: the Object.defineProperty call for the proxy override would throw a TypeError on the second call because the property was defined as non-configurable by default.
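
The TypeError is easy to reproduce outside the browser. The sketch below uses a stand-in class (FakeAnalyticsClient is hypothetical; the real coveoanalytics.CoveoAnalyticsClient is not loaded here) and runs the same override twice.

```javascript
// Stand-in for coveoanalytics.CoveoAnalyticsClient, for illustration only.
class FakeAnalyticsClient {}

// Same override as in the fix above. Properties created by defineProperty
// default to configurable: false, so they cannot be redefined later.
function overrideBaseUrl() {
  Object.defineProperty(FakeAnalyticsClient.prototype, "baseUrl", {
    get() { return "/coveo/rest/ua/v15"; },
  });
}

overrideBaseUrl(); // first call succeeds

let secondCallError = null;
try {
  overrideBaseUrl(); // second call passes a new getter function and is rejected
} catch (err) {
  secondCallError = err;
}

console.log(secondCallError instanceof TypeError); // → true
console.log(new FakeAnalyticsClient().baseUrl);    // → /coveo/rest/ua/v15
```

The first override stays in effect, so the error is noisy rather than harmful, but an uncaught exception inside onLoad would also skip whatever code follows it.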

The fix is a two-flag initialization guard using a shared namespace on window. Why two flags? Because the IIFE script injection is synchronous but the init call happens asynchronously inside onLoad. A single flag would either skip the IIFE or skip the init at the wrong time depending on load order.

window.SITE = window.SITE || {};

// Flag 1: Guard the IIFE script injection (synchronous)
if (!window.SITE._coveoScriptInjected) {
    window.SITE._coveoScriptInjected = true;

    (function (c, o, v, e, O, u, a) {
        a = 'coveoua'; c[a] = c[a] || function () { (c[a].q = c[a].q || []).push(arguments) };
        c[a].t = Date.now(); u = o.createElement(v); u.async = 1; u.src = e;
        O = o.getElementsByTagName(v)[0]; O.parentNode.insertBefore(u, O)
    })(window, document, 'script', 'https://static.cloud.coveo.com/coveo.analytics.js/2/coveoua.js');
}

// Flag 2: Guard the init call (asynchronous, inside onLoad)
coveoua('onLoad', function () {
    if (!window.SITE._coveoInitialized) {
        window.SITE._coveoInitialized = true;
        Object.defineProperty(
            coveoanalytics.CoveoAnalyticsClient.prototype,
            'baseUrl',
            { get() { return '/coveo/rest/ua/v15'; } }
        );
        coveoua('init', apiKey);
    }
});

This pattern works regardless of which file loads first - JS or cshtml. The first one to run sets the flag and does the initialization. The rest skip it. Event-sending code like coveoua('send', 'view', metadata) or coveoua('ec:addProduct', ...) does not need guards - those use the internal queue and will execute correctly once init completes.

I applied this guard pattern across all four files that had Coveo analytics initialization.

Problem 3: ExpiredToken and InvalidToken from Bot Traffic

While debugging the above, I also noticed two other errors in the Coveo logs that turned out to be caused by Googlebot crawling the site. This is worth calling out because in today's landscape of bots and AI crawlers, bot behavior is constantly changing. Bots render JavaScript, execute search queries, and trigger analytics events just like real users do. If you are not filtering bot traffic from your analytics endpoints, you end up with polluted data - expired tokens from stale bot renders, invalid sessions from crawlers re-initializing your analytics client, and noise that makes it harder to trust your actual user metrics. Keeping bot traffic out of your analytics pipeline is not optional anymore.

The first was a 419 ExpiredToken on the search proxy (/coveo/rest/search/v2). Sitecore generates a search JWT at page render time with a 24-hour TTL. By the time Googlebot crawled and executed the page, the token was expired. This is a server-side issue - the frontend cannot fix it.

The second was a 400 InvalidToken on the analytics proxy (/coveo/rest/ua/v15). This was a different root cause - the CoveoAnalyticsKey Sitecore setting on one of the sites was misconfigured. It had a search JWT (issued by SearchApi, with a queryExecutor role) instead of a static analytics API key (the xx... format). The analytics endpoint does not accept search tokens, so it returned InvalidToken.

Quick check: If your CoveoAnalyticsKey Sitecore setting starts with xx, you are good. If it looks like a long JWT string, it is wrong - you need the static analytics API key from your Coveo admin console.
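
That quick check could be scripted along these lines. It is a rough heuristic: the xx prefix rule comes from the check above, and the JWT test only looks for three dot-separated base64url segments without validating the token. classifyCoveoKey is a hypothetical helper, and both sample values are made up.

```javascript
// Classifies a CoveoAnalyticsKey value by shape only; it never calls Coveo.
function classifyCoveoKey(value) {
  const key = (value || "").trim();
  if (key === "") return "missing";
  if (key.startsWith("xx")) return "static-api-key";
  // A JWT is three base64url segments separated by dots.
  if (/^[A-Za-z0-9_-]+\.[A-Za-z0-9_-]+\.[A-Za-z0-9_-]+$/.test(key)) return "jwt";
  return "unknown";
}

// Made-up values for illustration only.
console.log(classifyCoveoKey("xx01234567-89ab-cdef-0123-456789abcdef")); // → static-api-key
console.log(classifyCoveoKey("eyJhbGciOiJIUzI1NiJ9.eyJ2OCI6dHJ1ZX0.c2ln")); // → jwt
```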

Filtering Bot Traffic from Analytics

To stop bots from generating noise in your Coveo analytics, you have a few options:

Approach | How | Notes
Cloudflare WAF | Block Googlebot on /coveo/rest/ua/* only | Best option if Cloudflare is in your stack. Do NOT block /coveo/rest/search/ - that will break SEO.
IIS URL Rewrite | Match User-Agent containing "Googlebot" on ^coveo/rest/ua/.* and abort | Works at the server level before Sitecore processes the request
JavaScript | Check navigator.userAgent for bot patterns before calling initCoveoUa() | Secondary layer - Googlebot can execute JS, so not fully reliable

The key point: only block bots from the analytics path (/coveo/rest/ua/). The search path (/coveo/rest/search/) needs to stay open for Googlebot to index your search-driven content like product pages and knowledge base articles.

And do not just think about Googlebot. Bingbot, SEMrush, Ahrefs, AI training crawlers - they all behave differently, and they change their patterns over time. None of them have any legitimate reason to send commerce analytics events like add-to-cart or purchase. Filtering them out keeps your Coveo ML models trained on real user behavior, not crawler noise.
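
The JavaScript option from the table above could be sketched as follows. The pattern list is illustrative and far from complete, and isLikelyBot is a hypothetical helper. As noted, treat this as a secondary layer only, since any crawler can send a browser-like User-Agent.

```javascript
// Illustrative, incomplete list of crawler tokens; extend it for your own traffic.
const BOT_PATTERN = /googlebot|bingbot|semrushbot|ahrefsbot|gptbot|crawler|spider/i;

// Returns true when the User-Agent string looks like a known crawler.
function isLikelyBot(userAgent) {
  return typeof userAgent === "string" && BOT_PATTERN.test(userAgent);
}

// In the browser, the guard would wrap the init call named in the table:
//   if (!isLikelyBot(navigator.userAgent)) { initCoveoUa(); }

console.log(isLikelyBot("Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)")); // → true
console.log(isLikelyBot("Mozilla/5.0 (iPhone; CPU iPhone OS 17_0 like Mac OS X) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/17.0 Mobile/15E148 Safari/604.1")); // → false
```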

Problem 4: Google Translate Users Getting Failed Analytics (CORS Preflight)

This one was not a bot issue at all. I noticed 400 errors on OPTIONS /coveo/rest/ua/v15/analytics/custom in IIS logs, with a real browser User-Agent (iPhone Safari) and a Referer from *.translate.goog.

When a user browses your site through Google Translate, the page is served from *.translate.goog (e.g. yourdomain-com.translate.goog). Any JavaScript on that page making requests back to your actual domain is now cross-origin. The browser sends an HTTP OPTIONS preflight request before the actual analytics POST.

Here is the flow:

  1. User visits yourdomain-com.translate.goog
  2. JS tries to POST analytics to yourdomain.com/coveo/rest/ua/v15/analytics/custom
  3. Browser detects cross-origin and sends OPTIONS preflight first
  4. Sitecore returns 400 (Coveo proxy handler does not handle OPTIONS)
  5. Browser blocks the actual analytics POST
  6. Commerce events (add-to-cart, purchase, page view) are never recorded for translated sessions

These are real users losing analytics tracking - not bots. The Cloudflare bot-blocking rule from Problem 3 should NOT block this traffic.

Why Not Cloudflare?

I initially considered Cloudflare for this fix, but it does not work well here. Cloudflare Transform Rules can add headers but cannot change the HTTP status code. A WAF Custom Rule with "Return fixed response" can return a 204, but it cannot echo back the dynamic Origin header value - which is required for CORS. The origin changes per site (e.g. example-com.translate.goog), so you cannot hardcode it.

Fix: Sitecore httpRequestBegin Pipeline Processor

The fix that worked is a custom Sitecore httpRequestBegin pipeline processor that intercepts OPTIONS preflight requests for the Coveo analytics path and returns 204 with the correct CORS headers before the request reaches the Coveo proxy handler.

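using System;
using System.Web;
using Sitecore.Pipelines.HttpRequest;

public class HandleCoveoCors : HttpRequestProcessor
{
    public override void Process(HttpRequestArgs args)
    {
        var request = HttpContext.Current.Request;
        var response = HttpContext.Current.Response;
        var origin = request.Headers["Origin"] ?? "";

        // Only short-circuit CORS preflights for the Coveo proxy that originate from Google Translate
        if (request.HttpMethod == "OPTIONS" &&
            request.Path.StartsWith("/coveo/rest/", StringComparison.OrdinalIgnoreCase) &&
            origin.EndsWith(".translate.goog", StringComparison.OrdinalIgnoreCase))
        {
            response.StatusCode = 204;
            // Echo the caller's origin; it differs per translated site, so it cannot be hardcoded
            response.AddHeader("Access-Control-Allow-Origin", origin);
            response.AddHeader("Access-Control-Allow-Methods", "POST, OPTIONS");
            response.AddHeader("Access-Control-Allow-Headers",
                "Content-Type, Authorization");
            // Let the browser cache the preflight result for 24 hours
            response.AddHeader("Access-Control-Max-Age", "86400");
            // End the request here so it never reaches the Coveo proxy handler
            response.End();
        }
    }
}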

A few notes on the implementation:

  • Why origin.EndsWith(".translate.goog"): Google Translate generates subdomains dynamically per site. Matching the suffix covers all translated variants without hardcoding each one.
  • Why 204 and not 200: 204 No Content is the correct HTTP status for a successful OPTIONS preflight with no response body. Browsers accept both, but 204 is the right semantic choice.
  • Why pipeline processor: It handles both the status code and the dynamic Origin header correctly, and keeps the logic server-side alongside the Coveo proxy handler it is protecting.
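
To reason about the matching rule without a Sitecore instance, the same decision can be mirrored in plain JavaScript and exercised with sample inputs. This is only a sketch of the rule, not the processor itself; preflightResponse is a hypothetical helper.

```javascript
// Mirrors the processor's decision: returns the preflight response to send,
// or null when the request should continue through the normal pipeline.
function preflightResponse(method, path, origin) {
  const o = origin || "";
  const matches =
    method === "OPTIONS" &&
    path.startsWith("/coveo/rest/") &&
    o.endsWith(".translate.goog");
  if (!matches) return null;
  return {
    status: 204,
    headers: {
      "Access-Control-Allow-Origin": o, // echo the dynamic origin
      "Access-Control-Allow-Methods": "POST, OPTIONS",
      "Access-Control-Allow-Headers": "Content-Type, Authorization",
      "Access-Control-Max-Age": "86400",
    },
  };
}

const hit = preflightResponse(
  "OPTIONS",
  "/coveo/rest/ua/v15/analytics/custom",
  "https://yourdomain-com.translate.goog"
);
console.log(hit.status); // → 204
console.log(preflightResponse("OPTIONS", "/coveo/rest/ua/v15/analytics/custom", "https://translate.goog.example.com")); // → null
```

The second call shows why the suffix match matters: an origin that merely contains translate.goog somewhere in the hostname is not accepted.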

Summary

Four issues, four fixes:

  1. Analytics bypassing proxy - Add baseUrl override via Object.defineProperty inside onLoad
  2. InvalidToken from duplicate init - Two-flag initialization guard (_coveoScriptInjected + _coveoInitialized) on a shared window namespace
  3. Bot traffic causing token errors - Block bots from analytics proxy only (not search), and verify your CoveoAnalyticsKey Sitecore setting has a static xx... key
  4. Google Translate CORS preflight - Sitecore httpRequestBegin pipeline processor to handle OPTIONS requests from *.translate.goog with proper CORS headers

Hope this helps if you are dealing with similar Coveo analytics issues in Sitecore. If you have questions, leave a comment below.
