5. The Data Layer
What dataLayer.push really does, GTM's internal data model and recursive merge, the GA4 ecommerce schema, SPA patterns, and the classic data layer mistakes.
Ask ten implementers what the data layer is and you'll get "an array you push events into". True, and it misses the point. The data layer is two things wearing one name: a message queue (the array) and a state model (inside GTM) that the queue feeds. Nearly every data layer bug comes from not knowing which of the two you're talking to.
The mechanism: a hijacked push
Before gtm.js loads, dataLayer is a plain JavaScript array. Pushing to it
does nothing but store a message — which is the trick: messages queue up until
GTM is ready. When the container boots, it replaces the array's push
method with its own. Conceptually:
// before gtm.js: just an array, push only stores
window.dataLayer = window.dataLayer || [];
// when gtm.js boots, roughly this happens:
var oldPush = dataLayer.push;
dataLayer.push = function (message) {
  oldPush.apply(dataLayer, arguments); // still lands in the array (a log)
  gtmProcess(message);                 // merge into the internal model,
};                                     // then evaluate triggers (Ch. 6)
First it drains everything already queued, in order; from then on every push is processed live. Two consequences:
- Pushing before the snippet is not just legal, it's the pattern for data that must exist at page load (user state, page metadata — more below).
- After boot, the array is only a log. Typing dataLayer in the console shows you what was pushed, not what GTM currently knows. Those differ because of the internal model, described next.
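The queue-then-hijack sequence can be sketched as a toy simulation. This is not GTM's code: bootContainer and the processed list are hypothetical stand-ins for gtm.js and its internal processing, and the sketch only illustrates the ordering described above.

```javascript
// Toy stand-in for the page: a plain array that queues messages before "boot".
const dataLayer = [];
dataLayer.push({ page_type: "product" });  // queued, nothing listens yet
dataLayer.push({ event: "early_event" });  // queued too

// Hypothetical stand-in for what the container does when it boots.
const processed = [];
function bootContainer(queue) {
  const originalPush = queue.push;
  // Drain first: handle everything already queued, in push order.
  for (const message of queue) processed.push(message);
  // Then replace push so later messages are logged AND processed live.
  queue.push = function (...messages) {
    const length = originalPush.apply(queue, messages); // the array stays a log
    for (const message of messages) processed.push(message);
    return length;
  };
}

bootContainer(dataLayer);
dataLayer.push({ event: "late_event" });   // processed immediately

console.log(processed.map((m) => m.event || "(no event)").join(", "));
// (no event), early_event, late_event
console.log(dataLayer.length); // 3
```

The array ends up holding all three messages, which is why inspecting it shows the log of pushes and nothing about what the container derived from them.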
The internal data model
GTM keeps a private state object — the docs call it the abstract data model. Every pushed message is recursively merged into it, and it lives until the next full page load:
dataLayer.push({ user: { id: "U-1", plan: "free" } });
dataLayer.push({ user: { plan: "pro" } });
// the array (the log) now holds two messages, but GTM's model holds:
// user = { id: "U-1", plan: "pro" } ← merged, not replaced
A Data Layer Variable (Chapter 6) reads from this model using dot notation: user.plan → "pro". That's "Version 2" of the variable, the default; the legacy "Version 1" predates the recursive model and reads top-level keys literally. You want Version 2 essentially always.
The merge has one sharp edge: arrays merge index by index, like objects with numeric keys. Push 3 items, then push 1 item, and the model still contains the old items at indexes 1–2. This is the famous stale-ecommerce bug, and it's why the standard pattern resets before each ecommerce push:
dataLayer.push({ ecommerce: null }); // wipe the subtree first
dataLayer.push({
  event: "purchase",
  ecommerce: { /* fresh data */ }
});
Equally important is when values are read. Each message is processed atomically: merge, then evaluate triggers, and any tag fired by that message reads variables as of that moment (a snapshot). That means data must be in the model before, or at the latest inside, the push that carries the event:
// WRONG — tag fires on the event, lead_value isn't in the model yet
dataLayer.push({ event: "generate_lead" });
dataLayer.push({ lead_value: 500 });
// RIGHT — one message: merged first, then triggers evaluate
dataLayer.push({ event: "generate_lead", lead_value: 500 });
That single rule explains the majority of "the variable is undefined in my tag" tickets.
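The merge behaviour described in this section (recursive objects, index-by-index arrays, and the null reset) can be reproduced with a toy model. This is a minimal sketch under the rules stated above, not GTM's implementation; mergeInto and isPlainObject are names made up for the illustration.

```javascript
// Toy recursive merge: objects merge key by key, arrays merge index by index,
// everything else (including null) overwrites.
function isPlainObject(value) {
  return Object.prototype.toString.call(value) === "[object Object]";
}

function mergeInto(target, source) {
  for (const key of Object.keys(source)) {
    const value = source[key];
    if (Array.isArray(value)) {
      if (!Array.isArray(target[key])) target[key] = [];
      mergeInto(target[key], value);   // index 0 onto index 0, 1 onto 1, ...
    } else if (isPlainObject(value)) {
      if (!isPlainObject(target[key])) target[key] = {};
      mergeInto(target[key], value);
    } else {
      target[key] = value;             // primitives and null replace outright
    }
  }
  return target;
}

const ids = (m) => m.ecommerce.items.map((item) => item.item_id).join(",");

const model = {};
mergeInto(model, { ecommerce: { items: [{ item_id: "A" }, { item_id: "B" }, { item_id: "C" }] } });
mergeInto(model, { ecommerce: { items: [{ item_id: "X" }] } });
console.log(ids(model)); // X,B,C  (B and C are stale leftovers)

mergeInto(model, { ecommerce: null });  // the reset wipes the subtree
mergeInto(model, { ecommerce: { items: [{ item_id: "X" }] } });
console.log(ids(model)); // X
```

Only index 0 is touched by the second push, so the old entries at indexes 1 and 2 survive until the subtree is replaced by null.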
The event key
A push with an event key is a message and a trigger signal — Custom
Event triggers (Chapter 6) match on its name. A push without one merges
data silently; nothing fires. Both are useful: silent pushes stage data, event
pushes act on it.
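The difference between silent pushes and event pushes, and the snapshot timing behind it, can be sketched with a toy processor. createContainer and the triggers map are hypothetical, and the sketch uses a shallow merge to stay short, whereas the real merge is recursive.

```javascript
// Toy processor: merge the message, then evaluate "triggers" only if it has an event key.
function createContainer(triggers) {
  const model = {};
  const fired = [];
  return {
    fired,
    push(message) {
      Object.assign(model, message);                   // 1. merge (shallow, for brevity)
      if (!("event" in message)) return;               // silent push: staged, nothing fires
      const handler = triggers[message.event];         // 2. match the trigger by event name
      if (handler) fired.push(handler({ ...model }));  // 3. the tag reads a snapshot
    },
  };
}

const triggers = {
  generate_lead: (snapshot) => "lead_value=" + snapshot.lead_value,
};

const wrongOrder = createContainer(triggers);
wrongOrder.push({ event: "generate_lead" });
wrongOrder.push({ lead_value: 500 });    // too late, the tag already fired
console.log(wrongOrder.fired[0]);        // lead_value=undefined

const staged = createContainer(triggers);
staged.push({ lead_value: 500 });        // silent push stages the data
staged.push({ event: "generate_lead" }); // event push acts on it
console.log(staged.fired[0]);            // lead_value=500
```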
Two companions exist for the "push, then leave the page" race:
dataLayer.push({
  event: "generate_lead",
  lead_type: "contact_form",
  eventCallback: function () { window.location = "/thank-you"; },
  eventTimeout: 2000 // ms; navigate anyway if tags hang
});
eventCallback runs after the event's tags have fired (once per loaded container, so beware double calls when two GTM containers are installed), and eventTimeout is its safety valve. Modern transports reduced the need: GA4 ships events via sendBeacon, which survives navigation (Chapter 2). Not every vendor tag does, though, so the pattern is still the safe default for form → redirect flows.
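One way to harden the redirect pattern is a small wrapper that navigates exactly once, however many times the callback runs. The sketch below is hypothetical (pushThenNavigate is not a GTM API). It assumes that if gtm.js never loads, for example behind a content blocker, the push only stores the message and neither eventCallback nor eventTimeout does anything, so it adds its own fallback timer.

```javascript
// Hypothetical helper: push an event, then navigate exactly once,
// whether the callback runs once, twice (two containers), or never (gtm.js blocked).
function pushThenNavigate(dataLayer, message, navigate, timeoutMs = 2000) {
  let done = false;
  let timer = null;
  function finish() {
    if (done) return;       // ignore repeat calls
    done = true;
    clearTimeout(timer);
    navigate();
  }
  // Our own fallback: if nothing ever calls the callback, navigate anyway.
  timer = setTimeout(finish, timeoutMs);
  dataLayer.push({ ...message, eventCallback: finish, eventTimeout: timeoutMs });
}

// Simulated data layer whose push calls eventCallback twice, as two containers would.
const calls = [];
const fakeDataLayer = {
  push(message) {
    message.eventCallback();
    message.eventCallback();
  },
};
pushThenNavigate(
  fakeDataLayer,
  { event: "generate_lead", lead_type: "contact_form" },
  () => calls.push("/thank-you")
);
console.log(calls.length); // 1
```

In a page, the navigate argument would be something like a function that sets window.location, and dataLayer would be window.dataLayer.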
The ecommerce schema
GA4 defines the de-facto vocabulary: an ecommerce object with standard event
names (view_item, add_to_cart, begin_checkout, purchase…) and an
items array. The full purchase shape:
dataLayer.push({ ecommerce: null });
dataLayer.push({
  event: "purchase",
  ecommerce: {
    transaction_id: "T-1024", // must be unique; GA4 dedupes on it
    value: 138.0,             // order total (one number, not a sum you
    currency: "EUR",          // hope matches the items; make it match)
    coupon: "SUMMER10",
    items: [
      {
        item_id: "SKU-1",
        item_name: "Pixel Hoodie",
        price: 89.0,
        quantity: 1,
        item_category: "apparel",
      },
      {
        item_id: "SKU-2",
        item_name: "Sticker Pack",
        price: 24.5,
        quantity: 2,
      },
    ],
  },
});
Here's why this schema matters beyond GA4: one push feeds every platform.
Your Meta purchase tag, TikTok tag, and Google Ads conversion tag all read the
same ecommerce object through variables and translate it to their own
vocabulary (value/currency map almost 1:1 everywhere). That's the data
layer's whole reason to exist — the site pushes facts once, in one shape, and
tags do the per-vendor dialect work. The alternative — every vendor's snippet
scraping its own numbers — is the tag soup data mess from Chapter 1.
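Because every tag reads the same object, a mistake in it propagates to every platform, so a check before the push can pay off. The sketch below is a hypothetical helper (checkPurchase is not part of GA4 or GTM). It assumes value is meant to equal the sum of price × quantity, as in the purchase example above; a shop that folds tax, shipping, or discounts into value would need a different rule.

```javascript
// Hypothetical pre-push check for a purchase payload; returns a list of problems.
function checkPurchase(ecommerce) {
  const problems = [];
  if (!ecommerce.transaction_id) problems.push("missing transaction_id");
  if (!ecommerce.currency) problems.push("missing currency");
  if (!Array.isArray(ecommerce.items) || ecommerce.items.length === 0) {
    problems.push("items must be a non-empty array");
    return problems;
  }
  // Compare in integer cents to avoid floating-point noise.
  const itemsCents = ecommerce.items.reduce(
    (sum, item) => sum + Math.round(item.price * 100) * (item.quantity ?? 1),
    0
  );
  if (Math.round(ecommerce.value * 100) !== itemsCents) {
    problems.push("value " + ecommerce.value + " != items total " + itemsCents / 100);
  }
  return problems;
}

const order = {
  transaction_id: "T-1024",
  value: 138.0,
  currency: "EUR",
  items: [
    { item_id: "SKU-1", price: 89.0, quantity: 1 },
    { item_id: "SKU-2", price: 24.5, quantity: 2 },
  ],
};
console.log(checkPurchase(order).length);                      // 0 (89 + 2 × 24.5 = 138)
console.log(checkPurchase({ ...order, value: 113.5 }).length); // 1 (value mismatch)
```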
Page-load data: push before the snippet
Anything the server already knows at render time should be in the data layer before GTM boots — remember, the snippet preserves an existing array:
<script>
  window.dataLayer = window.dataLayer || [];
  dataLayer.push({
    page_type: "product",
    login_state: "logged_in",
    user_id: "U-1", // your own ID, never an email in plaintext
  });
</script>
<!-- GTM snippet AFTER, so this is in the model before any trigger fires -->
No event key needed: this is staging data, available from the very first
trigger (even Consent Initialization). This pattern is the backbone of every
serious implementation: the application states facts; GTM consumes them.
Reading the same facts by scraping the DOM (CSS selectors, element text) is the
fallback when you can't touch the codebase — and it breaks on the next redesign,
silently. Push > scrape, every time you have the choice.
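On the server side, the page-load push is usually generated from data the template already has. A minimal sketch, assuming a Node-style renderer and a hypothetical helper named renderDataLayerScript: it serializes the facts with JSON.stringify and escapes "<" so that a value containing a closing script tag cannot end the inline script early.

```javascript
// Hypothetical server-side helper: returns the inline script to place BEFORE the GTM snippet.
function renderDataLayerScript(pageData) {
  // JSON is valid JavaScript here; escaping "<" keeps "</script>" inside a value
  // from terminating the inline script block.
  const json = JSON.stringify(pageData).replace(/</g, "\\u003c");
  return [
    "<script>",
    "  window.dataLayer = window.dataLayer || [];",
    "  dataLayer.push(" + json + ");",
    "</script>",
  ].join("\n");
}

console.log(
  renderDataLayerScript({
    page_type: "product",
    login_state: "logged_in",
    user_id: "U-1",
  })
);
```

The browser decodes the \u003c escape back to "<" when it parses the string literal, so the data that reaches the model is unchanged.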
SPAs: where assumptions go to die
Two opposite persistence rules, and each trips people in a different way:
- Classic multi-page site: the data layer dies on every navigation — full page load, fresh array, fresh model. Nothing persists between pages without cookies/storage. (A purchase event pushed and then navigated away from, without beacon/callback, simply evaporates.)
- Single-page app: the data layer never resets. No page load, no reset. The page_type: "product" you pushed three routes ago is still in the model on the checkout "page". Stale state, the SPA disease.
The working SPA pattern:
- Hook your router (Next.js, React Router, …) and on every route change push one event carrying fresh, complete page data, overwriting the stale keys:
  dataLayer.push({
    event: "virtual_page_view",
    page_path: "/checkout",
    page_type: "checkout",
    page_title: "Checkout",
  });
- Fire your GA4/pixel "pageview" tags on that custom event instead of the built-in Page View trigger (which fires once per actual page load, so exactly once per SPA session).
- Mind the double-fire on initial load: the container boot fires gtm.js and many routers emit an initial route event. Pick one source of truth for the first pageview and suppress the other.
GTM's built-in History Change trigger (gtm.historyChange) can detect
route changes without touching app code — useful for quick wins, but it can't
know your page_type or user state. The router-push pattern is the real fix;
History Change is the patch.
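The router-push pattern can be sketched framework-independently. createVirtualPageViewTracker is a hypothetical helper, and how onRouteChange gets wired to a particular router is left out. The sketch makes two assumptions explicit: every page key is written on every route (as undefined when the route has no value for it) so nothing stale survives, and a repeated path is ignored, which is one simple way to absorb a duplicate initial route event.

```javascript
// Keys that describe "the current page" and must be overwritten on every route.
const PAGE_KEYS = ["page_path", "page_type", "page_title"];

// Hypothetical helper: returns a function to call from the router's route-change hook.
function createVirtualPageViewTracker(dataLayer) {
  let lastPath = null;
  return function onRouteChange(page) {
    if (page.page_path === lastPath) return false; // duplicate route event, skip
    lastPath = page.page_path;
    const message = { event: "virtual_page_view" };
    for (const key of PAGE_KEYS) {
      message[key] = page[key]; // a missing key is set to undefined, overwriting stale state
    }
    dataLayer.push(message);
    return true;
  };
}

const log = []; // stands in for window.dataLayer
const onRouteChange = createVirtualPageViewTracker(log);
onRouteChange({ page_path: "/product/1", page_type: "product", page_title: "Pixel Hoodie" });
onRouteChange({ page_path: "/product/1", page_type: "product", page_title: "Pixel Hoodie" });
onRouteChange({ page_path: "/checkout", page_type: "checkout" });
console.log(log.length); // 2
```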
Naming and governance
- snake_case, GA4-style event names (generate_lead, add_to_cart): not because GA4 is sacred, but because one consistent dialect beats three. Event names are case-sensitive in triggers: Purchase ≠ purchase.
- Write the spec down. A data layer is a contract between the app and the container; a one-page table (event name · when it fires · keys · example push) per project saves every future debugging session. We keep one per client.
- Namespace custom keys when the site is large (plt_user_tier), so a plugin or third-party script pushing generic keys can't collide with yours.
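A written spec can also be kept in a machine-readable form and checked in development builds. The sketch below is one hypothetical way to do that: the spec contents and checkAgainstSpec are invented for illustration, and only the required-keys column of the table is modelled.

```javascript
// Hypothetical machine-readable spec: event name -> required keys.
const spec = new Map([
  ["generate_lead", ["lead_type"]],
  ["virtual_page_view", ["page_path", "page_type", "page_title"]],
]);

// Returns a list of contract violations for one message (empty list = OK).
function checkAgainstSpec(spec, message) {
  if (!("event" in message)) return []; // silent data push, nothing to check here
  const name = String(message.event);
  const required = spec.get(name);
  if (!required) {
    // Event names are case-sensitive, so flag near misses explicitly.
    const nearMiss = [...spec.keys()].find((n) => n.toLowerCase() === name.toLowerCase());
    return [
      nearMiss
        ? 'event "' + name + '" differs from "' + nearMiss + '" only by case'
        : 'event "' + name + '" is not in the spec',
    ];
  }
  return required
    .filter((key) => message[key] === undefined)
    .map((key) => 'missing key "' + key + '"');
}

console.log(checkAgainstSpec(spec, { event: "Generate_Lead", lead_type: "contact_form" }));
console.log(checkAgainstSpec(spec, { event: "generate_lead" }));
console.log(checkAgainstSpec(spec, { event: "generate_lead", lead_type: "contact_form" }));
```

The first call reports a case-only mismatch, the second a missing lead_type, and the third returns an empty list.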
The classic mistakes
A checklist distilled from years of audits — all of these are real and common:
- dataLayer = [{...}] after GTM loaded: assignment replaces the array, discarding the hijacked push. Messages go into a dumb array forever after; GTM hears nothing. Always push, never assign (the window.dataLayer = window.dataLayer || [] idiom is the only safe assignment, and only because it preserves an existing array).
- Event pushed before its data (separate pushes, wrong order): tags read undefined. One message, or data first.
- No ecommerce: null reset, so index-merged stale items contaminate the next event.
- Debugging the array instead of the model: dataLayer in the console is the log; Preview mode (Chapter 8) shows the model per message.
- Case/typo drift between push and trigger: purchase vs Purchase fires nothing and errors nowhere.
- DOM scraping where a push was possible: survives until the next CSS refactor.
- PII in the data layer: plaintext emails in a model that every tag (and every tag vendor) can read. Push your own user IDs; hash what platforms need hashed (Chapters 13–14, 17).
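The PII item lends itself to an automated check during development. The sketch below is a rough heuristic, not a validator: findEmailLikeValues is a hypothetical helper, and the pattern only catches strings shaped like local@domain.tld, so it will miss other kinds of PII.

```javascript
// Rough heuristic: anything shaped like local@domain.tld.
const EMAIL_LIKE = /[^\s@]+@[^\s@]+\.[^\s@]+/;

// Returns the dot paths of every string value that looks like a plaintext email.
function findEmailLikeValues(value, path = "") {
  if (typeof value === "string") return EMAIL_LIKE.test(value) ? [path] : [];
  if (value === null || typeof value !== "object") return [];
  return Object.entries(value).flatMap(([key, child]) =>
    findEmailLikeValues(child, path ? path + "." + key : key)
  );
}

const suspicious = findEmailLikeValues({
  event: "login",
  user_id: "U-1",
  user: { email: "jane@example.com" },
});
console.log(suspicious.join(", ")); // user.email
```

Run against each message before it is pushed, a non-empty result is a signal to replace the value with an internal ID or a hash.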
The data layer is the input. What consumes it — triggers evaluating every message, variables resolving at fire time, tags and their sequencing — is Chapter 6 — Tags, Triggers & Variables.