Tag: ecommerce operations

  • How to Find and Eliminate Duplicate SKUs in Your Product Catalog

    Duplicate SKUs are one of the most damaging catalog quality problems because they are invisible until they cause a serious incident — wrong products shipping to customers, inventory counts that do not match reality, Google Shopping disapprovals from duplicate product identifiers. And they are extremely common. Most catalogs that have grown over several years without strict governance have duplicate records that have never been identified.

    How Duplicate SKUs Form

    Duplicate SKUs rarely appear overnight. They accumulate through specific, predictable patterns:

    • Multiple people creating SKUs manually without a centralised reference — two team members independently create records for the same product using different SKU formats
    • Supplier data imports without duplicate checking — supplier feeds include products already in your catalog, creating a second record with the supplier’s SKU alongside your existing record
    • Platform migrations — importing products from one system to another creates duplicates when migration logic fails to match existing records
    • SKU reuse after product discontinuation — a discontinued product’s SKU is reassigned to a new product, creating historical confusion even if it is not technically a current duplicate
    • Variant management errors — each size/colour combination of a product is created as a standalone product instead of as variants, with new SKUs each time

    The Business Impact of Duplicate SKUs

    Problem | Impact
    Inventory split across duplicate records | Stock appears lower than it is in each record, which triggers false out-of-stock status, missed sales, and unnecessary reorders
    Wrong record picked for orders | Wrong product ships, or the correct product ships but from the wrong inventory pool
    Google Shopping feed errors | Two products with the same GTIN in one feed cause disapprovals for both
    Two site listings for same product | Customers see duplicate listings, sales and reviews are split, and both listings compete for the same keyword
    Split sales history | Performance reports undercount actual sales, making reorder and pricing decisions unreliable

    Step-by-Step: Finding Duplicate SKUs

    Method 1: Exact SKU duplicates

    Export your full product list with SKU as the first column. Sort by SKU. Any SKU that appears more than once is a confirmed exact duplicate. Flag all instances and review which is the canonical record (typically the older record with more order history).
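
For catalogs too large to scan by eye, the same check can be scripted. The following is a minimal sketch, assuming a CSV export with hypothetical `sku`, `name`, and `created_at` columns; the column names and the case/whitespace normalisation are assumptions, not something your platform is guaranteed to provide.

```python
import csv
import io
from collections import defaultdict

def find_exact_sku_duplicates(rows, sku_field="sku"):
    """Group rows by normalised SKU and return only SKUs seen more than once."""
    groups = defaultdict(list)
    for row in rows:
        # Normalising case and whitespace is an assumption; remove it if your
        # platform treats "abc-1" and "ABC-1" as different SKUs.
        key = row[sku_field].strip().upper()
        if key:
            groups[key].append(row)
    return {sku: recs for sku, recs in groups.items() if len(recs) > 1}

# Stand-in for a real export file; column names are hypothetical.
SAMPLE_EXPORT = """sku,name,created_at
JKT-001,Rain Jacket Navy L,2021-03-01
jkt-001 ,Rain Jacket Navy L,2023-07-15
JKT-002,Rain Jacket Navy M,2021-03-01
"""

rows = list(csv.DictReader(io.StringIO(SAMPLE_EXPORT)))
duplicates = find_exact_sku_duplicates(rows)

for sku, recs in duplicates.items():
    # Oldest record first, as a starting point for picking the canonical one.
    recs.sort(key=lambda r: r["created_at"])
    print(sku, [r["created_at"] for r in recs])
# prints: JKT-001 ['2021-03-01', '2023-07-15']
```

Sorting each group by creation date only suggests a canonical record; order history still needs a manual check before anything is merged.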

    Method 2: GTIN-based duplicates

    Export your product list with GTIN column. Sort by GTIN. Any GTIN that appears more than once (excluding variant records that legitimately share a parent GTIN) is a duplicate product with different SKUs. These are the most common type of duplicate in catalogs built from multiple supplier data imports.
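
The GTIN check can be sketched in a similar way. This example assumes each product is a dict with hypothetical `sku` and `gtin` keys, and it left-pads GTINs to 14 digits so that shorter forms of the same identifier compare equal. It does not handle the variant exclusion mentioned above; those records would need to be filtered out or reviewed by hand.

```python
from collections import defaultdict

def find_gtin_duplicates(products):
    """Return GTINs that are attached to more than one distinct SKU."""
    skus_by_gtin = defaultdict(set)
    for product in products:
        gtin = (product.get("gtin") or "").strip()
        if not gtin:
            continue  # records without a GTIN cannot be checked this way
        # Zero-pad to 14 digits so a 12- or 13-digit GTIN matches its
        # 14-digit representation.
        skus_by_gtin[gtin.zfill(14)].add(product["sku"])
    return {g: sorted(s) for g, s in skus_by_gtin.items() if len(s) > 1}

# Illustrative records only; SKUs and GTINs are made up.
products = [
    {"sku": "JKT-001", "gtin": "00012345678905"},
    {"sku": "SUP-88231", "gtin": "012345678905"},   # supplier SKU, same item
    {"sku": "JKT-002", "gtin": "00012345678912"},
    {"sku": "JKT-003", "gtin": ""},
]

gtin_duplicates = find_gtin_duplicates(products)
print(gtin_duplicates)
# prints: {'00012345678905': ['JKT-001', 'SUP-88231']}
```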

    Method 3: Name similarity duplicates

    Sort your product list by Product Name. Review near-matches — “Columbia Rain Jacket Men Navy L” and “Columbia Waterproof Jacket Men Navy L” may be the same product entered twice with slightly different names. These require manual review because naming differences can be legitimate variants.
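
Name matching can be partly automated to produce a review queue. The sketch below uses Python's standard-library `difflib.SequenceMatcher` as one possible similarity measure; the 0.75 threshold and the record layout are assumptions to tune against your own catalog, and every flagged pair still needs a human decision.

```python
from difflib import SequenceMatcher
from itertools import combinations

def similar_name_pairs(products, threshold=0.75):
    """Return (sku_a, sku_b, score) for product names at or above the threshold."""
    pairs = []
    for a, b in combinations(products, 2):
        score = SequenceMatcher(None, a["name"].lower(), b["name"].lower()).ratio()
        if score >= threshold:
            pairs.append((a["sku"], b["sku"], round(score, 2)))
    # Highest-scoring pairs first so reviewers start with the likeliest duplicates.
    return sorted(pairs, key=lambda p: p[2], reverse=True)

# Illustrative records only; SKUs are made up.
catalog = [
    {"sku": "JKT-001", "name": "Columbia Rain Jacket Men Navy L"},
    {"sku": "JKT-009", "name": "Columbia Waterproof Jacket Men Navy L"},
    {"sku": "BOOT-004", "name": "Leather Hiking Boot Brown 42"},
]

suspects = similar_name_pairs(catalog)
for sku_a, sku_b, score in suspects:
    print(sku_a, sku_b, score)
```

Comparing every pair grows quadratically with catalog size, so on a large catalog it is usually worth comparing only within the same brand or category.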

    The Duplicate Detector automates all three detection methods across your full catalog and returns a prioritised list of confirmed and suspected duplicates for review.

    How to Prevent Duplicate SKUs From Recurring

    1. Unique SKU enforcement — configure your platform or PIM to reject any SKU that already exists in the system. This is the single most effective prevention mechanism.
    2. Centralised SKU generation — assign SKUs through one system using a defined naming convention rather than allowing team members to create them manually. A sequential number generator or structured code generator (BRAND-CAT-001) eliminates manual SKU conflicts.
    3. GTIN validation at import — when importing supplier data, check incoming GTINs against your existing catalog before creating new records. If a GTIN already exists, update the existing record rather than creating a new one.
    4. Never reuse discontinued SKUs — mark discontinued products as inactive rather than deleting them, and never reassign their SKUs to new products. SKU history must remain intact for order history and audit purposes.
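
The four rules above can be illustrated together in one small in-memory model. This is a sketch only: `SkuRegistry` and its methods are hypothetical names, not the API of any particular platform or PIM, and a real system would enforce uniqueness with a database constraint rather than a Python dict.

```python
class SkuRegistry:
    """Hypothetical stand-in for a system that enforces the four rules."""

    def __init__(self):
        self._records = {}      # sku -> record; discontinued records stay here
        self._sku_by_gtin = {}  # gtin -> sku
        self._counters = {}     # (brand, category) -> last sequence number

    def next_sku(self, brand, category):
        # Rule 2: centralised generation in BRAND-CAT-001 style.
        key = (brand.upper(), category.upper())
        self._counters[key] = self._counters.get(key, 0) + 1
        return f"{key[0]}-{key[1]}-{self._counters[key]:03d}"

    def create(self, sku, name, gtin=None):
        # Rules 1 and 4: reject any SKU already known, active or discontinued.
        if sku in self._records:
            raise ValueError(f"SKU {sku!r} already exists")
        if gtin and gtin in self._sku_by_gtin:
            raise ValueError(f"GTIN {gtin!r} already belongs to another SKU")
        self._records[sku] = {"name": name, "gtin": gtin, "active": True}
        if gtin:
            self._sku_by_gtin[gtin] = sku

    def import_supplier_row(self, brand, category, name, gtin):
        # Rule 3: a known GTIN updates the existing record instead of
        # creating a second one.
        existing_sku = self._sku_by_gtin.get(gtin)
        if existing_sku is not None:
            self._records[existing_sku]["name"] = name
            return existing_sku, "updated"
        sku = self.next_sku(brand, category)
        self.create(sku, name, gtin)
        return sku, "created"

    def discontinue(self, sku):
        # Rule 4: mark inactive, never delete, so the SKU cannot be reused.
        self._records[sku]["active"] = False


registry = SkuRegistry()
first = registry.import_supplier_row("acme", "jkt", "Rain Jacket Navy L", "00012345678905")
second = registry.import_supplier_row("acme", "jkt", "Rain Jacket, Navy, Large", "00012345678905")
registry.discontinue(first[0])
print(first, second)
# prints: ('ACME-JKT-001', 'created') ('ACME-JKT-001', 'updated')
```

After `discontinue`, calling `registry.create("ACME-JKT-001", ...)` raises `ValueError`, because the discontinued record is kept rather than deleted.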

    Run a duplicate check monthly using the Duplicate Detector as part of your routine catalog maintenance. For the full catalog health audit process, see How to Audit Your Product Catalog in One Weekend.

    Frequently Asked Questions

    What causes duplicate SKUs in ecommerce catalogs?

    The most common causes are: multiple team members creating SKUs without a centralised system, importing supplier data without checking for existing records, platform migrations that duplicate product records, and SKU conventions that allow reuse after product discontinuation. Supplier data imports are the most frequent source of externally-introduced duplicates in mid-to-large catalogs.

    What problems do duplicate SKUs cause?

    Inventory miscounts (stock split across two records), wrong products in orders (system picks wrong duplicate), Google Shopping feed errors (duplicate GTIN causes disapprovals for both records), site search returning two listings for the same product, and split sales history that makes performance reporting unreliable.

    How do I prevent duplicate SKUs from being created?

    Three mechanisms work together: unique SKU enforcement at the system level (reject any SKU that already exists), centralised SKU generation (all SKUs assigned by one system using a defined convention), and GTIN validation at import (check incoming product GTINs against existing records before creating new ones). Implementing all three eliminates the most common duplicate creation paths.

  • Single Source of Truth for Product Data: What It Actually Means (And How to Build One)

    “Single source of truth” is one of those phrases almost every product team agrees with in theory.

    In practice, it usually means something much messier.

    One spreadsheet is considered the main file. Shopify has the latest images. A supplier sheet has newer technical specs. Marketing has updated descriptions in another document. Someone exported a CSV last week and adjusted it “just for this channel.” Everyone is working with product data, and everyone thinks their version is the correct one.

    That is exactly why this topic matters. The real problem is rarely that teams have no product data. The real problem is that they have too many competing versions of product truth.

    If you are new to PIM as a category, start with What Is PIM? The 2026 Guide for Ecommerce Brands & Retailers or PIM Basics. This article is the next step: understanding what product-data authority actually looks like in day-to-day operations.

    What “single source of truth” actually means

    A single source of truth does not mean that only one file exists. It does not mean one system does everything. And it definitely does not mean “whatever happens to be live right now.”

    What it really means is simple:

    There is one authoritative system for product information, and everyone knows which fields, rules, and workflows are controlled there.

    That system becomes the place where product truth is maintained, checked, updated, and distributed.

    What it is

    • One authoritative home for structured product information
    • A system where changes are visible and accountable
    • A place with rules around who can edit, approve, and publish
    • A controlled source that feeds channels consistently
    • A way to fix issues once instead of correcting the same fact in five places

    What it is not

    • One giant spreadsheet everyone edits carefully
    • A folder full of CSV exports
    • A marketplace listing that happens to be visible first
    • A storefront admin treated as the unofficial master
    • A team agreement that lives only in people’s heads

    The distinction matters because storage and authority are not the same thing. A spreadsheet can hold data. A storefront can display data. A DAM can hold assets. But none of those automatically become the authoritative layer for product truth.

    The real problem is not data. It is authority.

    Most product operations teams do not suffer from a lack of product data. They suffer from too many “authoritative” copies.

    • Marketing updates descriptions in one place
    • Merchandising manages categories somewhere else
    • Operations works from supplier files
    • Ecommerce edits what is visible in Shopify
    • Marketplace teams keep channel-specific exports

    Each source may be correct in context. The problem appears when those versions drift apart.

    That is why “single source of truth” is really a question of authority design. You are deciding which system is allowed to be final for which kind of product information.
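
One way to make authority design concrete is to write it down as data. The sketch below assumes a hypothetical field-to-system map (`FIELD_AUTHORITY`) and placeholder system names (`pim`, `erp`, `dam`, `shopify`); it builds a canonical record from whichever system is designated final for each field and reports where another system has drifted.

```python
# Hypothetical authority map: which system is final for which field.
FIELD_AUTHORITY = {
    "description": "pim",
    "category": "pim",
    "technical_specs": "pim",
    "stock_level": "erp",
    "image_urls": "dam",
}

def resolve_record(versions):
    """versions maps system name -> {field: value}. Returns (canonical, drift)."""
    canonical, drift = {}, []
    for field, owner in FIELD_AUTHORITY.items():
        owner_value = versions.get(owner, {}).get(field)
        canonical[field] = owner_value
        for system, record in versions.items():
            # A non-authoritative system holding a different value is drift.
            if system != owner and field in record and record[field] != owner_value:
                drift.append((field, system, owner))
    return canonical, drift

# Illustrative values only.
versions = {
    "pim": {"description": "Waterproof shell", "category": "Jackets"},
    "shopify": {"description": "Rain shell", "category": "Jackets"},
    "erp": {"stock_level": 12},
}

canonical, drift = resolve_record(versions)
print(drift)
# prints: [('description', 'shopify', 'pim')]
```

The point of the map is less the code than the decision it forces: every field gets exactly one system that is allowed to be final.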

    Why spreadsheets break down as a source of truth

    Spreadsheets are good at helping teams start. They are fast, flexible, and familiar. That is exactly why teams keep stretching them beyond their natural role.

    But once a spreadsheet becomes the system behind your product catalog, the weaknesses become operational, not just annoying.

    • No real ownership enforcement
    • Weak control over who edits what
    • Validation that is usually light or manual
    • No proper publishing state
    • No category-aware completeness logic
    • No reliable way to govern variants at scale
    • No controlled channel-output layer

    Yes, Google Sheets has version history. But version history is not the same thing as an authoritative operating model. It helps you see what changed. It does not define which structure is canonical, which team owns which fields, or whether incomplete product data should be publishable at all.

    If spreadsheets are still your main operating layer, also read PIM vs spreadsheets: when your Excel-based product catalog becomes a liability.

    What a real single source of truth looks like day to day

    In practical terms, a working source of truth changes how people behave.

    • There is one canonical product record, not several “master” versions
    • Teams stop asking which file is current
    • Changes become visible and accountable
    • Structured fields are governed instead of guessed
    • Channels are fed from the same maintained record
    • Fixes happen upstream instead of being patched repeatedly downstream

    That last point matters a lot. A real source of truth does not just reduce confusion. It changes the direction of work. Teams stop reconciling differences after the fact and start maintaining correctness at the source.

    Why structure matters so much

    Many teams talk about source of truth as if it were only a process decision. It is also a structure decision.

    If your attribute model is weak, your source of truth will stay weak. If your category logic is inconsistent, your source of truth will stay inconsistent. If parent and variant relationships are unclear, your source of truth will create downstream confusion no matter how disciplined the team is.

    That is why this topic connects directly to Product Data Modeling for PIM and Product Taxonomy Guide. Authority is not only about where data lives. It is also about how that data is structured and controlled.

    Where PIM fits into a single source of truth

    PIM exists specifically to act as the authoritative layer for product information.

    That does not mean PIM replaces ERP, DAM, or storefronts. It means PIM becomes the governed layer where product information is structured, enriched, validated, approved, and prepared for distribution.

    In a healthy setup, the contract is clear:

    • Some systems feed data into PIM
    • PIM governs the authoritative product-information layer
    • Other systems consume approved data from PIM

    Once that contract is clear, product information stops drifting so easily.

    PIM does not magically create truth. It enforces where truth is maintained.

    If you want the category comparison behind this, go next to PIM vs MDM vs DAM vs PXM: What to Use (and When).

    Ownership matters more than software

    No system can become a real source of truth without ownership.

    That usually means:

    • clear owners for attribute groups
    • defined approval roles
    • shared rules for what “ready to publish” means
    • clarity about who can create, update, approve, and publish changes
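
As an illustration of how such rules might be expressed once they are explicit, here is a minimal sketch. The team names, the `ATTRIBUTE_OWNERS` map, and the `REQUIRED_TO_PUBLISH` list are all hypothetical; a PIM would normally hold this configuration itself.

```python
# Hypothetical ownership and publishing rules.
ATTRIBUTE_OWNERS = {
    "description": "marketing",
    "category": "merchandising",
    "gtin": "operations",
}
REQUIRED_TO_PUBLISH = ("description", "category", "gtin")

def apply_change(record, team, field_name, value):
    """Allow an edit only from the owning team; any edit resets approval."""
    if ATTRIBUTE_OWNERS.get(field_name) != team:
        raise PermissionError(f"{team} does not own {field_name}")
    record[field_name] = value
    record["approved"] = False

def ready_to_publish(record):
    """Return (ready, missing_fields) under the shared definition above."""
    missing = [f for f in REQUIRED_TO_PUBLISH if not record.get(f)]
    return (not missing and record.get("approved", False)), missing

record = {"description": "Waterproof shell", "category": "Jackets", "approved": True}
print(ready_to_publish(record))
# prints: (False, ['gtin'])

apply_change(record, "operations", "gtin", "00012345678905")
record["approved"] = True  # stands in for a separate approval step
print(ready_to_publish(record))
# prints: (True, [])
```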

    This is why “single source of truth” is not just a platform feature. It is an operating model backed by software.

    If your team needs shared language around this, see the PIM Glossary.

    Common mistakes teams make

    • Treating Shopify as the source of truth. It may be the publishing layer, but that does not automatically make it the right place to govern all product structure.
    • Letting exports become editable masters. CSVs should be outputs, not unofficial core systems.
    • Ignoring variants in ownership design. Variant-level confusion spreads quickly into listings, imagery, and identifiers.
    • Assuming everyone knows the rules. If the rules are implicit, they are not operationally reliable.
    • Confusing version history with governance. Knowing who changed something is useful. It is not the same as controlling what should exist and where.

    Why identifiers and structured fields support authority

    Authority gets stronger when key fields are structured properly.

    For example, GTIN is the global identifier used to uniquely identify trade items. That kind of identifier becomes much easier to trust when it is governed as part of a structured product record instead of scattered across sheets, channel exports, and ad hoc custom fields.

    The same is true for custom fields in storefront platforms. Shopify’s own metafield-definition documentation explains that definitions act as templates specifying what part of the store a metafield applies to and what values it can have. That is useful, but it still needs a broader product-data operating model behind it if the business wants real catalog authority.

    In other words: structure supports authority, but structure alone does not replace governance.

    How LynkPIM supports a single source of truth

    LynkPIM fits in the part of the stack where product information needs to become governed, consistent, and channel-ready.

    That means helping teams:

    • define ownership at attribute and category level
    • track changes and approvals
    • validate product data before publishing
    • distribute consistent product information across channels
    • reduce the number of unofficial “master” files in daily work

    The result is not only cleaner data. It is more confidence that what is live is actually correct.

    For action-oriented next steps, see the PIM Readiness Assessment, Catalog Health Score, and the main Features and Solutions pages.

    Final takeaway

    A single source of truth is not a slogan. It is a decision about authority, backed by structure, ownership, and workflow.

    If your team still depends on spreadsheets, exports, shared drives, and memory to keep product information aligned, then the issue is not that you lack data. It is that your product truth is spread too thin.

    Once that happens, the smartest move is not to police the chaos harder. It is to create one governed layer where product information can actually be trusted.

    FAQs

    Does single source of truth mean one system does everything?

    No. It means one system is authoritative for product information, while other systems may still provide inputs or consume approved outputs.

    Why can’t a spreadsheet be the source of truth?

    A spreadsheet can store data, but it does not reliably enforce ownership, validation, approval states, or governed multichannel output once product operations become more complex.

    Is Shopify my source of truth if my store is live there?

    Not necessarily. Shopify can be the publishing layer, but many businesses still need a separate authoritative layer for structured product data, governance, and channel control.

    What’s the difference between version history and source of truth?

    Version history helps you see what changed. A source of truth defines where product authority lives, who owns what, and how approved data should flow to channels.

    What makes a source of truth fail?

    Usually unclear ownership, weak product structure, uncontrolled exports, and the habit of letting multiple systems behave like unofficial masters at the same time.