Automating a Content Inventory

Some websites have tens of thousands of pages, each with its own set of text, images, documents, and applications. How can you systematically make sure every page is up-to-date and has high-quality content that represents your brand well? The answer is simple: perform a content inventory.

But before you fire up Excel and prepare for an all-nighter (or three), think about automating your content inventories. Automation speeds up the process and helps ensure you don’t miss any pages. It also makes it easier to group pages into logical categories, which lets you evaluate what kind of content you have on your site and what you’re lacking.

Why do a content inventory?

A content inventory is the first step toward performing a content audit, and you need a content audit in order to create a content strategy. In other words, you have to know what’s on your site before you can evaluate it, and you have to know the state of your content before you can change or recycle it.

Free Bonus: There’s even more good stuff on why you should perform a content inventory in our whitepaper, “How to Do a Content Inventory.”

What goes into a content inventory?

Before you start the inventory, make sure you know what you want to get out of it. You’ll also want to know its scope: are you combing through the whole site, or focusing on one area or date range? One common recommendation is to include the following raw data in a basic content inventory (feel free to add more detail as you see fit):

  • Unique Content ID—also called a page ID. Use this information to establish a clear labeling system for each page.
  • Title, or page title. This is the name shown in your browser’s tab or title bar (the HTML title element).
  • URL
  • File Format (HTML, PDF, DOC, TXT…)
  • Author or Provider—who created the page? Sales, marketing, etc.?
  • Physical location—is it in the content management system, on the server, etc.?
  • Meta Description
  • Meta Keywords
  • Categories/Tags
  • Dates (created, revised, accessed)
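To make the list above concrete, here is a minimal Python sketch of one inventory row plus a CSV export that Excel or any other spreadsheet program can open. The field names and the "CI-0001" style of content ID are hypothetical choices for illustration, not a standard.

```python
import csv
import io
from dataclasses import asdict, dataclass, fields


@dataclass
class InventoryRow:
    """One row of a basic content inventory (field names are illustrative)."""
    content_id: str
    title: str
    url: str
    file_format: str
    author: str = ""
    location: str = ""
    meta_description: str = ""
    meta_keywords: str = ""
    tags: str = ""
    created: str = ""
    revised: str = ""
    accessed: str = ""


def make_content_id(n: int) -> str:
    # Hypothetical labeling scheme: "CI-" plus a zero-padded sequence number.
    return f"CI-{n:04d}"


def to_csv(rows: list[InventoryRow]) -> str:
    """Serialize rows to CSV text with one column per inventory field."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=[f.name for f in fields(InventoryRow)])
    writer.writeheader()
    for row in rows:
        writer.writerow(asdict(row))
    return buf.getvalue()


rows = [
    InventoryRow(make_content_id(1), "Home", "https://example.com/", "HTML"),
    InventoryRow(make_content_id(2), "Price List", "https://example.com/prices.pdf", "PDF"),
]
print(to_csv(rows))
```

Fields left empty here (author, dates, and so on) are the ones you would fill in later, either from another tool or by hand.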

Automation comes in handy when you’re ready to obtain these data.

How to Automate Your Content Inventory

Write Your Own Script

If you’ve got some coding skills, you can set up your own script and run a content inventory. One benefit of doing it this way is you can customize it to retrieve the data in which you’re most interested.
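As an illustration of what such a script might do for each page, the Python sketch below pulls the title, meta description, and meta keywords out of a page’s HTML using only the standard library. It assumes you already have the HTML (for example, fetched with `urllib.request.urlopen`); `PageMetaParser` and `extract_page_meta` are hypothetical names.

```python
from html.parser import HTMLParser


class PageMetaParser(HTMLParser):
    """Collects the <title> text and the description/keywords meta tags."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.meta = {}
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)  # tag and attribute names arrive lowercased
        if tag == "title":
            self._in_title = True
        elif tag == "meta":
            name = (attrs.get("name") or "").lower()
            if name in ("description", "keywords"):
                self.meta[name] = attrs.get("content") or ""

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        # Title text may arrive in several chunks, so accumulate it.
        if self._in_title:
            self.title += data


def extract_page_meta(html: str) -> dict:
    parser = PageMetaParser()
    parser.feed(html)
    parser.close()
    return {
        "title": parser.title.strip(),
        "meta_description": parser.meta.get("description", ""),
        "meta_keywords": parser.meta.get("keywords", ""),
    }


sample = """<html><head><title>Pricing | Example</title>
<meta name="description" content="Plans and prices.">
<meta name="keywords" content="pricing, plans"></head><body></body></html>"""
print(extract_page_meta(sample))
```

Customizing the script then means adding handlers for whatever else you care about, such as headings, image alt text, or links to PDFs.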

Site Crawler

A site crawler is a common way to automate your content inventory. Also called a bot, spider, or other vaguely menacing names, a site crawler is a program that browses your entire site and records the information on each page. You can then export that information into Excel or another spreadsheet program for an (almost) instant list.
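At its core, a crawler is a breadth-first traversal: fetch a page, collect its links, and queue any same-site links it hasn’t seen yet. The Python sketch below shows that loop under simplifying assumptions (HTML pages only, no robots.txt handling, no rate limiting). `crawl` and its `fetch_page` parameter are hypothetical names; the fetcher is passed in so the logic can be tried without a network.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin, urlparse
from urllib.request import urlopen


class LinkParser(HTMLParser):
    """Collects href values from <a> tags."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)


def fetch(url: str) -> str:
    # Real network fetch; replace with a stub when testing.
    with urlopen(url, timeout=10) as resp:
        return resp.read().decode("utf-8", errors="replace")


def crawl(start_url: str, fetch_page=fetch, max_pages: int = 500) -> list[str]:
    """Breadth-first crawl that stays on start_url's host; returns visited URLs."""
    host = urlparse(start_url).netloc
    queue = deque([start_url])
    seen = {start_url}
    visited = []
    while queue and len(visited) < max_pages:
        url = queue.popleft()
        try:
            html = fetch_page(url)
        except OSError:
            continue  # network/HTTP errors: skip the page (a real tool would log it)
        visited.append(url)
        parser = LinkParser()
        parser.feed(html)
        for href in parser.links:
            # Resolve relative links and drop #fragments so each page is queued once.
            link, _ = urldefrag(urljoin(url, href))
            if urlparse(link).netloc == host and link not in seen:
                seen.add(link)
                queue.append(link)
    return visited
```

Each visited URL can then be run through a metadata extractor and written out as one spreadsheet row. A production crawler should also honor robots.txt (Python’s `urllib.robotparser` can read it) and throttle its requests.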

There are dozens of site crawlers. When picking one, make sure yours meets these criteria:

  • The tool meets your security needs.
  • The tool will be able to meet the scope of your project. Bigger sites require more robust software, naturally.
  • The tool will collect all the data you need it to.

Most importantly, the program should be easy to use and intuitive. A program that no one can figure out, or wants to use, is no good, and running it incorrectly may give you an incomplete or inaccurate inventory. Moreover, “a content inventory is an ongoing process. As content is created, edited, deleted, or moved, you need to adjust your spreadsheet accordingly,” says CMSWire. You’ll therefore need a program that’s simple enough to use repeatedly.
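Because the inventory has to stay current, it helps to compare two runs rather than re-read the whole spreadsheet. This minimal Python sketch assumes each inventory is a mapping from URL to a fingerprint, such as a hash of the page content or its last-modified date (both hypothetical inputs here), and reports which pages were added, removed, or changed.

```python
import hashlib


def fingerprint(page_text: str) -> str:
    """One possible fingerprint: a SHA-256 hash of the page content."""
    return hashlib.sha256(page_text.encode("utf-8")).hexdigest()


def diff_inventories(old: dict[str, str], new: dict[str, str]) -> dict[str, list[str]]:
    """Compare two URL -> fingerprint mappings from successive inventory runs."""
    old_urls, new_urls = set(old), set(new)
    return {
        "added": sorted(new_urls - old_urls),
        "removed": sorted(old_urls - new_urls),
        "changed": sorted(u for u in old_urls & new_urls if old[u] != new[u]),
    }


last_month = {"/a": fingerprint("one"), "/b": fingerprint("two"), "/c": fingerprint("three")}
this_month = {"/a": fingerprint("one"), "/b": fingerprint("two, revised"), "/d": fingerprint("four")}
print(diff_inventories(last_month, this_month))
```

A moved page appears as one removal plus one addition, so those pairs are worth a manual look before you update the spreadsheet.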

Content Management System

Sometimes your own CMS can do the job. For example, WordPress (which this site uses) has a plugin that lets you run a content inventory on existing pages.
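If you’d rather not install a plugin, WordPress core also exposes published pages through its REST API at `/wp-json/wp/v2/pages`. The Python sketch below pages through that endpoint 100 items at a time (the API’s maximum `per_page`) and maps a few response fields to inventory columns. It assumes the REST API is enabled and publicly readable on your site; `fetch_wp_pages` and `to_inventory_rows` are hypothetical helper names.

```python
import json
from urllib.request import urlopen


def fetch_wp_pages(site: str) -> list[dict]:
    """Collect every published page from the core WordPress REST API."""
    results, page, total_pages = [], 1, 1
    while page <= total_pages:
        url = f"{site.rstrip('/')}/wp-json/wp/v2/pages?per_page=100&page={page}"
        with urlopen(url, timeout=10) as resp:
            # WordPress reports the number of result pages in this header.
            total_pages = int(resp.headers.get("X-WP-TotalPages", "1"))
            results.extend(json.load(resp))
        page += 1
    return results


def to_inventory_rows(items: list[dict]) -> list[dict]:
    """Keep the inventory fields the REST response provides directly."""
    return [
        {
            "content_id": item["id"],
            "title": item["title"]["rendered"],  # may contain HTML entities
            "url": item["link"],
            "author_id": item["author"],  # numeric user ID, not a name
            "created": item["date"],
            "revised": item["modified"],
        }
        for item in items
    ]
```

Usage would look like `to_inventory_rows(fetch_wp_pages("https://example.com"))`; blog posts live at the sibling `/wp-json/wp/v2/posts` endpoint.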

Unfortunately, the content inventory process can’t be fully automated. Once your site crawler, code, or CMS dumps your data into a spreadsheet, you’ll need to go through it and fill in any missing details. Doing so will make the content audit, and later the content strategy, easier to build.
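A script can at least tell you where to start filling gaps. This Python sketch reads an exported inventory CSV and lists each URL along with the columns that are still empty. The set of required columns is a hypothetical choice you would adjust to match your own inventory.

```python
import csv
import io

# Hypothetical choice of columns that every row should have filled in.
REQUIRED = ("title", "url", "author", "meta_description")


def rows_needing_review(csv_text: str, required=REQUIRED) -> list[tuple[str, list[str]]]:
    """Return (url, missing_columns) for every row with at least one blank required field."""
    flagged = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        missing = [col for col in required if not (row.get(col) or "").strip()]
        if missing:
            flagged.append((row.get("url", ""), missing))
    return flagged


export = (
    "title,url,author,meta_description\n"
    "Home,https://example.com/,Marketing,Welcome\n"
    "Prices,https://example.com/prices,,\n"
)
print(rows_needing_review(export))
```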

Happy tallying!


William Flanagan

CEO & Founder, Audienti. Former VP-Cognio, Founder-sentitO/Verso, SALIX/Tellabs, PrimaryAccess/3Com, CompuServe. Expert in data-driven marketing.
