

Accurately tagged digital content is the foundation for its reuse and discoverability across multiple channels: XML tags differentiate text from photos, headers from body copy, captions from photo credits, charts from graphs, and so on.
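To make the idea concrete, here is a minimal Python sketch of how tagging separates a photo's caption from its credit, and headlines from body copy. The tag names (`headline`, `p`, `photo`, `caption`, `credit`) and the `split_by_role` helper are hypothetical. They were chosen for illustration and are not taken from Aptara's schema or from a standard such as JATS. The parsing uses Python's standard `xml.etree.ElementTree` module.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup for illustration only; a real project would follow a
# defined schema (for example JATS, DocBook, or a publisher's own DTD).
ARTICLE = """
<article>
  <headline>Sample headline</headline>
  <body>
    <p>First paragraph of body copy.</p>
    <p>Second paragraph of body copy.</p>
  </body>
  <figure>
    <photo src="photo-01.jpg"/>
    <caption>Sample photo caption.</caption>
    <credit>Sample photo credit</credit>
  </figure>
</article>
"""


def split_by_role(xml_text: str) -> dict[str, list[str]]:
    """Group content by the role its tag assigns, so each part can be reused on its own."""
    root = ET.fromstring(xml_text.strip())
    return {
        "headlines": [el.text or "" for el in root.iter("headline")],
        "body": [el.text or "" for el in root.iter("p")],
        "captions": [el.text or "" for el in root.iter("caption")],
        "credits": [el.text or "" for el in root.iter("credit")],
        # Photos carry their file reference in an attribute rather than as text.
        "photos": [el.get("src", "") for el in root.iter("photo")],
    }


if __name__ == "__main__":
    parts = split_by_role(ARTICLE)
    # A mobile feed might show only headlines, while a photo archive
    # indexes captions and credits separately.
    print(parts["headlines"])
    print(parts["captions"], parts["credits"])
```

Because the caption and the credit live in separate elements, a downstream channel can pick up either one without guessing from position or typography.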

Turning content into data

Aptara has created metadata for more than 1.2 million newspaper pages and more than 1 million legal, medical, scientific, and educational documents.

Using our proprietary software, our skilled tagging teams apply tags to content and capture metadata such as:

  • Bibliographic data
  • Subject terms and keywords
  • Journal and book titles
  • Authors and editors
  • Proceedings
  • Volume, issue, and page numbers
  • ISSNs and ISBNs
  • Copyright information
  • Article and chapter history
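A capture workflow like this might check identifiers before storing them. The Python sketch below validates ISSN and ISBN-13 check digits using their published checksum rules. It then writes the captured fields into an XML record. The `record` element, its child element names, and the `build_record` helper are hypothetical, because this page does not describe Aptara's actual record format or validation steps.

```python
import xml.etree.ElementTree as ET


def isbn13_is_valid(isbn: str) -> bool:
    """ISBN-13 rule: digits weighted 1, 3, 1, 3, ... must sum to a multiple of 10."""
    s = isbn.replace("-", "").replace(" ", "")
    if len(s) != 13 or not (s.isascii() and s.isdigit()):
        return False
    total = sum(int(d) * (3 if i % 2 else 1) for i, d in enumerate(s))
    return total % 10 == 0


def issn_is_valid(issn: str) -> bool:
    """ISSN rule: weight the first seven digits 8..2; the check digit may be X (= 10)."""
    s = issn.replace("-", "").upper()
    if len(s) != 8 or not (s[:7].isascii() and s[:7].isdigit()):
        return False
    total = sum(int(d) * w for d, w in zip(s[:7], range(8, 1, -1)))
    check = (11 - total % 11) % 11
    return s[7] == ("X" if check == 10 else str(check))


def build_record(fields: dict[str, str], authors: list[str]) -> ET.Element:
    """Serialize captured fields into a hypothetical <record> element.

    Raises ValueError if a supplied ISSN or ISBN fails its check digit.
    """
    if "issn" in fields and not issn_is_valid(fields["issn"]):
        raise ValueError(f"invalid ISSN: {fields['issn']}")
    if "isbn" in fields and not isbn13_is_valid(fields["isbn"]):
        raise ValueError(f"invalid ISBN-13: {fields['isbn']}")
    record = ET.Element("record")
    for name, value in fields.items():
        ET.SubElement(record, name).text = value
    # Authors repeat, so each gets its own element.
    for author in authors:
        ET.SubElement(record, "author").text = author
    return record


if __name__ == "__main__":
    rec = build_record({"title": "Sample title", "issn": "0378-5955"}, ["A. Author"])
    print(ET.tostring(rec, encoding="unicode"))
```

Rejecting a record at capture time is cheaper than tracing a mistyped identifier after it has been distributed to search engines and library catalogs.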

Semantic tagging

Preparing content for platforms that do not yet exist means investing in content enrichment: turning it into “smart” content.

Most content exists in unstructured, “flat” files with long lines of text: Word documents, PDFs, and application files such as Adobe InDesign documents. Files like these are difficult to process automatically because a person has to interpret what each part of the text is.

XML approaches this problem by marking up a document's structure, format, and style. But semantic tagging brilliantly addresses not just what content looks like, but what it means.

Imagine what a K-12 publisher could do with content correlated to the Common Core State Standards. Or how easily interactivity could be added to eBook content that has been parsed semantically.
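To illustrate what meaning-level tags make possible, the Python sketch below contrasts format-only markup with semantic markup and then queries the semantic version. The `lesson`, `term`, `standard`, and `ref` names and both helper functions are hypothetical. The standard codes are written in the style of Common Core identifiers for illustration only.

```python
import xml.etree.ElementTree as ET

# Format-oriented markup: software can tell "fraction" is bold, but not why.
FORMAT_ONLY = "<p>A <b>fraction</b> names part of a whole.</p>"

# Semantic markup (hypothetical tags and attributes): the text now records
# that "fraction" is a defined term and which standard each lesson serves.
SEMANTIC = """
<lessons>
  <lesson id="L1" standard="CCSS.MATH.CONTENT.3.NF.A.1">
    <p>A <term ref="fraction">fraction</term> names part of a whole.</p>
  </lesson>
  <lesson id="L2" standard="CCSS.MATH.CONTENT.3.MD.A.1">
    <p>Tell and write <term ref="time">time</term> to the nearest minute.</p>
  </lesson>
</lessons>
"""


def lessons_for_standard(xml_text: str, code: str) -> list[str]:
    """Return the ids of lessons tagged as aligned to the given standard code."""
    root = ET.fromstring(xml_text.strip())
    return [lesson.get("id", "") for lesson in root.iter("lesson")
            if lesson.get("standard") == code]


def glossary_terms(xml_text: str) -> list[str]:
    """Collect every tagged term, e.g. to generate a glossary or pop-up definitions."""
    root = ET.fromstring(xml_text.strip())
    return sorted({term.get("ref", "") for term in root.iter("term")})


if __name__ == "__main__":
    print(lessons_for_standard(SEMANTIC, "CCSS.MATH.CONTENT.3.NF.A.1"))
    print(glossary_terms(SEMANTIC))
    # The format-only version offers nothing comparable to query.
    print(glossary_terms(FORMAT_ONLY))
```

The second function hints at the eBook case: once terms are tagged, a reading app could attach definitions or quizzes to them automatically.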

Because computers can process semantic tags, they can turn your legacy and newly created content into smart content, making it easy to discover via online search and easy to repurpose for new content products.