Choosing the Right Platform for Your Digital Archive

Please note: This blog post is a summary of a GCDI workshop that I co-led with Filipa Calado on March 17, 2021.

During my tenure as a Digital Fellow, I have led my Intro to Omeka workshop multiple times. One of the recurring questions in that context has been: Why should I choose Omeka over WordPress to build my digital archive?  This blog post is an attempt to answer that question, with a tweak: When should I choose Omeka over WordPress or a different platform to build my digital archive? Each of the web development tools that we will survey in this post has its place in the digital scholarship toolkit and deciding among them is just a matter of what your project demands and what resources you have available. 

What follows is not meant to be an exhaustive survey of the web platforms you can use to build a digital collection, exhibition, or archive, but rather an introductory guide and an orienting tool for scholars just getting started on their projects. In this regard, before we dive into a comparison among some of the tools that can be used (or repurposed) to build your archive, I want to stress the importance of project planning in building any scholarly digital project. In our case, some of the preliminary questions you might want to reflect on include:

  • What do I mean by “digital archive”? (You can find some inspiring thoughts on this subject here and here)
  • What are my pre-existing digital skills, and how much time and how many resources can I invest in developing new ones or familiarizing myself with a new platform?
  • How important is structured metadata to my project?
  • How do I want to display the items in my collection(s)?
  • What kind of media/files do I want to host in my archive? 
  • What are my priorities in terms of discoverability and accessibility?
  • What are my plans with regard to sustainability?
  • Who is my audience? Who are my interlocutors?

Depending on your answers, you might find that accessibility matters more to you than metadata, or that a low-maintenance and more easily sustainable platform is more important to you than software with a gentle learning curve.

All of the platforms covered in this workshop are open source, and most of them are multipurpose Content Management Systems (CMSs) that rely on an underlying database management system (such as MySQL) in which the content of your website is stored and organized. CMSs also usually rely on a set of theme files that control the look and feel of the web pages through which the content in your database is displayed. As such, most of these platforms also require some familiarity with the basic principles of HTML/CSS and web hosting. It’s also important to note that some of these tools can work in combination with each other.
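To make that division of labor concrete, here is a minimal sketch in Python of what a CMS does under the hood: content sits in database rows, and a theme template decides how each row is displayed. The table, its columns, the sample item, and the template are all invented for illustration, and the sketch uses SQLite instead of MySQL so that it runs without a server. No real CMS is this simple.

```python
import sqlite3
from html import escape
from string import Template

# Hypothetical "theme": it controls how content looks, not what it says.
THEME = Template("<article><h1>$title</h1><p>$description</p></article>")


def build_database():
    """Create an in-memory database holding the site's content (illustrative schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE items (id INTEGER PRIMARY KEY, title TEXT, description TEXT)"
    )
    conn.execute(
        "INSERT INTO items (title, description) VALUES (?, ?)",
        ("Letter, 1912", "A handwritten letter & envelope."),
    )
    conn.commit()
    return conn


def render_item(conn, item_id):
    """Fetch one row and pour it into the theme template, escaping HTML characters."""
    row = conn.execute(
        "SELECT title, description FROM items WHERE id = ?", (item_id,)
    ).fetchone()
    if row is None:
        return None  # no such item in the database
    title, description = row
    return THEME.substitute(title=escape(title), description=escape(description))


if __name__ == "__main__":
    print(render_item(build_database(), 1))
```

Swapping in a different template changes how every item looks without touching the database, which is roughly what happens when you change themes in a CMS.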

Omeka 

Sample sites: Colored Conventions; Omeka Classic Directory; Omeka Semantic Directory. 

I’m going to start with Omeka, because it’s considered the gold standard in many scholarly communities. Omeka is a free CMS and a web publishing system built by and for scholars that is used by hundreds of archives, historical societies, libraries, museums, and individual researchers and teachers to create searchable online databases and scholarly online interpretations of their digital collections. Omeka is developed by the Roy Rosenzweig Center for History and New Media at George Mason University – the same institution that brought us valuable tools such as Zotero and Tropy. If you have a digital collection of primary sources that you want to publish online in a scholarly way, you’ll want to consider Omeka. Omeka allows you to describe items according to archival standards (using Dublin Core), sort them into collections, import and export descriptive information from other systems, and create as many interpretive online exhibits as you like from those items.


While Omeka has become the go-to stand-alone, lightweight CMS for many personal or class digital archives, it also has its uses in large institutions that run far more robust (and often expensive) CMSs, because it makes it possible to import collections from those larger platforms and make them accessible on the web. Dublin Core, a widely shared metadata standard, helps make linked open data possible and collections more discoverable. In other words, through Dublin Core, Omeka and the other systems it exchanges records with are able to talk to one another.
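As a rough illustration of why a shared vocabulary helps systems exchange descriptions, the Python sketch below serializes one item using four of the fifteen Dublin Core elements. This is not Omeka’s own export format: the `record` wrapper element and the sample values are assumptions made for this example. Only the namespace URI and the element names come from the Dublin Core standard.

```python
import xml.etree.ElementTree as ET

# Namespace of the Dublin Core Metadata Element Set, version 1.1.
DC_NS = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("dc", DC_NS)


def dublin_core_record(fields):
    """Serialize a dict of Dublin Core element names and values to a small XML record.

    The <record> wrapper is an assumption made for this sketch.
    """
    record = ET.Element("record")
    for name, value in fields.items():
        element = ET.SubElement(record, f"{{{DC_NS}}}{name}")
        element.text = value
    return ET.tostring(record, encoding="unicode")


# Invented sample item described with four Dublin Core elements.
item = {
    "title": "Letter to the editor",
    "creator": "Unknown",
    "date": "1912-03-17",
    "type": "Text",
}

if __name__ == "__main__":
    print(dublin_core_record(item))
```

Because the element names are standardized, any other system that understands Dublin Core can read the title, creator, date, and type without knowing anything about where the record came from.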

[Image: Omeka slide]

Unfortunately, easy publishing formats can also bring a degree of inflexibility, especially in terms of web infrastructure and aesthetics – designing a dynamic and… pretty Omeka site will often require the skills of a seasoned web designer. Navigating through a theme’s files can make even fundamental changes (fonts, headers, column widths, the addition of a logo or footer) incredibly frustrating. Displaying and sorting content in ways that the platform was not natively designed for may also require some creative thinking.

Pros:

Cons:

  • Self-hosting can be challenging (e.g., server compatibility issues, occasional bugs, plugins by small developers being discontinued, etc.)
  • Small community of practice
  • Styling your website may require advanced coding skills  
  • Requires some maintenance 

 

WordPress

Sample Sites: Open Culture (Archive as a repository); The Bais Yaakov Project (Scholarly archive with structured metadata); The Berkeley Revolution (Ongoing class project that features items with some metadata descriptors and digital interpretative essays/exhibitions).

WordPress is a flexible open-source CMS that powers nearly 40% of all sites on the web. If you have used the Commons for one of your classes (as a student, or as a teacher), you already are a WordPress user! As such, you may know that building websites with it can be a matter of just a few clicks and that its look and feel can be changed by selecting (and then editing) a new theme. Likewise, functionalities can be added by installing and activating plugins.

WordPress’s native category and tagging system can be used to organize your archive’s information architecture, but structured metadata to describe your items is not immediately available and has to be implemented using plugins or by creating custom fields with some extra coding. Likewise, creating exhibitions, timelines, or maps featuring your items will require considerably more effort than it would in Omeka. One of the advantages of working with such a widely used CMS is that if you are looking for a plugin that adds a functionality to your website, chances are that someone has already developed and widely tested it (although many of those plugins rely on a freemium business model). By contrast, only a limited number of plugins are available for Omeka and, especially in the case of the less popular ones, they often don’t have much mileage under their belt. As a result, you might run into issues that have yet to be identified or resolved.

However, one of the disadvantages of working with a platform that relies heavily on plugins to extend its core functionalities is that it requires quite a bit of maintenance: WordPress updates can be as frequent as biweekly and, for each update, you have to make sure that the different components (WordPress, its themes, and plugins, as well as any additional customization you might have made directly in the code) are compatible with each other. This may not be as labor intensive as my previous sentence made it sound… until something goes wrong with your website and you have to figure out which update caused the issue. Luckily, you can rely on a large community of support and extensive pre-existing resources both to set up and troubleshoot your website.

If you are looking to create a website for your students to engage with class material, write blog posts, and interact with one another, a WordPress course site is often a better and more flexible option. Likewise, if you’re looking to build a highly customizable and aesthetically appealing website that works as a repository or that offers some level of metadata description, WordPress might be a good solution for you. Omeka, by contrast, is best suited for projects that involve a digital (or digitizable) collection with metadata that you want to curate, organize, describe, and publish in the form of a digital exhibition. With a little bit of work, you might even opt to keep a main WordPress site that links to your Omeka site, and style the two in a similar way. This option allows each platform to do what it does best.

Pros:

  • Immediacy of use (if used as a repository)
  • Thousands of plugins 
  • Easily customizable (yes, it’s relatively easy to make it look pretty)
  • Can be fully hosted (wordpress.com or… CUNY Commons!)
  • Largest community of practice (easier to find support and tutorials)

Cons:

  • Requires plugins or programming for structured metadata and exhibitions
  • Hosted version available but not as customizable
  • High maintenance 

 

Drupal 

Sample sites: John Latham Archive; Indian Culture.

Like WordPress, Drupal is a free and open-source web content management framework written in PHP. Drupal provides a back-end framework for at least 12% of the top 10,000 websites worldwide – ranging from personal blogs to corporate, political, and government sites. As with WordPress, Drupal’s native installation does not allow you to upload and describe the items in your archive out of the box – but it does allow a great (perhaps the greatest, among the platforms we are surveying) degree of flexibility and creativity when it comes to describing your items (i.e., its content management functions) and displaying them (i.e., its publishing functions). For instance, as the John Latham Archive shows, Drupal allows you to display items in more than one arrangement, including ways that are highly unconventional and, in this case, mimic the aesthetic statement of the artist to whom the items in the collection belong. As the project’s developer highlighted in an essay documenting his curatorial process:

Drupal makes no assumption about the type of content one needs to publish. There is an unlimited number of content types that could be created in any Drupal installation. Each content type is customizable by adding a set of fields according to a specification. … In using Drupal for online archives, the archivist can follow this strategy for content organization: all items in the archive share a common content type, and the decisions in formats (e.g., video, image, text, etc.) comes from the attached metadata. 

Drawing a comparison between Drupal and Omeka, Quinn Dombrowski has noted that “For [complex projects] where the data doesn’t fit well into the item / exhibit paradigm, or that require a different set of metadata, Omeka lacks the user-configurable customization options to easily support projects outside its defined scope.” Similarly, Drupal allows you to link all the items in your collection in a much more systematic way than Omeka does, and without as much twisting and turning, assuming you’ve set up your platform just right.

Indeed, while Drupal allows you to develop an extremely customizable infrastructure for your archive, this comes at the cost of investing considerably more time in the planning phase as well as the programming process, before you can populate your archive. While this process can be extremely frustrating for beginners (both Filipa and I can testify to this!), Dombrowski has created a set of tutorials dedicated to humanists looking to learn Drupal. If you are interested in getting started on a Drupal-based digital archive, you may also want to look into ArchiveX, a distribution of Drupal developed to store different digital formats of notable archives (e.g., Albums, Exhibits, Music, Library, Videos, Orders, Software, etc.), and the Drupal Group for Cultural Heritage.
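The content-organization strategy quoted above (one shared content type, with the format coming from the attached metadata) can be sketched in a few lines of plain Python. To be clear, this is not Drupal code and does not use Drupal’s API: the class names, field names, and display modes are all hypothetical, chosen only to show the shape of the idea.

```python
from dataclasses import dataclass, field


@dataclass
class ContentType:
    """A named set of fields, loosely analogous to a Drupal content type."""
    name: str
    fields: tuple


@dataclass
class Item:
    """One archive item whose values must use fields defined on its content type."""
    content_type: ContentType
    values: dict = field(default_factory=dict)

    def __post_init__(self):
        unknown = set(self.values) - set(self.content_type.fields)
        if unknown:
            raise ValueError(
                f"Fields not defined on {self.content_type.name}: {sorted(unknown)}"
            )


# Every item in the archive shares this single (hypothetical) content type.
ARCHIVE_ITEM = ContentType("archive_item", ("title", "format", "date"))


def display_as(item):
    """Choose a display mode from the item's metadata, not from its content type."""
    modes = {"video": "player", "image": "gallery", "text": "reader"}
    return modes.get(item.values.get("format"), "default")


if __name__ == "__main__":
    film = Item(ARCHIVE_ITEM, {"title": "Studio film", "format": "video"})
    print(display_as(film))
```

The planning cost mentioned above shows up here in miniature: the list of fields has to be decided before any item can be described.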

Pros:

  • Extremely customizable and extendable 
  • An item can be displayed in multiple ways
  • Fairly large community of practice

Cons:

  • Setup and structural changes to the website can be extremely challenging
  • Steep learning curve
  • High maintenance

Wax (Jekyll-based) 

Sample Site: Wax

OK, this solution is a little different from the other platforms we have looked at, insofar as Jekyll is not a CMS. Wax Exhibits is a digital exhibition generator that makes use of Jekyll to build static websites based on minimal computing principles. Let’s unpack this definition.

Static websites, as opposed to dynamic websites, do not generate the pages that a viewer is browsing in response to queries against the website’s underlying database. In other words, with a CMS, every time you click on an item, or rather, any time your browser loads a page, you are sending a request to the web server where your website is hosted to generate a page from the content stored in your database. In a static site, on the other hand, the web pages already exist on the server (or your local computer!) before, and regardless of whether, a user requests them. This means that a Jekyll-based website can run on a much less powerful web server: the CMSs surveyed above typically need a LAMP (Linux, Apache, MySQL, PHP/Perl/Python) stack on your server, whereas a Jekyll site does not. As such, Wax is a much more lightweight and sustainable option compared to the previous platforms.
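The difference between the two models can be sketched in a few lines of Python. This is not how Jekyll or Wax is implemented (Jekyll is written in Ruby); the item list, function names, and file layout are assumptions made purely to contrast building a page per request with writing all pages ahead of time.

```python
import tempfile
from html import escape
from pathlib import Path

# Invented sample items; in a real project these might come from a spreadsheet.
ITEMS = [
    {"slug": "letter-1912", "title": "Letter, 1912"},
    {"slug": "photo-1920", "title": "Photograph, 1920"},
]


def render(item):
    """Turn one item into a complete HTML page."""
    return f"<html><body><h1>{escape(item['title'])}</h1></body></html>"


def dynamic_response(slug):
    """Dynamic model: look the item up and build the page when a request arrives."""
    for item in ITEMS:
        if item["slug"] == slug:
            return render(item)
    return None  # no matching item


def build_static_site(output_dir):
    """Static model: write every page to disk once, before any visitor shows up."""
    output_dir = Path(output_dir)
    output_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for item in ITEMS:
        page = output_dir / f"{item['slug']}.html"
        page.write_text(render(item), encoding="utf-8")
        written.append(page)
    return written


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        for page in build_static_site(tmp):
            print(page.name)
```

Both functions produce the same HTML. What changes is when the work happens, and therefore what the server has to be capable of: in the static case it only needs to hand out files that already exist.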

Alex Gil, one of its developers, recently shared a document (“Wax Exhibits: Collaborations, Workflows and Best Practices”) on Twitter that includes a chart mapping the differences between Wax and Omeka.
[Image: Wax vs. Omeka chart]
Gil also contributed to a Twitter thread that explains who might benefit from using Wax and in what circumstances. An additional guide to Jekyll can be found here.

Pros:

  • Based on minimal computing principles (No LAMP required)
  • Easy to style
  • Low maintenance
  • Good for teaching computation fundamentals 

Cons:

  • Steep learning curve
  • Limited functionalities
  • Small community of practice 

 

I want to conclude this survey by briefly mentioning three more platforms that I encourage you to look into.

  • Collective Access is a highly customizable relational database that “enables powerful searching and browsing options and provides opportunities for nuanced web-based collection discovery.” Collective Access is often used in conjunction with Omeka to combine the former’s capacity for detailed, flexible description with the latter’s publishing functions.
  • Mukurtu is a grassroots-built platform developed in collaboration with Indigenous communities to manage and share digital cultural heritage in culturally relevant and ethically minded ways.
    • Sample site: Gather (Connecting Aboriginal communities with collections and stories from the State Library of NSW).
  • CollectionBuilder is a lightweight, flexible tool for creating metadata-driven digital collection websites. Like Wax, CollectionBuilder makes use of Jekyll’s static web technology. It mostly relies on three simple components to develop its websites: “a spreadsheet of metadata, a directory of assets, and a configuration file.” CollectionBuilder comes in three versions: one hosted through GitHub, one for traditional web servers, and one that works in conjunction with an existing CMS. A version compatible with Mukurtu is in the works.

A final note: if you do find a digital archive that you want to reverse engineer, Wappalyzer is usually a good starting point for identifying the technologies it is built on.

What is your experience with these (or other) platforms? What are your concerns? We are eager to hear from you! Join us at our next Digital Archive Research Collective (DARC) meetings on April 12 and May 5 at 12pm. Sign up for the DARC Group to receive updates and be a part of our community of practice. ICYMI: DARC also has a Wiki with additional information on platforms and resources available at the GC.