The Dutch National Centre of Expertise and Repository for Research Data (DANS) data archive collection contains datasets in the fields of humanities, archaeology, geospatial sciences and behavioural and social sciences.
Humanitiesdata.com seeks to help collect and disseminate information about publicly available data of particular interest to digital humanities and humanities computing. Humanitiesdata.com collects exclusively open datasets.
The Journal of Open Humanities Data (JOHD) features peer-reviewed publications describing humanities data or techniques with high potential for reuse. The journal currently publishes two types of papers: short data papers, which contain a concise description of a humanities research object with high reuse potential, and full-length research papers, which discuss and illustrate methods, challenges, and limitations in the creation, collection, management, access, processing, or analysis of data in humanities research, including standards and formats.
EEBO-TCP is a partnership with ProQuest and with more than 150 libraries to generate highly accurate, fully-searchable, SGML/XML-encoded texts corresponding to books from the Early English Books Online (EEBO) Database.
This dataset contains metadata as well as data regarding geographic locations mentioned in works of fiction from 1701 to 2011 found in the HathiTrust Digital Library. The dataset comes in three versions: volumemeta, recordmeta, and titlemeta. It contains over 30 columns of data for each volume row, including the geographic location as it appears in the volume, the number of times the location is mentioned in the volume, and the latitude and longitude of the location.
The HTRC Extracted Features Dataset v.2.0 is composed of page-level features for 17.1 million volumes in the HathiTrust Digital Library. This version contains non-consumptive features for both public-domain and in-copyright books. Features include part-of-speech tagged term token counts, header/footer identification, marginal character counts, and much more.
Genre-specific word counts for 178,381 volumes from the HathiTrust Digital Library. This dataset contains the word frequencies for all English-language volumes of fiction, drama, and poetry in the HathiTrust Digital Library from 1700 to 1922. Word counts are aggregated at the volume level, but include only pages tagged as belonging to the relevant literary genre.
The World-Historical Dataverse is published by the World History Center at the University of Pittsburgh. It is intended to contribute to the development and exchange of datasets relevant to world-historical documentation and analysis. Its holdings include Modern Data Bank, Japan Historical Statistics, and Slave movements in the 18th and 19th century, among many others.