Text Data Mining

This guide describes some of the key resources the library has available for text data mining.

TDM Studio

A text and data mining solution for research at all levels and in all disciplines. ProQuest's TDM Studio is a cloud-based product that allows text and data mining of content that the library has licensed from ProQuest Inc. (newspapers, scholarly articles, dissertations & theses, government databases). TDM Studio comes in two flavors, called dashboards. Each dashboard offers access geared to individual user needs and coding skills.

A.  “TDM Studio Visualizations” Dashboard

  • No coding skills required
  • Currently offers visualizations via Geographic Analysis, Topic Modeling, and Sentiment Analysis tools.
  • Content available for use in the Visualizations dashboard is limited to a select group of major newspapers found in ProQuest databases (New York Times, Washington Post, Wall Street Journal, Chicago Tribune, Los Angeles Times, Globe and Mail (Toronto), Guardian (London), South China Morning Post, Sydney Morning Herald, Times of India).
  • Each Visualizations dataset can analyze up to 10,000 documents.
  • The Visualizations dashboard is a growing product with more analysis tools and dataset content offered with each new release.

Access to the Visualizations Dashboard:
Use this library link to sign on and create an account for the Visualizations Dashboard: https://databases.library.jhu.edu/databases/proxy/JHU07287


B.  “TDM Studio Workbench” Dashboard

  • Requires coding skills in R or Python and incorporates Jupyter notebooks.
  • Project set-up is for an individual researcher or small groups.
  • Workbench allows the widest range of analysis options.
  • A workbench can have up to 10 datasets and each dataset can have up to two million documents.
  • All analysis is performed within TDM Studio.
  • Analysis results and scripts created in TDM Studio can be downloaded. The content itself (the full text of articles used in analysis) cannot be downloaded, because the source publishers retain copyright ownership.
  • Workbench includes content from ALL the ProQuest databases licensed by the library (newspapers, scholarly articles, dissertations & theses, government databases), encompassing hundreds of datasets. The three most popular news datasets are the New York Times (1923 - present), Washington Post (1877 - present), and Wall Street Journal (1923 - present).
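
To help gauge the level of Python involved, here is a minimal sketch of the kind of script a researcher might write in a Jupyter notebook: counting the most frequent terms across a few documents. It is only an illustration. The sample sentences, the stopword list, and the function names are all made up for this example, and it does not show how TDM Studio stores or loads its datasets.

```python
from collections import Counter
import re

# Hypothetical sample documents (assumption: stand-ins for article text,
# not the actual format of any TDM Studio dataset).
documents = [
    "The city council approved the new transit budget.",
    "Transit ridership rose after the budget vote.",
    "The council debated housing and transit funding.",
]

# A tiny illustrative stopword list; a real project would use a fuller one.
STOPWORDS = {"the", "and", "after", "a", "of", "new"}


def tokenize(text):
    """Lowercase the text and return its alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def term_frequencies(docs, stopwords=STOPWORDS):
    """Count how often each non-stopword term appears across all documents."""
    counts = Counter()
    for doc in docs:
        counts.update(t for t in tokenize(doc) if t not in stopwords)
    return counts


# Show the three most frequent terms with their counts.
top_terms = term_frequencies(documents).most_common(3)
print(top_terms)
```

If a script like this is easy to read and modify, the Workbench dashboard is likely a reasonable fit; if not, the Visualizations dashboard requires no coding.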

Access to the Workbench Dashboard:
If you have experience with R or Python, TDM Studio Workbench might be the right option for you. To set up your account, complete the registration form at this link: https://bit.ly/3olVWwT or contact Jim Gillispie, Social Science Librarian, at jeg@jhu.edu for more information.

For more descriptions, webinars, and videos on TDM Studio capabilities, see this ProQuest LibGuide: https://proquest.libguides.com/tdmstudio