Text Data Mining
TDM Studio
A text and data mining solution for research at all levels and in all disciplines. ProQuest's TDM Studio is a cloud-based product that supports text and data mining of content the library has licensed from ProQuest (newspapers, scholarly articles, dissertations and theses, government databases). TDM Studio comes in two flavors, called dashboards. Each dashboard offers access geared to individual user needs and coding skills.
A. "TDM Studio Visualizations" Dashboard
- No coding skills required
- Currently offers visualizations via Geographic Analysis, Topic Modeling, and Sentiment Analysis tools.
- Content available for use in the Visualizations dashboard is limited to a select group of major newspapers found in ProQuest databases (New York Times, Washington Post, Wall Street Journal, Chicago Tribune, Los Angeles Times, Globe and Mail (Toronto), Guardian (London), South China Morning Post, Sydney Morning Herald, Times of India).
- Each visualizations dataset can analyse up to 10,000 documents.
- The Visualizations dashboard is a growing product with more analysis tools and dataset content offered with each new release.
Access to the Visualizations Dashboard:
Use this library link to sign on and create an account for the Visualizations Dashboard: https://databases.library.jhu.edu/databases/proxy/JHU07287
B. "TDM Studio Workbench" Dashboard
- Requires coding skills in R or Python and uses Jupyter notebooks.
- Projects are set up for an individual researcher or a small group.
- Workbench allows the widest range of analysis options.
- A workbench can have up to 10 datasets, and each dataset can have up to two million documents.
- All analysis is performed within TDM Studio.
- Analysis results and scripts created in TDM Studio can be downloaded. Content (the full text of articles used in analysis) cannot be downloaded, because source publishers retain copyright ownership.
- Workbench includes content from ALL the ProQuest databases licensed by the library (newspapers, scholarly articles, dissertations and theses, government databases), encompassing hundreds of datasets. The three most popular news datasets are the New York Times (1923 - present), Washington Post (1877 - present), and Wall Street Journal (1923 - present).
Access to the Workbench Dashboard:
If you have experience with R or Python, TDM Studio Workbench might be the right option for you. To set up your account, complete the registration form at https://bit.ly/3olVWwT or contact Jim Gillispie, Social Science Librarian, at jeg@jhu.edu for more information.
For more descriptions, webinars, and videos about TDM Studio capabilities, see this ProQuest LibGuide: https://proquest.libguides.com/tdmstudio