Text and Data Mining Guide: API

Step-by-step guide on how to get started with your text mining project along with examples of past text mining projects from UW researchers and students.

Introduction to Text Mining APIs

In text mining projects, APIs (Application Programming Interfaces) provide programmatic access to large repositories of text and to tools for processing and analyzing that text. An API is an interface between software applications: your code sends a request to an external service or database and receives structured data in return. Through APIs, a text mining project can draw on sources such as social media platforms, news outlets, and academic databases.

How to select an API?

  1. Identify all potential APIs
  2. Identify the amount of data you need to collect and identify the tier of access you’d need for each API in your list
  3. Identify the amount of money you’d need to invest (if you need a large amount of data) to use each API
  4. Pick the API that matches your budget and data needs the best
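
The steps above can be sketched in a few lines of Python. This is only an illustration: the API names, free quotas, and prices below are hypothetical placeholders, not figures for any API listed in this guide, and the simple "free quota plus price per 1,000 records" cost model is an assumption that real pricing tiers may not follow.

```python
from dataclasses import dataclass


@dataclass
class ApiOption:
    name: str
    free_quota: int      # records obtainable at no cost (hypothetical)
    cost_per_1k: float   # price per 1,000 records beyond the free quota (hypothetical)


def estimated_cost(option, records_needed):
    """Step 3: estimate what it would cost to collect the data from one API."""
    extra = max(0, records_needed - option.free_quota)
    return extra / 1000 * option.cost_per_1k


def pick_api(options, records_needed, budget):
    """Step 4: return the cheapest API that fits the budget, or None."""
    affordable = [
        (estimated_cost(o, records_needed), o.name)
        for o in options
        if estimated_cost(o, records_needed) <= budget
    ]
    return min(affordable)[1] if affordable else None


# Step 1: list the candidates (placeholder values only).
candidates = [
    ApiOption("api_a", free_quota=10_000, cost_per_1k=0.50),
    ApiOption("api_b", free_quota=100_000, cost_per_1k=2.00),
    ApiOption("api_c", free_quota=0, cost_per_1k=0.10),
]

# Step 2: state how much data the project needs, then choose.
choice = pick_api(candidates, records_needed=50_000, budget=10.0)
print(choice)
```

In practice you would also weigh factors the sketch ignores, such as rate limits, licensing terms, and whether full text or only metadata is available.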

List of APIs

Description: Provides access to metadata and article abstracts for the e-prints hosted on arXiv.org.

Free/Paid? Free; No key required

Limitations: None

Help Contact: arXiv Help

More Information: arXiv Homepage
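
As a rough illustration of working with the arXiv entry above, the sketch below builds a query URL and extracts titles and abstracts from an Atom response. It assumes the public query endpoint at export.arxiv.org and its search_query, start, and max_results parameters; check the arXiv documentation before relying on them. The sample feed is made up so that the example runs without a network connection.

```python
import urllib.parse
import xml.etree.ElementTree as ET

# Assumed endpoint and Atom namespace; verify against the arXiv API documentation.
ARXIV_ENDPOINT = "http://export.arxiv.org/api/query"
ATOM = {"atom": "http://www.w3.org/2005/Atom"}


def build_arxiv_url(search_query, start=0, max_results=10):
    """Return a query URL for the given search expression."""
    params = {"search_query": search_query, "start": start, "max_results": max_results}
    return ARXIV_ENDPOINT + "?" + urllib.parse.urlencode(params)


def parse_entries(atom_xml):
    """Extract title and abstract from each <entry> of an Atom feed."""
    root = ET.fromstring(atom_xml)
    results = []
    for entry in root.findall("atom:entry", ATOM):
        title = entry.findtext("atom:title", default="", namespaces=ATOM)
        summary = entry.findtext("atom:summary", default="", namespaces=ATOM)
        # Collapse the line breaks and padding that feeds often contain.
        results.append({
            "title": " ".join(title.split()),
            "abstract": " ".join(summary.split()),
        })
    return results


# Made-up feed used only to demonstrate the parser offline.
SAMPLE_FEED = """<feed xmlns="http://www.w3.org/2005/Atom">
  <entry>
    <title>A made-up
      title</title>
    <summary>  A made-up abstract.  </summary>
  </entry>
</feed>"""

print(build_arxiv_url("all:text mining", max_results=5))
print(parse_entries(SAMPLE_FEED))
```

To fetch live data, the URL could be passed to urllib.request.urlopen and the response body handed to parse_entries.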

Description: Provides access to the ADS database of bibliographic data on astronomy and physics publications.

Free/Paid? Free; Key required

Limitations: Rate limits apply

Help Contact: adshelp@cfa.harvard.edu

More Information: https://github.com/adsabs/adsabs-dev-api

Description: Provides access both to metadata and full-text content for the 260,000 open access journals published on BioMed Central.

Free/Paid? Free; Key required

Limitations: None

Help Contact: info@biomedcentral.com

More Information: BioMed API Info Page

Description: Access to information about historic newspapers and select digitized newspaper pages.

Free/Paid? Free; No key required

Limitations: None

Help Contact: ndi@loc.gov

More Information: Chronicling America API Page

Description: Provides access to metadata records with CrossRef DOIs, covering about 75 million scholarly works from around 5000 publishers.

Free/Paid? Free; No key required

Limitations: None

Help Contact: tdm@crossref.org

More Information: CrossRef Documentation Page
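
A minimal sketch of querying the CrossRef metadata described above. It assumes the REST endpoint at api.crossref.org/works with its query and rows parameters, an optional mailto parameter for identifying yourself, and a JSON response whose records sit under message.items; confirm these against the CrossRef documentation. The sample response is made up so the example runs offline.

```python
import json
import urllib.parse

# Assumed endpoint; verify against the CrossRef documentation.
CROSSREF_WORKS = "https://api.crossref.org/works"


def build_crossref_url(query, rows=20, mailto=None):
    """Return a works-search URL; mailto optionally identifies the requester."""
    params = {"query": query, "rows": rows}
    if mailto:
        params["mailto"] = mailto
    return CROSSREF_WORKS + "?" + urllib.parse.urlencode(params)


def extract_works(response_text):
    """Return (DOI, first title) pairs from a works response."""
    items = json.loads(response_text)["message"]["items"]
    return [(item.get("DOI"), (item.get("title") or [""])[0]) for item in items]


# Made-up response used only to demonstrate the parser offline.
SAMPLE_RESPONSE = json.dumps({
    "status": "ok",
    "message": {"items": [{"DOI": "10.0000/example", "title": ["A made-up title"]}]},
})

print(build_crossref_url("text mining", rows=2))
print(extract_works(SAMPLE_RESPONSE))
```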

Description: Provides metadata on items and collections indexed by the DPLA. Also includes partner data from Harvard, New York Public Library, ARTstor, and others.

Free/Paid? Free; Key required

Limitations: None

Help Contact: codex@dp.la ; Troubleshooting & FAQ

More Information: DPLA API Basics

Description: Provides bibliographic and rights information for items in the HathiTrust Digital Library. Please note that this API is not intended for bulk-retrieval of records.

Free/Paid? Free; no key required

Limitations: None; Permission must be sought for bulk retrieval

Help Contact: feedback@issues.hathitrust.org

More Information: HathiTrust Bibliographic API Page

Description: Provides access to HathiTrust and Google digitized texts of public domain works. Volumes digitized by Google will require agreement with Google.

Free/Paid? Free; Key required

Limitations: No specific limits; however, please see their policies on data use

Help Contact: feedback@issues.hathitrust.org

More Information: Requesting and Using Research Datasets

Description: Not a true API, but provides access to content on JSTOR for research and teaching.

Free/Paid? Free; Requires MyJSTOR account registration

Limitations: Max 25,000 documents per dataset; users can get larger datasets by special request

Help Contact: Data for Research help

Description: Multiple APIs available to download bibliographic data and search Library of Congress digital collections, including images, public radio and television, and historic newspapers

Free/Paid? Free; Most APIs do not require key

Limitations: Varies

Help Contact: ndi@loc.gov

More Information: Data for Exploration

Description: Provides access to metadata and more than 460,000 open access full-text documents from Springer Nature.

Free/Paid? Free; Varied access requirement

Limitations: No specific limits; however, downloads should be limited to “reasonable rates” (see the Springer Nature TDM Policy)

Help Contact: tdm@springernature.com

More Information: Springer API Portal

Description: NLM offers 29 separate APIs for accessing a wide variety of content from various NLM databases.

Free/Paid? Free; Varied access requirement

Limitations: Varies

Help Contact: Varies

More Information: https://eresources.nlm.nih.gov/nlm_eresources/

Description: Several public APIs to access many databases and tools including PubMed, PMC, Gene, Nuccore and Protein.

Free/Paid? Free; Most APIs do not require key

Limitations: Varies

Help Contact: NCBI Help Manual

More Information: NCBI API Site

Description: Provides access to a selection of top used OECD datasets.

Free/Paid? Free; no key required

Limitations: Max 1,000,000 results per query, max URL length of 1,000 characters

Help Contact: OECD.Stats help

More Information: OECD data for developers

Description: Downloadable datasets for citations drawn from two large academic graphs: Microsoft Academic Graph and AMiner. (Not an API)

Free/Paid? Free; no key required

Limitations: None

Help Contact: None

More Information: Open Academic Graph

Description: Queries and searches the ORCID researcher identifier system and obtains researcher profile data.

Free/Paid? Free; ORCID ID Account required

Limitations: Two options: 1) Users can access the free Public API, which only returns data marked as “public”; 2) Users can become an ORCID member to receive API credentials

Help Contact: ORCID API FAQ

More Information: API Tutorial: Read Data on a Record

Description: Oxford University Press grants research access to the Corpus for academic projects that can demonstrate a strong practical need for this data.

Free/Paid? Free; Key required. Academic researchers can request free access

Limitations: 3,000 requests per month and 60 calls per minute with the free option; other options available

Help Contact: API FAQ ; Oxford Dictionaries Contact ; API Forum

More Information: Oxford Dictionaries API

Description: Retrieves article-level metrics (including usage statistics, citation counts, and social networking activity) for articles published in PLOS journals and articles added to PLOS Hubs: Biodiversity.

Free/Paid? Free; Key required

Limitations: Results limited to batches of 50 at a time; contact api@plos.org for high-volume use requests

Help Contact: api@plos.org ; questions can also be posted in the PLoS API Google Group

More Information: PLOS Homepage
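
The 50-result batch limit above means that a long list of articles has to be split into several requests. The helper below is a generic sketch of that splitting step; the DOIs are placeholders, and how each batch is actually sent to the API is left out because it depends on the request format in the PLOS documentation.

```python
def batches(items, size=50):
    """Yield consecutive slices of at most `size` items."""
    for i in range(0, len(items), size):
        yield items[i:i + size]


# Placeholder identifiers, not real DOIs.
dois = [f"10.0000/placeholder.{n}" for n in range(120)]

# Each batch would be sent as one request; here we only inspect the sizes.
sizes = [len(batch) for batch in batches(dois)]
print(sizes)  # [50, 50, 20]
```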

Description: Allows PLOS content to be queried for integration into web, desktop, or mobile applications.

Free/Paid? Free; Key required

Limitations: Maximum of 7,200 requests per day, 300 per hour, and 10 per minute; users should wait 5 seconds for each query to return results; requests should not return more than 100 rows; high-volume users should contact api@plos.org; API users are limited to no more than five concurrent connections from a single IP address

Help Contact: api@plos.org ; questions can also be posted in the PLoS API Google Group

More Information: PLOS API FAQ
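
One way to stay within the per-minute, per-hour, and per-day request limits listed above is to keep a log of request times and check it before each call. The class below is a hypothetical sketch of that idea, not part of any PLOS client library; it covers only the three request-count limits, not the row limit or the concurrent-connection limit. The clock is injectable so the behaviour can be checked without actually waiting.

```python
import time
from collections import deque


class MultiWindowLimiter:
    """Track request times and report how long to wait before the next one."""

    def __init__(self, limits, clock=time.monotonic):
        self.limits = limits   # list of (max_requests, window_seconds)
        self.clock = clock
        self.sent = deque()    # timestamps of past requests, oldest first

    def wait_time(self):
        """Seconds to wait so that no window would exceed its limit."""
        now = self.clock()
        longest = max(window for _, window in self.limits)
        # Forget requests older than the longest window.
        while self.sent and now - self.sent[0] >= longest:
            self.sent.popleft()
        delay = 0.0
        for max_requests, window in self.limits:
            recent = [t for t in self.sent if now - t < window]
            if len(recent) >= max_requests:
                # Wait until the request that fills the window has aged out.
                blocking = recent[-max_requests]
                delay = max(delay, blocking + window - now)
        return delay

    def record(self):
        """Call once after each request is sent."""
        self.sent.append(self.clock())


# Limits taken from the entry above: 10/minute, 300/hour, 7,200/day.
PLOS_LIMITS = [(10, 60), (300, 3600), (7200, 86400)]
```

A typical loop would call time.sleep(limiter.wait_time()), send the request, and then call limiter.record().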

Description: Provides access to World Bank statistical databases, indicators, projects, and loans, credits, financial statements, and other data related to financial operations.

Free/Paid? Free; no key required

Limitations: Request volume limits are unspecified, but should be “reasonable”

Help Contact: data@worldbankgroup.org

More Information: Developer Information

Description: Provides metadata and DOIs for IEEE Xplore articles.

Free/Paid? Cost negotiated per request

Limitations: Key required; Must subscribe to or be a member of an institution that subscribes to IEEE Xplore (UW subscribes)

Help Contact: onlinesupport@ieee.org

More Information: Getting Started with IEEE Xplore

Description: Bibliographic search service for displaying STAT!Ref results on a website.

Free/Paid? Free (with subscription)

Limitations: Free to register for users at a subscribing institution

Help Contact: onlinesupport@ieee.org

More Information: support@statref.com

Description: The Twitter API provides the tools you need to contribute to, engage with, and analyze the conversation happening on Twitter.

Free/Paid? Academic Research product track provides free access to the full history of public conversation

Limitations: Limits may apply when using other tiers of access. The Academic Research product track allows data collection with a higher monthly Tweet volume cap of 10 million

Help Contact: Twitter Help Topics

More Information: Twitter API v2

Description: Allows text- and data-mining access to content in the Wiley Online Library.

Free/Paid? Free (with subscription)

Limitations: Must be part of a subscribing institution to have full text access. Users will encounter a click-through agreement and will receive a Client API Token, which is needed when requesting full text of articles

Help Contact: TDM@wiley.com ; Wiley TDM help page

More Information: Wiley Text and Data Mining Agreement