Text and Data Mining Guide: Text Mining Tools

Step-by-step guide on how to get started with your text mining project along with examples of past text mining projects from UW researchers and students.

Tools for Web Scraping

  • Programming-based
    • Python - Scrapy, BeautifulSoup
    • Selenium
    • R - rvest, Rcrawler
  • Software
    • ParseHub
    • Dexi.io
    • Scraping-bot.io
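
As a rough illustration of what the programming-based tools above automate, the sketch below pulls link text and URLs out of an HTML string. It uses only Python's standard-library html.parser rather than Scrapy or BeautifulSoup, and the sample HTML, the LinkExtractor class, and the extract_links helper are made up for this example. A real project would also fetch pages over the network and should check a site's terms of use first.

```python
from html.parser import HTMLParser


class LinkExtractor(HTMLParser):
    """Collect (text, href) pairs for every <a href=...> tag in a document."""

    def __init__(self):
        super().__init__()
        self.links = []     # finished (text, href) pairs
        self._href = None   # href of the <a> tag currently open, if any
        self._text = []     # text fragments seen inside that tag

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        # Only keep text that appears inside an open link.
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append(("".join(self._text).strip(), self._href))
            self._href = None


def extract_links(html_text):
    """Hypothetical helper: parse an HTML string and return its links."""
    parser = LinkExtractor()
    parser.feed(html_text)
    parser.close()
    return parser.links


# Invented sample page, standing in for a downloaded document.
SAMPLE_HTML = """
<ul>
  <li><a href="/guides/tdm">Text and Data Mining</a></li>
  <li><a href="/guides/gis">GIS <b>Basics</b></a></li>
</ul>
"""

links = extract_links(SAMPLE_HTML)
for text, href in links:
    print(text, "->", href)
```

Libraries such as BeautifulSoup and rvest wrap this kind of parsing in friendlier selectors, while Selenium drives a real browser for pages that build their content with JavaScript.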

Tools for Text Cleaning

  • textclean - Collection of open-source tools for cleaning & normalizing text documents in R
  • OpenRefine - Open-source data cleaning tool (formerly Google Refine)
  • Trifacta Wrangler - Free tool for data preparation
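
To give a sense of the cleaning steps these tools perform, here is a minimal sketch in Python using only the standard library. It is not how textclean or OpenRefine work internally; the clean_text function and the particular steps chosen (tag stripping, entity decoding, Unicode normalization, lowercasing, punctuation removal) are assumptions about a typical pipeline.

```python
import html
import re
import unicodedata


def clean_text(raw):
    """Normalize a scraped string into plain lowercase text."""
    text = re.sub(r"<[^>]+>", " ", raw)          # drop leftover HTML tags
    text = html.unescape(text)                   # decode entities such as &amp;
    text = unicodedata.normalize("NFKC", text)   # unify Unicode variants
    text = text.lower()
    text = re.sub(r"[^\w\s']", " ", text)        # replace punctuation with spaces
    return re.sub(r"\s+", " ", text).strip()     # collapse runs of whitespace


print(clean_text("<p>Text&nbsp;Mining &amp; Data  Wrangling!</p>"))
```

Which steps are appropriate depends on the analysis: lowercasing and punctuation removal help with word counts, for example, but can hurt named entity recognition.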

Tools for Text Analytics

Tool                    Text Processing  Named Entity Recognizer  Document Classifier  Sentiment Analysis  Topic Modeling  Text Classification
Apache OpenNLP          Yes              Yes                      Yes                  No                  No              No
RapidMiner              Yes              No                       No                   Yes                 No              No
NLTK                    Yes              Yes                      No                   Yes                 No              Yes
spaCy                   Yes              Yes                      No                   Yes                 No              No
Gensim                  Yes              No                       No                   No                  Yes             Yes
Rosette Text Analytics  Yes              No                       Yes                  Yes                 No              Yes
WordStat                Yes              Yes                      Yes                  No                  Yes             No

  • Rosette Text Analytics - Suite of interoperable components for text analytics
  • WordStat - Advanced Content Analysis
  • Apache OpenNLP - Document Categorizer and more
  • Natural Language Toolkit (NLTK) - Suite of Python libraries for natural language processing

Topic Modeling

  • MALLET
  • Gensim

Sentiment Analysis

  • SentiWordNet
  • RapidMiner
  • WordFish
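
Lexicon-based sentiment analysis, the approach behind resources such as SentiWordNet, can be sketched in a few lines. The word scores below are invented for illustration and are not SentiWordNet values (SentiWordNet assigns scores to WordNet synsets rather than to bare words); the function names are likewise hypothetical.

```python
import re

# Toy lexicon with made-up scores, for illustration only.
TOY_LEXICON = {
    "good": 1.0, "great": 2.0, "helpful": 1.0,
    "bad": -1.0, "slow": -1.0, "terrible": -2.0,
}


def sentiment_score(text, lexicon=TOY_LEXICON):
    """Sum the lexicon scores of the words in text; unknown words count as 0."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return sum(lexicon.get(tok, 0.0) for tok in tokens)


def sentiment_label(text):
    """Map the summed score to a coarse positive/negative/neutral label."""
    score = sentiment_score(text)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


print(sentiment_label("The guide was great and helpful"))
```

A plain word-sum like this ignores negation ("not good") and context, which is one reason dedicated tools exist.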

Text Classification

  • NLTK
  • Scikit-Learn
  • WordNet
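
For readers new to text classification, the sketch below trains a tiny multinomial Naive Bayes classifier with add-one smoothing using only the Python standard library. The training sentences, labels, and function names are invented for this example; libraries such as NLTK and scikit-learn provide tested implementations that should be preferred for real work.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    """Very simple tokenizer: lowercase and split on whitespace."""
    return text.lower().split()


def train(examples):
    """Count words per label from a list of (text, label) pairs."""
    word_counts = defaultdict(Counter)   # label -> word -> count
    label_counts = Counter()             # label -> number of documents
    vocab = set()
    for text, label in examples:
        tokens = tokenize(text)
        word_counts[label].update(tokens)
        label_counts[label] += 1
        vocab.update(tokens)
    return word_counts, label_counts, vocab


def predict(text, model):
    """Return the label with the highest log-probability for text."""
    word_counts, label_counts, vocab = model
    total_docs = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label in sorted(label_counts):
        # Log prior: share of training documents carrying this label.
        score = math.log(label_counts[label] / total_docs)
        total_words = sum(word_counts[label].values())
        for tok in tokenize(text):
            if tok not in vocab:
                continue  # skip words never seen in training
            # Add-one (Laplace) smoothed word likelihood.
            score += math.log(
                (word_counts[label][tok] + 1) / (total_words + len(vocab))
            )
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Invented training data, far too small for real use.
TRAINING = [
    ("the match ended with a late goal", "sports"),
    ("the team won the league title", "sports"),
    ("the senate passed the budget bill", "politics"),
    ("voters elected a new senate", "politics"),
]

model = train(TRAINING)
print(predict("the team scored a goal", model))
```

In practice the text would first be cleaned and tokenized with the tools listed earlier, and the classifier would be evaluated on held-out documents.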