Text and Data Mining Guide: Text Mining Tools

A step-by-step guide to getting started with a text mining project, with examples of past text mining projects by UW researchers and students.

Tools for Web Scraping

  • Programming-based
    • Python - Scrapy, BeautifulSoup
    • Selenium
    • R - rvest, Rcrawler
  • Software
    • ParseHub
    • Dexi.io
    • Scraping-bot.io
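
As a rough illustration of the programming-based route, the sketch below uses BeautifulSoup to pull link titles and URLs out of an HTML string. The sample markup, the `headline` class name, and the `extract_links` helper are made up for this example. In a real project the HTML would come from a downloaded page, and the selectors would depend on how that site structures its markup.

```python
from bs4 import BeautifulSoup

# Hypothetical HTML standing in for a downloaded page.
SAMPLE_HTML = """
<html><body>
  <h2 class="headline"><a href="/news/1">Library extends hours</a></h2>
  <h2 class="headline"><a href="/news/2">New text mining workshop</a></h2>
  <p>Unrelated paragraph.</p>
</body></html>
"""


def extract_links(html):
    """Return (title, href) pairs for every link inside an h2.headline element."""
    soup = BeautifulSoup(html, "html.parser")
    results = []
    for heading in soup.find_all("h2", class_="headline"):
        link = heading.find("a")
        if link is not None:
            results.append((link.get_text(strip=True), link.get("href")))
    return results


if __name__ == "__main__":
    for title, href in extract_links(SAMPLE_HTML):
        print(title, "->", href)
```

Scrapy and Selenium suit larger crawls and JavaScript-heavy pages respectively; for a handful of static pages, a parser like this is often enough.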

Tools for Text Cleaning

  • textclean - R package with a collection of open-source tools for cleaning and normalizing text documents
  • OpenRefine - Open-source data cleaning tool (formerly Google Refine)
  • Trifacta Wrangler - Free tool for data preparation
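
To give a sense of what "cleaning and normalizing" typically involves, here is a minimal sketch using only the Python standard library. It is not taken from any of the tools listed above. The `clean_text` helper and the specific steps it applies (Unicode normalization, tag stripping, lowercasing, punctuation removal, whitespace collapsing) are assumptions chosen for illustration, and a real project would pick steps to fit its own corpus.

```python
import re
import unicodedata


def clean_text(raw):
    """Normalize Unicode, strip HTML-like tags, lowercase, and collapse whitespace."""
    text = unicodedata.normalize("NFKC", raw)   # unify compatibility characters
    text = re.sub(r"<[^>]+>", " ", text)        # drop anything that looks like a tag
    text = text.lower()
    text = re.sub(r"[^\w\s']", " ", text)       # replace punctuation (except apostrophes) with spaces
    return re.sub(r"\s+", " ", text).strip()    # collapse runs of whitespace


if __name__ == "__main__":
    print(clean_text("<p>Hello,   WORLD!</p>\n It's  2024."))
```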

Tools for Text Analytics

  • Apache OpenNLP - A machine-learning-based toolkit for processing natural language text.
  • RapidMiner - A data science platform that helps organizations uncover insights from data using analytics and AI techniques.
  • NLTK - The Natural Language Toolkit is a suite of libraries and programs for symbolic and statistical natural language processing for English.
  • spaCy - A free, open-source library for natural language processing in Python.
  • Gensim - An open-source library for unsupervised topic modeling, document indexing, retrieval by similarity, and other natural language processing functionality, using modern statistical machine learning.
  • WordStat - A content analysis and text mining software package.

Tool           | Text Processing | Named Entity Recognizer | Document Classifier | Sentiment Analysis | Topic Modeling | Text Classification
---------------|-----------------|-------------------------|---------------------|--------------------|----------------|--------------------
Apache OpenNLP | Yes             | Yes                     | Yes                 | No                 | No             | No
RapidMiner     | Yes             | No                      | No                  | Yes                | No             | No
NLTK           | Yes             | Yes                     | No                  | Yes                | No             | Yes
spaCy          | Yes             | Yes                     | No                  | Yes                | No             | No
Gensim         | Yes             | No                      | No                  | No                 | Yes            | Yes
WordStat       | Yes             | Yes                     | Yes                 | No                 | Yes            | No

Topic Modeling

  • MALLET
  • Gensim

Sentiment Analysis

  • SentiWordNet
  • RapidMiner
  • WordFish

Text Classification

  • NLTK
  • scikit-learn
  • WordNet
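
For a sense of what text classification looks like in practice, the sketch below trains a small scikit-learn pipeline (TF-IDF features feeding a multinomial Naive Bayes classifier). The training sentences, the labels, and the `train_classifier` helper are invented for illustration. This is one common baseline, not a recommended configuration, and a real project would need far more labeled data and a held-out test set.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Toy training data, invented for illustration only.
TRAIN_TEXTS = [
    "the team won the match",
    "a great goal in the final game",
    "the player scored in the match",
    "the senate passed the bill",
    "voters elected a new president",
    "the president signed the bill",
]
TRAIN_LABELS = ["sports", "sports", "sports", "politics", "politics", "politics"]


def train_classifier(texts, labels):
    """Fit a TF-IDF + multinomial Naive Bayes pipeline on the given texts."""
    model = make_pipeline(TfidfVectorizer(), MultinomialNB())
    model.fit(texts, labels)
    return model


if __name__ == "__main__":
    clf = train_classifier(TRAIN_TEXTS, TRAIN_LABELS)
    print(clf.predict(["the player scored a goal"]))
```

Wrapping the vectorizer and classifier in one pipeline means new text goes through the same feature extraction as the training data, which avoids a common source of mismatched features.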