Text and Data Mining Guide: Text Analytics & Visualization

Step-by-step guide on how to get started with your text mining project along with examples of past text mining projects from UW researchers and students.

What is Text Analytics?

A colossal amount of unstructured text data is generated every year, thanks to the rise of social media platforms and easy access to the internet. Text analytics is the practice of structuring this data and drawing insights from it to drive business, projects, and research. It can be as simple as identifying trends in a social media poll to gauge customer satisfaction, or as involved as analyzing the sentiment of tweets at large scale to decide which company to invest in.

Many tools are now available that extract and provide insights from text data. Alternatively, you can do it yourself using the state-of-the-art packages and libraries that have emerged from advances in Natural Language Processing.

Text Visualization

Data is best consumed when visualized. Visualization is the art of turning bulky tables into elegant, insightful graphics that capture the essence of an analysis and effectively communicate its impact. Beyond the standard frequency and distribution charts, there are several forms of text visualization you can use:

Word Cloud Using Keywords
Word Tree
Word Counts
Document Term Matrix
Frequency of Word Within Topic
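
Two of the forms listed above, word counts and the document term matrix, are simple enough to compute by hand before reaching for a dedicated tool. The following is a minimal sketch using only the Python standard library. The function names (tokenize, word_counts, document_term_matrix) and the sample documents are assumptions made up for illustration, and the tokenizer is deliberately naive: it lowercases the text and keeps only runs of letters and apostrophes.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def word_counts(text):
    """Return a Counter mapping each word to how often it occurs."""
    return Counter(tokenize(text))


def document_term_matrix(documents):
    """Build a document term matrix as (vocabulary, rows).

    rows[i][j] is the number of times vocabulary[j] occurs in documents[i].
    """
    counts = [word_counts(doc) for doc in documents]
    # The vocabulary is every distinct word across all documents, sorted.
    vocabulary = sorted(set().union(*counts))
    rows = [[c[term] for term in vocabulary] for c in counts]
    return vocabulary, rows


# Hypothetical sample documents, invented for illustration only.
docs = [
    "The library opens early and the library closes late",
    "Text mining turns text into data",
]

vocab, matrix = document_term_matrix(docs)

# Print the three most frequent words in the first document.
print(word_counts(docs[0]).most_common(3))

# Print the matrix with one row per document and one column per term.
print(vocab)
for row in matrix:
    print(row)
```

The counts produced this way are the raw material for a word cloud or a frequency chart. For real projects, a library tokenizer that handles punctuation, stop words, and stemming will usually give cleaner results than this sketch.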

Natural Language Processing

Natural Language Processing has recently become one of the most talked-about subfields of Machine Learning. It allows text data to be used for various tasks, such as:

Language detection & translation
Question Answering, e.g. chatbots
Text Summarization
Text Classification such as Spam Detection
Named Entity Recognition

These tasks can be accomplished using supervised, unsupervised, and deep learning algorithms.
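
As an illustration of the supervised case, here is a minimal sketch of spam detection with a multinomial Naive Bayes classifier, written with only the Python standard library. Naive Bayes is chosen here purely because it is short enough to read in full; it is one of many possible algorithms, not a recommendation from this guide. The function names (train_naive_bayes, classify) and the four training messages are hypothetical, and a real project would need far more labeled data.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def train_naive_bayes(labeled_texts):
    """Count words per label from a list of (text, label) pairs."""
    word_counts = {}        # label -> Counter of words seen with that label
    doc_counts = Counter()  # label -> number of training documents
    vocabulary = set()
    for text, label in labeled_texts:
        tokens = tokenize(text)
        word_counts.setdefault(label, Counter()).update(tokens)
        doc_counts[label] += 1
        vocabulary.update(tokens)
    return {
        "word_counts": word_counts,
        "doc_counts": doc_counts,
        "vocabulary": vocabulary,
    }


def classify(model, text):
    """Return the label with the highest log-probability for the text."""
    total_docs = sum(model["doc_counts"].values())
    vocab_size = len(model["vocabulary"])
    # Words never seen in training carry no evidence, so they are dropped.
    tokens = [t for t in tokenize(text) if t in model["vocabulary"]]
    best_label, best_score = None, -math.inf
    for label, counts in model["word_counts"].items():
        # Start from the prior: the share of training documents with this label.
        score = math.log(model["doc_counts"][label] / total_docs)
        total_words = sum(counts.values())
        for token in tokens:
            # Add-one (Laplace) smoothing avoids taking the log of zero.
            score += math.log((counts[token] + 1) / (total_words + vocab_size))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical training messages, invented for illustration only.
training_data = [
    ("Win a free prize now", "spam"),
    ("Free money claim your prize", "spam"),
    ("Meeting notes attached for review", "ham"),
    ("Lunch at noon tomorrow", "ham"),
]

model = train_naive_bayes(training_data)
print(classify(model, "Claim your free prize"))
print(classify(model, "Notes for the meeting tomorrow"))
```

The same train-then-classify pattern applies to other text classification tasks. Only the labels and the training examples change.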