5.4 Visualizing Textual Information
Text is an integral part of Twitter. Here, we describe two approaches to visualizing text.
5.4.1 Word Clouds
Word clouds highlight the important words in a text. Typically, the frequency of
a word is used as a measure of its importance, and that importance is conveyed by
the word's font size. This makes word clouds an effective summarization technique.
The language used on Twitter is multilingual and mostly informal. Punctuation and
grammatical correctness are often sacrificed to save characters, and abbreviations
are frequently employed. To generate a word cloud, we first strip punctuation and
other special characters, remove stop words, and break the text into tokens. The
frequency of each token in the text is then counted by the method GetTopKeywords,
which is summarized in Listing 5.10.
Listing 5.10 Extracting word frequencies from Tweets
public JSONArray GetTopKeywords(String inFilename, int K,
        boolean ignoreHashtags, boolean ignoreUsernames, TextUtils tu) {
    // Read each JSONObject in the file and process the Tweet
    ...
    /**
     * Step 1: Tokenize Tweets into individual words and count their
     * frequency in the corpus. Remove stop words and special characters.
     * Ignore user names and hashtags if the user chooses to.
     */
    HashMap<String,Integer> tokens = tu.TokenizeText(text, ignoreHashtags, ignoreUsernames);
    Set<String> keys = tokens.keySet();
    for (String key : keys) {
        if (words.containsKey(key)) {
            words.put(key, words.get(key) + tokens.get(key));
        }
        else {
            words.put(key, tokens.get(key));
        }
    }
    ...
    // Step 2: Sort the words in descending order of frequency
    Set<String> keys = words.keySet();
    ArrayList<Tags> tags = new ArrayList<Tags>();
    for (String key : keys) {
        Tags tag = new Tags();
        tag.setKey(key);
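
Because the listing elides the file reading and shows only the start of the sorting step, the tokenize, count, and sort pipeline can be illustrated with a small self-contained sketch. It is not the book's implementation: the class WordFrequencySketch, its methods, the tiny stop-word list, and the linear font-size scaling in fontSize are all assumptions made for illustration, and it works on in-memory strings rather than the JSON file that GetTopKeywords reads.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

public class WordFrequencySketch {

    // Tiny illustrative stop-word list (an assumption, not the book's list).
    private static final Set<String> STOP_WORDS = new HashSet<>(
            Arrays.asList("a", "an", "and", "are", "in", "is", "of", "the", "to"));

    /** Splits one Tweet into lower-case tokens and counts each token. */
    static Map<String, Integer> tokenize(String text, boolean ignoreHashtags,
                                         boolean ignoreUsernames) {
        Map<String, Integer> counts = new HashMap<>();
        for (String raw : text.split("\\s+")) {
            if (raw.isEmpty()) continue;
            if (ignoreHashtags && raw.startsWith("#")) continue;
            if (ignoreUsernames && raw.startsWith("@")) continue;
            // Keep only letters and digits; this drops punctuation, '#' and '@'.
            String token = raw.toLowerCase(Locale.ROOT)
                    .replaceAll("[^\\p{L}\\p{Nd}]", "");
            if (token.isEmpty() || STOP_WORDS.contains(token)) continue;
            counts.merge(token, 1, Integer::sum);
        }
        return counts;
    }

    /** Merges per-Tweet counts and returns the k most frequent tokens. */
    static List<Map.Entry<String, Integer>> topKeywords(List<String> tweets, int k,
            boolean ignoreHashtags, boolean ignoreUsernames) {
        Map<String, Integer> words = new HashMap<>();
        for (String tweet : tweets) {
            tokenize(tweet, ignoreHashtags, ignoreUsernames)
                    .forEach((token, n) -> words.merge(token, n, Integer::sum));
        }
        List<Map.Entry<String, Integer>> sorted = new ArrayList<>(words.entrySet());
        // Descending by count; ties are broken alphabetically for stable output.
        sorted.sort((a, b) -> {
            int byCount = Integer.compare(b.getValue(), a.getValue());
            return byCount != 0 ? byCount : a.getKey().compareTo(b.getKey());
        });
        return new ArrayList<>(sorted.subList(0, Math.min(k, sorted.size())));
    }

    /** Linear scaling from count to font size (one possible choice, assumed here). */
    static int fontSize(int count, int maxCount, int minPx, int maxPx) {
        return minPx + (int) Math.round((maxPx - minPx) * (count / (double) maxCount));
    }

    public static void main(String[] args) {
        List<String> tweets = Arrays.asList(
                "Word clouds are great! #dataviz @alice",
                "Clouds, clouds everywhere... word up",
                "The word of the day is: clouds");
        List<Map.Entry<String, Integer>> top = topKeywords(tweets, 3, true, true);
        int maxCount = top.get(0).getValue();
        for (Map.Entry<String, Integer> e : top) {
            System.out.println(e.getKey() + " count=" + e.getValue()
                    + " fontSize=" + fontSize(e.getValue(), maxCount, 10, 40));
        }
    }
}
```

Map.merge replaces the containsKey/put branching used in the listing with a single call. A real corpus would also need a fuller stop-word list and more careful handling of URLs and abbreviations than this sketch attempts.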
 