Twitter_Topic_Tangle

Here is another app I put together to practice with a few components / packages that I think could be useful in the future. It incorporates light-touch machine learning and interactive visualization.

Search for a topic. See the associated tweets. Click on topics to see more related tweets, or click on a user to see more of his/her tweets. Notice how everything gets interrelated.

https://twitter_topic_tangle.anvilapp.net/

Some notable packages used are listed below. Kindly let me know what you think, whether you find any bugs, or if you have any questions about how the packages below are used.

Packages:
**NLTK (python) - This Natural Language package is used to do named entity recognition: the extraction of specific persons, places, and organizations from text.
**Tweepy (python) - This package is used to interface with Twitter.
**Anvil.Server (python) - The NLTK and Tweepy code is executed via a server call to a virtual machine in the cloud running Python on Ubuntu Linux. That said, nothing here is so computationally intensive that you couldn’t just run it in Anvil’s built-in server module; I wanted the practice of setting up and connecting to a cloud server.
**vis.js (JavaScript) - This library was used for the network visualization. It was quite easy to use: I don’t know much JavaScript, but looking at a few examples was enough to hit the ground running.

Cheers.


Looks fabulous!

Haven’t got a clue what’s going on, but it’s keeping me amused :slight_smile:

I’m literally just zooming right out and dragging the pink ones about to make a squid like effect.

If nothing else, it’s a great stress reliever (or time waster).

Nice, do you have some links to tutorials you found useful in putting this together with regards to NLTK and Tweepy? How do you derive the topic names? Are they taken directly from the hashtag label, or are they derived from the context of the tweets? I suppose you need a paid Anvil account in order to use all these libraries in the app, right?

All the fancy stuff (Tweepy, NLTK) is on the backend. I think Anvil restricts the server module on the free tier to a fairly small set of Python 2.7 packages. I do think you will ultimately want to go with the paid version for custom domains etc. at deployment time.

That being said, you can still get started with the free version. You can run any version of python on any computer attached to the internet, and use it as your backend via the anvil.server package. It’s absolutely beautiful and one of Anvil’s biggest selling points to me. You can also buy a virtual machine so it is always online and doesn’t fail when your local internet connection craps out. Linode has plans for as little as $5/mo.

A few details are omitted, but here is the pith of the code:

Import packages:

import tweepy
import nltk
import anvil.server

Define functions:

# DEPENDENCY - This extracts named entities from an NLTK chunked expression
def extract_entity_names(t): 
	entity_names = []
	if hasattr(t, 'label') and t.label:
		if t.label() == 'NE':
			entity_names.append(' '.join([child[0] for child in t]))
		else:
			for child in t:
				entity_names.extend(extract_entity_names(child))
	return entity_names
# this function extracts a list of named entities from a large string
def named_entities_fromTextBlob(textBlob): 
	all_entities = set([])
	sentences = nltk.sent_tokenize(textBlob)
	tokenized_sentences = [nltk.word_tokenize(sentence) for sentence in sentences]
	tagged_sentences = [nltk.pos_tag(sentence) for sentence in tokenized_sentences]
	chunked_sentences = nltk.ne_chunk_sents(tagged_sentences, binary=True)
	for tree in chunked_sentences:
		for ent in extract_entity_names(tree):
			if (len(ent.split()) < 5) and (len(ent.strip())>2):# very long entities are suspect. so are very short entities
				all_entities.add(ent)
	return list(all_entities) 

# this function cleans up text: it drops words that are a hyperlink, an @tag, or the 'RT' marker
def strip_string(string):
	string = string.replace('#',' ').replace(':',' ')
	# filter whole words; string.replace(word,' ') would also remove the same letters inside other words
	words = [word for word in string.split()
		if not (word.startswith('@') or ('http' in word) or ('//' in word) or (word == 'RT'))]
	return ' '.join(words)
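
To see what the recursive walk in extract_entity_names does without installing NLTK or its models, here is a minimal sketch. FakeTree is a made-up stand-in that only mimics the two things the walk relies on (iterating over children and a label() method); it is not the real nltk.Tree, and the sample sentence is invented.

```python
class FakeTree(list):
    """Hypothetical stand-in for nltk.Tree: a list of children plus a label."""

    def __init__(self, label, children):
        super().__init__(children)
        self._label = label

    def label(self):
        return self._label


def extract_entity_names(t):
    """Collect the text of every subtree labelled 'NE', depth first."""
    entity_names = []
    if hasattr(t, 'label'):  # plain (word, tag) tuples have no label and are skipped
        if t.label() == 'NE':
            # join the words of the (word, tag) leaves under this chunk
            entity_names.append(' '.join(child[0] for child in t))
        else:
            for child in t:
                entity_names.extend(extract_entity_names(child))
    return entity_names


# Invented example shaped like the output of ne_chunk with binary=True
sentence = FakeTree('S', [
    FakeTree('NE', [('Barack', 'NNP'), ('Obama', 'NNP')]),
    ('visited', 'VBD'),
    FakeTree('NE', [('Chicago', 'NNP')]),
])
entities = extract_entity_names(sentence)
print(entities)
```

With binary=True the chunker only marks a chunk as a named entity or not, which is why the walk checks for the single label 'NE' rather than for PERSON, GPE and so on.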

Connect to Twitter:

ACCESS_TOKEN = 'hidden'
ACCESS_SECRET = 'hidden'
CONSUMER_KEY = 'hidden'
CONSUMER_SECRET = 'hidden'
auth = tweepy.OAuthHandler(CONSUMER_KEY,CONSUMER_SECRET)
auth.set_access_token(ACCESS_TOKEN,ACCESS_SECRET)
api = tweepy.API(auth)

Define the Anvil Server-side function:

@anvil.server.callable 
def AVS_tweet_entities_inst1(search_string):
	tweet_list = []
	tweets = api.search(search_string,lang='en') # this connects to twitter and searches for results
	for tweet in tweets:
		try:
			tweet_id = tweet.id_str # id for the tweet
			txt = strip_string(tweet.text) # the text tweeted
			named_entities = named_entities_fromTextBlob(txt) # extract named entities
			user_name = tweet.user.screen_name # twitter user screen name
			user_descrip = tweet.user.description # the user's description
			location = tweet.user.location # this will be a name of a place
			# append the result JSON object to the tweet_list
			tweet_JSON = {'id':tweet_id,'txt':txt,'named_entities':named_entities,'location':location,'user_name':user_name,'user_descrip':user_descrip}
			tweet_list.append(tweet_JSON)
		except Exception: # failures should be seldom; skip the tweet for stability
			pass
	# export the results as a JSON dictionary
	server_JSON = {'tweet_list':tweet_list}
	return server_JSON

… and wait for requests…:

anvil.server.connect("uplink key")
print('ALL DONE! You\'re good to go! Waiting for server requests...')
anvil.server.wait_forever()
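
The code above stops at the server result; how that result becomes the network is not shown. One way it could be done is sketched below, assuming the vis.js convention of nodes as dicts with id, label and group and edges as dicts with from and to. The function name build_network, the node id prefixes and the sample data are all made up for illustration.

```python
def build_network(server_JSON):
    """Turn the server result into vis.js-style node and edge lists."""
    nodes = {}  # node id -> node dict, so repeated users and topics are added once
    edges = []

    def add_node(node_id, label, group):
        nodes.setdefault(node_id, {'id': node_id, 'label': label, 'group': group})

    for tweet in server_JSON['tweet_list']:
        tweet_node = 'tweet:' + tweet['id']
        user_node = 'user:' + tweet['user_name']
        add_node(tweet_node, tweet['txt'][:40], 'tweet')
        add_node(user_node, '@' + tweet['user_name'], 'user')
        edges.append({'from': user_node, 'to': tweet_node})
        for entity in tweet['named_entities']:
            # lower-case the id so 'NASA' and 'Nasa' share one topic node
            topic_node = 'topic:' + entity.lower()
            add_node(topic_node, entity, 'topic')
            edges.append({'from': tweet_node, 'to': topic_node})
    return list(nodes.values()), edges


# Invented sample in the shape returned by AVS_tweet_entities_inst1
sample = {'tweet_list': [
    {'id': '1', 'txt': 'NASA launches from Florida', 'named_entities': ['NASA', 'Florida'],
     'location': 'Houston', 'user_name': 'alice', 'user_descrip': ''},
    {'id': '2', 'txt': 'Florida weather delays launch', 'named_entities': ['Florida'],
     'location': '', 'user_name': 'bob', 'user_descrip': ''},
]}
nodes, edges = build_network(sample)
print(len(nodes), 'nodes,', len(edges), 'edges')
```

Because both sample tweets mention Florida, they end up linked through a single topic node, which is the kind of interrelation the app visualizes.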

Hey @navigate do you happen to have a working link to this project? I’m trying to mess around with the Twitter API but not having much luck

Unfortunately no, but here are some of the key pieces of code from my archives. Let me know if you get stuck.

import tweepy, os, json
# you have to get these four keys from twitter
ckey, csecret, atoken, asecret = "consumer key", "consumer secret", "access token", "access secret"
auth = tweepy.auth.OAuthHandler(ckey, csecret)
auth.set_access_token(atoken, asecret)
# rate-limit handling is configured once here, on the API object
api = tweepy.API(auth, wait_on_rate_limit=True, wait_on_rate_limit_notify=True)

q = "search phrase"
tweet_list = api.search(q=q, count=100, lang="en",
            include_entities=True, tweet_mode='extended')

screen_name_or_id = "my_favorite_person_i_follow"
count = 100 # tweets to request (example value)
tweets_from_one_user = api.user_timeline(screen_name=screen_name_or_id, tweet_mode='extended', count=count)
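
One thing to watch with tweet_mode='extended': the tweet body arrives as full_text rather than text, and for retweets the untruncated body sits on retweeted_status. A minimal sketch of a helper that copes with both follows; tweet_text is a hypothetical name, and SimpleNamespace objects stand in for Tweepy's Status objects so the example runs without credentials.

```python
from types import SimpleNamespace


def tweet_text(tweet):
    """Return the fullest text available on a tweet-like object."""
    original = getattr(tweet, 'retweeted_status', None)
    if original is not None:
        # a retweet's own text may be cut off; the original tweet holds the full body
        return tweet_text(original)
    full = getattr(tweet, 'full_text', None)
    return full if full is not None else tweet.text


# Invented stand-ins for the three cases
plain = SimpleNamespace(text='short tweet')
extended = SimpleNamespace(full_text='a longer tweet fetched in extended mode')
retweet = SimpleNamespace(
    full_text='RT @someone: a longer tweet that gets cut',
    retweeted_status=SimpleNamespace(full_text='a longer tweet that gets cut off no more'),
)

for t in (plain, extended, retweet):
    print(tweet_text(t))
```

Note that for retweets this returns the original author's text, so the leading "RT @someone:" marker is dropped.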
       

Hey @navigate thanks so much! Okay, I’ve been banging my head against this for a minute here, but I keep coming up against this error:

SyntaxError: invalid syntax (streaming.py, line 355)
at /usr/local/lib/python3.7/site-packages/tweepy/streaming.py, line 355

Any idea where I might be going wrong here? I’m on a paid plan and using Full Python 3 in the server. Here is my Server code for reference:

import anvil.server
import anvil.http
import tweepy

ckey = "xxxx"
csecret = "xxxx"
atoken = "xxxx"
asecret = "xxxx"

auth = tweepy.auth.OAuthHandler(ckey, csecret)
auth.set_access_token(atoken, asecret)

api = tweepy.API(auth)


@anvil.server.callable
def test():
  public_tweets = api.home_timeline()
  for tweet in public_tweets:
    print(tweet.text)
  return

I think this may have something to do with this known issue, but I’m not sure how to fix this.
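
A likely cause, stated here as an assumption since the traceback alone does not prove it: older Tweepy releases used async as an argument name in streaming.py, and async became a reserved word in Python 3.7, so the module fails to compile at import. If that is the case, upgrading Tweepy to a release that renamed the argument should clear the error. The sketch below only demonstrates the language change; the two function signatures are illustrative, not copied from Tweepy.

```python
import keyword
import sys


def compiles(source):
    """Return True if the source text compiles on the running interpreter."""
    try:
        compile(source, '<demo>', 'exec')
        return True
    except SyntaxError:
        return False


# Illustrative signatures: one uses 'async' as an argument name, one avoids it
old_style = "def _start(self, async):\n    pass\n"
new_style = "def _start(self, is_async):\n    pass\n"

print('Python', sys.version_info[:2])
print("'async' is a keyword:", keyword.iskeyword('async'))
print('old style compiles:', compiles(old_style))
print('new style compiles:', compiles(new_style))
```

On Python 3.6 the old style still compiles, which would fit the same code working on 3.6.7 and failing in a Python 3.7 server module.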

@chesney.l I suspect you are right. I was running it on python 3.6.7 on Linux (ubuntu 18.04 to be precise). Which OS are you running it on? Or is this in a server module?

Just on a server module

Is the app still available?
I click the link and get:

We could not find an app that matched your request.

You may have copied the URL for this app incorrectly, or this app may have been deleted or withdrawn. You may also have been logged out, in which case refreshing this page might help.

@aldo.ercolani No, @navigate said that it is not, but he provided some helpful code snippets above. I’m just trying to work through an issue with Tweepy now.
