Twitter-Pandas: like git-pandas, but for twitter.

I’ve posted here before about git-pandas, a Python library that people seem to like. The idea is to provide a pandas-centric interface to the data in a git repository. We started with simple representations of common datasets (commits, file changes, branches, etc.), and as the library grew, we added specialized processing methods to ease common analyses (cumulative blame, bus factor, file owners, etc.).

This week, I’ve started a very similar project: twitter-pandas.

The goal of twitter-pandas is much the same as that of git-pandas: to provide a simple, intuitive way to get data out of Twitter and into a format that is easy to use for data science and analytics. Because Twitter is a public API rather than a database you hold locally (like a git repository), this library has an added obligation to be a responsible API user. That means staying within rate limits, and not forcing the user to reason about them in the first place. To do this, we combine two fantastic libraries: tweepy and pandas. The library is still in active development, I’m looking for help from anyone who would like to contribute, and we are targeting a v1.0.0 release in the next month or so.
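One way to keep a client within rate limits without the user having to think about it is a sliding-window throttle that every API call passes through. Below is a minimal, hypothetical sketch: the `RateLimiter` class and its parameters are illustrative only, not part of twitter-pandas or tweepy, and Twitter's real limits differ per endpoint. (tweepy's own `API` object also accepts a `wait_on_rate_limit=True` option that sleeps when Twitter reports a limit has been hit.)

```python
import time
from collections import deque


class RateLimiter:
    """Hypothetical sliding-window limiter: at most `max_calls` per `window` seconds.

    The limits passed in are assumptions; real Twitter limits vary by endpoint.
    """

    def __init__(self, max_calls, window, clock=time.monotonic, sleep=time.sleep):
        self.max_calls = max_calls
        self.window = window
        self.clock = clock   # injectable so the limiter can be tested with a fake clock
        self.sleep = sleep
        self.calls = deque()  # timestamps of calls still inside the window

    def _expire(self, now):
        # drop timestamps that have aged out of the window
        while self.calls and now - self.calls[0] >= self.window:
            self.calls.popleft()

    def wait(self):
        """Block until another call is allowed, then record it."""
        now = self.clock()
        self._expire(now)
        if len(self.calls) >= self.max_calls:
            # sleep just long enough for the oldest call to leave the window
            self.sleep(self.window - (now - self.calls[0]))
            now = self.clock()
            self._expire(now)
        self.calls.append(now)
```

Injecting `clock` and `sleep` keeps the sketch testable: with a fake clock, a limiter allowing two calls per window lets the first two calls through immediately and makes the third sleep until the window rolls over.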

The interface to twitter-pandas is very simple, and reminiscent of the interface to tweepy:

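from twitterpandas import TwitterPandas

# your Twitter app credentials (placeholders; fill in your own)
TWITTER_OAUTH_TOKEN = 'your-oauth-token'
TWITTER_OAUTH_SECRET = 'your-oauth-secret'
TWITTER_CONSUMER_KEY = 'your-consumer-key'
TWITTER_CONSUMER_SECRET = 'your-consumer-secret'

# create a twitter pandas client object
tp = TwitterPandas(
    TWITTER_OAUTH_TOKEN,
    TWITTER_OAUTH_SECRET,
    TWITTER_CONSUMER_KEY,
    TWITTER_CONSUMER_SECRET
)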

# create a dataframe with 10 of my own followers
df = tp.followers(limit=10)
print(df.head())

# create a dataframe with my own information
df = tp.me()
print(df)

# get a dataframe with the information of user willmcginnis
df = tp.get_user(screen_name='willmcginnis')
print(df)

# get back 10 users who match the query willmcginnis
df = tp.search_users(query='willmcginnis', limit=10)
print(df)
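Under the hood, a call like `tp.followers(limit=10)` has to turn a stream of API results into a DataFrame without fetching more than it needs. The helper below is a hypothetical sketch of that step, not twitter-pandas code: `records_to_dataframe` takes any iterable of dicts (for example, raw user JSON returned by the API) and consumes it lazily with `itertools.islice`, so a paginated source is only read as far as `limit`.

```python
from itertools import islice

import pandas as pd


def records_to_dataframe(records, limit=None, columns=None):
    """Hypothetical helper: build a DataFrame from at most `limit` dict records.

    `records` is consumed lazily, so a generator backed by a paginated API
    is only advanced as far as needed. `limit=None` takes everything.
    """
    rows = list(islice(records, limit))
    df = pd.DataFrame(rows)
    if columns is not None:
        # keep a stable schema even when some records lack a field
        df = df.reindex(columns=columns)
    return df
```

Passing an explicit `columns` list is one way to give every call the same schema, which matters when the API omits empty fields from some records.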

Tweepy groups its methods and API endpoints into several categories:

  • user methods
  • timeline methods
  • status methods
  • direct message methods
  • friendship methods
  • account methods
  • favorite methods
  • block methods
  • saved search methods
  • help methods
  • list methods
  • trend methods
  • geo methods

So far I’ve implemented TwitterPandas methods only for the user methods. Over the next few weeks, the plan is to work through the remaining groups and implement the methods that return datasets (TwitterPandas is intended to be read-only at this stage). The target for the version 1 release is a full implementation of all of these methods, including tests and documentation.

In version 2, the plan is to add higher-level analysis methods on top of these building blocks, with functionality like:

  • People I follow who don’t follow me back (and vice versa)
  • Top users of a hashtag (by different metrics)
  • Top followers of mine (by different metrics)
  • Follower growth charts
  • Any other useful features we think up along the way
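The first two of these ideas map naturally onto pandas operations. As a rough sketch, assuming hypothetical DataFrames with a `screen_name` column (and, for tweets, a numeric column such as `retweet_count`), a left merge with `indicator=True` finds accounts that don’t follow back, and a `groupby` ranks users by a chosen metric. The function names here are illustrative, not planned twitter-pandas APIs.

```python
import pandas as pd


def not_following_back(following, followers, key='screen_name'):
    """Hypothetical: accounts in `following` that are absent from `followers`.

    Swap the arguments for the reverse case (followers I don't follow).
    """
    merged = following.merge(
        followers[[key]].drop_duplicates(),
        on=key,
        how='left',
        indicator=True,  # adds a '_merge' column: 'both' or 'left_only'
    )
    result = merged[merged['_merge'] == 'left_only'].drop(columns='_merge')
    return result.reset_index(drop=True)


def top_users(tweets, by='retweet_count', n=5, user_col='screen_name'):
    """Hypothetical: rank users in a DataFrame of tweets (e.g. one hashtag's tweets).

    by='tweets' ranks by tweet count; any numeric column name ranks by its
    per-user sum. Returns a Series indexed by user, largest first.
    """
    grouped = tweets.groupby(user_col)
    scores = grouped.size() if by == 'tweets' else grouped[by].sum()
    return scores.nlargest(n)
```

Using `indicator=True` rather than two `isin` checks keeps the whole comparison in one merge, and the "different metrics" part of the hashtag idea becomes just a different `by` argument.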

The code is up on GitHub, so check it out, open issues with suggestions, or get in touch if you’d like to help implement some of the methods mentioned above.

https://github.com/wdm0006/twitter-pandas

The post Twitter-Pandas: like git-pandas, but for twitter. appeared first on Will’s Noise.