This repository was archived by the owner on Mar 6, 2019. It is now read-only.

Arbitrary Sets

JP de Vooght edited this page Jun 24, 2016 · 3 revisions

Introduction

Sometimes you want to look at an arbitrary set of tweeps. This set can come from a list of people or organizations you picked up from a book. In my case, I had to relate a list of companies for a MOOC assignment. Twecoll can help you get a quick picture of how the nodes in this arbitrary set relate.

Prepare the file

Twecoll can fetch tweets from a handle or for a search query, and it stores them in a .twt file. We'll use this format to manually create a list of the tweeps I am interested in. Using my editor, I create a file with the .twt extension containing the following text.

@nike
@pfizer
@hugoboss
@ikea
@swatch
@pampers
@redbull
@samsungmobile
@total
@kodak
@caterpillarinc
@nutellausa
@loreal
@lacoste
@versace
@nestle
@proctergamble
@dior
@burgerking
@disney
@google
@toyota
@lenovo
@cocacola
@nikon
@carlsberg
@tatacompanies
@thenorthface
@michelintyres
@wengerbrand
@arcelormittal
@levis
@ibm
@philipspr
@emirates
@bmw
@monsantoco
@ebay
@abercrombie

In my case, I put one handle per line, but a .twt file can contain handles in any structure really. This is handy when you insert handles in your notes and save the file as brabandere.twt for processing with twecoll.
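Because handles can sit anywhere in the file, it can be useful to check which handles a notes file actually contains before handing it to twecoll. The following is a minimal Python sketch, not twecoll's own parser: the regular expression, the function name extract_handles and the sample notes are all assumptions made for illustration. It pulls @handles out of free-form text, lowercases them, and drops duplicates while keeping the order of first appearance.

```python
import re

# Assumption: a handle is "@" followed by letters, digits or underscores,
# and is not preceded by a word character (so e-mail addresses are skipped).
# This pattern is illustrative only; twecoll's own parsing may differ.
HANDLE_RE = re.compile(r"(?<!\w)@([A-Za-z0-9_]+)")


def extract_handles(text):
    """Return unique handles (without the @), in order of first appearance."""
    seen = set()
    handles = []
    for match in HANDLE_RE.finditer(text):
        handle = match.group(1).lower()  # compare handles case-insensitively
        if handle not in seen:
            seen.add(handle)
            handles.append(handle)
    return handles


# Hypothetical notes with handles scattered through the text.
notes = """Companies mentioned in chapter 1: @nike and @pfizer.
Later on: @IKEA, @toyota, @lenovo, then again @toyota and @lenovo."""

print(extract_handles(notes))
# ['nike', 'pfizer', 'ikea', 'toyota', 'lenovo']
```

A quick check like this also surfaces repeated handles, which are easy to introduce when copying names out of notes.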

Process the file

To process the file with twecoll, we proceed in two steps: build the .dat file with init, then retrieve the second-degree relations with fetch. First, run init as follows.

$ twecoll init -q brabandere

Note that I omit the .twt extension. Then run fetch:

$ twecoll fetch brabandere

If you notice too many skipped handles, you can always re-run fetch with a higher count, like so:

$ twecoll fetch -c 50000 brabandere

We have now generated the brabandere.dat file and a set of .f files in the fdat directory, and can proceed to generate the GML file.

$ twecoll edgelist -s brabandere

This will generate brabandere.gml, which can be processed with Gephi, for example. It will also output basic stats and, provided igraph is installed, generate a simple visualization.

Visualize the result

I ran the edgelist command with the -s switch, which asks twecoll to identify strongly connected components. Omitting it will yield a graph with many colored nodes that are only weakly interconnected.
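To make the idea of a strongly connected component concrete: in a directed graph, a group of nodes is strongly connected when every node in it can reach every other one by following edges in their direction. The sketch below illustrates this in plain Python with Kosaraju's algorithm on a made-up edge list. It is not how twecoll or igraph compute components, and the edges and their direction (follower, followed) are hypothetical.

```python
from collections import defaultdict


def strongly_connected_components(edges):
    """Group nodes into strongly connected components (Kosaraju's algorithm)."""
    graph = defaultdict(list)    # node -> nodes it points to
    reverse = defaultdict(list)  # node -> nodes pointing to it
    nodes = set()
    for source, target in edges:
        graph[source].append(target)
        reverse[target].append(source)
        nodes.update((source, target))

    # Pass 1: record nodes in order of DFS completion on the original graph.
    visited = set()
    order = []

    def visit(node):
        visited.add(node)
        for neighbour in graph[node]:
            if neighbour not in visited:
                visit(neighbour)
        order.append(node)

    for node in sorted(nodes):
        if node not in visited:
            visit(node)

    # Pass 2: walk the reversed graph in reverse completion order;
    # each tree found this way is one strongly connected component.
    assigned = set()
    components = []

    def collect(node, component):
        assigned.add(node)
        component.append(node)
        for neighbour in reverse[node]:
            if neighbour not in assigned:
                collect(neighbour, component)

    for node in reversed(order):
        if node not in assigned:
            component = []
            collect(node, component)
            components.append(sorted(component))
    return sorted(components)


# Hypothetical edges, written as (follower, followed).
edges = [
    ("nike", "ikea"), ("ikea", "nike"),  # mutual follow
    ("ikea", "bmw"), ("bmw", "nike"),    # closes the cycle nike -> ikea -> bmw -> nike
    ("ebay", "nike"),                    # one-way link only
]

print(strongly_connected_components(edges))
# [['bmw', 'ikea', 'nike'], ['ebay']]
```

In this toy example ebay ends up in a component of its own because nothing links back to it, which is the kind of weak, one-way connection the text above refers to.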

[Figure: edgelist]
