Arbitrary Sets
Sometimes you want to look at an arbitrary set of tweeps. This set can come from a list of people or organizations you picked up from a book. In my case, I had to relate a list of companies for a MOOC assignment. Twecoll can help you get a quick picture of how the nodes in this arbitrary set relate.
Twecoll can fetch tweets from a handle or for a search query, and it stores them in a .twt file. We'll use this file type to manually create a list of the tweeps I am interested in. Using my editor, I create a file with the .twt extension that contains the following text.
@nike
@pfizer
@hugoboss
@ikea
@swatch
@pampers
@redbull
@samsungmobile
@total
@kodak
@caterpillarinc
@nutellausa
@loreal
@lacoste
@versace
@nestle
@proctergamble
@dior
@burgerking
@disney
@google
@toyota
@lenovo
@cocacola
@nikon
@carlsberg
@tatacompanies
@thenorthface
@michelintyres
@wengerbrand
@arcelormittal
@levis
@ibm
@philipspr
@emirates
@bmw
@monsantoco
@ebay
@abercrombie
In my case, I put one handle per line, but a .twt file can contain handles in any structure. This is handy when you insert handles in your notes: save the file with the .twt extension, here brabandere.twt, for processing with twecoll.
To process the file with twecoll, we proceed in two steps: first build the .dat file with init, then fetch the second-degree relations with fetch, as follows.
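To illustrate why the layout of the file does not matter, here is a minimal Python sketch of how handles could be pulled out of free-form notes. It is not twecoll's actual parser: the regular expression, the `extract_handles` name and the sample notes are assumptions made for this example only.

```python
import re

# Assumption: a handle is "@" followed by letters, digits or underscores,
# and is not glued to a preceding word character (which skips e-mail addresses).
HANDLE_RE = re.compile(r"(?<![A-Za-z0-9_])@([A-Za-z0-9_]+)")


def extract_handles(text):
    """Return the unique handles found in text, lowercased, in order of first appearance."""
    seen = set()
    handles = []
    for match in HANDLE_RE.finditer(text):
        handle = match.group(1).lower()
        if handle not in seen:
            seen.add(handle)
            handles.append(handle)
    return handles


# Hypothetical notes mixing prose and handles, with repeats.
notes = """Brands mentioned in chapter 2: @nike and @pfizer,
then @toyota, @lenovo; later again @Toyota and @lenovo.
Contact: someone@example.com"""

print(extract_handles(notes))
```

Under these assumptions, repeated handles collapse to a single entry and the e-mail address is ignored, so notes can be saved as they are without cleaning them up first.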
$ twecoll init -q brabandere
Note that I omit the .twt extension. This is followed by
$ twecoll fetch brabandere
If you notice too many skipped handles, you can always re-run fetch, passing a higher count, like so
$ twecoll fetch -c 50000 brabandere
Now we have generated the brabandere.dat file and a set of .f files in the fdat directory. We can proceed to generate the GML file.
$ twecoll edgelist -s brabandere
This generates brabandere.gml, which can be processed with Gephi, for example. It also outputs basic stats and, provided igraph is installed, generates a simple visualization.
I ran the edgelist command with the -s switch, which asks twecoll to identify strongly connected components. Omitting it yields a graph with many colored nodes that are only weakly interconnected.
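For readers unfamiliar with the term, a strongly connected component is a group of nodes in which every node can reach every other node by following directed edges. The sketch below shows the idea in plain Python using Kosaraju's algorithm. It is an illustration only, not the code twecoll runs, and the edges between handles are made up for the example.

```python
from collections import defaultdict


def strongly_connected_components(edges):
    """Kosaraju's algorithm: return a list of sets, one per strongly connected component."""
    graph = defaultdict(list)
    reverse = defaultdict(list)
    nodes = set()
    for src, dst in edges:
        graph[src].append(dst)
        reverse[dst].append(src)
        nodes.update((src, dst))

    # First pass: depth-first search, recording nodes in order of completion.
    visited = set()
    order = []
    for start in sorted(nodes):
        if start in visited:
            continue
        visited.add(start)
        stack = [(start, iter(graph[start]))]
        while stack:
            node, neighbours = stack[-1]
            for nxt in neighbours:
                if nxt not in visited:
                    visited.add(nxt)
                    stack.append((nxt, iter(graph[nxt])))
                    break
            else:
                # All neighbours explored: this node is finished.
                order.append(node)
                stack.pop()

    # Second pass: walk the reversed graph in reverse completion order.
    assigned = set()
    components = []
    for start in reversed(order):
        if start in assigned:
            continue
        component = set()
        assigned.add(start)
        todo = [start]
        while todo:
            node = todo.pop()
            component.add(node)
            for prev in reverse[node]:
                if prev not in assigned:
                    assigned.add(prev)
                    todo.append(prev)
        components.append(component)
    return components


# Hypothetical directed relations between handles (source, target).
edges = [
    ("nike", "google"), ("google", "nike"), ("google", "ibm"),
    ("ibm", "nike"), ("ebay", "google"), ("disney", "ebay"),
]
components = strongly_connected_components(edges)
for component in components:
    print(sorted(component))
```

In this made-up graph, nike, google and ibm can all reach one another and so form one component, while ebay and disney each end up alone because no edge leads back to them.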
