
Example input files from vectors to t-SNE #1

Open
avilella opened this issue Apr 24, 2024 · 3 comments

Comments

@avilella

Hi all,

I've been searching GitHub for ways to classify documents (e.g. Word or Excel documents in a corpus) and found the Q-SNE paper interesting, and the code self-contained and easy to deploy.

Would it be possible to add some example input files to this repo, along with a brief description of the steps to go from them to a t-SNE coordinates output file?

The examples in the paper were either already pre-processed (TF-IDF) at the source they were downloaded from, or I couldn't find them to download myself.

I am assuming a combination of csv2dmat and tsne should be enough, though I'm not sure whether the same approach is wrapped into a single step in the testapq binary.

My plan is to do:

./csv2dmat -i e.test.vec -d e.dmat
./tsne -i e.dmat

But I am unsure what the .vec file format should be.
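Since the paper's examples were distributed as ready-made TF-IDF term vectors, here is a minimal Python sketch of producing such vectors from raw text and writing one vector per line as comma-separated values. It assumes, without confirmation from this thread, that csv2dmat accepts a plain numeric CSV matrix with one document per row and no header; the real .vec format is not documented here. The weighting (tf = count / document length, idf = log(N / document frequency)) is a common textbook variant and may differ from the paper's preprocessing. The helper names `tokenize`, `tfidf_vectors` and `write_csv` are hypothetical.

```python
import csv
import io
import math
import re
from collections import Counter


def tokenize(text):
    # Lowercase the text and keep simple alphanumeric word tokens.
    return re.findall(r"[a-z0-9]+", text.lower())


def tfidf_vectors(docs):
    """Return (vocabulary, rows), where each row is a dense TF-IDF vector.

    tf  = term count / document length
    idf = log(N / number of documents containing the term)
    """
    tokenized = [tokenize(d) for d in docs]
    vocab = sorted({t for toks in tokenized for t in toks})
    n_docs = len(docs)
    # Document frequency: count each term at most once per document.
    df = Counter(t for toks in tokenized for t in set(toks))
    idf = {t: math.log(n_docs / df[t]) for t in vocab}
    rows = []
    for toks in tokenized:
        counts = Counter(toks)
        length = len(toks) or 1  # avoid division by zero for empty documents
        rows.append([counts[t] / length * idf[t] for t in vocab])
    return vocab, rows


def write_csv(rows, stream):
    # Hypothetical input layout: one document per line, comma-separated, no header.
    writer = csv.writer(stream, lineterminator="\n")
    for row in rows:
        writer.writerow([f"{v:.6f}" for v in row])


if __name__ == "__main__":
    corpus = [
        "the cat sat on the mat",
        "the dog sat on the log",
        "cats and dogs",
    ]
    vocab, rows = tfidf_vectors(corpus)
    buf = io.StringIO()
    write_csv(rows, buf)
    print(vocab)
    print(buf.getvalue())
```

For a real corpus of Word or Excel files, the text would first need to be extracted, and a sparse representation would scale better than these dense rows.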

Thanks in advance.

@sfingram
Owner

sfingram commented May 2, 2024

Hi @avilella, unfortunately I don't have any more time to devote to this project (an academic project from several years back), but the examples are located at this project page.

You will have to copy the links and replace http with https (I can no longer edit that page to correct it).

Edit (here are the corrected links for your convenience):

warlogs TFIDF term-vectors. → (BZip2 archive 1MB)
metacombine TFIDF term-vectors. → (BZip2 archive 871KB)
textfiles TFIDF term-vectors. → (BZip2 archive 149MB)
cables TFIDF term-vectors. → (BZip2 archive 318MB)
nytimes TFIDF term-vectors. → (BZip2 archive 524MB)
pubmed TFIDF term-vectors. → (BZip2 archive 4GB)

@avilella
Author

avilella commented May 2, 2024 via email

@avilella
Author

avilella commented May 2, 2024 via email
