Bluesky labeler starter code

You'll find the starter code for Assignment 3 in this repository. More detailed instructions can be found in the assignment spec.

The Python ATProto SDK

To build your labeler, you'll be using the AT Protocol SDK, which is documented here.

Automated labeler

The bulk of your Part I implementation will be in automated_labeler.py. You are welcome to modify this implementation as you wish. However, you must preserve the signatures of the __init__ and moderate_post functions; otherwise, the testing/grading script will not work. You may also use the functions defined in label.py. You can import them like so:

from .label import post_from_url

For Part II, you will create a file called policy_proposal_labeler.py for your implementation. You are welcome to create additional files as you see fit.

Input files

For Part I, your labeler takes as input lists of Trust & Safety (T&S) words and domains, a list of news domains, and a list of dog pictures. These inputs can be found in the labeler-inputs directory. For testing, we provide CSV files in which each row pairs a URL with the expected labeler output. These can be found under the test-data directory.
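
As a rough illustration of how a word list might be used, the sketch below loads a one-column CSV and checks post text against it. The single-column layout with a header row, the label string, and the helper names load_terms and labels_for_text are assumptions made for this example; check the actual files in labeler-inputs for the real layout.

```python
import csv
import io
from typing import List, Set

# Hypothetical label name, for illustration only.
EXAMPLE_LABEL = "t-and-s"


def load_terms(handle) -> Set[str]:
    """Read a one-column CSV (assumed to have a header row) into a lowercase set."""
    reader = csv.reader(handle)
    next(reader, None)  # skip the assumed header row
    return {row[0].strip().lower() for row in reader if row and row[0].strip()}


def labels_for_text(text: str, terms: Set[str]) -> List[str]:
    """Return [EXAMPLE_LABEL] if any term occurs in the text, else an empty list."""
    lowered = text.lower()
    return [EXAMPLE_LABEL] if any(term in lowered for term in terms) else []


# In the real labeler this would be a file opened from the labeler-inputs directory.
sample = io.StringIO("Word\nmoderation\ncontent policy\n")
terms = load_terms(sample)
print(labels_for_text("New Content Policy announced", terms))
```

Substring matching is the simplest option; matching on word boundaries would avoid hits inside longer words, at the cost of a little more code.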

Testing

We provide a testing harness in test_labeler.py. To test your labeler on the input posts for dog pictures, you can run the following command and expect to see the following output:

% python test_labeler.py labeler-inputs test-data/input-posts-dogs.csv
The labeler produced 20 correct labels assignments out of 20
Overall ratio of correct label assignments 1.0
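
The ratio in the last line is the number of correct assignments divided by the number of rows. The sketch below shows one way such a score could be computed. It is not the code in test_labeler.py: the column names URL and Labels, the JSON-list encoding of the labels, and the fake_labeler stand-in are all assumptions.

```python
import csv
import io
import json
from typing import Callable, Iterable, List, Tuple


def score(rows: Iterable[dict], labeler: Callable[[str], List[str]]) -> Tuple[int, int]:
    """Count rows where the labeler's output matches the expected labels."""
    correct = total = 0
    for row in rows:
        expected = sorted(json.loads(row["Labels"]))  # assumed JSON list of labels
        produced = sorted(labeler(row["URL"]))
        correct += int(expected == produced)
        total += 1
    return correct, total


def fake_labeler(url: str) -> List[str]:
    """Hypothetical stand-in for a real labeler's moderate_post."""
    return ["dog"] if "dog" in url else []


# Two made-up rows in the assumed format.
data = io.StringIO(
    "URL,Labels\n"
    'https://example.com/post/dog-1,"[""dog""]"\n'
    "https://example.com/post/cat-1,[]\n"
)
correct, total = score(csv.DictReader(data), fake_labeler)
print(f"{correct} correct out of {total}, ratio {correct / total}")
```

Sorting both lists before comparing makes the check independent of label order.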

Policy Proposal Labeler

A machine learning-based tool for detecting cryptocurrency-related scams on the Bluesky social network. The tool uses a combination of text analysis, image processing, and pattern matching to identify potential scam posts.

Features

Text Analysis: Uses TF-IDF and pattern matching to detect scam-related content
Image Processing: Extracts text from images using OpenAI's Vision API
Real-time Processing: Analyzes posts as they are fetched from Bluesky
Performance Monitoring: Tracks processing time, memory usage, and network data
Detailed Reporting: Provides comprehensive metrics and classification results
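
To make the pattern-matching part of the text analysis concrete, here is a minimal sketch using regular expressions. The patterns are illustrative guesses at common giveaway-scam phrasing, not the indicators used in policy_proposal_labeler.py, and pattern_features is a hypothetical helper.

```python
import re
from typing import Dict

# Illustrative patterns only; the project's actual indicators may differ.
SCAM_PATTERNS = {
    "giveaway": re.compile(r"\b(?:giveaway|airdrop)\b", re.IGNORECASE),
    "send_to_receive": re.compile(
        r"\bsend\b.{0,40}\b(?:get|receive)\b.{0,40}\bback\b", re.IGNORECASE
    ),
    "urgency": re.compile(r"\b(?:hurry|act now|limited time)\b", re.IGNORECASE),
}


def pattern_features(text: str) -> Dict[str, int]:
    """Return a 0/1 flag per pattern, suitable as extra classifier features."""
    return {name: int(bool(p.search(text))) for name, p in SCAM_PATTERNS.items()}


features = pattern_features(
    "Bitcoin GIVEAWAY! Send 0.1 BTC and receive 0.2 BTC back. Hurry!"
)
print(features)
```

Binary flags like these can be concatenated with TF-IDF features so the classifier sees both general vocabulary and specific known indicators.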

Setup

Create a .env file in the project root with your credentials:

BSKY_USER=your_username
BSKY_PASS=your_password
OPENAI_API_KEY=your_openai_api_key

Install required dependencies:

pip install -r requirements.txt
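
Once the variables from .env are in the process environment, a fail-fast check can catch a missing credential before any network call is made. How the script loads the file is not shown here (the python-dotenv package is one common option, but that is an assumption), and load_credentials is a hypothetical helper.

```python
import os
from typing import Dict, Mapping

# The three variable names come from the .env template above.
REQUIRED_VARS = ("BSKY_USER", "BSKY_PASS", "OPENAI_API_KEY")


def load_credentials(env: Mapping[str, str] = os.environ) -> Dict[str, str]:
    """Return the required credentials, or raise if any is missing or empty."""
    missing = [name for name in REQUIRED_VARS if not env.get(name)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return {name: env[name] for name in REQUIRED_VARS}


# Example with an explicit mapping instead of the real environment.
creds = load_credentials(
    {"BSKY_USER": "your_username", "BSKY_PASS": "your_password", "OPENAI_API_KEY": "key"}
)
print(sorted(creds))
```

Accepting the mapping as a parameter keeps the check easy to test without touching real credentials.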

Usage

python policy_proposal_labeler.py

Configuration

The script can be configured by modifying these variables in policy_proposal_labeler.py:

# Search Configuration
QUERY_LIST = [
    "crypto giveaway", "bitcoin giveaway", "free crypto",
    "ethereum airdrop", "crypto winners"
]
LIMIT = 20  # posts per query

# Model Configuration
DO_TRAIN = True  # Set to False to use existing model
DO_SCAN = True   # Set to False to skip scanning

Output Files

Model File: crypto_model.joblib - The trained model
Scan Results: scan_results.csv - Contains analyzed posts with classification results
Performance Metrics: performance_metrics.json - Contains timing and resource usage data
Evaluation Results: evaluation.json - Contains model evaluation metrics

Real-time Processing

The script provides real-time updates during processing:

Processing post 1/15
Author: example.bsky.social
Text length: 150 chars
Found 2 images
Extracting text from images...
Processing image 1/2
Image text extracted successfully
Classifying post...
Classification: SCAM (confidence: 0.85)
Post processed in 1.23s
Current memory usage: 120.45 MB
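
Per-post timing and memory figures like those above can be gathered with the standard library alone. This sketch uses time.perf_counter and tracemalloc. Note that tracemalloc reports memory allocated by Python code, which is not the same as process-level usage, and the script may measure memory differently. Both timed_call and process_post are hypothetical stand-ins, as is the NOT_SCAM label.

```python
import time
import tracemalloc
from typing import Callable, Dict


def timed_call(func: Callable[[str], str], text: str) -> Dict[str, object]:
    """Run func(text) and record elapsed seconds and peak traced memory in MB."""
    tracemalloc.start()
    start = time.perf_counter()
    result = func(text)
    elapsed = time.perf_counter() - start
    _, peak_bytes = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    return {
        "result": result,
        "seconds": elapsed,
        "peak_mb": peak_bytes / (1024 * 1024),
    }


def process_post(text: str) -> str:
    """Hypothetical stand-in for the real classification step."""
    return "SCAM" if "giveaway" in text.lower() else "NOT_SCAM"


metrics = timed_call(process_post, "Free crypto giveaway!")
print(f"Classification: {metrics['result']}")
print(f"Post processed in {metrics['seconds']:.2f}s")
print(f"Peak traced memory: {metrics['peak_mb']:.2f} MB")
```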

Model Details

The classification model uses:

TF-IDF vectorization for text features
Pattern matching for known scam indicators
Logistic Regression with class weights
Cross-validation for parameter tuning
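
For readers unfamiliar with TF-IDF, the sketch below computes it by hand on a toy corpus. It uses the plain form, term frequency times log(N / document frequency); library vectorizers typically add smoothing and normalization, so their numbers will differ. This illustrates the idea only and is not the project's vectorizer.

```python
import math
from collections import Counter
from typing import Dict, List


def tf_idf(docs: List[List[str]]) -> List[Dict[str, float]]:
    """Compute unsmoothed TF-IDF weights for each tokenized document."""
    n_docs = len(docs)
    # Document frequency: in how many documents each term appears.
    df = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        counts = Counter(doc)
        weights.append(
            {
                term: (count / len(doc)) * math.log(n_docs / df[term])
                for term, count in counts.items()
            }
        )
    return weights


# Toy corpus, made up for this example.
docs = [
    "free crypto giveaway".split(),
    "crypto market news".split(),
    "crypto giveaway winners".split(),
]
weights = tf_idf(docs)
print(weights[0])
```

A term that appears in every document, such as "crypto" here, gets weight zero, while a rarer term such as "free" is weighted more heavily. This is why TF-IDF helps separate scam-specific wording from vocabulary common to all crypto posts.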
