Why there are no descriptions? #1109
Comments
I agree. For technologies it is fine as it is, and for companies too, but for the individual categories at least a short description should be added. I do not know 100% of them, and checking each site one by one is a nightmare. I am not sure whether it is legal, but something like this should do the job:

```python
import requests
from bs4 import BeautifulSoup
from transformers import pipeline

# Initialize a summarization pipeline
summarizer = pipeline("summarization")

def crawl_and_summarize(url):
    # Crawl the webpage
    response = requests.get(url)
    soup = BeautifulSoup(response.content, 'html.parser')
    # Extract main content; could be more specific based on site structure
    text = soup.get_text()
    # Summarize the text (truncate input to the model's max length)
    summary = summarizer(text, max_length=130, min_length=30,
                         do_sample=False, truncation=True)
    return summary[0]['summary_text']

def read_urls_from_file(file_path):
    with open(file_path, 'r') as file:
        return [line.strip() for line in file if line.strip()]

# File containing URLs, one per line
file_path = 'urls.txt'

# Read URLs from the file
urls = read_urls_from_file(file_path)

# Crawl and summarize each URL
for url in urls:
    try:
        summary = crawl_and_summarize(url)
        print(f"URL: {url}\nSummary: {summary}\n")
    except Exception as e:
        print(f"Error processing {url}: {e}")
```

Remember to place the `urls.txt` file next to the script.
Can descriptions be added? Otherwise, this collection of URLs (albeit alphabetized) has little use. Maybe a script that goes through them and retrieves `document.title` would do?
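A minimal sketch of that idea, kept in Python for consistency with the script above: instead of running a summarizer, fetch each page and pull the text of its `<title>` tag (the server-side equivalent of `document.title`). The `urls.txt` file name and the `extract_title`/`fetch_title` helpers are assumptions for illustration, not part of the repository.

```python
import requests
from bs4 import BeautifulSoup

def extract_title(html):
    """Return the <title> text from an HTML string, or None if absent."""
    soup = BeautifulSoup(html, "html.parser")
    return soup.title.get_text(strip=True) if soup.title else None

def fetch_title(url):
    """Fetch a page and return its title."""
    response = requests.get(url, timeout=10)
    response.raise_for_status()
    return extract_title(response.text)

if __name__ == "__main__":
    # Hypothetical input file: one URL per line, as in the script above
    with open("urls.txt") as file:
        urls = [line.strip() for line in file if line.strip()]
    for url in urls:
        try:
            print(f"{url}: {fetch_title(url)}")
        except requests.RequestException as e:
            print(f"Error fetching {url}: {e}")
```

This is far cheaper than summarization and needs no ML dependencies, though many sites have uninformative titles, so a one-line manual description per entry may still be the better outcome.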