CLI: better control of output file names #754

Open
DesBw opened this issue Nov 30, 2024 · 2 comments · May be fixed by #763
Labels
enhancement New feature or request

Comments

@DesBw

DesBw commented Nov 30, 2024

It would be great to have a way to name the output files after the URL they were extracted from.
This would make it easier to manually check faulty texts.
A feature like this:

trafilatura --input-file links.txt --output-dir converted --name-from-url

The file generated from "www.example.com/page3" would be named "www.example.com_page3.txt".
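The mapping requested above could be sketched roughly as follows. This is only an illustration of the naming rule, not trafilatura code: `filename_from_url` is a hypothetical helper, and replacing every unsafe character with an underscore is an assumption based on the single example given.

```python
import re
from urllib.parse import urlsplit


def filename_from_url(url: str, ext: str = ".txt") -> str:
    """Hypothetical helper: turn a URL into a flat, filesystem-safe file name."""
    # urlsplit only recognizes the host when a scheme or "//" is present,
    # so bare inputs like "www.example.com/page3" get a "//" prefix.
    parts = urlsplit(url if "//" in url else "//" + url)
    raw = parts.netloc + parts.path
    # Assumption: path separators and other unsafe characters become "_".
    safe = re.sub(r"[^A-Za-z0-9._-]+", "_", raw).strip("_")
    # Fall back to a fixed name if nothing usable is left.
    return (safe or "index") + ext


print(filename_from_url("www.example.com/page3"))
```

Note that this sketch ignores the query string, so two URLs that differ only in their query would map to the same name; a real implementation would probably need a collision strategy.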

@adbar adbar added the enhancement New feature or request label Dec 2, 2024
@AdamQuadmon

I want to add my 2 cents to this issue:

The use of the -o flag is confusing: in some examples I found -o output-file.txt, but the result is an output-file.txt directory containing hashed-filename.txt.

Also consider: when using the --markdown flag, I expected to get at least a hashed-filename.md file.

Thanks for this great tool! Hope to see these enhancements soon!

@adbar
Owner

adbar commented Dec 5, 2024

@AdamQuadmon @DesBw Thanks for your feedback, I am open to reviewing a pull request on this.

@adbar adbar changed the title from "Name the files from the URL" to "CLI: better control of output file names" Dec 5, 2024
AdamQuadmon added a commit to AdamQuadmon/trafilatura that referenced this issue Dec 6, 2024
…m output paths

Introduces two new CLI arguments to allow fine-grained control over how output file paths are generated:

--filename-template: Specify a template string using variables like {domain}, {hash}, {ext} to define a custom directory structure and file naming scheme

--max-length: Set a maximum character limit for generated file paths, intelligently truncating if needed while preserving essential components

Includes documentation updates covering the new options, examples, and troubleshooting.

Closes adbar#754
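
To illustrate what the two options described in this commit message might do, here is a minimal sketch. It is not the code from the referenced commit: `build_output_path` is a hypothetical function, and the default template, the SHA-256 based hash, and the truncation strategy are all assumptions. Only the variable names {domain}, {hash}, {ext} come from the commit message.

```python
import hashlib
from urllib.parse import urlsplit


def build_output_path(url: str, template: str = "{domain}/{hash}.{ext}",
                      ext: str = "txt", max_length: int = 100) -> str:
    """Hypothetical sketch: expand a filename template and cap its length."""
    domain = urlsplit(url).netloc or "unknown"
    # Assumption: a short, stable digest of the URL stands in for {hash}.
    digest = hashlib.sha256(url.encode("utf-8")).hexdigest()[:16]
    path = template.format(domain=domain, hash=digest, ext=ext)
    if len(path) > max_length:
        suffix = "." + ext
        if path.endswith(suffix) and max_length > len(suffix):
            # Trim the stem but keep the extension intact.
            path = path[: max_length - len(suffix)] + suffix
        else:
            path = path[:max_length]
    return path


print(build_output_path("https://example.com/page3"))
print(build_output_path("https://example.com/page3", max_length=20))
```

With the default template, files would be grouped into one directory per domain, and the extension survives truncation so the file type stays recognizable.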