Commit

update changelog and corresponding doc pages
adbar committed Nov 19, 2024
1 parent c1aa319 commit b528514
Showing 6 changed files with 56 additions and 8 deletions.
36 changes: 33 additions & 3 deletions HISTORY.md
@@ -4,9 +4,39 @@
 ## future v2.0.0

 Breaking changes:
-- `bare_extraction()`: the function now returns an instance of the Document class by default
-- `bare_extraction()`: `as_dict` deprecation warning → use `.as_dict()` method on return value
-- `bare_extraction()` and `extract()`: `no_fallback` deprecation warning → use `fast` instead
+- Python 3.6 and 3.7 deprecated (#709)
+- `bare_extraction()`:
+  - now returns an instance of the `Document` class by default
+  - `as_dict` deprecation warning → use `.as_dict()` method on return value (#730)
+- `bare_extraction()` and `extract()`: `no_fallback` deprecation warning → use `fast` instead (#730)
+- downloads: remove `decode` argument in `fetch_url()` → use `fetch_response` instead (#724)
+- deprecated graphical user interface now removed (#713)
+- extraction: move `max_tree_size` parameter to `settings.cfg` (#742)
+- see [Python](https://trafilatura.readthedocs.io/en/latest/usage-python.html#deprecations) and [CLI](https://trafilatura.readthedocs.io/en/latest/usage-cli.html#deprecations) deprecations in the docs
+
+Fixes:
+- set `options.source` before raising error on empty doc tree by @dmoklaf (#707)
+- robust encoding in `options.source` (#717)
+- more robust mapping for conversion to HTML (#721)
+- CLI downloads: use all information in settings file (#734)
+- downloads: cleaner urllib3 code (#736)
+- CLI: print URLs early for feeds and sitemaps with `--list` with @gremid (#744)
+
+Metadata:
+- more robust URL extraction (#710)
+
+Maintenance:
+- remove already deprecated functions and args (#716)
+- add type hints (#723, #728)
+- setup: use `pyproject.toml` file (#715)
+- simplify code (#708, #709, #727)
+- better debug messages in `main_extractor` (#714)
+- evaluation: review data, update packages, add magic_html (#731)
+- setup: explicit exports through `__all__` (#740)
+
+Documentation:
+- fix link in `docs/index.html` by @nzw0301 (#711)
+- remove docs from published packages (#743)


 ## 1.12.2
6 changes: 6 additions & 0 deletions docs/index.rst
@@ -127,6 +127,12 @@ Contributions of all kinds are welcome. Visit the `Contributing page <https://gi
 Many thanks to the `contributors <https://github.com/adbar/trafilatura/graphs/contributors>`_ who extended the docs or submitted bug reports, features and bugfixes!


+Changes
+-------
+
+For version history and changes see the `changelog <https://github.com/adbar/trafilatura/blob/master/HISTORY.md>`_.
+
+
 Context
 -------

5 changes: 4 additions & 1 deletion docs/installation.rst
@@ -72,6 +72,7 @@ In case this does not happen automatically, specify the version number:

 ``pip install trafilatura==number``

+- Last version for Python 3.6 and 3.7: ``1.12.2``
 - Last version for Python 3.5: ``0.9.3``
 - Last version for Python 3.4: ``0.8.2``
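
The version list above can be turned into a small lookup when scripting installs for older interpreters. The following is a minimal sketch, not part of Trafilatura: the names `LAST_RELEASES` and `pip_requirement` are hypothetical, and the assumption that Python 3.8 and later can install the latest release unpinned is inferred from the 3.6/3.7 deprecation noted in the changelog.

```python
from typing import Optional, Tuple

# Pins copied from the list in docs/installation.rst:
# (major, minor) of the interpreter -> last release that supports it.
LAST_RELEASES = {
    (3, 4): "0.8.2",
    (3, 5): "0.9.3",
    (3, 6): "1.12.2",
    (3, 7): "1.12.2",
}


def pip_requirement(python_version: Tuple[int, ...]) -> Optional[str]:
    """Return a pip requirement string for the given interpreter version.

    Hypothetical helper: returns a pinned requirement for the versions
    listed above, an unpinned one for newer interpreters (assumption),
    and None when the documentation gives no guidance.
    """
    major_minor = tuple(python_version[:2])
    if major_minor in LAST_RELEASES:
        return f"trafilatura=={LAST_RELEASES[major_minor]}"
    if major_minor > (3, 7):
        return "trafilatura"
    return None
```

Calling `pip_requirement(sys.version_info)` (after `import sys`) gives the string to pass to `pip install` for the running interpreter.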

Expand Down Expand Up @@ -123,6 +124,8 @@ htmldate[all] / htmldate[speed]
 py3langid
     Language detection on extracted main text
 pycurl
-    Faster downloads, possibly less robust though
+    Faster downloads, useful where urllib3 fails
+urllib3[socks]
+    Downloads through SOCKS proxy with urllib3
 zstandard
     Additional compression algorithm for downloads
4 changes: 2 additions & 2 deletions docs/usage-cli.rst
@@ -289,10 +289,10 @@ For all usage instructions see ``trafilatura -h``:
                        [--parallel PARALLEL] [-b BLACKLIST] [--list]
                        [-o OUTPUTDIR] [--backup-dir BACKUP_DIR] [--keep-dirs]
                        [--feed [FEED] | --sitemap [SITEMAP] | --crawl [CRAWL] |
-                       --explore [EXPLORE]] [--archived]
+                       --explore [EXPLORE] | --probe [PROBE]] [--archived]
                        [--url-filter URL_FILTER [URL_FILTER ...]] [-f]
                        [--formatting] [--links] [--images] [--no-comments]
-                       [--no-tables] [--only-with-metadata]
+                       [--no-tables] [--only-with-metadata] [--with-metadata]
                        [--target-language TARGET_LANGUAGE] [--deduplicate]
                        [--config-file CONFIG_FILE] [--precision] [--recall]
                        [--output-format {csv,json,html,markdown,txt,xml,xmltei} |
4 changes: 4 additions & 0 deletions docs/usage-gui.rst
@@ -8,6 +8,10 @@ Note that the GUI was a feature in Trafilatura until version 1.8.1, but it is cu
 If installation fails, usage on the command-line is recommended.


+.. hint::
+    This interface is removed until further notice starting from Trafilatura version 2, mostly due to issues with cross-platform tests and maintenance.
+
+
 Installation
 ~~~~~~~~~~~~

9 changes: 7 additions & 2 deletions docs/usage-python.rst
@@ -481,7 +481,12 @@ Deprecations

 The following functions and arguments are deprecated:

-- extraction: ``process_record()`` function → use ``extract()`` instead
+- extraction:
+    - ``process_record()`` function → use ``extract()`` instead
+    - ``csv_output``, ``json_output``, ``tei_output``, ``xml_output`` → use ``output_format`` parameter instead
+    - ``bare_extraction(as_dict=True)`` → the function returns a ``Document`` object, use ``.as_dict()`` method on it
+    - ``bare_extraction()`` and ``extract()``: ``no_fallback`` → use ``fast`` instead
+    - ``max_tree_size`` parameter moved to ``settings.cfg`` file
 - downloads: ``decode`` argument in ``fetch_url()`` → use ``fetch_response`` instead
 - utils: ``decode_response()`` function → use ``decode_file()`` instead
-- extraction: ``csv_output``, ``json_output``, ``tei_output``, ``xml_output`` → use ``output_format`` parameter instead
+- metadata: ``with_metadata`` (include metadata) had once the effect of today's ``only_with_metadata`` (only documents with necessary metadata)
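
The extraction-related renames in the list above could be applied mechanically to existing call sites. The sketch below is an illustration only, not part of Trafilatura: `migrate_kwargs` and `LEGACY_OUTPUT_FLAGS` are hypothetical names, and the mapping of `tei_output` to the `xmltei` format value is an assumption based on the `--output-format` choices shown in the CLI usage text.

```python
from typing import Any, Dict

# Legacy boolean flags and the ``output_format`` value assumed to replace each.
# The "xmltei" value for ``tei_output`` is an assumption, see the note above.
LEGACY_OUTPUT_FLAGS = {
    "csv_output": "csv",
    "json_output": "json",
    "tei_output": "xmltei",
    "xml_output": "xml",
}


def migrate_kwargs(legacy: Dict[str, Any]) -> Dict[str, Any]:
    """Translate deprecated keyword arguments into their replacements.

    Hypothetical helper: it only rewrites a dict of keyword arguments
    and never calls the library itself.
    """
    kwargs = dict(legacy)  # work on a copy, leave the caller's dict untouched

    # ``no_fallback`` -> ``fast`` (same boolean value)
    if "no_fallback" in kwargs:
        kwargs["fast"] = kwargs.pop("no_fallback")

    # ``*_output=True`` -> ``output_format="..."``; an explicit
    # ``output_format`` already present takes precedence.
    for flag, fmt in LEGACY_OUTPUT_FLAGS.items():
        if kwargs.pop(flag, False):
            kwargs.setdefault("output_format", fmt)

    # ``as_dict`` is replaced by calling ``.as_dict()`` on the result, and
    # ``max_tree_size`` now lives in ``settings.cfg``, so both are dropped.
    kwargs.pop("as_dict", None)
    kwargs.pop("max_tree_size", None)
    return kwargs
```

For example, `migrate_kwargs({"no_fallback": True, "json_output": True})` yields keyword arguments using `fast` and `output_format`; code that relied on `as_dict=True` still needs a manual `.as_dict()` call on the returned object.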
