Commit
Merge pull request #1123 from kermitt2/release-0.8.1
Preparation release 0.8.1
lfoppiano authored Sep 14, 2024
2 parents 399ef9d + d15e4d2 commit 4cad850
Showing 38 changed files with 2,437 additions and 637 deletions.
2 changes: 1 addition & 1 deletion .github/workflows/ci-build-manual-crf.yml
@@ -42,6 +42,6 @@ jobs:
registry: docker.io
pushImage: true
tags: |
latest-develop${{ github.event.inputs.suffix != '' && '-' || '' }}${{ github.event.inputs.suffix }}, latest-crf${{ github.event.inputs.suffix != '' && '-' || '' }}${{ github.event.inputs.suffix }}
latest-develop, latest-crf${{ github.event.inputs.suffix != '' && '-' || '' }}${{ github.event.inputs.suffix }}
- name: Image digest
run: echo ${{ steps.docker_build.outputs.digest }}
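The two `tags` lines in this hunk differ only in whether `latest-develop` carries the optional suffix. The expression `cond && '-' || ''` is the usual GitHub Actions idiom for a ternary: it yields `-` when the `suffix` input is non-empty and an empty string otherwise. A minimal Python sketch of that evaluation follows; the function names are hypothetical and exist only for illustration.

```python
def suffixed(base: str, suffix: str) -> str:
    """Mimic `base${{ suffix != '' && '-' || '' }}${{ suffix }}`."""
    # The separator is added only when a suffix was actually supplied.
    separator = "-" if suffix != "" else ""
    return f"{base}{separator}{suffix}"


def tags_with_suffix_on_both(suffix: str = "") -> list[str]:
    # Variant where both tags carry the suffix (first `tags` line in the hunk).
    return [suffixed("latest-develop", suffix), suffixed("latest-crf", suffix)]


def tags_with_suffix_on_crf_only(suffix: str = "") -> list[str]:
    # Variant where `latest-develop` is fixed (second `tags` line in the hunk).
    return ["latest-develop", suffixed("latest-crf", suffix)]
```

With an empty suffix both variants produce the same pair of tags; they diverge only when a suffix is passed to the manual workflow.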
6 changes: 3 additions & 3 deletions .github/workflows/ci-build-unstable.yml
@@ -13,11 +13,11 @@ jobs:

steps:
- uses: actions/checkout@v4
- name: Set up JDK 17
- name: Set up JDK 11
uses: actions/setup-java@v4
with:
java-version: '17.0.10+7'
distribution: 'temurin'
java-version: '11'
distribution: 'adopt'
cache: 'gradle'
- name: Build with Gradle
run: ./gradlew clean assemble --info --stacktrace --no-daemon
23 changes: 23 additions & 0 deletions CHANGELOG.md
@@ -4,6 +4,29 @@ All notable changes to this project will be documented in this file.

The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/).

## [0.8.1] - 2024-06-10

### Added
- Identified URLs are now added in the TEI output #1099
- Added DL models for patent processing #1082
- Copyright and licence identification models #1078
- Add research infrastructure recognition for funding processing #1085

### Changed
- Improved the recognition of URLs using (when available) PDF annotations, such as clickable links
- Updated TEI schema #1084
- Review patent process #1082
- Add Kotlin language to support development and testing #1096

### Fixed
- Sentence segmentation no longer splits sentences with a URL in the middle #1097
- Sentence segmentation is now applied to funding and acknowledgement #1106
- Docker image was optimized to reduce its size #1088
- Fixed out-of-bounds exception (OOBE) when processing large quantities of notes #1075
- Corrected `<title>` coordinate attribute name #1070
- Fix missing coordinates in paragraph continuation #1076
- Fixed JSON log output
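
The URL-related segmentation fix listed above can be pictured with a small sketch. This is not GROBID's implementation: it assumes a deliberately naive boundary rule (sentence punctuation followed by an uppercase letter) and a simple `https?://` URL pattern, and it only illustrates the idea of discarding candidate boundaries that fall inside a URL. All names are hypothetical.

```python
import re

# Assumption: a URL is any run of non-space characters starting with http(s)://
URL_PATTERN = re.compile(r"https?://\S+")
# Assumption: a naive boundary is ., ! or ? followed by optional space and a capital
BOUNDARY_PATTERN = re.compile(r"[.!?]\s*(?=[A-Z])")


def split_sentences(text: str) -> list[str]:
    """Split text into sentences, ignoring boundaries located inside a URL."""
    url_spans = [match.span() for match in URL_PATTERN.finditer(text)]
    sentences = []
    start = 0
    for match in BOUNDARY_PATTERN.finditer(text):
        cut = match.end()
        # Skip candidate boundaries that fall strictly inside a URL span.
        if any(begin < cut < end for begin, end in url_spans):
            continue
        sentences.append(text[start:cut].strip())
        start = cut
    tail = text[start:].strip()
    if tail:
        sentences.append(tail)
    return sentences
```

For the input `"Data is at https://example.org/v1.Final/data for reuse. It is open."`, the period inside `v1.Final` is a candidate boundary under the naive rule, but it is rejected because it lies within the URL span, so the text is split into two sentences rather than three.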

## [0.8.0] - 2023-11-19

### Added
3 changes: 1 addition & 2 deletions Readme.md
@@ -105,11 +105,10 @@ Detailed end-to-end [benchmarking](https://grobid.readthedocs.io/en/latest/Bench
A series of additional modules have been developed for performing __structure aware__ text mining directly on scholar PDF, reusing GROBID's PDF processing and sequence labelling weaponry:

- [software-mention](https://github.com/ourresearch/software-mentions): recognition of software mentions and associated attributes in scientific literature
- [datastet](https://github.com/kermitt2/datastet): identification of named and implicit research datasets and associated attributes in scientific articles
- [datastet](https://github.com/kermitt2/datastet): identification of sections and sentences introducing datasets in a scientific article, identification of dataset names and attributes (implicit and named datasets) and classification of the type of datasets
- [grobid-quantities](https://github.com/kermitt2/grobid-quantities): recognition and normalization of physical quantities/measurements
- [grobid-superconductors](https://github.com/lfoppiano/grobid-superconductors): recognition of superconductor material and properties in scientific literature
- [entity-fishing](https://github.com/kermitt2/entity-fishing), a tool for extracting Wikidata entities from text and document, which can also use Grobid to pre-process scientific articles in PDF, leading to more precise and relevant entity extraction and the capacity to annotate the PDF with interactive layout
- [datastet](https://github.com/kermitt2/datastet): identification of sections and sentences introducing datasets in a scientific article, identification of dataset names (implicit and named datasets) and classification of the type of these datasets
- [grobid-ner](https://github.com/kermitt2/grobid-ner): named entity recognition
- [grobid-astro](https://github.com/kermitt2/grobid-astro): recognition of astronomical entities in scientific papers
- [grobid-bio](https://github.com/kermitt2/grobid-bio): a toy bio-entity tagger using BioNLP/NLPBA 2004 dataset
31 changes: 21 additions & 10 deletions build.gradle
@@ -60,19 +60,29 @@ subprojects {
}
}

// sourceCompatibility = 1.11
// targetCompatibility = 1.11

kotlin {
jvmToolchain(17)
}

java {
toolchain {
languageVersion.set(JavaLanguageVersion.of(17))
sourceCompatibility = 1.11
targetCompatibility = 1.11

tasks.withType(KotlinCompile).configureEach {
sourceCompatibility = JavaVersion.VERSION_11
targetCompatibility = JavaVersion.VERSION_11
kotlinOptions {
jvmTarget = JavaVersion.VERSION_11
}
}

// kotlin {
// jvmToolchain(11)
// }

// java {
// toolchain {
// languageVersion.set(JavaLanguageVersion.of(11))
// vendor.set(JvmVendorSpec.ADOPTIUM)
//
// }
// }

repositories {
mavenCentral()
maven {
@@ -316,6 +326,7 @@ project("grobid-home") {
}

import org.apache.tools.ant.taskdefs.condition.Os
import org.jetbrains.kotlin.gradle.tasks.KotlinCompile

project(":grobid-service") {
apply plugin: 'application'