(NeurIPS D&B 2024) STaRK: Benchmarking LLM Retrieval on Textual and Relational Knowledge Bases
Updated Jul 21, 2025 - Python
Refinery is a tool to extract and transform semi-structured data from Excel spreadsheets of different layouts in a declarative way.
Programming language for symbolic computation with an unusual combination of pattern-matching features: tree patterns, associative patterns, and expressions embedded in patterns.
A dataset for extracting information from repair manuals
Implementation of the semi-structured inference model in our ACL 2020 paper, INFOTABS: Inference on Tables as Semi-structured Data.
Repository containing code for the NAACL 2021 paper "Incorporating External Knowledge to Enhance Tabular Reasoning".
Extraction of endoscopic and pathological data from various endo-pathological sources.
An ActiveModel extension to model your semi-structured data using embedded associations
Urban Dict spelling variant dataset. Source code for the paper "How to Evaluate Word Representations of Informal Domain?"
This repository contains the official code for the paper "Realistic Data Augmentation Framework for Enhancing Tabular Reasoning" (Findings of EMNLP, 2022).
Schema inference for semistructured data using Formal Concept Analysis
A semi-automatic web-based annotation tool for the MyFixit dataset.
Implementation of the semi-structured inference model in our ACL 2023 paper: INFOSYNC: Information Synchronization across Multilingual Semi-structured Tables.
Web-based workflow management system that computes candidate tool workflows given the input file(s) and the user's requirements for the output. It then runs the workflow the user selects from the list of candidates. Implemented in Bracmat (~75%) and Java (~25%).
Standalone Java application for querying XML documents with preference queries (GTP queries with preferences).
Framework to manipulate semi-structured documents and extract data from them.
An open collection of 100+ semi-structured textual datasets (LOG, TXT, CSV datasets, etc.).