Add Git LFS instructions #133

Open: wants to merge 14 commits into base `main`
1 change: 1 addition & 0 deletions .gitattributes
@@ -0,0 +1 @@
assets/datasets/* filter=lfs diff=lfs merge=lfs -text
49 changes: 49 additions & 0 deletions .github/workflows/pr_file_check.yaml
@@ -0,0 +1,49 @@
name: Check for Large Files and Restricted Extensions

on:
  pull_request:
    branches:
      - main
    types: [opened, synchronize, reopened]

concurrency:
  group: ${{ github.workflow }}-${{ github.ref }}
  cancel-in-progress: true

env:
  LLVM_VERSION: 16

jobs:
  check-files:
    name: Check file size and type
    runs-on: ubuntu-latest
    steps:
      - name: Checkout repository
        uses: actions/checkout@v4
        with:
          set-safe-directory: true
          fetch-depth: 1

      - name: Fetch base branch
        run: git fetch origin ${{ github.event.pull_request.base.ref }} --depth=1

      - name: Check for large files
        run: |
          MAX_SIZE=5M  # Set max file size limit
Review comment (collaborator, PR author): I set this to 5M for now. Happy to change it to a more sensible value.

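          # Compare sizes in bytes; `du -h` output cannot be compared numerically.
          MAX_BYTES=$(numfmt --from=iec "$MAX_SIZE")
          LARGE_FILES=""
          while IFS= read -r file; do
            if [[ -f "$file" ]] && (( $(stat -c %s "$file") > MAX_BYTES )); then
              LARGE_FILES+="$file"$'\n'
            fi
          done < <(git diff --name-only --diff-filter=A origin/${{ github.event.pull_request.base.ref }})

          if [[ -n "$LARGE_FILES" ]]; then
            echo "❌ The following files exceed the allowed size of $MAX_SIZE:"
            echo "$LARGE_FILES"
            exit 1
          fi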

      - name: Check for restricted file types
        run: |
          BLOCKED_EXTENSIONS="exe|zip|tar\.gz|bz2"  # Add any forbidden extensions
          BAD_FILES=$(git diff --name-only --diff-filter=A origin/${{ github.event.pull_request.base.ref }} | grep -E "\.($BLOCKED_EXTENSIONS)$" || true)
          if [[ -n "$BAD_FILES" ]]; then
            echo "❌ The following files have restricted extensions:"
            echo "$BAD_FILES"
            exit 1
          fi
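
To try both checks locally before opening a pull request, something like the sketch below could be used. It is not part of this PR: the function names `oversized_files` and `blocked_files` are hypothetical, the size limit is passed in bytes to avoid depending on GNU-only tools, and the extension list is copied from the workflow above.

```sh
# Hypothetical local helpers mirroring the two CI checks (names are illustrative).

# Print each path read from stdin whose size in bytes exceeds the limit in $1.
oversized_files() (
  max_bytes=$1
  while IFS= read -r file; do
    [ -f "$file" ] || continue               # skip missing or non-regular paths
    size=$(( $(wc -c < "$file") ))           # byte count, whitespace normalized
    if [ "$size" -gt "$max_bytes" ]; then
      printf '%s\n' "$file"
    fi
  done
)

# Print each path read from stdin that ends in a blocked extension.
blocked_files() {
  grep -E '\.(exe|zip|tar\.gz|bz2)$' || true  # no match is not an error
}
```

A possible invocation from the repository root, assuming the base branch is `origin/main`: `git diff --name-only --diff-filter=A origin/main | oversized_files $((5 * 1024 * 1024))`. Both functions print nothing when every file passes.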
67 changes: 67 additions & 0 deletions assets/README.md
@@ -0,0 +1,67 @@
# Git LFS Setup and Usage Guide

## Installation

Git LFS must be installed before using it in a repository. Follow the installation steps based on your operating system.

### Ubuntu (Debian-based distributions)
```sh
sudo apt update
sudo apt install git-lfs
git lfs install
```

### AlmaLinux and ManyLinux
```sh
sudo dnf install git-lfs
git lfs install
```

### macOS
```sh
brew install git-lfs
git lfs install
```

## Tracking and committing large files

1. Initialize Git LFS in your repository:

`git lfs install`

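2. Track specific file types, directories, or individual files:

`git lfs track "assets/*"`, where `assets` is a directory containing large files. The `*` matches only files directly inside `assets`; use `"assets/**"` to include files in subdirectories as well.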

3. Commit the changes to `.gitattributes`:

`git add .gitattributes && git commit -m "Track large files with Git LFS"`

4. Add and commit the large files:

`git add assets/largefile.zip && git commit -m "Add large file"`

5. Push to remote:

`git push origin branch_name`

## Cloning and fetching large files

1. Clone a repository that uses Git LFS:

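`git clone https://github.com/username/repository.git`. With Git LFS installed, cloning downloads the large files for the checked-out branch automatically. If the repository was cloned without Git LFS installed, or with `GIT_LFS_SKIP_SMUDGE=1` set, the working tree contains only small pointer files in place of the large files. To fetch the actual large files, use `git lfs pull`.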

2. Fetch large files for an existing repository:

`git lfs pull`
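
A pointer file is a small text file whose first line is `version https://git-lfs.github.com/spec/v1`. To see whether a checkout still holds pointers instead of real content, a check along the following lines may help. This is a minimal sketch; the function name `is_lfs_pointer` is hypothetical and not something Git LFS provides.

```sh
# Succeed if the given file is a Git LFS pointer rather than real content.
is_lfs_pointer() {
  [ -f "$1" ] || return 1
  # Read only the first bytes so large binaries are not scanned in full;
  # NUL bytes are dropped so binary data does not confuse the comparison.
  first_line=$(head -c 200 "$1" | LC_ALL=C tr -d '\000' | head -n 1)
  [ "$first_line" = "version https://git-lfs.github.com/spec/v1" ]
}
```

For example, `for f in assets/datasets/*; do is_lfs_pointer "$f" && echo "$f"; done` would list the files that `git lfs pull` still needs to download.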

## Check Git LFS status

To check which files are tracked by Git LFS:

`git lfs ls-files`
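
`git lfs ls-files` lists files that are already stored as LFS objects. To check whether a path would be routed through LFS by the rules in `.gitattributes`, even before the file is added, plain `git check-attr` can be queried. A minimal sketch follows; the function name `is_lfs_tracked` is hypothetical.

```sh
# Succeed if .gitattributes assigns the lfs filter to the given path.
# Run inside a Git work tree; the path does not need to exist yet.
is_lfs_tracked() {
  # git check-attr prints "<path>: filter: <value>"; keep only the value.
  value=$(git check-attr filter -- "$1" | sed 's/.*: filter: //')
  [ "$value" = "lfs" ]
}
```

With the `.gitattributes` entry from this PR, `is_lfs_tracked assets/datasets/train.csv` should succeed, while `is_lfs_tracked assets/README.md` should fail (`train.csv` is only an example file name).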

## Removing a file from LFS

Use the following steps to remove a file from LFS:

`git rm --cached assets/largefile.zip`, then commit and push.

Once the file is removed, delete its tracking pattern from `.gitattributes` if no other files rely on it, either by editing the file or by running `git lfs untrack` with that pattern.