Merged
Changes from 8 commits
200 changes: 111 additions & 89 deletions README.md
@@ -23,121 +23,142 @@ Sourcebot is a fast code indexing and search tool for your codebases. It is buil

![Demo video](https://github.com/user-attachments/assets/227176d8-fc61-42a9-8746-3cbc831f09e4)

## Features
- 💻 **One-command deployment**: Get started instantly using Docker on your own machine.
- 🔍 **Multi-repo search**: Effortlessly index and search through multiple public and private repositories (GitHub, GitLab, BitBucket).
- ⚡ **Lightning fast performance**: Built on top of the powerful [Zoekt](https://github.com/sourcegraph/zoekt) search engine.
- 📂 **Full file visualization**: Instantly view the entire file when selecting any search result.
- 🎨 **Modern web application**: Enjoy a sleek interface with features like syntax highlighting, light/dark mode, and vim-style navigation.

# Getting Started

## Using Docker
You can try out our public hosted demo [here](https://demo.sourcebot.dev/)!

#### Running Sourcebot example locally

0. Install <a href="https://docs.docker.com/get-started/get-docker/"><img src="https://www.docker.com/favicon.ico" width="16" height="16"> Docker </a>
Before getting started, please make sure you install <a href="https://docs.docker.com/get-started/get-docker/"><img src="https://www.docker.com/favicon.ico" width="16" height="16"> Docker </a>. This is the only dependency required to get started using Sourcebot.

1. Create a `config.json` file and list the repositories you want to index. The JSON schema [index.json](./schemas/index.json) defines the structure of the config file and the available options. For example, to have Sourcebot index its own code, we could use the following config found in `sample-config.json`:
You can try out Sourcebot on your local machine without cloning the repo by running this simple command:

```json
{
"$schema": "https://raw.githubusercontent.com/TaqlaAI/sourcebot/main/schemas/index.json",
"Configs": [
{
"Type": "github",
"GitHubOrg": "TaqlaAI",
"Name": "^sourcebot$"
}
]
}
```
```
docker run -p 3000:3000 --rm --name sourcebot -e CONFIG_PATH=sample-config.json ghcr.io/taqlaai/sourcebot:main
```

Sourcebot also supports indexing GitLab & BitBucket. Check out the [index.json](./schemas/index.json) schema for a full list of available options.
Navigate to `localhost:3000` in your web browser to see Sourcebot running on your own machine! This example indexes the Sourcebot repository and lets you search through it. To run Sourcebot on a different repository, see [how to use Sourcebot on a custom repository](#using-sourcebot-on-a-custom-repository).

2. Create a Personal Access Token (PAT) to authenticate with your code host(s):
<details>
<summary>What does this command do?</summary>

<div>
<details open>
<summary>
<picture>
<source media="(prefers-color-scheme: dark)" srcset=".github/images/github-favicon-inverted.png">
<img src="https://github.com/favicon.ico" width="16" height="16" alt="GitHub icon">
</picture>
GitHub
</summary>
Generate a GitHub Personal Access Token (PAT) [here](https://github.com/settings/tokens/new). If you're only indexing public repositories select the `public_repo` scope; otherwise, select the `repo` scope.
- Pull and run the Sourcebot Docker image `ghcr.io/taqlaai/sourcebot:main` from the [ghcr registry](https://github.com/TaqlaAI/sourcebot/pkgs/container/sourcebot).
- Set the `CONFIG_PATH` environment variable in the container to `sample-config.json`. Sourcebot loads the config file located at `CONFIG_PATH` to determine which repositories to index. To make Sourcebot easier to try, we've baked an [example](https://github.com/TaqlaAI/sourcebot/blob/main/sample-config.json) config file named `sample-config.json` into the published Docker image.
- Map port 3000 between your machine and the Docker container (`-p 3000:3000`). This is what allows you to reach Sourcebot by navigating to `localhost:3000`.
</details>

You'll need to pass this PAT each time you run Sourcebot, so we recommend adding it as an environment variable. In this guide, we'll add the GitHub PAT as an environment variable called `GITHUB_TOKEN`:
```sh
export GITHUB_TOKEN=<your-token-here>
```
## Using Sourcebot on a custom repository

If you'd like to persist this environment variable across shell sessions, add this line to your shell config file (e.g. `~/.bashrc` or `~/.bash_profile`).

Sourcebot supports indexing and searching through public and private repositories hosted on <img src="https://github.com/favicon.ico" width="16" height="16" /> GitHub, <img src="https://gitlab.com/favicon.ico" width="16" height="16" /> GitLab, and <img src="https://bitbucket.org/favicon.ico" width="16" height="16" /> BitBucket. This section will guide you through configuring the repositories that Sourcebot indexes.

</details>
### Create a Sourcebot workspace
The Sourcebot workspace is a directory on your machine that stores your Sourcebot config and cache data. To create a Sourcebot workspace, simply create a new directory on your machine (e.g. `sourcebot_workspace`).
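
As a minimal sketch of this step, assuming a POSIX shell and the example directory name `sourcebot_workspace` (any name works), the workspace could be created like this:

```sh
# Create the workspace directory (the name is only an example) and move into it.
# -p makes the command safe to re-run if the directory already exists.
mkdir -p sourcebot_workspace
cd sourcebot_workspace
```

Moving into the directory is optional, but it lets later commands refer to the workspace as `$(pwd)`.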

<details>
<summary><img src="https://gitlab.com/favicon.ico" width="16" height="16" /> GitLab</summary>

TODO
### Create a Sourcebot config
Sourcebot needs a config file to tell it which repositories to index. By default, Sourcebot will look for a file called `config.json` within the mounted workspace.

</details>
Create a new file called `config.json` inside your workspace and paste in the following sample config, which indexes the `sourcebot` GitHub repo:
```json
{
"$schema": "https://raw.githubusercontent.com/TaqlaAI/sourcebot/main/schemas/index.json",
"Configs": [
{
"Type": "github",
"GitHubOrg": "TaqlaAI",
"Name": "sourcebot"
}
]
}
```
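
Before starting the container, it can help to confirm that the file is well-formed JSON. The sketch below is not part of Sourcebot: it assumes `python3` is installed on your machine, writes the sample config shown above, and checks JSON syntax only (it does not validate the file against the schema).

```sh
# Write the sample config from above into config.json in the current directory.
cat > config.json <<'EOF'
{
    "$schema": "https://raw.githubusercontent.com/TaqlaAI/sourcebot/main/schemas/index.json",
    "Configs": [
        {
            "Type": "github",
            "GitHubOrg": "TaqlaAI",
            "Name": "sourcebot"
        }
    ]
}
EOF

# json.tool exits with a non-zero status if the file is not valid JSON.
python3 -m json.tool config.json > /dev/null && echo "config.json is valid JSON"
```

A stray trailing comma or a missing quote makes the check fail with a parse error, which is easier to spot here than in the container logs.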

<details>
<summary><img src="https://bitbucket.org/favicon.ico" width="16" height="16" /> BitBucket</summary>
##### Changing the config file
The config file follows the schema defined [here](https://raw.githubusercontent.com/TaqlaAI/sourcebot/main/schemas/index.json). You can define multiple repos by adding config objects into the `Configs` list:

<pre>
{
"$schema": "https://raw.githubusercontent.com/TaqlaAI/sourcebot/main/schemas/index.json",
"Configs": [
{
"Type": "github",
"GitHubOrg": "TaqlaAI", <b>// GitHub orgs must define the GitHubOrg field</b>
"Name": "sourcebot" <b>// name of the GitHub repository to index (regex match)</b>
},
{
"Type": "github",
"GithubUser": "ssloy", <b>// if indexing a GitHub repo owned by a user, set the GithubUser field</b>
"Name": "tinyrenderer" <b>// name of the GitHub repository to index (regex match)</b>
}
]
}
</pre>

TODO: add examples for GitLab and BitBucket above

### Mount your workspace and run Sourcebot
We can now run Sourcebot, but this time we mount the workspace folder so that it can pick up our custom config:

> [!NOTE]
> This command assumes that you're running it from within your Sourcebot workspace. If you're not, replace `$(pwd)` below with the absolute path of your Sourcebot workspace.

TODO
```
docker run -p 3000:3000 --rm --name sourcebot -v $(pwd):/data ghcr.io/taqlaai/sourcebot:main
```

</details>
</div>
Navigate to `localhost:3000` in your browser to use Sourcebot to search the repositories you've defined in your config!

3. Launch the latest image from the [ghcr registry](https://github.com/TaqlaAI/sourcebot/pkgs/container/sourcebot):
### (Optional) Provide an access token to index private repositories
In order to allow Sourcebot to index your private repositories, you must provide it with an access token.

<div>
<details open>
<summary>
<picture>
<source media="(prefers-color-scheme: dark)" srcset=".github/images/github-favicon-inverted.png">
<img src="https://github.com/favicon.ico" width="16" height="16" alt="GitHub icon">
</picture>
GitHub
</summary>
<div>
<details>
<summary><img src="https://github.com/favicon.ico" width="16" height="16" /> GitHub</summary>

Run the `sourcebot` Docker image, passing in the GitHub PAT you generated in the previous step as an environment variable called `GITHUB_TOKEN`:
```sh
docker run -p 3000:3000 --rm --name sourcebot -v $(pwd):/data -e GITHUB_TOKEN=$GITHUB_TOKEN ghcr.io/taqlaai/sourcebot:main
```
</details>
Generate a GitHub Personal Access Token (PAT) [here](https://github.com/settings/tokens/new) and make sure you select the `repo` scope.

<details>
<summary><img src="https://gitlab.com/favicon.ico" width="16" height="16" /> GitLab</summary>
You'll need to pass this PAT each time you run Sourcebot by setting the `GITHUB_TOKEN` environment variable:

```sh
docker run -p 3000:3000 --rm --name sourcebot -v $(pwd):/data -e GITLAB_TOKEN=<token> ghcr.io/taqlaai/sourcebot:main
```
<pre>
docker run -p 3000:3000 --rm --name sourcebot -e <b>GITHUB_TOKEN=[your-github-token]</b> -v $(pwd):/data ghcr.io/taqlaai/sourcebot:main
</pre>

</details>
</details>

<details>
<summary><img src="https://bitbucket.org/favicon.ico" width="16" height="16" /> BitBucket</summary>
<details>
<summary><img src="https://gitlab.com/favicon.ico" width="16" height="16" /> GitLab</summary>

TODO
TODO

</details>
</div>
</details>

Two things should happen: (1) a `.sourcebot` directory will be created containing the mirror repositories and indexes, and (2) you will see output similar to:
<details>
<summary><img src="https://bitbucket.org/favicon.ico" width="16" height="16" /> BitBucket</summary>

```sh
INFO spawned: 'node-server' with pid 10
INFO spawned: 'zoekt-indexserver' with pid 11
INFO spawned: 'zoekt-webserver' with pid 12
run [zoekt-mirror-github -dest /data/.sourcebot/repos -delete -org <org>]
...
INFO success: node-server entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
INFO success: zoekt-indexserver entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
INFO success: zoekt-webserver entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
```
TODO

zoekt will now index your repositories (at `HEAD`). By default, it will re-index existing repositories every hour, and discover new repositories every 24 hours.
</details>
</div>

### (Optional) Specifying a config file

By default, Sourcebot looks for a file called `config.json` in the directory that you [mount to `/data`](#mount-your-workspace-and-run-sourcebot). If you have multiple config files and would like to specify which one to use, set the `CONFIG_PATH` environment variable when running the Docker image:

<pre>
docker run -p 3000:3000 --rm --name sourcebot -e <b>CONFIG_PATH=path/to/your/config_file</b> -v $(pwd):/data ghcr.io/taqlaai/sourcebot:main
</pre>

4. Go to `http://localhost:3000` - once an index has been created, you can start searching.

## Building Sourcebot
>[!NOTE]
>You don't need to build Sourcebot in order to use it! If you'd just like to use Sourcebot, please read [how to use Sourcebot with custom repositories](#using-sourcebot-on-a-custom-repository).

If you'd like to make changes to Sourcebot you'll need to build from source:

1. Install <a href="https://go.dev/doc/install"><img src="https://go.dev/favicon.ico" width="16" height="16"> Go</a> and <a href="https://nodejs.org/"><img src="https://nodejs.org/favicon.ico" width="16" height="16"> Node.js</a>. Note that Node.js version `21.1.0` or later is required.

@@ -167,7 +188,7 @@ The zoekt binaries and web dependencies are placed into `bin` and `node_modules`
{
"Type": "github",
"GitHubOrg": "TaqlaAI",
"Name": "^sourcebot$"
"Name": "sourcebot"
}
]
}
@@ -216,14 +237,15 @@ The zoekt binaries and web dependencies are placed into `bin` and `node_modules`

## Telemetry

By default, Sourcebot collects anonymized usage data through [PostHog](https://posthog.com/) to help us improve the performance and reliability of our tool. We do not collect or transmit [any information related to your codebase](https://github.com/search?q=repo:TaqlaAI/sourcebot++captureEvent&type=code). All events are [sanitized](https://github.com/TaqlaAI/sourcebot/blob/main/src/app/posthogProvider.tsx) to ensure that no sensitive or identifying details leave your machine. The data we collect includes general usage statistics and metadata such as query performance (e.g., search duration, error rates) to monitor the application's health and functionality. This information helps us better understand how Sourcebot is used and where improvements can be made :)
By default, Sourcebot collects anonymized usage data through [PostHog](https://posthog.com/) to help us improve the performance and reliability of our tool. We do not collect or transmit [any information related to your codebase](https://github.com/search?q=repo:TaqlaAI/sourcebot++captureEvent&type=code). In addition, all events are [sanitized](https://github.com/TaqlaAI/sourcebot/blob/main/src/app/posthogProvider.tsx) to ensure that no sensitive or identifying details leave your machine. The data we collect includes general usage statistics and metadata such as query performance (e.g., search duration, error rates) to monitor the application's health and functionality. This information helps us better understand how Sourcebot is used and where improvements can be made :)

If you'd like to disable all telemetry, you can do so by setting the environment variable `SOURCEBOT_TELEMETRY_DISABLED` to `1` in the docker run command:
```sh
docker run -e SOURCEBOT_TELEMETRY_DISABLED=1 /* additional args */ ghcr.io/taqlaai/sourcebot:main
```

Or if you are building locally, add the following to your [.env](./.env) file:
<pre>
docker run -e <b>SOURCEBOT_TELEMETRY_DISABLED=1</b> /* additional args */ ghcr.io/taqlaai/sourcebot:main
</pre>

Or if you are [building locally](#building-sourcebot), add the following to your [.env](./.env) file:
```sh
SOURCEBOT_TELEMETRY_DISABLED=1
NEXT_PUBLIC_SOURCEBOT_TELEMETRY_DISABLED=1