Improving Self-Hosting and Removing 3rd Party dependencies. #4465

Open · wants to merge 40 commits into base: main

Changes from 36 commits (40 commits total)
60303ff
Self-Hosting Changes
Podginator Oct 30, 2024
bb7b3c9
Fix Minio Environment Variable
Podginator Oct 30, 2024
593bac0
Just make pdfs successful, due to lack of PDFHandler
Podginator Oct 30, 2024
d4710a8
Fix issue where flag was set wrong
Podginator Oct 30, 2024
26c5ef3
Added an NGINX Example file
Podginator Oct 31, 2024
4607032
Add some documentation for self-hosting via Docker Compose
Podginator Oct 31, 2024
ae66e2e
Make some adjustments to Puppeteer due to failing sites.
Podginator Oct 31, 2024
b350fbd
adjust timings
Podginator Oct 31, 2024
322ec68
Add start of Mail Service
Podginator Nov 1, 2024
6f1ee6b
Fix Docker Files
Podginator Nov 1, 2024
222ba06
More email service stuff
Podginator Nov 2, 2024
34e039e
Add Guide to use Zapier for Email-Importing.
Podginator Nov 2, 2024
8b845b5
Ensure that if no env is provided it uses the old email settings
Podginator Nov 2, 2024
e557fd0
Add some instructions for self-hosted email
Podginator Nov 3, 2024
b8226db
Add SNS Endpoints for Mail Watcher
Podginator Nov 3, 2024
af70b25
Add steps and functionality for using SES and SNS for email
Podginator Nov 3, 2024
2e3134c
Uncomment a few jobs.
Podginator Nov 3, 2024
ab51fc9
Added option for Firefox for parser. Was having issues with Chromium …
Podginator Nov 4, 2024
0e6c675
Add missing space.
Podginator Nov 5, 2024
6b7f170
Fix some wording on the Guide
Podginator Nov 6, 2024
9d41cc5
Fix Package
Podginator Nov 11, 2024
a66f92b
Fix MV
Podginator Nov 13, 2024
c27af01
Do raw handlers for Medium
Podginator Nov 22, 2024
7bebb45
Fix images in Medium
Podginator Nov 22, 2024
7bdf222
Update self-hosting/GUIDE.md
Podginator Nov 25, 2024
d42656b
Update Guide with other variables
Podginator Nov 27, 2024
685f542
Merge
Podginator Nov 27, 2024
be7102b
Add The Verge to JS-less handlers
Podginator Nov 28, 2024
55ba7b0
Update regex and image-proxy
Podginator Nov 28, 2024
e729225
Update self-hosting/nginx/nginx.conf
Podginator Nov 28, 2024
7fd4095
Update regex and image-proxy
Podginator Nov 28, 2024
a0f6f14
Update regex and image-proxy
Podginator Nov 28, 2024
99ed2bb
Update self-hosting/docker-compose/docker-compose.yml
Podginator Nov 28, 2024
e423885
Fix Minio for Export
Podginator Nov 29, 2024
ad6a997
Revert yarn lock removal
Podginator Nov 29, 2024
da6ab7a
Merge to main
Podginator Nov 29, 2024
f16085f
Update GUIDE with newer NGINX
Podginator Dec 2, 2024
efe7e61
Update nginx config to include api/save route
Podginator Dec 3, 2024
ea69eb6
Enable Native PDF View for PDFS
Podginator Dec 8, 2024
eab1c2a
Enable Native PDF View for PDFS
Podginator Dec 8, 2024
19 changes: 1 addition & 18 deletions README.md
@@ -151,24 +151,7 @@ is done fetching your content you will see it in your library.

## How to deploy to your own server

Omnivore was originally designed to be deployed on GCP and takes advantage
of some of GCP's PaaS features. We are working to make Omnivore more portable
so you can easily run the service on your own infrastructure. You can track
progress here: <https://github.com/omnivore-app/omnivore/issues/25>

To deploy Omnivore on your own hardware you will need to deploy three
dockerized services and configure access to a postgres service. To handle
PDF documents you will need to configure access to a Google Cloud Storage
bucket.

- `packages/api` - the backend API service
- `packages/web` - the web frontend (can easily be deployed to vercel)
- `packages/puppeteer-parse` - the content fetching service (can easily
be deployed as an AWS lambda or GCP Cloud Function)

Additionally, you will need to run our database migrations to initialize
your database. These are dockerized and can be run with the
`packages/db` service.
A guide for running a self hosted server can be found [here](./self-hosting/GUIDE.md)

## License

Binary file added docs/guides/images/cloudflare-tunnel.png
Binary file added docs/guides/images/create-new-email.png
Binary file added docs/guides/images/imported-email.png
Binary file added docs/guides/images/received-email.png
Binary file added docs/guides/images/ses-add-domain.png
Binary file added docs/guides/images/ses-verify.png
Binary file added docs/guides/images/sns-add-action-publish.png
Binary file added docs/guides/images/sns-create-identity.png
Binary file added docs/guides/images/sns-create-ruleset.png
Binary file added docs/guides/images/sns-create-subscription.png
Binary file added docs/guides/images/sns-create-topic.png
Binary file added docs/guides/images/sns-publish-menu.png
Binary file added docs/guides/images/sns-topic-menu.png
Binary file added docs/guides/images/testing-incoming-email.png
Binary file added docs/guides/images/zapier-email-webhook.png
Binary file added docs/guides/images/zapier-javascript-step.png
Binary file added docs/guides/images/zapier-webhook-step.png
2 changes: 1 addition & 1 deletion imageproxy/Dockerfile
@@ -1,4 +1,4 @@
FROM willnorris/imageproxy:v0.10.0 as build
FROM ghcr.io/willnorris/imageproxy:main as build

# Above imageproxy image is built from scratch image and is barebones
# Switching over to ubuntu base image to allow us to debug better.
5 changes: 4 additions & 1 deletion packages/api/package.json
@@ -120,7 +120,10 @@
"voca": "^1.4.0",
"winston": "^3.3.3",
"yaml": "^2.4.1",
"youtubei": "^1.5.4"
"youtubei": "^1.5.4",
"@aws-sdk/client-s3": "^3.679.0",
"@aws-sdk/s3-request-presigner": "^3.679.0",
"@aws-sdk/lib-storage": "^3.679.0"
},
"devDependencies": {
"@istanbuljs/nyc-config-typescript": "^1.0.2",
61 changes: 61 additions & 0 deletions packages/api/queue-processor/Dockerfile
@@ -0,0 +1,61 @@
FROM node:18.16 as builder

WORKDIR /app

ENV PUPPETEER_SKIP_CHROMIUM_DOWNLOAD true
RUN apt-get update && apt-get install -y g++ make python3

COPY package.json .
COPY yarn.lock .
COPY tsconfig.json .
COPY .prettierrc .
COPY .eslintrc .

COPY /packages/readabilityjs/package.json ./packages/readabilityjs/package.json
COPY /packages/api/package.json ./packages/api/package.json
COPY /packages/text-to-speech/package.json ./packages/text-to-speech/package.json
COPY /packages/content-handler/package.json ./packages/content-handler/package.json
COPY /packages/liqe/package.json ./packages/liqe/package.json
COPY /packages/utils/package.json ./packages/utils/package.json

RUN yarn install --pure-lockfile

ADD /packages/readabilityjs ./packages/readabilityjs
ADD /packages/api ./packages/api
ADD /packages/text-to-speech ./packages/text-to-speech
ADD /packages/content-handler ./packages/content-handler
ADD /packages/liqe ./packages/liqe
ADD /packages/utils ./packages/utils

RUN yarn workspace @omnivore/utils build
RUN yarn workspace @omnivore/text-to-speech-handler build
RUN yarn workspace @omnivore/content-handler build
RUN yarn workspace @omnivore/liqe build
RUN yarn workspace @omnivore/api build

# After building, fetch the production dependencies
RUN rm -rf /app/packages/api/node_modules
RUN rm -rf /app/node_modules
RUN yarn install --pure-lockfile --production

FROM node:18.16 as runner
LABEL org.opencontainers.image.source="https://github.com/omnivore-app/omnivore"

RUN apt-get update && apt-get install -y netcat-openbsd

WORKDIR /app

ENV NODE_ENV production

COPY --from=builder /app/packages/api/dist /app/packages/api/dist
COPY --from=builder /app/packages/readabilityjs/ /app/packages/readabilityjs/
COPY --from=builder /app/packages/api/package.json /app/packages/api/package.json
COPY --from=builder /app/packages/api/node_modules /app/packages/api/node_modules
COPY --from=builder /app/node_modules /app/node_modules
COPY --from=builder /app/package.json /app/package.json
COPY --from=builder /app/packages/text-to-speech/ /app/packages/text-to-speech/
COPY --from=builder /app/packages/content-handler/ /app/packages/content-handler/
COPY --from=builder /app/packages/liqe/ /app/packages/liqe/
COPY --from=builder /app/packages/utils/ /app/packages/utils/

CMD ["yarn", "workspace", "@omnivore/api", "start_queue_processor"]
66 changes: 37 additions & 29 deletions packages/api/src/jobs/export.ts
@@ -1,10 +1,6 @@
import archiver, { Archiver } from 'archiver'
import { v4 as uuidv4 } from 'uuid'
import {
ContentReaderType,
LibraryItem,
LibraryItemState,
} from '../entity/library_item'
import { LibraryItem, LibraryItemState } from '../entity/library_item'
import { TaskState } from '../generated/graphql'
import { findExportById, saveExport } from '../services/export'
import { findHighlightsByLibraryItemId } from '../services/highlights'
@@ -17,12 +13,11 @@ import { sendExportJobEmail } from '../services/send_emails'
import { findActiveUser } from '../services/user'
import { logger } from '../utils/logger'
import { highlightToMarkdown } from '../utils/parser'
import {
contentFilePath,
createGCSFile,
generateUploadFilePathName,
} from '../utils/uploads'
import { batch } from 'googleapis/build/src/apis/batch'
import { env } from '../env'
import { storage } from '../repository/storage/storage'
import { File } from '../repository/storage/StorageClient'
import { Readable } from 'stream'
import { contentFilePath, generateUploadFilePathName } from '../utils/uploads'
import { getRepository } from '../repository'
import { UploadFile } from '../entity/upload_file'

@@ -31,6 +26,12 @@ export interface ExportJobData {
exportId: string
}

const bucketName = env.fileUpload.gcsUploadBucket

const createGCSFile = (filename: string): File => {
return storage.createFile(bucketName, filename)
}

export const EXPORT_JOB_NAME = 'export'

const itemStateMappping = (state: LibraryItemState) => {
@@ -61,7 +62,7 @@ const uploadContent = async (
const file = createGCSFile(filePath)

// check if file is already uploaded
const [exists] = await file.exists()
const exists = await file.exists()
if (!exists) {
logger.info(`File not found: ${filePath}`)

@@ -81,10 +82,14 @@ contentType: 'text/html',
contentType: 'text/html',
private: true,
})
archive.append(Readable.from(item.readableContent), {
name: `content/${libraryItem.slug}.html`,
})
}

// append the existing file to the archive
archive.append(file.createReadStream(), {
const content = await file.download()
archive.append(Readable.from(content.toString()), {
name: `content/${libraryItem.slug}.html`,
})
}
@@ -97,17 +102,19 @@ const uploadPdfContent = async (
id: libraryItem.uploadFileId,
})
if (!upload || !upload.fileName) {
console.log(`upload does not have a filename: ${upload}`)
console.log(
`upload does not have a filename: ${upload?.fileName ?? 'empty'}`
)
return
}

const filePath = generateUploadFilePathName(upload.id, upload.fileName)
const file = createGCSFile(filePath)
const [exists] = await file.exists()
const exists = await file.exists()
if (exists) {
console.log(`adding PDF file: ${filePath}`)
// append the existing file to the archive
archive.append(file.createReadStream(), {
archive.append(await file.download(), {
name: `content/${libraryItem.slug}.pdf`,
})
}
@@ -238,20 +245,25 @@ export const exportJob = async (jobData: ExportJobData) => {

// Create a write stream
const writeStream = file.createWriteStream({
metadata: {
contentType: 'application/zip',
},
contentType: 'application/zip',
})

const finishedPromise = new Promise<void>((resolve, reject) => {
if (writeStream.closed) {
resolve()
}
writeStream.on('finish', () => {
logger.info('File successfully written to GCS')
resolve()
})
writeStream.on('error', reject)
})

// Handle any errors in the streams
writeStream.on('error', (err) => {
logger.error('Error writing to GCS:', err)
})

writeStream.on('finish', () => {
logger.info('File successfully written to GCS')
})

// Initialize archiver for zipping files
const archive = archiver('zip', {
zlib: { level: 9 }, // Compression level
@@ -264,7 +276,6 @@

// Pipe the archiver output to the write stream
archive.pipe(writeStream)

let cursor = 0
try {
// fetch data from the database
@@ -305,17 +316,14 @@
}

// Ensure that the writeStream has finished
await new Promise((resolve, reject) => {
writeStream.on('finish', resolve)
writeStream.on('error', reject)
})
await finishedPromise

logger.info(`export completed, exported ${cursor} items`, {
userId,
})

// generate a temporary signed url for the zip file
const [signedUrl] = await file.getSignedUrl({
const signedUrl = await storage.signedUrl(bucketName, fullPath, {
action: 'read',
expires: Date.now() + 168 * 60 * 60 * 1000, // one week
})
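The export.ts diff above moves the write stream's completion handling into a promise that is created *before* the archive starts piping, so a 'finish' event that fires early is never missed. A minimal standalone sketch of that pattern, using an in-memory sink in place of the storage write stream (the helper name and the sink are illustrative, not part of the PR):

```typescript
import { Writable } from 'stream'

// Hypothetical helper mirroring the pattern in export.ts: build the
// completion promise up front, resolving immediately if the stream is
// already closed, otherwise on 'finish', and rejecting on 'error'.
const streamFinished = (stream: Writable): Promise<void> =>
  new Promise<void>((resolve, reject) => {
    if (stream.closed) {
      resolve()
      return
    }
    stream.on('finish', () => resolve())
    stream.on('error', reject)
  })

// In-memory sink standing in for the zip upload stream.
const chunks: Buffer[] = []
const sink = new Writable({
  write(chunk, _encoding, callback) {
    chunks.push(Buffer.from(chunk))
    callback()
  },
})

// Create the promise before any data flows, then write and close.
const done = streamFinished(sink)
sink.write('zip bytes')
sink.end()

done.then(() => {
  console.log(`wrote ${Buffer.concat(chunks).length} bytes`)
})
```

Attaching the listeners before piping begins is what makes the later `await finishedPromise` in the job safe; awaiting a promise constructed after the stream may already have finished would hang.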
46 changes: 30 additions & 16 deletions packages/api/src/queue-processor.ts
@@ -56,7 +56,10 @@ import {
PROCESS_YOUTUBE_VIDEO_JOB_NAME,
} from './jobs/process-youtube-video'
import { pruneTrashJob, PRUNE_TRASH_JOB } from './jobs/prune_trash'
import { refreshAllFeeds } from './jobs/rss/refreshAllFeeds'
import {
REFRESH_ALL_FEEDS_JOB_NAME,
refreshAllFeeds,
} from './jobs/rss/refreshAllFeeds'
import { refreshFeed } from './jobs/rss/refreshFeed'
import { savePageJob } from './jobs/save_page'
import {
@@ -159,25 +162,25 @@ export const createWorker = (connection: ConnectionOptions) =>
async (job: Job) => {
const executeJob = async (job: Job) => {
switch (job.name) {
// case 'refresh-all-feeds': {
// const queue = await getQueue()
// const counts = await queue?.getJobCounts('prioritized')
// if (counts && counts.wait > 1000) {
// return
// }
// return await refreshAllFeeds(appDataSource)
// }
// case 'refresh-feed': {
// return await refreshFeed(job.data)
// }
case 'refresh-all-feeds': {
const queue = await getQueue()
const counts = await queue?.getJobCounts('prioritized')
if (counts && counts.wait > 1000) {
return
}
return await refreshAllFeeds(appDataSource)
}
case 'refresh-feed': {
return await refreshFeed(job.data)
}
case 'save-page': {
return savePageJob(job.data, job.attemptsMade)
}
// case 'update-pdf-content': {
// return updatePDFContentJob(job.data)
// }
// case THUMBNAIL_JOB:
// return findThumbnail(job.data)
case THUMBNAIL_JOB:
return findThumbnail(job.data)
case TRIGGER_RULE_JOB_NAME:
return triggerRule(job.data)
case UPDATE_LABELS_JOB:
@@ -218,8 +221,8 @@ export const createWorker = (connection: ConnectionOptions) =>
// return updateHome(job.data)
// case SCORE_LIBRARY_ITEM_JOB:
// return scoreLibraryItem(job.data)
// case GENERATE_PREVIEW_CONTENT_JOB:
// return generatePreviewContent(job.data)
case GENERATE_PREVIEW_CONTENT_JOB:
return generatePreviewContent(job.data)
case PRUNE_TRASH_JOB:
return pruneTrashJob(job.data)
case EXPIRE_FOLDERS_JOB_NAME:
@@ -260,6 +263,17 @@ const setupCronJobs = async () => {
},
}
)

await queue.add(
REFRESH_ALL_FEEDS_JOB_NAME,
{},
{
priority: getJobPriority(REFRESH_ALL_FEEDS_JOB_NAME),
repeat: {
every: 14_400_000, // 4 Hours
},
}
)
}

const main = async () => {
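The re-enabled 'refresh-all-feeds' branch in the queue-processor diff guards against a backlog: when more than 1,000 jobs are already waiting, the refresh cycle is skipped rather than piled on top. A standalone sketch of that guard as a pure function (the `JobCounts` shape and the 1,000 threshold mirror the diff; the function name is illustrative):

```typescript
// Illustrative shape of the object returned by the queue's
// getJobCounts(...) call in the diff; only the waiting count matters here.
interface JobCounts {
  wait: number
}

// Mirrors the guard in the 'refresh-all-feeds' case: proceed when counts
// are unavailable, skip the cycle when the backlog is too deep.
const shouldRefreshAllFeeds = (
  counts: JobCounts | undefined,
  maxWaiting = 1000
): boolean => {
  if (counts && counts.wait > maxWaiting) {
    return false // backlog too deep, skip this cycle
  }
  return true
}

console.log(shouldRefreshAllFeeds({ wait: 50 })) // small backlog: refresh
console.log(shouldRefreshAllFeeds({ wait: 2000 })) // deep backlog: skip
console.log(shouldRefreshAllFeeds(undefined)) // no counts: refresh anyway
```

Paired with the cron registration added in `setupCronJobs` (`every: 14_400_000` ms, i.e. every 4 hours), this keeps a slow worker from accumulating an unbounded queue of feed-refresh jobs.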