[Spike] for addressing HTML in DB #236

Open
nickvisut opened this issue Sep 26, 2024 · 5 comments

@nickvisut
Collaborator

nickvisut commented Sep 26, 2024

A recent change to the seed file includes values that contain HTML (including class names) mixed with plain text. If we store data like this, especially if it becomes editable down the road (e.g. via a CMS), it could increase our attack surface.

Need to look into 1) best practices and 2) sanitizing the data or storing it in a different way.

See issue #222 for referenced code.


Original comment below:

@BeeSeeWhy @mattgianni @thomhickey might make sense to get this merged in despite my question above. Any recommendations on how to tackle HTML in our data, though? Is this fine?

Originally posted by @nickvisut in #222 (comment)

@mattgianni
Collaborator

I did a little digging last night on this topic, but couldn't find anything that seemed like an authority on it.

The general consensus seems to be that storing HTML in a database isn't really a serious security issue in itself -- the trouble comes when you render HTML content. It doesn't matter whether it is stored in a DB, a filesystem or memory ... if it comes from an untrusted source, it's dangerous. There are issues that are specific to DBs (like SQL injection), but these issues are independent of HTML/JS.
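
To make the "trouble comes at render time" point concrete, here is a minimal sketch of escaping a stored value on output so the browser treats it as text rather than markup. The helper name `escapeHtml` is hypothetical and not taken from the project's code; it only illustrates the general technique.

```typescript
// Hypothetical helper (not from the project): map each HTML-significant
// character to its entity so the value is displayed, not interpreted.
const HTML_ESCAPES: Record<string, string> = {
  "&": "&amp;",
  "<": "&lt;",
  ">": "&gt;",
  '"': "&quot;",
  "'": "&#39;",
};

function escapeHtml(untrusted: string): string {
  // Single pass over the string, so an "&" produced by one replacement
  // is never escaped a second time.
  return untrusted.replace(/[&<>"']/g, (ch) => HTML_ESCAPES[ch] ?? ch);
}
```

The stored value is the same either way; what changes the risk is whether it is escaped like this or inserted into the page as raw markup.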

Generally speaking, validating/sanitizing HTML from untrusted sources doesn't seem practical. Web browsers are just too powerful and ever-changing. I've read that some people are using Markdown instead to reduce the risk ... but that seems like a big mistake to me. (It might be even easier to take advantage of bugs in open-source Markdown libraries ...).

If we are going to render HTML or JS on the site, whether we store it in GitHub or in Postgres at Vercel, it seems like we need to trust the authors.

@nickvisut
Collaborator Author

nickvisut commented Sep 26, 2024 via email

@nickvisut
Collaborator Author

nickvisut commented Sep 26, 2024 via email

@nickvisut
Collaborator Author

nickvisut commented Sep 26, 2024 via email

@mattgianni
Collaborator

I think it comes down to the use case. If the HTML/JS is coming from our team, I wouldn't be worried about it. Storing the HTML in a DB vs FS seems pretty similar.

If down the road we allow anonymous website users to post comments, etc., that use case would make me MUCH more nervous about user-submitted HTML of course.
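
One way to picture the difference between the two use cases is to record where each piece of content came from and decide at render time whether to pass it through or escape it. This is only a sketch under assumptions: the `StoredContent` shape, the `source` values, and `toRenderableHtml` are all hypothetical names, not part of the existing schema.

```typescript
// Hypothetical types: where the content came from is stored with the content.
type ContentSource = "team" | "anonymous";

interface StoredContent {
  body: string;
  source: ContentSource;
}

function toRenderableHtml(content: StoredContent): string {
  // Content authored by the team is trusted and rendered as HTML.
  if (content.source === "team") {
    return content.body;
  }
  // Anything else is shown as plain text. "&" is escaped first so the
  // entities produced by the later replacements are not double-escaped.
  return content.body
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&#39;");
}
```

With this shape, seed data written by the team would render unchanged, while a user-submitted comment containing markup would show up as literal text.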

(One crazy thought occurred to me though, and I'm not seriously suggesting it -- it seems like it would be possible to get one of these LLMs to review user-submitted HTML/JS for potential security problems during validation. I wonder how reliable something like that could be.)
