[Spike] for addressing HTML in DB #236

nickvisut · 2024-09-26T01:57:28Z

A recent change to the seed file includes values that mix HTML (including class names) with plain text. If we store data like this, especially if it becomes editable (e.g. via a CMS) down the road, it could increase our attack surface.

Need to look into 1) best practice and 2) sanitizing or storing in a different way.

See issue #222 for referenced code.

Original comment below:

          @BeeSeeWhy @mattgianni @thomhickey might make sense to get this merged in despite my question above. Any recos on how to tackle HTML in our data, though? Is this fine?

Originally posted by @nickvisut in #222 (comment)

mattgianni · 2024-09-26T19:08:05Z

I did a little digging last night on this topic, but couldn't find anything that seemed like an authority on it.

The general consensus seems to be that storing HTML in a database isn't really a serious security issue in itself -- the trouble comes when you render HTML content. It doesn't matter whether it is stored in a DB, a filesystem or memory ... if it comes from an untrusted source, it's dangerous. There are issues that are specific to DBs (like SQL-injection), but these issues are independent of HTML/JS.
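To make the "storage is fine, rendering is the risk" point concrete, here is a minimal sketch. `escapeHtml` is a hypothetical helper written for this thread, not something already in the codebase, and the stored value is made up. The same string is inert when escaped at render time and only becomes a problem when it is handed to the browser as markup.

```typescript
// Hypothetical helper (not from the codebase): escape the five characters
// that are significant in HTML so the browser shows the value as literal text.
const HTML_ESCAPES: Record<string, string> = {
  "&": "&amp;",
  "<": "&lt;",
  ">": "&gt;",
  '"': "&quot;",
  "'": "&#39;",
};

function escapeHtml(untrusted: string): string {
  return untrusted.replace(/[&<>"']/g, (ch) => HTML_ESCAPES[ch] ?? ch);
}

// Made-up example of a value that could sit in a DB row, a file, or memory.
const stored = '<img src=x onerror="alert(1)">';

// Rendered as text: the markup is neutralized, wherever it was stored.
const renderedAsText = escapeHtml(stored);

// Rendered as markup: the browser would parse it, so this is only
// acceptable when the author of `stored` is trusted.
const renderedAsMarkup = stored;
```

Nothing about the storage layer changes between the two outcomes; only the rendering decision does.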

Generally speaking, validating/sanitizing HTML from untrusted sources doesn't seem practical. Web browsers are just too powerful and ever-changing. I've read some ppl are using Markdown instead to reduce the risk ... but that seems like a big mistake to me. (It might be even easier to take advantage of bugs in open-source markdown libraries ...).

If we are going to render HTML or JS on the site, whether we store it in GitHub or in Postgres at Vercel, it seems like we need to trust the authors.

nickvisut · 2024-09-26T21:10:38Z

Good stuff, thanks for looking into it! How about forcing a subset of HTML (eg via a DSL like Markdown)?
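A rough sketch of what "forcing a subset" could look like, assuming a made-up mini-syntax (blank lines for paragraphs, `**bold**`, `*italic*`) rather than real Markdown. `renderRestricted` and `escapeHtml` are hypothetical names. The idea is to escape everything first, then emit only a fixed set of tags, so the stored value can never introduce a tag or attribute of its own.

```typescript
// Hypothetical helper: neutralize every HTML-significant character.
function escapeHtml(text: string): string {
  return text
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&#39;");
}

// Hypothetical mini-DSL renderer. The only tags that can ever appear in
// the output are <p>, <strong>, and <em>, because they are written by this
// function and not taken from the input.
function renderRestricted(source: string): string {
  return source
    .split(/\n\s*\n/) // blank line = paragraph break
    .map((paragraph) => paragraph.trim())
    .filter((paragraph) => paragraph.length > 0)
    .map((paragraph) => {
      const safe = escapeHtml(paragraph); // escape before adding any tags
      const formatted = safe
        .replace(/\*\*([^*]+)\*\*/g, "<strong>$1</strong>")
        .replace(/\*([^*]+)\*/g, "<em>$1</em>");
      return `<p>${formatted}</p>`;
    })
    .join("\n");
}

const example = renderRestricted("Hello **world**\n\n<script>alert(1)</script>");
```

The trade-off is that class names and arbitrary layout are no longer expressible in the data, so styling would have to live in the templates instead.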

nickvisut · 2024-09-26T21:18:09Z

Ah, I parsed your message a bit too quickly. What are your thoughts on having some protection vs none, however? I would think that, yes, it's an arms race, but that covering the more obvious scenarios (like not outputting JS if it can be helped) would be feasible. As a rough and hyperbolic counterpoint, an analogous position would be that it's impossible to fully secure an OS b/c of zero-days, so effort in that direction would be futile.
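As one possible shape for "some protection", here is a sketch of a check that could run over seed values and flag the most obvious ways of getting JS into the output. `findScriptRisks` and its pattern list are assumptions made up for illustration. It is a heuristic lint, not a sanitizer: it will miss clever payloads and can flag harmless text, so it only makes sense alongside trusted authors.

```typescript
// Hypothetical lint for seed values: each entry names one obvious way for
// stored HTML to run JavaScript. This list is illustrative, not complete.
const SCRIPT_RISK_PATTERNS: ReadonlyArray<{ label: string; pattern: RegExp }> = [
  { label: "script tag", pattern: /<script\b/i },
  { label: "inline event handler", pattern: /\son\w+\s*=/i },
  { label: "javascript: URL", pattern: /javascript:/i },
];

// Returns the labels of every pattern found in the value, in list order.
// An empty array means nothing obvious was found, not that the value is safe.
function findScriptRisks(html: string): string[] {
  return SCRIPT_RISK_PATTERNS
    .filter(({ pattern }) => pattern.test(html))
    .map(({ label }) => label);
}

// Made-up seed values for illustration.
const okValue = findScriptRisks('<p class="intro">Welcome</p>');
const badValue = findScriptRisks('<a href="javascript:alert(1)" onclick="x()">hi</a>');
```

A check like this could fail a seed script or a CI step when the returned list is non-empty, which covers the accidental cases without claiming to stop a determined attacker.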

nickvisut · 2024-09-26T21:20:16Z

(wrt eg SQL injection, we could break that out into a sibling ticket or just rename this one to be more expansive)
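For reference on the SQL injection side, the usual defense is parameterized queries, where values travel separately from the SQL text. Below is a minimal sketch of that idea using a hypothetical `sql` tagged template; the `programs` table and `name` column are made up. It only builds a `{ text, values }` pair with `$1`-style Postgres placeholders and does not talk to a database.

```typescript
// Shape of a parameterized query: SQL text with $1, $2, ... placeholders
// plus the values to bind to them.
interface ParameterizedQuery {
  text: string;
  values: unknown[];
}

// Hypothetical tagged template: interpolated values are never concatenated
// into the SQL text; each one is replaced by a numbered placeholder.
function sql(strings: TemplateStringsArray, ...values: unknown[]): ParameterizedQuery {
  let text = strings[0] ?? "";
  values.forEach((_, i) => {
    text += `$${i + 1}${strings[i + 1] ?? ""}`;
  });
  return { text, values };
}

// A classic injection attempt stays a plain value and never becomes SQL.
const userInput = "'; DROP TABLE programs; --";
const query = sql`SELECT * FROM programs WHERE name = ${userInput}`;
```

Whatever Postgres client the project ends up using, the point is the same: the driver receives the text and the values separately, so the input cannot change the structure of the statement.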

mattgianni · 2024-09-29T04:34:43Z

I think it comes down to the use case. If the HTML/JS is coming from our team, I wouldn't be worried about it. Storing the HTML in a DB vs FS seems pretty similar.

If down the road we allow anonymous website users to post comments, etc., that use case would make me MUCH more nervous about user-submitted HTML of course.

(One crazy thought occurred to me though, and I'm not seriously suggesting it -- it seems like it would be possible to get one of these LLMs to review user-submitted HTML/JS for potential security problems during validation. I wonder how reliable something like that could be.)

nickvisut added this to Support SFUSD Sep 26, 2024