[Spike] for addressing HTML in DB #236
Comments
I did a little digging last night on this topic, but couldn't find anything that seemed like an authority on it.

The general consensus seems to be that storing HTML in a database isn't really a serious security issue in itself; the trouble comes when you *render* HTML content. It doesn't matter whether it is stored in a DB, a filesystem, or memory: if it comes from an untrusted source, it's dangerous. There are issues that are specific to DBs (like SQL injection), but these issues are independent of HTML/JS.

Generally speaking, validating/sanitizing HTML from untrusted sources doesn't seem practical. Web browsers are just too powerful and ever-changing. I've read that some people are using Markdown instead to reduce the risk, but that seems like a big mistake to me. (It might be even easier to take advantage of bugs in open-source Markdown libraries.)

If we are going to render HTML or JS on the site, whether we store it in GitHub or in Postgres at Vercel, it seems like we need to trust the authors.
|
Good stuff, thanks for looking into it! How about forcing a subset of HTML
(e.g., via a DSL like Markdown)?
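To make the "subset of HTML" idea concrete, here is a minimal allowlist sketch in Python using only the standard library. The tag list, the class name `AllowlistSanitizer`, and the `sanitize` helper are assumptions for illustration, not anything from this project, and Python is only a stand-in for whatever stack the site uses. The stdlib parser is not a browser-grade HTML5 parser, so this shows the shape of the approach rather than a vetted defense; a maintained sanitizer library would be the safer choice in practice.

```python
from html import escape
from html.parser import HTMLParser

# Hypothetical allowlist: tags we are willing to render. No attributes are kept.
ALLOWED_TAGS = {"p", "b", "i", "em", "strong", "ul", "ol", "li", "br"}


class AllowlistSanitizer(HTMLParser):
    """Re-emit allowlisted tags only, escape all text, drop everything else."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []
        self.skip_depth = 0  # > 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self.skip_depth += 1
        elif self.skip_depth == 0 and tag in ALLOWED_TAGS:
            # Attributes (onclick, href, style, ...) are deliberately discarded.
            self.out.append(f"<{tag}>")

    def handle_endtag(self, tag):
        if tag in ("script", "style"):
            self.skip_depth = max(0, self.skip_depth - 1)
        elif self.skip_depth == 0 and tag in ALLOWED_TAGS and tag != "br":
            self.out.append(f"</{tag}>")

    def handle_data(self, data):
        # Text inside <script>/<style> is dropped; all other text is escaped.
        if self.skip_depth == 0:
            self.out.append(escape(data))


def sanitize(untrusted_html: str) -> str:
    parser = AllowlistSanitizer()
    parser.feed(untrusted_html)
    parser.close()  # flush any buffered trailing text
    return "".join(parser.out)


print(sanitize('<p onclick="x()">Hi <b>there</b><script>alert(1)</script></p>'))
```

The design choice is to rebuild the output from the parsed events instead of trying to strip "bad" patterns from the input string, so anything not explicitly allowed never reaches the page.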
On Thu, Sep 26, 2024 at 12:08 PM Matt Gianni wrote: [quoted copy of the first comment trimmed]
|
Ah, I parsed your message a bit too quickly. What are your thoughts on having
some protection vs. none, however? I would think that, yes, it's an arms
race, but that covering the more obvious scenarios (like not outputting JS if
it can be helped) would be feasible. As a rough and hyperbolic
counterpoint, an analogous position would be that it's impossible to fully
secure an OS because of zero-days, so any effort in that direction would be futile.
On Thu, Sep 26, 2024 at 2:09 PM Nick Visutsithiwong wrote: [quoted copies of the earlier comments trimmed]
|
(With regard to, e.g., SQL injection, we could break that out into a sibling
ticket or just rename this one to be more expansive.)
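For the SQL injection side, the usual defense is independent of what the stored text contains: bind values as query parameters instead of concatenating them into SQL. A minimal sketch follows, assuming Python's built-in `sqlite3` as a stand-in for the Postgres database mentioned above; the `content` table and the `save_body` / `load_body` helpers are hypothetical.

```python
import sqlite3

# In-memory database as a stand-in; the table name and columns are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE content (id INTEGER PRIMARY KEY, body TEXT)")


def save_body(conn, body):
    # The "?" placeholder binds `body` as data; it is never parsed as SQL.
    cur = conn.execute("INSERT INTO content (body) VALUES (?)", (body,))
    conn.commit()
    return cur.lastrowid


def load_body(conn, row_id):
    row = conn.execute(
        "SELECT body FROM content WHERE id = ?", (row_id,)
    ).fetchone()
    return row[0] if row else None


# A value that mixes HTML with an injection attempt is stored verbatim.
hostile = "<b>hi</b>'); DROP TABLE content; --"
row_id = save_body(conn, hostile)
print(load_body(conn, row_id) == hostile)
```

Note that this only keeps the value from altering the query. The stored HTML is unchanged, so rendering it safely is still a separate concern.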
On Thu, Sep 26, 2024 at 2:17 PM Nick Visutsithiwong wrote: [quoted copies of the earlier comments trimmed]
|
I think it comes down to the use case. If the HTML/JS is coming from our team, I wouldn't be worried about it. Storing the HTML in a DB vs. the FS seems pretty similar. If down the road we allow anonymous website users to post comments, etc., that use case would make me MUCH more nervous about user-submitted HTML, of course. (One crazy thought occurred to me though, and I'm not seriously suggesting it: it seems like it would be possible to get one of these LLMs to review user-submitted HTML/JS for potential security problems during validation. I wonder how reliable something like that could be.)
|
A recent change to the seed file includes values that contain HTML (including class names) alongside plain text. If we store data like this, especially if it becomes editable (e.g., via a CMS) down the road, we could be increasing our attack surface.
We need to look into 1) best practices and 2) sanitizing the values or storing them in a different way.
See issue #222 for the referenced code.
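One possible reading of "storing in a different way" is to keep the text and its presentation hint as separate fields and build the markup at render time, escaping the text on output. The sketch below assumes a made-up record shape, a made-up `STYLE_CLASSES` mapping, and a hypothetical `render_fragment` helper; the actual seed file in #222 may look quite different, and Python is used only for illustration.

```python
from html import escape

# Hypothetical mapping from a stored style key to a CSS class owned by the templates.
STYLE_CLASSES = {"plain": "", "highlight": "text-highlight", "muted": "text-muted"}


def render_fragment(record: dict) -> str:
    """Render one stored record as a <span>, escaping its text at output time."""
    # Unknown style keys fall back to no class instead of being trusted.
    css_class = STYLE_CLASSES.get(record.get("style", "plain"), "")
    text = escape(record["text"])  # escapes &, <, > and quotes
    if css_class:
        return f'<span class="{css_class}">{text}</span>'
    return f"<span>{text}</span>"


# Hypothetical seed rows: plain text plus a style key, no raw HTML stored.
seed_rows = [
    {"text": "Welcome", "style": "highlight"},
    {"text": "<script>alert(1)</script>", "style": "unknown"},
]

for row in seed_rows:
    print(render_fragment(row))
```

With this shape, an editor (or a future CMS) can change the text or pick from a fixed set of styles, but cannot introduce arbitrary tags or class names.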
Original comment below:
Originally posted by @nickvisut in #222 (comment)