Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

File upload improvements #2919

Open
wants to merge 7 commits into
base: develop
Choose a base branch
from
Open

Conversation

BCerki
Copy link
Contributor

@BCerki BCerki commented Feb 26, 2025

#2123

This PR

  • set blob chunk size (I tried a couple options, didn't seem to make much of a difference)
  • remove check for same hash -- more efficient to just delete and recreate than read and check
  • the docs explain the file upload strategy
  • consolidated and renamed some of the ninja schemas
  • I'm having trouble with some mypy errors I hope the reviewer can weigh in on
  • I'm also having trouble with a bug where you can't edit an operation to be new entrant because an opted-in field is missing

@BCerki BCerki force-pushed the 2123-remove-files-have-same-hash branch from e775467 to 43cca7d Compare February 27, 2025 19:38
@BCerki BCerki force-pushed the 2123-remove-files-have-same-hash branch from 38918a4 to 0bdf9cc Compare March 6, 2025 23:14
<ul className="m-0 py-0 flex flex-col justify-start">
<li>
<Link
download={fileName}

Check warning

Code scanning / CodeQL

DOM text reinterpreted as HTML Medium

DOM text
is reinterpreted as HTML without escaping meta-characters.
DOM text
is reinterpreted as HTML without escaping meta-characters.

Copilot Autofix AI 2 days ago

To fix the problem, we need to ensure that the fileName is properly escaped before being rendered in the DOM. This can be achieved by using a library like he (HTML entities) to encode the file name, which will prevent any HTML or JavaScript from being executed.

  • Import the he library for encoding HTML entities.
  • Encode the fileName before rendering it in the Link component.
Suggested changeset 2
bciers/libs/components/src/form/widgets/FileWidget.tsx

Autofix patch

Autofix patch
Run the following command in your local git repository to apply this patch
cat << 'EOF' | git apply
diff --git a/bciers/libs/components/src/form/widgets/FileWidget.tsx b/bciers/libs/components/src/form/widgets/FileWidget.tsx
--- a/bciers/libs/components/src/form/widgets/FileWidget.tsx
+++ b/bciers/libs/components/src/form/widgets/FileWidget.tsx
@@ -6,2 +6,3 @@
 import Link from "next/link";
+import he from "he";
 
@@ -99,3 +100,3 @@
             <Link
-              download={fileName}
+              download={he.encode(fileName)}
               href={downloadUrl}
@@ -104,3 +105,3 @@
             >
-              {fileName}
+              {he.encode(fileName)}
             </Link>
EOF
@@ -6,2 +6,3 @@
import Link from "next/link";
import he from "he";

@@ -99,3 +100,3 @@
<Link
download={fileName}
download={he.encode(fileName)}
href={downloadUrl}
@@ -104,3 +105,3 @@
>
{fileName}
{he.encode(fileName)}
</Link>
bciers/package.json
Outside changed files

Autofix patch

Autofix patch
Run the following command in your local git repository to apply this patch
cat << 'EOF' | git apply
diff --git a/bciers/package.json b/bciers/package.json
--- a/bciers/package.json
+++ b/bciers/package.json
@@ -106,3 +106,4 @@
     "vite-require": "^0.2.3",
-    "vitest": "^3.0.5"
+    "vitest": "^3.0.5",
+    "he": "^1.2.0"
   },
EOF
@@ -106,3 +106,4 @@
"vite-require": "^0.2.3",
"vitest": "^3.0.5"
"vitest": "^3.0.5",
"he": "^1.2.0"
},
This fix introduces these dependencies
Package Version Security advisories
he (npm) 1.2.0 None
Copilot is powered by AI and may make mistakes. Always verify output.
Positive Feedback
Negative Feedback

Provide additional feedback

Please help us improve GitHub Copilot by sharing more details about this comment.

Please select one or more of the options
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

CodeQL has raised an interesting point here. I'm looking into this more to see if it can actually be exploited.

) -> Tuple[Literal[200], Operation]:
payload = OperationNewEntrantApplicationInWithDocuments(
date_of_first_shipment=details.date_of_first_shipment,
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Doing it like this instead of **details.dict() was to make mypy happy

Comment on lines 25 to 26
naics_code_id: Optional[int] = None
opt_in: Optional[bool] = False
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These are optional and None so I'm not sure why I'm getting mypy errors

@BCerki BCerki force-pushed the 2123-remove-files-have-same-hash branch 3 times, most recently from 9f2664b to 06b749b Compare March 10, 2025 16:21
@BCerki BCerki force-pushed the 2123-remove-files-have-same-hash branch from 06b749b to 6a71ad4 Compare March 10, 2025 20:33
Copy link
Contributor

@andrea-williams andrea-williams left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Screenshot 2025-03-11 at 3 08 50 PM

I wasn't able to get past the first step of the Registration workflow. Getting this error in the backend:

  File "/Users/awilliam/Documents/cas-registration/bc_obps/service/document_service_v2.py", line 29, in create_or_replace_operation_document
    existing_document = cls.get_operation_document_by_type_if_authorized(user_guid, operation_id, document_type)
                                                                                    ^^^^^^^^^^^^
NameError: name 'operation_id' is not defined

---------------------------------------------ERROR END-------------------------------------------------
Bad Request: /api/registration/operations/df62d793-8cfe-4272-a93e-ea9c9139ff82/registration/operation
[11/Mar/2025 22:07:41] "PUT /api/registration/operations/df62d793-8cfe-4272-a93e-ea9c9139ff82/registration/operation HTTP/1.1" 400 49

# Administration schemas


class OperationAdminstrationIn(OperationRegistrationIn):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
class OperationAdminstrationIn(OperationRegistrationIn):
class OperationAdministrationIn(OperationRegistrationIn):

operation_representatives: List[int]


class OperationAdminstrationInWithDocuments(OperationRegistrationInWithDocuments):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
class OperationAdminstrationInWithDocuments(OperationRegistrationInWithDocuments):
class OperationAdministrationInWithDocuments(OperationRegistrationInWithDocuments):

@router.get(
"/operations/{uuid:operation_id}/documents/{uuid:document_type}",
response={200: DocumentOut, custom_codes_4xx: Message},
tags=OPERATOR_TAGS,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OPERATION_TAGS instead?

description="""Retrieves a document""",
auth=authorize("approved_authorized_roles"),
)
def get_operation_doucment(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
def get_operation_doucment(
def get_operation_document(

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this function name could be renamed to a more accurate descriptor - e.g., get_operation_document_by_type

"/operations/{uuid:operation_id}/documents/{uuid:document_type}",
response={200: DocumentOut, custom_codes_4xx: Message},
tags=OPERATOR_TAGS,
description="""Retrieves a document""",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
description="""Retrieves a document""",
description="""Retrieves the most recent version of a document for the specified operation and document type.""",

"""
existing_document = cls.get_operation_document_by_type_if_authorized(user_guid, operation_id, document_type)
# if there is an existing document, check if the new one is different
existing_document = DocumentDataAccessService.get_operation_document_by_type(operation_id, type)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why did get_operation_document_by_type_if_authorized get replaced with this? We're no longer checking whether the user is authorized to access the document

@@ -897,7 +904,7 @@ describe("the OperationInformationForm component", () => {
expect(screen.getByText(/Must not have fewer than 1 items/i)).toBeVisible();
});

it(
it.only(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
it.only(
it(

Comment on lines +90 to +91
- DocumentServiceV2.create_or_replace_operation_document
- DocumentDataAccessServiceV2.create_document
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
- DocumentServiceV2.create_or_replace_operation_document
- DocumentDataAccessServiceV2.create_document
- DocumentService.create_or_replace_operation_document
- DocumentDataAccessService.create_document

Comment on lines +32 to +37
// this removes the file keys if no new file has been uploaded
if (
(key === "boundary_map" ||
key === "process_flow_diagram" ||
key === "new_entrant_application") &&
typeof rjsfFormData[key] === "string"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please help this idiot understand - where does the check for "no new file has been uploaded" happen?

<ul className="m-0 py-0 flex flex-col justify-start">
<li>
<Link
download={fileName}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

CodeQL has raised an interesting point here. I'm looking into this more to see if it can actually be exploited.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants