The sort order of ListObjects results for keys containing Unicode characters should be compatible with AWS and should not rely on UTF-16 code-unit sort order.
We can load the strings into buffers and use Buffer.compare instead. The concern is the amount of work and GC pressure this adds to the listing flow, so we should try to minimize the overhead.
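As a rough illustration of the idea, here is a minimal Node.js sketch of two comparators. The function names and sample keys are hypothetical and not taken from the codebase. The first encodes both keys to UTF-8 and uses `Buffer.compare`. The second avoids allocating buffers by comparing UTF-16 code units directly and remapping units at or above the surrogate range so that the result follows code-point order. The two are assumed to agree only for well-formed strings (no unpaired surrogates).

```javascript
// Hypothetical helper: compare two keys by their UTF-8 bytes.
// UTF-8 byte order is the same as Unicode code-point order.
function compareKeysUtf8(a, b) {
    return Buffer.compare(Buffer.from(a, 'utf8'), Buffer.from(b, 'utf8'));
}

// Hypothetical allocation-free alternative: walk the UTF-16 code units and,
// at the first difference, remap units >= 0xD800 so that surrogates
// (supplementary characters) sort after the BMP range 0xE000..0xFFFF.
function compareKeysCodePoint(a, b) {
    const n = Math.min(a.length, b.length);
    for (let i = 0; i < n; i++) {
        let x = a.charCodeAt(i);
        let y = b.charCodeAt(i);
        if (x === y) continue;
        if (x >= 0xD800 && y >= 0xD800) {
            x = x >= 0xE000 ? x - 0x800 : x + 0x2000;
            y = y >= 0xE000 ? y - 0x800 : y + 0x2000;
        }
        return x < y ? -1 : 1;
    }
    // One key is a prefix of the other (or they are equal).
    return a.length === b.length ? 0 : (a.length < b.length ? -1 : 1);
}

// Illustrative keys: U+A98F and U+FF21 are in the BMP, U+103A3 is not.
const sampleKeys = ['k_\u{103A3}', 'k_\uFF21', 'k_\uA98F'];

const utf16Order = [...sampleKeys].sort();                      // default JS order
const utf8Order = [...sampleKeys].sort(compareKeysUtf8);        // code-point order
const noAllocOrder = [...sampleKeys].sort(compareKeysCodePoint);

console.log(utf16Order.map(escape));
console.log(utf8Order.map(escape));
console.log(noAllocOrder.map(escape));
```

With the default sort, the key containing U+103A3 lands before the key containing U+FF21, because its first surrogate (0xD800) is numerically smaller. Both comparators place it last. Whether the allocation-free variant is actually cheaper in the listing flow would need to be measured.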
Some third-party implementations of Amazon’s S3 protocol return object information (‘file listings’) in UTF-16 code-unit order rather than the Amazon-compatible Unicode code-point order.
Since Moonwalk 2023.2, the Plugin Configuration panel for Moonwalk’s s3generic:// plugin (and for certain other plugins that provide third-party S3 support, such as s3cos://) offers a ‘UTF-16 listing order work-around’ option. This option allows Moonwalk to correctly process results returned in this non-standard order, so that your S3 buckets are scanned correctly and completely.
How do you determine whether you need to enable this option?
The following experiment will test the sort order of your S3-compatible device.
Create a new folder on a Windows server with Moonwalk Agent installed
Add files with the EXACT names shown below - use cut & paste to get them right
file_ꦏ_1.txt
file__2.txt
file__3.txt
file_𐎣_4.txt
Don’t worry about the order in which Windows shows the files, and don’t worry if some programs show the characters between the underscores as a box, a question mark, etc.
Use an Ingest policy to upload this folder to a test bucket on your S3-compatible storage
Use a Gather Statistics policy to scan the location to which you just ingested the files
a. Tick ‘Export raw file metadata’
b. Untick the ‘Compress (gzip)’ option
c. Choose ‘CSV’ format
Check the exported CSV data (e.g. using Notepad) to determine the order in which the files appear:
If the files appear in 1, 2, 3, 4 order: congratulations, your S3-compatible device uses the expected AWS ordering - you should NOT tick the workaround box
If the files appear in 1, 4, 2, 3 order: your device is using UTF-16 code-unit order - you WILL need to tick the ‘UTF-16 listing order work-around’ box
Note: this option does not change the order in which results are actually returned; it only ensures that Moonwalk processes them correctly.
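The check in the last step can also be done programmatically. The following Node.js sketch is an illustration only: the function names are hypothetical, and the sample names use stand-in characters (U+A98F, U+FF21, U+103A3) rather than the exact test file names above. It classifies a listing as being in code-point order, UTF-16 code-unit order, or neither.

```javascript
// Code-point order, via UTF-8 byte comparison.
const byCodePoint = (a, b) =>
    Buffer.compare(Buffer.from(a, 'utf8'), Buffer.from(b, 'utf8'));

// UTF-16 code-unit order, which is what JavaScript's < and > use on strings.
const byUtf16 = (a, b) => (a < b ? -1 : a > b ? 1 : 0);

// True if every name is >= the one before it under the given comparator.
function isSorted(names, cmp) {
    return names.every((name, i) => i === 0 || cmp(names[i - 1], name) <= 0);
}

// Hypothetical classifier for a listing, given names in the order returned.
function detectListingOrder(names) {
    const codePoint = isSorted(names, byCodePoint);
    const utf16 = isSorted(names, byUtf16);
    if (codePoint && utf16) return 'indeterminate'; // names do not distinguish the two
    if (codePoint) return 'code-point';
    if (utf16) return 'utf-16';
    return 'unsorted';
}

// Stand-in names: the supplementary character U+103A3 sorts last by code point,
// but before U+FF21 by UTF-16 code unit.
const awsStyle = ['f_\uA98F_1', 'f_\uFF21_2', 'f_\u{103A3}_3'];
const utf16Style = ['f_\uA98F_1', 'f_\u{103A3}_3', 'f_\uFF21_2'];

console.log(detectListingOrder(awsStyle));
console.log(detectListingOrder(utf16Style));
```

A listing that contains only ASCII names is sorted identically under both orders, which is why the test needs at least one supplementary character and one BMP character at or above U+E000.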