Skip to content

FIREFLY-1693: ObsCore improvements #1738

Merged
kpuriIpac merged 3 commits intodevfrom
FIREFLY-1693-obscore-improvements
Apr 16, 2025
Merged

FIREFLY-1693: ObsCore improvements #1738
kpuriIpac merged 3 commits intodevfrom
FIREFLY-1693-obscore-improvements

Conversation

@kpuriIpac
Copy link
Copy Markdown
Contributor

@kpuriIpac kpuriIpac commented Apr 4, 2025

Tickets: Firefly-1693 and Firefly-1692
IFE PR: https://github.com/IPAC-SW/irsa-ife/pull/400

  • Cleaning up the logic in ObsCorePackager to remove all custom file naming logic, as this will be handled by both the Download Script and the Zipped Packager, by first looking at Content-Disposition header, and then at the URL path as a fallback
    • this keeps our code generic, and lets the service set the name.
    • @robyww suggested maybe keeping some of the file naming convention, @loitly is making a strong case for not doing that with which I agree. We can discuss this further if needed. For now, I am only setting file name (from the file column, if available) for cutouts. But we can move away from doing this as well.
  • merged the PrepareDownload from EuclidUtils into ttFeatureWatchers
  • apps have the option of choosing via props to PrepareDownload whether they want to display File Structure (flattened vs folder)
    • I have kept the default as flattened. Currently Euclid is the only one giving users this option.
  • When users select folder, I have used "row_" + row_index as the folder name. Open to other suggestions instead of this, but not sure if looking at any of the columns again for this makes sense, in the spirit of keeping it generic?

Testing:

@kpuriIpac kpuriIpac added this to the 2025.3 milestone Apr 4, 2025
@kpuriIpac kpuriIpac requested review from loitly and robyww April 4, 2025 17:17
@kpuriIpac kpuriIpac self-assigned this Apr 4, 2025
curFileInfoIdx++;
try {
zippedBytes += zipHandler.addZipEntry(zout, fi);
zippedBytes += zipHandler.addZipEntry(zout, fi, curFileInfoIdx);
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll remove this argument, was testing something, not needed anymore.

Copy link
Copy Markdown
Contributor

@loitly loitly left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please make the suggested code cleanup before merging.

Comment on lines +180 to +184
String suggestedFilename = URLDownload.getSugestedFileName(uc);

//fallback to filename from URL path if content-disposition header is missing
if (isEmpty(suggestedFilename)) {
String urlPath = url.getPath();
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This new block of code is confusing, likely due to hasFileNameResolver, which is no longer used. It was removed, but the cleanup was incomplete. With that in mind, the updated code should now look like this:

String extName = ofNullable(fi.getExternalName()).orElse("").trim();
if (extName.isEmpty() || extName.endsWith("/")) {
    String suggestedFilename = URLDownload.getSugestedFileName(uc);
    suggestedFilename = isEmpty(suggestedFilename) ? getFileNameFromUrl(filename) : suggestedFilename;
    fi.setExternalName(extName + suggestedFilename);
}

Move the logic for getFileNameFromUrl() into a separate method.

Comment on lines +188 to +189
//strip surrounding quotes, if any
suggestedFilename = suggestedFilename.replaceAll("^\"|\"$", "");
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These logic should only apply when getFileNameFromUrl. But, it's not enough. I would also sanitize it to remove illegal characters in file path/name.
A quick and dirty way is to do this:

suggestedFilename.replaceAll("[^a-zA-Z0-9._-]", "_");        // all chars not in this set will be converted to '_'

Comment on lines +142 to +143
boolean isDatalink = (access_format != null && access_format.contains(DATALINK)) ||
(access_format == null && datalinkServDesc != null);
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This simplifies to : datalinkServDesc != null || access_format != null && access_format.contains(DATALINK).
Is that the logic you intended?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I suppose that would work as well. The second condition there datalinkServDesc != null is essentially checking for non-obscore tables containing datalink (like mer-catalog in Euclid, for instance). So I'll make the change to simplify this.

@kpuriIpac kpuriIpac force-pushed the FIREFLY-1693-obscore-improvements branch from 457fa02 to 7d8d611 Compare April 4, 2025 21:22
if (extName.isEmpty() || extName.endsWith("/")) {
String suggestedFilename = URLDownload.getSugestedFileName(uc);
suggestedFilename = isEmpty(suggestedFilename) ? getFileNameFromUrl(url) : suggestedFilename;
suggestedFilename = suggestedFilename.replaceAll("[^a-zA-Z0-9._-]", "_");
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Move this line into getFileNameFromUrl().

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done, addressed the other changes as well. Rebuilding the test builds to make sure no new bugs were introduced.

Copy link
Copy Markdown
Contributor

@robyww robyww left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Testing CADC

  • I tried to do cutout only and I did not get anything even though I could do a cutout.
  • When I ask for the whole file I got filename surrounded by quotes.
    Screenshot 2025-04-08 at 10 24 22 AM

Testing DCE

  • Mostly looks good
  • The first row is not package.
    • the first row is not packaged
    • if I choose the first three rows I just get two products
    • If I choose the second and third rows I get the same two products.

@kpuriIpac kpuriIpac force-pushed the FIREFLY-1693-obscore-improvements branch from 7156aed to 70ef4e1 Compare April 9, 2025 00:47
@kpuriIpac
Copy link
Copy Markdown
Contributor Author

Testing CADC

  • I tried to do cutout only and I did not get anything even though I could do a cutout.
  • When I ask for the whole file I got filename surrounded by quotes.
    Screenshot 2025-04-08 at 10 24 22 AM

Testing DCE

  • Mostly looks good

  • The first row is not package.

    • the first row is not packaged
    • if I choose the first three rows I just get two products
    • If I choose the second and third rows I get the same two products.

Thanks for pointing this out. There was a bug in the cutout service descriptor url generation. Fixed now, along with the quotes. CADC should work as expected now.

For the DCE bug, can you tell me what search you did where you did not see the first row's downloads show up? I am not able to recreate it, it works for me.

@robyww
Copy link
Copy Markdown
Contributor

robyww commented Apr 9, 2025

For the DCE bug, can you tell me what search you did where you did not see the first row's downloads show up? I am not able to recreate it, it works for me.

  1. goto DCE and choose DIGIT
    • 20h23m46.89s +42d12m43.0s Equ J2000
    • 200 arcsec
  2. select first 3 rows and download - 2 results in zip
  3. select second and third row and download - 2 results in zip (same as above)
  4. select only first row and download - no results

Copy link
Copy Markdown
Contributor

@robyww robyww left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am letting @loitly take the lead of the Java side review. So this is the JavaScript side.

I am also testing. After your next round of fixes we need to involve @lrebull as well.

const tbl_id = getActiveTableId();
const tblTitle = getTblById(tbl_id)?.title ?? 'unknown';
export const PrepareDownload = React.memo(({table_id, tbl_title, viewerId, showFileStructure=false,
downloadType='package', dataSource, fileName}) => {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The dialog should show a working message until the fetchSemanticList completes. In Euclid on the first download the list of semantics only shows all.

useEffect(() => {
        fetchSemanticList(tbl_id).then(setSemList);
    }, [tbl_id]);

it probably should be more like this.

useEffect(() => {
        fetchSemanticList(tbl_id).then( (result) => {
                     setSemList(result);
                    setWorking(false);
             });
       setWorking(true);
    }, [tbl_id]);

Then in the UI show a working indicator.

Copy link
Copy Markdown
Contributor Author

@kpuriIpac kpuriIpac Apr 11, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@robyww I did this, but while testing it, DownloadDialog does not update probably because it internally calls showDownloadDialog, which evaluates the JSX only once. so when working is true and let's say I have some text like "Loading....", then when semList is updated and setWorking is set to false, the dialog still does not update.

I tried a few things to make it work, but it was getting quite messy.

So I ended up with the logic i have in the latest commit, DownloadDialog by itself is only called when the call to fetchSemanticList is resolved. So theoretically the user won't see the Generate Download Script button until this is ready (there is some lag though, so I'm not sure if that's good UX)

But it ensures DownloadDialog is populated with the semList when it's available. Let me know what you think.

@robyww
Copy link
Copy Markdown
Contributor

robyww commented Apr 9, 2025

Testing Euclid

If I click too fast I get the following. The details are in the review.
Screenshot 2025-04-09 at 9 54 56 AM

@robyww
Copy link
Copy Markdown
Contributor

robyww commented Apr 9, 2025

This is not correct. Is it fixed in @loitly's PR?

Screenshot 2025-04-09 at 9 58 38 AM

It is not multipart and the options should be immediate.

@robyww
Copy link
Copy Markdown
Contributor

robyww commented Apr 9, 2025

SPHEREx Testing

This script is generating lines with folders like row_1

download_file 'https://irsatest.ipac.caltech.edu/ibe/data/spherex/qr/level2/2025W35_1A/level2b/5/level2_2025W35_1A_0086_3D5_simu_level2b.fits' 'row_1' ''

The folder can't named liked this. You should us the naming algorithm we have.

when I run the whole script which came from the first row of a results.

download_file 'https://irsatest.ipac.caltech.edu/ibe/data/spherex/qr/level2/2025W35_1A/level2b/5/level2_2025W35_1A_0086_3D5_simu_level2b.fits' 'row_1' ''
download_file 'https://irsatest.ipac.caltech.edu/ibe/cutout?ra=261.57131241460627&dec=-8.364730535295832&size=0.01&path=spherex/qr/level2/2025W35_1A/level2b/5/level2_2025W35_1A_0086_3D5_simu_level2b.fits' 'row_1' ''

the script gives me this warning for the second file.

Warning: File exists
curl: (23) Failure writing output to destination, passed 88 returned 4294967295
>> ERROR: failed to download https://irsatest.ipac.caltech.edu/ibe/cutout?ra=261.57131241460627&dec=-8.364730535295832&size=0.01&path=spherex/qr/level2/2025W35_1A/level2b/5/level2_2025W35_1A_0086_3D5_simu_level2b.fits

it generated only file. I think it is trying to overwrite the name. If you look at the url this makes sense. We have to come up for a solution to this.

@kpuriIpac kpuriIpac force-pushed the FIREFLY-1693-obscore-improvements branch from 70ef4e1 to 6fcd7a0 Compare April 11, 2025 01:26
return (
<>
{loading && <div />}
{loading && <ToolbarButton enabled={false} variant={isRowSelected?'solid':'soft'} color='warning' text={downloadType === 'script' ? 'Generate Download Script' : 'Prepare Download'}/>}
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

came up with a solution to show a dummy (disabled) button while the semList is loading so as to not have any weird UX movement of a button randomly appearing for the user @robyww.

@kpuriIpac kpuriIpac force-pushed the FIREFLY-1693-obscore-improvements branch 3 times, most recently from 752b6f4 to b049f1f Compare April 15, 2025 03:46
@kpuriIpac
Copy link
Copy Markdown
Contributor Author

@robyww @loitly I addressed all of the feedback.

  • brought back file names if a service requests it, and also using the file name logic for folder names
  • waiting for fetchSemanticList to resolve before the Prepare Download / Generate Download Script button is made clickable
  • did some more refactoring in ObsCorePackager to try and make the code more readable (by using separation of concern where possbile)
  • the bug with DCE has a new ticket now assigned to @loitly: https://jira.ipac.caltech.edu/browse/FIREFLY-1711

Apps to test downloads on:

Note: I have temporarily kept generateDownloadFileName = true for firefly, so if you test CADC, you can notice the naming convention being used for folder and file names is written by us in ObsCorePackager. For all other apps, we're using whatever name the service gives us (or from the URL as a fallback).

@kpuriIpac kpuriIpac requested a review from lrebull April 15, 2025 22:03
@lrebull
Copy link
Copy Markdown
Contributor

lrebull commented Apr 16, 2025

Looking at the description at the top but not understanding enough of the code to do more detailed following of all the comments, AND empirically playing around with the builds you've linked, here are my $0.02:

  • astronomers are going to want the flattened/folders choice in several archives
  • the original filenames of the downloaded data often have important meaning(s) and should be preserved
  • for bundled (zip files?) downloads, filenaming is less of a big deal, but I agree it should look like it came from the corresponding archive (like a euclid bundle should have euclid in the filename by default).
  • for scripts, the default name should look like a script. When I get a bundled file from ops DCE, it's, say, "LGAImages-1.zip" so we know it's a zip file without looking too closely. the extension doesn't appear in the pop-up, so I don't know before it finishes whether i'm going to get a zip file (blahblah.zip) or a script (blahblah.csh). (I note that Euclid and SPHEREx are successfully telling me this clearly.)
  • the second request from the same tile should be blahblah-2 -- at the moment, they're all blahblah-1
  • i am confused. i'm trying the DCE build you linked above. at first, i thought from the description that it was all DCE searches, but a regular dce download is giving me the same behavior as on ops (zip files of real data), so not even Loi's updated background monitor, much less any wget/curl scripts. then i read more closely, and thought maybe this is just obscore searches? but that is also giving me zip files (zip files of real data), not download scripts. So i'm super confused. (Euclid and SPHEREx are both giving me a script, but just like on ops and not using Loi's new stuff.)
  • am i supposed to be testing the script per se, or just the GUI interface to all of this?

@kpuriIpac
Copy link
Copy Markdown
Contributor Author

@lrebull I'll address some of your concerns:

  • Currently only Euclid & Spherex offer Download Scripts, DCE / CADC etc. are all regular packaged downloads (zip files of real data). We could offer users an option when they're downloading for one versus the other (and this is the plan for a future sprint).
    • Loi's PR on the background monitor work is still open, so we won't see those changes here until that PR is merged. But once my PR and his are merged, we should be able to see the new background monitor UI.
  • We could offer flattened/folders choice in other archives as well, if it's helpful.
  • For zip files, we could possibly say in the UI that the file name will be "LGAImages-1.zip" (including the zip) if that would be helpful.
  • For this PR, testing the GUI is more important than testing the script. I'll have a new ticket right after this PR that will work on some improvement & bug fixes for the script.

@robyww feel free to add to this.

url= dlDescriptor?.accessURL;
url= makeDlUrl(dlDescriptor,table, row);
catch (err) {
console.error('fetchSemanticList call failed');
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good. I think the catch needs to be here.

If you want to log the error you should include err as well. I will help to know where the problem happened. IT is possible that getServiceDescriptors() or getObsCoreAccessURL should be fixed.

@kpuriIpac kpuriIpac force-pushed the FIREFLY-1693-obscore-improvements branch from 07a60c4 to 5d793ae Compare April 16, 2025 23:06
@kpuriIpac kpuriIpac force-pushed the FIREFLY-1693-obscore-improvements branch from 5d793ae to aa36148 Compare April 16, 2025 23:16
@kpuriIpac kpuriIpac merged commit 6b0e4da into dev Apr 16, 2025
@kpuriIpac kpuriIpac deleted the FIREFLY-1693-obscore-improvements branch April 16, 2025 23:16
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants