
Links diff and HTML diff should ignore Java Servlet Session IDs #18

@Mr0grog


Java Servlets can keep your session ID in the URL instead of in a cookie by appending ;jsessionid=XYZ to the path of every link URL in a page (before any query string or fragment). See NOAA NCEI’s Historical Observing Metadata Repository for an example: https://www.ncdc.noaa.gov/homr/

(Note: the behavior only occurs on that site for fresh sessions, so try with a private/incognito browser window.)

https://www.ncdc.noaa.gov/homr/api;jsessionid=A2DECB66D2648BFED11FC721FC3043A1
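A minimal sketch of removing the session ID from a URL before comparison, in Python. It assumes the ID runs until the next ?, #, ; or /; the pattern and the name strip_jsessionid are hypothetical, not part of any existing codebase.

```python
import re

# Hypothetical pattern for a ";jsessionid=..." path parameter added by
# Java Servlet URL rewriting. The ID is assumed to end at the next
# "?", "#", ";" or "/" (or at the end of the URL).
JSESSIONID = re.compile(r';jsessionid=[^?#;/]*', re.IGNORECASE)


def strip_jsessionid(url):
    """Return url with any ;jsessionid=... path parameter removed."""
    return JSESSIONID.sub('', url)


print(strip_jsessionid(
    'https://www.ncdc.noaa.gov/homr/api;jsessionid=A2DECB66D2648BFED11FC721FC3043A1'))
```

Applied to the example above, this yields https://www.ncdc.noaa.gov/homr/api, and a query string after the ID (e.g. /a;jsessionid=ABC?x=1) is preserved as /a?x=1.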

Since separate captures of a page that does this will almost always have different sessions, we should ignore this part of link and subresource URLs when diffing. (This should be adjustable via an argument, but I think ignoring it is the right default.) Ideally, the full URL would still appear in the output; the differ just wouldn’t highlight the session ID as a change.
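One way the comparison could work, sketched in Python: match links on their normalized URLs but report the original URLs. The function diff_links, its ignore_session_ids argument, and the (op, url) output format (0 unchanged, -1 removed, 1 added) are all hypothetical; difflib.SequenceMatcher stands in for whatever matching the real differ uses.

```python
import difflib
import re

# Hypothetical session-ID pattern (Java Servlet URL rewriting).
SESSION_ID = re.compile(r';jsessionid=[^?#;/]*', re.IGNORECASE)


def diff_links(old_links, new_links, ignore_session_ids=True):
    """Diff two lists of URLs, returning a list of (op, url) pairs.

    op is 0 (unchanged), -1 (removed) or 1 (added). When
    ignore_session_ids is True, URLs are matched with any
    ;jsessionid=... removed, but the full original URL is reported.
    """
    def key(url):
        return SESSION_ID.sub('', url) if ignore_session_ids else url

    matcher = difflib.SequenceMatcher(
        a=[key(u) for u in old_links],
        b=[key(u) for u in new_links],
        autojunk=False)
    result = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == 'equal':
            # Report the new capture's URL, session ID included.
            result.extend((0, url) for url in new_links[j1:j2])
        else:
            # 'replace', 'delete' and 'insert' all reduce to removals
            # from the old list followed by additions from the new one.
            result.extend((-1, url) for url in old_links[i1:i2])
            result.extend((1, url) for url in new_links[j1:j2])
    return result
```

With the default, two links that differ only in their session IDs come back as unchanged (carrying the new capture's full URL); passing ignore_session_ids=False turns them into a removal plus an addition.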

I’m pretty sure there are other (mostly older) systems that do something similar, and we should handle them the same way as we discover them.
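To make new cases easy to add as they turn up, the patterns could live in a single list. A hypothetical Python sketch: only the jsessionid entry comes from this issue, and SESSION_ID_PATTERNS and normalize_url are illustrative names.

```python
import re

# Hypothetical registry of session-ID patterns to ignore when diffing.
# Only the Java Servlet entry is known from this issue; patterns for
# other systems would be appended here as they are discovered.
SESSION_ID_PATTERNS = [
    re.compile(r';jsessionid=[^?#;/]*', re.IGNORECASE),
]


def normalize_url(url, patterns=SESSION_ID_PATTERNS):
    """Remove every known session-ID pattern from url, in order."""
    for pattern in patterns:
        url = pattern.sub('', url)
    return url
```

Passing a different patterns list makes the behavior adjustable per call without touching the defaults.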

Metadata

Assignees: No one assigned
Type: No type
Project status: Backlog
Milestone: No milestone
Relationships: None yet
Development: No branches or pull requests