[Bug]: Merge Rows (diff) with non alpha characters always generates a new value in the flag field. #4860

Open
rbrenchley opened this issue Feb 3, 2025 · 3 comments

Comments

@rbrenchley

Apache Hop version?

2.10.1

Java version?

openjdk version "21.0.4" 2024-07-16 LTS

Operating system

Windows

What happened?

When using Merge Rows (diff), if the strings being compared contain any non-alphabetic characters, the merge always sets the flag field to "new".
Example: with a key field value of R&D in both the reference and compare streams, the row is always flagged as new, which creates duplicate R&D records even though the strings are exactly the same.

Issue Priority

Priority: 2

Issue Component

Component: Actions

@hansva
Contributor

hansva commented Feb 4, 2025

I did a quick test and everything seems to be in working order.
You will have to double-check that the sorting is correct and that there are really no differences between the rows (spaces or hidden whitespace).
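
One way to check for hidden characters is to print the code points of a key value from each stream and compare them. The snippet below is only a minimal sketch: `CodePointDump` and `describe` are hypothetical names, not part of Hop, and the second sample value with a trailing non-breaking space is made up for illustration.

```java
class CodePointDump {
    // Hypothetical helper: renders every code point of s as U+XXXX,
    // so invisible characters (e.g. U+00A0) become visible.
    static String describe(String s) {
        StringBuilder sb = new StringBuilder();
        s.codePoints().forEach(cp -> sb.append(String.format("U+%04X ", cp)));
        return sb.toString().trim();
    }

    public static void main(String[] args) {
        System.out.println(describe("R&D"));        // U+0052 U+0026 U+0044
        System.out.println(describe("R&D\u00A0"));  // U+0052 U+0026 U+0044 U+00A0
    }
}
```

If both streams print the same sequence for the same key, the values themselves match and the sort order is the more likely suspect.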

Image

Image

@dave-csc
Contributor

Queuing here... are you comparing a stream sorted by Hop with one sorted by a database?

What I experienced during development is that sorting with different methods doesn't give exactly the same order, even when the sorting criteria are the same. In my case I was comparing IP masks such as 192.168.0.0/24, and the Merge rows results reported some "new" rows and some "deleted" rows with the same value.
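
The effect described above can be reproduced with a generic sorted-merge diff. The following is a sketch under stated assumptions, not Hop's actual code: it assumes keys are compared with `String.compareTo` (code-point order), and that both inputs arrive in an order produced by a collation that gives punctuation little weight, so that `R&D` comes after `RB`. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

class MergeDiffSketch {
    // Hypothetical single-key merge diff over two pre-sorted lists.
    // It walks both lists once and trusts that they are sorted in
    // the same order that compareTo defines.
    static List<String> mergeDiff(List<String> reference, List<String> compare) {
        List<String> out = new ArrayList<>();
        int r = 0, c = 0;
        while (r < reference.size() || c < compare.size()) {
            if (r >= reference.size()) {
                out.add(compare.get(c++) + "=new");          // only compare rows left
            } else if (c >= compare.size()) {
                out.add(reference.get(r++) + "=deleted");    // only reference rows left
            } else {
                int cmp = reference.get(r).compareTo(compare.get(c));
                if (cmp == 0) {
                    out.add(reference.get(r) + "=identical");
                    r++;
                    c++;
                } else if (cmp < 0) {
                    out.add(reference.get(r++) + "=deleted"); // reference key looks "smaller"
                } else {
                    out.add(compare.get(c++) + "=new");       // compare key looks "smaller"
                }
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Assumed database order: punctuation is ignored, so R&D sorts like RD.
        List<String> reference = List.of("RA", "R&D");
        List<String> compare = List.of("RA", "RB", "R&D");
        // prints [RA=identical, R&D=deleted, RB=new, R&D=new]
        System.out.println(mergeDiff(reference, compare));
    }
}
```

Because `&` has a lower code point than `B`, the merge sees `R&D` as smaller than `RB`, gives up on the reference row, and later reports the same key as new on the compare side. Sorting both streams with the same comparator that the merge uses avoids the mismatch in this sketch.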

@rbrenchley
Author

Using the database sorting.
Mine was pretty simple, as I was using this to populate a dimension table. I sorted the source landing table and the current dimension table (both from PostgreSQL, using ORDER BY) to check whether I had a new value, and then did a direct insert (with no modifications to the source value).
On subsequent runs the merge diff flagged any string with a non-alphabetic character as new, even though the fields were identical and sorted.
I'm not sure what was causing the issue, as I examined the strings in the database tables and they were identical.
The only thing that worked for me was to remove all non-alphabetic characters with a regular expression before the comparison; after that it worked fine.
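
For illustration, a workaround of this kind might look like the sketch below. The exact regular expression used is not given in this thread, so the pattern `[^A-Za-z]` and the names `KeyNormalizer` and `lettersOnly` are assumptions.

```java
class KeyNormalizer {
    // Hypothetical helper: drops every character that is not an ASCII letter.
    static String lettersOnly(String s) {
        return s.replaceAll("[^A-Za-z]", "");
    }

    public static void main(String[] args) {
        System.out.println(lettersOnly("R&D"));                            // RD
        // Distinct source values can collapse to the same key:
        System.out.println(lettersOnly("R&D").equals(lettersOnly("RD")));  // true
    }
}
```

Stripping characters removes the ordering difference, but as the second line shows it can also merge keys that were distinct in the source, so it is best applied only to a separate comparison key rather than to the stored value.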

I'm very new to Apache Hop. How does the merge diff do its comparison?
