Updated files to Python3000, Windows support, and interactive #1

Open: sesas wants to merge 12 commits into master

@sesas sesas commented Mar 28, 2012

Made it work with Python 3000. It now has an interactive mode (see the README file) and supports Windows (previously a Windows user could not use the code because DOS doesn't have pipes).

@sesas
Author

sesas commented Mar 28, 2012

I should add that I didn't test it on Linux. So you might want to do that before merging.

Also, I don't know what the performance hit is of using os.walk() rather than $ find ... | ./duplicatefinder.py
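To make the comparison concrete, here is a minimal sketch of the two ways of feeding paths to the script: walking a directory with os.walk(), or reading one path per line from a pipe. The function names `iter_files_walk` and `iter_files_stdin` are hypothetical and are not taken from duplicatefinder.py.

```python
import os
import sys


def iter_files_walk(root):
    """Yield the path of every regular file under root (no external find needed)."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            # Skip symlinks and anything that is not a regular file.
            if os.path.isfile(path) and not os.path.islink(path):
                yield path


def iter_files_stdin(stream=None):
    """Yield one path per non-empty line, as produced by `find ... | script`."""
    stream = sys.stdin if stream is None else stream
    for line in stream:
        path = line.rstrip("\r\n")
        if path:
            yield path
```

Both generators produce the same kind of output, so the rest of the script would not need to know which one is in use. Timing the two on the same tree would answer the performance question.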

@kassoulet
Owner

Awesome. I need to check this, but I have no time now.

@kassoulet
Owner

This is very nice.
I've borrowed the python3 fixes.
For the rest, I'm not sure. Deleting files is really dangerous and should be well thought out.
Also, using find and a pipe was so much simpler, because now you will want to add a whitelist/blacklist matcher, and a size threshold, and... it has no end :)
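For illustration, the kind of matcher a built-in walker ends up needing might look like the sketch below. The function `keep_file` and its parameters `min_size`, `include` and `exclude` are hypothetical names, not options of the existing script. With find and a pipe, the same filtering is done by find itself.

```python
import fnmatch
import os


def keep_file(path, min_size=0, include=("*",), exclude=()):
    """Return True if path passes the glob patterns and the size threshold."""
    name = os.path.basename(path)
    # Blacklist first: any matching exclude pattern rejects the file.
    if any(fnmatch.fnmatch(name, pat) for pat in exclude):
        return False
    # Whitelist: at least one include pattern must match.
    if not any(fnmatch.fnmatch(name, pat) for pat in include):
        return False
    try:
        return os.path.getsize(path) >= min_size
    except OSError:
        # Unreadable or vanished files are skipped rather than reported.
        return False
```

A walker could apply it as `(p for p in paths if keep_file(p, min_size=1024, exclude=("*.tmp",)))`.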

And note that this program was more or less a prototype for a GUI available here:
http://kassoulet.blogspot.fr/2010/09/jankis-duplicate-finder.html

I was tempted to do a tkinter version so I can use it on Windows and OS X, but an interactive mode is a great idea.

@kassoulet
Owner

And if you want to go further, here are some ideas I've had no time to implement yet:

  1. In addition to checking the first KB of each file, also check the last one. I'm pretty sure this removes the need to read whole files to detect partial matches.
  2. Store the file information (name, size, checksums) in a file, and use it in interactive mode. With this, the user can dynamically change the minimum size, or the minimum number of matches, without re-scanning.
  3. Add size/pattern options to the walker.
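The first idea could be sketched roughly as follows. This is an assumption-laden illustration, not code from the repository: `quick_signature` is a hypothetical name, SHA-1 is an arbitrary choice of checksum, and "KB" is taken to mean 1024 bytes.

```python
import hashlib
import os

CHUNK = 1024  # assumption: "first KB" and "last KB" mean 1024 bytes each


def quick_signature(path, chunk=CHUNK):
    """Return (size, hex digest of the first and last chunk) without a full read."""
    size = os.path.getsize(path)
    digest = hashlib.sha1()
    with open(path, "rb") as f:
        digest.update(f.read(chunk))
        if size > chunk:
            # Start the tail at max(chunk, size - chunk) so no byte is hashed twice.
            f.seek(max(chunk, size - chunk))
            digest.update(f.read(chunk))
    return size, digest.hexdigest()
```

Files with different signatures cannot be identical, so this cheaply rules out files that only share a common beginning. Files with equal signatures are only candidates: two files of the same size that differ somewhere in the middle get the same signature, so a full comparison is still needed before anything is deleted. The (name, size, signature) tuples are also the kind of record the second idea would store.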

@sesas
Author

sesas commented Apr 2, 2012

Thank you for the advice and feedback.

I do agree that on Linux using find is the best (and fastest) option, but does 'find' work on Windows?

The main reason I wrote the walker and the deleting function is that my housemate needs to get rid of duplicate files. She uses Windows and doesn't want to spend a lot of time on it, or pay money to solve such a problem.

I would like to add that, for now, there is no way to delete files except in interactive mode.

The idea to read only the first KB and the last one is pretty great. But I don't know if I will have time to implement that.
