Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Split big files #75

Closed
wants to merge 15 commits into from
Closed

Split big files #75

wants to merge 15 commits into from

Conversation

thesunlover
Copy link
Contributor

added support for huge files
check the long_test.py and play with minutes & processes number to fit in the RAM available

Edit:
Procedure of the process:

  1. Splits the large audio file to fingerprintable smaller pieces
  2. Fingerprints them one by one.
  3. Saves them as a single audio in the DataBase.

@thesunlover thesunlover mentioned this pull request Mar 23, 2015
@worldveil
Copy link
Owner

So to be clear, this splits up large files over a certain length (ie, 3 minutes), and fingerprints each as a separate "song", yes?

@thesunlover
Copy link
Contributor Author

updated the description of the PR

@@ -16,6 +32,9 @@ class Dejavu(object):
OFFSET = 'offset'
OFFSET_SECS = 'offset_seconds'

SPLIT_DIR = "split_dir"
OVERWRITE_WHEN_SPLITING = 1
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

overwrites what?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

its about overwriting the temp files that were split with ffmpeg.
probably wouldnt be needed if we delete the temporary directory.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

probably should have coded it as "always overwrite" with no conditions

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

renamed the constant to OVERWRITE_TEMP_FILES_WHEN_SPLITING

@worldveil
Copy link
Owner

Also the binary files (large mp3s) are too large and I'd rather not increase the size of the repo with those.

Can you instead give a public link (could even download with urllib2 in example.py) to some copyright-free music?

@thesunlover
Copy link
Contributor Author

ok, will do it later in the evening
Edit:
got rid of the mp3 files and created a modification in the test-file so it generates a new file with the content of the existing mp3 files.
modified the test so that it uses the generated file (58minutes file is generated)

@thesunlover
Copy link
Contributor Author

removed the non-copyright files and added auto-creation of the long file with "ffmpeg concat"
added treeremove of the completed long file
please review once again

@thesunlover
Copy link
Contributor Author

do we need to test with other formats like ogg & flv?

@worldveil
Copy link
Owner

If pydub handles them, I think it should be fine.

Will review this sometime this weekend I hope.

OVERWRITE_TEMP_FILES_WHEN_SPLITING
song_name_for_the_split
@thesunlover
Copy link
Contributor Author

I have not thought about using 1 minute limit and splitting the existing files.

If I set the maximum audio length for straight fingerprinting to be 1 minute and use all the available CPU cores it might be faster to fingerprint even short songs. - full CPU usage and less amount of memory usage at the same time.

What do you say, should i test that?

@thesunlover
Copy link
Contributor Author

in the last 2 commits i moved a few arguments to be properties of Dejavu.
Should I move them in the config file?

@worldveil
Copy link
Owner

will review this week - apologies for the delay!

@worldveil
Copy link
Owner

OK, I'm pulling this for review right now.

First things first, we need to remove the large binary files from the git history. You removed them from current version, but they are still in history (.git folder is 89MB).

Second, yes the arguments for fingerprinting and splitting large files should be in the config. And we should document those options in README.md as well.

Many thanks for your patience in getting back to you!

@thesunlover
Copy link
Contributor Author

I think it would be easier for me to
start a new branch,
copy the changes in there
close this pull request
and reopen it with the new branch

is that ok?

@worldveil
Copy link
Owner

sounds fine to me.

@worldveil
Copy link
Owner

were you able to create a new PR with this? would gladly merge.

@worldveil worldveil closed this Aug 3, 2015
@worldveil worldveil reopened this Aug 3, 2015
@thesunlover
Copy link
Contributor Author

Hello, worldveil
I hope I have enough time in the evening to do it at home

@thesunlover
Copy link
Contributor Author

reposted the PR in here:
#87

@thesunlover
Copy link
Contributor Author

@worldveil would you please review the new PR #87 and give advise how to properly calculate the offset_seconds for the parts that follow the first one ?

@NathanielCustom
Copy link

@thesunlover I left a comment in PR #87 on how I got offset_seconds working.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants