memory usage #18

andrei4002 · 2014-01-08T06:01:26Z

I'm having trouble fingerprinting a two-hour MP3. From what I can see, it fills up the RAM and then the script crashes. The file is around 140 MB, and I was testing in an Ubuntu virtual machine with 5 GB of RAM and 3 GB of swap.
Any thoughts on this?

pguridi · 2014-01-08T13:44:00Z

Sounds like this is because the files are now converted to WAV in memory. SHA: 7122e11

worldveil · 2014-01-09T17:55:58Z

What needs to happen is that Dejavu should convert and fingerprint piece by piece. The fingerprinting process for a single audio track is actually embarrassingly parallel, so there's no need to tax the RAM like we are doing now. You just have to be careful that each chunk of the audio overlaps the previous one by at least

DEFAULT_WINDOW_SIZE * DEFAULT_OVERLAP_RATIO

so that the windowing process doesn't miss fingerprints on window borders.

If this is something you care about, I would gladly accept a pull request fixing it.
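A minimal sketch of the overlapping-chunk idea described above, not code from Dejavu itself. The function name `chunk_bounds` is hypothetical, and the two constants are assumed values (4096 and 0.5) that should be checked against Dejavu's fingerprint module.

```python
# Assumed values; check them against the constants in Dejavu's fingerprint module.
DEFAULT_WINDOW_SIZE = 4096
DEFAULT_OVERLAP_RATIO = 0.5


def chunk_bounds(n_samples, chunk_size, overlap):
    """Yield (start, end) sample ranges that together cover n_samples.

    Every range after the first starts `overlap` samples before the
    previous range ended, so windows on a border are seen by both chunks.
    """
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be non-negative and smaller than chunk_size")
    if n_samples <= 0:
        return
    start = 0
    while True:
        end = min(start + chunk_size, n_samples)
        yield start, end
        if end >= n_samples:
            break
        start = end - overlap


overlap = int(DEFAULT_WINDOW_SIZE * DEFAULT_OVERLAP_RATIO)
bounds = list(chunk_bounds(n_samples=10000, chunk_size=8192, overlap=overlap))
print(bounds)
```

Fingerprints that fall inside an overlapped region would likely be produced by both neighbouring chunks, so a real implementation would probably need to deduplicate them before storing.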

Wessie · 2014-01-21T18:23:24Z

Piece-by-piece fingerprinting is slightly harder to do because multiple channels are supported. An implementation would need to interleave the fingerprinting of each channel.

But from the tests I ran while trying to gain speedups, there seems to be no notable difference in accuracy between stereo and mono fingerprinting. Admittedly, I have not tested this with microphone input.
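To illustrate the channel handling mentioned above, here is a small standard-library sketch that splits one chunk of interleaved 16-bit PCM into per-channel arrays and optionally mixes them down to mono. The helper names `split_channels` and `to_mono` are hypothetical, and the sketch assumes signed 16-bit little-endian input such as ffmpeg's `s16le` format.

```python
import struct
import sys
from array import array


def split_channels(raw_bytes, n_channels=2):
    """Split interleaved 16-bit little-endian PCM bytes into one array per channel."""
    samples = array("h")  # signed 16-bit integers in native byte order
    samples.frombytes(raw_bytes)
    if sys.byteorder == "big":
        samples.byteswap()  # the input is little-endian
    if len(samples) % n_channels:
        raise ValueError("chunk does not contain a whole number of frames")
    return [samples[ch::n_channels] for ch in range(n_channels)]


def to_mono(channels):
    """Average the channels sample by sample into a single mono array."""
    n = len(channels)
    return array("h", (sum(frame) // n for frame in zip(*channels)))


raw = struct.pack("<6h", 1, -1, 2, -2, 3, -3)  # three stereo frames
left, right = split_channels(raw)
mono = to_mono([left, right])
```

Each per-channel array can then be fingerprinted on its own. If mono really is as accurate as the tests above suggest, mixing down first would halve the work per chunk.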

As for the fingerprinting in chunks, what is the required size of each chunk?

worldveil · 2014-01-31T20:12:14Z

That's just the DEFAULT_WINDOW_SIZE * DEFAULT_OVERLAP_RATIO that I mentioned. This is the amount of overlap required to keep the same fingerprinting behaviour.

I'd welcome PRs fixing this.

utilitarianexe · 2014-12-16T21:13:48Z

Sorry for the terrible formatting, but here is some code that will split an audio file into pieces that dejavu can handle. It would be much better to modify dejavu, but this should hopefully help.

import subprocess as sp
import sys

import numpy

FFMPEG_BIN = 'ffmpeg'

# read_audio_data() is defined in the full listing posted further down.

def write_audio_file(file_path, audio_array):
    '''
    Write raw 16-bit stereo PCM bytes to file_path via ffmpeg.
    Needed just for testing: I want to be able to create some
    bizarre test files for Chromaprint.
    '''
    pipe = sp.Popen([FFMPEG_BIN,
                     '-y',  # (optional) overwrite the output file if it already exists
                     '-ar', '44100',  # the input has a 44100 Hz sample rate
                     '-ac', '2',  # the input has 2 channels (stereo)
                     '-f', 's16le',  # the input is raw signed 16-bit PCM
                     '-i', '-',  # the input arrives from the pipe
                     '-vn',  # don't expect any video input
                     '-acodec', 'aac', '-strict', '-2',  # output audio codec
                     file_path],
                    stdin=sp.PIPE, stderr=sys.stderr)
    pipe.stdin.write(audio_array)
    pipe.stdin.close()  # signal end of input so ffmpeg can finish the file
    pipe.wait()

def get_audio_array(pipe, minutes):
    number_of_audio_frames = 88200 * 30 * minutes  # = 44100 frames/s * 60 s * minutes
    bytes_in_frame = 4  # 2 channels * 2 bytes per 16-bit sample
    bytes_to_read = number_of_audio_frames * bytes_in_frame
    raw_audio = pipe.stdout.read(bytes_to_read)
    if not raw_audio:
        return None, None, 'reached end of file'

    raw_audio_array = numpy.frombuffer(raw_audio, dtype="int16")
    audio_array = raw_audio_array.reshape((len(raw_audio_array) // 2, 2))
    return audio_array, raw_audio, None

def chunk_file(file_path, chunk_folder_path, max_chunk_size, file_suffix=''):
    pipe = read_audio_data(file_path)
    extention = '.m4a'  # probably a bad choice, but first I go to work
    i = 0
    while True:
        i += 1
        # max_chunk_size is the chunk length in minutes
        audio_data, raw_audio_data, error = get_audio_array(pipe, max_chunk_size)
        if error is not None:
            print(error)
            return

        chunk_file_path = chunk_folder_path + '/chunk_' + str(i) + '_' + file_suffix + extention
        print('write file ' + chunk_file_path)
        write_audio_file(chunk_file_path, raw_audio_data)

utilitarianexe · 2014-12-16T21:14:48Z

Oh, it is the chunk_file function you need to call, with max_chunk_size being an integer number of minutes.

worldveil · 2014-12-16T22:13:43Z

Sorry, I'm really having trouble following your thoughts/code. Code sections will help, but I think an end-to-end example would be better.

Can you format this all in a branch or PR?

utilitarianexe · 2014-12-16T22:58:59Z

'''
This will split audio files that are too long into sections.
There are probably better libs to do this.
'''

import subprocess as sp
import sys
from os import listdir
from os.path import isfile, join

import numpy

FFMPEG_BIN = 'ffmpeg'


def read_audio_data(file_path, offset='00:00:00'):
    '''Start ffmpeg decoding file_path to raw 16-bit PCM on its stdout.'''
    command = [FFMPEG_BIN,
               '-ss', offset,  # start decoding at this position
               '-i', file_path,
               '-f', 's16le',
               '-acodec', 'pcm_s16le',
               '-ar', '44100',  # output will have 44100 Hz
               '-ac', '2',  # stereo (set to '1' for mono)
               '-']
    pipe = sp.Popen(command, stdout=sp.PIPE, bufsize=10**8)
    return pipe


def write_audio_file(file_path, audio_array):
    '''
    Write raw 16-bit stereo PCM bytes to file_path via ffmpeg.
    Needed just for testing: I want to be able to create some
    bizarre test files for Chromaprint.
    '''
    pipe = sp.Popen([FFMPEG_BIN,
                     '-y',  # (optional) overwrite the output file if it already exists
                     '-ar', '44100',  # the input has a 44100 Hz sample rate
                     '-ac', '2',  # the input has 2 channels (stereo)
                     '-f', 's16le',  # the input is raw signed 16-bit PCM
                     '-i', '-',  # the input arrives from the pipe
                     '-vn',  # don't expect any video input
                     '-acodec', 'aac', '-strict', '-2',  # output audio codec
                     file_path],
                    stdin=sp.PIPE, stderr=sys.stderr)
    pipe.stdin.write(audio_array)
    pipe.stdin.close()  # signal end of input so ffmpeg can finish the file
    pipe.wait()


def get_audio_array(pipe, minutes):
    number_of_audio_frames = 88200 * 30 * minutes  # = 44100 frames/s * 60 s * minutes
    bytes_in_frame = 4  # 2 channels * 2 bytes per 16-bit sample
    bytes_to_read = number_of_audio_frames * bytes_in_frame
    raw_audio = pipe.stdout.read(bytes_to_read)
    if not raw_audio:
        return None, None, 'reached end of file'

    raw_audio_array = numpy.frombuffer(raw_audio, dtype="int16")
    audio_array = raw_audio_array.reshape((len(raw_audio_array) // 2, 2))
    return audio_array, raw_audio, None


def chunk_file(file_path, chunk_folder_path, max_chunk_size, file_suffix='', extention='.m4a'):
    pipe = read_audio_data(file_path)
    i = 0
    while True:
        i += 1
        # max_chunk_size is the chunk length in minutes
        audio_data, raw_audio_data, error = get_audio_array(pipe, max_chunk_size)
        if error is not None:
            print(error)
            return

        chunk_file_path = chunk_folder_path + '/chunk_' + str(i) + '_' + file_suffix + extention
        print('write file ' + chunk_file_path)
        write_audio_file(chunk_file_path, raw_audio_data)


def chunk_folder(folder_path, chunked_folder_path, max_chunk_size):
    '''
    Go through all files in folder_path and turn each one into a bunch of
    files of length max_chunk_size (given in minutes).
    '''
    file_names = [file_name for file_name in listdir(folder_path)
                  if isfile(join(folder_path, file_name))]
    for file_name in file_names:
        file_path = join(folder_path, file_name)
        chunk_file(file_path, chunked_folder_path, max_chunk_size, file_suffix=file_name)

utilitarianexe · 2014-12-16T23:00:38Z

Basically, call
chunk_folder(path_to_long_audio_files,path_to_output_chunked_files,minutes_for_each_chunk_like_5)

Now you can use dejavu on long files. But it would be better if dejavu did this internally.
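As a possible alternative to piping raw PCM through Python, ffmpeg's segment muxer can cut a file into fixed-length pieces on its own. The sketch below only builds and runs such a command. The helper names `build_segment_command` and `split_with_ffmpeg` are hypothetical, and it assumes an `ffmpeg` binary on the PATH and an output container that accepts the copied audio stream.

```python
import subprocess


def build_segment_command(src_path, out_pattern, chunk_seconds, ffmpeg_bin="ffmpeg"):
    """Build an ffmpeg command that cuts src_path into fixed-length pieces."""
    return [
        ffmpeg_bin,
        "-i", src_path,
        "-f", "segment",                      # use the segment muxer
        "-segment_time", str(chunk_seconds),  # target length of each piece
        "-c", "copy",                         # copy the stream without re-encoding
        out_pattern,                          # e.g. "chunks/song_%03d.mp3"
    ]


def split_with_ffmpeg(src_path, out_pattern, chunk_seconds):
    """Run the command; raises CalledProcessError if ffmpeg fails."""
    subprocess.check_call(build_segment_command(src_path, out_pattern, chunk_seconds))


command = build_segment_command("long.mp3", "chunks/long_%03d.mp3", 300)
```

Because the stream is copied rather than re-encoded, cuts land on packet boundaries, so piece lengths are approximate. This approach also does not produce overlapping chunks, so fingerprints near the borders may be missed.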

CommonLoon102 · 2014-12-30T02:52:51Z

The problem is even worse when it tries to fingerprint several files in parallel.
I've hardcoded 1 into the script to avoid running out of memory:

pool = multiprocessing.Pool(1)

From the command line, it isn't possible to pass the maximum number of parallel processes as a parameter, so it defaults to 4.

djv.fingerprint_directory(directory, ["." + extension], 4)

The program does a very good job besides this! It just can't really handle long audio files.
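One way to avoid hardcoding the process count would be to read it from the command line. The sketch below is an assumption about how that could look, not Dejavu's actual CLI: the `--workers` flag and the `parse_args` helper are hypothetical, and only the `fingerprint_directory` call shape is taken from the comment above.

```python
import argparse


def parse_args(argv=None):
    """Parse a directory, a file extension, and an optional worker count."""
    parser = argparse.ArgumentParser(description="Fingerprint a directory of audio files")
    parser.add_argument("directory")
    parser.add_argument("extension")
    parser.add_argument("--workers", type=int, default=1,
                        help="number of parallel fingerprinting processes")
    args = parser.parse_args(argv)
    if args.workers < 1:
        parser.error("--workers must be at least 1")
    return args


args = parse_args(["./mp3", "mp3", "--workers", "2"])
# With a configured Dejavu instance `djv`, the call quoted above would become:
# djv.fingerprint_directory(args.directory, ["." + args.extension], args.workers)
```

Defaulting to a single worker trades speed for a smaller peak memory footprint, which seems the safer choice on machines where whole files are decoded into RAM.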

thesunlover · 2015-03-23T15:05:59Z

I created a pull request for that.
If you find any other improvements to the code, go for it.
#75

thesunlover · 2015-11-01T09:36:39Z

Hi, guys.

Can someone guide me on how to calculate the starting offset of the chunks?
It's obvious that I missed this when I created the PRs.
Please help.

pimpmypixel · 2015-11-19T17:03:36Z

Hi guys

Did you have any luck getting this running on low-memory setups like a Raspberry Pi?

arunganesan · 2015-11-19T17:32:12Z

I can run it on an RPi 2 with no problem. I didn't try timing it, but it seems to run pretty fast.

pimpmypixel · 2015-11-19T17:41:49Z

@arunganesan what commit are you using?

arunganesan · 2015-11-19T17:47:56Z

Hm, I'm not sure. I just cloned the latest commit a few days ago. I am only fingerprinting short sounds, 30 seconds at most.

thesunlover · 2015-11-19T18:45:16Z

You can check my repository:
https://github.com/IskrenStanislavov/dejavu/tree/split-fingerprinting
It has been tested with 4-hour audio files.
The only missing piece is detecting offset_seconds relative to the original audio.

thesunlover · 2015-12-09T08:15:52Z

@worldveil
Hello, Will.
Is this the direction I should follow to calculate the proper offset_seconds?
3 min * 60 sec * 44100 samples per sec * 2 channels = 15,876,000 samples
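The arithmetic above can be checked with a few lines. Note that 15,876,000 counts interleaved samples across both channels. A position in time depends on frames (one sample per channel), which is half that number for stereo. The names below are illustrative, and the 16-bit sample width is an assumption carried over from the chunking script earlier in the thread.

```python
SAMPLE_RATE = 44100   # frames per second (one frame = one sample per channel)
CHANNELS = 2
BYTES_PER_SAMPLE = 2  # assuming 16-bit PCM


def chunk_sizes(minutes):
    """Return the size of a chunk of the given length in several units."""
    seconds = minutes * 60
    frames = seconds * SAMPLE_RATE
    samples = frames * CHANNELS
    return {
        "seconds": seconds,
        "frames": frames,
        "samples": samples,
        "bytes": samples * BYTES_PER_SAMPLE,
    }


sizes = chunk_sizes(3)
print(sizes)
```

For a three-minute stereo chunk this gives 7,938,000 frames, 15,876,000 interleaved samples, and 31,752,000 bytes.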

thesunlover · 2016-01-04T19:14:51Z

I would like to complete this pull request, but I definitely need your help with two things.

1. I want to know how to properly calculate the offset as a plain value rather than in matrix form, because I don't have experience with the tools used in fingerprinting.
2. When splitting into files, do I need to set the time limits of the files as described in
memory usage #18 (comment)
and how do I calculate them in seconds?

Help would be highly appreciated.
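One possible way to approach the offset question, offered as a sketch under stated assumptions rather than as Dejavu's actual method: if fingerprint offsets are counted in spectrogram hops, a chunk's start time can be converted to the same unit and added to every offset found inside that chunk. The constants are assumed values and the function names are hypothetical.

```python
# Assumed values; check them against Dejavu's fingerprint module.
DEFAULT_FS = 44100
DEFAULT_WINDOW_SIZE = 4096
DEFAULT_OVERLAP_RATIO = 0.5


def chunk_start_seconds(chunk_index, chunk_seconds, overlap_seconds=0.0):
    """Start time of a zero-based chunk within the original file."""
    return chunk_index * (chunk_seconds - overlap_seconds)


def seconds_to_offset(seconds):
    """Convert seconds to spectrogram hops (window size minus overlapped samples)."""
    hop = DEFAULT_WINDOW_SIZE - int(DEFAULT_WINDOW_SIZE * DEFAULT_OVERLAP_RATIO)
    return int(round(seconds * DEFAULT_FS / hop))


def global_offset(chunk_index, local_offset, chunk_seconds, overlap_seconds=0.0):
    """Shift an offset measured inside one chunk to the original file's timeline."""
    start = chunk_start_seconds(chunk_index, chunk_seconds, overlap_seconds)
    return seconds_to_offset(start) + local_offset


# Third chunk (index 2) of 180-second chunks, fingerprint at local offset 100.
shifted = global_offset(2, 100, 180)
```

Since 180 seconds is not a whole number of hops, the conversion rounds. Choosing a chunk length that is an exact multiple of the hop size would probably avoid that rounding error.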

shemul · 2016-05-04T19:42:53Z

Can dejavu fingerprint and search a 2-hour MP3 correctly yet? What did I miss?

sheffieldnikki · 2016-08-03T20:23:30Z

Any news on fixing this by merging split-fingerprinting? dejavu is almost unusable on low-memory machines. Even the example mp3 files give out-of-memory errors when trying to fingerprint on a 512MB machine :( (and relying on swap is a disaster on this machine, since its only storage is a memory card). Thanks

Edit: got something working by fingerprinting my songs on a machine with lots more memory, and then simply copying the MySQL innodb database files over to the small machine. Seems to be running the recognition fine :)

ajahongir · 2016-10-21T15:00:08Z

I had this issue on 2 Ubuntu droplets. Both had 512 MB of memory. Then I upgraded one of them to 2 GB, and the problem seems to have disappeared!

thesunlover mentioned this issue Aug 5, 2015

Split fingerprinting #87

Open

thesunlover mentioned this issue Jan 4, 2016

pydub operating direct to disk vs in memory? jiaaro/pydub#51

Closed

thesunlover mentioned this issue Feb 1, 2016

Memory Error #95

Closed
