Updated: 18/12/2016
INPUT/OUTPUT OPERATIONS:
* Parse/dump:
Read-only mode (operators: -d, -n, -i, -l, -o, -w, -a, -k, -h, -q)
This mode loads the selected binary log into process memory pages or a shared mem segment
(--shmem) and displays the data in either normal (synopsis), --batch (full, tab delimited)
or --raw (binary) format.
--nobuffer turns memory buffering off and directly reads record by record using C native
IO procedures (fread).
Using --shmem, shared memory pages are used instead of locally allocated memory - loading
there keeps log data cached in memory where other instances can access it, avoiding having
to allocate their own pages and fill them with the actual log data from the filesystem.
When the segment belonging to a log file does not exist, it is automatically created
and filled with data. If glutil detects that the log data file size differs from the segment
size, or --shmdestroy is present, it automatically destroys the segment, forcing glutil to
reallocate and reload it immediately after.
If the data log size matches the segment size and the segment exists, glutil will proceed
using the segment and its data for all operations (segment data is NOT updated).
The user must ensure the shared segment is updated promptly whenever log data on disk changes.
Even though glutil automatically updates the segment when log/segment sizes do not match,
it doesn't detect content changes;
-q <logname> --shmem --shmdestroy --loadq -vv
Load the data into the segment (-q, --shmem), force the segment to be destroyed and
re-allocated (--shmdestroy), quit right after loading, don't do anything else (--loadq)
-q <logname> --shmem --shmreload --loadq -vv
Same as above, but reload the segment contents from the log file on disk (--shmreload)
instead of destroying and re-creating the segment
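As a hedged illustration (the crontab schedule and glutil install path are assumptions),
the segment could be refreshed periodically so other instances always read current data:
*/10 * * * *  /glftpd/bin/glutil -q dirlog --shmem --shmreload --loadq > /dev/null 2>&1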
All match/hook operators apply.
Specific options:
-print <format>
Print <format> on stdout. Directives must be enclosed in {}. Field names that
can be referenced are listed above. Only fields respective to the log being
processed are valid, with the exception of generic fields, which are available
globally.
For example, assume we are processing the nukelog:
-n lom "status=0" -print '{basedir} was NUKED {mult}x by {nuker} on {?tl:time#%d %h at %H:%M}'
This returns only nuked releases (status=0):
Some.Folder-TAG was NUKED 3x by someuser on 30 Jun at 11:07
'time' field (unix timestamp) is formatted using strftime (man 3 strftime)
See 'EXTENDED DIRECTIVES' section below.
* Write:
Write mode (operator: -z <logname> [--infile=/path/file])
Used to write one or more records to a data log (specified by <logname>) from a regular
ASCII text file (see README for examples).
If --infile=<filename> is not set, stdin is expected to hold the input data.
All match/hook operators apply.
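For instance (the file path is hypothetical), records prepared in a text file can be
written either via --infile or via stdin:
./glutil -z dirlog --infile=/tmp/new_records.txt
./glutil -z dirlog < /tmp/new_records.txt
Both forms are sketches of the same operation; see WRITING RECORDS TO BINARY LOGS below
for the expected input format.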
* Rebuild:
Rebuild mode (operator: -e <logname>)
Data is loaded into memory, passed via filters/hooks, then written back to log file.
This is basically just for convenience; the same can be accomplished using --raw in
parse mode and redirecting stdout back to the log file;
For example:
-d -l: dir -regexi "Somefolder" -or -lom "size == 0 || files == 0" --raw > /glftpd/ftp-data/logs/dirlog
would produce the same result as:
-e dirlog -l: dir -regexi "Somefolder" -or -lom "size == 0 || files == 0"
These examples would filter any dirlog record containing 'Somefolder' in the directory
name (case insensitive) and any directories with size/files zero, writing the results
back to the data log. Using 'and' instead of 'or' between the match operators would
filter only entries that both contain 'Somefolder' in the directory name and have size 0.
There are however some advantages to using parsing operators with --raw, which are
not available during rebuild mode;
When in rebuild mode (-e), data must be buffered (either locally or in a shared mem
segment) in order to be processed, so the data file size is limited by your RAM capacity
(and by the --memlimit argument or the DB_MAX_SIZE macro in source, 1024MB by default).
However, while parsing (-d, -n, ..), the --nobuffer option can be used, turning buffering
off and reading record by record using fread.
Since glutil is set by default to compile with large file offsets, theoretically a
massive file could be parsed (64-bit file size, not taking filesystem limits into account).
All match/hook options apply.
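As a rough, untested sketch of that difference (the output path is arbitrary), a very
large dirlog could be filtered without buffering by combining --nobuffer and --raw in
parse mode instead of using -e:
-d --nobuffer -lom "size > 0 && files > 0" --raw > /glftpd/ftp-data/logs/dirlog.filtered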
EXTENDED DIRECTIVES:
Legend:
<name> - directive name
<field> - only a reference to the actual data, strictly
fields which are listed in DATA_STRUCTURES
<directive> - actual data field or an extended directive
<expression> - logical/relational or mathematical expression
<pattern> - string matching pattern (e.g. REGEX)
<format> - Format string (printf(3) or strftime(3)), can
also be this tool's internal format
Syntax:
{?<name>:[<..>]<field|(directive)>[<..>]}
or
{(?<name>:[<..>]<field|(<directive>[#<format>])>[<..>][#<format>])}
Options:
?c <num repeat>:<char>
Character generator. Example usage:
'{?c:3:\t}' - prints three tabs.
Escapes are:
\n New line
\t Horizontal tab
\r Carriage return
\s Space
\\ Literal backslash ('\')
?t <field|(<expression>)>#<format>
Formats data returned by <field> (if using parentheses,
a math <expression> is expected) as a readable string (GMT)
according to the <format> specification.
See 'strftime(3)' for conversion specifications (passed
after #). If no conversion specifier characters are given,
format defaults to: '%d/%h/%Y %H:%M:%S'
?tl <field|(<expression>)>#<format>
Same as '?t', only the timestamp is converted relative to
the system's timezone.
?l <directive>
Prints the length of the output string produced from the field data. Applies to
integer/float fields too, measuring their length in string form.
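For example (a small sketch following the syntax above; output depends on your dirlog):
-d -print '{?l:basedir} {basedir}'
prints the length of each base directory name, followed by the name itself.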
?m <expression>
Simplistic arithmetic and bitwise operations, compatible with all
integer fields. Floating point data types can only be added, subtracted,
multiplied and divided.
Arithmetic operators:
+ Add
- Subtract
* Multiply
/ Divide
% Modulo (integer remainder)
^ Power
~ Square root
$ Euclidean distance
Bitwise operators:
& AND
| OR
<< Bitwise left shift
>> Bitwise right shift
Example:
-d -print '{?m:size/(1024^2)}MB in {files} files, {?m:(size/files)/(1024*1024)}MB avg per file {?c:2:\t} {basedir}' -lom 'files>0' --sort size
Output:
...
356MB in 28 files, 12MB avg per file Some.Folder-TAG
...
?rs [/[<flags>]]:<directive>:<pattern>:<string|(directive)>
Before outputting/filtering the string returned by <directive>, substitute
part(s) matching regex <pattern> with <string|(directive)>.
Works on all types.
<flags> are:
i Case insensitive
g Match every instance
n Match-any-character operators don't match a newline
b Use POSIX Basic Regular Expression syntax
Replace the decimal points in the 'score' field with commas before
printing to stdout:
-a -lom "score>7.0" -print '{?rs:score:[.]:\,} {?c:1:\t}{title}'
Output:
7,7 Some Movie
...
?rd [/[<flags>]]:<directive>:<pattern>
Delete matching regex <pattern> from string returned by <directive>.
Remove any (..) tags that may be present in tvrage shownames, before
doing a regex query in tvlog for 'Some Show':
-h -print '{name}' -l: '?rd/g:name:\([ ]+|\(\)\)\\(.*\\)\([ ]+|\(\)\)' -regexi '^Some Show$'
'Some Show (2004)' is matched as 'Some Show', without having to
wipe (2004) out of the database.
?b <directive>
Strip directory and suffix from filename stored in <directive>
?d <directive>
Strip last component from file name <directive>
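For example (a sketch; assumes the dirlog 'dir' field holds a full path):
-d -print '{?d:dir} | {?b:dir}'
prints the parent path and the final path component of each record's 'dir' field.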
?p <text>
Print <text>
?P <directive>#<format>
Print result of <directive> formatted per <format> (printf(3))
?L <expression>:<directive1>:<directive2>
Evaluates <expression> (LOM syntax) and executes <directive1> if
<expression> evaluates true and <directive2> if false.
?X <directive (FILESYSTEM)>:<directive>
Resolves <directive> (current data input) and executes <directive (FILESYSTEM)>,
using the result of <directive> as the path.
One instance where this could be useful is when checking for dead links,
for example (the link needs to be absolute to be valid):
-x / -R -lom "mode=10" -print "Readable: [ {(?X:(isread):(rlink))} ] {path} -> {rlink}"
Output:
...
Readable: [ 0 ] /usr/bin/pvresize -> lvm
Readable: [ 0 ] /usr/bin/iptables -> xtables-multi
Readable: [ 1 ] /usr/bin/servertool -> /usr/lib/jvm/default-runtime/bin/servertool
Readable: [ 1 ] /usr/bin/lrelease-qt4 -> /usr/lib/qt4/bin/lrelease
...
To summarize this example: a filesystem field lookup ('isread' in this case)
is done using the path the link points to (determined by calling 'rlink').
The returned value is the result of 'isread', applied to the value returned by 'rlink'.
<directive> executes on the data currently being processed (see DATA_STRUCTURES)
<directive (FILESYSTEM)> executes on filesystem data generated from given path
(result of <directive>) as requested by the directive itself (DATA_STRUCTURES,
under FILESYSTEM)
?x <directive>:<key>@<token>
Scans file returned by <directive> line by line, tokenizes (delimiting by space and tab)
and matches the first token to <key>. When matched, the result is everything after <key>,
unless a token is specifically requested:
-x /glftpd/ftp-data/users/ -lom "depth=1" -v -print "{?x:(path):ALLUP}"
Output:
13 1285419 688 0 0 0 0 0 0 0 0 0 1118 7488426 14483 0 0 0 0 0 0 0 0 0
Request the third token of 'ALLUP' line:
-x /glftpd/ftp-data/users/ -lom "depth=1" -v -print "{?x:(path):ALLUP@3}"
Output:
688
?Q <format>
Allows for nesting of the built-in -print(f) <format>.
Example usage (format a path to a userfile, pass it to '?x' to extract user flags):
-w -print "{user}: {?x:(?Q:(\{glroot\}/ftp-data/users/\{user\})):FLAGS}"
Output:
..
someone: 13ABCDEFGHI
sometwo: 3A
..
?s <directive>:<option>
Various string based operations. Checks the return value of <directive>
based on <option>, returns 1 if true, 0 if false.
<option> can be one of:
ascii - Is an ASCII string?
alphanumeric - Is an alphanumeric ASCII string?
numeric - Is a numeric ASCII string?
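For example (a sketch; the log and field choice are arbitrary):
-a -print '{?s:title:ascii} {title}'
prints 1 or 0 depending on whether 'title' is a pure ASCII string, followed by the title.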
?u <uid>
Translates <uid> to user name using <glroot>/etc/passwd
?g <gid>
Translates <gid> to group name using <glroot>/etc/group
Hash functions:
Compute the selected message digest of the result from <directive>.
Binary must be compiled with --enable-crypto to use these directives (OpenSSL cryptographic libs).
Syntax: ?<option> <directive>
?S - SHA-1
?S224 - SHA-224
?S256 - SHA-256
?S3 - SHA-384
?S5 - SHA-512
?M4 - MD4
?M5 - MD5
?R - RIPEMD160
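For example (a sketch; requires a build configured with --enable-crypto):
-d -print '{?M5:basedir} {basedir}'
prints the MD5 digest of each base directory name next to the name itself.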
?B <directive>
Base64 encode result from <directive>.
?D <directive>
Base64 decode result from <directive>.
?h <directive>:<option>
Query the hashtable using <directive> as the key.
<option> can be one of:
c - Returns entry count
u - Gets the value of <directive>
p - Returns 1 if value has been accessed with 'u' option, 0 otherwise
?e <field>:<directive|value>
Write the result of <directive> or a static <value> to <field> of the currently processed log.
Strings are automatically converted to integers where necessary.
Mind that '{' and '}' need to be escaped with '\' whenever used inside the scope
of a directive.
Char directives:
{:t} Horizontal tab
{:n} New line
If #<format> is defined, it gets passed to the standard C printf(3) family function which outputs
the requested field (or, in the case of ?t and ?tl, strftime(3)). Make sure to enclose the entire
directive, including #<format>, in parentheses:
-d -print '{(?t:time#%D %T)}'
Most of the data fields aren't processed using printf(3) by default for performance reasons,
and therefore can't be formatted when referenced directly;
If you wish to format field data or results of a directive, use the ?P extended directive:
-d -print '{(?P:dir#Hello world - %.25s)}'
Read about '?P' above.
Extended directives can be used with -exec[v], -print[f] and string filters
(-regex[i]/-match):
-d -l: "(?tl:time#%d %h)" -match "12 Mar" -print '{basedir} was created {(?tl:time#%d %h at %H\\:%M)}'
or
-d -l: "(?tl:time#%d %h)" -match "12 Mar" --silent -execv 'echo "{basedir} was created {(?tl:time#%d %h at %H\\:%M)}"'
Prints only records created on March 12 from the dirlog, custom formatted.
EXTRACTING DATA FROM BINARY LOGS:
ANY binary log record can easily be extracted as an ASCII string and passed
to the shell using -exec, --postexec or --preexec (see --help)
Let's assume we want to get the path, number of files and directory
creation time out of dirlog, passing these three values for each record
to your script:
./glutil -d --silent -execv "/path/to/myscript.sh {dir} {files} {time}"
--/* myscript.sh: */-----------------------------------------------
#!/bin/bash
# print the three arguments passed in by glutil: {dir} {files} {time}
echo "$1" "$2" "$3"
exit 0
-------------------------------------------------------------------
Output:
[x@y bin]$ ./glutil -d --silent -execv "./myscript.sh {dir} {files} {time}"
...
/site/archive/tv-../Sky.Special... 16 1277037362
/site/archive/tv-.../The.Simpsons.Special... 22 1277037472
/site/archive/tv-dvdrip/P/Prison.Break/Prison.Break.S02.. 28 1277065247
...
Now do the same, but apply a regex filter (-regex) to dirnames
to -exec ONLY what is matched and quit after first match (--imatchq):
./glutil -d --silent -execv "./myscript.sh {dir} {files} {time}" -regex \
"\/The\.Simpsons" --imatchq
Output:
/site/archive/tv-.../The.Simpsons.Special... 22 1277037472
/site/archive/tv-.../The.Simpsons.S04E05... 21 1277137472
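A slightly more elaborate receiver script is sketched below (untested; it only assumes
the three arguments shown above and GNU date): it converts the unix timestamp to a
readable date before printing.
--/* myscript2.sh: */----------------------------------------------
#!/bin/bash
# $1 = directory path, $2 = file count, $3 = unix creation time
path="$1"
files="$2"
created="$(date -d "@$3" '+%d/%m/%Y %H:%M')"   # GNU date; BSD would use 'date -r'
echo "$path ($files files, created $created)"
exit 0
-------------------------------------------------------------------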
WRITING RECORDS TO BINARY LOGS:
ANY binary log can be built, using only properly formatted ASCII input
For this example, let's assume we want to add three new records to
the dirlog.
Write records in ASCII form to a temporary file:
--/* adatafile.tmp: */----------------------------------------------
dir /site/x264/Avengers.1080p.x264-GRP
user 101
group 101
files 49
size 9437184000
time 1377307789
status 0

dir /site/xvid/Die.Hard.DVDRIP.XViD-GRP
user 104
group 105
files 21
size 734003200
time 1377307789
status 0

dir /site/xvid/Fast.and.Furious.BRRIP.XViD-GRP
user 102
group 103
files 22
size 733103100
time 1377301789
status 1

-------------------------------------------------------------------
All records MUST be delimited by two new lines (\n\n), and the
last record must be followed by two new lines.
See MANUAL for required record fields (all must be defined).
Write info held in 'adatafile.tmp' to the dirlog:
./glutil -z dirlog < adatafile.tmp
Assuming there's no ERROR, 3 new records are appended to the data
file.
If you don't want to write to existing dirlog, use path overrides:
./glutil -z dirlog --dirlog=/path/to/dirlog.new < adatafile.tmp
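As a hedged sketch (field names are those from the example above; uid/gid 0, the site
path and GNU coreutils du/stat/find are assumptions), such a record file can also be
generated straight from the filesystem and piped into -z:
--/* gen_records.sh: */---------------------------------------------
#!/bin/bash
# emit one dirlog record in glutil's ASCII form for each release dir under /site/x264
for d in /site/x264/*/; do
    printf 'dir %s\n'   "${d%/}"
    printf 'user 0\n'
    printf 'group 0\n'
    printf 'files %s\n' "$(find "$d" -type f | wc -l)"
    printf 'size %s\n'  "$(du -sb "$d" | cut -f1)"
    printf 'time %s\n'  "$(stat -c %Y "$d")"
    printf 'status 0\n\n'
done | ./glutil -z dirlog --dirlog=/tmp/dirlog.new
-------------------------------------------------------------------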
ADDITIONAL LOG TYPES:
Besides support for glFTPd log types, glutil introduces new log
types; see the /scripts folder for working examples.
For instance, 'imdb_get.sh' scrapes iMDB info off the web and
constructs an iMDB binary data log, writing movie ratings/scores/..
and the directory path to it.
iMDB log example usage:
Say we want to do something with releases (in this case just 'echo'),
with a score lower than 5.0, but higher than 0
-a --silent -lom "score < 5.0 && score" -execv "echo \">> {title} << is below the score limit {score}/5.0\""
Display all titles with score higher than or equal to 6.0, sort by score ascending:
-a -lom "score >= 6.0" --sort asc,score -print '{score} {:t} {title}'
Display all titles with score higher than 7.0 of genre Comedy,
sorted by score ascending:
-a -lom "score > 7.0" -l: genres -regexi "Comedy" --sort asc,score -print '{score} {:t} {title} / {genres}'
FOLDERS FILE (filesystem based dirlog rebuild only):
To avoid importing junk data into dirlog, a file defined by 'du_fld' macro
in glutil.c (or at runtime with --folders argument) can be used when
running in recursive mode (-r).
Each line should contain a PATH relative to the site root (no leading or trailing /
required) and the depth to scan at, separated by a space.
Example:
MP3 2
TV-XVID 1
XVID 1
0DAY 2
APPS 1
ARCHIVE/APPS 2
ARCHIVE/TV-XVID 3
The depth value defines the exact level of a folder's directory tree at
which the tool imports folders to the dirlog, this way avoiding irrelevant
entries.
For example:
A daily/weekly sorted release dir structure usually looks like:
/mp3/0601/some-album-1-la
/mp3/0601/some-album-2-la
/mp3/0602/some-other-album-1-la
..
We want to avoid importing the dated folders themselves (/mp3/0601, /mp3/0602) into
the dirlog and import only what is inside them.
MP3 2
This would NOT import the 'mp3/0601/' and 'mp3/0602/' folders, but would
import 'some-album-1-la/', 'some-album-2-la/' and
'some-other-album-1-la/'. Folders inside the 'some-album-..' folders
are not imported.
MP3 1
This would import '0601/' and '0602/' only, scanning all content
inside them.
There's no reason you can't do this:
MP3 1
MP3 2
But in this particular case, having dated folders in there likely
poses no advantage.
But maybe you'd want to have individual CD/DVD dirs:
XVID 1
XVID 2
This does two passes, one on each directory tree level: level 1 does the
releases, level 2 the folders inside them.
Passing '--full' when recursing ignores the folders file and imports
every single folder inside gl's site root into your dirlog (scanning
it for sizes/file counts).
ACCUMULATE DATA VALUES:
The tool can accumulate (add-up) any integer/floating point field across the whole
data source or just a part of it. For example, directory sizes from the dirlog can be
added up and stored into one of the available accumulators.
Filters are respected; anything not passing through them will be ignored.
How and where a value accumulates is defined via a LOM filter hook;
directives within a LOM command scope which define the accumulation
process are not treated as filters and are ignored for filtering purposes.
Examples:
Count up folder files and sizes from all dirlog records:
-d -lom "u64glob10 += size && u64glob11 += files" -postprint 'Total size: {u64glob10} bytes, files: {u64glob11}' --silent
Increment by one for each record matched (regex and LOM filters applied):
-d -regex "a" -lom "size > 0 && files > 0" -lom "u64glob7 += 1" -postprint 'Results: {u64glob7}'