Updated: 18/12/2016
INPUT/OUTPUT OPERATIONS:
* Parse/dump:
Read-only mode (operators: -d, -n, -i, -l, -o, -w, -a, -k, -h, -q)
This mode loads the selected binary log into process memory pages or a shared mem segment
(--shmem) and displays the data in either normal (synopsis), --batch (full, tab delimited)
or --raw (binary) format.
--nobuffer turns memory buffering off and directly reads record by record using C native
IO procedures (fread).
Using --shmem, shared memory pages are used instead of locally allocated memory - loading
there keeps log data cached in memory where other instances can access it, avoiding having
to allocate their own pages and fill them with the actual log data from the filesystem.
When the segment belonging to a log file does not exist, it is automatically created
and filled with data. If glutil detects that the log data file size differs from the segment
size, or --shmdestroy is present, it automatically destroys the segment, forcing glutil to
reallocate and reload it immediately after.
If the data log size matches the segment size and the segment exists, glutil will proceed
using the segment and its data for all operations (segment data is NOT updated).
The user must ensure the shared segment is updated promptly whenever log data on disk changes.
Even though glutil automatically updates the segment when log/segment sizes do not match,
it doesn't detect content changes;
-q <logname> --shmem --shmdestroy --loadq -vv
Load the data into the segment (-q, --shmem), force the segment to be destroyed and
re-allocated (--shmdestroy), quit right after loading, don't do anything else (--loadq)
-q <logname> --shmem --shmreload --loadq -vv
Same as above, but reload the segment contents from the log file on disk (--shmreload)
instead of destroying and re-creating the segment
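As a hedged illustration (the crontab schedule and glutil install path are assumptions),
the segment could be refreshed periodically so other instances always read current data:
*/10 * * * *  /glftpd/bin/glutil -q dirlog --shmem --shmreload --loadq > /dev/null 2>&1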
All match/hook operators apply.
Specific options:
-print <format>
Print <format> on stdout. Directives must be enclosed in {}. Field names that
can be referenced are listed above. Only fields respective to the log being
processed are valid, with the exception of generic fields, which are available
globally.
For example, assume we are processing the nukelog:
-n lom "status=0" -print '{basedir} was NUKED {mult}x by {nuker} on {?tl:time#%d %h at %H:%M}'
This returns only nuked releases (status=0):
Some.Folder-TAG was NUKED 3x by someuser on 30 Jun at 11:07
'time' field (unix timestamp) is formatted using strftime (man 3 strftime)
See 'EXTENDED DIRECTIVES' section below.
* Write:
Write mode (operator: -z <logname> [--infile=/path/file])
Used to write one or more records to a data log (specified by <logname>) from a regular
ASCII text file (see README for examples).
If --infile=<filename> is not set, stdin is expected to hold the input data.
All match/hook operators apply.
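For instance (the file path is hypothetical), records prepared in a text file can be
written either via --infile or via stdin:
./glutil -z dirlog --infile=/tmp/new_records.txt
./glutil -z dirlog < /tmp/new_records.txt
Both forms are sketches of the same operation; see WRITING RECORDS TO BINARY LOGS below
for the expected input format.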
* Rebuild:
Rebuild mode (operator: -e <logname>)
Data is loaded into memory, passed via filters/hooks, then written back to log file.
This is basically just for convenience; the same can be accomplished using --raw in
parse mode and redirecting stdout back to the log file;
For example:
-d -l: dir -regexi "Somefolder" -or -lom "size == 0 || files == 0" --raw > /glftpd/ftp-data/logs/dirlog
would produce the same result as:
-e dirlog -l: dir -regexi "Somefolder" -or -lom "size == 0 || files == 0"
These examples would filter any dirlog record containing 'Somefolder' in the directory
name (case insensitive) and any directories with size/files zero, writing the results
back to the data log. Using 'and' instead of 'or' between the match operators would
filter only entries that both contain 'Somefolder' in the directory name and have size 0.
There are however some advantages to using parsing operators with --raw, which are
not available during rebuild mode;
When in rebuild mode (-e), data must be buffered (either locally or in a shared mem
segment) in order to be processed, so the data file size is limited by your RAM capacity
(and by the --memlimit argument or the DB_MAX_SIZE macro in source, 1024MB by default).
However, while parsing (-d, -n, ..), the --nobuffer option can be used, turning buffering
off and reading record by record using fread.
Since glutil is set by default to compile with large file offsets, theoretically a
massive file could be parsed (64-bit file size, not taking filesystem limits into account).
All match/hook options apply.
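As a rough, untested sketch of that difference (the output path is arbitrary), a very
large dirlog could be filtered without buffering by combining --nobuffer and --raw in
parse mode instead of using -e:
-d --nobuffer -lom "size > 0 && files > 0" --raw > /glftpd/ftp-data/logs/dirlog.filtered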
EXTENDED DIRECTIVES:
Legend:
<name> - directive name
<field> - only a reference to the actual data, strictly
fields which are listed in DATA_STRUCTURES
<directive> - actual data field or an extended directive
<expression> - logical/relational or mathematical expression
<pattern> - string matching pattern (e.g. REGEX)
<format> - Format string (printf(3) or strftime(3)), can
also be this tool's internal format
Syntax:
{?<name>:[<..>]<field|(directive)>[<..>]}
or
{(?<name>:[<..>]<field|(<directive>[#<format>])>[<..>][#<format>])}
Options:
?c <num repeat>:<char>
Character generator. Example usage:
'{?c:3:\t}' - prints three tabs.
Escapes are:
\n New line
\t Horizontal tab
\r Carriage return
\s Space
\\ Literal backslash ('\')
?t <field|(<expression>)>#<format>
Formats data returned by <field> (if using parentheses,
a math <expression> is expected) as a readable string (GMT)
according to the <format> specification.
See 'strftime(3)' for conversion specifications (passed
after #). If no conversion specifier characters are given,
format defaults to: '%d/%h/%Y %H:%M:%S'
?tl <field|(<expression>)>#<format>
Same as '?t', only the timestamp is converted relative to
the system's timezone.
?l <directive>
Prints the length of the output string produced from the field data. Applies to
integer/float fields too, measuring their length in string form.
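For example (a small sketch following the syntax above; output depends on your dirlog):
-d -print '{?l:basedir} {basedir}'
prints the length of each base directory name, followed by the name itself.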
?m <expression>
Simplistic arithmetic and bitwise operations, compatible with all
integer fields. Floating point data types can only be added, subtracted,
multiplied and divided.
Arithmetic operators:
+ Add
- Subtract
* Multiply
/ Divide
% Modulo (integer remainder)
^ Power
~ Square root
$ Euclidean distance
Bitwise operators:
& AND
| OR
<< Bitwise left shift
>> Bitwise right shift
Example:
-d -print '{?m:size/(1024^2)}MB in {files} files, {?m:(size/files)/(1024*1024)}MB avg per file {?c:2:\t} {basedir}' -lom 'files>0' --sort size
Output:
...
356MB in 28 files, 12MB avg per file Some.Folder-TAG
...
?rs [/[<flags>]]:<directive>:<pattern>:<string|(directive)>
Before outputting/filtering the string returned by <directive>, substitute
part(s) matching regex <pattern> with <string|(directive)>.
Works on all types.
<flags> are:
i Case insensitive
g Match every instance
n Match-any-character operators don't match a newline
b Use POSIX Basic Regular Expression syntax
Replace the decimal points in the 'score' field with commas before
printing to stdout:
-a -lom "score>7.0" -print '{?rs:score:[.]:\,} {?c:1:\t}{title}'
Output:
7,7 Some Movie
...
?rd [/[<flags>]]:<directive>:<pattern>
Delete matching regex <pattern> from string returned by <directive>.
Remove any (..) tags that may be present in tvrage shownames, before
doing a regex query in tvlog for 'Some Show':
-h -print '{name}' -l: '?rd/g:name:\([ ]+|\(\)\)\\(.*\\)\([ ]+|\(\)\)' -regexi '^Some Show$'
'Some Show (2004)' is matched as 'Some Show', without having to
wipe (2004) out of the database.
?b <directive>
Strip directory and suffix from filename stored in <directive>
?d <directive>
Strip last component from file name <directive>
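For example (a sketch; assumes the dirlog 'dir' field holds a full path):
-d -print '{?d:dir} | {?b:dir}'
prints the parent path and the final path component of each record's 'dir' field.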
?p <text>
Print <text>
?P <directive>#<format>
Print result of <directive> formatted per <format> (printf(3))
?L <expression>:<directive1>:<directive2>
Evaluates <expression> (LOM syntax) and executes <directive1> if
<expression> evaluates true and <directive2> if false.
?X <directive (FILESYSTEM)>:<directive>
Resolves <directive> (current data input) and executes <directive (FILESYSTEM)>,
using the result of <directive> as the path.
One instance where this could be useful is when checking for dead links,
for example (the link needs to be absolute to be valid):
-x / -R -lom "mode=10" -print "Readable: [ {(?X:(isread):(rlink))} ] {path} -> {rlink}"
Output:
...
Readable: [ 0 ] /usr/bin/pvresize -> lvm
Readable: [ 0 ] /usr/bin/iptables -> xtables-multi
Readable: [ 1 ] /usr/bin/servertool -> /usr/lib/jvm/default-runtime/bin/servertool
Readable: [ 1 ] /usr/bin/lrelease-qt4 -> /usr/lib/qt4/bin/lrelease
...
To summarize this example: a filesystem field lookup ('isread' in this case)
is done using the path the link points to (determined by calling 'rlink').
The returned value is the result of 'isread', applied to the value returned by 'rlink'.
<directive> executes on the data currently being processed (see DATA_STRUCTURES)
<directive (FILESYSTEM)> executes on filesystem data generated from given path
(result of <directive>) as requested by the directive itself (DATA_STRUCTURES,
under FILESYSTEM)
?x <directive>:<key>@<token>
Scans file returned by <directive> line by line, tokenizes (delimiting by space and tab)
and matches the first token to <key>. When matched, the result is everything after <key>,
unless a token is specifically requested:
-x /glftpd/ftp-data/users/ -lom "depth=1" -v -print "{?x:(path):ALLUP}"
Output:
13 1285419 688 0 0 0 0 0 0 0 0 0 1118 7488426 14483 0 0 0 0 0 0 0 0 0
Request the third token of 'ALLUP' line:
-x /glftpd/ftp-data/users/ -lom "depth=1" -v -print "{?x:(path):ALLUP@3}"
Output:
688
?Q <format>
Allows for nesting of the built-in -print(f) <format>.
Example usage (format a path to a userfile, pass it to '?x' to extract user flags):
-w -print "{user}: {?x:(?Q:(\{glroot\}/ftp-data/users/\{user\})):FLAGS}"
Output:
..
someone: 13ABCDEFGHI
sometwo: 3A
..
?s <directive>:<option>
Various string based operations. Checks the return value of <directive>
based on <option>, returns 1 if true, 0 if false.
<option> can be one of:
ascii - Is an ASCII string?
alphanumeric - Is an alphanumeric ASCII string?
numeric - Is a numeric ASCII string?
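For example (a sketch; the log and field choice are arbitrary):
-a -print '{?s:title:ascii} {title}'
prints 1 or 0 depending on whether 'title' is a pure ASCII string, followed by the title.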
?u <uid>
Translates <uid> to user name using <glroot>/etc/passwd
?g <gid>
Translates <gid> to group name using <glroot>/etc/group
Hash functions:
Compute the selected message digest of the result from <directive>.
Binary must be compiled with --enable-crypto to use these directives (OpenSSL cryptographic libs).
Syntax: ?<option> <directive>
?S - SHA-1
?S224 - SHA-224
?S256 - SHA-256
?S3 - SHA-384
?S5 - SHA-512
?M4 - MD4
?M5 - MD5
?R - RIPEMD160
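For example (a sketch; requires a build configured with --enable-crypto):
-d -print '{?M5:basedir} {basedir}'
prints the MD5 digest of each base directory name next to the name itself.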
?B <directive>
Base64 encode result from <directive>.
?D <directive>
Base64 decode result from <directive>.
?h <directive>:<option>
Query the hashtable using <directive> as the key.
<option> can be one of:
c - Returns entry count
u - Gets the value of <directive>
p - Returns 1 if value has been accessed with 'u' option, 0 otherwise
?e <field>:<directive|value>
Write the result of <directive> or a static <value> to <field> of the currently processed log.
Strings are automatically converted to integers where necessary.
Mind that '{' and '}' need to be escaped with '\' whenever used inside the scope
of a directive.
Char directives:
{:t} Horizontal tab
{:n} New line
If #<format> is defined, it gets passed to the standard C printf(3) family function which outputs
the requested field (or, in the case of ?t and ?tl, strftime(3)). Make sure to enclose the entire
directive, including #<format>, in parentheses:
-d -print '{(?t:time#%D %T)}'
Most of the data fields aren't processed using printf(3) by default for performance reasons,
and therefore can't be formatted when referenced directly;
If you wish to format field data or results of a directive, use the ?P extended directive:
-d -print '{(?P:dir#Hello world - %.25s)}'
Read about '?P' above.
Extended directives can be used with -exec[v], -print[f] and string filters
(-regex[i]/-match):
-d -l: "(?tl:time#%d %h)" -match "12 Mar" -print '{basedir} was created {(?tl:time#%d %h at %H\\:%M)}'
or
-d -l: "(?tl:time#%d %h)" -match "12 Mar" --silent -execv 'echo "{basedir} was created {(?tl:time#%d %h at %H\\:%M)}"'
Prints only records created on March 12 from the dirlog, custom formatted.
EXTRACTING DATA FROM BINARY LOGS:
ANY binary log record can easily be extracted as an ASCII string and passed
to the shell using -exec, --postexec or --preexec (see --help)
Let's assume we want to get the path, number of files and directory
creation time out of dirlog, passing these three values for each record
to your script:
./glutil -d --silent -execv "/path/to/myscript.sh {dir} {files} {time}"
--/* myscript.sh: */-----------------------------------------------
#!/bin/bash
# print the three arguments passed in by glutil: {dir} {files} {time}
echo "$1" "$2" "$3"
exit 0
-------------------------------------------------------------------
Output:
[x@y bin]$ ./glutil -d --silent -execv "./myscript.sh {dir} {files} {time}"
...
/site/archive/tv-../Sky.Special... 16 1277037362
/site/archive/tv-.../The.Simpsons.Special... 22 1277037472
/site/archive/tv-dvdrip/P/Prison.Break/Prison.Break.S02.. 28 1277065247
...
Now do the same, but apply a regex filter (-regex) to dirnames
to -exec ONLY what is matched and quit after first match (--imatchq):
./glutil -d --silent -execv "./myscript.sh {dir} {files} {time}" -regex \
"\/The\.Simpsons" --imatchq
Output:
/site/archive/tv-.../The.Simpsons.Special... 22 1277037472
/site/archive/tv-.../The.Simpsons.S04E05... 21 1277137472
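A slightly more elaborate receiver script is sketched below (untested; it only assumes
the three arguments shown above and GNU date): it converts the unix timestamp to a
readable date before printing.
--/* myscript2.sh: */----------------------------------------------
#!/bin/bash
# $1 = directory path, $2 = file count, $3 = unix creation time
path="$1"
files="$2"
created="$(date -d "@$3" '+%d/%m/%Y %H:%M')"   # GNU date; BSD would use 'date -r'
echo "$path ($files files, created $created)"
exit 0
-------------------------------------------------------------------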
WRITING RECORDS TO BINARY LOGS:
ANY binary log can be built, using only properly formatted ASCII input
For this example, let's assume we want to add three new records to
the dirlog.
Write records in ASCII form to a temporary file:
--/* adatafile.tmp: */----------------------------------------------
dir /site/x264/Avengers.1080p.x264-GRP
user 101
group 101
files 49
size 9437184000
time 1377307789
status 0

dir /site/xvid/Die.Hard.DVDRIP.XViD-GRP
user 104
group 105
files 21
size 734003200
time 1377307789
status 0

dir /site/xvid/Fast.and.Furious.BRRIP.XViD-GRP
user 102
group 103
files 22
size 733103100
time 1377301789
status 1

-------------------------------------------------------------------
All records MUST be delimited by two new lines (\n\n), and the
last record must be followed by two new lines.
See MANUAL for required record fields (all must be defined).
Write info held in 'adatafile.tmp' to the dirlog:
./glutil -z dirlog < adatafile.tmp
Assuming there's no ERROR, 3 new records are appended to the data
file.
If you don't want to write to existing dirlog, use path overrides:
./glutil -z dirlog --dirlog=/path/to/dirlog.new < adatafile.tmp
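As a hedged sketch (field names are those from the example above; uid/gid 0, the site
path and GNU coreutils du/stat/find are assumptions), such a record file can also be
generated straight from the filesystem and piped into -z:
--/* gen_records.sh: */---------------------------------------------
#!/bin/bash
# emit one dirlog record in glutil's ASCII form for each release dir under /site/x264
for d in /site/x264/*/; do
    printf 'dir %s\n'   "${d%/}"
    printf 'user 0\n'
    printf 'group 0\n'
    printf 'files %s\n' "$(find "$d" -type f | wc -l)"
    printf 'size %s\n'  "$(du -sb "$d" | cut -f1)"
    printf 'time %s\n'  "$(stat -c %Y "$d")"
    printf 'status 0\n\n'
done | ./glutil -z dirlog --dirlog=/tmp/dirlog.new
-------------------------------------------------------------------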
ADDITIONAL LOG TYPES:
Besides support for glFTPd log types, glutil introduces new log
types; see the /scripts folder for working examples.
For instance, 'imdb_get.sh' scrapes iMDB info off the web and
constructs an iMDB binary data log, writing movie ratings/scores/..
and the directory path to it.
iMDB log example usage:
Say we want to do something with releases (in this case just 'echo'),
with a score lower than 5.0, but higher than 0
-a --silent -lom "score < 5.0 && score" -execv "echo \">> {title} << is below the score limit {score}/5.0\""
Display all titles with score higher than or equal to 6.0, sort by score ascending:
-a -lom "score >= 6.0" --sort asc,score -print '{score} {:t} {title}'
Display all titles with score higher than 7.0 of genre Comedy,
sorted by score ascending:
-a -lom "score > 7.0" -l: genres -regexi "Comedy" --sort asc,score -print '{score} {:t} {title} / {genres}'
FOLDERS FILE (filesystem based dirlog rebuild only):
To avoid importing junk data into dirlog, a file defined by 'du_fld' macro
in glutil.c (or at runtime with --folders argument) can be used when
running in recursive mode (-r).
Each line should contain a PATH relative to the site root (no leading or trailing /
required) and the depth to scan at, separated by a space.
Example:
MP3 2
TV-XVID 1
XVID 1
0DAY 2
APPS 1
ARCHIVE/APPS 2
ARCHIVE/TV-XVID 3
The depth value defines the exact level of a folder's directory tree at
which the tool imports folders to the dirlog, this way avoiding irrelevant
entries.
For example:
A daily/weekly sorted release dir structure usually looks like:
/mp3/0601/some-album-1-la
/mp3/0601/some-album-2-la
/mp3/0602/some-other-album-1-la
..
We want to avoid importing the dated folders themselves (/mp3/0601, /mp3/0602) into
the dirlog and import only what is inside them.
MP3 2
This would NOT import the 'mp3/0601/' and 'mp3/0602/' folders, but would
import 'some-album-1-la/', 'some-album-2-la/' and
'some-other-album-1-la/'. Folders inside the 'some-album-..' folders
are not imported.
MP3 1
This would import '0601/' and '0602/' only, scanning all content
inside them.
There's no reason you can't do this:
MP3 1
MP3 2
But in this particular case, having dated folders in there likely
poses no advantage.
But maybe you'd want to have individual CD/DVD dirs:
XVID 1
XVID 2
This does two passes, one on each directory tree level: level 1 does the
releases, level 2 the folders inside them.
Passing '--full' when recursing ignores the folders file and imports
every single folder inside gl's site root into your dirlog (scanning
it for sizes/file counts).
ACCUMULATE DATA VALUES:
The tool can accumulate (add-up) any integer/floating point field across the whole
data source or just a part of it. For example, directory sizes from the dirlog can be
added up and stored into one of the available accumulators.
Filters are respected; anything not passing through them will be ignored.
How and where a value accumulates is defined via a LOM filter hook;
directives within a LOM command scope which define the accumulation
process are not treated as filters and are ignored for filtering purposes.
Examples:
Count up folder files and sizes from all dirlog records:
-d -lom "u64glob10 += size && u64glob11 += files" -postprint 'Total size: {u64glob10} bytes, files: {u64glob11}' --silent
Increment by one for each record matched (regex and LOM filters applied):
-d -regex "a" -lom "size > 0 && files > 0" -lom "u64glob7 += 1" -postprint 'Results: {u64glob7}'