Use grep to extract from file for testing with PyFunceble #219
spirillen
started this conversation in
Show and tell
-
Okay, so you don't want the If we implement this it should be with a new argument.
-
Touches #218.
-
Is this still an open thing, @spirillen?
-
This example is taken from one of my own use cases, where I would like to extract domains from EasyList and then test them against already added (known) and therefore blocked items. This creates a tested output of the records that need to be added to my own RPZ lists.
The first step is to determine the regex string to be used for extracting the domains from easylist/easylist_thirdparty.txt, which are marked by ||domain.name.tld^ or ||domain.name.tld^$third-party
This is done with a regex string. A bit of explanation for the different parts of it:
\|\| : as the | by itself means "or", we need to add the backslash \ to make it a literal pipe |
.* : a wildcard in regex. The dot . matches any single character and the star * means any number of occurrences (including none).
\. : this time we tell regex that the dot (otherwise a wildcard) is literally a dot, as it is used for splitting the domain levels.
\^ : in regex a ^ means "beginning of line" or "starts with". We need to make it a literal with a backslash, since the ^ must be present in the line and is not meant as a regex function here.
($|\$third-party) : a group containing an OR (the |) that describes what may follow the ^. The first alternative, $, means end of line in regex. The second alternative, \$third-party, is the other ending of the lines we want to extract; together with the preceding \^ it matches the literal ^$third-party
You can see the above code in action on the following two online examples
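For a quick offline check of the escaping rules described above, here is a minimal sketch. It assumes a grep that supports -E (GNU or BSD grep); the sample strings such as cloudfrontXnet are made up for illustration and do not come from EasyList.

```shell
# Each check feeds one sample string to grep -E and counts matching lines.
# '|| true' keeps the script going when grep finds nothing (exit status 1).

# Escaped pipes match the two literal pipes at the start of an adblock rule.
literal_pipes=$(printf '%s\n' '||domain.name.tld^' | grep -cE '\|\|' || true)

# An unescaped dot is a wildcard, so it also accepts a non-dot character.
loose_dot=$(printf '%s\n' 'cloudfrontXnet' | grep -cE 'cloudfront.net' || true)

# An escaped dot only accepts a real dot, so the made-up string is rejected.
strict_dot=$(printf '%s\n' 'cloudfrontXnet' | grep -cE 'cloudfront\.net' || true)

# An escaped caret matches the literal "^" instead of anchoring to line start.
literal_caret=$(printf '%s\n' 'tld^' | grep -cE 'tld\^' || true)

echo "pipes=$literal_pipes loose=$loose_dot strict=$strict_dot caret=$literal_caret"
```

The strict_dot count is the only one that should come out as 0, which is why every dot that belongs to the domain name should be escaped.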
Now that we have developed our regex string, we will use it in combination with grep.
The command to extract the cloudfront.net spam domains from the file is:
$ grep -E '\|\|.*\.cloudfront\.net\^($|\$third-party)' easylist/easylist_thirdparty.txt
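To see what such a grep call keeps and what it drops without downloading EasyList, here is a small sketch run against a throwaway file. The file content is an assumption made for this example: only the d3rp5jatom3eyn entry appears in this post, the other lines are invented.

```shell
# Throwaway sample standing in for easylist/easylist_thirdparty.txt.
# The "d1example" and "ads.example.com" rules and the comment line are made up.
sample=$(mktemp)
cat > "$sample" <<'EOF'
||d3rp5jatom3eyn.cloudfront.net^
||d1example.cloudfront.net^$third-party
||ads.example.com^$third-party
! a comment line mentioning cloudfront.net
EOF

# Same pattern as in the post: literal "||", anything, ".cloudfront.net",
# a literal "^", then either end of line or "$third-party".
matches=$(grep -E '\|\|.*\.cloudfront\.net\^($|\$third-party)' "$sample")

printf '%s\n' "$matches"
rm -f "$sample"
```

Only the two cloudfront.net rules are printed. Note that grep prints the whole matching line, including the leading || and the trailing ^ part.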
The -E in grep means extended regex.
Now we finally pass the grep + regex to the pyfunceble command
$ pyfunceble --adblock --database-type mariadb -d $(grep -iE '\|\|.*\.cloudfront\.net\^($|\$third-party)' easylist/easylist_thirdparty.txt)
Notice that we now use the -d (domain) argument for our records and not the -f (file) argument, as we used grep to extract from the file.
In this example we have bypassed the usage of the --filter option, as it would have given us unwanted results like ||d3rp5jatom3eyn.cloudfront.net^$domain=~my.na
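The ending group in the regex is what keeps a rule like that out of the grep result. A minimal sketch of the difference, using the rule exactly as quoted in this post; it only illustrates the grep side and makes no claim about how --filter works internally.

```shell
# Rule quoted in the post as an unwanted result (single quotes keep "$" literal).
rule='||d3rp5jatom3eyn.cloudfront.net^$domain=~my.na'

# Without the ending group, the rule is accepted.
without_group=$(printf '%s\n' "$rule" | grep -cE '\|\|.*\.cloudfront\.net\^' || true)

# With ($|\$third-party) it is rejected, because "^" is followed by "$domain".
with_group=$(printf '%s\n' "$rule" | grep -cE '\|\|.*\.cloudfront\.net\^($|\$third-party)' || true)

echo "without_group=$without_group with_group=$with_group"
```

This prints without_group=1 with_group=0.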
That's it. Have fun; I hope you found more uses for pyfunceble and alternative ways of using it.