Editorials

Extract Editorials from newspapers using the RSS feed provided by newspapers. Simply provide the tags which needs to be processed in a config file named "action_items.json".

Config file

Config file is a JSON file. Either pass using command line or use default file present in /config path.

Config parameters are:

"url_file"  : "path to the file where all tags related information is present",	"log_level":  "log level in python. Different log levels are DEBUG, INFO, WARNING, ERROR, CRITICAL",
"storage" : {
	"type" 		: "file or db, only one at a time",
	"path" 		: "path only applicable for file only",
	"host" 		: "hostname/ip address",
	"port" 		: "port number",
	"username" 	: "username, if username and password not given, it will try to connect without credential",
	"password" 	: "password, plaintext as of now",
	"database"	: "db name",
	"collection" 	: "collection name",
	"authdb"	: "authentication db, optional, default 'admin' will be used"
}

Action file

This file is a json file which will have all the information about the tags to process and the attributes of a tag to uniquely identify that tag. Action file will have below format.

"Newspaper title" : 
{	
	"url" : "url to the rss feed",
	"rss" : {
		"tags":["list of comma separated tags to process at rss page for each article"],
		"date_format" : "date format. Only month date and year format is to be supplied."
	},
	"page" : {
		"item" : {
			"tag" : "tag_name", 
			"attribute" : { "atttibute name" : "value" },
			"child_include" : ["comma separated tags to include within this tag to extract the content"],
			"child_exclude" : ["comma separated tags to exclude within this tag to extract the content"],
			"format" : "date format if the field is a date."
 		}
	}
}

All keys are compulsory except "child_include" , "child_exclude". and "format" has to be given if only the item is of type date.

.e.g

"The New York Time" : 
 {	
	"url" : "http://www.nytimes.com/svc/collections/v1/publish/www.nytimes.com/column/charles-m-blow/rss.xml",
	"rss" : {
		"tags":["description"],
		"date_format" : "%d %b %Y"
	},
	"page" : {
		"date" : {
			"tag" : "time", 
			"attribute" : {},
			"child_include" : ["none"],
			"format" : "%B %d %Y"
 		},
		"story" : {
			"tag" : "div",
			"attribute" : {"class":"css-53u6y8"},
			"child_include" : ["p"]
		}
	}
}

Name		Name	Last commit message	Last commit date
Latest commit History 11 Commits
config		config
src		src
.gitignore		.gitignore
LICENSE		LICENSE
README.md		README.md
requirements.txt		requirements.txt

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Editorials

Config file

Action file

About

Releases

Packages

Languages

License

Dr-N00B/Editorials

Folders and files

Latest commit

History

Repository files navigation

Editorials

Config file

Action file

About

Resources

License

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages