upon search, the following should happen:
- a list of the first n items is returned, with links to the texts (or the default for n+x is: return all)
- this list can be browsed and paged to the next
- at the same time a breakdown of the most frequent 部類 and texts
is returned to other parts of the page, this can be used to further filter etc.
- in the future, additional aspects for the texts should also be available, like dynasty, author etc.
- this is realized by putting the whole result into redis:
- when doing a search, first check if it is in redis
- redis results will have an expiry, but will all be deleted upon a new index
- if not, do the search and put the resutls in redis, including the analysis
- if yes, just get the stuff
- when doing a search, first check if it is in redis
import app.main.views
key=u"原文"
ox = app.main.views.doftsearch(key)
ux = ox.decode('utf8')
s=ux.split('\n')
#this gives a list of the text number of all matches:
b = [a.split('\t')[1].split(':')[0] for a in s if "\t" in a]
from collections import Counter
#this counts b and gives the most frequent first
c=Counter(b)
#this gets just the sections
b1 = [a.split('\t')[1][0:4] for a in s if "\t" in a]
for t, k in c.most_common(10):
print t, k
for t, k in c1.most_common(10):
print t, k
- the results are stored using the search-key as key (e.g. u”原文”), in a list
- this makes it possible to access them by indes
- Q: what to do for different sort criteria?
- > store this in another key with the search criteria appended, e.g. u”原文-prev” etc.
- need a typololgy of possible search keys
- also store u”原文-texts” and u”原文-sections” for a hash of results for these parts:
- need a typololgy of possible search keys
- u”原文-texts” will have the result of b from above
- u”原文-sections” will have the results of b1 from above
- all keys will have a convenient expire attached.
import app.main.views
r=app.main.views.redis_store
key=u"原文"
ox = app.main.views.doftsearch(key)
ux = ox.decode('utf8')
s=ux.split('\n')
# we want to sort the results, since they come in unsorted!
s.sort()
subset = s[0:10]
r.rpush(key, subset)
r.lrange(key, 0, 20)=> Upgraded to 2.8.2, also needed to update conf.
wrote a script to import the ~/$mandoku/meta/ catalog files into redis:
/Users/chris/krp/mandoku/python/mandoku/mdcatalog2redis.py
I am using now the gitlab api with a private token to pull things in and send them to the user. This ensures that I have the latest version.
TODO:
- make a proper flask extension with configuration
- expose different branches, users forks etc.
- Got the template structure etc. ported over from krp_『道藏輯要』
for this, needed
- flask-Babel extension and initialization code,
- pybabel translation catalogs etc
- TODO: the ja message catalog is still fuzzy for some reason: investigate and correct.
- for the moment using the bootstrap files etc. from the old version, not the flask extension, might need to update this later. [2014-09-05T20:30:39+0900]
- now ready to put it together!
from app import redis_store as r
c = r.keys('zb:meta*')
a = [r.hgetall(b).keys() for b in c]
counter = Counter(itertools.chain(*a))RESULT:
LINKS 9271 TITLE 9271 RESP 7330 DYNASTY 7362 page 1528 040314a WITLIST 9271
DZ 1528 DZ0226 HFL 1528 ‘HFL\xe7\x8f\xa0\xe4\xb8\x8a022’ = HFL珠上022 CBETA_ID 4275 T50n2046 XWDZ 1528 XWDZ06p0828 ZHDZ 1527 ZHDZ19p0143 ID 9271 ZB5a0227
a = [[r.hgetall(b)[k] for k in r.hgetall(b).keys() if k == 'DYNASTY'] for b in c] c2 = Counter(itertools.chain(*a)) for c in c2.most_common(40).keys(): print c, cn
宋 1958 明 1064 唐 1052 清 1016 元 446 隋 153 西晉 141 失譯 137 民國 101 後漢 79 劉宋 78 漢 75 吳 63 東晉 52 梁 52 姚秦 43 元魏 43 日本 42 晉 33 新羅 31 □ 25 後秦 20 陳 20 金 20 後魏 16 天親菩薩造 15 龍樹菩薩造 14 高麗 12 方廣錩整理 12 北涼 12 魏 11 世親菩薩造 11 北周 10 南唐 9 無著菩薩造 9 達照整理 8 張總整理 8 闕譯 7 遼 7 方廣錩 6