Kanripo Web Interface

Search

fulltext search with kwic

upon search, the following should happen:

a list of the first n items is returned, with links to the texts (or the default for n+x is: return all)
this list can be browsed and paged to the next
at the same time a breakdown of the most frequent 部類 and texts is returned to other parts of the page, this can be used to further filter etc.
- in the future, additional aspects for the texts should also be available, like dynasty, author etc.
this is realized by putting the whole result into redis:
- when doing a search, first check if it is in redis
  - redis results will have an expiry, but will all be deleted upon a new index
- if not, do the search and put the resutls in redis, including the analysis
- if yes, just get the stuff

import app.main.views
key=u"原文"
ox = app.main.views.doftsearch(key)
ux = ox.decode('utf8')
s=ux.split('\n')
#this gives a list of the text number of all matches:
b = [a.split('\t')[1].split(':')[0] for a in s if "\t" in a]

from collections import Counter
#this counts b and gives the most frequent first
c=Counter(b)
#this gets just the sections
b1 = [a.split('\t')[1][0:4] for a in s if "\t" in a]
for t, k in c.most_common(10):
  print t, k
for t, k in c1.most_common(10):
  print t, k

redis implementation

the results are stored using the search-key as key (e.g. u”原文”), in a list
- this makes it possible to access them by indes
- Q: what to do for different sort criteria?

> store this in another key with the search criteria appended, e.g. u”原文-prev” etc.
- need a typololgy of possible search keys
  - also store u”原文-texts” and u”原文-sections” for a hash of results for these parts:
u”原文-texts” will have the result of b from above
u”原文-sections” will have the results of b1 from above
- all keys will have a convenient expire attached.

import app.main.views
r=app.main.views.redis_store
key=u"原文"
ox = app.main.views.doftsearch(key)
ux = ox.decode('utf8')
s=ux.split('\n')
# we want to sort the results, since they come in unsorted!
s.sort()
subset = s[0:10]
r.rpush(key, subset)
r.lrange(key, 0, 20)

=> Upgraded to 2.8.2, also needed to update conf.

metadata

wrote a script to import the ~/$mandoku/meta/ catalog files into redis: /Users/chris/krp/mandoku/python/mandoku/mdcatalog2redis.py

gitlab api

I am using now the gitlab api with a private token to pull things in and send them to the user. This ensures that I have the latest version.

TODO:

make a proper flask extension with configuration
expose different branches, users forks etc.

templates

Got the template structure etc. ported over from krp_『道藏輯要』 for this, needed
- flask-Babel extension and initialization code,
- pybabel translation catalogs etc
  - TODO: the ja message catalog is still fuzzy for some reason: investigate and correct.
- for the moment using the bootstrap files etc. from the old version, not the flask extension, might need to update this later. [2014-09-05T20:30:39+0900]
- now ready to put it together!

catalog

count ids etc:

from app import redis_store as r
c = r.keys('zb:meta*')
a = [r.hgetall(b).keys() for b in c]
counter = Counter(itertools.chain(*a))

RESULT:

LINKS 9271 TITLE 9271 RESP 7330 DYNASTY 7362 page 1528 040314a WITLIST 9271

DZ 1528 DZ0226 HFL 1528 ‘HFL\xe7\x8f\xa0\xe4\xb8\x8a022’ = HFL珠上022 CBETA_ID 4275 T50n2046 XWDZ 1528 XWDZ06p0828 ZHDZ 1527 ZHDZ19p0143 ID 9271 ZB5a0227

dynasties:

a = [[r.hgetall(b)[k] for k in r.hgetall(b).keys() if k == 'DYNASTY'] for b in c]
c2 = Counter(itertools.chain(*a))
for c in c2.most_common(40).keys():
   print c, cn

宋 1958 明 1064 唐 1052 清 1016 元 446 隋 153 西晉 141 失譯 137 民國 101 後漢 79 劉宋 78 漢 75 吳 63 東晉 52 梁 52 姚秦 43 元魏 43 日本 42 晉 33 新羅 31 □ 25 後秦 20 陳 20 金 20 後魏 16 天親菩薩造 15 龍樹菩薩造 14 高麗 12 方廣錩整理 12 北涼 12 魏 11 世親菩薩造 11 北周 10 南唐 9 無著菩薩造 9 達照整理 8 張總整理 8 闕譯 7 遼 7 方廣錩 6

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Kanripo Web Interface

Search

fulltext search with kwic

redis implementation

metadata

gitlab api

templates

catalog

count ids etc:

dynasties:

FilesExpand file tree

roadmap.org

Latest commit

History

roadmap.org

File metadata and controls

Kanripo Web Interface

Search

fulltext search with kwic

redis implementation

metadata

gitlab api

templates

catalog

count ids etc:

dynasties: