
nytud/pseudo-anonimization


Pseudo anonimization

Usage

  • CLI
  • server

Installation for CLI

  1. Clone the repository:
git clone https://github.com/nytud/pseudo-anonimization.git
  2. Create a virtual environment:
Windows:
python -m venv .ve
Unix-like:
python3 -m venv .ve
  3. Activate the environment in your terminal:
Windows:
& .ve/Scripts/Activate.ps1
Unix-like:
source .ve/bin/activate
  4. Install the requirements:
pip install -r requirements.txt
pip install  https://huggingface.co/huspacy/hu_core_news_trf/resolve/main/hu_core_news_trf-any-py3-none-any.whl
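To confirm the installation, you can check that Python can find the packages without importing them. This is a minimal sketch that assumes `spacy` is pulled in by requirements.txt and that the wheel above installs a package named `hu_core_news_trf` (spaCy model wheels are normally importable under the model name); the README does not state either, so adjust `REQUIRED_MODULES` as needed.

```python
import importlib.util

# Modules assumed to be needed by the CLI; adjust to match requirements.txt.
REQUIRED_MODULES = ("spacy", "hu_core_news_trf")


def check_installation(modules=REQUIRED_MODULES):
    """Return a {module_name: is_installed} map without importing the modules."""
    return {name: importlib.util.find_spec(name) is not None for name in modules}


if __name__ == "__main__":
    for name, ok in check_installation().items():
        print(f"{name}: {'ok' if ok else 'MISSING'}")
```

Run it inside the activated `.ve` environment; a `MISSING` entry means the corresponding install step did not complete.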

Starting

Start the Docker container for emtsv:

docker run --rm -p5000:5000 -it mtaril/emtsv 
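Before starting the application, it can help to check that the container is listening. The sketch below assumes the application expects emtsv on `localhost:5000`, as the `-p5000:5000` mapping suggests. It only tests that a TCP connection can be opened; it does not call any emtsv API, so it cannot tell emtsv apart from another service on that port.

```python
import socket


def emtsv_reachable(host="localhost", port=5000, timeout=2.0):
    """Return True if something accepts TCP connections on host:port."""
    try:
        # create_connection raises OSError (e.g. ConnectionRefusedError) on failure
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    print("emtsv container reachable:", emtsv_reachable())
```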

Start the application with either of the two formats; the result is written to stdout:

python anonimization.py --file-input "path/to/file" --format=emagyar
python anonimization.py --file-input "path/to/file" --format=huspacy
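Because the CLI writes its result to stdout, another Python program can capture it with `subprocess`. This is a sketch, not part of the project: `build_command` and `run_anonymization` are hypothetical helpers, and it assumes it is run from the repository root with the virtual environment active, using the `--file-input` and `--format` flags exactly as shown in this README.

```python
import os
import subprocess
import sys

FORMATS = ("emagyar", "huspacy")


def build_command(input_path, fmt="emagyar", script="anonimization.py"):
    """Build the argument list for one CLI run (flags as given in this README)."""
    if fmt not in FORMATS:
        raise ValueError(f"format must be one of {FORMATS}, got {fmt!r}")
    # sys.executable is the current interpreter, i.e. the venv's python if active
    return [sys.executable, script, "--file-input", input_path, f"--format={fmt}"]


def run_anonymization(input_path, fmt="emagyar"):
    """Run the CLI and return what it writes to stdout (the anonymized text)."""
    # Force UTF-8 on the child's stdout so Hungarian characters survive the pipe,
    # and decode with the same encoding on this side.
    env = {**os.environ, "PYTHONIOENCODING": "utf-8"}
    result = subprocess.run(
        build_command(input_path, fmt),
        capture_output=True,
        encoding="utf-8",
        env=env,
        check=True,  # raise CalledProcessError on a non-zero exit code
    )
    return result.stdout
```

For example, `run_anonymization("path/to/file", "huspacy")` returns the same text that the second command above prints.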

Starting as a server

First, create a .env file based on example.env, and set the PORT and the GPU IDs in it.

docker compose up -d --build
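Client scripts can derive the server address from the same .env file. This is a minimal sketch that assumes simple `KEY=VALUE` lines (the usual .env format read by Docker Compose) and a variable named `PORT`, as mentioned above. The name of the GPU variable is not given here, so the sketch does not read it. `parse_env` and `base_url` are hypothetical helpers.

```python
def parse_env(text):
    """Parse simple KEY=VALUE lines, skipping blank lines and # comments."""
    values = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # drop surrounding quotes, e.g. PORT="8000"
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def base_url(env, host="localhost"):
    """Build the server URL from the PORT entry of a parsed .env file."""
    return f"http://{host}:{int(env['PORT'])}"
```

Typical use: `base_url(parse_env(Path(".env").read_text(encoding="utf-8")))` with `from pathlib import Path`.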

The server is then available on the port set in .env. Available endpoints:

  • /docs : Swagger-based documentation of the API
  • /anonymization : segment-based execution of the anonymization program
  • /tokenize/emagyar : only tokenizes the input
  • /tokenize/huspacy
  • /swap/emagyar
  • /swap/huspacy

Except for /docs, all endpoints require either a file input or a JSON body of the form {"text": "text to process"}.
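The following sketch sends the JSON body to one of the endpoints using only the standard library. It assumes the endpoints accept `POST` with `Content-Type: application/json`. The response format is not described in this README, so the raw body is returned as a string; the Swagger page at /docs shows the actual request and response schemas. The helper names and the example port are hypothetical.

```python
import json
import urllib.request


def build_request(base_url, endpoint, text):
    """Build a POST request carrying the {"text": ...} JSON body (method assumed)."""
    body = json.dumps({"text": text}, ensure_ascii=False).encode("utf-8")
    return urllib.request.Request(
        url=base_url.rstrip("/") + endpoint,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def call_endpoint(base_url, endpoint, text, timeout=60):
    """Send the request and return the raw response body as a string."""
    request = build_request(base_url, endpoint, text)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8")
```

For example, `call_endpoint("http://localhost:8000", "/anonymization", "Kovács Péter Budapesten lakik.")`, where `8000` stands for whatever PORT you set in .env.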

Component diagram:

@startuml
agent text
queue "morphological analysis" as morpho
database "Hungarian given names" as given
queue "generate form of pseudo anonymized name" as gen
queue NER

component emtsv
component huspacy
component NerKor
component PseudoAnonimizator as pseu

text --> pseu
pseu -right-> NER
NER --> NerKor
NerKor --> pseu
pseu -right-> morpho
morpho -- emtsv
morpho -- huspacy
pseu -right-> given : select pseudo name
pseu --> gen
gen -- emtsv
gen -- huspacy
@enduml
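The component diagram describes the following flow. The input text goes to PseudoAnonimizator. Names are detected by NER (NerKor) and analysed morphologically (emtsv or huspacy). A pseudonym is then selected from the Hungarian given names database, and its inflected form is generated with emtsv or huspacy. The toy sketch below mirrors that flow with stubbed components. The `Token` fields, the IOB-style `PER` tags, and the `generate` callable are hypothetical stand-ins, not the project's own classes. The real generation step, which re-inflects the pseudonym through emtsv or huspacy, is replaced by whatever function you pass in.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Token:
    text: str   # surface form in the input
    lemma: str  # lemma from the morphological analyser (emtsv or huspacy)
    ner: str    # NER label, e.g. "B-PER" / "I-PER" / "O" (tag set assumed)
    morph: str  # inflectional features, e.g. "Nom", "Dat" (format assumed)


# Hypothetical generator interface: (lemma, features) -> inflected surface form.
# In the real tool this role belongs to emtsv or huspacy ("generate form of ... name").
Generator = Callable[[str, str], str]


def pseudonymize(tokens: List[Token], given_names: List[str], generate: Generator) -> List[str]:
    """Replace person-name tokens with pseudonyms, consistently per original lemma."""
    mapping: Dict[str, str] = {}
    output: List[str] = []
    for tok in tokens:
        if tok.ner.endswith("PER"):
            if tok.lemma not in mapping:
                # "select pseudo name": take the next given name in order
                # (wraps around, so names repeat if the list is too short).
                mapping[tok.lemma] = given_names[len(mapping) % len(given_names)]
            # "generate form": inflect the pseudonym like the original token
            output.append(generate(mapping[tok.lemma], tok.morph))
        else:
            output.append(tok.text)
    return output
```

Keying the mapping on the lemma means that inflected occurrences of the same name (e.g. "Péter" and "Péternek") receive the same pseudonym. For simplicity, this sketch treats every PER token separately, so multi-token names are not handled as a unit.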
