An error when I use “splitter.split_by_sentences_wrapper”, please help check the error #7

Open
Amen-bang opened this issue May 27, 2022 · 5 comments

@Amen-bang

When I use “splitted_from = splitter.split_by_sentences_wrapper(text1_prepared, lang_from)”, it returns a list.

But there is a conflict when inserting into SQLite. The specific error:

File "ling_test.py", line 36, in
aligner.fill_db(db_path, splitted_from, splitted_to)
File "lingtrain_aligner/aligner.py", line 498, in fill_db
db.executemany("insert into languages(key, val) values(?,?)", [("from", lang_from), ("to", lang_to)])
sqlite3.InterfaceError: Error binding parameter 1 - probably unsupported type.

@averkij
Owner

averkij commented May 27, 2022

Hi!

Please provide the text you are trying to split and the lang_code.

@freetz13

Please provide the text you are trying to split and the lang_code.

I got the same error. I followed this article, so there is no lang_code, there are just:

lang_from = "ru"
lang_to = "en"

@freetz13

aligner.fill_db(db_path, splitted_from, splitted_to)

.fill_db() has the signature (db_path, lang_from, lang_to, splitted_from=[], splitted_to=[], proxy_from=[], proxy_to=[]), so lang_from and lang_to must be passed before the sentence lists. Both are strings.
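
To illustrate why the missing language arguments surface as a binding error, here is a minimal sketch that uses only the standard sqlite3 module. The insert statement is copied from the traceback above; the table schema and the helper name insert_languages are assumptions made for the demo and are not taken from lingtrain_aligner.

```python
import sqlite3


def insert_languages(lang_from, lang_to):
    """Run the insert from the traceback against an in-memory database.

    Hypothetical helper: the table schema below is assumed for illustration.
    """
    db = sqlite3.connect(":memory:")
    try:
        db.execute("create table languages(key text, val text)")
        db.executemany(
            "insert into languages(key, val) values(?,?)",
            [("from", lang_from), ("to", lang_to)],
        )
        return db.execute(
            "select key, val from languages order by rowid"
        ).fetchall()
    finally:
        db.close()


# Language codes are strings, so sqlite3 can bind them.
rows = insert_languages("ru", "en")

# Sentence lists in the language positions cannot be bound.
# The exact exception class depends on the Python version
# (InterfaceError or ProgrammingError); both derive from sqlite3.Error.
try:
    insert_languages(["Первое предложение."], ["First sentence."])
    list_binding_failed = False
except sqlite3.Error as exc:
    list_binding_failed = True
    print(type(exc).__name__, exc)
```

With the signature quoted above, the call from the original report would presumably become aligner.fill_db(db_path, lang_from, lang_to, splitted_from, splitted_to).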

@averkij
Owner

averkij commented Jul 4, 2022

Hello! Here is a working Colab notebook:

https://colab.research.google.com/drive/1_ics0YzWg5qIZIPhA1X_Wbfg0XZzRO-p

Please, try it with your texts. Let me know in case of further errors.

@francescofeston

Hello @averkij

I am facing the same issue: TypeError: split_by_sentences_wrapper() got an unexpected keyword argument 'leave_marks'

In the following code I deliberately left the "leave_marks" parameter out of the splitted_from and splitted_to calls because the source text is already more or less preformatted. Could you please help me out? Thanks.

import os
from lingtrain_aligner import preprocessor, splitter, aligner, resolver, reader, vis_helper

text1_input = "HarryPotterSteinDerWeise.rtf"
text2_input = "HarryPotterandthe Philosopher.rtf"

with open(text1_input, "r", encoding="utf8") as input1:
    text1 = input1.readlines()

with open(text2_input, "r", encoding="utf8") as input2:
    text2 = input2.readlines()

db_path = "book.db"

lang_from = "de"
lang_to = "en"

models = ["sentence_transformer_multilingual", "sentence_transformer_multilingual_labse"]
model_name = models[0]

text1_prepared = preprocessor.mark_paragraphs(text1)
text2_prepared = preprocessor.mark_paragraphs(text2)

splitted_from = splitter.split_by_sentences_wrapper(text1_prepared , lang_from)
splitted_to = splitter.split_by_sentences_wrapper(text2_prepared , lang_to)

if os.path.isfile(db_path):
    os.unlink(db_path)

aligner.fill_db(db_path, splitted_from, splitted_to)
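
The fill_db call on the last line of the snippet above has the same shape as the one in the original report. As a rough sketch of how its positional arguments line up, the stub below copies the signature quoted earlier in this thread and inspects the binding with the standard inspect module. The stub is a stand-in, not the library function, and the sample sentences are made up.

```python
import inspect


def fill_db(db_path, lang_from, lang_to, splitted_from=[], splitted_to=[],
            proxy_from=[], proxy_to=[]):
    """Stand-in stub with the signature quoted in this thread; does nothing."""


sig = inspect.signature(fill_db)

# Hypothetical sample data standing in for the splitter output.
splitted_from = ["Satz eins."]
splitted_to = ["Sentence one."]

# The call as written in the snippet: the sentence lists land in
# the lang_from and lang_to slots.
bound_as_written = sig.bind("book.db", splitted_from, splitted_to)
bound_as_written.apply_defaults()

# The call with the language codes passed first.
bound_with_langs = sig.bind("book.db", "de", "en", splitted_from, splitted_to)

for name, value in bound_as_written.arguments.items():
    print(f"{name} = {value!r}")
```

If the real function matches the quoted signature, passing lang_from and lang_to before the sentence lists should place every argument in its intended slot.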
