An error when I use “splitter.split_by_sentences_wrapper”, please help check the error #7

Amen-bang · 2022-05-27T04:02:32Z

When I call “splitted_from = splitter.split_by_sentences_wrapper(text1_prepared, lang_from)”, it returns a list.

But I see a conflict when inserting into SQLite. The specific error:

File "ling_test.py", line 36, in
aligner.fill_db(db_path, splitted_from, splitted_to)
File "lingtrain_aligner/aligner.py", line 498, in fill_db
db.executemany("insert into languages(key, val) values(?,?)", [("from", lang_from), ("to", lang_to)])
sqlite3.InterfaceError: Error binding parameter 1 - probably unsupported type.

averkij · 2022-05-27T17:49:16Z

Hi!

Please provide the text you are trying to split and the lang_code.

freetz13 · 2022-06-22T08:02:10Z

> Please, provide the text you trying to split and the lang_code.

I got the same error. I followed this article, so there is no lang_code; there are just:

lang_from = "ru"
lang_to = "en"

freetz13 · 2022-06-22T14:45:51Z

> aligner.fill_db(db_path, splitted_from, splitted_to)

.fill_db() has the signature (db_path, lang_from, lang_to, splitted_from=[], splitted_to=[], proxy_from=[], proxy_to=[]), so pass lang_from and lang_to as well; these are strings.
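
The traceback above is consistent with this: with only three positional arguments, the sentence lists land in the lang_from and lang_to slots, and sqlite3 cannot bind a list as a query parameter. Below is a minimal sketch of that mechanism. Note that fill_db_sketch is a hypothetical stand-in, not the library's code; it only mirrors the parameter order quoted above and the insert statement from the traceback. The exact exception class may differ between Python versions, so the sketch catches the sqlite3.Error base class.

```python
import sqlite3


def fill_db_sketch(db_path, lang_from, lang_to, splitted_from=(), splitted_to=()):
    """Hypothetical stand-in that mirrors only the parameter order of fill_db."""
    conn = sqlite3.connect(db_path)
    try:
        conn.execute("create table if not exists languages(key text, val text)")
        # Same statement as in the traceback: both values must be plain strings.
        conn.executemany(
            "insert into languages(key, val) values(?,?)",
            [("from", lang_from), ("to", lang_to)],
        )
        return conn.execute("select key, val from languages order by key").fetchall()
    finally:
        conn.close()


splitted_from = ["First sentence.", "Second sentence."]
splitted_to = ["Erster Satz.", "Zweiter Satz."]

# Wrong: the lists are taken as lang_from and lang_to, so binding fails.
try:
    fill_db_sketch(":memory:", splitted_from, splitted_to)
    binding_failed = False
except sqlite3.Error:
    binding_failed = True

# Right: language codes first, then the sentence lists.
rows = fill_db_sketch(":memory:", "ru", "en", splitted_from, splitted_to)
```

With the real library, the corresponding call would presumably be aligner.fill_db(db_path, lang_from, lang_to, splitted_from, splitted_to), going by the signature quoted above.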

averkij · 2022-07-04T08:49:40Z

Hello! Here is a working Colab:

https://colab.research.google.com/drive/1_ics0YzWg5qIZIPhA1X_Wbfg0XZzRO-p

Please, try it with your texts. Let me know in case of further errors.

francescofeston · 2022-10-22T08:47:14Z

Hello @averkij

I am facing the same issue: TypeError: split_by_sentences_wrapper() got an unexpected keyword argument 'leave_marks'

In the following code I deliberately left out the parameter "leave_marks" from the calls that set the splitted_from and splitted_to variables, because the source text is already more or less preformatted. Could you please help me out? Thanks

import os
from lingtrain_aligner import preprocessor, splitter, aligner, resolver, reader, vis_helper

text1_input = "HarryPotterSteinDerWeise.rtf"
text2_input = "HarryPotterandthe Philosopher.rtf"

with open(text1_input, "r", encoding="utf8") as input1:
    text1 = input1.readlines()

with open(text2_input, "r", encoding="utf8") as input2:
    text2 = input2.readlines()

db_path = "book.db"

lang_from = "de"
lang_to = "en"

models = ["sentence_transformer_multilingual", "sentence_transformer_multilingual_labse"]
model_name = models[0]

text1_prepared = preprocessor.mark_paragraphs(text1)
text2_prepared = preprocessor.mark_paragraphs(text2)

splitted_from = splitter.split_by_sentences_wrapper(text1_prepared , lang_from)
splitted_to = splitter.split_by_sentences_wrapper(text2_prepared , lang_to)

if os.path.isfile(db_path):
    os.unlink(db_path)

aligner.fill_db(db_path, splitted_from, splitted_to)
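
An "unexpected keyword argument" TypeError usually means the function being called does not declare that parameter, for example because the installed version differs from the one the calling code was written for. One way to check this is with inspect.signature. The sketch below uses two hypothetical stand-in functions (split_old and split_new are not the library's code) to show the check; in practice one would pass the real splitter.split_by_sentences_wrapper instead.

```python
import inspect


def accepts_keyword(func, name):
    """Return True if func can be called with the keyword argument `name`."""
    params = inspect.signature(func).parameters
    keyword_kinds = (
        inspect.Parameter.POSITIONAL_OR_KEYWORD,
        inspect.Parameter.KEYWORD_ONLY,
    )
    if name in params and params[name].kind in keyword_kinds:
        return True
    # A **kwargs parameter accepts any keyword.
    return any(p.kind is inspect.Parameter.VAR_KEYWORD for p in params.values())


# Hypothetical stand-ins: one without and one with a `leave_marks` parameter.
def split_old(lines, langcode):
    return list(lines)


def split_new(lines, langcode, leave_marks=False):
    return list(lines)


old_ok = accepts_keyword(split_old, "leave_marks")
new_ok = accepts_keyword(split_new, "leave_marks")
```

If the check returns False for the installed function, the keyword is being passed by some other code path than the two calls shown above.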
