How to train the first stage? #3
Hi, did you train your text encoder the CLIP way, without latent diffusion? Or did you train it together with the diffusion model? What were the loss function and other details? Would you like to share more about the training?
I made this work by training the CLIP model directly while keeping the ViT weights frozen; you can then increase the batch size. With text capped at 64 characters, a batch size of 96 fits on a Colab P100. Quite easy!
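A minimal sketch of that recipe using the Hugging Face transformers CLIP classes. The dataloader is a placeholder and the hyperparameters are illustrative, not details from this comment:

```python
# Sketch: fine-tune CLIP's text side contrastively while freezing the ViT.
# Assumes a dataloader yielding (pixel_values, input_ids, attention_mask).
import torch
from transformers import CLIPModel

model = CLIPModel.from_pretrained("openai/clip-vit-large-patch14").cuda()

# Freeze the vision tower; only the text side (and logit scale) is trained.
for p in model.vision_model.parameters():
    p.requires_grad = False

optimizer = torch.optim.AdamW(
    (p for p in model.parameters() if p.requires_grad), lr=1e-5
)

for pixel_values, input_ids, attention_mask in dataloader:  # hypothetical loader
    out = model(
        pixel_values=pixel_values.cuda(),
        input_ids=input_ids.cuda(),
        attention_mask=attention_mask.cuda(),
        return_loss=True,  # built-in symmetric contrastive (InfoNCE) loss
    )
    out.loss.backward()
    optimizer.step()
    optimizer.zero_grad()
```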
Oh, I have trained my Chinese CLIP as well. So what I need to do is jointly train it with the latent diffusion model?
Let's just talk in Chinese. After training, my model produces reasonably good images from Chinese input; they just aren't very Chinese in style. Once you finished the first step, could you already generate decent results? Or did Chinese input give very poor results, with the semantics not understood?
I may have to retrain... Given this cross-attention mechanism, my model's dimensions probably won't line up. So you did no further fine-tuning at all, and could use the trained CLIP for Chinese generation directly?
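For context: Stable Diffusion v1's UNet cross-attention expects text embeddings matching the 768-dimensional hidden size of OpenAI CLIP ViT-L/14's text encoder, which is why an encoder with a different hidden size won't plug in directly. One conceivable (untested) workaround is a learned projection; the sketch below is hypothetical, not something proposed in this thread:

```python
# Hypothetical adapter: project a mismatched text encoder's hidden states
# to the context dimension the UNet's cross-attention was trained with.
import torch
import torch.nn as nn

class TextProjection(nn.Module):
    def __init__(self, encoder_dim: int, context_dim: int = 768):
        super().__init__()
        self.proj = nn.Linear(encoder_dim, context_dim)

    def forward(self, hidden_states: torch.Tensor) -> torch.Tensor:
        # (batch, seq_len, encoder_dim) -> (batch, seq_len, context_dim)
        return self.proj(hidden_states)

# e.g. a 1024-dim Chinese encoder feeding SD v1's 768-dim cross-attention
adapter = TextProjection(encoder_dim=1024)
context = adapter(torch.randn(2, 77, 1024))  # -> (2, 77, 768)
```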
I fine-tuned CLIP directly on ViT-L/14, i.e., exactly the first step of this Japanese author's recipe. The data wasn't much, but generating images directly from Chinese input works and the semantics are understood. For the second step I actually already have a plan; I'm just short on compute.
Then let's exchange contact info and keep in touch~ I plan to retrain CLIP to align the dimensions, but I'm currently training the huge version, so that will take a while. Later I also plan to do some fine-tuning for the second step... The datasets on my side are basically just wukong100M and zero23M.
My WeChat: g18818233178
At first I trained CLIP's text encoder, but the result was only mediocre on downstream zero-shot tasks (mainly a vocabulary problem). For generation, though, downstream quality seems hard to evaluate. I'll try aligning the dimensions later and give it a go. Thanks a lot!!
Yes, that's exactly what I did this time. But purely in terms of adapting CLIP to Chinese, splicing a Chinese BERT onto the ViT from CLIP and then fine-tuning works even better.
That said, for SD to be properly "sinicized," you still have to take the second step: jointly train the text encoder / UNet / VAE, with some of them optionally frozen.
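A rough sketch of what that second step might look like with the diffusers API, assuming the VAE is the frozen component and the text encoder and UNet are trained. The base model ID and dataloader are placeholders, not details from this thread:

```python
# Sketch: stage-two training — frozen VAE, trainable text encoder + UNet.
import torch
import torch.nn.functional as F
from diffusers import AutoencoderKL, UNet2DConditionModel, DDPMScheduler
from transformers import CLIPTextModel

repo = "CompVis/stable-diffusion-v1-4"  # placeholder base model
vae = AutoencoderKL.from_pretrained(repo, subfolder="vae").cuda().eval()
unet = UNet2DConditionModel.from_pretrained(repo, subfolder="unet").cuda()
text_encoder = CLIPTextModel.from_pretrained(repo, subfolder="text_encoder").cuda()
scheduler = DDPMScheduler.from_pretrained(repo, subfolder="scheduler")

vae.requires_grad_(False)  # the frozen component, per the comment above

params = list(unet.parameters()) + list(text_encoder.parameters())
optimizer = torch.optim.AdamW(params, lr=1e-5)

for images, input_ids in dataloader:  # hypothetical loader
    with torch.no_grad():
        # Encode to latents; 0.18215 is SD v1's latent scaling factor.
        latents = vae.encode(images.cuda()).latent_dist.sample() * 0.18215
    noise = torch.randn_like(latents)
    t = torch.randint(0, scheduler.config.num_train_timesteps,
                      (latents.shape[0],), device=latents.device)
    noisy = scheduler.add_noise(latents, noise, t)
    context = text_encoder(input_ids.cuda())[0]
    pred = unet(noisy, t, encoder_hidden_states=context).sample
    loss = F.mse_loss(pred, noise)  # standard epsilon-prediction objective
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()
```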
Emmm, at what scale did you fine-tune? I trained directly on hundreds of millions of samples, so loading a pretrained Chinese RoBERTa works much better. I recall the original encoder with its vocabulary unchanged got around 0.2-something, while Chinese RoBERTa reached 0.4-something (on ImageNet-1k with labels translated into Chinese). I also tried keeping CLIP's weights and only swapping the vocabulary; that didn't work either. Your trick may well be important (but that experiment is pretty expensive, hhh; I may have to try it on a smaller dataset).
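For reference, accuracy figures like those 0.2/0.4 numbers typically come from zero-shot classification of this shape. This is a generic sketch with placeholder class names, image path, and model; a Chinese-adapted checkpoint would replace the OpenAI one:

```python
# Sketch: zero-shot classification — similarity between an image embedding
# and embeddings of prompts built from translated class names.
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

ckpt = "openai/clip-vit-large-patch14"  # a Chinese-adapted CLIP would go here
model = CLIPModel.from_pretrained(ckpt).eval()
processor = CLIPProcessor.from_pretrained(ckpt)

class_names = ["猫", "狗", "飞机"]  # placeholder translated class names
prompts = [f"一张{name}的照片" for name in class_names]  # "a photo of a {}"
image = Image.open("example.jpg")  # placeholder evaluation image

with torch.no_grad():
    inputs = processor(text=prompts, images=image,
                       return_tensors="pt", padding=True)
    out = model(**inputs)
    pred = out.logits_per_image.argmax(dim=-1)  # most similar prompt wins
print(class_names[pred.item()])
```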
Did you train both the ViT and the text weights? I unfroze the text model's weights for training and kept the ViT frozen; otherwise 16 GB of VRAM can't handle it. The results are still decent, with tens of millions of training samples.
Could one use existing Chinese CLIP text encoder weights, e.g. https://github.com/PaddlePaddle/ERNIE/tree/ernie-kit-open-v1.0/Research/ERNIE-ViL2 (this is a bidirectional language model, not a unidirectional one)? That model's hidden states are 768-dimensional.
Probably not. The existing SD was trained against OpenAI's CLIP, so Baidu's model won't work; in the second stage the UNet, VAE, and CLIP all have to be trained in connection with one another.