Hello, @liuff19
I recently came across your project, found it very interesting, and started reading your paper. I have a few questions about some of the equations in it.
For equation (12):
$$ L_{\text{diffusion}} = \mathbb{E}_{x \sim p,\ \epsilon \sim \mathcal{N}(0, I),\ c_{\text{view}},\ c_{\text{struc}},\ t} \left[ \| \epsilon - \epsilon_{\theta}(x_t, t, c_{\text{view}}, c_{\text{struc}}) \|_2^2 \right] $$
The paper says that $x_t$ is the noise latent from the ground-truth views of the training data. Which ground-truth views are you referring to?
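To make the question concrete, here is a minimal sketch of how I currently read equation (12). It assumes a standard DDPM-style forward process in which $x_t$ is built by adding noise to the clean latent $x_0$ of the ground-truth frames. The function names, the latent shape, and the `alpha_bar_t` value are my own placeholders and are not taken from the paper or the repository, and the conditioning inputs $c_{\text{view}}$ and $c_{\text{struc}}$ are left out.

```python
import numpy as np

def add_noise(x0, eps, alpha_bar_t):
    """Assumed DDPM-style forward process: blend the clean latent with noise."""
    return np.sqrt(alpha_bar_t) * x0 + np.sqrt(1.0 - alpha_bar_t) * eps

def noise_prediction_loss(eps, eps_pred):
    """Squared L2 distance between the sampled noise and the predicted noise."""
    return float(np.sum((eps - eps_pred) ** 2))

rng = np.random.default_rng(0)
# Hypothetical latent of 32 ground-truth frames; the shape is only illustrative.
x0 = rng.standard_normal((32, 4, 8, 8))
eps = rng.standard_normal(x0.shape)           # eps ~ N(0, I)
x_t = add_noise(x0, eps, alpha_bar_t=0.5)     # noisy latent fed to the denoiser
```

If this reading is right, my question is which frames `x0` is encoded from during training.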
In Section 5.4 it says:
'For the generated frames $\{I_i\}_{i=1}^{K'}$ we denote $\hat{C}_i$ and $C_i$ the per-pixel color value for generated and ground-truth view $i$.'
What do you mean by the ground-truth view in the definition of $C_i$?
It also appears in equation (13).
For equation (14):
$$ L_{\text{conf}} = \sum_{i=1}^{K'} C_i \left( \lambda_{\text{rgb}} L_1(\hat{I}_i, I_i) + \lambda_{\text{ssim}} L_{\text{ssim}}(\hat{I}_i, I_i) + \lambda_{\text{lpips}} L_{\text{lpips}}(\hat{I}_i, I_i) \right) $$
In this equation, the loss seems to be calculated between the 32 generated frames $\hat{I}_i$ and their GT frames $I_i$.
Which GT frames are you comparing against? Are they the input sparse views, or are they frames from the training dataset video?
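For reference, this is a rough sketch of how I understand equation (14). It assumes that $C_i$ acts as a per-frame scalar weight (it may well be a per-pixel map in the paper), and it takes the SSIM and LPIPS terms as optional callables rather than implementing them. All names and the default $\lambda$ values are placeholders of mine, not values from the paper or the code.

```python
import numpy as np

def l1(pred, gt):
    """Mean absolute error between one generated frame and its GT frame."""
    return float(np.mean(np.abs(pred - gt)))

def confidence_weighted_loss(pred_frames, gt_frames, conf,
                             lam_rgb=1.0, lam_ssim=0.0, lam_lpips=0.0,
                             ssim_loss=None, lpips_loss=None):
    """Sum over frames of conf[i] times the weighted per-frame loss terms."""
    total = 0.0
    for pred, gt, c in zip(pred_frames, gt_frames, conf):
        term = lam_rgb * l1(pred, gt)
        if ssim_loss is not None:       # hypothetical callable (pred, gt) -> float
            term += lam_ssim * ssim_loss(pred, gt)
        if lpips_loss is not None:      # hypothetical callable (pred, gt) -> float
            term += lam_lpips * lpips_loss(pred, gt)
        total += c * term
    return total

# Two toy frames: every pixel differs by 1, so the per-frame L1 is 1.0.
pred_frames = np.zeros((2, 4, 4, 3))
gt_frames = np.ones((2, 4, 4, 3))
loss = confidence_weighted_loss(pred_frames, gt_frames, conf=[0.5, 2.0])
```

In this sketch, `gt_frames` is exactly the input I am unsure about.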
Thank you in advance for your time to reply to this issue.
Best regards
Frank