Mixture of Gaussians-VAE (MoG-VAE)

Posted at 2025-02-12

1. Introduction

Thank you for reading!
I have recently been studying generative shape methods for point clouds.
As part of this study, I read the paper “VAE with VampPrior”.
It made me want to compare the performance of MoG-VAE with that of the standard VAE.
I also wrote this article in English to practice and improve my English.

2. Differences between VAE and MoG-VAE

VAE (Variational Autoencoder)

In VAE, the prior distribution $ p(z) $ is a simple Gaussian distribution.

$$
p(z) = \mathcal{N}(0, I) \tag{1}
$$

  • The latent space follows a standard normal distribution.
  • This makes the latent space continuous and smooth but less flexible.

MoG-VAE (Mixture of Gaussians VAE)

In MoG-VAE, the prior distribution $ p(z) $ is a Mixture of Gaussians (MoG).

$$
p(z) = \sum_{k=1}^{K} \pi_k \mathcal{N}(z | \mu_k, \Sigma_k) \tag{2}
$$

  • $ K $ is the number of Gaussian components.

  • $ \pi_k $ are mixture weights ($ \sum_{k=1}^{K} \pi_k = 1 $).

  • $ \mu_k $ and $ \Sigma_k $ are the mean and covariance of each Gaussian component.

  • This makes the latent space more flexible and diverse (a minimal implementation sketch follows this list).
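To make Eq. (2) concrete, here is a minimal PyTorch sketch of a learnable MoG prior. The class name `MoGPrior` and the initialization choices are my own assumptions for illustration, not the exact code used in this study.

```python
import math

import torch
import torch.nn as nn

class MoGPrior(nn.Module):
    """Learnable Mixture-of-Gaussians prior p(z), Eq. (2). Illustrative sketch."""

    def __init__(self, latent_dim: int, n_components: int):
        super().__init__()
        # Unnormalized logits alpha_k; softmax gives the weights pi_k (Eq. 6).
        self.alpha = nn.Parameter(torch.zeros(n_components))
        # Per-component means mu_k and diagonal log-variances log sigma_k^2.
        self.mu = nn.Parameter(0.5 * torch.randn(n_components, latent_dim))
        self.log_var = nn.Parameter(torch.zeros(n_components, latent_dim))

    def log_prob(self, z: torch.Tensor) -> torch.Tensor:
        """log p(z) for z of shape (batch, latent_dim)."""
        z = z.unsqueeze(1)                                 # (batch, 1, D)
        log_pi = torch.log_softmax(self.alpha, dim=0)      # (K,)
        # Diagonal-Gaussian log-density of each component, shape (batch, K).
        log_n = -0.5 * (self.log_var
                        + (z - self.mu) ** 2 / self.log_var.exp()
                        + math.log(2 * math.pi)).sum(dim=-1)
        return torch.logsumexp(log_pi + log_n, dim=1)      # (batch,)
```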

Key Difference in Latent Space

  • VAE: Single Gaussian → Limited capacity to capture complex structures.
  • MoG-VAE: Mixture of Gaussians → Better at capturing multimodal and complex latent distributions.

3. KL Divergence in VAE and MoG-VAE

In both VAE and MoG-VAE, the KL divergence term in the loss function measures how close the approximate posterior $ q(z|x) $ is to the prior distribution $ p(z) $. However, the calculation differs due to the difference in the prior distributions.

3-1. VAE (Standard Gaussian Prior)

The prior $ p(z) $ is a standard Gaussian distribution $ \mathcal{N}(0, I) $.
Thus, the KL divergence has a closed-form solution:

$$
\text{KL}(q(z|x) \parallel p(z)) = -\frac{1}{2} \sum_{i=1}^{z} \left( 1 + \log(\sigma_i^2) - \mu_i^2 - \sigma_i^2 \right) \tag{3}
$$

  • $z$ is the latent dimension.
  • $ \mu_i $ and $ \sigma_i $ are the mean and standard deviation of $ q(z|x) $.
  • This term penalizes large deviations of $ q(z|x) $ from the standard Gaussian.
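As a sketch, Eq. (3) translates directly into a few lines of PyTorch (the function name and the batch averaging are my own choices):

```python
import torch

def kl_standard_normal(mu: torch.Tensor, log_var: torch.Tensor) -> torch.Tensor:
    """Closed-form KL(q(z|x) || N(0, I)) of Eq. (3).

    mu, log_var: (batch, latent_dim) encoder outputs; returns the batch mean.
    """
    kl = -0.5 * torch.sum(1 + log_var - mu.pow(2) - log_var.exp(), dim=1)
    return kl.mean()
```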

3-2. MoG-VAE (Mixture of Gaussians Prior)

For MoG-VAE, the prior $ p(z) $ is a Mixture of Gaussians (MoG):

$$
p(z) = \sum_{k=1}^{K} \pi_k \mathcal{N}(z | \mu_k, \Sigma_k) \tag{4}
$$

Unlike the standard Gaussian case, the KL divergence to a mixture prior has no exact closed form. In this study, it is approximated by weighting the per-component Gaussian KL terms by the mixture weights:

$$
\text{KL}(q(z|x) \parallel p(z)) \approx \frac{1}{2} \sum_{k=1}^{K} \pi_k \left( \mu_k^2 + \sigma_k^2 - \log \sigma_k^2 - 1 \right) \tag{5}
$$

  • $ \mu_k $ and $ \sigma_k^2 $ are the mean and variance of the $ k $-th Gaussian component.
  • The mixture weights $ \pi_k $ are computed using the Softmax function:

$$
\pi_k = \frac{\exp(\alpha_k)}{\sum_{j=1}^{K} \exp(\alpha_j)} \tag{6}
$$


  • Monte Carlo sampling can be used to approximate the KL divergence in cases where a direct analytical solution is not available (a small sketch follows this list).
  • Softmax normalization ensures that the mixture weights $ \pi_k $ sum to 1.
  • This approach makes the latent space more flexible and expressive compared to VAE, allowing the model to capture more complex data distributions.
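The Monte Carlo estimator mentioned above can be sketched as follows. It assumes a diagonal-Gaussian posterior and any prior exposing a `log_prob` method, such as the hypothetical `MoGPrior` from Section 2:

```python
import math

import torch

def kl_monte_carlo(mu, log_var, prior, n_samples: int = 1):
    """Monte Carlo estimate of KL(q(z|x) || p(z)) for a mixture prior.

    mu, log_var: (batch, latent_dim) encoder outputs.
    prior: any object with a log_prob(z) method, e.g. the MoGPrior sketch.
    """
    std = (0.5 * log_var).exp()
    kl = 0.0
    for _ in range(n_samples):
        z = mu + std * torch.randn_like(std)   # reparameterized sample from q
        # log q(z|x) of a diagonal Gaussian, summed over latent dimensions.
        log_q = -0.5 * (log_var + (z - mu) ** 2 / log_var.exp()
                        + math.log(2 * math.pi)).sum(dim=1)
        kl = kl + (log_q - prior.log_prob(z))
    return (kl / n_samples).mean()
```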


4. Datasets for evaluation

  • I used the chair dataset from ModelNet.
  • The number of points in each point cloud is 5,000.

[Figure: sample chair point clouds from the ModelNet dataset]

5. Training Settings

5-1. VAE Settings

  • Latent dimension $z$: 3
  • Learning rate: 1.0e-5
  • Optimizer: Adam
  • Total loss: formula (7)

$$
L_\text{total} = \frac{1}{N} \sum_{i=1}^{N} \| x_i - x_i^\text{reconst} \|^2 - \frac{1}{2} \sum_{i=1}^{z} \left( 1 + \log(\sigma_i^2) - \mu_i^2 - \sigma_i^2 \right) \tag{7}
$$

5-2. MoG-VAE Settings

  • Number of Gaussian components $K$: 2
  • Latent dimension $z$: 3
  • Learning rate: 1.0e-5
  • Optimizer: Adam
  • Total loss: formula (8)

$$
L_\text{total} = \frac{1}{N} \sum_{i=1}^{N} \| x_i - x_i^\text{reconst} \|^2 + \frac{1}{2} \sum_{k=1}^{K} \pi_k \left( \mu_k^2 + \sigma_k^2 - \log \sigma_k^2 - 1 \right) \tag{8}
$$

In this study, I chose three latent dimensions because they make the latent space easier to interpret. Based on my experience, three latent variables are enough to generate shapes. Therefore, I prioritized interpretability over using higher dimensions.
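As a sketch of how the two terms of formula (7) combine in code (assuming the model returns `x_reconst`, `mu`, and `log_var`; the helper name is mine):

```python
import torch

def total_loss(x, x_reconst, mu, log_var):
    """VAE total loss of formula (7): MSE reconstruction + Gaussian KL."""
    # Reconstruction term: mean squared error over points.
    recon = torch.mean((x - x_reconst) ** 2)
    # KL term: the closed form of Eq. (3), averaged over the batch.
    kl = -0.5 * torch.sum(1 + log_var - mu.pow(2) - log_var.exp(), dim=1).mean()
    return recon + kl
```

For MoG-VAE, the KL term is replaced by the weighted approximation of formula (8) or the Monte Carlo estimate from Section 3-2.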

6. Network Architecture (Shared by Both VAEs)

I used PointNet (point-wise convolution and max pooling) for the encoder and transposed convolution for the decoder.

[Figure: encoder and decoder architecture diagram]

6-1. Encoder

INPUT (N × 5000) → Conv1d (N × 64) → Conv1d (N × 128) → Conv1d (N × 1024)  
→ Adaptive Max Pooling → Linear (1024 → 512) → Linear (512 → 256) → Linear (256 → 9)  
→ Linear (9 → z) for μ (mean) and log σ² (log variance) 
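Here is a minimal PyTorch sketch of this encoder pipeline. The layer sizes follow the list above; the 3-channel xyz input and the ReLU activations are my assumptions, not details stated in the article:

```python
import torch
import torch.nn as nn

class PointNetEncoder(nn.Module):
    """Sketch of the PointNet-style encoder from Section 6-1."""

    def __init__(self, latent_dim: int = 3):
        super().__init__()
        # Point-wise (kernel size 1) convolutions over the 5000 points.
        self.conv = nn.Sequential(
            nn.Conv1d(3, 64, 1), nn.ReLU(),
            nn.Conv1d(64, 128, 1), nn.ReLU(),
            nn.Conv1d(128, 1024, 1), nn.ReLU(),
        )
        self.pool = nn.AdaptiveMaxPool1d(1)   # global max over points
        self.fc = nn.Sequential(
            nn.Linear(1024, 512), nn.ReLU(),
            nn.Linear(512, 256), nn.ReLU(),
            nn.Linear(256, 9), nn.ReLU(),
        )
        self.fc_mu = nn.Linear(9, latent_dim)       # mu head
        self.fc_log_var = nn.Linear(9, latent_dim)  # log sigma^2 head

    def forward(self, x):                            # x: (batch, 3, 5000)
        h = self.pool(self.conv(x)).squeeze(-1)      # (batch, 1024)
        h = self.fc(h)                               # (batch, 9)
        return self.fc_mu(h), self.fc_log_var(h)
```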

6-2. Decoder

INPUT (z) → Linear (z → 1024) → Linear (1024 → 512)  
→ ConvTranspose1d (512 → 1024) → ConvTranspose1d (1024 → 2048)  
→ Linear (2048 → 5000)  
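And a matching decoder sketch. The article does not state how the channel and length axes are arranged around the transposed convolutions, so the reshapes below are my own guess at one runnable interpretation:

```python
import torch
import torch.nn as nn

class PointDecoder(nn.Module):
    """Sketch of the decoder from Section 6-2 (axis layout is my assumption)."""

    def __init__(self, latent_dim: int = 3, n_points: int = 5000):
        super().__init__()
        self.fc_in = nn.Sequential(
            nn.Linear(latent_dim, 1024), nn.ReLU(),
            nn.Linear(1024, 512), nn.ReLU(),
        )
        # Treat the 512 features as channels of a length-3 "signal",
        # one slot per xyz coordinate axis.
        self.deconv = nn.Sequential(
            nn.ConvTranspose1d(512, 1024, 1), nn.ReLU(),
            nn.ConvTranspose1d(1024, 2048, 1), nn.ReLU(),
        )
        self.fc_out = nn.Linear(2048, n_points)

    def forward(self, z):                        # z: (batch, latent_dim)
        h = self.fc_in(z).unsqueeze(-1)          # (batch, 512, 1)
        h = h.expand(-1, -1, 3)                  # (batch, 512, 3)
        h = self.deconv(h)                       # (batch, 2048, 3)
        return self.fc_out(h.transpose(1, 2))    # (batch, 3, 5000)
```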

7. Performance Evaluation

The reconstruction quality of VAE and MoG-VAE is evaluated using Chamfer Distance (CD) and Earth Mover's Distance (EMD).


7-1. Chamfer Distance (CD) Formula

The Chamfer Distance (CD) measures the distance between two point sets $ P $ and $ Q $. It is defined as:

$$
\text{CD}(P, Q) = \frac{1}{|P|} \sum_{p \in P} \min_{q \in Q} \| p - q \|^2 + \frac{1}{|Q|} \sum_{q \in Q} \min_{p \in P} \| q - p \|^2 \tag{9}
$$

Where:

  • $ P $ and $ Q $ are two point sets
  • $ p $ and $ q $ are points from sets $ P $ and $ Q $ respectively
  • $ \| \cdot \| $ denotes the Euclidean norm
  • $|P|$ and $|Q|$ are the total number of points in sets $ P $ and $ Q $

Chamfer Distance computes the average squared distance from each point in one set to its nearest neighbor in the other set.
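A minimal PyTorch implementation of Eq. (9), assuming both point sets are small enough that the full pairwise distance matrix fits in memory:

```python
import torch

def chamfer_distance(p: torch.Tensor, q: torch.Tensor) -> torch.Tensor:
    """Chamfer Distance of Eq. (9) between point sets p: (n, d) and q: (m, d)."""
    d2 = torch.cdist(p, q) ** 2   # pairwise squared Euclidean distances (n, m)
    return d2.min(dim=1).values.mean() + d2.min(dim=0).values.mean()
```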


7-2. Earth Mover's Distance (EMD) Formula

Earth Mover's Distance (EMD) measures the distance between two probability distributions over a region $ \mathbb{R}^d $. It is defined as an optimization problem that finds the minimal cost to transform one distribution into the other:

$$
\text{EMD}(P, Q) = \min_{\gamma \in \Gamma(P, Q)} \sum_{(p, q) \in P \times Q} \gamma(p, q) \| p - q \| \tag{10}
$$

Where:

  • $ P $ and $ Q $ are two distributions or point sets.
  • $ \| p - q \| $ represents the distance (typically Euclidean) between points $ p \in P $ and $ q \in Q $.
  • $ \Gamma(P, Q) $ is the set of all valid transportation plans (flow functions) between distributions $ P $ and $ Q $.
  • $ \gamma(p, q) $ is the amount of "mass" moved from $ p $ to $ q $.
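For two equal-size point sets with uniform mass, the optimal plan in Eq. (10) reduces to a one-to-one matching, so a small sketch can use the Hungarian algorithm from SciPy. Note that this exact solver is O(n³), so approximations are common in practice for 5,000-point clouds:

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def emd(p: np.ndarray, q: np.ndarray) -> float:
    """EMD of Eq. (10) for equal-size point sets p, q: (n, d), uniform mass."""
    # Pairwise Euclidean cost matrix, shape (n, n).
    cost = np.linalg.norm(p[:, None, :] - q[None, :, :], axis=-1)
    rows, cols = linear_sum_assignment(cost)   # optimal one-to-one matching
    return float(cost[rows, cols].mean())
```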

8. Result

8-1. VAE Reconstruction Quality

[Figure: VAE reconstruction results]

8-2. MoG-VAE Reconstruction Quality

[Figure: MoG-VAE reconstruction results]

I could not see any clear difference between VAE and MoG-VAE in these visualizations.
Therefore, I compared them quantitatively using the CD and EMD values.

8-3. Comparison of VAE and MoG-VAE

| Design | MoG-VAE CD | MoG-VAE EMD | VAE CD | VAE EMD |
|:---|---:|---:|---:|---:|
| Design1 | 0.1929 | 0.2169 | 0.1935 | 0.2176 |
| Design2 | 0.1278 | 0.1484 | 0.1286 | 0.1551 |
| Design3 | 0.1570 | 0.1872 | 0.1567 | 0.1843 |
| Design4 | 0.1422 | 0.1879 | 0.1462 | 0.1877 |
| Design5 | 0.1516 | 0.1873 | 0.1497 | 0.1844 |
| Design6 | 0.1993 | 0.2154 | 0.2031 | 0.2177 |
| Design7 | 0.1823 | 0.2130 | 0.1761 | 0.2101 |
| Design8 | 0.1371 | 0.1538 | 0.1336 | 0.1572 |
| Design9 | 0.1607 | 0.2189 | 0.1714 | 0.2330 |
| **Average** | **0.1612** | **0.1921** | **0.1621** | **0.1941** |

[Figure: CD/EMD comparison of VAE and MoG-VAE]

Based on these results, the reconstruction quality of MoG-VAE is slightly higher than that of VAE: both its average CD (0.1612 vs. 0.1621) and its average EMD (0.1921 vs. 0.1941) are lower. In my opinion, this is because MoG-VAE has a more diverse and flexible latent space.

9. Finally

I compared the reconstruction quality of VAE and MoG-VAE.
As expected, MoG-VAE showed slightly higher quality than VAE.
Next, I will check the effect of changing the reconstruction loss function from MSE to CD.
Will the reconstruction quality improve after changing the loss function?
I'm excited to find out!
Thank you for reading.
If I have time, I will upload the code for this study to my GitHub!:stuck_out_tongue_closed_eyes:

Reference

  • Jakub M. Tomczak and Max Welling, “VAE with a VampPrior,” AISTATS 2018. arXiv:1705.07120
