This autoencoder compresses each image from 3x128x128 to 3x16x16, an 8x reduction along each spatial axis. The smaller representation is easier for the rectified flow network to learn.
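To make the compression concrete, here is a small bookkeeping sketch in plain Python. The two shapes come from the description above; the assumption that each downsampling stage halves the resolution (stride 2) is mine and is not confirmed by the video.

```python
# Shape bookkeeping for the 3x128x128 -> 3x16x16 compression.
in_shape = (3, 128, 128)      # channels, height, width of the input image
latent_shape = (3, 16, 16)    # channels, height, width of the latent


def numel(shape):
    """Number of values in a tensor of the given shape."""
    n = 1
    for d in shape:
        n *= d
    return n


# Reduction along each spatial axis, and reduction in total value count.
spatial_factor = in_shape[1] // latent_shape[1]
value_ratio = numel(in_shape) // numel(latent_shape)

# Assumption: every downsampling stage halves the resolution (stride 2).
stages = 0
size = in_shape[1]
while size > latent_shape[1]:
    size //= 2
    stages += 1

print(spatial_factor, value_ratio, stages)
```

Under that stride-2 assumption, the encoder needs three downsampling stages (128 to 64 to 32 to 16), and the latent holds 64 times fewer values than the image.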
The autoencoder's building block is based on MobileNetV2: it is essentially an inverted residual block, in which the channels are expanded inside the block rather than compressed.
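A minimal PyTorch sketch of such an inverted residual block follows. It is not the code from the video: the class name `InvertedResidual`, the expansion factor of 6 (the common MobileNetV2 choice), and the use of BatchNorm and ReLU6 are all assumptions made for illustration.

```python
import torch
from torch import nn


class InvertedResidual(nn.Module):
    """MobileNetV2-style block: expand channels, depthwise conv, project back."""

    def __init__(self, channels: int, expansion: int = 6):
        super().__init__()
        hidden = channels * expansion  # channels are expanded inside the block
        self.body = nn.Sequential(
            # 1x1 pointwise convolution expands the channel count
            nn.Conv2d(channels, hidden, kernel_size=1, bias=False),
            nn.BatchNorm2d(hidden),
            nn.ReLU6(inplace=True),
            # 3x3 depthwise convolution (one filter per channel)
            nn.Conv2d(hidden, hidden, kernel_size=3, padding=1,
                      groups=hidden, bias=False),
            nn.BatchNorm2d(hidden),
            nn.ReLU6(inplace=True),
            # 1x1 linear projection back to the input width, no activation
            nn.Conv2d(hidden, channels, kernel_size=1, bias=False),
            nn.BatchNorm2d(channels),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Residual connection between the narrow ends of the block
        return x + self.body(x)


block = InvertedResidual(channels=16)
x = torch.randn(2, 16, 32, 32)
y = block(x)
print(y.shape)
```

The skip connection joins the two narrow ends, which is what makes the residual "inverted" compared with a classic bottleneck block, where the skip joins the wide ends.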
In the encoder I downsample with strided convolutions to retain as much information as possible. In the decoder I upsample with nearest-neighbor interpolation, because it yielded the best reconstructions.
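The two resampling choices can be sketched in PyTorch as below. The channel width of 32, the 3x3 kernels, and the convolution placed after the upsampling are illustrative assumptions, not details taken from the video.

```python
import torch
from torch import nn

# Encoder side: a stride-2 convolution halves the resolution with learned weights.
down = nn.Conv2d(3, 32, kernel_size=3, stride=2, padding=1)

# Decoder side: nearest-neighbor upsampling doubles the resolution,
# followed by an ordinary convolution (assumed here) to refine the result.
up = nn.Sequential(
    nn.Upsample(scale_factor=2, mode="nearest"),
    nn.Conv2d(32, 3, kernel_size=3, padding=1),
)

x = torch.randn(1, 3, 128, 128)
h = down(x)  # spatial size 128 -> 64
y = up(h)    # spatial size 64 -> 128
print(h.shape, y.shape)
```

Stacking three such stages on each side would give the 128 to 16 and 16 to 128 mapping described above, assuming every stage changes the resolution by a factor of 2.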
To sharpen the output, I added a VGG loss and a Laplacian loss. To keep the latent values from exploding, I also added a loss term that pushes the latent space into the range -1 to 1.
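One possible reading of the Laplacian loss and the latent range term is sketched below in PyTorch. The exact formulations used in the video are not given, so the 4-neighbor Laplacian kernel, the L1 comparison, and the quadratic penalty outside [-1, 1] are all assumptions; the function names are hypothetical. The VGG loss is left out because it depends on pretrained network weights.

```python
import torch
import torch.nn.functional as F


def laplacian(img: torch.Tensor) -> torch.Tensor:
    """Apply a 3x3 Laplacian filter to each channel of an (N, C, H, W) batch."""
    kernel = torch.tensor(
        [[0.0, 1.0, 0.0], [1.0, -4.0, 1.0], [0.0, 1.0, 0.0]],
        dtype=img.dtype, device=img.device,
    )
    c = img.shape[1]
    weight = kernel.view(1, 1, 3, 3).repeat(c, 1, 1, 1)  # one filter per channel
    return F.conv2d(img, weight, padding=1, groups=c)


def laplacian_loss(recon: torch.Tensor, target: torch.Tensor) -> torch.Tensor:
    """Assumed form: L1 distance between edge responses, rewarding sharp edges."""
    return F.l1_loss(laplacian(recon), laplacian(target))


def latent_range_loss(z: torch.Tensor) -> torch.Tensor:
    """Assumed form: zero inside [-1, 1], quadratic penalty outside it."""
    return F.relu(z.abs() - 1.0).pow(2).mean()


z_inside = torch.zeros(1, 3, 16, 16)
z_outside = torch.full((1, 3, 16, 16), 3.0)
print(latent_range_loss(z_inside).item(), latent_range_loss(z_outside).item())
```

A penalty of this shape leaves latents inside the target range untouched and only pulls back values that drift beyond it, which is one way to keep the latent scale bounded for the flow network.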
Video: Writing Rectified Flow Network in Python Part 1 - The Autoencoder. Added by Hilmi Yafi A on 28 August 2024.