Backstory

I first heard about GANs from a senior student during a peer teaching session, and the thought that a GAN could be used to generate anime characters got me very interested, so I decided to try it myself. This experiment was run on a new machine, with TensorFlow 2.4.

Getting to Work

It's worth mentioning that the dataset alone is a real treat:

(figure: image_at_epoch_0002)

Since this first attempt uses the simplest possible DCGAN, the training set was not labeled. The code is walked through step by step below.

First, as usual, import the relevant modules:

import tensorflow as tf
import glob
import imageio
import matplotlib.pyplot as plt
import numpy as np
import os
import PIL
import tensorflow.keras.backend as K
from tensorflow.keras import layers
import time
from PIL import Image
from tensorflow.keras import Model
from IPython import display

Next comes a particularly magical step: setting GPU memory to be allocated on demand. I never needed this with older versions of tf...

gpus = tf.config.experimental.list_physical_devices('GPU')
for gpu in gpus:
    tf.config.experimental.set_memory_growth(gpu, True)

Set the paths and the various parameters:

train_txt = r"C:\Users\1\Desktop\AIProject\Labels\Anime_images\Anime_labels_small.csv"
train_labels_savepath = r"C:\Users\1\Desktop\AIProject\Datasave\anime_labels_save.npy"
train_images_savepath =r"C:\Users\1\Desktop\AIProject\Datasave\anime_images_save.npy"
checkpoint_dir = r'C:\Users\1\Desktop\AIProject\CheckPoints\GAN_anime_checkpoints'
image_size = 128  # image size
BUFFER_SIZE = 150
BATCH_SIZE = 4
Filters = 512  # base filter count; a single knob for scaling the network up or down
EPOCHS = 500  # total number of epochs
noise_dim = 100
num_examples_to_generate = 4  # number of images to display
num_examples_of_all = 64  # number of samples the machine actually generates
lr_gen = 0.01  # generator learning rate
lr_dis = 0.001  # discriminator learning rate

The following code is reused from my earlier blog post on leaf classification, with a number of small changes, and handles reading the anime images.

def generateds(txt):
    f = open(txt, 'r')  # open the txt file read-only
    contents = f.readlines()  # read every line in the file
    f.close()  # close the txt file
    x = []  # build an empty list
    for content in contents:  # take the lines one by one
        value = content.split(",")  # split on ","; the image path is value[0]
        img_path = value[0]  # get the image path
        print(img_path)
        img = Image.open(img_path)  # read in the image
        img = img.convert('RGB')  # force 3-channel RGB
        img = img.resize((image_size, image_size))
        img = np.array(img)
        img = (img - 127.5) / 127.5  # normalize to [-1, 1] (preprocessing)
        x.append(img)  # append the normalized data to list x
        print('loading : ' + content)  # print a status message

    x = np.array(x)  # convert to np.array
    return x  # return the input features x
if os.path.exists(train_images_savepath) and os.path.exists(train_labels_savepath):
    print('-------------Load Datasets-----------------')
    train_images_save = np.load(train_images_savepath)
    train_images = np.reshape(train_images_save, (len(train_images_save), image_size, image_size, 3))
    all_num = len(train_images_save)
else:
    print('-------------Generate Datasets-----------------')
    train_images = generateds(train_txt)

    print('-------------Save Datasets-----------------')
    train_images_save = np.reshape(train_images, (len(train_images), -1))
    # all_num stores the total number of images; it is used below to visualize training progress
    all_num = len(train_images_save)
    np.save(train_images_savepath, train_images_save)
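One step the excerpt doesn't show is wrapping the loaded images in a tf.data pipeline, which is what train() below iterates over; BUFFER_SIZE and BATCH_SIZE from the parameter block are presumably used here. A minimal sketch of the assumed wiring:

# Assumed wiring (not shown in the original post): shuffle and batch the
# loaded images so that train(train_dataset, EPOCHS) has batches to consume
train_dataset = tf.data.Dataset.from_tensor_slices(train_images).shuffle(BUFFER_SIZE).batch(BATCH_SIZE)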

Now for the network structure in detail. A DCGAN as a whole looks roughly like this, split into two parts, a Generator and a Discriminator; my code makes some modifications on top of this.

(figure: DCGAN architecture diagram, generator and discriminator)

Let's start with the G (Generator) in my code, which makes quite a few changes to the diagram above. I had previously come across the ResNet architecture and found it elegant, so I wanted to try working a similar structure into the generator. I therefore defined a residual transposed-convolution block for the image-generation part.

class ResGANblock(Model):
    def __init__(self, filters):
        super(ResGANblock, self).__init__()
        # two separate BatchNormalization layers: sharing a single layer across
        # both call sites would also share its weights and running statistics
        self.batchnorm1 = layers.BatchNormalization()
        self.batchnorm2 = layers.BatchNormalization()
        self.a = layers.LeakyReLU()

        self.trans_stride1_1 = layers.Conv2DTranspose(filters, (3, 3), strides=2, padding='same', use_bias=False)
        self.trans_stride1_2 = layers.Conv2DTranspose(filters, (3, 3), strides=1, padding='same', use_bias=False)

    def call(self, inputs):
        x = self.trans_stride1_1(inputs)  # upsample H and W by 2x
        x1 = self.batchnorm1(x)           # x1 doubles as the residual branch
        x = self.a(x1)

        x = self.trans_stride1_2(x)
        x = self.batchnorm2(x)
        out = self.a(x + x1)              # residual addition, then activation
        return out
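A quick sanity check (my addition, not in the original post) confirms that each block doubles the spatial resolution:

# Illustrative shape check: a ResGANblock upsamples H and W by 2x
block = ResGANblock(64)
x = tf.random.normal([1, 16, 16, 128])
print(block(x).shape)  # -> (1, 32, 32, 64)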

Then the structure of the whole generator. Since the residual blocks are comparatively cheap to compute, I stacked several ResGANblocks in a row; this upsamples the feature map all the way to (4 * image_size, 4 * image_size), which I then bring back down with two ordinary stride-2 convolutions. My thinking is that this lets the image carry more feature information and look nicer.

def make_generator_model():
    model = tf.keras.Sequential()
    model.add(layers.Dense(int(image_size / 16) * int(image_size / 16) * Filters, use_bias=False, input_shape=(noise_dim,)))
    model.add(layers.LeakyReLU())
    model.add(layers.Dropout(0.3))
    model.add(layers.Reshape((int(image_size / 16), int(image_size / 16), Filters)))

    model.add(layers.Conv2DTranspose(Filters, (5, 5), strides=2, padding='same', use_bias=False))
    model.add(layers.BatchNormalization())
    model.add(layers.Activation('tanh'))
    model.add(layers.Dropout(0.3))

    # integer division, since layer filter counts must be ints
    model.add(layers.Conv2DTranspose(Filters // 2, (5, 5), strides=2, padding='same', use_bias=False))
    model.add(layers.BatchNormalization())
    model.add(layers.Activation('tanh'))
    model.add(layers.Dropout(0.3))

    model.add(ResGANblock(Filters // 4))

    model.add(ResGANblock(Filters // 8))

    model.add(ResGANblock(Filters // 16))

    model.add(ResGANblock(Filters // 16))

    model.add(layers.Conv2D(Filters // 2, (5, 5), strides=2, padding='same', use_bias=False))
    model.add(layers.BatchNormalization())
    model.add(layers.Activation('tanh'))
    model.add(layers.Dropout(0.3))

    model.add(layers.Conv2D(Filters // 4, (5, 5), strides=2, padding='same', use_bias=False))
    model.add(layers.BatchNormalization())
    model.add(layers.Activation('tanh'))
    model.add(layers.Dropout(0.3))

    model.add(layers.Conv2DTranspose(3, (5, 5), strides=1, padding='same', use_bias=False))
    assert model.output_shape == (None, image_size, image_size, 3)
    model.add(layers.Activation('tanh'))

    return model
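For reference, here is the shape trace I get on paper for image_size = 128 and Filters = 512, matching the claim above that the feature map peaks at 4 * image_size:

# Shape trace (illustrative, image_size = 128, Filters = 512):
# Dense + Reshape:                     (8, 8, 512)
# Conv2DTranspose x2, stride 2:        (16, 16, 512) -> (32, 32, 256)
# ResGANblock x4, each 2x upsampling:  (64, 64, 128) -> (128, 128, 64) -> (256, 256, 32) -> (512, 512, 32)
# Conv2D x2, stride 2:                 (256, 256, 256) -> (128, 128, 128)
# Conv2DTranspose, stride 1:           (128, 128, 3)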

Here we instantiate the Generator and feed it noise to produce a sample:

generator = make_generator_model()
noise = tf.random.normal([1, noise_dim])
generated_image = generator(noise, training=False)
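As a quick sanity check (my addition), the untrained generator should already produce an image of the right shape, with values in [-1, 1] thanks to the final tanh:

print(generated_image.shape)  # (1, image_size, image_size, 3)
plt.imshow((generated_image[0] * 127.5 + 127.5).numpy().astype('uint8'))
plt.show()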

Next, define the Discriminator, a perfectly ordinary CNN. (I originally planned to use a ResNet here too, but it learned too fast and made the generator's loss explode, so I fell back to a plain CNN.)

def make_discriminator_model():

    model = tf.keras.Sequential()

    model.add(layers.Conv2D(512, (3, 3), strides=1, padding='same'))
    model.add(layers.BatchNormalization())
    model.add(layers.LeakyReLU())
    model.add(tf.keras.layers.MaxPool2D(pool_size=(2, 2), strides=2, padding='same'))
    model.add(layers.Dropout(0.3))

    model.add(layers.Conv2D(256, (3, 3), strides=1, padding='same'))
    model.add(layers.BatchNormalization())
    model.add(layers.LeakyReLU())
    model.add(tf.keras.layers.MaxPool2D(pool_size=(2, 2), strides=2, padding='same'))
    model.add(layers.Dropout(0.3))

    model.add(layers.Conv2D(128, (3, 3), strides=1, padding='same'))
    model.add(layers.BatchNormalization())
    model.add(layers.LeakyReLU())
    model.add(tf.keras.layers.MaxPool2D(pool_size=(2, 2), strides=2, padding='same'))
    model.add(layers.Dropout(0.3))

    model.add(layers.Conv2D(64, (3, 3), strides=1, padding='same'))
    model.add(layers.BatchNormalization())
    model.add(layers.LeakyReLU())
    model.add(tf.keras.layers.MaxPool2D(pool_size=(2, 2), strides=2, padding='same'))
    model.add(layers.Dropout(0.3))
    model.add(layers.Flatten())
    model.add(layers.Dense(1))  # raw logit; the loss below uses from_logits=True, so no sigmoid here
    return model

Here we instantiate the Discriminator and feed it the freshly baked fake image. Its output is a raw logit (positive means "looks real"), since the loss below is computed with from_logits=True.

discriminator = make_discriminator_model()
decision = discriminator(generated_image)

Next come the loss functions (with label smoothing on the real labels), the optimizer settings, and so on.

# helper that computes cross-entropy loss; from_logits=True matches the discriminator's raw-logit output
cross_entropy = tf.keras.losses.BinaryCrossentropy(from_logits=True)


def discriminator_loss(real_output, fake_output):
    real_loss = cross_entropy(tf.ones_like(real_output)*0.8, real_output)
    fake_loss = cross_entropy(tf.zeros_like(fake_output), fake_output)
    total_loss = real_loss + fake_loss
    return total_loss


def generator_loss(fake_output):
    return cross_entropy(tf.ones_like(fake_output), fake_output)
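A brief aside on the smoothing above (my own illustration, not from the original post): with target 0.8 instead of 1, the per-sample loss on a real image is -(0.8 * log p + 0.2 * log(1 - p)), which is minimized at p = 0.8 rather than p = 1, so the discriminator is discouraged from becoming overconfident:

# numerical check that the smoothed real-label loss bottoms out near p = 0.8
p = np.linspace(0.01, 0.99, 99)
loss = -(0.8 * np.log(p) + 0.2 * np.log(1 - p))
print(p[np.argmin(loss)])  # ~0.8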

generator_optimizer = tf.keras.optimizers.Adam(learning_rate=lr_gen)
discriminator_optimizer = tf.keras.optimizers.Adam(learning_rate=lr_dis)
checkpoint_prefix = os.path.join(checkpoint_dir, "ckpt")
checkpoint = tf.train.Checkpoint(generator_optimizer=generator_optimizer,
                                 discriminator_optimizer=discriminator_optimizer,
                                 generator=generator,
                                 discriminator=discriminator)

Now the custom training step. Here I add noise to the discriminator's input to make its job harder, which stabilizes the generator's training.

@tf.function  # the decorator compiles train_step into a graph
def train_step(images):
    noise = tf.random.normal([BATCH_SIZE, noise_dim])
    # add Gaussian noise to the real images to make the discriminator's job harder
    images1 = images + tf.random.normal([images.shape[0], image_size, image_size, 3], dtype=tf.float64) / 5.
    with tf.GradientTape() as gen_tape, tf.GradientTape() as disc_tape:
      generated_images = generator(noise, training=True)
      real_output = discriminator(images1, training=True)
      fake_output = discriminator(generated_images, training=True)
      disc_loss = discriminator_loss(real_output, fake_output)  # note the argument order
      gen_loss = generator_loss(fake_output)
    tf.print("gen_loss:", gen_loss)
    tf.print("disc_loss:", disc_loss)
    gradients_of_generator = gen_tape.gradient(gen_loss, generator.trainable_variables)
    gradients_of_discriminator = disc_tape.gradient(disc_loss, discriminator.trainable_variables)
    generator_optimizer.apply_gradients(zip(gradients_of_generator, generator.trainable_variables))
    discriminator_optimizer.apply_gradients(zip(gradients_of_discriminator, discriminator.trainable_variables))

    
def train(dataset, epochs):
  for epoch in range(epochs):
    start = time.time()
    images_seen = BATCH_SIZE  # running count of images processed this epoch
    for image_batch in dataset:
      train_step(image_batch)
      tf.print("Process:{}".format(images_seen / all_num))  # epoch progress, shown as a 0 -> 1 fraction
      images_seen += BATCH_SIZE
      seed = tf.random.normal([num_examples_of_all, noise_dim])
      generate_and_save_images(generator,
                             epoch + 1, seed)  # image-saving function, explained below

    # save a checkpoint every 5 epochs
    if (epoch + 3) % 5 == 0:
      checkpoint.save(file_prefix = checkpoint_prefix)
    print('Time for epoch {} is {} sec'.format(epoch + 1, time.time()-start))

  # generate images once more after the final epoch
  display.clear_output(wait=True)
  generate_and_save_images(generator,
                           epochs, seed)

Finally, the function that saves the sample images. I sneak in a little trick here: each time images are generated, I have the machine produce 64 of them (num_examples_of_all), run them all past the discriminator, and save only the 4 the discriminator scores highest. The results look a bit nicer that way.

def generate_and_save_images(model, epoch, test_input):
  # note that `training` is set to False,
  # so every layer runs in inference mode (batchnorm)
  predictions = model(test_input, training=False)
  out_put = discriminator(predictions, training=False)
  out_put = np.reshape(out_put, (len(out_put),))
  best = np.argsort(out_put)  # indices sorted from lowest to highest score
  t = int(num_examples_to_generate ** 0.5)
  for i in range(num_examples_to_generate):
    plt.subplot(t, t, i+1)
    # best[-(i + 1)] walks the scores from the highest downward
    plt.imshow((predictions[best[-(i + 1)]] * 127.5 + 127.5) / 255.)
    plt.axis('off')
  plt.savefig(r'C:\Users\1\Desktop\AIProject\Pictures_save\Animes\image_at_epoch_{:04d}.png'.format(epoch))
  plt.show(block=False)
  plt.pause(4)
  plt.close("all")

Below is sample output from a run; the loss values and training progress can be watched in real time.

gen_loss: 0.84012115
disc_loss: 2.2790134
Process:0.00024588148512417015
gen_loss: 1.01613903
disc_loss: 1.8529706
Process:0.0004917629702483403
gen_loss: 0.265191197
disc_loss: 1.66000307
Process:0.0007376444553725104
gen_loss: 1.13716352
disc_loss: 2.41881537
Process:0.0009835259404966806
gen_loss: 0.33330521
disc_loss: 2.66267729
Process:0.0012294074256208507
...

Results

With image_size=64 and num_examples_to_generate=16, training eventually produced some halfway-decent images:

(figures: image_at_epoch_0001, image_at_epoch_0003, image_at_epoch_0004, image_at_epoch_0005)

With image_size=128 and num_examples_to_generate=4, the images come out looking rather cursed:

(figures: image1_at_epoch_0001, image1_at_epoch_0002, image1_at_epoch_0003, image1_at_epoch_0004)

Ideas for Improvement

It's not hard to see from the images above that the pictures generated within each epoch vary little in style, which suggests the DCGAN has fallen into mode collapse. To make the output more diverse, one can work on the generator's input: replace part of the input noise with a structured latent code, and add an auxiliary recognition network that shares its first few layers with the original discriminator. This auxiliary network tries to reconstruct the latent code from the generated images, disentangling it, and the loss function gains a mutual-information term. That is the basic idea of InfoGAN.
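To make that concrete, here is a minimal sketch of an InfoGAN-style auxiliary head (entirely my own illustration; latent_dim, the layer sizes, and lambda_info are assumptions, not code from this project). For a continuous code with fixed variance, the variational lower bound on the mutual information reduces to a mean-squared error between the code and its reconstruction:

latent_dim = 10  # size of the structured latent code c (assumed)

# the first few conv layers are shared between D and the auxiliary Q network,
# as described above
shared_trunk = tf.keras.Sequential([
    layers.Conv2D(64, (3, 3), strides=2, padding='same'),
    layers.LeakyReLU(),
    layers.Conv2D(128, (3, 3), strides=2, padding='same'),
    layers.LeakyReLU(),
    layers.Flatten(),
])
d_head = layers.Dense(1)           # real/fake logit
q_head = layers.Dense(latent_dim)  # reconstruction of the latent code c

def info_loss(c, generated_images):
    # variational mutual-information surrogate: MSE between the code c fed to
    # the generator and the code recovered from the generated images
    c_hat = q_head(shared_trunk(generated_images))
    return tf.reduce_mean(tf.square(c - c_hat))

# the generator input becomes [noise, c], and the losses gain an MI term:
#   gen_loss = generator_loss(fake_output) + lambda_info * info_loss(c, generated_images)
# with Q trained on the same info_loss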

Sadly, my math background is on the weak side, and a lot of the theory behind GANs is heavy going for me; with other things piling up as well, my GAN adventures are on hold for now.

References:

Deep Convolutional Generative Adversarial Network | TensorFlow Core