Backstory
A while back, during a peer-teaching session, I was lucky enough to hear a senior student mention GANs, and it occurred to me that a GAN could be used to create anime characters. That got me thoroughly hooked, so I decided to try it myself. This experiment was run on a new machine, with TensorFlow 2.4 as the environment.
Getting to Work
It's worth mentioning that the dataset alone is already a treat.
Since this first attempt uses the simplest DCGAN, the training set was not labeled. Below, the code is walked through step by step.
First, import the relevant modules:
import tensorflow as tf
import glob
import imageio
import matplotlib.pyplot as plt
import numpy as np
import os
import PIL
import tensorflow.keras.backend as K
from tensorflow.keras import layers
import time
from PIL import Image
from tensorflow.keras import Model
from IPython import display
Then comes a rather magical step: setting the GPU memory to grow on demand. I never needed this with older versions of tf...
gpus = tf.config.experimental.list_physical_devices('GPU')
for gpu in gpus:
    tf.config.experimental.set_memory_growth(gpu, True)  # allocate GPU memory on demand
Set the paths and the various parameters:
train_txt = r"C:\Users\1\Desktop\AIProject\Labels\Anime_images\Anime_labels_small.csv"
train_labels_savepath = r"C:\Users\1\Desktop\AIProject\Datasave\anime_labels_save.npy"
train_images_savepath = r"C:\Users\1\Desktop\AIProject\Datasave\anime_images_save.npy"
checkpoint_dir = r'C:\Users\1\Desktop\AIProject\CheckPoints\GAN_anime_checkpoints'
image_size = 128  # image size
BUFFER_SIZE = 150
BATCH_SIZE = 4
Filters = 512  # base filter count of the network, defined here to make resizing the network easy
EPOCHS = 500  # total number of epochs
noise_dim = 100
num_examples_to_generate = 4  # number of images to display
num_examples_of_all = 64  # number of samples the machine actually generates
lr_gen = 0.01  # generator learning rate
lr_dis = 0.001  # discriminator learning rate
The code below is reused from my earlier blog post on leaf-shape classification, with quite a few small tweaks, and handles reading the anime images.
def generateds(txt):
    f = open(txt, 'r')  # open the txt file read-only
    contents = f.readlines()  # read every line of the file
    f.close()  # close the txt file
    x = []  # empty list for the image data
    for content in contents:  # take the lines one by one
        value = content.split(",")  # split on commas; the image path is value[0]
        img_path = value[0]  # get the image path
        img = Image.open(img_path)  # load the image
        img = img.convert('RGB')  # force 3-channel RGB (np.array alone does not convert)
        img = img.resize((image_size, image_size))
        img = np.array(img)
        img = (img - 127.5) / 127.5  # normalize the data to [-1, 1] (preprocessing)
        x.append(img)  # append the normalized data to the list x
        print('loading : ' + content)  # progress message
    x = np.array(x)  # convert to np.array format
    return x  # return the input features x
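For reference, generateds only ever uses the first comma-separated field of each line, so a line of the CSV presumably starts with the image path (this example line is entirely made up):

# hypothetical layout of one line in Anime_labels_small.csv:
# C:\Users\1\Desktop\AIProject\Images\anime_0001.png,0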
# no labels are used for this DCGAN, so only the image cache needs to exist
if os.path.exists(train_images_savepath):
    print('-------------Load Datasets-----------------')
    train_images_save = np.load(train_images_savepath)
    train_images = np.reshape(train_images_save, (len(train_images_save), image_size, image_size, 3))
    all_num = len(train_images_save)
else:
    print('-------------Generate Datasets-----------------')
    train_images = generateds(train_txt)
    print('-------------Save Datasets-----------------')
    train_images_save = np.reshape(train_images, (len(train_images), -1))
    # all_num stores the total number of images; it is used to visualize progress during training
    all_num = len(train_images_save)
    np.save(train_images_savepath, train_images_save)  # save the flattened array that the branch above reloads
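One thing the snippets never show is how the dataset consumed by the training loop below is actually built from train_images. A standard construction using the BUFFER_SIZE and BATCH_SIZE defined above would presumably look like this (the name train_dataset and the float32 cast are my assumptions; the cast matters because the normalized numpy array defaults to float64 while the layers run in float32):

# assumed construction of the dataset fed to train(); not shown in the original post
train_dataset = (tf.data.Dataset
                 .from_tensor_slices(train_images.astype('float32'))
                 .shuffle(BUFFER_SIZE)
                 .batch(BATCH_SIZE))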
Now let's look at the network structure in detail. A DCGAN is roughly split into two parts, a Generator and a Discriminator; my code makes some modifications on top of that baseline.
Let's start with the G (Generator) in my code, which adds quite a few improvements over the vanilla design. I had previously had the good fortune of reading about the ResNet architecture and found it brilliant, so I wanted to try working a ResNet-like structure into the generator as well. Hence I defined a residual transposed-convolution block for the image-generation part.
class ResGANblock(Model):
    def __init__(self, filters):
        super(ResGANblock, self).__init__()
        # two separate BatchNorm layers: reusing a single layer in two places
        # would make both normalizations share weights and statistics
        self.batchnorm1 = layers.BatchNormalization()
        self.batchnorm2 = layers.BatchNormalization()
        self.a = layers.LeakyReLU()
        self.trans_stride1_1 = layers.Conv2DTranspose(filters, (3, 3), strides=2, padding='same', use_bias=False)
        self.trans_stride1_2 = layers.Conv2DTranspose(filters, (3, 3), strides=1, padding='same', use_bias=False)

    def call(self, inputs):
        x = self.trans_stride1_1(inputs)  # upsample by 2
        x1 = self.batchnorm1(x)           # kept as the shortcut branch
        x = self.a(x1)
        x = self.trans_stride1_2(x)
        x = self.batchnorm2(x)
        out = self.a(x + x1)              # residual connection
        return out
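As a quick sanity check (a made-up toy input, not from the original post), each block should double the spatial size and set the channel count to filters:

# toy shape check: an 8x8 map with 128 channels goes in,
# a 16x16 map with 64 channels comes out
block = ResGANblock(64)
y = block(tf.zeros([1, 8, 8, 128]))
print(y.shape)  # (1, 16, 16, 64)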
Next, the structure of the whole generator. Since the residual structure is comparatively fast to compute, I use ResGANblock many times in a row, which blows the feature map up to (4 * image_size, 4 * image_size); I then shrink it back down with two ordinary stride-2 convolutions. My thinking was that this lets the image carry more feature information and look nicer.
def make_generator_model():
    model = tf.keras.Sequential()
    model.add(layers.Dense(int(image_size / 16) * int(image_size / 16) * Filters, use_bias=False, input_shape=(100,)))
    model.add(layers.LeakyReLU())
    model.add(layers.Dropout(0.3))
    model.add(layers.Reshape((int(image_size / 16), int(image_size / 16), Filters)))
    # two ordinary transposed convolutions: image_size/16 -> image_size/4
    model.add(layers.Conv2DTranspose(Filters, (5, 5), strides=2, padding='same', use_bias=False))
    model.add(layers.BatchNormalization())
    model.add(layers.Activation('tanh'))
    model.add(layers.Dropout(0.3))
    model.add(layers.Conv2DTranspose(Filters // 2, (5, 5), strides=2, padding='same', use_bias=False))
    model.add(layers.BatchNormalization())
    model.add(layers.Activation('tanh'))
    model.add(layers.Dropout(0.3))
    # four residual blocks: image_size/4 -> 4*image_size
    # (note the integer division // : filter counts must be ints)
    model.add(ResGANblock(Filters // 4))
    model.add(ResGANblock(Filters // 8))
    model.add(ResGANblock(Filters // 16))
    model.add(ResGANblock(Filters // 16))
    # two stride-2 convolutions bring the map back down: 4*image_size -> image_size
    model.add(layers.Conv2D(Filters // 2, (5, 5), strides=2, padding='same', use_bias=False))
    model.add(layers.BatchNormalization())
    model.add(layers.Activation('tanh'))
    model.add(layers.Dropout(0.3))
    model.add(layers.Conv2D(Filters // 4, (5, 5), strides=2, padding='same', use_bias=False))
    model.add(layers.BatchNormalization())
    model.add(layers.Activation('tanh'))
    model.add(layers.Dropout(0.3))
    model.add(layers.Conv2DTranspose(3, 5, strides=1, padding='same', use_bias=False))
    assert model.output_shape == (None, image_size, image_size, 3)
    model.add(layers.Activation('tanh'))
    return model
Here we initialize the Generator and feed it noise to produce a sample:
generator = make_generator_model()
noise = tf.random.normal([1, noise_dim])
generated_image = generator(noise, training=False)
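Before any training, it's worth sanity-checking the untrained output; the shape should already match the target (this check is my addition, not from the original post):

print(generated_image.shape)  # expect (1, image_size, image_size, 3)
plt.imshow(generated_image[0] * 0.5 + 0.5)  # map the tanh range [-1, 1] back to [0, 1]
plt.axis('off')
plt.show()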
Next, define the Discriminator as a plain CNN (I originally intended to use a ResNet, but it learned so fast that the generator's loss exploded, so I switched to the plainest CNN).
def make_discriminator_model():
    model = tf.keras.Sequential()
    model.add(layers.Conv2D(512, (3, 3), strides=1, padding='same'))
    model.add(layers.BatchNormalization())
    model.add(layers.LeakyReLU())
    model.add(layers.MaxPool2D(pool_size=(2, 2), strides=2, padding='same'))
    model.add(layers.Dropout(0.3))
    model.add(layers.Conv2D(256, (3, 3), strides=1, padding='same'))
    model.add(layers.BatchNormalization())
    model.add(layers.LeakyReLU())
    model.add(layers.MaxPool2D(pool_size=(2, 2), strides=2, padding='same'))
    model.add(layers.Dropout(0.3))
    model.add(layers.Conv2D(128, (3, 3), strides=1, padding='same'))
    model.add(layers.BatchNormalization())
    model.add(layers.LeakyReLU())
    model.add(layers.MaxPool2D(pool_size=(2, 2), strides=2, padding='same'))
    model.add(layers.Dropout(0.3))
    model.add(layers.Conv2D(64, (3, 3), strides=1, padding='same'))
    model.add(layers.BatchNormalization())
    model.add(layers.LeakyReLU())
    model.add(layers.MaxPool2D(pool_size=(2, 2), strides=2, padding='same'))
    model.add(layers.Dropout(0.3))
    model.add(layers.Flatten())
    # output raw logits: the loss below is built with from_logits=True,
    # so a sigmoid here would squash the values twice
    model.add(layers.Dense(1))
    return model
Here we initialize the Discriminator and feed it the freshly baked fake image.
discriminator = make_discriminator_model()
decision = discriminator(generated_image)
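With the logits head, decision holds one unbounded score per input image; a quick look (my addition):

print(decision)  # shape (1, 1): a single raw logit, typically small for an untrained network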
Next, define the loss functions with label smoothing, the optimizer parameters, and so on.
# this returns a helper function that computes the cross-entropy loss
cross_entropy = tf.keras.losses.BinaryCrossentropy(from_logits=True)
def discriminator_loss(real_output, fake_output):
    # label smoothing: real images are labeled 0.8 instead of 1.0
    real_loss = cross_entropy(tf.ones_like(real_output) * 0.8, real_output)
    fake_loss = cross_entropy(tf.zeros_like(fake_output), fake_output)
    total_loss = real_loss + fake_loss
    return total_loss

def generator_loss(fake_output):
    return cross_entropy(tf.ones_like(fake_output), fake_output)
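A toy check of the two losses on made-up logits (the values are invented purely for illustration):

# made-up logits: the discriminator is fairly sure about the first real sample
real_output = tf.constant([[2.0], [0.5]])
fake_output = tf.constant([[-1.5], [0.0]])
print(discriminator_loss(real_output, fake_output))  # scalar, smaller when D separates well
print(generator_loss(fake_output))                   # scalar, smaller when D is fooled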
generator_optimizer = tf.keras.optimizers.Adam(learning_rate=lr_gen)
discriminator_optimizer = tf.keras.optimizers.Adam(learning_rate=lr_dis)
checkpoint_prefix = os.path.join(checkpoint_dir, "ckpt")
checkpoint = tf.train.Checkpoint(generator_optimizer=generator_optimizer,
discriminator_optimizer=discriminator_optimizer,
generator=generator,
discriminator=discriminator)
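To pick up from the newest checkpoint instead of starting over, the standard restore call can be added (not present in the original post):

# restore the latest checkpoint if one exists; with an empty directory,
# tf.train.latest_checkpoint returns None and restore becomes a no-op
checkpoint.restore(tf.train.latest_checkpoint(checkpoint_dir))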
Now the custom training step. I add noise to the discriminator's real inputs, which makes the discriminator's task harder and stabilizes the generator's training.
@tf.function  # the decorator compiles train_step into a TensorFlow graph
def train_step(images):
    noise = tf.random.normal([BATCH_SIZE, noise_dim])
    # noise the real images to make the discriminator's job harder
    # (tf.shape keeps this valid in graph mode, where the batch dim may be unknown;
    # float32 noise, assuming the dataset above yields float32 batches)
    images1 = images + tf.random.normal(tf.shape(images)) / 5.
    with tf.GradientTape() as gen_tape, tf.GradientTape() as disc_tape:
        generated_images = generator(noise, training=True)
        real_output = discriminator(images1, training=True)
        fake_output = discriminator(generated_images, training=True)
        # note the argument order matches the definition: (real_output, fake_output)
        disc_loss = discriminator_loss(real_output, fake_output)
        gen_loss = generator_loss(fake_output)
        tf.print("gen_loss:", gen_loss)
        tf.print("disc_loss:", disc_loss)
    gradients_of_generator = gen_tape.gradient(gen_loss, generator.trainable_variables)
    gradients_of_discriminator = disc_tape.gradient(disc_loss, discriminator.trainable_variables)
    generator_optimizer.apply_gradients(zip(gradients_of_generator, generator.trainable_variables))
    discriminator_optimizer.apply_gradients(zip(gradients_of_discriminator, discriminator.trainable_variables))
def train(dataset, epochs):
    for epoch in range(epochs):
        start = time.time()
        batch_num1 = BATCH_SIZE
        batch_num2 = batch_num1
        for image_batch in dataset:
            train_step(image_batch)
            tf.print("Process:{}".format(batch_num1 / all_num))  # progress within the epoch, running from 0 to 1
            batch_num1 += batch_num2
        # generate num_examples_of_all candidates so the saving function below can pick the best few
        seed = tf.random.normal([num_examples_of_all, noise_dim])
        generate_and_save_images(generator, epoch + 1, seed)  # the image-saving function, explained below
        # save the model every 5 epochs
        if (epoch + 3) % 5 == 0:
            checkpoint.save(file_prefix=checkpoint_prefix)
        print('Time for epoch {} is {} sec'.format(epoch + 1, time.time() - start))
    # generate images once more after the final epoch
    display.clear_output(wait=True)
    generate_and_save_images(generator, epochs, seed)
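The call that actually kicks off training never appears in the post; given the train_dataset assumed earlier, it would presumably be:

train(train_dataset, EPOCHS)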
Finally, the function that saves the sample images, where I quietly used a little trick: each time images are generated, I actually have the machine produce 64 of them, run them past the discriminator, and save only the 4 with the highest discriminator scores. This makes the results look a bit nicer.
def generate_and_save_images(model, epoch, test_input):
    # note `training` is set to False,
    # so every layer runs in inference mode (batchnorm)
    predictions = model(test_input, training=False)
    out_put = discriminator(predictions, training=False)
    out_put = np.reshape(out_put, (len(out_put),))
    best = np.argsort(out_put)  # indices sorted by discriminator score, ascending
    t = int(num_examples_to_generate ** 0.5)  # grid side length; subplot indices must be ints
    for i in range(num_examples_to_generate):
        plt.subplot(t, t, i + 1)
        # best[-(i + 1)] walks from the highest-scored image downwards
        plt.imshow((predictions[best[-(i + 1)]] * 127.5 + 127.5) / 255.)
        plt.axis('off')
    plt.savefig(r'C:\Users\1\Desktop\AIProject\Pictures_save\Animes\image_at_epoch_{:04d}.png'.format(epoch))
    plt.show(block=False)
    plt.pause(4)
    plt.close("all")
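The glob and imageio imports at the top suggest the per-epoch snapshots were meant to be stitched into a GIF, as in the official TF DCGAN tutorial; a sketch of that step (the GIF filename is my invention, and the glob pattern assumes the savefig path above):

# assemble the per-epoch snapshots into a single GIF
anim_file = r'C:\Users\1\Desktop\AIProject\Pictures_save\Animes\dcgan.gif'
filenames = sorted(glob.glob(r'C:\Users\1\Desktop\AIProject\Pictures_save\Animes\image_at_epoch_*.png'))
with imageio.get_writer(anim_file, mode='I') as writer:
    for filename in filenames:
        writer.append_data(imageio.imread(filename))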
Below is sample output from a run; the loss values and training progress can be watched in real time.
gen_loss: 0.84012115
disc_loss: 2.2790134
Process:0.00024588148512417015
gen_loss: 1.01613903
disc_loss: 1.8529706
Process:0.0004917629702483403
gen_loss: 0.265191197
disc_loss: 1.66000307
Process:0.0007376444553725104
gen_loss: 1.13716352
disc_loss: 2.41881537
Process:0.0009835259404966806
gen_loss: 0.33330521
disc_loss: 2.66267729
Process:0.0012294074256208507
...
Results
With image_size=64 and num_examples_to_generate=16, after a round of training it produced some images that look halfway normal.
With image_size=128 and num_examples_to_generate=4, the images come out looking rather cursed.
Ideas for Improvement
It isn't hard to see from the generated images that the style barely varies within each epoch, which shows the DCGAN has suffered mode collapse. To make the generated images more diverse, one can tamper with the generator's input noise: use a latent code in place of part of the noise, and add an auxiliary discriminator that shares its first few layers with the original one. It disentangles the images fed to the original discriminator by reconstructing the latent code, and a mutual-information term is added to the loss. That is the basic recipe of InfoGAN.
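A rough sketch of that idea, with my assumptions stated up front: the categorical code size, the q_head layers, and the way features are shared are all invented for illustration, not taken from any InfoGAN implementation in the post.

latent_classes = 10  # hypothetical number of categories in the latent code c

# auxiliary head Q that tries to recover c from features shared with the discriminator
q_head = tf.keras.Sequential([
    layers.Dense(128),
    layers.LeakyReLU(),
    layers.Dense(latent_classes),  # logits over the latent categories
])

cat_ce = tf.keras.losses.CategoricalCrossentropy(from_logits=True)

def info_loss(c_true, shared_features):
    # for a categorical code, the mutual-information bound reduces to
    # reconstructing c from the generated images
    return cat_ce(c_true, q_head(shared_features))

# the generator input becomes noise plus the one-hot code, e.g.
# z = tf.concat([tf.random.normal([BATCH_SIZE, noise_dim - latent_classes]),
#                tf.one_hot(tf.random.uniform([BATCH_SIZE], 0, latent_classes, tf.int32),
#                           latent_classes)], axis=1)
# and info_loss(c, features) is added (with a weight) to both G's and D's objectives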
Sadly, my mathematical foundations are rather weak, and much of the theory behind GANs is heavy going for me; with other things piling up as well, my GAN adventures are on hold for now.