Copyright 2018 The TensorFlow Authors.
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# https://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
Neural style transfer
This tutorial uses deep learning to compose one image in the style of another image (ever wish you could paint like Picasso or Van Gogh?). This is known as *neural style transfer*, and the technique is outlined in A Neural Algorithm of Artistic Style (Gatys et al.).
Note: This tutorial demonstrates the original style-transfer algorithm. It optimizes the image content to a particular style. Modern approaches train a model to generate the stylized image directly (similar to CycleGAN). This approach is much faster (up to 1000x).
For a simple application of style transfer with a pretrained model from TensorFlow Hub, check out the Fast style transfer for arbitrary styles tutorial that uses an arbitrary image stylization model. For an example of style transfer with TensorFlow Lite, refer to Artistic style transfer with TensorFlow Lite.
Neural style transfer is an optimization technique used to take two images, a *content image* and a *style reference image* (such as an artwork by a famous painter), and blend them together so the output image looks like the content image, but “painted” in the style of the style reference image.
This is implemented by optimizing the output image to match the content statistics of the content image and the style statistics of the style reference image. These statistics are extracted from the images using a convolutional network.
For example, let’s take an image of this dog and Wassily Kandinsky's Composition 7:
Yellow Labrador Looking, from Wikimedia Commons by Elf. License CC BY-SA 3.0
Now, what would it look like if Kandinsky decided to paint the picture of this dog exclusively with this style? Something like this?
Setup
Import and configure modules
import os
import tensorflow as tf
# Load compressed models from tensorflow_hub
os.environ['TFHUB_MODEL_LOAD_FORMAT'] = 'COMPRESSED'
import IPython.display as display
import matplotlib.pyplot as plt
import matplotlib as mpl
mpl.rcParams['figure.figsize'] = (12, 12)
mpl.rcParams['axes.grid'] = False
import numpy as np
import PIL.Image
import time
import functools
def tensor_to_image(tensor):
  tensor = tensor*255
  tensor = np.array(tensor, dtype=np.uint8)
  if np.ndim(tensor)>3:
    assert tensor.shape[0] == 1
    tensor = tensor[0]
  return PIL.Image.fromarray(tensor)
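The conversion logic can be checked with a NumPy-only sketch (the PIL step is omitted here, and the 4x4 image size is an arbitrary example):

```python
import numpy as np

t = np.random.rand(1, 4, 4, 3)        # batched float image in [0, 1]
arr = np.array(t * 255, dtype=np.uint8)
if np.ndim(arr) > 3:
    assert arr.shape[0] == 1          # only single-image batches are supported
    arr = arr[0]                      # drop the batch dimension
# arr is now a (4, 4, 3) uint8 array, the format PIL.Image.fromarray expects
```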
Download images and choose a style image and a content image:
content_path = tf.keras.utils.get_file('YellowLabradorLooking_new.jpg', 'https://storage.googleapis.com/download.tensorflow.org/example_images/YellowLabradorLooking_new.jpg')
style_path = tf.keras.utils.get_file('kandinsky5.jpg','https://storage.googleapis.com/download.tensorflow.org/example_images/Vassily_Kandinsky%2C_1913_-_Composition_7.jpg')
Visualize the input
Define a function to load an image and limit its maximum dimension to 512 pixels.
def load_img(path_to_img):
  max_dim = 512
  img = tf.io.read_file(path_to_img)
  img = tf.image.decode_image(img, channels=3)
  img = tf.image.convert_image_dtype(img, tf.float32)

  shape = tf.cast(tf.shape(img)[:-1], tf.float32)
  long_dim = max(shape)
  scale = max_dim / long_dim

  new_shape = tf.cast(shape * scale, tf.int32)

  img = tf.image.resize(img, new_shape)
  img = img[tf.newaxis, :]
  return img
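The rescaling arithmetic can be traced by hand; the 1024x512 input size below is a hypothetical example, chosen so the numbers come out exactly:

```python
# The longest side is scaled down to max_dim; the other side keeps the aspect ratio.
max_dim = 512
h, w = 1024.0, 512.0                  # hypothetical input size
scale = max_dim / max(h, w)           # 512 / 1024 = 0.5
new_h, new_w = int(h * scale), int(w * scale)
print(new_h, new_w)                   # 512 256
```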
Create a simple function to display an image:
def imshow(image, title=None):
  if len(image.shape) > 3:
    image = tf.squeeze(image, axis=0)

  plt.imshow(image)
  if title:
    plt.title(title)
chel_path = "/home/aneta/neural_transfer/neural_style_pytorch/images/chelm.jpg"
owce_path = "/home/aneta/neural_transfer/neural_style_pytorch/images/owce.jpg"
content_image = load_img(owce_path)
style_image = load_img(chel_path)
plt.subplot(1, 2, 1)
imshow(content_image, 'Content Image')
plt.subplot(1, 2, 2)
imshow(style_image, 'Style Image')
Fast Style Transfer using TF-Hub
This tutorial demonstrates the original style-transfer algorithm, which optimizes the image content to a particular style. Before getting into the details, let's see how the TensorFlow Hub model does this:
import tensorflow_hub as hub
hub_model = hub.load('https://tfhub.dev/google/magenta/arbitrary-image-stylization-v1-256/2')
stylized_image = hub_model(tf.constant(content_image), tf.constant(style_image))[0]
tensor_to_image(stylized_image)
Define content and style representations
Use the intermediate layers of the model to get the *content* and *style* representations of the image. Starting from the network's input layer, the first few layer activations represent low-level features like edges and textures. As you step through the network, the final few layers represent higher-level features: object parts like wheels or eyes. In this case, you are using the VGG19 network architecture, a pretrained image classification network. These intermediate layers are necessary to define the representation of content and style from the images. For an input image, try to match the corresponding style and content target representations at these intermediate layers.
Load a VGG19 and run it on the image to check that it is being used correctly:
x = tf.keras.applications.vgg19.preprocess_input(content_image*255)
x = tf.image.resize(x, (224, 224))
vgg = tf.keras.applications.VGG19(include_top=True, weights='imagenet')
prediction_probabilities = vgg(x)
prediction_probabilities.shape
TensorShape([1, 1000])
predicted_top_5 = tf.keras.applications.vgg19.decode_predictions(prediction_probabilities.numpy())[0]
[(class_name, prob) for (number, class_name, prob) in predicted_top_5]
[('megalith', 0.7610591), ('king_penguin', 0.20211922), ('ram', 0.014054936), ('hay', 0.0097798), ('bighorn', 0.0025422436)]
Now load a VGG19 without the classification head, and list the layer names:
vgg = tf.keras.applications.VGG19(include_top=False, weights='imagenet')
print()
for layer in vgg.layers:
  print(layer.name)
input_2
block1_conv1
block1_conv2
block1_pool
block2_conv1
block2_conv2
block2_pool
block3_conv1
block3_conv2
block3_conv3
block3_conv4
block3_pool
block4_conv1
block4_conv2
block4_conv3
block4_conv4
block4_pool
block5_conv1
block5_conv2
block5_conv3
block5_conv4
block5_pool
Choose intermediate layers from the network to represent the style and content of the image:
content_layers = ['block5_conv2']

style_layers = ['block1_conv1',
                'block2_conv1',
                'block3_conv1',
                'block4_conv1',
                'block5_conv1']

num_content_layers = len(content_layers)
num_style_layers = len(style_layers)
Intermediate layers for style and content
So why do these intermediate outputs within our pretrained image classification network allow us to define style and content representations?
At a high level, in order for a network to perform image classification (which this network has been trained to do), it must understand the image. This requires taking the raw image as input pixels and building an internal representation that converts the raw image pixels into a complex understanding of the features present within the image.
This is also a reason why convolutional neural networks are able to generalize well: they’re able to capture the invariances and defining features within classes (e.g. cats vs. dogs) that are agnostic to background noise and other nuisances. Thus, somewhere between where the raw image is fed into the model and the output classification label, the model serves as a complex feature extractor. By accessing intermediate layers of the model, you're able to describe the content and style of input images.
Build the model
The networks in tf.keras.applications are designed so you can easily extract the intermediate layer values using the Keras functional API.

To define a model using the functional API, specify the inputs and outputs:

model = Model(inputs, outputs)
The following function builds a VGG19 model that returns a list of intermediate layer outputs:
def vgg_layers(layer_names):
  """Creates a VGG model that returns a list of intermediate output values."""
  # Load our model. Load pretrained VGG, trained on ImageNet data
  vgg = tf.keras.applications.VGG19(include_top=False, weights='imagenet')
  vgg.trainable = False

  outputs = [vgg.get_layer(name).output for name in layer_names]

  model = tf.keras.Model([vgg.input], outputs)
  return model
And to create the model:
style_extractor = vgg_layers(style_layers)
style_outputs = style_extractor(style_image*255)
# Look at the statistics of each layer's output
for name, output in zip(style_layers, style_outputs):
  print(name)
  print("  shape: ", output.numpy().shape)
  print("  min: ", output.numpy().min())
  print("  max: ", output.numpy().max())
  print("  mean: ", output.numpy().mean())
  print()
block1_conv1  shape: (1, 403, 512, 64)  min: 0.0  max: 743.2428  mean: 32.208652
block2_conv1  shape: (1, 201, 256, 128)  min: 0.0  max: 2573.0193  mean: 134.44252
block3_conv1  shape: (1, 100, 128, 256)  min: 0.0  max: 8561.582  mean: 103.2595
block4_conv1  shape: (1, 50, 64, 512)  min: 0.0  max: 11601.054  mean: 451.57892
block5_conv1  shape: (1, 25, 32, 512)  min: 0.0  max: 1913.4247  mean: 32.27252
Calculate style
The content of an image is represented by the values of the intermediate feature maps.
It turns out, the style of an image can be described by the means and correlations across the different feature maps. Calculate a Gram matrix that includes this information by taking the outer product of the feature vector with itself at each location, and averaging that outer product over all locations. This Gram matrix can be calculated for a particular layer as:
$$G^l_{cd} = \frac{\sum_{ij} F^l_{ijc}(x)F^l_{ijd}(x)}{IJ}$$
This can be implemented concisely using the tf.linalg.einsum function:
def gram_matrix(input_tensor):
  result = tf.linalg.einsum('bijc,bijd->bcd', input_tensor, input_tensor)
  input_shape = tf.shape(input_tensor)
  num_locations = tf.cast(input_shape[1]*input_shape[2], tf.float32)
  return result/(num_locations)
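The einsum is equivalent to flattening the spatial dimensions and multiplying the resulting feature matrix by its own transpose, averaged over locations. A NumPy-only sketch of that equivalence (the 5x7x3 shape is an arbitrary example):

```python
import numpy as np

x = np.random.rand(1, 5, 7, 3).astype(np.float32)   # (batch, i, j, channels)
num_locations = 5 * 7

# Gram matrix via einsum, as in gram_matrix above
gram_einsum = np.einsum('bijc,bijd->bcd', x, x) / num_locations

# Same thing via an explicit flatten + matrix product
f = x.reshape(1, -1, 3)                             # flatten i, j into one axis
gram_matmul = np.matmul(f.transpose(0, 2, 1), f) / num_locations

assert np.allclose(gram_einsum, gram_matmul)        # identical (1, 3, 3) result
```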
Extract style and content
Build a model that returns the style and content tensors.
class StyleContentModel(tf.keras.models.Model):
  def __init__(self, style_layers, content_layers):
    super(StyleContentModel, self).__init__()
    self.vgg = vgg_layers(style_layers + content_layers)
    self.style_layers = style_layers
    self.content_layers = content_layers
    self.num_style_layers = len(style_layers)
    self.vgg.trainable = False

  def call(self, inputs):
    "Expects float input in [0,1]"
    inputs = inputs*255.0
    preprocessed_input = tf.keras.applications.vgg19.preprocess_input(inputs)
    outputs = self.vgg(preprocessed_input)
    style_outputs, content_outputs = (outputs[:self.num_style_layers],
                                      outputs[self.num_style_layers:])

    style_outputs = [gram_matrix(style_output)
                     for style_output in style_outputs]

    content_dict = {content_name: value
                    for content_name, value
                    in zip(self.content_layers, content_outputs)}

    style_dict = {style_name: value
                  for style_name, value
                  in zip(self.style_layers, style_outputs)}

    return {'content': content_dict, 'style': style_dict}
When called on an image, this model returns the Gram matrix (style) of the style_layers and the content of the content_layers:
extractor = StyleContentModel(style_layers, content_layers)
results = extractor(tf.constant(content_image))
print('Styles:')
for name, output in sorted(results['style'].items()):
  print("  ", name)
  print("    shape: ", output.numpy().shape)
  print("    min: ", output.numpy().min())
  print("    max: ", output.numpy().max())
  print("    mean: ", output.numpy().mean())
  print()

print("Contents:")
for name, output in sorted(results['content'].items()):
  print("  ", name)
  print("    shape: ", output.numpy().shape)
  print("    min: ", output.numpy().min())
  print("    max: ", output.numpy().max())
  print("    mean: ", output.numpy().mean())
Styles:
  block1_conv1  shape: (1, 64, 64)  min: 0.022261642  max: 44157.473  mean: 355.99878
  block2_conv1  shape: (1, 128, 128)  min: 0.0  max: 74937.36  mean: 11677.908
  block3_conv1  shape: (1, 256, 256)  min: 0.0  max: 379654.12  mean: 10265.242
  block4_conv1  shape: (1, 512, 512)  min: 0.0  max: 3333814.5  mean: 183739.55
  block5_conv1  shape: (1, 512, 512)  min: 0.0  max: 238887.72  mean: 2304.3828
Contents:
  block5_conv2  shape: (1, 18, 32, 512)  min: 0.0  max: 1443.768  mean: 16.313246
Run gradient descent
With this style and content extractor, you can now implement the style transfer algorithm. Do this by calculating the mean squared error for your image's output relative to each target, then taking the weighted sum of these losses.
Set your style and content target values:
style_targets = extractor(style_image)['style']
content_targets = extractor(content_image)['content']
Define a tf.Variable to contain the image to optimize. To make this quick, initialize it with the content image (the tf.Variable must be the same shape as the content image):
image = tf.Variable(content_image)
Since this is a float image, define a function to keep the pixel values between 0 and 1:
def clip_0_1(image):
  return tf.clip_by_value(image, clip_value_min=0.0, clip_value_max=1.0)
Create an optimizer. The paper recommends LBFGS, but Adam works okay, too:
opt = tf.keras.optimizers.Adam(learning_rate=0.02, beta_1=0.99, epsilon=1e-1)
To optimize this, use a weighted combination of the two losses to get the total loss:
style_weight=1e-2
content_weight=1e4

def style_content_loss(outputs):
  style_outputs = outputs['style']
  content_outputs = outputs['content']
  style_loss = tf.add_n([tf.reduce_mean((style_outputs[name]-style_targets[name])**2)
                         for name in style_outputs.keys()])
  style_loss *= style_weight / num_style_layers

  content_loss = tf.add_n([tf.reduce_mean((content_outputs[name]-content_targets[name])**2)
                           for name in content_outputs.keys()])
  content_loss *= content_weight / num_content_layers
  loss = style_loss + content_loss
  return loss, style_loss, content_loss
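On toy dictionaries, the weighting scheme works out as follows; this is a NumPy sketch with made-up two-layer values, not the real extractor outputs:

```python
import numpy as np

style_weight, num_style_layers = 1e-2, 2

# Toy outputs/targets: layer 'a' is off by 1 everywhere, layer 'b' matches exactly
style_outputs = {'a': np.ones((2, 2)), 'b': np.zeros((2, 2))}
style_targets = {'a': np.zeros((2, 2)), 'b': np.zeros((2, 2))}

# Per-layer mean squared error, summed over layers, then weighted and averaged
style_loss = sum(np.mean((style_outputs[k] - style_targets[k])**2)
                 for k in style_outputs)
style_loss *= style_weight / num_style_layers
print(style_loss)   # (1.0 + 0.0) * 1e-2 / 2 = 0.005
```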
Use tf.GradientTape to update the image.
@tf.function()
def train_step(image):
  with tf.GradientTape() as tape:
    outputs = extractor(image)
    total_loss, style_loss, content_loss = style_content_loss(outputs)

  grad = tape.gradient(total_loss, image)
  opt.apply_gradients([(grad, image)])
  image.assign(clip_0_1(image))
  return total_loss, style_loss, content_loss
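The same compute-gradient, apply-update, clip pattern can be seen in a hand-rolled one-pixel example, using plain gradient descent instead of Adam and a made-up target value:

```python
import numpy as np

pixel = np.array([0.1])               # one-pixel "image" to optimize
target = np.array([0.8])              # hypothetical content target
lr = 0.1

for _ in range(50):
    grad = 2 * (pixel - target)       # gradient of the loss (pixel - target)^2
    pixel = np.clip(pixel - lr * grad, 0.0, 1.0)   # update, then keep in [0, 1]

print(pixel)                          # converges toward [0.8]
```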
Now run a few steps to test:
train_step(image)
train_step(image)
train_step(image)
tensor_to_image(image)
Since it's working, perform a longer optimization:
import time
start = time.time()
epochs = 10
steps_per_epoch = 100
step = 0
for n in range(epochs):
  for m in range(steps_per_epoch):
    step += 1
    total_loss, style_loss, content_loss = train_step(image)
    print(".", end='', flush=True)
  display.clear_output(wait=True)
  display.display(tensor_to_image(image))
  print(f"Train step: {step}")
  print(f"Epoch {n+1}/{epochs}")
  # Print losses at the end of each epoch
  print(f"Total Loss: {total_loss.numpy():.4f}, Style Loss: {style_loss.numpy():.4f}, Content Loss: {content_loss.numpy():.4f}")
end = time.time()
print("Total time: {:.1f}".format(end-start))
Train step: 1000 Epoch 10/10 Total Loss: 1194236.2500, Style Loss: 548491.0000, Content Loss: 645745.2500 Total time: 70.0
Total variation loss
One downside to this basic implementation is that it produces a lot of high frequency artifacts. Decrease these using an explicit regularization term on the high frequency components of the image. In style transfer, this is often called the *total variation loss*:
def high_pass_x_y(image):
  x_var = image[:, :, 1:, :] - image[:, :, :-1, :]
  y_var = image[:, 1:, :, :] - image[:, :-1, :, :]

  return x_var, y_var
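On a tiny ramp image the neighbour differences are easy to verify by hand; the shapes and values here are illustrative only:

```python
import numpy as np

# A 3x4 single-channel ramp: values 0..11, increasing by 1 across, by 4 down
img = np.arange(12, dtype=np.float32).reshape(1, 3, 4, 1)

x_var = img[:, :, 1:, :] - img[:, :, :-1, :]   # horizontal neighbours differ by 1
y_var = img[:, 1:, :, :] - img[:, :-1, :, :]   # vertical neighbours differ by 4

assert np.all(x_var == 1.0) and x_var.shape == (1, 3, 3, 1)
assert np.all(y_var == 4.0) and y_var.shape == (1, 2, 4, 1)
```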
x_deltas, y_deltas = high_pass_x_y(content_image)
plt.figure(figsize=(14, 10))
plt.subplot(2, 2, 1)
imshow(clip_0_1(2*y_deltas+0.5), "Horizontal Deltas: Original")
plt.subplot(2, 2, 2)
imshow(clip_0_1(2*x_deltas+0.5), "Vertical Deltas: Original")
x_deltas, y_deltas = high_pass_x_y(image)
plt.subplot(2, 2, 3)
imshow(clip_0_1(2*y_deltas+0.5), "Horizontal Deltas: Styled")
plt.subplot(2, 2, 4)
imshow(clip_0_1(2*x_deltas+0.5), "Vertical Deltas: Styled")
This shows how the high frequency components have increased.
Also, this high frequency component is basically an edge-detector. You can get similar output from the Sobel edge detector, for example:
plt.figure(figsize=(14, 10))
sobel = tf.image.sobel_edges(content_image)
plt.subplot(1, 2, 1)
imshow(clip_0_1(sobel[..., 0]/4+0.5), "Horizontal Sobel-edges")
plt.subplot(1, 2, 2)
imshow(clip_0_1(sobel[..., 1]/4+0.5), "Vertical Sobel-edges")
The regularization loss associated with this is the sum of the absolute values of these differences:
def total_variation_loss(image):
  x_deltas, y_deltas = high_pass_x_y(image)
  return tf.reduce_sum(tf.abs(x_deltas)) + tf.reduce_sum(tf.abs(y_deltas))
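The value can be checked by hand on a small example with a NumPy re-implementation of the same formula (the 3x4 ramp image is an arbitrary test case):

```python
import numpy as np

def total_variation_np(image):
    # Same anisotropic total variation: sum of |horizontal| + |vertical| differences
    x_deltas = image[:, :, 1:, :] - image[:, :, :-1, :]
    y_deltas = image[:, 1:, :, :] - image[:, :-1, :, :]
    return np.sum(np.abs(x_deltas)) + np.sum(np.abs(y_deltas))

img = np.arange(12, dtype=np.float32).reshape(1, 3, 4, 1)  # 3x4 ramp image
print(total_variation_np(img))   # 9 horizontal deltas of 1 + 8 vertical deltas of 4 = 41.0
```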
total_variation_loss(image).numpy()
56656.504
That demonstrated what it does. But there's no need to implement it yourself; TensorFlow includes a standard implementation:
tf.image.total_variation(image).numpy()
array([56656.504], dtype=float32)
Re-run the optimization
Choose a weight for the total_variation_loss:
total_variation_weight=30
Now include it in the train_step function:
@tf.function()
def train_step(image):
  with tf.GradientTape() as tape:
    outputs = extractor(image)
    loss, style_loss, content_loss = style_content_loss(outputs)
    loss += total_variation_weight*tf.image.total_variation(image)

  grad = tape.gradient(loss, image)
  opt.apply_gradients([(grad, image)])
  image.assign(clip_0_1(image))
  return loss, style_loss, content_loss
Reinitialize the image-variable and the optimizer:
opt = tf.keras.optimizers.Adam(learning_rate=0.02, beta_1=0.99, epsilon=1e-1)
image = tf.Variable(content_image)
And run the optimization:
import time
start = time.time()
epochs = 10
steps_per_epoch = 100
step = 0
for n in range(epochs):
  for m in range(steps_per_epoch):
    step += 1
    total_loss, style_loss, content_loss = train_step(image)
    print(".", end='', flush=True)
  display.clear_output(wait=True)
  display.display(tensor_to_image(image))
  print("Train step: {}".format(step))
  print(f"Epoch {n+1}/{epochs}")
  # Print losses at the end of each epoch
  print(f"Total Loss: {total_loss.numpy().item():.4f}, Style Loss: {style_loss.numpy().item():.4f}, Content Loss: {content_loss.numpy().item():.4f}")
end = time.time()
print("Total time: {:.1f}".format(end-start))
Train step: 1000 Epoch 10/10 Total Loss: 1565367.1250, Style Loss: 559976.9375, Content Loss: 678227.8750 Total time: 70.9
Learn more
This tutorial demonstrates the original style-transfer algorithm. For a simple application of style transfer, check out this tutorial to learn more about how to use the arbitrary image style transfer model from TensorFlow Hub.