IkebanaGAN: New GANs Technique for Digital Ikebana Art

Mai Cong Hung (1), Mai Xuan Trang (2), Naoko Tosa (1), Ryohei Nakatsu (1)

(1) Kyoto University, Kyoto, Japan
(2) Faculty of Computer Science, Phenikaa University, Ha Noi, Viet Nam

hungmcuet@gmail.com, trang.maixuan@phenikaa-uni.edu.vn, tosa.naoko.5c@kyoto-u.ac.jp, ryohei.nakatsu@design.kyoto-u.ac.jp

Abstract. In this research, we have carried out various experiments on mutual transformation between a domain of Ikebana (Japanese traditional flower arrangement) photos and other domains of images (landscapes, animals, portraits) to create new artworks, via a variation of CycleGAN, a GANs technique based on cycle-consistency loss. A pre-trained object detection process was added to improve efficiency by avoiding over-transformation.

Keywords: GANs, CycleGAN, Ikebana, Image Transformation.

1 Introduction

The rapid advance of Deep Learning in recent years raises an interesting question for both computer scientists and artists: "What is the role of AI/Machine Learning/Deep Learning in the future art scene?" For instance, machine learning techniques have been used for artwork clustering [1] as well as for art evaluation [2]. On the other side, the application of AI, and especially Deep Learning, to art creation is also of interest.

One basic approach of AI toward art is to use style transfer to turn normal photos or sketches into artworks of specific styles. In Deep Learning, style transfer can be performed with the generative models of GANs (Generative Adversarial Networks) [3]. In the training of GANs, a generator network G learns to generate new data while a discriminator network D tries to identify whether the generated data is real or fake. The training process can be interpreted as a zero-sum game between G and D: G tries to maximize the probability that the generated data lies on the distribution of the target set, while D tries to minimize it. GANs training can converge even with a relatively small amount of training data.

A large number of GANs variations have been developed by modifying this basic configuration, with impressive results on style transfer tasks. Among them, CycleGAN [4] is an elegant method for studying the mutual transformation between two sets of data. In comparison to traditional GANs, CycleGAN adds an inverse transformation that maps data from the target domain back to the input domain, and it uses two discriminators, one per domain. Training minimizes the error caused by applying a cycle of forward and backward transformations. CycleGAN is flexible and useful for art style transfer because it uses unpaired training sets and a set-to-set transformation to learn the distribution of the target set, which we can regard as an art style.

Classic examples of CycleGAN and other style transfer techniques perform transformations between two sets of relatively similar size, themes, or categories, such as artworks by Monet and landscape photos, winter and summer landscapes, or horse and zebra photos. So, what would happen if one performs a transfer between two sets from relatively different domains of objects?

The authors proposed the idea of "unusual transformation" [5], which achieves a mutual transformation between two sets of different sizes and themes. Several examples were given by transforming portraits and animal photos into Ikebana, the Japanese art of flower arrangement, via CycleGAN. It is impressive that portraits and horse photos turn into Ikebana while one can still recognize the original shapes of the human faces and horses (Fig. 1). This "unusual transformation" concept would open a new way to create an original art style.

Fig. 1. Transformation of portraits and horse photos into Ikebana by CycleGAN

However, traditional GANs techniques have some limitations in performing this unusual transformation task. The experiments with Ikebana in [5] show several failures of CycleGAN to transform photos with complex backgrounds into abstract Ikebana (Fig. 2). In some cases, photos were over-transformed so that the original shape of the main object could no longer be recognized. The structure of classic GANs techniques was not designed to learn specific, highly abstract representations, and such networks find it difficult to learn an object that appears at various sizes across a collection of photos. In our research, we improve on this limitation by mixing GANs with classic Computer Vision techniques. This idea appeared in CartoonGAN [6], where the authors used edge detection to emphasize the weight of edges for the task of anime-style transfer. Another interesting example is the Attentive Adversarial Network [7], which uses face recognition to improve art style transfer for selfie images.
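To make the classic Computer Vision side of such hybrid approaches concrete, the following is a minimal sketch of edge-map extraction with the Canny detector [10], using OpenCV. It illustrates the general edge signal that edge-aware methods like CartoonGAN build on, not the actual preprocessing of [6] or of our system; the file name, blur kernel, and thresholds are illustrative assumptions.

```python
import cv2
import numpy as np

# Extract a Canny edge map [10] of the kind an edge-aware GAN pipeline could
# use to emphasize object contours. "portrait.jpg" and the threshold values
# (100, 200) are illustrative assumptions, not settings from the paper.
image = cv2.imread("portrait.jpg")
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
blurred = cv2.GaussianBlur(gray, (5, 5), 1.4)  # suppress noise before edge detection
edges = cv2.Canny(blurred, 100, 200)           # binary edge map (0 or 255)

# Thicken the edges slightly so they survive downsampling to network resolution.
edges = cv2.dilate(edges, np.ones((3, 3), np.uint8), iterations=1)
cv2.imwrite("portrait_edges.png", edges)
```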
Generative models learn to generate new data instances, while discriminative models learn to discriminate between different categories of data. Mathematically, the generative model learns the joint probability $p(X, Y)$ of a data instance set X and a label set Y, while the discriminative model learns the conditional probability $p(Y|X)$.

In recent years, GANs (Generative Adversarial Networks) [3] have become a major topic in Deep Learning, as their generative models deliver powerful performance on art style transfer with only a relatively small amount of training data. The structure of GANs is shown in Fig. 4, with the basic configuration of two networks: a generator network (G) and a discriminator network (D). The training of GANs is based on a minimax mechanism in which G learns to generate data from random noise while D tries to identify whether the generated data is real or fake. In mathematical terms, training G maximizes the probability that the generated data lies on the distribution of the target set, and training D minimizes it. Recently, a large number of GANs variations have been developed by modifying this basic minimax configuration.

Fig. 4. The basic configuration of GANs

Among the variations of GANs, CycleGAN [4] is an elegant method to study set-to-set mutual transformation between two categories of objects. Its architecture consists of two generators and two discriminators, as shown in Fig. 5. Given two image sets A and B, the core goal of CycleGAN is to learn two mappings $G_{AB}: A \to B$ and $G_{BA}: B \to A$ given the training samples $\{a_i\}_{i=1}^{N} \subset A$ and $\{b_j\}_{j=1}^{M} \subset B$ with data distributions $a \sim p_A(a)$ and $b \sim p_B(b)$. The two discriminators are $D_A$ and $D_B$, where $D_A$ aims to distinguish between real images $\{a\}$ and translated images $\{G_{BA}(b)\}$, and the analogous statement holds for $D_B$.
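To make this four-network layout concrete, here is a minimal PyTorch sketch of the CycleGAN components described above, written under our own simplifying assumptions: the tiny convolutional stacks are placeholders, whereas [4] uses much deeper ResNet-based generators and PatchGAN discriminators.

```python
import torch
import torch.nn as nn

class Generator(nn.Module):
    """Toy image-to-image mapping standing in for G_AB or G_BA."""
    def __init__(self, channels=3):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(channels, 64, kernel_size=7, padding=3), nn.ReLU(inplace=True),
            nn.Conv2d(64, 64, kernel_size=3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(64, channels, kernel_size=7, padding=3), nn.Tanh(),  # pixels in [-1, 1]
        )

    def forward(self, x):
        return self.net(x)

class Discriminator(nn.Module):
    """Toy discriminator standing in for D_A or D_B; outputs raw real/fake scores."""
    def __init__(self, channels=3):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(channels, 64, kernel_size=4, stride=2, padding=1), nn.LeakyReLU(0.2, inplace=True),
            nn.Conv2d(64, 128, kernel_size=4, stride=2, padding=1), nn.LeakyReLU(0.2, inplace=True),
            nn.Conv2d(128, 1, kernel_size=4, padding=1),  # patch-wise score map
        )

    def forward(self, x):
        return self.net(x)

# The four networks of Fig. 5: one mapping per direction, one discriminator per domain.
G_AB, G_BA = Generator(), Generator()
D_A, D_B = Discriminator(), Discriminator()
```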
The objective function of CycleGAN contains two types of loss: adversarial losses, which match the generated images to the target domain, and a cycle consistency loss, which prevents the mappings $G_{AB}$ and $G_{BA}$ from contradicting each other.

Adversarial Loss: The adversarial loss applies to both mapping functions.

For the mapping function $G_{AB}: A \to B$ and its discriminator $D_B$:

$L_{GAN}(G_{AB}, D_B, A, B) = \mathbb{E}_{b \sim p_B(b)}[\log D_B(b)] + \mathbb{E}_{a \sim p_A(a)}[\log(1 - D_B(G_{AB}(a)))]$   (1)

For the mapping function $G_{BA}: B \to A$ and its discriminator $D_A$:

$L_{GAN}(G_{BA}, D_A, B, A) = \mathbb{E}_{a \sim p_A(a)}[\log D_A(a)] + \mathbb{E}_{b \sim p_B(b)}[\log(1 - D_A(G_{BA}(b)))]$   (2)

Cycle Consistency Loss: For each image $a$ from domain A, the image generated after applying the two transformations $G_{AB}$ and $G_{BA}$ should be similar to $a$: $a \to G_{AB}(a) \to G_{BA}(G_{AB}(a)) \approx a$. This is called forward cycle consistency. Similarly, for the backward path we have backward cycle consistency: $b \to G_{BA}(b) \to G_{AB}(G_{BA}(b)) \approx b$. The cycle consistency loss is a combination of the forward and backward cycle consistency losses:

$L_{cyc}(G_{AB}, G_{BA}) = \mathbb{E}_{a \sim p_A(a)}[\lVert G_{BA}(G_{AB}(a)) - a \rVert_1] + \mathbb{E}_{b \sim p_B(b)}[\lVert G_{AB}(G_{BA}(b)) - b \rVert_1]$   (3)

The full objective function of CycleGAN is a combination of the adversarial losses and the cycle consistency loss:

$L(G_{AB}, G_{BA}, D_A, D_B) = L_{GAN}(G_{AB}, D_B, A, B) + L_{GAN}(G_{BA}, D_A, B, A) + \lambda L_{cyc}(G_{AB}, G_{BA})$   (4)

where $\lambda$ is the weight of the cycle consistency loss. In the training phase, the parameters of the networks ($G_{AB}$, $G_{BA}$, $D_A$, and $D_B$) are estimated by optimizing the full objective function:

$G_{AB}^{*}, G_{BA}^{*} = \arg \min_{G_{AB}, G_{BA}} \max_{D_A, D_B} L(G_{AB}, G_{BA}, D_A, D_B)$   (5)

Fig. 5. The basic configuration of CycleGAN (adapted from [4])

In general, the generative models in CycleGAN learn a set-to-set transformation, while the original GANs learn to generate data that fits a target set. Therefore, CycleGAN can be used to establish a mutual conversion between two groups of images, such as the art styles of two artists. As in Fig. 6, CycleGAN converts horses into zebras and vice versa. In [5], the authors use CycleGAN to create Ikebana paintings via the concept of unusual transformation.

Fig. 6. Horses-Zebras transfer (image source: [3])
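Equations (1)-(5) map almost line by line onto code. Continuing the toy networks sketched above, the following training step is a minimal sketch under stated assumptions: it keeps the log-loss form of (1)-(2) as binary cross-entropy on the raw discriminator scores, uses $\lambda = 10$ as in [4], and picks the optimizer settings as placeholders. The reference implementation of [4] additionally swaps the log losses for least-squares losses and samples fakes from a buffer of past generated images.

```python
import itertools
import torch
import torch.nn.functional as F

lam = 10.0  # weight of the cycle consistency loss (lambda in Eq. 4)
opt_G = torch.optim.Adam(itertools.chain(G_AB.parameters(), G_BA.parameters()), lr=2e-4)
opt_D = torch.optim.Adam(itertools.chain(D_A.parameters(), D_B.parameters()), lr=2e-4)

def d_loss(D, real, fake):
    """Eqs. (1)/(2) from the discriminator's side: push D(real) to 1, D(fake) to 0."""
    real_score, fake_score = D(real), D(fake)
    return (F.binary_cross_entropy_with_logits(real_score, torch.ones_like(real_score))
            + F.binary_cross_entropy_with_logits(fake_score, torch.zeros_like(fake_score)))

def train_step(real_a, real_b):
    """One alternating min/max step of Eq. (5) on a batch from each domain."""
    fake_b, fake_a = G_AB(real_a), G_BA(real_b)

    # Generator update: minimize Eq. (4) with respect to G_AB and G_BA.
    opt_G.zero_grad()
    score_b, score_a = D_B(fake_b), D_A(fake_a)
    loss_gan = (F.binary_cross_entropy_with_logits(score_b, torch.ones_like(score_b))
                + F.binary_cross_entropy_with_logits(score_a, torch.ones_like(score_a)))
    # Eq. (3): forward and backward cycle consistency with the L1 norm.
    loss_cyc = F.l1_loss(G_BA(fake_b), real_a) + F.l1_loss(G_AB(fake_a), real_b)
    (loss_gan + lam * loss_cyc).backward()
    opt_G.step()

    # Discriminator update: maximize Eq. (4) with respect to D_A and D_B
    # (equivalently, minimize the negated adversarial terms on detached fakes).
    opt_D.zero_grad()
    loss_d = d_loss(D_A, real_a, fake_a.detach()) + d_loss(D_B, real_b, fake_b.detach())
    loss_d.backward()
    opt_D.step()
```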
Fig. 8a. Experiment result A-B1: IkebanaGAN and CycleGAN both generate an acceptable transformation (first row: original photo; second row: transformation by CycleGAN; last row: transformation by IkebanaGAN)

Fig. 8b. Experiment result A-B1: IkebanaGAN performs better than CycleGAN (first row: original photo; second row: transformation by CycleGAN; last row: transformation by IkebanaGAN)

Fig. 9a. Experiment result A-B2: IkebanaGAN and CycleGAN both generate an acceptable transformation (first row: original photo; second row: transformation by CycleGAN; last row: transformation by IkebanaGAN)

Fig. 9b. Experiment result A-B2: IkebanaGAN performs better than CycleGAN (first row: original photo; second row: transformation by CycleGAN; last row: transformation by IkebanaGAN)

6 Discussion and Conclusion

In the examples where IkebanaGAN performs better than CycleGAN, we found that the original shapes were well preserved, as in our assumption. IkebanaGAN improves the success rate, as well as the quality, of the unusual transformation more for portraits than for animal photos. We consider the reason to be that the structures of objects in animal photos are more complex than those of human faces. As we mentioned before, the unusual transformation is a challenging task because of the different structures of the two data sets. We aim to ease this difficulty with techniques that mix Deep Learning-based style transfer and classic Computer Vision object detection. We remark that, because of this inherent difficulty, the success rate of the transformation is still low. In the future, we plan another approach that mixes two GANs networks: we would learn a transformation between photos and sketches, as well as between sketches and Ikebana, under the assumption that the sketch structure removes the risk of over-transformation.
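The preview does not spell out how the pre-trained object-detection stage enters IkebanaGAN, so the following is only one plausible reading, sketched with an off-the-shelf detector from torchvision (our choice, not the paper's): locate the main object first, so that the transformation, or a loss re-weighting, can be confined to preserving its silhouette. The helper name `main_object_box` and the score threshold are hypothetical.

```python
import torch
from torchvision.models.detection import fasterrcnn_resnet50_fpn

# Off-the-shelf pre-trained detector; an illustrative stand-in for the
# object-detection stage, not the specific model used by IkebanaGAN.
detector = fasterrcnn_resnet50_fpn(weights="DEFAULT").eval()

@torch.no_grad()
def main_object_box(image, min_score=0.8):
    """Return the highest-scoring box (x1, y1, x2, y2) for a (3, H, W) float
    image with values in [0, 1], or None if nothing is detected confidently.

    A caller could, for example, weight the cycle consistency loss of Eq. (3)
    more heavily inside this box so the main object's shape is preserved.
    """
    out = detector([image])[0]          # dict with "boxes", "labels", "scores"
    keep = out["scores"] >= min_score   # hypothetical confidence threshold
    if not keep.any():
        return None
    best = out["scores"][keep].argmax()
    return out["boxes"][keep][best].tolist()
```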
References

1. Gultepe, E., Conturo, T. E., Makrehchi, M., "Predicting and Grouping Digitized Paintings by Style using Unsupervised Feature Learning," Journal of Cultural Heritage, Vol.31, pp.13-23 (2018).
2. Hung, M. C., Nakatsu, R., Tosa, N., Kusumi, T., "Learning of Art Style Using AI and Its Evaluation Based on Psychological Experiments," 2020 International Conference on Entertainment Computing (2020).
3. Creswell, A., et al., "Generative Adversarial Networks: An Overview," IEEE Signal Processing Magazine, Vol.35, No.1, pp.53-65 (2018).
4. Zhu, J., Park, T., Isola, P., Efros, A. A., "Unpaired Image-to-Image Translation Using Cycle-Consistent Adversarial Networks," 2017 IEEE International Conference on Computer Vision (ICCV), pp.2242-2251 (2017).
5. Hung, M. C., Nakatsu, R., Tosa, N., "Developing Japanese Ikebana as a Digital Painting Tool via AI," 2020 International Conference on Entertainment Computing (2020).
6. Chen, Y., Lai, Y., Liu, Y., "CartoonGAN: Generative Adversarial Networks for Photo Cartoonization," 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp.9465-9474 (2018).
7. Li, X., Zhang, W., Shen, T., Mei, T., "Everyone is a Cartoonist: Selfie Cartoonization with Attentive Adversarial Networks," 2019 IEEE International Conference on Multimedia and Expo (ICME), pp.652-657 (2019).
8. Sato, S., "The Art of Arranging Flowers: A Complete Guide to Japanese Ikebana," Harry N. Abrams (1965).
9. Tosa, N., Pang, Y., Yang, Q., Nakatsu, R., "Pursuit and Expression of Japanese Beauty Using Technology," in Leymarie, F. F., Bessette, J., Smith, G. W. (eds.), The Machine as Art / The Machine as Artist, MDPI, pp.267-280 (2020).
10. Canny, J., "A Computational Approach to Edge Detection," IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol.PAMI-8, No.6, pp.679-698 (1986).