How I built an AI Image Enhancer

What is a great image? It’s just the previous image with more contrast and saturation, right (/s) ? But at some point, the contrast or saturation is too high. And what if the creator of the image didn’t want more contrast, what if a photographer wants his photo to look dull to convey an emotion? What if an image is desaturated on purpose? Is the best image the most realistic one, or the most pleasing one?

You can’t transform an image without changing the message it conveys. However, you can make this image more pleasing visually. In this article I detail how I built an algorithm to enhance images and what I learned.

Image enhancers

We’ve all used image enhancers. They may be among the most used algorithms in the world. Take a photo with your smartphone and it’ll be enhanced with an algorithm. What does “enhanced” mean? Well, it can include multiple things:

  • Removing blur
  • Removing noise
  • Raising resolution
  • Adjusting lighting, contrast and saturation
  • Enhancing faces
  • Reconstructing missing information (colors, degraded images…)
  • Adding a filter, etc.

There are multiple ways to enhance an image. It can be done automatically with a simple mathematical analysis of the information in the image (“The sum of pixels is too low, it’s too dark, the brightness needs to be changed”). It can be carried out on the user’s decision with a simple mathematical analysis (“This image would be much better with a black and white filter”). And it can be done with deep learning (DL) algorithms, based on user’s decision or automatically.

However, when DL algorithms are used, they have one major constraint: they can be very slow to run on low-end devices.

An example of DL algorithm to enhance images is https://arxiv.org/pdf/1704.02470.pdf . In this paper, enhancing an image means improving its “color, texture and content” to make photos from mobile devices similar to photos from professional cameras. They combine multiple losses and use GANs to discriminate their “enhanced image” vs an already great “target image”. If the generator manages to enhance the image enough, the discriminator won’t notice the difference, otherwise you can train the generator based on the difference to make it better at fooling the discriminator. Basically, their goal is to do f(bad_image) = great_image. They need f. This is the goal of most DL algorithms to enhance images.

This is not exactly what I did.

Algorithm

In fact, I didn’t really want to improve images. I just stumbled upon an “aesthetic predictor” on Github.

It does what it says: it predicts how “aesthetic” an image is. So I used it on all my images, and it worked pretty well. The images with the highest score are indeed very great, and the images with the worst scores are quite unaesthetic. You can see some results here: http://captions.christoph-schuhmann.de/aesthetic_viz_laion_sac+logos+ava1-l14-linearMSE-en-2.37B.html

It’s based on AVA: “A largescale database for aesthetic visual analysis”.

Now.

  • I know algorithms to change brightness / saturation / contrast / blur / sharpness / etc. of an image
  • I know how to measure how aesthetic an image is.

So… Well I can transform an image once randomly, see if the result is better (aesthetic score improved), keep it if it’s better, or keep the previous image otherwise. And I can redo that again, and again, and again..

Add a decision tree on top of that, and you approximately have the algorithm I implemented. Each node is a bunch of transformations and I try new transformations on the best nodes of the graph.

NUlIE algorithm: a decision tree with an aesthetic predictor to enhance images. The algorithm does a transformation, the aesthetic predictor evaluates the image, the decision tree keeps track of what works and what doesnt.

Pros

  • You can decide on how to transform the image. With standard DL enhancers, it may be harder to control saturation / contrast / lighting / etc independently. With this algorithm it’s possible.
  • You can scale the processing time the way you want, the algorithm can run for 5 seconds or 5 hours and continuously try new transformations

Cons

  • With this specific aesthetic predictor, what “aesthetic means” isn’t clearly defined for photos, and it can include changing the content of an image if it’s a transformation the model has access to. If the model could remove an unaesthetic detail in an image, it would.
  • The model is overall pretty slow as it constantly needs to transform and evaluate an image. The current version could, however, be improved, but it requires multiple seconds to get a good enough result with a middle-end GPU

Results

In this part you’ll get what I meant with the first section of this article. My algorithm has two modes, a “soft” mode and a “hard mode”. The soft mode only has access to some basic transformations on contrast / saturation / brightness. However, the “hard” mode can do much more stuff. Let’s start with some results of the soft mode:

Soft mode

Enhanced dog (soft mode)
Enhanced snake (soft mode)
Enhanced dragon (soft mode)

Hard mode

Ok, same question as before, what is an “enhanced” image?

Is this image …:

… an enhanced version of this image?:

It’s just not the same image anymore. The “hard” mode is able to transform an image much more (posterize, solarize, invert, change hue, etc.). And, it can try to “enhance” an image by giving it a more artistic touch. I found these results quite interesting and I wanted to share them. Overall the algorithm is able to enhance most images but I think it has a bias towards making human faces darker. In theory this could be improved by designing an aesthetic predictor specifically for the task and by using a more accurate model…

That’s it!

Code

The algorithm is freely available as an extension for sd-webui:

NUl Image Enhancer as an extension in sd-webui

or as a standalone in command-line with python:

Requirements

  • An NVIDIA GPU with > 2GB VRAM (very slow on CPU)
  • CLIP
  • Pytorch

Models

You can also check Marques Brownlee’s video on how he evaluated smartphone cameras: https://www.youtube.com/watch?v=LQdjmGimh04 (and in fact he mostly evaluated the post-processed images, that is, the best “image enhancer” in modern smartphone cameras).

👏 If you want more articles like this! Thanks!

Revenir sur le Blog
Tous nos articles sont disponibles sur Medium

Actualités

  • Card image
    Transformers in Pytorch from scratch for NLP Beginners

    Everything you need in one python file, without extra libraries Two weeks ago, I wanted to understand Transformers. I read the original paper, I read articles I could find online, I listened to podcas...

    Wed, 17 Feb 2021 21:12:46 GMT

    Lire

    Wed, 17 Feb 2021 21:12:46 GMT

    Lire
  • Card image
    Why do we close nuclear reactors?

    Nuclear reactors may be closed for four main reasons: they reached their end of life, they had an accident, they had technical problems and couldn’t be repaired, or a political decision made them cl...

    Sun, 02 Apr 2023 22:23:52 GMT

    Lire

    Sun, 02 Apr 2023 22:23:52 GMT

    Lire
  • Card image
    How many people died because of the Chernobyl disaster?

    Several studies and organizations investigated deaths related to the Chernobyl accident. I present their results. This article is part of a series on the DEC Report. The DEC report is a 200+ pages fr...

    Sun, 02 Apr 2023 18:29:33 GMT

    Lire

    Sun, 02 Apr 2023 18:29:33 GMT

    Lire
  • Card image
    Energy, EROI and limits to growth

    What is the limit to the amount of energy we can produce? Is EROI the best metric for future constraints? Let’s see! In this article, I provide a simple algorithm to evaluate if a strategy is credi...

    Sun, 02 Apr 2023 14:07:27 GMT

    Lire

    Sun, 02 Apr 2023 14:07:27 GMT

    Lire
  • Card image
    How much fossil fuel do we consume each year?

    Can we really grasp how much fossil fuels we consume each year? Is it a lot? Not that much? Can we easily do an energy transition for climate change? Or is our consumption of fossil fuel so fundamenta...

    Sun, 02 Apr 2023 11:58:25 GMT

    Lire

    Sun, 02 Apr 2023 11:58:25 GMT

    Lire
  • Card image
    How I assessed the global potential of nuclear energy

    This article is part of a series on the DEC Report. The DEC report is a 200+ pages freely accessible report I wrote on climate change and energy. It assesses the world’s potential to tackle climate ...

    Sat, 01 Apr 2023 20:08:49 GMT

    Lire

    Sat, 01 Apr 2023 20:08:49 GMT

    Lire
  • Card image
    How I evaluated the world’s potential for wind energy

    This article is part of a series on the DEC Report. The DEC report is a 200+ pages freely accessible report I wrote on climate change and energy. It assesses the world’s potential to tackle climate ...

    Sat, 01 Apr 2023 17:46:21 GMT

    Lire

    Sat, 01 Apr 2023 17:46:21 GMT

    Lire
  • Card image
    How I evaluated the world’s potential for solar energy

    This article is part of a series on the DEC Report. The DEC report is a 200+ pages freely accessible report I wrote on climate change and energy. It assesses the world’s potential to tackle climate ...

    Sat, 01 Apr 2023 16:28:55 GMT

    Lire

    Sat, 01 Apr 2023 16:28:55 GMT

    Lire
  • Card image
    How I evaluated the world’s potential for hydroelectricity

    This article is part of a series on the DEC Report. The DEC report is a 200+ pages freely accessible report I wrote on climate change and energy. It assesses the world’s potential to tackle climate ...

    Sat, 01 Apr 2023 15:09:34 GMT

    Lire

    Sat, 01 Apr 2023 15:09:34 GMT

    Lire
  • Card image
    How I built an AI Image Enhancer

    What is a great image? It’s just the previous image with more contrast and saturation, right (/s) ? But at some point, the contrast or saturation is too high. And what if the creator of the image d...

    Sat, 01 Apr 2023 02:06:01 GMT

    Lire

    Sat, 01 Apr 2023 02:06:01 GMT

    Lire
  • Card image
    [Video] A Model for Language Acquisition

    In this video I introduce the prototype for language acquisition in the global Artificial General Intelligence project. https://medium.com/media/9a72c93624362c8105f1406f16ee1817/href The model I used...

    Wed, 26 Jan 2022 22:54:31 GMT

    Lire

    Wed, 26 Jan 2022 22:54:31 GMT

    Lire
  • Card image
    Simulons des pandémies

    Propagation d’un virus dans une population Cet article a pour but de transmettre un retour d’expérience sur la simulation de pandémies. Objectif? Comprendre les limites et l’intérêt de ces a...

    Thu, 22 Oct 2020 11:56:53 GMT

    Lire

    Thu, 22 Oct 2020 11:56:53 GMT

    Lire
  • Card image
    Neural Network in C++ From Scratch and Backprop-Free Optimizers

    In this article I’ll present a beginner-oriented framework implementing neural networks in C++. The main goal of this code is to understand the root of neural networks for beginners, it also allows ...

    Tue, 13 Oct 2020 23:03:00 GMT

    Lire

    Tue, 13 Oct 2020 23:03:00 GMT

    Lire
  • Card image
    A more parameter-efficient SOTA bottleneck! (2020/07)

    Linear Bottleneck with Efficient Channel Attention instead of Squeeze Excitation CNN are great blablabla… Let’s get to the point. SOTA for image classification on Imagenet is EfficientNet with 88....

    Sat, 25 Jul 2020 19:40:43 GMT

    Lire

    Sat, 25 Jul 2020 19:40:43 GMT

    Lire
  • Card image
    Visuels de l’apprentissage des réseaux de neurones

    Modification des représentations internes d’un réseau de neurones en cours d’entrainement 1. Le formalisme de l’apprentissage automatique L’apprentissage automatique est la science regroupan...

    Wed, 03 Jun 2020 19:07:53 GMT

    Lire

    Wed, 03 Jun 2020 19:07:53 GMT

    Lire
  • Card image
    Construire un serveur de Deep Learning en 2020

    [UPDATE 2020/10/03: Prise en compte des nouveaux GPUs de Nvidia] L’intelligence artificielle, à travers l’apprentissage profond, est une discipline bien établie. Les algorithmes utilisés progr...

    Sat, 16 May 2020 12:43:19 GMT

    Lire

    Sat, 16 May 2020 12:43:19 GMT

    Lire