
Guide to Training Your Drawings with Generative AI

1. Introduction

Hello! Today I want to share a complete guide on how to use generative AI to create assets. It is aimed at other artists, whether they want to accelerate their workflow, explore interesting alternatives, work around technical limitations, or anything else. This approach has allowed me to take control of the consistency and style of my creations, and I want to share that knowledge with others.

Steps to Follow:

  • We will gather a dataset of images.
  • We will automatically obtain their descriptions using a Colab notebook.
  • We will train an algorithm with the images and descriptions to obtain a safetensor file that we can use anywhere.

The platform you choose for generating with your LoRA is a separate matter; you'll find many options. In this guide, I focus on helping you obtain a safetensor file.

1.1 Pre-training:

Dataset Preparation:

To start, you will need to collect your own drawings. I started with 35, but even a small dataset can be useful to generate a basic model that can be iterated on and improved over time. When selecting the drawings for your dataset, it is important to maintain consistency in what you want to highlight. For example, my drawings of trees are plump and circular, a characteristic I wanted to emphasize.

Collection of drawings from the dataset: plump trees, a house, detailed trees, assets and landscapes, a top-down city, and a colored tree showing the colors I want to emphasize.
The images can have different sizes, but avoid too much variation. I recommend standard resolutions such as 1024x1024, 768x1024, and 1024x768. Preparing the dataset can take a few hours; if you have only a few images, invest that time in their quality and resolution.
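If you want to batch-normalize sizes, here is a minimal sketch using Python and Pillow; the dataset folder name and target resolutions are assumptions based on this guide, so adapt them to your setup:

    from pathlib import Path
    from PIL import Image

    # Target resolutions suggested above (assumed; adjust to taste)
    TARGETS = [(1024, 1024), (768, 1024), (1024, 768)]

    def closest_target(size):
        w, h = size
        # Pick the target whose aspect ratio is nearest to the image's
        return min(TARGETS, key=lambda t: abs(t[0] / t[1] - w / h))

    for path in Path("dataset").glob("*.png"):
        img = Image.open(path).convert("RGB")
        img = img.resize(closest_target(img.size), Image.LANCZOS)
        img.save(path)  # overwrites in place; keep a backup of your originals

A plain resize will slightly distort a drawing when the aspect ratios differ; cropping or padding to the target size is a gentler alternative.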

Examples and Clarifications:

Before starting, I will show some generations of this style and make a few clarifications:

--------------------------------------------------------------------------------------------------

Prompt: houtline, best quality, line-up, line_art, 2d_outline, fornitures, props, magic tree, multiple_views, gemstones, sprite_sheet, white_background, simple_background, halloween_(theme)

--------------------------------------------------------------------------------------------------

Magic tree generation with the given prompt.
Furniture generation with the given prompt.

--------------------------------------------------------------------------------------------------

Prompt: houtline, best quality, line-up, line_art, 2d_outline, fornitures, props, village_fornitures, wooden, walls, wooden_structures, gemstones, sprite_sheet, white_background, simple_background, halloween_(theme)

--------------------------------------------------------------------------------------------------

Village furniture generation with the given prompt.
Church furniture with ghosts generation with the given prompt.

------------------------------------------------------------------------------------------------

Prompt: houtline, best quality, line-up, line_art, 2d_outline, fornitures, props, church, Tombs, ghost, tinny_ghost, church_fornitures, wooden, walls, wooden_structures, gemstones, sprite_sheet, white_background, simple_background, halloween_(theme)

------------------------------------------------------------------------------------------------

Here I will show the same LoRA on other drawing models. The previous generations used Cheyenne, but you can also move the LoRA to other models.

_CHINOOK_ by Aurety

CHINOOK generation

DynaVisionXL

DynaVisionXL generation

Cheyenne By Aurety

Cheyenne generation

With these examples shown, we can proceed to the training, but first, an explanation:

What is a LoRA?

A LoRA (Low-Rank Adaptation) is a small add-on model trained on a specific dataset to steer a larger model toward coherent and consistent results in certain tasks. To better understand the difference between a LoRA and other types of models, let's look at the following categories:

Foundation Models

  • Stable Diffusion: A general model with a vast amount of information; it is open source, and you can download and run it on your own machine.
  • DALL-E: Another general model with a vast amount of data; it is used in ChatGPT and Copilot.
  • MidJourney: A general subscription-based model.

Checkpoints

  • PonyDiffusionXL, DynaVisionXL, Juggernaut XL, AnimagineXL, AutismixXL: Smaller models than the foundation ones, but they still require a considerable amount of computing power.

LoRa

LoRAs are small models trained for specific tasks. They are used in conjunction with foundation models or checkpoints as a guide to produce coherent and consistent results for that task. They act like a guardrail that limits the model's creativity to produce more predictable and useful results in certain contexts.

I recommend familiarizing yourself with image generation in Stable Diffusion to better understand how to use a trained LoRA.
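For instance, here is a minimal sketch of applying a trained LoRA on top of an SDXL base model with the diffusers library; the file name my_style.safetensors and the prompt are assumptions for illustration:

    import torch
    from diffusers import StableDiffusionXLPipeline

    pipe = StableDiffusionXLPipeline.from_pretrained(
        "stabilityai/stable-diffusion-xl-base-1.0",  # or your favorite checkpoint
        torch_dtype=torch.float16,
    ).to("cuda")

    # Load the LoRA produced by the training (hypothetical file name)
    pipe.load_lora_weights(".", weight_name="my_style.safetensors")

    # The activation word goes in the prompt (see the tagging section below)
    image = pipe("houtline, line_art, props, magic tree, white_background").images[0]
    image.save("magic_tree.png")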

1.2 Tagging the Dataset for Training:

I use Colab because I don't have a computer capable of performing this type of training, but the free computing power available there is more than enough for this task.

Access the Colab Notebook

Let's go step by step:

Colab Configuration

Run the cell, connect the notebook to your Drive, and name the project. This will create a nested folder structure: Loras/project_name/dataset.

Next, go to your Drive and upload your images to the dataset folder. Then move directly to step 4 of the notebook.
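In case it helps to verify the layout from code, this small sketch (with a hypothetical project name) mirrors the structure the notebook expects on Drive:

    import os

    project_name = "my_project"  # hypothetical; use the name you gave the notebook
    dataset_dir = f"/content/drive/MyDrive/Loras/{project_name}/dataset"

    os.makedirs(dataset_dir, exist_ok=True)
    print(os.listdir(dataset_dir))  # your uploaded images should be listed here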

Step 4 of the Colab notebook

A few clarifications: there are two vision models in this notebook. The anime model is better at extracting tags from characters, while the photography model works better with general photos. The result will be something like this:

Anime: 1cat, fur, animal, blue eyes, sitting, chair, depth_of_field, indoor

Photography: A cat sitting in a chair in a bedroom.

You can choose either one. To train my style, I used photography (BLIP), and for my nurse character, I used anime (the waifu diffusion tagger).

The threshold determines the tagger's sensitivity: the higher it is, the more confident the model must be before assigning a tag. For this step, I recommend leaving the parameters as the notebook suggests. The process takes about 4 minutes.
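To build intuition for the threshold, here is a toy sketch; the tag names and confidence scores are invented, but real taggers produce similar tag/score pairs:

    # Hypothetical tagger output: tag -> confidence score
    scores = {"1cat": 0.98, "fur": 0.91, "chair": 0.77, "blue eyes": 0.42, "indoor": 0.35}

    threshold = 0.5  # raising this keeps fewer, more certain tags
    tags = [tag for tag, conf in scores.items() if conf >= threshold]
    print(", ".join(tags))  # -> 1cat, fur, chair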

Finally, we add an activation word:

Activation word configuration

Where it says "hatsune miku," replace it with your activation word. For my own style, I used "houtline." Writing this word in the prompt at generation time will activate all the features learned during training. Be careful not to use a word that can be confused with other tags, like "cat," "dog," "girl," or any very general word.
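Conceptually, this step amounts to prepending your activation word to every caption file in the dataset. A hedged sketch of the idea (the folder and word are assumptions; the notebook handles this for you):

    from pathlib import Path

    activation = "houtline"  # your activation word
    for txt in Path("dataset").glob("*.txt"):
        caption = txt.read_text().strip()
        if not caption.startswith(activation):
            txt.write_text(f"{activation}, {caption}")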

Final configuration

Before continuing, remember to disconnect and delete the current runtime.

2. Training Notebook Setup

Now we have the dataset ready and tagged for training. I use a notebook from the same author; the link is below:

Hollowstrawberry's Lora Trainer

The first step:

Initial Colab setup

Insert the same project name you used previously, then select the base model for training. Each one has its advantages.

  • Stable Diffusion SDXL base 1.0: Suitable for realistic images. Diverse; it is the one I use to generate assets.
  • Pony Diffusion SDXL: Optimized for anime and NSFW content; good for generating anime-related content.
  • Animagine XL: Specialized in anime, less effective with NSFW content. Very flexible; assets also look good in Animagine.

A LoRA trained on one of these base models will have more or less influence when used with other base models. For example, training my images on Stable Diffusion SDXL base 1.0 makes it harder to generate successfully in Pony Diffusion XL. Keep this in mind. For creating assets, I use SDXL base 1.0, since Cheyenne is the best model I've used for generating them and its base is SDXL.

Let's continue.

Base model configuration

Activation tags: If we used a trigger word in the previous notebook, we leave this at 1 and continue.

2.1 Training Configuration

The following explains the key parameters to set up the training:

  • num_repeats: Number of times the training will iterate over each image.
  • epochs: The model trains on the dataset for this number of epochs; each epoch processes every image (with its repeats) once.
  • batch_size: Number of images the model processes together in each training step. A higher batch_size can speed up training but also requires more memory.

The configuration of these parameters will affect the performance and effectiveness of the generative model training.

Let's go through the math:

I always try to stay within the threshold of 300 to 500 total steps.

Total steps = number of images × num_repeats / batch_size × epochs. For example:

  • 10 images × 20 repeats / batch of 6 × 10 epochs ≈ 333 steps
  • 50 images × 4 repeats / batch of 6 × 10 epochs ≈ 333 steps
  • 100 images × 2 repeats / batch of 6 × 10 epochs ≈ 333 steps
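As a sanity check, here is the same arithmetic as a small function; note that trainers typically round the steps per epoch up, so the real total comes out slightly higher (340 here), still comfortably inside the 300-500 target:

    import math

    def total_steps(num_images, num_repeats, batch_size, epochs):
        # Each epoch sees every image num_repeats times, batch_size at a time
        return math.ceil(num_images * num_repeats / batch_size) * epochs

    print(total_steps(10, 20, 6, 10))   # 340
    print(total_steps(50, 4, 6, 10))    # 340
    print(total_steps(100, 2, 6, 10))   # 340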

We set the values according to our calculations and scroll down to the training section.

Configuration of train_batch_size

Go to train_batch_size and configure it; I usually set it to 6.

2.2 Optimizer

Next is the optimizer. I've only used two: AdamW8bit (for datasets with many images) and Prodigy (for datasets with few images; my favorite for training characters).

Optimizers

Keep in mind that the notebook author recommends specific arguments for each optimizer. When you change the optimizer, change the arguments as well.
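As a rough sketch of what changes between the two (the exact argument names vary by trainer, so treat these values as assumptions and defer to what the notebook suggests):

    # AdamW8bit: fixed learning rate, works well with larger datasets
    optimizer_type = "AdamW8bit"
    optimizer_args = ["weight_decay=0.1"]  # example value only
    learning_rate = 1e-4                   # typical order of magnitude

    # Prodigy: adapts the step size itself, good for small datasets
    optimizer_type = "Prodigy"
    optimizer_args = ["decouple=True", "use_bias_correction=True"]
    learning_rate = 1.0                    # Prodigy expects a learning rate near 1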

Once this is done, run the notebook and the training will begin. The process takes between 1.5 and 3 hours. It should not exceed that, since Google provides a limited amount of free compute time daily; after about 3 hours of training, the notebook will disconnect and the training will stop wherever it has reached.

The final files will be in the output folder on Google Drive.

3. Image Generation

Here you can choose whatever you want; there are many platforms where you can upload your LoRA. I will use Civitai because I want to show a free alternative, but there are certainly many more options.

Civitai

Upload your model and follow the form. It's not a big deal, but you have to wait for some verifications, and it's a bit tedious to redo. Check the privacy settings; there is a way to keep the model hidden for yourself (once uploaded, I recommend making it public and waiting a bit before switching it to draft mode). I've been writing this guide for several hours, so I'll leave the link to my model and run the tests with it.

Upload model

Go to your profile and choose your LoRA.

Choose model

Choose a base model, write a prompt, and generate.

Generate images

I asked for winter trees; the results are:

Winter trees

Also, some houses:

Houses

I'll leave the link to my model here and post the images in case you want to copy the prompt or anything like that. I will also upload a folder with many generations made with my model; you can just browse it, or clean up the assets and use them.

This is the end of the guide. I'm not an expert in this, but I've done several tests and thought it would be helpful to share them with those involved in image generation. Note that no generated result is a good final output: you will always need to repair, select, clean, and rework everything you generate. But it's a good starting point and can be useful for mock-ups and placeholders that you will later replace. In any case, I hope you find it useful.


Download

Download Now (name your own price)

Click download now to get access to the following files:

generative_sampler.rar 166 MB

Comments


Wow.. very clearly written and easy to understand. I followed most of this on the first read through.


It's certainly impressive that someone can carry out a LoRA training on their first try. It's not rocket science either, but it helps to have someone tell you what to do and what not to do, and I'm happy to clarify it for others, as I'm committed to the democratization of knowledge.