Guide to Training Your Drawings with Generative AI
A downloadable asset pack
1. Introduction
Hello! Today I want to share a complete guide on how to use generative AI to create assets. It is aimed at other artists, whether you want to speed up your workflow, explore interesting alternatives, work around technical limitations, or anything else. This process has let me take control of the consistency and style of my creations, and I want to share that knowledge with others.
Steps to Follow:
- We will gather a dataset of images.
- We will automatically obtain their descriptions using a Colab notebook.
- We will train a LoRA on the images and descriptions to obtain a .safetensors file that we can use anywhere.
Which platform you then use to generate images with your LoRA is a separate matter, and you'll find many options. This guide focuses on helping you obtain the .safetensors file.
1.1 Pre-training:
Dataset Preparation:
To start, you will need to collect your own drawings. I started with 35, but even a small dataset can be useful to generate a basic model that can be iterated and improved over time. When selecting the drawings for your dataset, it is important to maintain consistency in what you want to highlight. For example, my drawings of trees are plump and circular, a characteristic I wanted to emphasize.
Collection of Drawings
Examples and Clarifications:
Before starting, I will show some generations of this style and make a few clarifications:
prompt: houtline, best quality, line-up, line_art, 2d_outline, fornitures, props, village_fornitures, wooden, walls, wooden_structures, gemstones, sprite_sheet, white_background, simple_background, halloween_(theme)
Prompt: houtline, best quality, line-up, line_art, 2d_outline, fornitures, props, church, Tombs, ghost, tinny_ghost, church_fornitures, wooden, walls, wooden_structures, gemstones, sprite_sheet, white_background, simple_background, halloween_(theme)
Here are generations with other checkpoints. The previous images were made with Cheyenne, but you can also use the LoRA with other models.
_CHINOOK_ by Aurety
DynaVisionXL
Cheyenne by Aurety
With that shown, we'll move on to training, but first, an explanation:
What is a LoRA?
A LoRA (Low-Rank Adaptation) is a specific type of generative AI model, trained on a focused set of material to produce coherent and consistent results in particular tasks. To better understand how a LoRA differs from other types of models, let's look at the following categories:
Foundation Models
Model | Description |
---|---|
Stable Diffusion | A general model trained on a vast amount of data. It is open source, so you can download it and run it on your own machine. |
DALL-E | Another general model trained on a vast amount of data. It is used in ChatGPT and Copilot. |
MidJourney | A general subscription model. |
Checkpoints
Models | Description |
---|---|
Pony Diffusion XL, DynaVisionXL, Juggernaut XL, Animagine XL, AutismixXL | Models fine-tuned from a foundation model (such as SDXL) toward a particular style or domain. They still require a considerable amount of computing power. |
LoRA
LoRAs are small models trained for specific tasks. They are used together with a foundation model or checkpoint, guiding it toward coherent and consistent results for that task. They act like a guardrail that limits the model's creativity in exchange for more predictable and useful results in certain contexts.
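To make the "guardrail" idea a bit more concrete, here is a minimal pure-Python sketch of the low-rank update behind LoRA. It assumes the common formulation W' = W + (alpha / r) · B · A, where the base weight W stays frozen and only the small matrices B and A are trained and saved. The tiny matrices, the 1280-wide layer, and the rank r = 16 are illustrative values, not settings taken from the trainer notebook.

```python
# Sketch of the LoRA idea (Low-Rank Adaptation), pure Python.
# The base weight W stays frozen; training only learns B (d_out x r) and
# A (r x d_in). The effective weight is W + (alpha / r) * (B @ A).

def matmul(x, y):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*y)] for row in x]

def apply_lora(W, B, A, alpha, r):
    """Return W + (alpha / r) * (B @ A) as a new matrix; W is not modified."""
    delta = matmul(B, A)
    scale = alpha / r
    return [[w + scale * d for w, d in zip(w_row, d_row)]
            for w_row, d_row in zip(W, delta)]

def param_counts(d_out, d_in, r):
    """Trainable values for one layer: full fine-tune vs. LoRA factors B and A."""
    return d_out * d_in, r * (d_out + d_in)

W = [[1.0, 0.0, 0.0],
     [0.0, 1.0, 0.0]]      # frozen 2x3 base weight
B = [[1.0], [2.0]]        # 2x1, rank r = 1
A = [[0.5, 0.0, -0.5]]    # 1x3
print(apply_lora(W, B, A, alpha=1, r=1))
# [[1.5, 0.0, -0.5], [1.0, 1.0, -1.0]]
print(param_counts(1280, 1280, 16))
# (1638400, 40960)
```

The parameter count shows why a LoRA file is small compared with a full checkpoint: for this illustrative layer, the LoRA stores about 2.5% of the values a full fine-tune would change.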
I recommend getting familiar with image generation in Stable Diffusion first; it makes it much easier to understand how to use a trained LoRA.
1.2 Tagging the Dataset for Training:
I use Colab because I don't have a computer capable of performing this type of training, but the free computing power available is more than enough for the task we are going to do.
Let's go step by step:
Run the notebook, connect it to your Google Drive, and name the project. This creates three nested folders: Loras/project_name/dataset
Next, go to your Drive and upload your images to the dataset folder. Then move directly to step 4 of the notebook.
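Before tagging, it can help to confirm that the images actually landed in the dataset folder, since the image count also feeds into the step calculation during training. This is a small sketch, assuming Drive is mounted at /content/drive/MyDrive (the usual Colab location) and using project_name as a placeholder; the list of image extensions is also an assumption, so adjust it to your files.

```python
from pathlib import Path

# Assumed location: Colab usually mounts Google Drive at /content/drive/MyDrive.
# "project_name" is a placeholder for the name you gave the project.
DATASET_DIR = Path("/content/drive/MyDrive/Loras/project_name/dataset")

# Extensions treated as training images (an assumption; adjust as needed).
IMAGE_EXTENSIONS = {".png", ".jpg", ".jpeg", ".webp"}

def is_image(path: Path) -> bool:
    """True if the file extension looks like a training image."""
    return path.suffix.lower() in IMAGE_EXTENSIONS

def list_images(folder: Path) -> list:
    """Return the image files directly inside folder, sorted by name."""
    if not folder.is_dir():
        return []
    return sorted(p for p in folder.iterdir() if p.is_file() and is_image(p))

images = list_images(DATASET_DIR)
print(f"Found {len(images)} images in {DATASET_DIR}")
```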
A note on this step: the notebook offers two vision models. The anime model is better at extracting tags from characters, while the photography model works better with general images. The results look something like this:
Anime: 1cat, fur, animal, blue eyes, sitting, chair, depth_of_field, indoor
Photography: A cat sitting in a chair in a bedroom.
You can choose either one. To train my style I used photography (BLIP), and for my nurse character I used anime (waifudiffusion).
The threshold sets the tagger's sensitivity. For this step, I recommend leaving the parameters at the values the notebook suggests. The process takes about 4 minutes.
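For the anime (tag-based) model, the threshold behaves like a confidence cutoff: the tagger scores each candidate tag, and only tags scoring at or above the threshold end up in the caption. The sketch below illustrates that idea with made-up scores; the tag names and numbers are hypothetical, not real tagger output, and it is not the notebook's own code.

```python
def filter_tags(scores: dict, threshold: float) -> list:
    """Keep tags whose confidence is >= threshold, most confident first."""
    kept = [(tag, s) for tag, s in scores.items() if s >= threshold]
    kept.sort(key=lambda item: item[1], reverse=True)
    return [tag for tag, _ in kept]

# Hypothetical tagger scores for one drawing of a tree.
scores = {"tree": 0.97, "white_background": 0.88, "outline": 0.61,
          "no_humans": 0.52, "flower": 0.21}

print(", ".join(filter_tags(scores, 0.35)))  # lower threshold: more tags
print(", ".join(filter_tags(scores, 0.70)))  # higher threshold: fewer tags
```

A lower threshold gives richer but noisier captions; a higher one keeps only the tags the tagger is most sure about.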
Finally, we add an activation word:
In the field that says "hatsune miku", replace the example with your own activation word. For my style, I used "Houtline". This word triggers all the features learned in training, and you will write it in the prompt when generating. Be careful not to use words that can be confused with other tags, such as "cat", "dog", "girl", or any very general word.
Before continuing, remember to disconnect and delete the current runtime.
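The notebook takes care of this step for you. Purely as an illustration of what it amounts to, here is a sketch that puts the activation word at the front of every caption. It assumes each image has a matching .txt caption file of comma-separated text in the same folder (the convention kohya-based trainers use); the function names are hypothetical.

```python
from pathlib import Path

def add_activation_tag(caption: str, activation_word: str) -> str:
    """Put activation_word first in a comma-separated caption, without duplicating it."""
    tags = [t.strip() for t in caption.split(",") if t.strip()]
    tags = [t for t in tags if t.lower() != activation_word.lower()]
    return ", ".join([activation_word] + tags)

def tag_folder(dataset_dir: Path, activation_word: str) -> int:
    """Rewrite every .txt caption in dataset_dir; return how many were updated."""
    count = 0
    for caption_file in sorted(dataset_dir.glob("*.txt")):
        old = caption_file.read_text(encoding="utf-8")
        caption_file.write_text(add_activation_tag(old, activation_word), encoding="utf-8")
        count += 1
    return count

print(add_activation_tag("tree, white_background, outline", "houtline"))
# houtline, tree, white_background, outline
```

Removing an existing copy before prepending means running it twice does not stack the word, and a sentence-style BLIP caption simply becomes the second item.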
2. Training Notebook Setup
The dataset is now ready and tagged for training. For the training itself, I use a notebook by the same author, linked below:
Hollowstrawberry's Lora Trainer
The first step:
Enter the same project name you used before, then select the base model for training. Each one has its advantages.
Model | Description | Recommended Use |
---|---|---|
Stable Diffusion SDXL base 1.0 | Suitable for realistic images. | Versatile; I use it to generate assets. |
Pony Diffusion SDXL | Optimized for anime and NSFW content. | Generation of anime-related content and good for NSFW. |
Animagine XL | Specialized in anime, less effective with NSFW content. | Very flexible, assets also look good in Animagine. |
How well a LoRA works in other checkpoints depends on the base model it was trained on. For example, training my images on Stable Diffusion SDXL base 1.0 makes it harder to get good results in Pony Diffusion XL. Keep this in mind. For creating assets I train on SDXL base 1.0, because Cheyenne is the best model I've used for generating them and it is based on SDXL.
Let's continue.
Activation tags: if you used a trigger word in the previous notebook, leave this at 1 and continue.
2.1 Training Configuration
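As far as I understand it, this setting works like the keep_tokens option in kohya's sd-scripts: when caption tags are shuffled during training, the first N tags stay in place, so with 1 the activation word always stays at the front. The sketch below only illustrates that behaviour under this assumption; it is not the trainer's actual code.

```python
import random

def shuffle_caption(caption: str, keep_tokens: int, rng: random.Random) -> str:
    """Shuffle comma-separated tags but leave the first keep_tokens in place."""
    tags = [t.strip() for t in caption.split(",") if t.strip()]
    fixed, rest = tags[:keep_tokens], tags[keep_tokens:]
    rng.shuffle(rest)
    return ", ".join(fixed + rest)

rng = random.Random(0)  # seeded so the example is repeatable
caption = "houtline, tree, white_background, outline, no_humans"
for _ in range(3):
    # The activation word stays first; the other tags change order each time.
    print(shuffle_caption(caption, keep_tokens=1, rng=rng))
```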
The following explains the key parameters to set up the training:
Parameter | Description |
---|---|
num_repeats | Number of times each image is repeated within one epoch. |
epochs | Number of full passes over the dataset. In each epoch, every image is processed num_repeats times. |
batch_size | Number of images processed together in each training step. A higher batch_size means fewer steps and can speed up training, but requires more memory. |
How you set these parameters affects both the training time and the quality of the resulting LoRA.
Let's go through the math:
I always try to stay within a range of 300 to 500 total steps.
Total steps = number of images x num_repeats / batch_size x epochs, which looks like this:
Number of Images | num_repeats | batch_size | epochs | Total Steps |
---|---|---|---|---|
10 | 20 | 6 | 10 | 10 x 20 / 6 x 10 ≈ 333 |
50 | 4 | 6 | 10 | 50 x 4 / 6 x 10 ≈ 333 |
100 | 2 | 6 | 10 | 100 x 2 / 6 x 10 ≈ 333 |
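The same arithmetic as a small helper, handy for trying other combinations before starting a run. The round_up option is an assumption about how trainers usually count steps (a final partial batch still counts as a step, so 200 / 6 becomes 34 steps per epoch); check the step count your notebook reports. The 300 to 500 range is the one given above.

```python
import math

def total_steps(num_images, num_repeats, batch_size, epochs, round_up=True):
    """Estimate total training steps.

    Each epoch sees num_images * num_repeats samples in batches of batch_size.
    round_up=True counts a final partial batch as a step (assumed trainer
    behaviour); round_up=False is the plain formula from the guide.
    """
    samples_per_epoch = num_images * num_repeats
    if round_up:
        steps_per_epoch = math.ceil(samples_per_epoch / batch_size)
    else:
        steps_per_epoch = samples_per_epoch / batch_size
    return steps_per_epoch * epochs

def in_target_range(steps, low=300, high=500):
    """True if steps falls inside the 300 to 500 range used in this guide."""
    return low <= steps <= high

for images, repeats in [(10, 20), (50, 4), (100, 2)]:
    exact = total_steps(images, repeats, batch_size=6, epochs=10, round_up=False)
    rounded = total_steps(images, repeats, batch_size=6, epochs=10)
    print(f"{images} images x {repeats} repeats: {exact:.1f} (formula), "
          f"{rounded} (rounded up), in range: {in_target_range(rounded)}")
```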
Set the values according to your calculation, then scroll down to the training section.
There, find train_batch_size and set it; I usually use 6.
2.2 Optimizer
Next is the optimizer. I've only used two: AdamW8bit (for datasets with many images) and Prodigy (for datasets with few images; my favorite for training characters).
Keep in mind that the notebook author recommends specific optimizer arguments for each optimizer. When you change the optimizer, change the arguments as well.
Once this is done, run the notebook and the training will begin. The process takes between 1.5 and 3 hours. Keep it under that, because Google provides only a limited amount of free compute time per day: after 3 hours of training, the notebook disconnects and training stops wherever it has reached.
The final files will be in the output folder on Google Drive.
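If several files end up in the output folder (for example, when the trainer saves more than one epoch), this sketch picks the most recently written .safetensors file. The path is an assumption based on the Loras/project_name layout created earlier and the usual Colab Drive mount point; adjust both to your setup.

```python
from pathlib import Path
from typing import Optional

# Assumed path: adjust "project_name" and the Drive mount point to your setup.
OUTPUT_DIR = Path("/content/drive/MyDrive/Loras/project_name/output")

def newest_lora(folder: Path) -> Optional[Path]:
    """Return the most recently modified .safetensors file in folder, or None."""
    if not folder.is_dir():
        return None
    files = [p for p in folder.glob("*.safetensors") if p.is_file()]
    return max(files, key=lambda p: p.stat().st_mtime, default=None)

latest = newest_lora(OUTPUT_DIR)
print(latest if latest else "No .safetensors files found yet.")
```

The newest file is not automatically the best one; it is worth testing a couple of saved epochs before uploading.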
3. Image Generation
Here you can use whatever you like; there are many platforms where you can upload your LoRA. I'll use Civitai to show a free option, but there are certainly many more.
Upload your model and fill in the form. It's not complicated, but you have to wait for some verification checks, which makes redoing it a bit tedious. Check the privacy settings: there is a way to keep the model hidden to yourself. Once it's uploaded, I recommend making it public and waiting a bit before switching it to draft mode. I've been writing this guide for several hours, so I'll leave the link to my model and run the tests with it.
Go to your profile and choose your LoRA.
Choose a base model, write a prompt, and generate.
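A small hypothetical helper for assembling prompts in the same comma-separated style as the example prompts above, with the activation word first and duplicate tags removed. The optional &lt;lora:name:weight&gt; prefix is the syntax the Automatic1111 WebUI uses to load a LoRA from the prompt; on Civitai's generator you normally add the LoRA through the interface instead, so leave lora_name empty there. The tags in the example are illustrative.

```python
from typing import Optional

def build_prompt(activation_word: str, tags: list,
                 lora_name: Optional[str] = None, weight: float = 1.0) -> str:
    """Join tags into a prompt with activation_word first and no duplicates."""
    seen = set()
    parts = []
    for tag in [activation_word] + tags:
        key = tag.strip().lower()
        if key and key not in seen:
            seen.add(key)
            parts.append(tag.strip())
    prompt = ", ".join(parts)
    if lora_name:
        # Automatic1111-style LoRA reference; other tools use their own UI or syntax.
        prompt = f"<lora:{lora_name}:{weight}>, " + prompt
    return prompt

print(build_prompt("houtline", ["best quality", "line_art", "winter", "trees", "white_background"]))
# houtline, best quality, line_art, winter, trees, white_background
```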
I asked for winter trees, and these are the results:
Also, some houses:
I'll leave the link to my model here and post the images in case you want to copy the prompts. I'll also upload a folder with many generations made with my model; you can just browse it or clean up the assets and use them.
This is the end of the guide. I'm not an expert in this, but I've done several tests and thought it would be helpful to share them with anyone interested in image generation. Keep in mind that no generated result is a finished final output. You will always need to repair, select, clean, and rework everything you generate. But it's a good starting point and can be useful for making mock-ups and placeholders that you will later replace. In any case, I hope it is useful to you.
Status | Released |
Category | Assets |
Author | AI_lab |
Tags | AI Generated, artificial-intelligence, Drawing |
Comments
Wow.. very clearly written and easy to understand. I followed most of this on the first read through.
It's certainly impressive that someone can carry out a LoRA training on the first try. It's not rocket science, but it does require someone to tell you what to do and what not to do, and I'm happy to clarify it for others, as I'm committed to the democratization of knowledge.