Session 3 - Outline for Students
Overview. In Session 3 we look at more complex aspects of AI image generation such as:
- using one image's layout to structure the next image (Part A1)
- achieving a consistent character appearance through a long sequence of images (Part A2)
- upscaling existing images to a much higher quality for printing etc. (Part A3)
and
- producing long video from simple text or image beginnings (Part B1)
Part A1. Arrangement of objects in an image using a base image
A1a. The problem. In Session 1, where image generation was introduced, the focus was on describing the wanted image in words and then letting the tool do the rest. The user's only real input was at the prompt stage and in selecting between images. So the user may get an image that matches the prompt, but the objects within it may be in the wrong place or of the wrong relative size for the desired outcome.
One way to get better control is to start with the rough shape of the image you want. To show this, below is a standard DreamStudio result for the prompt "man biting dog".
A1b. Using one image to generate another. Here a particular image has been selected because it has the general layout required. Its details, however, happen not to meet the user's requirement. We download it and save it on our computer. On many tools this may not be necessary, but it also covers the case where your starting image is not from DreamStudio.
A1c. Same image layout, different style. In the image below the previous image has been uploaded and the image strength slider has been set to 16%, which means that the image has been degraded considerably. To the human eye it seems as if little information remains; to the tool, however, there is plenty. The prompt is changed slightly and a different style is chosen (neon). Notice how the results retain the same general layout as the original, as desired.
Note that there are no hard and fast rules about the percentage to choose. Set it too high, however, and you will simply reproduce the original, or something very much like it.
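The effect of the percentage can be pictured with a toy calculation. The sketch below is not how DreamStudio works internally. It assumes, purely for illustration, that the percentage is the share of each original pixel value that is kept while the remainder is replaced by random noise. The names `degrade` and `mean_change` and the sample pixel values are hypothetical.

```python
import random

def degrade(pixels, strength, seed=0):
    """Toy model (an assumption, not DreamStudio's method):
    keep `strength` of each pixel and fill the rest with random noise."""
    rng = random.Random(seed)
    return [strength * p + (1.0 - strength) * rng.random() for p in pixels]

def mean_change(before, after):
    """Average absolute difference between two equal-length pixel lists."""
    return sum(abs(a - b) for a, b in zip(before, after)) / len(before)

# A made-up row of grayscale pixel values between 0 (black) and 1 (white).
original = [0.0, 0.2, 0.4, 0.6, 0.8, 1.0]

for strength in (0.16, 0.50, 1.00):
    changed = degrade(original, strength)
    print(f"strength {strength:.0%}: average change {mean_change(original, changed):.3f}")
```

In this toy model a setting of 100% leaves the pixels untouched, which mirrors the warning that too high a percentage simply reproduces the original.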
So our final selected image might be this one.
A1d. Steps and seeds. Before leaving DreamStudio it is worth looking briefly at the Advanced section of the panel - see below.
Steps. This is the number of stages the tool uses between starting with a noisy image and producing the final one. In general it is good to stay with the default, but do experiment and see what effect it has on your project. With some tools, changing this may affect the cost and the time taken to produce an image.
Seed. As covered elsewhere, the seed is a number such as 549836 which is fed into a random noise generator to produce the initial image. If you repeat the seed, with the same prompt and settings, then you should end up with a very similar end image. Normally it is set to 'auto', which will ensure a different value every time.
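The idea that a repeated seed gives repeated noise can be tried out in a few lines of Python. This is only a sketch of the principle using Python's standard random number generator, not the generator DreamStudio uses. The function name `initial_noise` is hypothetical.

```python
import random

def initial_noise(seed, size=8):
    """Return a short list of pseudo-random 'noise' values for a given seed."""
    rng = random.Random(seed)
    return [rng.random() for _ in range(size)]

first = initial_noise(549836)
repeat = initial_noise(549836)   # same seed as before
other = initial_noise(549837)    # seed changed by one

print("same seed, same noise:", first == repeat)
print("different seed, same noise:", first == other)
```

Setting the seed to 'auto' corresponds to picking a fresh number each time, so the starting noise, and therefore the final image, differs on every run.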
Part A2. Consistent characters across multiple images
In Session 4 of this course an AI chat tool is used to develop a fictional story set in an English country public house. One of its characters is Jamie Carter, a teenage girl living with her parents, who run the local bakery. She is known in Upper Duckpond for her precocious nature and love of literature.
Imagine that the story is to be illustrated with various images of the characters going about their investigations (it is a detective story). To achieve this, Midjourney is given the prompt "Teenager Jamie is an observant, intelligent, and quietly witty girl". Midjourney is used here because it offers quite a good character consistency method, which will doubtless improve further.
From the resulting images one was chosen. The challenge is then to create other images that are consistent with this one.
The first request was to show Jamie in a garden.
The Midjourney prompt was as follows. It is explained below.
Jamie in a garden --cref https://cdn.midjourney.com/4195d3b9-defd-4689-a047-40107d18e2cd/0_2.webp --cw 0
Explanation of prompt. The --cref parameter is followed by a URL. That URL (web address) is obtained from the properties of our Jamie picture above. The --cw 0 setting apparently tells Midjourney to focus just on copying the face. For more, see Midjourney support. Remember that you can repeat the same request again and again until you have an image with the look you want. Many of the images will probably turn out to be useful even if not quite as originally desired. Luck plays a big role in AI art.
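If you are producing many scenes with the same character, it can help to assemble the prompt text in a consistent way before pasting it into Midjourney. The sketch below only builds the text; it does not talk to Midjourney. The helper name `build_prompt` is hypothetical, the bike and bar wordings are illustrative, and the 0 to 100 range for --cw is an assumption based on our reading of the Midjourney documentation.

```python
def build_prompt(description, cref_url, cw=0):
    """Assemble a Midjourney-style prompt with a character reference.

    Assumption: --cw accepts whole numbers from 0 (face only) to 100.
    """
    if not 0 <= cw <= 100:
        raise ValueError("cw is expected to lie between 0 and 100")
    return f"{description} --cref {cref_url} --cw {cw}"

# URL of the chosen Jamie picture, as used in the prompt above.
JAMIE_URL = "https://cdn.midjourney.com/4195d3b9-defd-4689-a047-40107d18e2cd/0_2.webp"

# The garden prompt is the one shown above; the other two wordings are illustrative.
for scene in ("Jamie in a garden", "Jamie on a bike", "Jamie in a bar"):
    print(build_prompt(scene, JAMIE_URL))
```

Keeping the reference URL in one place means every scene points at the same original picture.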
Next here is Jamie on a bike.
Then, as in the story, Midjourney is asked for images of Jamie (using the original picture) in a bar. This produces a fairly convincing set as below, at first glance at least.
Looking more closely, however, in at least one of the images (see below) Midjourney has given Jamie extensive tattoos - not what was expected! Of course, if we mentioned this to ChatGPT, which wrote the story seen in Session 4, no doubt it would incorporate the tattoos into its story about 'The Goose's Gobble' public house.
To conclude this 'character consistency' discussion: the tools will continue to improve, but you can already create your illustrated story, make a fantasy social media influencer, or do whatever you can imagine!
Part A3. Upscaling images (adding fine detail).
If you wish to get your artwork printed at anything larger than postcard size then you have to ensure that it has the immense detail needed when it is blown up to a large size, for example for a wall poster. The following AI tool is shown as our first example. We do not recommend you sign up unless you have researched it thoroughly yourself, as there is no free option and prices are substantial (Pro plan $39/mo, Premium plan $99/mo at 20 March 2024). Do view the examples on their homepage though, which explain clearly what is being done.
The tool is Magnific AI (Magnific.ai) and if you scroll down on their homepage you can see a lot of Before and After images. Note that the tool cannot magically recover detail from a relatively blurred image (as is shown in many a police drama) - instead it 'reimagines' each area of the original. The effect can be stunning, as you will see. There is an FAQ page if you wish to know more.
Many other AI Art tools offer 'upscaling' but we do not look at those further here.
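To get a feel for how much detail a large print needs, you can work out the pixel dimensions from the print size. The sketch below assumes the common rule of thumb of 300 dots per inch for good quality printing, a 6 x 4 inch postcard, a 24 x 36 inch poster and a 1024 x 1024 pixel starting image. All of these figures and the function names are assumptions for illustration, and the calculation ignores any cropping needed when the shapes differ.

```python
def pixels_needed(width_in, height_in, dpi=300):
    """Pixel dimensions needed to print at the given size in inches."""
    return round(width_in * dpi), round(height_in * dpi)

def upscale_factor(image_px, target_px):
    """How many times larger the image must become to cover the target."""
    return max(target_px[0] / image_px[0], target_px[1] / image_px[1])

source = (1024, 1024)  # assumed size of the generated image, in pixels

for name, size_in in (("postcard", (6, 4)), ("poster", (24, 36))):
    target = pixels_needed(*size_in)
    factor = upscale_factor(source, target)
    print(f"{name}: needs {target[0]} x {target[1]} pixels, "
          f"about {factor:.1f}x the source image")
```

Under these assumptions the poster needs several times more pixels in each direction than the postcard, which is the gap an upscaling tool is meant to fill.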
Part B. AI Video
B1. OpenAI Sora - video from text. openai.com/sora. This is only mentioned briefly here, but as it seems to be leading the field it is worth visiting the website to view the examples, which are extremely impressive. It may be some considerable time before Sora is publicly available, and when it is, the demand for it and the resource requirements are going to be extreme. Do not expect free use!
OpenAI's video introducing Sora
B2. Runway - video from text, video editing and more. runwayml.com. This gives you some FREE CREDITS when you sign up and is then available at a monthly subscription of USD 15 as at mid March 2024. It is a well established tool that does a wide range of useful things, which makes it a little difficult to define. These include:
- text and/or image to image
- text and/or image to video
- training an AI to make custom sets of images
- image expansion (at edges) and infinite image, also image upscaling
- erase and replace, and backdrop remix
- super slow motion for video
- video audio improvement and subtitle addition, also green screening
Text to video with Runway. This has progressed very rapidly and now (August 2024) offers 'Gen-3', which generates ten-second video clips (and quickly eats up your credits). For example, the following is a clip produced when it was given a Midjourney image of a princess talking to a man, with the prompt "Princess talks to her new lover".
Giving Runway Gen-3 an image lets you peek into other possible worlds ... pic.twitter.com/Oyl3lMlOB1 (The Silver AI Project, @SilverAIProject, August 2, 2024)