Yep, definitely the case. You need CUDA cores; without them it's doing extra work on slower paths. It's funny, Apple silicon will probably see faster performance gains than the Radeons just because that's where the people wrenching on this stuff are. I'm still in Intel land for OSX and will be for quite a while (audio software reasons). I'm operating slowly on an RTX 2070 and agree: once you've got a process down, it's a much better value proposition to pay for cloud inferencing. Merging large checkpoints and building intricate models aren't possible on consumer hardware until you get into silly money, and datacenter boards will always be faster in any case.
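
If you're in PyTorch land and just want to know what backend you'll actually run on, the check is a few lines; a minimal sketch of the usual fallback order (CUDA, then Apple's MPS, then CPU):

    import torch

    # Pick the fastest backend available: CUDA on nVidia cards,
    # MPS on Apple silicon, plain CPU otherwise (painfully slow for diffusion).
    if torch.cuda.is_available():
        device = torch.device("cuda")
    elif torch.backends.mps.is_available():
        device = torch.device("mps")
    else:
        device = torch.device("cpu")

    print(f"running inference on: {device}")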

There's facility and process for separating elements, but were I working professionally in this fashion I would not expect final output from inferencing. I would use it as a stock library, at this point in the tech anyway. Even the simplest use of the thing, generating backgrounds, saves so much work and allows for so many different _kinds_ of work that if that was all it did, it would be a revolution. Separation or direction of elements during inferencing has improved in leaps and bounds, though. Not long ago it was just inpainting (creating masks by hand), then autogeneration of partitions with math, then partitioning by token/attention, and now even automatable manipulation of masking and weights per partition. So it's definitely doable, but it would require each person to build a pipeline for their needs. Those tools are still in early stages, but ComfyUI¹ has a 'workflows' conceit for doing a lot of that with visual programming. I haven't moved on from the Gradio-based webui yet, so no first-hand experience with it. I already have pipelines for my purposes in it; the usual, no time to move to the bleeding edge~
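
If anyone wants to poke at mask-based inpainting outside a webui, the diffusers library has a pipeline for exactly that. A rough sketch, where the checkpoint id, file names, and prompt are just stand-ins for whatever you'd actually use:

    import torch
    from PIL import Image
    from diffusers import StableDiffusionInpaintPipeline

    # Any SD inpainting checkpoint works here; this id is only an example.
    pipe = StableDiffusionInpaintPipeline.from_pretrained(
        "runwayml/stable-diffusion-inpainting",
        torch_dtype=torch.float16,
    ).to("cuda")

    init_image = Image.open("scene.png").convert("RGB")
    # In the mask, white pixels get regenerated and black pixels are kept.
    mask_image = Image.open("mask.png").convert("L")

    result = pipe(
        prompt="a stone bridge over a river, matte painting background",
        image=init_image,
        mask_image=mask_image,
    ).images[0]
    result.save("scene_inpainted.png")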

For that particular need, the stock feature set of Automatic1111² can do growth of a given input, called "outpainting". There are extensions to expand the steering capabilities for that, but it does a fine job in my experience, as long as the checkpoint and the tokens you're using "know" something of the input.
Talking about art is like dancing about architecture, so I hope I'm being clear. Also, I should say this is all down at the consumer level; folks building huge base models or using it in the enterprise are much more likely to be using PyTorch and the diffusers library directly.
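
Under the hood, outpainting is just inpainting onto a grown canvas: pad the image, mask off the new strip, and let the model fill it in. A rough sketch of that idea with PIL, feeding the same kind of inpainting pipeline as above (padding size and names are illustrative):

    from PIL import Image

    def pad_for_outpaint(img, pad_right=256):
        """Grow the canvas to the right and mask the new strip for the model."""
        w, h = img.size
        canvas = Image.new("RGB", (w + pad_right, h), (127, 127, 127))
        canvas.paste(img, (0, 0))
        # Black = keep the original pixels, white = region to be generated.
        mask = Image.new("L", (w + pad_right, h), 0)
        mask.paste(255, (w, 0, w + pad_right, h))
        return canvas, mask

    image, mask = pad_for_outpaint(Image.open("scene.png").convert("RGB"))
    # Hand these to an inpainting pipeline with a prompt describing the whole
    # scene, so the checkpoint "knows" something of the input it is extending.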

soup
¹ https://github.com/comfyanonymous/ComfyUI
² https://github.com/AUTOMATIC1111/stable-diffusion-webui

On Thu, Feb 15, 2024 at 9:52 AM John Stoffel <john@stoffel.org> wrote:

You have to have a recent nVidia GPU to get good results, I was trying
using AMD GPUs and it just wasn't fast at all.  So instead of spending
$500 on a GPU, I think spending $20 on some cloud rental might be a
good use of my money and time.

Still waiting on <blanking on his name> to post his setup instructions
from New Orleans so we can all start poking at it.


Now one big issue I see is that once you generate an image, can you
get a version broken into layers?  Do professional artists work mostly
in layers on computers so they can more easily re-compose their
layout? 

For some of the examples, I thought it was really neat, but I'd like to
shift the result to the left, say, to emphasize the background more.

But since I'm a terrible artist without any training or much talent, I
leave it to others to answer these questions.