Most Efficient Flux? GGUF + Hyper-SD Comparisons

5.0

0 reviews

12.2K

1.2K

Description

Came across the Hyper-SD Lora by ByteDance, which is trained for Flux Dev-1. I was curious to if it would apply to the GGUF version, so I did a quick try and comparisons. Below are some quick notes on speed and image quality & accuracy.

TLDR take backs:

1) Seems like the hyper-sd lora is optimized for the original flux-dev. I was surprised that dev-1 at 8 steps ran so much faster than that of GGUF 8 steps. Even within GGUF, it seems like 20 steps with no hyper-sd lora actually ran faster than the 16 steps version. Would love to know if GGUF performs better underlimited VRAM conditions?

2) Tbh quality and fidelity loss on hyper-sd 8 step versions were noticeable to me. So I'd rather go with the 16 step version. Dev-1 + 16 step lora for those with enough VRAM.

3) Looks to me like text adherence was off to most of the images anyway, but this can be alleviated by using the fine tuned clip model created by zer0init !

Hope it helps, and if anyone comes across a GGUF optimized version of Hyper SD, please let me know!

Setup and settings:

Hardware: RTX 4090

Image Size: 832 x 1216 (sorry for the unusual spec, but I made sure all tests were done with the same image size)

CFG: 3.5

Model (hyper-sd lora) Strength: 0.14

Prompt：A spaceship with a large LED panel displaying text "hellow world", black and white only, flat vector art

I used this prompt to test fidelity, because elements like vector, black and white only, and text can be more clearly identified and less prone to bias. Also the "hellow world" a typo, but I decided to keep it anyway, to see if the model would auto-correct it due to vector similarity.

Test results:

Dev fp8 8 steps:

6.06s

6.09s

6.15s

Fidelity: not following black and white only, showing color, text was off

Dev fp8 16 steps:

11.8s

11.78s

Fidelity: flat and monochromatic, but text was off

Dev fp8 20 steps no hyper-sd:

14.75

14.71

14.72

Fidelity: benchmark, still not always showing LED and text

GGUF q8 8 steps:

10.5 sec

10.24 sec

10.36 sec

Fidelity: sometimes low image quality, not accurately displaying LED panel, or not showing LED panel at all

GGUF q8 16 steps:

19.87 sec

19.89 sec

19.92 sec

Fidelity: flat and monochromatic, but text could be off

GGUF 20 steps no hyper sd:

16.3 sec

16.27 sec

16.98 sec

Fidelity: showing text, but not accurately showing LED panel

Some sample images for your reference:

Dev-1 + 8 step hypersd lora + zer0init clip encoder: undesirable color is showing, and text is off

Dev-1 + 16 step hypersd lora + zer0init clip encoder: most accurate rep so far. the fine tuned clip encoder certainly helps

GGUF q8 + 8 step hypersd lora: quality and accuracy lost, starting to show a lot of gray

GGUF Q8 + 16 step hypersd lora : style is accuratel, but text is off ( can be allieviated using the zer0init clip encoder)

Discussion

(No comments yet)

Loading...

Author

Stonelax

111.3K

957

456.3K

Reviews

No reviews yet

Versions (1)

- latest (a year ago)

Node Details

Primitive Nodes (7)

CLIPTextEncodeFlux (1)

Note (4)

PrimitiveNode (1)

UnetLoaderGGUF (1)

Custom Nodes (12)

ComfyUI

- VAEDecode (1)
- VAELoader (1)
- DualCLIPLoader (1)
- SamplerCustomAdvanced (1)
- KSamplerSelect (1)
- BasicScheduler (1)
- BasicGuider (1)
- RandomNoise (1)
- LoraLoader (1)
- SaveImage (1)
- UNETLoader (1)

ComfyUI Essentials

- SDXLEmptyLatentSizePicker+ (1)

Model Details

Checkpoints (0)

LoRAs (1)

Hyper-FLUX.1-dev-16steps-lora.safetensors

OpenArt

Workflows

Active Sessions