Caption/Segment/OCR - Variety of Vision Tasks with Florence2
Description
https://github.com/kijai/ComfyUI-Florence2
https://huggingface.co/microsoft/Florence-2-base
This workflow uses Microsoft's Florence-2 models to perform a variety of vision tasks, such as captioning, segmentation, and OCR.
Florence-2 is an advanced vision foundation model that uses a prompt-based approach to handle a wide range of vision and vision-language tasks. Florence-2 can interpret simple text prompts to perform tasks like captioning, object detection, and segmentation. It leverages Microsoft's FLD-5B dataset, which contains 5.4 billion annotations across 126 million images, for multi-task learning. The model's sequence-to-sequence architecture lets it perform well in both zero-shot and fine-tuned settings, making it a competitive vision foundation model.
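To make the prompt-based approach more concrete, here is a minimal Python sketch that does not load or run the model. It assumes the task tokens listed on the Hugging Face model card for microsoft/Florence-2-base (for example `<CAPTION>`, `<OD>`, `<OCR>`). The short task names, `build_prompt`, and `decode_box` are hypothetical helpers written for illustration; they are not part of ComfyUI-Florence2 or the transformers library. Because the model is sequence-to-sequence, detection results come back as text containing location tokens such as `<loc_100>`; the sketch assumes each coordinate was quantized into 1,000 bins and restores it to the centre of its bin.

```python
import re
from typing import Dict, List, Optional, Tuple

# Task tokens as listed on the microsoft/Florence-2-base model card.
# The short names on the left are hypothetical labels chosen for this sketch.
TASK_TOKENS: Dict[str, str] = {
    "caption": "<CAPTION>",
    "detailed_caption": "<DETAILED_CAPTION>",
    "object_detection": "<OD>",
    "ocr": "<OCR>",
    "ocr_with_region": "<OCR_WITH_REGION>",
    "phrase_grounding": "<CAPTION_TO_PHRASE_GROUNDING>",
    "referring_segmentation": "<REFERRING_EXPRESSION_SEGMENTATION>",
}

# Tasks that expect extra free text appended directly after the task token.
TEXT_INPUT_TASKS = {"phrase_grounding", "referring_segmentation"}


def build_prompt(task: str, text_input: Optional[str] = None) -> str:
    """Return the prompt: the task token, optionally followed by free text."""
    if task not in TASK_TOKENS:
        raise ValueError(f"unknown task: {task!r}")
    token = TASK_TOKENS[task]
    if task in TEXT_INPUT_TASKS:
        if not text_input:
            raise ValueError(f"task {task!r} needs text_input")
        return token + text_input
    return token


# Matches location tokens such as <loc_123> and captures the bin index.
LOC_PATTERN = re.compile(r"<loc_(\d+)>")


def decode_box(text: str, image_size: Tuple[int, int], bins: int = 1000) -> List[float]:
    """Turn four <loc_N> tokens into pixel coordinates [x1, y1, x2, y2].

    Assumption: each coordinate was quantized into `bins` equal-width bins,
    so it is restored to the centre of its bin, scaled to the image size.
    """
    width, height = image_size
    values = [int(v) for v in LOC_PATTERN.findall(text)]
    if len(values) != 4:
        raise ValueError("expected exactly four location tokens")
    x1, y1, x2, y2 = values
    bin_w, bin_h = width / bins, height / bins  # pixel size of one bin
    return [(x1 + 0.5) * bin_w, (y1 + 0.5) * bin_h,
            (x2 + 0.5) * bin_w, (y2 + 0.5) * bin_h]
```

For example, `build_prompt("phrase_grounding", "a green car")` returns `"<CAPTION_TO_PHRASE_GROUNDING>a green car"`, and `decode_box("car<loc_100><loc_200><loc_300><loc_400>", (1000, 500))` returns `[100.5, 100.25, 300.5, 200.25]`. In a real run, the model card passes such a prompt string to the Florence-2 processor together with the image; in this workflow, the Florence2Run node takes care of the equivalent step.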
Because it is so easy to use, I recommend that everyone try it.
You are also welcome to visit my channels:
https://youtube.com/@cyberdicklang
https://space.bilibili.com/339984
Versions (1)
- latest (a year ago)
Node Details
Primitive Nodes (4)
- DownloadAndLoadFlorence2Model (1)
- Florence2Run (2)
- Simple String (1)
Custom Nodes (4)
Model Details
Checkpoints (0)
LoRAs (0)