Replies: 15 comments 62 replies
-
I currently use
-
This may not be helpful, but I can say that I personally have not found much use for genai in Frigate. Especially after moving to the Jina models, semantic search can read text (making text on vans, shirts, etc. searchable) and is very specific. Perhaps you may find it similar, and it would be easier to just get a genai chat model for HA and not worry about Frigate.
-
After a couple of beta releases of 0.15, we plan to post a survey asking what use cases users have found with the genai feature. Like Nick, I thought it was pretty nice at first, but once we integrated the Jina CLIP model, using an "image caption" approach for searches has proved to meet all my needs and exceed my expectations.
-
I have not been using dev 0.15 for very long, but I tend to agree: genai is fun at first, yet I am not seeing much use for it. The semantic search is doing everything I need.
-
I've spun up Open WebUI and installed the llava:7b model. Am I to assume you just connect on port 3000 in the Frigate config, or 11434? I've gone to the Explore page and tried to regenerate a description of an image and get nothing. I would have thought the CPU would go through the roof on the Ollama instance, as I'm not running a GPU, similar to when I query the model in Open WebUI. Any pointers appreciated.
genai:
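For context, a rough sketch of the shape of genai config I am assuming, pointing Frigate at Ollama's own API port (11434) rather than Open WebUI's interface port (3000); the IP and model name here are placeholders:

genai:
  enabled: true
  provider: ollama
  base_url: http://192.168.1.50:11434   # placeholder IP; 11434 is Ollama's default API port
  model: llava:7b

As far as I understand, Open WebUI on port 3000 is only the chat front end, and Frigate needs to reach the Ollama API itself.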
-
I tried to use GenAI with Ollama, but I am getting a lot of errors about parallel requests.
Both Frigate and Ollama are running on the same host, which has an NVIDIA Quadro P5000 with 16 GB of VRAM. What's strange is that I still can't see anything generated.
-
I am trying to get genai running, but I am not seeing ANYTHING in Frigate, so maybe I should validate my config. I have a global config item where genai is set to enabled: false by default so that only select cameras have it enabled, and on my doorbell camera I then set it to true (roughly the shape sketched below).
I have tried triggering events by walking past the doorbell, but I don't see anything genai related in the logs at all and no activity on my Ollama host. I have Open WebUI installed on the same host as Frigate and was able to test the connection from it to the Ollama host (my gaming PC with a GPU), and that works. I also made sure to run Ollama with the model kept loaded to avoid the timeout and unloading. I see lots of comments from the devs about the usefulness, so here is my use case.
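A sketch of that layout, with placeholder host, model, and camera names rather than the exact config from this setup:

genai:
  enabled: false                            # off globally
  provider: ollama
  base_url: http://gaming-pc.local:11434    # placeholder hostname for the GPU machine
  model: llava-llama3

cameras:
  doorbell:
    genai:
      enabled: true                         # opt this camera in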
-
If I wanted to generate a notification that includes the AI description once it is complete, is there a specific MQTT topic / message type I should be looking for? I presume an event / alert fires first and the description is generated afterwards? I am hoping I can do something like the following.
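For example, something along these lines. This is purely a sketch, assuming the completed description is published on a frigate/tracked_object_update message with a type of description (the topic name and payload shape are assumptions on my part, and notify.mobile_app_phone is a placeholder service):

- alias: "Notify with GenAI description"
  trigger:
    - platform: mqtt
      topic: frigate/tracked_object_update
  condition:
    - condition: template
      value_template: "{{ trigger.payload_json.type == 'description' }}"
  action:
    - service: notify.mobile_app_phone      # placeholder notify service
      data:
        title: "Frigate"
        message: "{{ trigger.payload_json.description }}"

The idea being that the alert notification fires off the normal event flow, and this second automation follows up once the description exists.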
-
Just wanted to ask, can you perform face recognition with semantic search or genai after Frigate detects a person?
-
I have been playing around with Ollama, in particular with different prompts to give more context, e.g. explaining where the camera is (above the front door, facing the street in a residential neighbourhood) and asking it to describe more elements of what it sees in a more structured format. One point I've noticed: when restarting Frigate so that prompt changes can take effect, then going back to a particular event and regenerating to get a before/after, the response suggests that only one image has been used, even when specifying thumbnails. Looking at the code around frigate/embeddings/maintainer.py line 347 (as of 7fdf42a), am I correct in thinking that regenerating only passes a single stored image to the model? This makes it slightly more difficult to experiment by regenerating, but short of storing a full collection of thumbnails I'm not sure it's easily improved upon?
-
Is there debug logging specific to the genai code that can be turned on? I am not seeing it actually add descriptions to most events. I am able to use my config via Open WebUI just fine, so I suspect something is up in Frigate. Overall I am also contemplating abandoning the Frigate config, due to it being object level and concurrency being a potential cause of issues; since I mainly want to use it for notifications, I am considering just using the HA LLM plugin and attaching it to my automations there.
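In case it is just a logging visibility question, this is the sort of logger block I would try first. The module names are a guess based on the source tree layout (frigate/genai, frigate/embeddings), so treat them as assumptions:

logger:
  default: info
  logs:
    frigate.genai: debug        # assumed module name for the genai providers
    frigate.embeddings: debug   # assumed module name for the embeddings maintainer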
-
Prompts and models discussion. I have been experimenting with the models available in the Ollama vision catalog, and I think I am really down to only llava-llama3 and llama3.2-vision working the best at the moment, with me leaning toward llava-llama3. This is mostly because they seem to give better context to images, with fewer made-up or wrong details, and they also seem to understand prompts better. I have been testing with snapshots, as those are the only consistent way to test prompts outside Frigate, since not all models support image sequences and Frigate doesn't persist them. I am largely focused on getting descriptions that are short and useful for text-to-speech announcements to enhance notifications, but I am not quite there yet due to model inconsistency; I am still getting some novels written about the image. These prompts both assume a snapshot where the image is not cropped in on the subject. I may go back to trying clips, but that only works with some models.
Example person output:
- "The person closest to the camera appears to be wearing a light-colored puffer jacket and pants with a hood pulled over their head. The image shows a blurry face, making it difficult to discern any facial features or details."
- "A man wearing a white hoodie and sweatpants is walking away from the camera in the center of the frame. He appears to be in motion."
- Good but too long: "The individual closest to the camera appears to be a middle-aged woman with long gray hair and glasses. She is wearing a dark jacket with white trim on the zipper. The woman has a necklace featuring what appears to be a small white stone or pearl. Upon closer inspection, it seems that she may also be holding an object in her right hand; however, due to the angle of the image, this cannot be determined for certain."
- Includes the timestamp from the camera, which I would like to filter out: "The main object closest to the camera is a woman with brown hair wearing a red shirt and blue jeans. She appears to be standing near a wall. The image also contains some text at the top left corner which reads "11/26/2018 08:03:29 PM Fri"."
Example car output (my trigger zone is intentionally a ways up the driveway to ensure the car is in clear view before triggering):
- Correct but too simple: "The car closest to the camera is silver with a black tire."
- Close but not quite right: "The car in the image is a small, red hatchback with a white license plate. It has black handle grips and tinted windows. The car appears to be in good condition and well-maintained."
- Probably one of the most accurate, but also a bit long: "The car in the image is white with black stripes on the hood. It has a silver grille and headlights that are turned on. The car also features a large spoiler at the back, which adds to its sporty design. The car's color contrasts sharply with the gray sidewalk it is parked on."
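For anyone experimenting along the same lines, the rough config shape I have been iterating on looks like the sketch below. The host and model are placeholders, the prompt wording is only an example of the "short and useful" direction (not the exact prompts referenced above), and I am assuming the {label} placeholder gets substituted by Frigate:

genai:
  enabled: true
  provider: ollama
  base_url: http://ollama-host:11434   # placeholder host
  model: llava-llama3
  prompt: "In one or two sentences, describe the {label} in these images from a security camera. Ignore any timestamp text overlaid on the frame."
  object_prompts:
    person: "In one sentence, describe the person closest to the camera: clothing, anything they are carrying, and direction of travel."
    car: "In one sentence, describe the vehicle closest to the camera: color, body style, and anything notable."

The fixed sentence count is an attempt to rein in the novel-writing, and the timestamp instruction targets the overlay text issue mentioned above.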
-
I've got a weird issue, I can't get the genai or embeddings code to generate any debug logs. I can configure other packages like frigate.comms.mqtt or frigate.recordings.maintainer to debug, but when I do the same for the genai code I never see any debug messages. Am I going mad? I can't see any obvious reason the embeddings/genai packages wouldn't work.
-
I have created a survey that we want to use to get feedback from users who have been using the GenAI feature for at least 1 week. The feedback will be very helpful in understanding users' use cases and how the feature can be improved. The survey is anonymous: https://docs.google.com/forms/d/e/1FAIpQLSehVvK6oVKzikuKzwlIGz2eiaUwhYQq24ZWy9Iqe5NWSrI2Aw/viewform
-
Hats off to the devs @NickM-27 for making it an official feature. I actually ended up rolling my own inference pipeline + semantic search on top of Frigate a few months back to do the same thing, though that uses plain Transformers to batch process images as opposed to realtime events. I'm glad there's a polished, usable version now :) As for the sentiment that genai descriptions aren't of much use, I agree that it is indeed limited when it comes to describing static images. However, I think there's more potential in feeding the LLM more metadata (when the object entered/exited the frame, etc.) and combining it with the camera orientation / spatial awareness idea from here, to link across multiple detections and come up with explanations for a set of events rather than individual detections. In short, detection events can be grouped first by spatial and/or temporal proximity, then sent to the LLM to reason about. Captures across multiple events can be useful to obtain possible narratives and reconstruct scenarios, and might offer better insight (when you want to find a misplaced object, etc.). For example, a request like this:
{
"model": "llava-llama3",
"prompt":"Describe a possible storyline from a security camera: 1 frame taken from front driveway at 4:45pm, and 2 frames taken from backyard at 5:01pm, 5:05pm. List prominent persons and objects, what they might be doing, and where objects may have been placed after this series of events. Be as specific as possible.",
"stream": false,
"images": [...]
}
produces:
`"response": "1. 4:45 pm - Front driveway\n- A person, who is wearing a black jacket and pants, is seen walking up the front steps of the house.\n- The house has a garage door visible in the background.\n\n2. 5:01 pm - Backyard frame 1\n- The same person from the previous scene is now shoveling snow in front of a shed in the backyard.\n- There are two shovels present in this scene, one being used by the person and another placed on the ground near the shed.\n\n3. 5:05 pm - Backyard frame 2\n- The same person is still shoveling snow in the backyard, as seen from a different angle of the backyard.\n- A tree stump can be spotted in this scene.\n\nAfter these events, it appears that the person has been clearing the snow from the front and backyard. The snow removed from the front steps may have been piled near the garage door. The shed and tree stump seem to have been left as they were. The second shovel might have been used temporarily or was replaced with a new one during the process of shoveling snow. It's not clear whether any other objects were moved or placed in these scenes, but given that it's winter time, there could be more snow removal happening outside the frame of the security camera footage."`
I suppose the endgame is some sort of LLM system with object permanence, which is a more powerful capability on top of facial rec.
-
I wanted to start a discussion for those who are currently using the 0.15 dev builds of Frigate and have tried Ollama models, to find the good & bad of each. I hope this will become more active once 0.15 and beyond are released officially. I went down the rabbit hole a little while ago with Ollama when the dev builds included access; however, I had difficulty installing it and getting the models running correctly...
I followed this guide a few days ago, which was alright, but I still had difficulty getting the Ollama server to connect with the web UI. Then I tried just using the method from Open WebUI, in particular the bundled version, and this worked quite well.
This led me to start trying different models, ideally one that I can use for Home Assistant as well as for Frigate. The llava-llama3 model was "alright" for images but terrible for chat responses / Home Assistant. The model listed in the other guide from @maxi1134, finalend/llama-3.1-storm:8b-q8_0, was really good for chat, almost like the current ChatGPT, but doesn't do any images at all.
So... I was wondering what others using Ollama were experiencing and what models they had been testing out? Hopefully we can use this as a discussion for exploring locally hosted AI with Ollama.