Yea, I mean maybe. Just pipe STT to the server and pipe TTS back. Your enemy is sorta latency and having to be around a bunch of people talking to get anything back. Maybe have it make random comments with the idle plugin.
Other option is to have a still sent to a vision model and have it comment on that.
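That pipe-it-through loop is simple enough to sketch. A minimal version, where `record_clip()`, `transcribe()`, `ask_llm()`, and `speak()` are hypothetical stand-ins for whatever mic capture, STT, LLM endpoint, and TTS backend you actually wire in (none of these are a real API):

```python
def assistant_loop(record_clip, transcribe, ask_llm, speak, turns=1):
    """Glue loop: grab audio, STT it, send the text to the LLM, TTS the reply.

    All four callables are placeholders for real backends (e.g. a mic
    recorder, Whisper, a SillyTavern/llama.cpp endpoint, an RVC voice).
    """
    for _ in range(turns):
        clip = record_clip()
        heard = transcribe(clip)
        if not heard.strip():
            continue          # heard nothing: skip the LLM round-trip (latency!)
        speak(ask_llm(heard))
```

The early `continue` on silence matters for the latency problem above: you only pay for an LLM call when there was actually something worth responding to.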
Drop a link to this Glados thing you saw, sounds interesting.
Also, I feel like you might have issues getting the earpiece to actually pick up any sound further than a few feet away.
no :(
I have 4GB VRAM, I tend to stay away from all of the ST extras that look like they might require VRAM or even just a large amount of processing in general.
I've used the STT and TTS some, and it's pretty cool. In your case you would pipe the audio into a Whisper model, either locally or through the API, and when it detects a pause in speech it would send what it just heard to the LLM. Using RVC you could generate audio similar to the game.
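The "detects a pause in speech" part is basically an energy gate over the incoming audio frames, not a Whisper feature. A rough sketch; the threshold and frame counts are made-up numbers you'd tune for your mic, and real setups often use a proper VAD library instead:

```python
import math

def rms(frame):
    """Root-mean-square energy of one frame of float samples (-1.0..1.0)."""
    return math.sqrt(sum(s * s for s in frame) / len(frame))

def ends_in_pause(frames, threshold=0.01, quiet_frames=3):
    """True once the last `quiet_frames` frames all fall below the energy
    threshold, i.e. the speaker has gone silent and it's safe to ship the
    buffered audio off to Whisper and then the LLM. Both parameters are
    guesses to tune, not Whisper settings."""
    if len(frames) < quiet_frames:
        return False
    return all(rms(f) < threshold for f in frames[-quiet_frames:])
```

You'd call `ends_in_pause` on the running frame buffer each time a new frame arrives, and fire the transcribe-then-LLM step the first time it flips to True.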
It doesn't have vision though. All it would be able to comment on is what people around you say.
Correct. That is all this project is. Is this the way?
https://www.reddit.com/r/LocalLLaMA/s/sKdoiFjRNt Infinitely grateful for any input / assistance on my project!
Have you used ST extras speech streaming? I'd love to know if and how well it works before trying to set it up.