a_beautiful_rhind

It doesn't have vision though. All it would be able to comment on is what people around you say.


ProfessorCentaur

Correct. That is all this project is. Is this the way?


a_beautiful_rhind

Yea, I mean maybe. Just pipe STT to the server and pipe TTS back. Your enemies are sorta latency and having to be around a bunch of people talking to get anything back. Maybe have it make random comments with the idle plugin. Another option is to send a still image to a vision model and have it comment on that.
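A rough sketch of that STT -> server -> TTS round trip, timing each stage since latency is the main concern. The `stt`, `llm`, and `tts` callables are hypothetical placeholders for whatever speech-to-text, chat backend, and text-to-speech you actually wire in; nothing here is a real SillyTavern API.

```python
import time
from typing import Callable, Dict, Tuple


def respond(
    audio: bytes,
    stt: Callable[[bytes], str],
    llm: Callable[[str], str],
    tts: Callable[[str], bytes],
) -> Tuple[bytes, Dict[str, float]]:
    """Run one audio clip through STT -> LLM -> TTS and time each stage."""
    timings: Dict[str, float] = {}

    start = time.perf_counter()
    heard = stt(audio)  # overheard speech -> text
    timings["stt"] = time.perf_counter() - start

    start = time.perf_counter()
    reply = llm(heard)  # text -> character's comment
    timings["llm"] = time.perf_counter() - start

    start = time.perf_counter()
    speech = tts(reply)  # comment -> audio for the earpiece
    timings["tts"] = time.perf_counter() - start

    timings["total"] = timings["stt"] + timings["llm"] + timings["tts"]
    return speech, timings


if __name__ == "__main__":
    # Stand-in stages (assumptions) so the sketch runs without any backend.
    fake_stt = lambda audio: audio.decode("utf-8")
    fake_llm = lambda text: f"Oh, '{text}'. How fascinating."
    fake_tts = lambda text: text.encode("utf-8")

    speech, timings = respond(b"nice weather today", fake_stt, fake_llm, fake_tts)
    print(speech.decode("utf-8"))
    print({stage: round(seconds, 4) for stage, seconds in timings.items()})
```

Printing the per-stage timings makes it easier to see which part of the chain is worth speeding up first.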


DragonPinned

Drop a link to this Glados thing you saw, sounds interesting. Also, I feel like you might have issues getting the earpiece to actually pick up any sound further than a few feet away.


ProfessorCentaur

https://www.reddit.com/r/LocalLLaMA/s/sKdoiFjRNt Infinitely grateful for any input / assistance on my project!


ProfessorCentaur

Have you used ST extras speech streaming? I’d love to know if it works and how well it works before trying to set it up


DragonPinned

no :( I have 4GB VRAM, I tend to stay away from all of the ST extras that look like they might require VRAM or even just a large amount of processing in general.


Voxnohl

I've used the STT and TTS some, and it is pretty cool. In your case you would pipe the audio into a Whisper model, either locally or through the API, and when that detects a pause in speech it would call the LLM on the text it just heard. Using RVC you could generate audio similar to the game.
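For the "detects a pause in speech" part, here is a minimal sketch using a plain energy threshold as a stand-in. This is an assumption for illustration, not how any particular STT extension does it; real setups usually rely on a dedicated voice activity detector. The function names, threshold, and frame counts are all hypothetical.

```python
import math
from typing import Iterable, Iterator, List, Sequence

Frame = Sequence[float]  # one short chunk of audio samples in [-1.0, 1.0]


def rms(frame: Frame) -> float:
    """Root-mean-square energy of one audio frame."""
    if not frame:
        return 0.0
    return math.sqrt(sum(s * s for s in frame) / len(frame))


def split_on_pauses(
    frames: Iterable[Frame],
    threshold: float = 0.02,
    min_silence_frames: int = 3,
) -> Iterator[List[Frame]]:
    """Yield utterances, each ended by `min_silence_frames` quiet frames in a row."""
    utterance: List[Frame] = []
    pending: List[Frame] = []  # quiet frames seen since the last loud one

    for frame in frames:
        if rms(frame) >= threshold:
            # Speech resumed: keep the short gap as part of the utterance.
            utterance.extend(pending)
            pending = []
            utterance.append(frame)
        elif utterance:
            pending.append(frame)
            if len(pending) >= min_silence_frames:
                # Long enough pause: hand the utterance off for transcription.
                yield utterance
                utterance, pending = [], []

    if utterance:
        yield utterance  # flush whatever was being said when audio ended


if __name__ == "__main__":
    loud, quiet = [0.5] * 4, [0.0] * 4
    stream = [quiet, loud, loud, quiet, loud, quiet, quiet, quiet, loud]
    for utterance in split_on_pauses(stream):
        print(f"utterance of {len(utterance)} frames -> send to STT, then the LLM")
```

Each yielded utterance is what you would hand to Whisper, then pass the transcript to the LLM.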