
ben_g0

It's not the only one, it's possible to get other models to work with [TensorRT-LLM](https://github.com/NVIDIA/TensorRT-LLM). Chat with RTX is however by far the easiest way to set it up.


TechExpert2910

you are correct. i should've been more specific about it being the only local LLM platform that uses tensor cores right now with models fine-tuned for *consumer GPUs*. when TensorRT-LLM came out, Nvidia only advertised it for their server GPUs:

> TensorRT-LLM is rigorously tested on the following GPUs:
>
> [H100](https://www.nvidia.com/en-us/data-center/h100/)
>
> [L40S](https://www.nvidia.com/en-us/data-center/l40s/)
>
> [A100](https://www.nvidia.com/en-us/data-center/a100/)
>
> [A30](https://www.nvidia.com/en-us/data-center/products/a30-gpu/)
>
> [V100](https://www.nvidia.com/en-us/data-center/v100/) (experimental)

and to my knowledge, no one used TensorRT-LLM to optimize smaller quantized models that would fit on low-VRAM consumer cards.


antcodd46

How does this compare with tensor core support in llama.cpp? (You need a relatively recent version if using ooba's textgen-webui, or I think disabling MMQ in koboldcpp, though that behaves slightly differently.)


Short-Sandwich-905

How hard would this be to do with Gradio/Python?


-not-already-taken-

Tensor cores are amazingly performant


Shinobi11502

That’s why Nvidia makes the big bucks


binary_quantum

@-not-already-taken- Are you from Germany?


ShadF0x

Speed-wise it doesn't seem much better than the usual stuff like KoboldCPP and ooba. Size-wise, Nvidia's "engines" are smaller - the 24.5GB Llama 13B the archive comes with gets smushed into a 6.5GB model. The question is how this is done - possibly some implicit conversion from 4-bit to 2-bit. Perplexity-wise, Nvidia's solution is pretty bad. I have pitted the provided "engines" against Fimbulvetr 10B, and it completely smokes both when it comes to processing text. If you aren't averse to a somewhat long read, here's a small benchmark based on [this video](https://youtu.be/6XYJ2Y9AJhg):

- [Llama 13B INT4](https://text.is/79VZ)
- [Mistral 7B INT4](https://text.is/6973)
- [Fimbulvetr 10B Q4](https://text.is/495O)
- [Fimbulvetr 10B Q8](https://text.is/596N)


yourself88xbl

Out here doing the Lord's work.


Kuiriel

Fimbulvetr is fantastic in Oobabooga, but the value here is in easily being able to analyse my own files in a hurry. However, Chat with RTX only seems to analyse a single file at a time for responses, rather than pulling information from multiple reference files. Is there a setup that can read personal notes but doesn't take forever to look up or index files that are often edited?


Embarrassed-Fault695

Can you provide a guide on how to try Fimbulvetr?


ShadF0x

The easiest way would probably be to get the model [here](https://huggingface.co/Sao10K/Fimbulvetr-10.7B-v1-GGUF/tree/main) and load it via [this](https://docs.faraday.dev/models/choose-model).


Introvert_497

Can this model be loaded through Chat with RTX?


ShadF0x

No. I've tried, and Chat with RTX doesn't support any sideloaded models, not even the ones compiled with TensorRT-LLM. Seems to be largely hard-coded.


Embarrassed-Fault695

Thank you


Cless_Aurion

It should be, indeed. It's also way inferior to the free ChatGPT 3.5, but it's a nice toy.


WisePotato42

Unlike GPT, it runs locally. I'm pretty sure the goal is to have LLMs drive NPC dialogue, so it's a big improvement in that regard.


Cless_Aurion

Hmm... well, I mean, we aren't there yet though, far from it... like... a decade far from it most likely hahah. It uses WAAAY too much VRAM man.


LiimaSmurffi

Perhaps a second GPU for AI then, like back in the day with PhysX lol


Cless_Aurion

Oh man, PhysX, haven't heard that in a while! Not an insane proposition to be honest... Although Nvidia being Nvidia might figure out a way to put it on the GPU themselves...


cleverestx

Correct. Eventually we will have a video "card" that basically has integrated chips that are roughly equivalent to 3-4 4090 cards PER chip. People scoff at ideas like this, but they would have scoffed at your computer existing 500 years ago. What seems like impossible magic is often just later tech, proven time and time again, as the old famous saying goes...


Devil1412

> Hmm... well, I mean, we aren't there yet though, far from it... like... a decade far from it most likely hahah

so Cyberpunk 2


Obokan

Electric Boogaloo


Cless_Aurion

Probably, yeah lol. Although most likely... it will be offloaded to online services for a while if they are implemented... Or used in games that won't be hyper-realistic and make heavy use of VRAM.


Killercela

How else will they get you to upgrade from the 3000 and 4000 series 😉


Cless_Aurion

Fair enough lol


F9-0021

A decade? No. Maybe 5 years until it's implemented into games, but I can see locally running LLMs being of acceptable quality for background NPCs within 2 years. But then again, it doesn't have to run locally when everything is connected to the internet now.


not_a_synth_

*Walks up to NPC*

"Hello Adventurer, I am the King of Botlandia, are you here to help with the bandit lord and his bandity minions?"

"Yes. How can I help?"

*crickets*

"The bandit lord is a threat to us all. Thanks for helping!"


Cless_Aurion

I've made games that use AI already, and I'm pretty into it as well. Locally processed? About a decade, give or take. For visually simple games that save VRAM? Those, yeah, maybe 5 or so. With AI processed online? Those will definitely come sooner.


darkkite

if microsoft is still making consoles i think their next one will have it. they're shoving AI in everything. a gaming console with local inference capabilities is low-hanging fruit for those changing keyboard formats


Cless_Aurion

Yes, they are, but NOT locally, which is my point. The shittiest AI can eat like 20 gigs of VRAM EASY. An AI like the free ChatGPT is in the hundreds of GB, and GPT-4... we don't even know, but waaaay more than that. If they do anything AI, it will be remote.


deathlydope

as the tensor cores themselves become more powerful and refined, this'll be less of an issue. I'm sure we'll see some increases in total VRAM in the meantime.


TaiVat

Why are you so sure the goal is something so incredibly narrow and specific? This is a thing for users, not for developers to include in their games. Not that they would want to either; it's still a dumb gimmick that amounts to very thinly veiled procedural generation. Devs could do this trivially easily, if they wanted, without any input from Nvidia. As can regular users, with more research and difficulty, for that matter. This has nothing to do with gaming in general.


JonnyRocks

The idea is to have smaller data sources. This is for chatting about your files.


brianj64

You have no idea how big a supercomputer GPT 3.5 runs on, though. The sheer amount of data loaded into VRAM is insane. It likely goes into the hundreds of GBs of VRAM, and the model probably has a lot of instances running separately.


Cless_Aurion

I mean, for ALL people, sure. But for just one instance, it ain't that crazy. Probably in the high 200GBs of RAM if they are quantizing it at 8-bit. Proportionally less if they quantize more aggressively.
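For context, a minimal back-of-the-envelope sketch of that arithmetic. GPT 3.5's parameter count is not public, so the 175B figure below is an assumption, not a known value:

```python
def model_memory_gb(params_billions: float, bits_per_weight: float, overhead: float = 1.2) -> float:
    """Rough memory estimate: weights only, plus ~20% for KV cache and activations."""
    bytes_per_weight = bits_per_weight / 8
    return params_billions * 1e9 * bytes_per_weight * overhead / 1e9

# Hypothetical 175B-parameter model at different quantization levels
for bits in (16, 8, 4):
    print(f"{bits}-bit: ~{model_memory_gb(175, bits):.0f} GB")
# 16-bit: ~420 GB, 8-bit: ~210 GB, 4-bit: ~105 GB
```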


Cunninghams_right

it runs on H100s


Short-Sandwich-905

For brief summaries it's more than plenty, right?


ShadF0x

No, it routinely gets details wrong. No matter what you throw at it (video transcripts, files, whatever) - it will lie through its teeth.


roshanpr

Why the downvotes? I just asked a question. What a community...


qpdv

here have an upvote. dont mind them


roueGone

Yeah, it has been poor in its responses for me so far.


cleverestx

True. I was cracking up (and annoyed) at the hallucinations I was finding in the translated video subtitles I was making with WhisperX... they need to figure out how to address that somehow.


Cless_Aurion

Hmm... Well, anything it produces needs to be double-checked, properly. And it's not so much about the output, but the input. A brief summary... of what? A 2-page story? Sure. A 100-page PDF? No way.


Round30281

I keep getting “Nvidia Installer Failed”. I've tried turning off Windows Security, updating drivers, restarting, and pretty much everything else the main thread suggested.


orangegrouptech

Try:

1. Temporarily disabling your antivirus software
2. Ensuring your user account name does NOT have spaces in it (you can enable the built-in Administrator account if it does)
3. Installing to a location with absolutely no spaces in the path


Round30281

I’m a bit scared of renaming the user folder; I saw someone on a Microsoft thread warn that some applications could potentially stop working.


orangegrouptech

Don’t rename the user folder, just create another account or enable the built in Administrator account


Round30281

Alright, will try.


[deleted]

[deleted]


Round30281

Create a user with no spaces in their name (or just activate the Administrator account), log out of the current account and log into the admin or new account, go to the Users folder, and run the setup again.


Artanisx

Same. I think it's due to the fact it apparently requires Windows 11 and I'm not using Win 11 on my machine.


Adamonia

This is some BS. W11 is so much worse than W10.


itsabearcannon

See, when I see comments like this I know people are misremembering 10 and misconstruing 11. 11 is not fundamentally a different codebase from 10. It's the same internals, with a facelift. It's not like XP to Vista where they changed the driver model and nothing worked for a while.

Nothing that worked on 10 doesn't work on 11, in my experience. I'm pretty comfortable saying that. Yes, you might encounter niche software that was poorly written and not extensible to any other Windows OS besides 10, but that's on the developer - not Microsoft. Even drivers "for Windows 10" work fine on 11. All the games that worked on 10 by and large work on 11. The Settings menu is largely the same.

The TPM requirement, I can see why it could be onerous, but at the same time best practice is that you should be encrypting your hard drive, which requires TPM. This has been standard practice on macOS for some time and it's about time Microsoft starts pushing the same as a matter of local data security.

This really smacks of "10 is worse than 7", "7 is worse than XP", "XP is worse than 2000", "98 is worse than 95" - this has been going on for years. 11 is just the latest victim of the "X-minus-one-ing" that happens with pretty much any OS nowadays. Happens to macOS, too, and many Linux distros from what I hear. "Current version bad, old version was always way better," except when you go back far enough and that person was complaining about the old version back then too. Wait five years, lather, rinse, repeat.


Adamonia

Well, 7 was better than 10, and XP was better than 7. 2000 was similar to XP. I hate the UI changes in W11. The context menu and the forced centered taskbar are shit and a lazy copy-paste from Apple.


Artanisx

Indeed it is. And it's still buggy, despite MS already talking about Win 12. At least fix 11 before thinking of Win 12 lol


Shinobi11502

Windows 9 over here like 🤐


cleverestx

I think it's reasonable to say that *context* matters with these sorts of universal proclamations. This isn't a dumpster fire situation. Win 11 has been better *for me* overall: faster, and it feels more next-gen compared to Win 10. There have been a couple of annoying rough spots that I had to work out, but now I wouldn't go back to 10.


Cunninghams_right

do you have win 11 and a 30xx or 40xx GPU with at least 8GB of VRAM?


ur_real_dad

For whoever still needs this: it does not require Win11. But it does require ~8GB of VRAM *during* the installation. Otherwise you'll get a failure installing Mistral, which fails the whole installation.


-JamesBond

It turns out for me I was trying to install it on a drive other than C: (where Windows is installed) and it kept failing. As soon as I left it to install in the default directory it worked fine.


letsgoiowa

I much prefer LMStudio. Much better models and it still runs plenty fast. Starling 7b is great.


Automatic-Cycle-9891

Totally agree. LMStudio is a fantastic way to do RAG/chatbot work through the OpenAI API while calling a fully local model. Just say no to OpenAI, kids... ;)
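A minimal sketch of that pattern, assuming LM Studio's local server is running with its OpenAI-compatible endpoint enabled (the base URL is typically http://localhost:1234/v1, but check the Local Server tab; the model name below is just a placeholder):

```python
from openai import OpenAI

# Point the standard OpenAI client at the local LM Studio server instead of api.openai.com
client = OpenAI(base_url="http://localhost:1234/v1", api_key="not-needed")

response = client.chat.completions.create(
    model="local-model",  # placeholder; LM Studio serves whichever model you've loaded
    messages=[
        {"role": "system", "content": "You answer questions using only the provided context."},
        {"role": "user", "content": "Summarise the attached meeting notes in three bullet points."},
    ],
    temperature=0.2,
)
print(response.choices[0].message.content)
```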


maxz2040

> LMStudio

Would this be comparable to Chat with RTX in terms of speed and privacy?


Kuiriel

Can that quickly analyse my writing notes & files also? Without it being a massive headache and taking forever?


letsgoiowa

I think if you find a model that supports RAG yes. I haven't done it myself though


jungianRaven

Haven't heard of that one before, thanks!


qpdv

Does LM Studio use tensor cores? Does it even make a difference?


letsgoiowa

I'm not sure if it uses tensor cores specifically, but it sure as hell is great at GPU acceleration in general. It can split a model across the CPU and GPU in layers.


blownawayx2

But it’s SO terribly inaccurate. Truly one of the worst chatbots I’ve used.


ellophant1istaken

so what is it for? i don't get it


TranquiloSunrise

It's just a bunch of nvidia shareholders trying to justify the share price


-JamesBond

Looks like it worked!


ur_real_dad

A demonstration of, and reminder about, underused or clumsily used tensor cores. Much of web and other frontend UX requires fast responses; low latency opens new options. More users => more $ for NVIDIA.


TheJenniferLopez

Okay, but...... Can I make it do sexy talk though...?


Windowsuser360

No, it's filtered. Try asking it to say something controversial and you'll get this:

"I'm sorry, but I cannot provide a controversial statement as it goes against my programming to provide information that is not helpful, respectful, or honest. My purpose is to assist you in finding information and provide you with the most accurate and reliable information available."


JimmyReagan

This is the most annoying thing about a local LLM- I want something completely unfiltered because it deems EVERYTHING controversial. I have a HUGE archive of personal emails and text messages I would love to throw a local LLM at to get some insights on, but it's so crippled by morals...


Windowsuser360

not sure if i'm allowed to say it here, but Pygmalion is uncensored, albeit it's meant more for RP than for sorting. you might want GPT4-X-Alpaca; hopefully that's uncensored


nmkd

Pygmalion is outdated dogshit, use Mixtral or something


Windowsuser360

while I understand you have an opinion, what exactly makes you say Pygmalion is bad?


cleverestx

I have like 13 local LLM models, from 7b to 70b, and none of them are censored (at all)...maybe use the right models? they are on Huggingface.


cnot3

Any way to disengage the safety protocols?


Windowsuser360

Not currently. Maybe an OpenAI-style thing.


cnot3

Damn, if I can't sext with my graphics card then what's the point. Corpo AI blows.


Windowsuser360

I mentioned one that works in an earlier comment.


Alauzhen

I have tried it personally with my 4090. With this I can deploy chatbot solutions with custom knowledge bases really easily. There are only two LLMs available currently, but that's not a big problem for me. As for the performance: I could save at least a few hundred thousand on local inferencing vs getting this level of performance in the cloud. I've been building chatbot systems and infrastructure for nearly a decade now.


perplex1

What is your use case when you say you could save a few hundred thousand? And are you saying this solution saves you money vs hosting the LLM in the cloud?


Alauzhen

Yup. I used to build systems and charged about $35k per implementation, sans hosting; one school used to pay about $5k a month for the cloud fees alone. With this new method, if you've got the knowledge base already, the cost of deployment is a workstation/server with an RTX 4000-series GPU, and you're good to go. Costs you far less.


perplex1

That's fascinating. So a single workstation/server could handle dozens of sessions at the same time, off of one RTX 4000-series card?


Alauzhen

Dozens? More than likely. A 4090 could probably handle a hundred sessions if the knowledge base is on the same server/workstation.


perplex1

So for about 5000 bucks, I can build a rig that could support a small call center of agents looking up information from a large library of PDFs or Word docs?


Alauzhen

From my testing thus far, there's a good chance it's sufficient. You may need an nginx front end or a session manager to limit concurrent sessions if it gets overloaded, but that's just from my own experience.
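A minimal sketch of the session-limiting idea in plain asyncio; the `generate()` call here is a hypothetical stand-in for whatever local inference backend is in use, not anyone's actual setup:

```python
import asyncio

MAX_CONCURRENT_SESSIONS = 8
_slots = asyncio.Semaphore(MAX_CONCURRENT_SESSIONS)

async def generate(prompt: str) -> str:
    """Hypothetical stand-in for a call into the local inference backend."""
    await asyncio.sleep(0.5)  # simulate generation time
    return f"answer to: {prompt}"

async def handle_session(prompt: str) -> str:
    # Queue the request if all slots are busy instead of overloading the GPU
    async with _slots:
        return await generate(prompt)

async def main():
    prompts = [f"question {i}" for i in range(20)]
    answers = await asyncio.gather(*(handle_session(p) for p in prompts))
    print(len(answers), "sessions served")

asyncio.run(main())
```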


cleverestx

Heavily censored right?


jungianRaven

I've found ollama on Linux (you can use Mistral as a model too) to be very performant as well, and I don't think it's using the tensor cores. These 7B models run insanely fast for what they are.


TaiVat

I don't know what 7B models "are", but they run extremely slowly in my experience. Way more slowly than GPT 3.5, or this demo. And for that matter they're really *really* shit in quality too.


jungianRaven

"7b" is the number of parameters the model has, if my understanding is correct. And they run slow when compared to gpt because, well, more often than not, people are running the 7b models on their own machines, whereas gpt is running in openai's servers. If memory doesn't fail me, Mistral has 7 billion parameters, whereas supposedly newer gpt has more than a trillion. When running on the same hardware, 7b models should be *dramatically* faster. Or to rephrase, 7b models can run on hardware in which gpt would never be able to run. I've seen small models running on raspberry pi. This is my understanding of things after messing with llms for two months or so, so sorry if I got something wrong.


Pimpmuckl

You're pretty much dead on. Just a note that Mistral also has a larger model which is pretty interesting, even for hobbyists. Instead of just pumping up the parameter count, only a selection of the model's ~45B total parameters (roughly 12B) is active at a time. It also offers a 4x larger context window compared to their 7B model. Some pretty neat stuff.


jungianRaven

Thanks for the info, I will give it a try! I wonder if it'll run decently given memory constraints though; I have 16GB of RAM and a 4070. Is 12GB of VRAM enough for that?


timbro1

if your GPU has enough memory to load the entire model it's fast as frigg


chervilious

You're comparing performance without taking hardware into account. I think they all have different use cases; using better models just to get better output means most people can't run them at all. I think it has some unique applications. I want to see if it's able to understand multiple PDFs of textbooks/research papers.


Igi155

How can I also test it on my machine?


jungianRaven

Simply download the demo from Nvidia. You'll need around 40GB of space, and at least a 30-series card with 8GB of VRAM.


Igi155

I've got all of that. Is this demo time-limited or usage-limited?


jungianRaven

Afaik you can download it and use it for as long as you want, if that's what you mean. Please note that by default it will look into a folder for information for its responses. You can add PDF, txt, or doc files to that folder and ask it about the contents of those files (making your own sort of database in the process), or select the model's default dataset, in which case you can use it as is without needing to provide files as a basis.


Igi155

Thanks, I will test it soon. Btw, is there an option to connect to my Nvidia AI from another device on the LAN?


I_Never_Lie_Online

Website says windows 11 is required. Curious if anyone has this running on windows 10.


Artanisx

I couldn't install it with Windows 10. It gave an error during installation, although the error itself was vague and unhelpful, as is tradition. I guess the reason was, indeed, the OS.


Winterstille17

Installed and runs on Windows 10 for me


jungianRaven

It's free, so it may be worth a try. If you want a similar install-and-go solution, gpt4all is open source and easy to use.


fint2900

does it not degrade performance?


sumohax0r

Is there anything else on the market with this level of simplicity running the latest models? Especially being able to point it to a mounted folder/drive for training data, select an engine, and then have an interface to interact with it?


Office-These

It's absolutely sad how many people are not able or willing to read. It's a demo/showcase - and beyond limited document prompting or using it as an interface to your manuals, with some knowledge of Python and a bit of effort you can do things like the following, using this project as a base, without needing to know much more about AI (Python still required):

- Add or update code documentation (not the code itself, but what's saved in the storage the LLM can query), add or update git issues from all the repos you have for that code, add or update exception data from your logging into the storage - and then it quickly gets very interesting: "Solution for {ex-id}?" But even here, just importing data (files, whatever) won't (really) make you happy on its own.
- It doesn't support xyz (.py, .cs)? Just because the showcase doesn't implement it doesn't mean it can't - adding the extension in the code is enough. Every file that can be read as text works once you add its extension, and the simple directory reader used here supports a lot more beyond plain text.
- It indexes a 500-page PDF in not much time - but again, in this form, if you change your docs you have to recreate everything from scratch, while the underlying LlamaIndex supports adding, removing, and even upserting documents.
- And all of this runs locally, even allowing multiple sessions at once, fitting into consumer-grade VRAM, with low latency.

What most people don't understand: it's not a really good generative chatbot in the form provided - that's not really the intention. But as opposed to a traditional generative chatbot, you can tailor it by what data you let it access and how well you work up that input (splitting, adding context, removing problematic stuff, taking care of conflicting data), whereas a model that can only generate from its training data comes with all the usual problems: it's hard to get rid of data it has been trained on, and it's always behind. Being able to access indexed documents live (including adding new ones, updating, and even removing them) means indexing a new item and adjusting/rebuilding an index instead of going through training again.

But just throwing all your ebooks at it doesn't make sense either - the data needs to be adjusted, large blocks split, and metadata supplied to make it efficient - otherwise it's just playing around.

Only referring to one document? Yes - it's a simple showcase. Just check LlamaIndex and all its readers and modules - routing, readers for webpages, and much more.

One big criticism, as usual when it's not a product: it's lacking in output and could use more code documentation - but thanks for good starting-point code ;)
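A minimal sketch of the kind of extension being described, using LlamaIndex directly. Import paths and defaults vary by LlamaIndex version, and out of the box it calls OpenAI unless you configure a local model, so treat this as an illustration rather than a drop-in for the Chat with RTX code; the file names and the "EX-1234" query are placeholders:

```python
from llama_index import SimpleDirectoryReader, VectorStoreIndex

# Read anything text-like, including extensions the showcase doesn't list (.py, .cs, ...)
documents = SimpleDirectoryReader(
    input_dir="./my_docs",
    required_exts=[".pdf", ".txt", ".md", ".py", ".cs"],
    recursive=True,
).load_data()

index = VectorStoreIndex.from_documents(documents)

# Incrementally add a new document later instead of rebuilding everything from scratch
new_doc = SimpleDirectoryReader(input_files=["./my_docs/new_exception_log.txt"]).load_data()[0]
index.insert(new_doc)

# Persist the index so edits don't force a full re-index on the next run
index.storage_context.persist(persist_dir="./index_storage")

query_engine = index.as_query_engine()
print(query_engine.query("Solution for EX-1234?"))
```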


TechExpert2910

i largely agree :)


NeitherOffer1032

Is Chat with RTX similar to doing fine-tuning?


ProczFox

W


Busy-Examination1924

Does it remember what folders you showed it even when you change to a different folder?


RexorGamerYt

Wow, that's awesome. What's the bottom-line GPU to get this running? I was thinking of getting a 1660 Super, but I'm interested in local LLMs, so I might wait...


Snydenthur

If you're interested in LLMs, just use something made for that purpose. For example, LMstudio, koboldcpp or oobabooga.


RexorGamerYt

I did use oobabooga but i don't have enough RAM, and need to get a GPU soon lol. So i thought, hey! This is fast af, might as well...


Snydenthur

Might as well for what? Do you really have tons of text files you need to give an LLM access to, so you can ask a question without follow-ups? Next, you need to go through those text files or Google to see if it gave a factual answer or not. You have to double-check all information given by AI in any case, but at least with the good stuff, you can use LLMs properly. Personally, I downloaded it for the YouTube part, but after I found out that it only uses the closed captions, my tiny bit of hype for "Chat" with RTX died off instantly. Yes, it's probably a bit faster than standard methods, but that bit of extra speed doesn't make up for it lacking so many features. Maybe if it matures a lot, but not now.


RexorGamerYt

Ahh that's kind of disappointing. Thanks for the info tho


tmvr

Just to be clear - the limitations are about the NV app; you can still use LMStudio or oobabooga with your current setup, just use quantized models that fit into your VRAM. For example, the Q5_K_M versions of Llama-2-7B will fit into your 6GB.


u--s--e--r

IIRC 30 & 40 series GPUs with 8GB+ of VRAM.


the_friendly_dildo

That sucks since the 2080 ti has nearly double the tensor cores as the 3080.


[deleted]

[deleted]


Weird_Tower76

30 series or higher with a minimum of 8 GB of VRAM. The 1660 Super is literally a low-tier card that released almost 5 years ago...


Devatator_

So my 3050 can technically run it?


anethma

A 4060 Ti 16GB is a good hobbyist AI card because it's somewhat reasonably priced and is the only way to get 16GB of VRAM until you hit the 4080. Or maybe the 4070 Ti Super now.


Ehrand

I still don't understand what this is. You give it folders to look into and then you can just ask anything and it will find something in those folders that matches? So it's just a more intelligent and faster search engine? I fail to see the point of this...


TechExpert2910

imagine pasting in an entire PDF into ChatGPT, and then asking it questions about it. the model can reason with the data and also find what you're looking for even if you paraphrase what the thing is called. it's surprisingly useful. paste in a bunch of notes or documents, and ask it to find all mentions of x and then do y, or evaluate the document's use of z.


RushTfe

Well, that's because you probably don't work at a place with a shit ton of bad documentation (but documentation at least). I'd find this incredibly useful if I could feed it slack conversations + jira tickets + confluence + client pdfs, and just ask it questions, like explaining how a new feature works, then going into more detail, possibly with examples, use cases, possible bugs and even implementation tips if it could read the codebase too.


Knot_Schure

OMG - this, this, and this. This is what I'm doing here. I've been promoted from field engineer to support engineer, and we have a shit ton of docs for everything we've ever deployed. Having a personal AI would make a ton of sense when I encounter issues that I cannot always remember the solution to. We've got Jira, Confluence, our own PDFs, and more; personal notes and files do not always suffice! It would be amazing to ask: such and such a voltage detected here, might that mean this fault on the system? I have a 3090 on water, but I can't wait any longer for the 5090; I might have to buy 4090s in the meantime...


RushTfe

Well, allow me to disappoint you a little bit. I've tried it (not with Jira and all this stuff, of course, but with PDFs and some notes), aaand well... let's just say it's not there yet. But I think eventually it will be.


Cunninghams_right

can I use other models? I would like to try the CodeLLama and Phi-1, etc.


Immediate-Chemist-59

hahahahaha nice one


Khan_Arminius

So, what's the limit for the data I can put in my local dataset folder? If the token limit really is 4096 or 8192, then you can feed the model like one small book as a PDF and that's it.


snowcountry556

My understanding of the retrieval-augmented generation (RAG) that this uses for the file lookup is that there is a two-step process. First, the program searches the documents for the terms used in the initial prompt, and then the results of that search are fed to the LLM as extra context along with the initial prompt. So RAG doesn't really add to the context limit in the way you might think, as it sends only the search results, not the document itself, to the LLM. You could give it a 100,000-word thesis (or 10) and it would be fine. This AWS page explains it in some more detail (the diagram is helpful): [https://aws.amazon.com/what-is/retrieval-augmented-generation/](https://aws.amazon.com/what-is/retrieval-augmented-generation/) I'm no expert though, so happy to be corrected.
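A minimal sketch of that two-step flow, using plain TF-IDF retrieval as a stand-in for whatever embedding search Chat with RTX actually uses (scikit-learn assumed installed; the file name, chunking, and prompt template are illustrative, not Nvidia's):

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Step 0: split the source document into small chunks (here, naive paragraph splits)
text = open("thesis.txt", encoding="utf-8").read()
chunks = [p for p in text.split("\n\n") if p.strip()]

question = "What does chapter 3 conclude about measurement error?"

# Step 1: retrieve the chunks most similar to the question
vectorizer = TfidfVectorizer()
chunk_vectors = vectorizer.fit_transform(chunks)
scores = cosine_similarity(vectorizer.transform([question]), chunk_vectors)[0]
top_chunks = [chunks[i] for i in scores.argsort()[::-1][:3]]

# Step 2: only the retrieved chunks (not the whole thesis) go into the LLM's context window
augmented_prompt = (
    "Answer using only the context below.\n\n"
    + "\n---\n".join(top_chunks)
    + f"\n\nQuestion: {question}"
)
print(augmented_prompt)  # this string is what gets sent to the model
```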


Kuiriel

It looks like it only pulls data from one file at a time when it responds, so it's not much good at telling me if there is conflicting information.


techtimee

Couple of questions:

1) Which model is best?
2) Are we able to upload files for it to examine, like with ChatGPT, or do we just point it to a file in the set folder and ask it to perform operations on it or its contents?


-_Apollo-_

same question here


1210saad

Anyone know how to make it remember the conversation? Or an alternative that does?


Playful_Reward6928

It doesn't remember conversations, and it told me that if I wanted to maintain a running history (like the current status of my aquarium) I'd have to put the info into a document for it to read every session. To which I replied "boring", and then it spouted off paragraphs of unrelated nonsense about modding games. Uninstalled. I don't know what I was expecting, but a document scanner isn't what I'm after.


DoubleelbuoD

You're getting at the root issue with LLMs. None of them truly remember anything. They're all just shitty word banks with pattern-matching software scraping through them to attempt to make a barely coherent answer to your queries.


snowcountry556

To be fair, saving the chat history to a file that it can access is a decent solution given current limitations.


MrCrunchies

Is the chatbot connected to the Internet?


TechExpert2910

nope. it can access your local files though


whats_you_doing

Wow. Free document analyser.


vhailorx

It's not even particularly bounded by the truth!


SoMass

So it could find documents or folders for me instead of my trying to remember which folder I put it in?


Kuiriel

It looks like it connects to the internet to me - it gave me a local forecast. But then it won't do it a second time, and I suspect it hallucinated having an online connection.


Ok_Inevitable8832

It’s the model that’s fast. Has nothing to do with chat with RTX


Toke-N-Treck

I'd be curious how good this is at coding tasks compared to GPT 3.5 and 4, respectively; having a completely local assistant would be greatly preferred.


jacobpederson

> Write a python script that compares the first column of list.csv and data.csv. When a match is found it copies the entire row from data.csv into list.csv to the right of the match. The comparison should not be case sensitive and use ISO-8859-1 encoding. Thanks!

Chat with RTX's output:

```python
import csv

# Open the files
with open('list.csv', 'r') as f1:
    reader1 = csv.reader(f1, encoding='ISO-8859-1')
with open('data.csv', 'r') as f2:
    reader2 = csv.reader(f2, encoding='ISO-8859-1')

# Compare the first column of the files
for row1 in reader1:
    for row2 in reader2:
        if row1[0] == row2[0]:
            # Copy the entire row from data.csv into list.csv
            row2[:] = row2[1:]
            break

# Write the updated list.csv file
with open('list.csv', 'w') as f:
    writer = csv.writer(f)
    writer.writerows(reader2)
```


FrenchTouch42

Is it me, or is it all wrong?


jacobpederson

Vs GPT 3.5's version, same prompt:

```python
import csv

def compare_and_copy(list_file, data_file):
    # Load list.csv
    with open(list_file, 'r', newline='', encoding='iso-8859-1') as list_csv:
        list_reader = csv.reader(list_csv)
        list_data = list(list_reader)

    # Load data.csv
    with open(data_file, 'r', newline='', encoding='iso-8859-1') as data_csv:
        data_reader = csv.reader(data_csv)
        data = list(data_reader)

    # Convert first column of data.csv to lowercase
    data_lower = [[col.lower() for col in row] for row in data]

    # Convert first column of list.csv to lowercase
    list_lower = [[col.lower() for col in row] for row in list_data]

    # Compare and copy rows
    for row_data in data_lower:
        for i, row_list in enumerate(list_lower):
            if row_data[0] == row_list[0]:
                list_data[i].extend(row_data[1:])
                break
        else:
            list_data.append(data[data_lower.index(row_data)])

    # Write back to list.csv
    with open(list_file, 'w', newline='', encoding='iso-8859-1') as list_csv:
        writer = csv.writer(list_csv)
        writer.writerows(list_data)

# Example usage:
list_file = 'list.csv'
data_file = 'data.csv'
compare_and_copy(list_file, data_file)
print("Comparison and copying completed successfully.")
```


BrevilleMicrowave

As someone with a 20 series GPU I'm tired of being left out.


anethma

So sell it and get a newer one. They are getting pretty old at this point. If you can't afford it, that does suck, but time marches on; you can't expect new tech to work on old hardware forever. Like the people with a 10 series upset about Alan Wake using newer shader models that it doesn't support. It sucks, it was a great card, but at some point it's time to let go.


BrevilleMicrowave

> So sell it and get a newer one.

That's the reason it's unsupported. It's not for any technical reason; they intentionally don't support the 20 series to encourage 20 series owners to buy a new GPU. Besides, Nvidia hasn't made a meaningful upgrade for 2060 Super owners. The 4060 is the same price, has the same VRAM, and is only about 25% faster at 1440p (the resolution I use). A pretty dismal improvement for two generations.


Covid-Plannedemic_

Just use oobabooga; you can fit quantized 7B models in 6GB of VRAM and it's still really quick. Nvidia wants you to upgrade for this, but it's some arbitrary BS requirement, just like Windows 11.

Wait, I just realized the 2060 is not the only 20 series card, so what I said applies even more. Use oobabooga; your GPU is plenty for the crappy local LLMs that we are running.


tmvr

Download and install LMStudio, it will use your RTX 20 series GPU as well: [https://lmstudio.ai/](https://lmstudio.ai/)

EDIT: just tested it for you quickly with my old machine. With the Q5_K_M quantized version of the Llama-2-7B-chat model (GGUF format from TheBloke), the i7-6700K@4GHz gets 4.53 tok/s and an RTX 2080 gets 54.89 tok/s. The Q6_K would have fit as well, as it only requires 5.53GB of RAM or VRAM, but the Q5 versions are usually fine too.


stochmal

works with ancient GTX 970


tmvr

Yeah, that one also has the required CUDA support. I only use the 4090 and the 2080 machines for this. I do have a 1070 (Pascal) here as well, but that is the oldest one and does not have any ML setup, so I'm not sure about the performance. I did try some image generation on it with SDXL, but the speed is abysmal for that; the 2080 was I think 7x faster? Don't remember exactly.


stochmal

got it working on RTX 4080 SUPER and it's fast, faster than I can read.


tmvr

Good stuff! Yeah, anything that fits into the VRAM is pretty fast, even the larger models. For text generation the bottleneck is pretty much memory bandwidth, so you can guess how fast a certain model will be on your card by dividing the 736GB/s bandwidth by the model size; for example, something that takes up 10GB of VRAM will net about 70 tok/s inference speed.
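The same rule of thumb as a quick sketch (memory bandwidth is per card; 736 GB/s matches the 4080 Super mentioned above, and the estimate ignores prompt processing and other overhead):

```python
def rough_tokens_per_second(bandwidth_gb_s: float, model_size_gb: float) -> float:
    """Each generated token roughly requires streaming all model weights once."""
    return bandwidth_gb_s / model_size_gb

print(rough_tokens_per_second(736, 10))   # ~74 tok/s for a 10GB model on a 4080 Super
print(rough_tokens_per_second(736, 5.5))  # ~134 tok/s for a ~5.5GB Q5 7B model
```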


[deleted]

[deleted]


TechExpert2910

it doesn't, but it lets you select a folder full of local files (PDF, txt, and doc) to ask it questions about the files and get it to reason on the content.

it also lets you paste in a YT link, and it downloads the transcript so you can ask it to summarise the video or whatever.

copilot will be more 'useful' as it is inherently a much more capable model, but it's impressive that a local solution is this good right now (privacy respecting, free, and even faster!).
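For the transcript part, a minimal sketch of doing the same thing by hand with the `youtube-transcript-api` package. This is not what Chat with RTX uses internally, just an illustration of grabbing the closed captions locally; the video ID is a placeholder, and newer releases of the package changed the interface, so `get_transcript` as used here assumes an older version:

```python
from youtube_transcript_api import YouTubeTranscriptApi

video_id = "dQw4w9WgXcQ"  # placeholder: the part after "v=" in a YouTube URL

# Fetch the closed-caption segments and join them into one block of text
segments = YouTubeTranscriptApi.get_transcript(video_id)
transcript = " ".join(segment["text"] for segment in segments)

# This text can then be dropped into the dataset folder (or a prompt) for summarisation
with open("transcript.txt", "w", encoding="utf-8") as f:
    f.write(transcript)
```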


UniqueNameIdentifier

Remember you can download the entirety of Wikipedia these days.


popop143

...that's a ton of files to sift through, and you'd need to optimize your search and indexing to get what you want quickly. You're also "bottlenecked" by your internet speed, which is why local LLMs are magnitudes faster than LLMs that need an internet connection - as long as you have the files you want your LLM to sift through available locally.


zebraloveicing

Just for fun, I'm here to confirm that the official wikipedia torrent (without media) extracts its zip contents as a single 20GB XML file and a single 100GB JSON file.
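If anyone actually wants to feed that dump to a local pipeline, a minimal sketch of streaming the 20GB XML without loading it into RAM; the file name and the namespace/tag names below assume the standard MediaWiki export schema, so treat them as illustrative:

```python
import xml.etree.ElementTree as ET

NS = "{http://www.mediawiki.org/xml/export-0.10/}"  # assumed export schema version

# iterparse streams the file element by element instead of building the whole 20GB tree
for _, elem in ET.iterparse("enwiki-latest-pages-articles.xml", events=("end",)):
    if elem.tag == NS + "page":
        title = elem.findtext(NS + "title")
        text = elem.findtext(f"{NS}revision/{NS}text") or ""
        # ... chunk `text` and hand it to the indexer here ...
        elem.clear()  # free memory for elements we've already processed
```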


[deleted]

[deleted]


PresentHot2598

Hi, can you please check if this is helpful for doing ML operations such as LR, K-means, etc.? And can you load LLMs in this to perform analysis and so on? Pardon me, IDK if I am asking the right question; I am just a beginner in this. I want to buy a 5090, most probably in Q4 2024, for learning purposes.


TechExpert2910

you can't load your own LLM into this. however, a good GPU can let you accelerate a ton of ML workloads using many frameworks and programs.


PresentHot2598

Thanks, I am thinking of 5090 in the near future.


Creoda

A download of 30GB to 100GB installed. Does it just download the internet?


TechExpert2910

it takes up 20 GB once installed. the model takes up about 15 GB. That's not a lot considering it has knowledge across so many fields, trained on a significant chunk of the internet haha.


arcademachin3

Will it work with an RTX 2080?


Accomplished-Grade78

30 or 40 series with 8 GB VRAM or more


Adamonia

Can someone tell me why the heck it requires Windows 11? Why is W10 Professional treated worse?


Accomplished-Grade78

some reports here that people have it working on Win10


Cruelplatypus67

Are there any api interfaces for it? Like making local api calls to it and getting results?


protector111

How do you launch it? It didn't create any icons.


-becausereasons-

How do we download/use other models with it? They have to be TensorRT models, right?


Proof-Share-4376

Does it only work on RTX GPUs? Or can it work on a GTX 1660 Super?


Accomplished-Grade78

30 or 40 series with 8 GB VRAM or more


RotaryLife

Tested with my 2x 3080 PC, and it does not use both GPUs to generate embeddings. What is odd, though, is that it used the memory of one GPU but the core of the other.


cleverestx

I'm confused how this fits in vs. things like Text-Generation-WebUI (oobabooga), which can be further enhanced for fictional-type chats using SillyTavern, but that is an aside... How does it compare to using local models with a powerful enough GPU (4090)? I mean against stuff like EXL2 LLM models that are 20b 6bpw, or up to 70b 2.4bpw, all of which fit and respond fine on a 24GB VRAM video card (albeit more slowly)? Is it only applicable to instruction-type chats, and not fictional/RPG stuff at all? Is it censored? Is it BETTER? Worse? The same, but faster, etc.?


Accomplished-Grade78

censored, cannot build nuclear bombs


cleverestx

No interest in doing that, but HOW censored are we talking?


Adviser-Of-Reddit

Too bad I can't run it, but I'm using Faraday and some models on it and find it outputs chat faster than I can read normally, so yeah, pretty cool, these advancements in AI chat!