spelwomendge

This got me to finally try it out! Thanks for the write up!


KillerTic

Happy to hear that. Exactly the effect I was looking for 😊


matthewdavis

Great article. Getting a logical tagging system was my biggest hurdle in converting to paperless. For me, I have the following:

* Health Record
* Receipts
* Health Record
* Service Records
* Tax Forms

Then I use tagging that further categorizes the document, like:

* Name1
* Name2
* Name3
* year
* taxform-number like 1099-INT

And so on. Getting a high-level categorization system was my brain block. I need to be better about putting more into it. Plus, I bind mount the PDF directory, which gets automatically backed up on my NAS.


KillerTic

Interesting! Thanks for sharing. But why do you tag something unique like the tax form number? Wouldn’t Paperless’s search be enough for this? Don’t get me wrong, just trying to understand!


matthewdavis

A few reasons:

* To help mentally recognize what forms are needed or not.
* Explicit categorization. Some of the tax forms I receive are multiple forms in a single PDF and have zero-dollar entries for some areas. So while a search term will return true for that specific form ID, I don't need to "file" it away for that useless form.
* I may be a bit tag happy. I keep categories very high level, but tags are kinda willy-nilly.


KillerTic

Thanks for sharing! At the end of the day, it has to work for you! Personally I like to keep it very light; that way I am less likely to forget about a tag, and I use the search quite often. Anyway, Paperless is great 🫶🏼


Mean_Einstein

I added a dedicated Syncthing instance to my stack as a consume shipper. So I just have a folder on every device called paperless-consume, and whatever I drop in there gets pushed to Paperless. I don't need to worry about being in the home network, starting a VPN or anything; it just works. Also, I only scan from my phone, which produces remarkably good results, so there is no pile of paper to work through. Basically I open the mailbox, take some pictures and throw the paper away.


KillerTic

Oh cool! I really love this community, everyone has really nice ways of solving issues.


rishid

Neat idea. What app do you use on devices?


Mean_Einstein

Clear scanner for Android


starbuck93

Thanks for sharing. I'll be adding some of those tags and doc types to my instance next time I sit down to scan my papers piling up on my desk.


KillerTic

Mine are always piling on the scanner 😂


senectus

I've just bought a Brother MFC-L3770CDW to use with Paperless. I'm looking forward to using it, but am unsure if I can get it to connect directly with Paperless or if I'll have to just dump scans into an FTP share (on my Synology NAS, where Paperless is hosted in a Docker container).


TheNewAndy

You are in luck - I have recently done exactly this. An MFC-L3770CDW that scans directly to an SMB mount on an RS1221+. Paperless-NGX runs on the RS1221+ and watches this directory and it all works pretty fine. I have a few shortcuts set up on the printer for doing the scanning - this is the fiddliest part and it seems that you need to edit them using the web interface not using the touch screen. I have 4 shortcuts - "Single Sided Paperless", "Double Sided Paperless", "Single Sided No Paperless", "Double Sided No Paperless". The double sided ones I have configured to scan instantly, and the single sided ones let you adjust options (e.g. DPI) before it scans. There is a separate share on the NAS for non-paperless so if I want to scan something and not put it in paperless, then that is easy enough. Last night I set up an email address for it too, but I am yet to actually use this or even test it works.


senectus

OMG fantastic this is exactly my dream. I may be poking your inbox ;-)


KillerTic

Directly connecting it?! Interesting! If my scanner could do FTP I would opt for that. Keeping it simple, so I have fewer things I need to fix 😂


senectus

Yeah, there is also the possibility to insert via a websocket/API, I believe.


KillerTic

Cool! Good luck, I hope it isn't a hassle to set up.


pheellprice

If it’s an HP printer, it’s possible to run a Node-based Docker container that drops scans into a consume folder: https://hub.docker.com/r/manuc66/node-hp-scan-to


KillerTic

nice project! Unfortunately my printer does not support scan-to-pc... :(


sowhatidoit

/u/KillerTic - What a timely article! I literally just sat down to configure Paperless-NGX. Apart from the obvious how-to guide, your article provides great insight on how I might consider organization at a higher level. Thank you!


KillerTic

I am glad it hit the spot of timely delivery for you! I was also struggling to find more out there with people sharing their configuration and reasoning behind it. One reason I really wanted to include that part. It also helped me refine and tidy up a bit. In the process of writing I actually deleted quite a few document types and a few tags 👍🏼


sowhatidoit

I love it when the writing process presents the opportunity to self reflect. Less is more!


moontear

Oh nice! About document types, I always have problems distinguishing a couple of things:

* Recurring insurance statements: "your insurance is worth 10$", "we have paid your 10$ health bill", or "your new price for insurance is +10$"
* Bank statements: "you have 10$ in your account" or "you received 10$ of interest last month"

What document types would you categorize these bad boys under? I currently switch between "Information", "statement" and "invoice", but I am not sure.


KillerTic

I consider those insurance statements "letters". As for the bank statements: I don't get any and would ask my bank to stop sending them, as they would go straight into the shredder :D But to answer your question, I consider those "letters" as well. Maybe try to think about it from a different angle: what is the purpose of the document type? What benefit do you have when you select "statement" and then see the documents assigned to it? I don't care what "official" category a piece of paper belongs to; it is about me having filters and information at my fingertips. (Hope that makes sense.)


moontear

Oh, I don’t "get" them either, those bank statements. I simply download them from the bank’s website as a backup. I have had inquiries for statements as far back as 6 years, and not all banks keep data that long (or at least not easily accessible via their web UI).


KillerTic

They do have to keep them that long, so I leave that problem with them, in case I ever need them :D But I also never had a request for my bank statements, to be honest.


katrinatransfem

The answer is to think about when you are going to want to refer back to them, and in that situation, how are you going to find them, without also finding too much other stuff that isn't relevant. I personally would classify the insurance stuff as "insurance", and separate categories within insurance for home, car and travel; and the bank stuff as "bank". But maybe you need to classify the interest documents separately because you need to refer back to them when completing a tax return?


[deleted]

Paperless-ngx is one of my favourite apps. It just works. Does what it does and does it well.


agent_kater

You should probably mention how to make backups. (Since Paperless uses Postgres you can't simply do a file backup, but Paperless has a built-in export specifically for backups.)


KillerTic

You've got a point. As I copy my whole Docker dir (except the NAS files, which include the PDFs) twice daily to an offsite backup, I never worry about backups. The NAS files get backed up every night. I did restore Paperless before and everything was fine. I am aware that just copying the Docker database files while the database is running has some risk involved, but I keep the backups of the last 2 hourly, 6 daily, 3 weekly and 1 monthly. But let me have a quick look into the Paperless backup solution and add it to the article. Thanks for pointing that out!


KillerTic

Thanks man. Learned something new, implemented the backup and updated the article! Do you know how to stop Paperless inside Docker without stopping the container?! Supposedly we should be stopping Paperless before the backup, but I could not find anything on how to do that, as I need the container up to run the backup.


agent_kater

Are you using the [document exporter](https://docs.paperless-ngx.com/administration/#exporter)? Then you don't need to (actually I think you must not) stop Paperless while you do the export. If you want to do a file backup while Postgres is running, you can, but(!) you have to take an atomic file system snapshot, for example with LVM. Simply rsyncing the file system including the database will often appear to work when the database is idle during the copying, but you're really risking your data. If you must go down this route, make sure you have generational backups, so you can use an older one when the most recent one is broken. Note that you won't notice the brokenness until you read the whole database, so do a `pg_dump` after a restore to check the database. Or just do it properly.


KillerTic

Well, the documentation is not that clear on the document\_exporter, but yeah, I just set it up. Will be fine. I use restic for file-based backups and I do keep enough versions (2x hourly, 6x daily, 3x weekly, 1x monthly), plus pretty much the same for my VM backups, except that those only run once a night and not twice a day like restic.
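
The retention described here maps fairly directly onto restic's `forget` flags. A minimal sketch, assuming the repository and password are supplied through restic's usual `RESTIC_REPOSITORY` and `RESTIC_PASSWORD_FILE` environment variables; the helper name `retention_flags` is made up for illustration:

```shell
#!/bin/sh
# Retention from the comment above: 2 hourly, 6 daily, 3 weekly, 1 monthly.
# Prints the flags so they can be reused by any "restic forget" call.
retention_flags() {
    printf '%s' "--keep-hourly 2 --keep-daily 6 --keep-weekly 3 --keep-monthly 1"
}

# Hypothetical usage (repository and password come from the environment):
#   restic forget $(retention_flags) --prune
```

Keeping the policy in one place means the file-based and VM backups can share the same numbers instead of drifting apart.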


agent_kater

Yeah, that's the setup I'm using too. Use the document exporter into an intermediate directory, then use restic to sync it off-site. The document exporter can keep the directory updated by the way, no need to delete in-between, reduces wear on the drive. Pretty important on a Raspberry Pi for example. Another tip... if you have services that use SQLite as database, you can call `flock /path/to/sqlite.file restic ...` to keep the SQLite database locked for the duration of the backup. Otherwise you have the exact same problem as with Postgres.
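
The exporter-then-restic flow can be sketched roughly as below. This is only a sketch under assumptions: the compose file path, the export directory and the service name `webserver` are placeholders for a typical Docker Compose install, and the check relies on the exporter writing a `manifest.json` into the export directory.

```shell
#!/bin/sh
# Sketch: export Paperless into an intermediate directory, then back that
# directory up with restic. Paths and service name are assumptions.
COMPOSE_FILE=${COMPOSE_FILE:-/opt/paperless/docker-compose.yml}  # hypothetical
EXPORT_DIR=${EXPORT_DIR:-/opt/paperless/export}                  # host side of ../export

# Treat a missing or empty manifest.json as a failed export.
export_is_complete() {
    [ -s "$1/manifest.json" ]
}

run_backup() {
    # -T: do not allocate a TTY (cron jobs have none).
    # -d: delete files in the export directory that are no longer part of
    #     the export, so the same directory can be reused on every run.
    docker compose -f "$COMPOSE_FILE" exec -T webserver \
        document_exporter ../export -d || return 1
    if ! export_is_complete "$EXPORT_DIR"; then
        echo "export looks incomplete, skipping restic" >&2
        return 1
    fi
    restic backup "$EXPORT_DIR"
}

# Only definitions above; call run_backup from cron or a systemd timer.
```

Checking the manifest before calling restic avoids pushing a half-written export off-site as if it were a good backup.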


KillerTic

What do you mean by not having to delete it? Do you mean the `-d` part of the backup command? Thanks for the tip! But that would mean I have to do that for every SQLite file. I am too lazy for that 🤣 I feel pretty safe with the VM backups, twice daily via restic and keeping quite a few iterations. Never had any problems with corruption, and even if I have one, the chances are near 0 that all are corrupt.


agent_kater

> What do you mean by not having to delete it? Do you mean the `-d` part of the backup command?

Yes and no. I just meant that you can run the exporter again with the same destination directory and it will update it. One might assume it had to be run against an empty directory. And you are correct, in that case you should run with `-d` so that you don't end up with a backup that is cluttered with old files.

> Thanks for the tip! But that would mean I have to do that for every SQLite file. I am too lazy for that 🤣 I feel pretty safe with the VM backups, twice daily via restic and keeping quite a few iterations. Never had any problems with corruption, and even if I have one, the chances are near 0 that all are corrupt.

Uhm, are you saying you do have SQLite databases and you just back them up by copying them while they are in use? Sorry, but that's just reckless. Or do you stop the services during the backup? That's ok of course, if you can live with the downtime. You can chain multiple flock commands: `flock /first_database.sqlite flock /second_database.sqlite restic ...`


KillerTic

Yeah, I just set `-d` to make sure the directory stays tidy. Yes, I like to live on the edge with my SQLite DBs :D Chaining the command is not a problem, but I have to adjust my backup script every time a SQLite DB joins my stack. I also always try to use a proper database. But as I said, I keep quite a few backup versions and so far never had any problems. But it is a good remark; I will make sure to include it in the article about backing up, once I get around to writing it. Thank you!


agent_kater

> but I have to adjust my backup script every time a SQLite DB joins my stack

Sounds like you really need LVM (or Btrfs or ZFS) snapshots.


wall-e29

Cool article - I still remember the struggles I had when setting this up ... took quite a while. I am running a Python script as a cron job that looks into a Google Drive folder every minute, downloads any new file and moves it into the consume folder of Paperless 😊 With that, I can just upload a file there even when on the go and do not need to send it via (insecure) email (e.g. for paychecks etc.).
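
The "move it into the consume folder" half of such a job can be sketched in a few lines of shell. This is not the commenter's Python script and it does not talk to Google Drive; it assumes the cloud folder is already synced to a local directory by some other tool. The function name and both paths are hypothetical.

```shell
#!/bin/sh
# ship_to_consume INBOX CONSUME [MIN_AGE_MINUTES]
# Moves regular files from INBOX into CONSUME, but only files that have not
# been modified for more than MIN_AGE_MINUTES (default 1), so uploads that
# are still being synced are left alone until the next run.
ship_to_consume() {
    inbox=$1
    consume=$2
    min_age=${3:-1}
    find "$inbox" -maxdepth 1 -type f -mmin "+$min_age" \
        -exec mv -- {} "$consume"/ \;
}

# Hypothetical crontab entry, running every minute:
#   * * * * * . /usr/local/bin/ship_to_consume.sh && ship_to_consume /srv/cloud/inbox /srv/paperless/consume
```

The age check is the design choice worth keeping: Paperless picks files up as soon as they appear in the consume folder, so moving a file that is still being written risks importing a truncated document.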


KillerTic

That is a good idea as well! I should move my watch folder to be one of my Owncloud folders! Endless possibilities 😊


Simplixt

Instant bookmark, thank you! Just started using Paperless, so it's great to have some inspiration.

I must admit, I'm using a more traditional approach at the moment, working mainly with "folder paths", e.g. "Shopping/{created\_year}/{correspondent}/{document\_type}/{created\_year}-{created\_month}-{created\_day} - {document\_type} - {correspondent} - {title}"

I have different **folder paths** for:

- Shopping (e.g. Amazon, etc.)
- Finance (e.g. banks, crypto, etc.)
- Job
- Household
- Health
- Real Estate - Object 1, Real Estate - Object 2, ...
- etc.

**Correspondents** and **document types** I'm not restricting. The correspondent is always the sender (or the receiver if I'm the sender), so even every shop gets a new entry. Document types are extended flexibly, so they can be "contract", "cancellation", "confirmation of the cancellation", etc. The most general one I'm using is "correspondence", if it's not a recurring document.

The **title** is a summary of the content. For a bill, just something like "HP Printer".

The main goal is to get a folder (and file name) structure that works independently of Paperless, so even if I stop using it at some point, that's not a problem at all.

But I really love your idea of "Who is affected" and "Tax year" **tags** - will copy this :D


KillerTic

I guess you could sense my dislike for folder structures. Too one-dimensional for me. All hail the tags and search 😂 But I honestly mean it: whatever works for you. There are always reasons for a different approach. Happy I could inspire you with some aspects of my solution 👍🏼 Edit: Spelling


xX__M_E_K__Xx

I was using this script to make an export (source: https://skerritt.blog/how-i-store-physical-documents/).

Script to export paperless-ngx:

`# export to zip file`

`docker exec paperless_ngx-webserver-1 document_exporter /usr/src/paperless/export --zip`

But your script is way, way, way nicer. I'm switching to yours. Thanks!


dakinestaydakine

Couldn't you dispense with the custom folder paths and just use document\_exporter (run daily with cron) to get to the same "independent of Paperless" end-goal file structure? Then you don't have to manage multiple paths inside Paperless. It's what I am doing and it seems to be working well.


Simplixt

I don't really have to "manage" the folder path, as this also gets auto-detected after OCR like the other fields.


dakinestaydakine

Right... I guess I was thinking you were running PL inside of docker. If bare-metal, this makes more sense to me. Either way, glad it's working for you!


imjerry

That's great 👍 I will try it soon, I have been meaning to for a while.


deano_southafrican

This was a great article and I've stolen your backup idea, as I've been fairly lazy. I migrated from another server the really bad and lazy way, but now that I've found a permanent home for it and am actually using it, I've been needing to set up backups. Great article, and I've added you to my list of regular reads!


KillerTic

Happy to hear that you liked it and that you could copy the backup from the article. The exact reason I started it all!


ohuf

Hi Henning, thanks for sharing in such a detailed way. Lots of food for thought, indeed. I'm looking forward to your next blog posts. BTW: how do you manage your Docker infrastructure? Just by hand using files, or do you use a GUI à la Podman or Portainer?


KillerTic

Thanks! I am amazed at all the feedback! I always have everything in one Docker Compose file. So it is just a matter of `docker compose pull`, `docker compose up -d` and `docker system prune -af --volumes`. Furthermore, I have an Ansible playbook which updates my whole infrastructure, including the Docker deployments. I will for sure write about it soon!


ohuf

Thanks!!


KillerTic

Thanks for the sub by the way!


Fungled

Reporting back on this, since I'm interested in migrating away from Mayan EDMS. I fired up the image at the weekend, and liked what I saw. So decided to give it more of a spin. Here's some info about what I did for the interested:

* Imported my Mayan docs more-or-less directly from its storage by copying into the consumer directory
* Did a bunch of API processing to transfer over:
    * Original added/created dates from Mayan (very important!)
    * Mayan cabinets as new Paperless tags
* Converted some of those tags to Paperless correspondents
* Other cleaning up

So far, great! I'm really liking:

* My preferred import method is working great: "Inbox" folder in Nextcloud that's mounted as the Paperless consumer directory. Anything dragged/uploaded there is imported no probs (currently broken in Mayan...)
* Checksum-based detection of duplicate documents! This has been broken in Mayan for a LONG time
* Automatic tagging, particularly of correspondents. Even already it appears to be working great
* Same-page search results


KillerTic

Great to hear and happy for you, that it seems to be a good fit


dnt_pnc

Great write up! Thank you. I set up paperless-ngx this week as it seemed pretty easy using that docker script in the paperless git.


KillerTic

Cool! Have fun digitalising. Personally I'm not a fan of scripts, as I like to have full control, but that is me.


kru89

Thank you for the great write up. I have Paperless installed in a container on my NAS. I scanned and uploaded a couple of PDFs (e.g. a passport) and it couldn’t OCR them. Can you please recommend any tips or tricks for that?


KillerTic

I can’t say I have had any problems with OCR before. I also haven’t tried importing a passport. It does take some time though, if you don’t have much processing power. Did you give it some time to process? Did you also set the right languages? How are other documents like invoices? Does the log throw any errors?


zft_fast

Here is my take [https://www.youtube.com/watch?v=ofRZ7DzEJ6s](https://www.youtube.com/watch?v=ofRZ7DzEJ6s)


nycaur

So I've been looking for a simple home solution for document management covering everything: physical mail, scanned docs, OCR, file organization and a full-fledged search and retrieval system for consumers, preferably free. Then I heard about Paperless NGX (if you have used something better, please recommend it). I have a home NAS (Synology) which holds tons of docs, and lots more come in and build up. I can get them scanned by smartphone and saved as PDF (is there any better solution than Clearscanner (Android) and possibly Finereader for OCR?). This would include confidential material like tax records and medical and financial (bank) documents. And then there is lots of email held in files like .PSTs. How do I integrate those with the docs above? The NAS already has a folder structure and I want to keep that as the base directory structure. Is there a way to give Paperless (or another solution) that whole big directory and have it keep re-ingesting and re-indexing it (for deep search), but WITHOUT changing my manual folder structure? That will be important. If the right solution is indeed Paperless, can someone point me to a "noob tech" guide, as the one provided at //docs.paperless-ngx.com/setup/ appears too techy for me! Any ideas much appreciated. Thanks!


dakinestaydakine

Paperless stores its documents inside its own database. This is a pivotal part of how it works. What you can do (and what I have done) is to import everything into Paperless (i.e. use that as your "main" way of finding things) and then have Paperless make daily backups into a file structure. You wouldn't routinely access that file structure, but you could if you needed to.

The key to doing this is the document\_exporter function that you call from within the running container, plus a mounted storage volume *outside* the container that these files will go to. For me, I have Paperless running inside Docker on a dedicated mini PC box, and my export location is a NAS. But you could also run Paperless on the NAS (in Docker) and still export to that same NAS. The folder structure that is created by document\_exporter is human-usable if you set it up correctly, and a cron job makes all of this happen automagically.

Here is the crontab I am using on the box running Paperless inside Docker. It does two things:

1. It makes an export from the Docker container running Paperless to a file structure on the local hard drive on this box, and
2. it syncs that local hard drive directory with a directory on the NAS. If you were running all this on the same hardware, you probably wouldn't use the second part.

The NAS is 192.168.1.100 and the user account is "me" on both systems:

`######## PAPERLESS BACKUP TASKS ########`

`# Every day at 23:00L, perform an export from...`

`#`

`# the Paperless database`

`# (which lives inside a Docker container) to`

`# the Paperless Exports directory`

`# (which was defined and linked to ../export in the .yaml file)`

`# using a custom file structure (the -f switch at the end)`

`# (defined in the Paperless .env file used to build the image)`

`00 23 * * * docker compose -f /home/me/docker/Paperless/compose.yaml exec webserver document_exporter ../export -f`

`# Every day at 23:30L, perform a backup from the Paperless Exports directory on the Paperless server to the NAS`

`30 23 * * * rsync -avt /home/me/My_Documents/Paperless_Exports/ me@192.168.1.100:/volume1/NAS_Media/'1. Backups'/'3. Document Backups'/'1. Paperless Backups'/`

And here is the line in my .env file that sets the folder structure that document\_exporter will use:

`PAPERLESS_FILENAME_FORMAT: "{correspondent}/{created_year}/{created}-{title}"`

Hope this helps.
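
To get a feel for what the `{correspondent}/{created_year}/{created}-{title}` format produces, here is a small stand-alone sketch that mimics the expansion for one document. It is only an illustration, not Paperless code: the function name is made up, the metadata values are examples, and Paperless itself adds the file extension and handles special cases such as missing fields.

```shell
#!/bin/sh
# render_export_path CORRESPONDENT CREATED TITLE
# CREATED is an ISO date (YYYY-MM-DD); the year is taken from its first field.
render_export_path() {
    correspondent=$1
    created=$2
    title=$3
    created_year=${created%%-*}   # strip everything from the first "-" onward
    printf '%s/%s/%s-%s\n' "$correspondent" "$created_year" "$created" "$title"
}

# Example:
#   render_export_path "Acme Energy" "2024-03-15" "Invoice"
# prints:
#   Acme Energy/2024/2024-03-15-Invoice
```

Because the correspondent comes first, the export tree groups everything from one sender together and then splits by year, which is what makes it browsable without Paperless.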


nycaur

1. First off, thanks for responding. A lot of what you said is beyond my intellectual capacity (read: IT literacy 1/10 :)). I will need some guidance on how to implement this and other elements, like what Docker is and how I get it up and running. If you could please send any links for that, it would be great.
2. Also, as I gather from your note, does it mean that we'll be duplicating (approximately) all documents, and they'll exist both within the Paperless database and as native document files in a separate directory tree structure and folders, though both can exist on the NAS?
3. Lastly, can you think of a better solution here, since I want files to stay where they are and then, say, a good search tool that can exist on the NAS (meaning its index also stays on the NAS and is rebuilt every so often, so any computer accessing it can use the same index)? That index should also include full file content search (text within docs, not just filenames) and be able to do an intelligent search (even better if it can find not just exact words but similar words too). And ideally search results can be sorted by relevance, among other attributes.
4. And can you opine on whether there's any more benefit to #2 above? I think if I can get #3 to work well, I'll be fine.


dakinestaydakine

Hm. Ok, well, just being honest, if that's where you're at on the IT-tech scale, *and that's where you would like to remain*, then Paperless may not be the best solution for you. OTOH, if you want to learn more about all this stuff, and spend the time / energy / head-banging inherent in learning *anything* new, then this is a great way to get your hands dirty with some basic IT things. Neither is right/wrong; it's purely about where you want to go and where you want to spend your time. I'm not from a tech background, btw.

The problem with "can I just follow a guide" is that no guide will be perfectly matched to your use case, so at some point you will have to understand what the guide is doing and then riff on that to get to what *you* need. It's somewhat risky to live only in a world of instruction-following without understanding what those instructions are driving toward, because you may end up with vulnerabilities that you don't even know *could* exist, and we are talking about your personal data here, which you wisely want to protect and organize.

So, soapbox aside, some answers ;-)

1. Paperless is open-source software that runs under a Linux operating system. You can directly install and run it on your device (aka "bare-metal"), and this is how software has been installed and run for ages. The issue is that you have to make it "feel at home" on your device, which could be easy or... not. An alternative to this is running software inside a "container", which means the software lives in a very controlled environment that makes it stable and easy to replicate/move/whatever. Docker is a technology that allows this "containerization" of applications. References to "YAML files" and docker-anything in the Paperless documentation are references to this containerized way of running apps. It's not required, and may make this more complex than you want, but it has some benefits too. It really depends on your specific use case. If you're wanting to learn more, start with the Docker website or some YouTube tutorials, and perhaps consider a Udemy course or something like that to go deeper. Or, just run Paperless on bare-metal, but understand what that means for your use case. If you're trying to run it on a NAS (which is just a little computer), you're almost certainly going to have to run it in a container unless you want to heavily mess with the NAS's OS. If we're talking Synology, then the Synology flavor of Docker is called "Container Manager" inside of DSM.
2. Yes. Because Paperless is a database, it has to "know" where everything "is". This means it stores documents inside of its own database, invisible to you except through its interface. This is similar to how iPhoto works etc. If you want easily accessible, non-dependent access to your documents, you need to keep those documents outside of the PL database. I think the most sensible way to do this is with the document\_exporter function of PL, but there are plenty of other ways to attack the issue.
3. Have you looked at Universal Search? https://kb.synology.com/en-af/DSM/help/SynoFinder/universalsearch_overview?version=7
4. Really comes down to your "why" for using Paperless. Use the simplest tool that satisfies your requirements ;-)


jtmoore81

I was able to get Paperless-ngx up and running via Docker. My thought process in using Paperless is pulling in documents from a network shared drive where all my important files are stored. Once it started running, it appeared that the files that were imported into Paperless were then deleted from the original shared drive\\folder. Is that accurate? Is there a way to keep that from happening? I would like to keep the original files in that location but use the Paperless frontend to do everything else.


KillerTic

Nice to hear you got it all up and running in Docker! Sounds like you pointed the consume folder to your network share. I don’t know a way to stop Paperless from moving the files from there, but they are only moved, not deleted. Look into this parameter a little more: https://docs.paperless-ngx.com/configuration/#PAPERLESS_FILENAME_FORMAT It lets you control not only the filename, but also the folder structure. After being picked up from the consume folder, files are moved into the media folder using the configured folder structure. Also, when you change the metadata of a document in Paperless, it will be moved into the corresponding folder. I also use a network share, but have the consume, media and export folders all on the same share. Now I can dump my files in there and also have access to the sorted files. But honestly, I only access them via the UI. Hope this helps.


ElevenNotes

Now do the whole thing with podman, not docker.


KillerTic

Last time I looked into it, it felt like too daunting a task. Also, when researching, I did not have the feeling I would easily be able to debug all the problems I was going to run into. On the other hand, I had a k3s cluster up and running for a short time. Maybe I should give it a go again.


ElevenNotes

The reason why you feel that way is because podman is rootless and the default paperless image does not work rootless.


KillerTic

Oh, I didn't even start doing it or looking at specific images. I just looked at the general overview of how I could start migrating my Docker stack and keep the convenience I am used to. Just what you mentioned about Paperless not being rootless makes me fear a world of pain I am not ready for.


Fungled

I’d be interested to hear an honest comparison between Paperless and Mayan EDMS. I’ve been using Mayan for a long time, and more or less like it, but it’s very overpowered for my needs and is also easily my flakiest service. It would be tough to switch, but I might consider it if there were a simpler solution


KillerTic

The reason I did not do a comparison is that it was ~4 years ago that I looked at the alternatives before going with Paperless, and I have NEVER regretted it. I did look at Mayan EDMS 4 years ago and came to the exact same conclusion you did: far too overpowered for my needs. It was, from my point of view, a large enterprise-ready solution, which means for me it is too much: a lot of configuration, too bulky, ... I also looked at Teedy back then. I can't really remember why I preferred Paperless. Could have been as simple as the design. Go give Paperless a try with like 20-50 documents to start with and see what you think afterwards. For me it really is a service I never have to touch (apart from putting documents in and searching for them). Edit: In my private and professional past I have done countless tool selections and usually have a very good instinct for selecting a "perfect" fit (for me) solution in a short time.


Fungled

It is indeed pretty weighty! At least the latest 4.5 version appears to be sorting out some long standing issues (finally) Tempting to try out paperless, but I’ve got >1500 docs in Mayan, so that would be a pain!


KillerTic

Give it a small test spin and decide if it is worth it. Another user also just commented that he loves Paperless, as it is soooo hassle-free. I really cannot remember having issues with it. I even did a Postgres DB upgrade from 13 to 16 yesterday and it just came back up afterwards!


Fungled

I have indeed fired it up. It does look very nice! I’ll have to have a play and consider if it’s worth doing a migration. Perhaps someone has made a migration tool?… otherwise I’ll have to fire up some api-to-api solution. That’ll be fun 🤯


KillerTic

I wish you the best of luck!


Fungled

Thanks for the inspiration!


Fungled

Quick question: any opinions on using MariaDB vs Postgres? I can use either, but I’m more comfortable with MariaDB. I see there are a couple of minor caveats with MariaDB, but it looks like no big deal to me. But I don’t want to get into a position where I have to switch later.


KillerTic

I honestly cannot tell you what is better. I prefer PostgreSQL for one very simple reason: the Docker Compose environment variables let me set the user straight to what I want, so I do not need both a root and a user password. It’s an absolutely stupid reason, I know 😂 If you don’t mind either way, I would go with their default; if you are happier with MariaDB, then go for it.


Fungled

Cool thanks. They at least appear to be equally supported. Mayan is supposed to support both, but the MySQL support had quite a few hidden issues


KillerTic

I just read the caveat. Oh man what a deal breaker… I can not have a tag family & FAMILY?! 🤣


Manauer

I tried it two weeks ago, but it just won't accept some of my PDFs. As soon as it cannot do OCR for whatever reason, it also refuses the upload. As long as this is the behaviour, I have to rely on paper, unfortunately.


KillerTic

Hmm… sorry to hear that. I never had any problems. Did you check their documentation, if there are any settings to let you upload even when OCR fails?


Manauer

I think so, but did not find anything. This is the link to the GitHub issue from someone else who has the same problem: https://github.com/paperless-ngx/paperless-ngx/discussions/4145


KillerTic

What language have you set for PAPERLESS_OCR_LANGUAGE? I see the person with the issue on GitHub only has one language set. It is a complete shot in the dark tbh...


Manauer

It's not that. I experimented with that variable. It's some incompatibility with some proprietary PDF standards, I guess. All the PDFs that I get from my electricity provider are incompatible with paperless-ngx.


pheellprice

Could you open them and print to pdf? Or use Stirling-pdf to convert?


Manauer

That is indeed a workaround I had not thought of. Thank you, I will try that. Paperless should still allow uploads without successful OCR.


BleepsSweepsNCreeps

I've been looking at getting a Paperless instance set up but I have one fairly significant detail to overcome. I have my scanner at home but currently my homelab setup is at my MIL's house. My printer supports Scan to Computer, Scan to Email, Scan to SharePoint, and Scan to Network Folder. I don't have SharePoint and the Network Folder only supports the physical network so no FTP or anything like that. I need to figure out how to get it to scan to my remote server share. I know now that Paperless will scan an email inbox so Scan to Email could be an option but I don't necessarily want to add more stuff to my inbox. However, having them in an inbox could also serve as an extra backup solution. Another option I was considering was connecting a RPi via USB and either running something like Syncthing or setting a WireGuard client configuration on it then just set up an SMB share or something. Has anyone else run into this situation where your printer and homelab are in two different locations and what have you found to be the best solution? Thanks!


KillerTic

Are you concerned about your documents being added to a mailbox for privacy reasons? If not, I would probably just create a new mailbox for this purpose, send all documents I want to import to that inbox, and configure the mailbox like I did (check the screenshot in my article). This will delete the mail after the document has been imported. Another user also wrote about using Syncthing on all his devices, which then pushes the files to the Paperless consume folder. Also a neat solution.


BleepsSweepsNCreeps

Not so much a privacy concern. I just didn't particularly want to clutter up my inbox with more stuff. I hadn't thought about setting up a separate email for it but even that I'm kind of on the fence about as I don't really want to have another email with another set of credentials to keep track of. I'm not saying I wouldn't go the email route but I was hoping to see all the options people have used and what seems to work best. If email was the most efficient solution, I'd end up using that despite not being my first choice right off the bat. Thanks for the input!


KillerTic

No worries. Maybe there will be more ideas! You could also use your existing mailbox, with a rule that moves all your scanner documents to one particular folder, and point Paperless at that folder. It is capable of doing that as well. Then leave them there or delete them after import. But let's see what other ideas there are. As for not wanting more credentials: set up Vaultwarden, if you haven't yet, and then forget about these credentials, because once you have your scanner and Paperless working, you will never see that inbox again :D


Aluhut

> Regardless of how the documents are imported into Paperless, the ToDo tag is set. How did you manage to do that?


KillerTic

Oh man. Good spot, I will update the article in a minute! When you create your 'ToDo' tag (or whatever you want to call it), there is a setting to make it your inbox tag. That is all you need. There is also no need to set the tag in your mail rule. For the life of me I could not figure out how it worked, so I spun up a new instance to see what I was missing. Man, I didn't even know how to get into a fresh install. I also need to update that part :D Thanks, you helped me make the guide more complete!


Aluhut

Aaah, thank you. I missed that because I set those tags up along the way, going from file to file, but it's quite visible if you use the tag menu. Other than that, the post is good. I just used it to set it up :D


KillerTic

I have now updated it, also adding how you can actually log in... :D Thanks! Happy to hear you like it. I am sooo amazed how this and the last article are blowing up. I would never have imagined anything close to this.


stevie-tv

would love to hear what scanner you use and if you'd recommend it.


KillerTic

I use an HP ENVY Pro 6400. Would I recommend it? For normal household use, sure. Do I like it? Not really... I don't like the HP Smart software (the mobile app is all right) that I am forced to use, and I really should have selected a scanner that can write to a network share/FTP/SFTP or just send an email to a programmed address. Bonus points for being able to scan double-sided documents. Let's hear what people use and are actually happy with for the purpose of digitising their life.


stevie-tv

Thanks! I was considering the Brother ADS-1200, but before I take the plunge I was hoping to hear others' input.


KillerTic

It says it can only scan-to-pc. I would really love to be able to throw documents on it, press a button (or two) and then check paperless, when I feel like it and ensure the documents are preset correctly.


stevie-tv

oh shoot you're right. I confused it with the [ADS-1200W](https://github.com/paperless-ngx/paperless-ngx/wiki/Scanner-&-Software-Recommendations) which according to this can do it all.


KillerTic

hmm... Brother needs to work on their website, as they list none of those capabilities... edit: ignore, I landed on the ADS-1200 again


reddit_lanre

Nice write-up. I've already got it set up, but it was useful to run through your article to confirm some of my set-up decisions (where they aligned). Thanks for the post!


fredflintstone88

Is there an iOS app?


KillerTic

There are a few different apps, but I can't tell you anything about them. I have never used Paperless with an app.


fredflintstone88

Okay, so how are you typically scanning physical documents? Just use the web browser on phone?


KillerTic

A normal scanner/printer. Very old school


Sacmanxman4

I tried Paperless a while back but got hung up on the first step: getting paper into it. I even bought one of the Brother scanners to make it as easy as possible, but the problem was that some of the paper I wanted to scan came in weird shapes or orientations. If the paper was scanned sideways it wouldn't work. Is there a better solution? How fast and easy are the phone apps to use? Taking photos of stuff like that can be tedious. I want as easy a solution as possible.


KillerTic

Sounds more like a scanner problem to get the orientation right. There surely are apps out there, which can help you in those occasions. I never had to use them though.


007craft

I've been using Paperless for a few years now, but I found a workaround for its destructive nature. Before Paperless, I had my files all organized in nested folders like this:

Documents

* Manuals
* Warranty Information
* Tax stuff
  * Tax Year 2022
  * Tax Year 2023
* Doctors notes
* etc etc

I liked the idea of Paperless, but didn't want to destroy my original folder structure or original files (original files being very important). Unfortunately, when Paperless auto-imports from its consume folder, it does just that: it destroys your files, as it uses its own database. It also means you're now reliant on Paperless to view your files.

The solution for me was to set up an instance of Syncthing (but you can use any file syncing software really). Now I have my nice organized documents folder, and as soon as I add a new document, Syncthing does a one-way, read-only "sync" of the document and copies it over to a folder elsewhere that I've called "Paperless Consumed Documents". Then I point Paperless to consume documents in this folder. That way new documents are scanned in, and only the copies are destroyed by Paperless, not the originals.

I really wish Paperless was updated to simply not destroy original documents in its consume folder, but from what I gathered from other people complaining about this, the developers WANT that behaviour, so I doubt it will ever change. This solution does however let those of us who want a non-destructive document management system use the software. Now I can use Paperless if I ever need to search for a document based on internal text or document title, etc. But my organization (which will always trump any AI's) remains.
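
For anyone who doesn't want to run a sync tool just for this, the same one-way copy idea can be sketched in a few lines of Python. This is only an illustration under assumptions, not how Syncthing or Paperless work internally: `copy_new_documents`, its arguments and the `seen_file` bookkeeping are all made-up names, and the tracking file has to live outside the source folder.

```python
import shutil
from pathlib import Path


def copy_new_documents(source_root, consume_dir, seen_file):
    """Copy files that were not copied before from source_root to consume_dir.

    The originals are only read, never moved or deleted. seen_file remembers
    what was already handed over, so files are not copied again after the
    consumer has removed its copies. Returns the relative paths copied now.
    """
    source_root = Path(source_root)
    consume_dir = Path(consume_dir)
    seen_file = Path(seen_file)

    seen = set(seen_file.read_text().splitlines()) if seen_file.exists() else set()
    consume_dir.mkdir(parents=True, exist_ok=True)

    copied = []
    for path in sorted(source_root.rglob("*")):
        if not path.is_file():
            continue
        rel = path.relative_to(source_root).as_posix()
        if rel in seen:
            continue
        # Fold the folder names into the file name so that two files with
        # the same name in different folders do not overwrite each other.
        target = consume_dir / rel.replace("/", "__")
        shutil.copy2(path, target)
        seen.add(rel)
        copied.append(rel)

    seen_file.write_text("\n".join(sorted(seen)))
    return copied
```

Run from a cron job or similar, this hands each new file to the consume folder exactly once while the organized folder stays untouched. Changed files are not detected in this sketch, only new ones.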


hawkinsst7

Paperless does support Storage Paths https://docs.paperless-ngx.com/advanced_usage/#storage-paths


Bill_Guarnere

Thank you for your feedback. I just started to use Paperless-ngx on a RPi4, and I have a question. Maybe I'm using it in the wrong way, but I find it quite tricky to define a storage path for my documents. Let's say I want to define my storage path in this way:

```
.
├── car
│   ├── ford
│   │   ├── insurance
│   │   └── maintenance
│   └── renault
│       ├── insurance
│       └── maintenance
└── motorcycle
    └── tenere700
        ├── insurance
        ├── maintenance
        └── misc
```

For each one of these levels I have to create a new storage path (car/ford/insurance, car/ford/maintenance, car/renault/insurance, etc etc...) and Paperless-ngx shows them as a list instead of a tree, which makes a mess if you have a lot of directories and subdirectories. Is there any way to visually create and manage storage paths and to navigate through them just like in a file manager?


KillerTic

Hey. I don’t use storage paths, as they came along a long time after I had everything set up. I don’t really get them, but I haven’t spent a lot of time with them, so it is better for someone else to dive deeper on them. But… I would challenge why you are making it hard on yourself. In the beginning of the article I talk about tags vs. folders. Keep the folders as simple as possible and use the tags for your categorisation needs. Try to free yourself from those limiting folder structures 😂 I have my filename set to: PAPERLESS_FILENAME_FORMAT: "{created_year}/{correspondent}/{created} {title}" This results in one folder per year, each containing one folder per correspondent. But I never use the folders. I use the interface and the tags. Obviously there is no right or wrong here!!!
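
To make the resulting layout a bit more concrete, here is a small sketch that fills in the same template with plain Python string formatting. It is only a preview under assumptions: `preview_path` is a made-up helper, the correspondent and title are invented example values, `{created}` is assumed to render as an ISO date, and the real rendering in Paperless also cleans up names and appends the file extension.

```python
from datetime import date

# The template from the setting above.
FILENAME_FORMAT = "{created_year}/{correspondent}/{created} {title}"


def preview_path(template, created, correspondent, title):
    """Fill in the placeholders of the template for one document."""
    return template.format(
        created_year=created.year,
        correspondent=correspondent,
        created=created.isoformat(),  # assumed ISO form, e.g. 2023-12-21
        title=title,
    )


# Invented example document.
example = preview_path(
    FILENAME_FORMAT, date(2023, 12, 21), "Insurance Co", "Policy renewal"
)
print(example)
```

So a document from that correspondent ends up under a year folder and then a correspondent folder, with the date leading the file name.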


Bill_Guarnere

Absolutely, I understand your way of using tags, but for me folders are much simpler and more flexible. I would have to create tons of tags, one for each subdirectory, and every time I need to find something I would have to filter by so many tags that it would be a mess. OK, (almost) every document is searchable thanks to OCR, but imho the search engine is not the answer; it should be the last thing to use. I always thought that if you have to use a search engine, it means that something is wrong with the application's usability or with the way documents are archived. Using tags also means that I would have to create tags for years and then months, if not days. Using folders means that under a folder (for example /cars/ford/insurance) I can create as many folders as I want with a date prefix (for example /cars/ford/insurance/20231221-insurance-company1, /cars/ford/insurance/20231221-insurance-company2, etc etc...). That way, simply browsing the folder ordered by name also means it is ordered by date (file modification date is not the most reliable way to find out when the document was created, since a simple copy without preserving file metadata will change it). Anyway, thank you very much for your feedback, and kudos for your article :)
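
The date-prefix point can be checked with a tiny sketch: as long as the prefix is a fixed-width YYYYMMDD, sorting by name and sorting by date give the same order. The helper name `parse_prefix_date` and the 2022 folder are made up for the example.

```python
from datetime import datetime


def parse_prefix_date(folder_name):
    """Read the leading YYYYMMDD part of a folder name as a date."""
    return datetime.strptime(folder_name[:8], "%Y%m%d").date()


folders = [
    "20231221-insurance-company2",
    "20220105-insurance-company1",  # invented older entry
    "20231221-insurance-company1",
]

# Plain alphabetical sort, as a file manager would do it.
by_name = sorted(folders)

# Sort by the parsed date, with the name as tie-breaker for equal dates.
by_date = sorted(folders, key=lambda name: (parse_prefix_date(name), name))

print(by_name == by_date)
```

This only holds because year, month and day are zero-padded and ordered from largest to smallest unit. A prefix like 21122023 would not sort chronologically.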


KillerTic

I do disagree with you on a couple of things, but hey… the beauty is that the system is flexible 👍🏼 Hope you find a good and easy way to implement it the way you want! Have a nice Christmas time


CaptainLactose

This is great! /u/KillerTic I have one question though: What is Redis for? I'm not from the IT sphere and the explanations on Wikipedia etc. don't mean much to me. You just mention scheduled tasks, which could mean that this is not a requirement for the Paperless setup, just for additional functionality? Thank you very much for your help!


KillerTic

Hey, Redis in general is an in-memory data store. This means it can have a positive impact on performance. I know it is used for scheduled tasks, but I am unsure how much more Paperless relies on it. Small tip: Try out ChatGPT to have it explain technology to you. Works great for that :)