I've had a little pet project I've been mostly thinking about, and lately actually working on. I'm a little obsessed with backups. I frequently back up all my stuff onto an external hard drive, but my inherent paranoia has pushed me toward off-site backups. I could always just put stuff on an external drive and have a friend keep it for me, but where's the fun in that? So I decided to write a program in C that lets me do unattended backups for storage in AWS. My program also supports resuming a previously aborted process, which is useful if you are backing up hundreds of GB. You might not be able to leave your computer on long enough for everything to finish, so a resume operation can pick up where you left off.

In short, this program:

- creates compressed archives
- encrypts each archive with gpg, either with a key or with a passphrase
- uploads the result to AWS

I picked AWS since they are the absolute cheapest if you want to store large amounts of data. When using the deep_archive storage class, the price is $0.00099 per GB per month. Retrieving from deep_archive is slow, it might take up to 12 hours, and data retrieval costs more, but in theory you should not need to do that. If you encrypt with a key, you obviously need to store your private key somewhere else that you can retrieve more easily. At the moment, this program only has very basic options available, but I expect to add a bit more to it over time.

https://gitlab.com/Daerandin/cebac
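In case anyone is curious how the resume can work in principle: the rough idea is to keep a small state file recording what has already been finished, and skip those entries on the next run. Here is a minimal sketch of that idea in C; the state file name and format are made up for illustration, and the real program has to track more than this:

    #include <stdio.h>
    #include <string.h>

    /* Check whether "name" is already recorded in the state file.
     * Returns 1 if found (already done), 0 otherwise. */
    static int already_done(const char *statefile, const char *name)
    {
        char line[4096];
        FILE *f = fopen(statefile, "r");
        if (!f)
            return 0; /* no state file yet: nothing is done */
        while (fgets(line, sizeof line, f)) {
            line[strcspn(line, "\n")] = '\0';
            if (strcmp(line, name) == 0) {
                fclose(f);
                return 1;
            }
        }
        fclose(f);
        return 0;
    }

    /* Append "name" to the state file once it has been processed. */
    static void mark_done(const char *statefile, const char *name)
    {
        FILE *f = fopen(statefile, "a");
        if (f) {
            fprintf(f, "%s\n", name);
            fclose(f);
        }
    }

    int main(void)
    {
        const char *state = "backup.state"; /* made-up state file name */
        const char *item = "photos.tar.zst";

        if (!already_done(state, item)) {
            /* ... create, encrypt and upload the archive here ... */
            mark_done(state, item);
        }
        return 0;
    }

Appending one line per finished item means an abort at any point loses at most the item that was in progress.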
Wow, this is really cool! Thank you for making your backup code available to the public. I wonder if it would work with Google? I have Google Drive, but only 15GB total. I think it's around $100.00 a year for 2TB... which isn't that bad.
There is already a Python package for interfacing with Google Drive, so it would be possible to implement that without too much trouble, if you'd actually be interested in that feature. The Google Drive price is comparable to AWS Glacier storage, if you were to actually have 2TB stored. AWS becomes a bit more expensive when you start downloading, which is the downside. However, I use Deep Archive, so if I were to put 2TB in storage there, it would cost me a bit less than 24 USD per year. That's the reason I picked AWS. I expect to store roughly 700 GB, although it will be compressed so maybe only around 650 GB, so I can expect to pay around 8 USD per year as long as I don't need to download anything. The download prices are a bit complicated, as you pay for the actual request as well as per GB downloaded. But that suits me, since I use it as my disaster backup: in case I lose both my computer and my physical backup at home, I can still get my stuff from AWS. I encrypt with a key, so to keep the key safe, I have it password encrypted and stored on Google Drive. I already have plans for a few more features (more formats, compression level selection), so if Google Drive support is something people find interesting, I'll add it to my list of planned features.
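For reference, the math behind those numbers, using the Deep Archive price of 0.00099 USD per GB per month:

    2000 GB x 0.00099 USD/GB/month x 12 months = 23.76 USD/year
     650 GB x 0.00099 USD/GB/month x 12 months ~  7.72 USD/year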
AWS storage does seem like a nice option. I was only thinking Google since I already have a Google account and Drive... I do have an off-site backup I took to work in my lunch bag every day, but now I work from home, so I'm not exactly sure what to do with it. Cloud backup sounds like a viable option, but knowing you are a security guru, what do you think about your data being hosted in a data center in China somewhere? Is it really safe? What if AWS gets hacked like so many other large companies and governments? Your data is encrypted too, but doesn't that make it take even longer to encrypt and upload to the cloud?
Since this is cloud storage, essentially putting my private files on another computer somewhere else in the world where I have no idea who has access to it, I make sure to use strong encryption. My program uses the gpgme library, which uses GPG directly. GPG is well known for high quality encryption, and it is the provider that I personally trust. The two encryption options are pubkey encryption, which is the safest method as long as you keep your private key safe, and passphrase encryption using AES256, which is practically uncrackable as long as you use a strong passphrase. So even if someone else gains access to my backup archives, I feel confident that they can't access the data. (There is a small gpgme sketch at the end of this post if you're curious what that looks like in code.)

I guess if you want to be really paranoid, you could always assume someone has the access needed to replace your archives. Hashing the files before uploading, and storing the hashes somewhere safe, is one possible mitigation there (also sketched below). But I consider this low-risk and it's not something I intend to implement.

As for time: encrypting takes time, yes. Which is why my program is designed to run unattended. You just start the process and leave it to do its thing. If at any time you need to abort, you can simply resume the next time you start the program.

Currently my program only supports a few options:

tar archive compression: none (just to have the option), xz, zstd
- xz gives better compression, but is super slow. I intend to make it possible to compress with more threads later.
- zstd is almost as good as xz, but faster by a huge amount, so I use it for now.

encryption: none (in case you don't want/need encryption), gpg with key, gpg with passphrase
- a key is safer, but a passphrase is more convenient for someone who is unsure about managing gpg keys.

AWS storage class: S3, Glacier, Deep Archive
- S3 is more convenient, but costs more.
- Glacier is less convenient, but costs less; you might need to wait up to 12 hours before a download is ready.
- Deep Archive is the cheapest, but you might need to wait even longer before a download is available.

If you just want to give it a little test, you can run the program without initiating any form of upload. This will simply create archives on your hard drive. You can also run the program without doing anything at all, just printing out the configuration options, to verify that you configured it correctly. Creating archives and uploading must be specified with the -a and -u flags, respectively. Documentation might be a little lacking (no man page), but the example config file is well documented. If you are curious to try it and need some help to compile it, just let me know.
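Here is the gpgme sketch I mentioned: a stripped-down example of encrypting a file to a public key. This is illustration only, not the actual cebac source; error handling is minimal, and the file names and key fingerprint are made up:

    #include <stdio.h>
    #include <stdlib.h>
    #include <gpgme.h>

    int main(void)
    {
        gpgme_ctx_t ctx;
        gpgme_data_t plain, cipher;
        gpgme_key_t key;
        gpgme_error_t err;

        /* gpgme requires a version check before any other call */
        gpgme_check_version(NULL);

        err = gpgme_new(&ctx);
        if (err) { fprintf(stderr, "%s\n", gpgme_strerror(err)); return 1; }

        /* look up the recipient key by fingerprint (made-up value here) */
        err = gpgme_get_key(ctx, "0xDEADBEEFDEADBEEF", &key, 0);
        if (err) { fprintf(stderr, "%s\n", gpgme_strerror(err)); return 1; }
        gpgme_key_t keys[] = { key, NULL };

        /* wrap the input file and an output buffer as gpgme data objects */
        err = gpgme_data_new_from_file(&plain, "backup.tar.zst", 1);
        if (err) { fprintf(stderr, "%s\n", gpgme_strerror(err)); return 1; }
        gpgme_data_new(&cipher);

        /* encrypt to the recipient; passing NULL instead of keys gives
         * symmetric (passphrase) encryption instead */
        err = gpgme_op_encrypt(ctx, keys, GPGME_ENCRYPT_ALWAYS_TRUST,
                               plain, cipher);
        if (err) { fprintf(stderr, "%s\n", gpgme_strerror(err)); return 1; }

        /* write the ciphertext out to a file */
        FILE *out = fopen("backup.tar.zst.gpg", "wb");
        char buf[4096];
        ssize_t n;
        gpgme_data_seek(cipher, 0, SEEK_SET);
        while ((n = gpgme_data_read(cipher, buf, sizeof buf)) > 0)
            fwrite(buf, 1, (size_t)n, out);
        fclose(out);

        gpgme_data_release(plain);
        gpgme_data_release(cipher);
        gpgme_key_unref(key);
        gpgme_release(ctx);
        return 0;
    }

Compile with something like: cc sketch.c $(gpgme-config --cflags --libs)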
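And for the paranoid mitigation I mentioned but don't plan to implement: hashing an archive before upload could look something like this, assuming OpenSSL's libcrypto for SHA-256 (again just a sketch, not part of cebac):

    #include <stdio.h>
    #include <openssl/evp.h>

    /* Compute the SHA-256 of a file and print it as hex.
     * You would store this output somewhere you trust, and compare it
     * against the archive you later download. */
    static int hash_file(const char *path)
    {
        unsigned char md[EVP_MAX_MD_SIZE];
        unsigned int mdlen = 0;
        unsigned char buf[4096];
        size_t n;

        FILE *f = fopen(path, "rb");
        if (!f)
            return -1;

        EVP_MD_CTX *ctx = EVP_MD_CTX_new();
        EVP_DigestInit_ex(ctx, EVP_sha256(), NULL);
        while ((n = fread(buf, 1, sizeof buf, f)) > 0)
            EVP_DigestUpdate(ctx, buf, n);
        EVP_DigestFinal_ex(ctx, md, &mdlen);
        EVP_MD_CTX_free(ctx);
        fclose(f);

        for (unsigned int i = 0; i < mdlen; i++)
            printf("%02x", md[i]);
        printf("  %s\n", path);
        return 0;
    }

    int main(int argc, char **argv)
    {
        return argc > 1 ? hash_file(argv[1]) : 1;
    }

Compile with: cc hash.c -lcrypto. In practice you could of course just run sha256sum from coreutils over the archives and keep the output file somewhere safe.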
You should add this information to the first post. Good stuff! Even working in IT, I don't mess with encryption enough. Always fixing problems and putting out fires.