How to Sync Files and Directories to AWS S3 using s3cmd Tool

Today we will show how to back up your data to Amazon Web Services. We will use s3cmd, a command-line client for Amazon S3 storage. It lets you create, manage, and delete buckets from your terminal and upload data from your server.

Install s3cmd on Ubuntu and CentOS

To install s3cmd on CentOS, we first need to add the EPEL repository.

yum install epel-release

Then install s3cmd

yum install s3cmd

On Ubuntu, it is in the official repository, so we just need to run

apt-get install s3cmd

That will get you s3cmd installed on your computer.
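
To confirm that the install worked on either distribution, a quick check along these lines can help. This is only a minimal sketch; the function name check_s3cmd is our own and is not part of s3cmd.

```shell
#!/bin/sh
# check_s3cmd: hypothetical helper (not part of s3cmd) that reports
# whether an s3cmd command can be found by the shell.
check_s3cmd() {
    if command -v s3cmd >/dev/null 2>&1; then
        echo "s3cmd found: $(command -v s3cmd)"
        return 0
    else
        echo "s3cmd not found; install it with yum or apt-get first" >&2
        return 1
    fi
}

# Usage (uncomment to run):
# check_s3cmd && s3cmd --version
```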

Configuring s3cmd

Now that we have s3cmd installed, we need to connect it to the AWS account. We assume you are familiar with AWS pricing and have a working account, so we won't go through how to create one. Going straight to the configuration, type the following as the root user:

s3cmd --configure

Then work through the prompts as follows, replacing the Access Key and Secret Key values with your own credentials:

Enter new values or accept defaults in brackets with Enter.
Refer to user manual for detailed description of all options.

Access key and Secret key are your identifiers for Amazon S3. Leave them empty for using the env variables.
Access Key: ACESSSSSSSSSSSSSKEEEEEEY
Secret Key: 8ujSecret/82xqHWZqT5UzT0OCzUVvKeyyy
Default Region [US]:

Encryption password is used to protect your files from reading
by unauthorized persons while in transfer to S3
Encryption password: password
Path to GPG program [/usr/bin/gpg]:

When using secure HTTPS protocol all communication with Amazon S3
servers is protected from 3rd party eavesdropping. This method is
slower than plain HTTP, and can only be proxied with Python 2.7 or newer
Use HTTPS protocol [Yes]:

On some networks all internet access must go through an HTTP proxy.
Try setting it here if you can't connect to S3 directly
HTTP Proxy server name:

New settings:
Access Key: ACESSSSSSSSSSSSSKEEEEEEY
Secret Key: 8ujSecret/82xqHWZqT5UzT0OCzUVvKeyyy
Default Region: US
Encryption password: password
Path to GPG program: /usr/bin/gpg
Use HTTPS protocol: True
HTTP Proxy server name:
HTTP Proxy server port: 0

Test access with supplied credentials? [Y/n] y
Please wait, attempting to list all buckets...
Success. Your access key and secret key worked fine :-)

Now verifying that encryption works...
Success. Encryption and decryption worked fine :-)

Save settings? [y/N] y
Configuration saved to '/root/.s3cfg'
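
Before scripting any of the commands that follow, it can be useful to check that the saved configuration is actually in place. The sketch below assumes the config file uses the usual access_key entry; the helper name config_ready is hypothetical.

```shell
#!/bin/sh
# config_ready: hypothetical helper. Succeeds only when the given s3cmd
# config file exists and contains an access_key entry.
# Defaults to ~/.s3cfg when no path is given.
config_ready() {
    cfg="${1:-$HOME/.s3cfg}"
    [ -f "$cfg" ] && grep -q '^access_key' "$cfg"
}

# Usage (uncomment to run):
# config_ready /root/.s3cfg && s3cmd ls
```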

Now that we are connected to AWS, we can start using it. The sections below walk through the most relevant commands.

1. Creating a bucket

To create a bucket in our account, we use the s3cmd mb command followed by the URI of the new bucket.

s3cmd mb s3://linoxide
Bucket 's3://linoxide/' created

Now that we have created it, let's list our buckets to see it.

2. Listing buckets

To list all buckets that are currently available, type this command:

s3cmd ls
2016-12-03 15:52 s3://linoxide

Since it is a fresh account, we only have the one that we created a moment ago.

3. Putting a directory into the bucket

Uploading files and folders is done with the put command. Let's create a folder on the local computer and one file in it.

mkdir test

echo 12345 >> test/file1

Now, to put this folder into our S3 bucket, we use the put command:

s3cmd put -r test s3://linoxide
upload: 'test/file1' -> 's3://linoxide/test/file1' [1 of 1] 6 of 6 100% in 0s 60.62 B/s done
upload: 'test/file1' -> 's3://linoxide/test/file1' [1 of 1] 6 of 6 100% in 0s 41.88 B/s done

Notice the -r in the command. It means recursive, which is needed when uploading folders. For single files it can be omitted.

4. Uploading files

As mentioned above, uploading a file works the same way as uploading a directory, except that you omit -r:

s3cmd put test/file1 s3://linoxide
upload: 'test/file1' -> 's3://linoxide/file1' [1 of 1] 6 of 6 100% in 0s 44.00 B/s done
upload: 'test/file1' -> 's3://linoxide/file1' [1 of 1] 6 of 6 100% in 0s 17.77 B/s done
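
Since the only difference between the two uploads is the -r option, a small wrapper can choose it for you. This is a sketch rather than an s3cmd feature; the function name s3_put is our own.

```shell
#!/bin/sh
# s3_put: hypothetical wrapper around "s3cmd put" that adds -r only
# when the local source is a directory.
s3_put() {
    src="$1"
    dest="$2"
    if [ -d "$src" ]; then
        # Directories need the recursive option
        s3cmd put -r "$src" "$dest"
    else
        # Single files are uploaded without -r
        s3cmd put "$src" "$dest"
    fi
}

# Usage (uncomment to run):
# s3_put test s3://linoxide
# s3_put test/file1 s3://linoxide
```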

5. Listing the contents of the bucket

Since we put some data into the bucket, we want to see what is inside. Earlier we used ls to list all our buckets; now we run the same command with the bucket URI to get the contents of the bucket:

s3cmd ls s3://linoxide
DIR s3://linoxide/test/
2016-12-03 17:21 6 s3://linoxide/file1

We have the directory test and, alongside it, file1.

6. Downloading files and folders

Downloading is done with the get command and, as with put, you need to use the -r option for folders.

s3cmd get -r s3://linoxide/
ERROR: Parameter problem: File ./file1 already exists. Use either of --force / --continue / --skip-existing or give it a new name.

The trouble here is that we already have those files locally, which is expected, since we just uploaded them from here. Let's clear them out.

rm -rf file1 test/

After clearing out we can download

s3cmd get -r s3://linoxide/
download: 's3://linoxide/file1' -> './file1' [1 of 2] 6 of 6 100% in 0s 97.99 B/s done
download: 's3://linoxide/test/file1' -> './test/file1' [2 of 2] 6 of 6 100% in 0s 112.95 B/s done

We can also download individual directories or files instead of the entire bucket, as we did here.
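
Instead of deleting local files first, you can download into a separate directory and use the --skip-existing option named in the error message above. The wrapper below is a minimal sketch; the name s3_get_all and the idea of a dedicated target directory are our own assumptions.

```shell
#!/bin/sh
# s3_get_all: hypothetical wrapper that downloads a bucket or prefix
# recursively into a target directory, skipping files that already exist.
s3_get_all() {
    uri="$1"
    target="$2"
    # Create the target directory if it is missing
    mkdir -p "$target" || return 1
    s3cmd get -r --skip-existing "$uri" "$target/"
}

# Usage (uncomment to run):
# s3_get_all s3://linoxide/ restore
```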

7. Deleting files and folders

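To delete a single file, pass its full URI to the del command; no -r option is needed. S3 has no real folders, only key prefixes, so deleting everything under test/ requires the -r option:

s3cmd del -r s3://linoxide/test/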

To purge all data from the bucket, you need -r and -f (as in force) options

s3cmd del -f -r s3://linoxide/

8. Syncing entire directories

s3cmd supports syncing directories. For example, if we have five files inside the directory test (after running touch test/file{1..5}), we can try to sync that directory.

s3cmd sync --dry-run test/ s3://linoxide
upload: 'test/file1' -> 's3://linoxide/file1'
upload: 'test/file2' -> 's3://linoxide/file2'
upload: 'test/file3' -> 's3://linoxide/file3'
upload: 'test/file4' -> 's3://linoxide/file4'
upload: 'test/file5' -> 's3://linoxide/file5'
WARNING: Exiting now because of --dry-run

This was run with the --dry-run option, which means it only lists the files to sync and does not actually transfer them. Removing the trailing slash from test/ changes the destination: the files now go into a test folder in the bucket.

s3cmd sync --dry-run test s3://linoxide
upload: 'test/file1' -> 's3://linoxide/test/file1'
upload: 'test/file2' -> 's3://linoxide/test/file2'
upload: 'test/file3' -> 's3://linoxide/test/file3'
upload: 'test/file4' -> 's3://linoxide/test/file4'
upload: 'test/file5' -> 's3://linoxide/test/file5'

Removing --dry-run uploads the files.
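
For regular backups it can be safer to make the preview the default and require an explicit word to do the real transfer. The following is a sketch under that assumption; the function name sync_dir and the "apply" keyword are hypothetical.

```shell
#!/bin/sh
# sync_dir: hypothetical wrapper around "s3cmd sync".
# It previews with --dry-run unless the third argument is "apply".
sync_dir() {
    src="$1"
    dest="$2"
    mode="${3:-preview}"
    if [ "$mode" = "apply" ]; then
        # Real transfer
        s3cmd sync "$src" "$dest"
    else
        # Only list what would be uploaded
        s3cmd sync --dry-run "$src" "$dest"
    fi
}

# Usage (uncomment to run):
# sync_dir test/ s3://linoxide          # preview only
# sync_dir test/ s3://linoxide apply    # upload
```

The source path is passed through unchanged, so the trailing-slash behavior is the same as when calling s3cmd sync directly.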

9. Syncing with exclusion lists

Let's add some files with a .txt ending:

touch test/file{6..8}.txt

They should be there along with other files

ls test/
file1 file2 file3 file4 file5 file6.txt file7.txt file8.txt

Now we want to sync this directory without the files that end in .txt. We do this with the following command:

s3cmd sync --dry-run test/ --exclude '*.txt' s3://linoxide

Running the same command without --dry-run uploads everything except the excluded .txt files.
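
The --exclude option can be given more than once, so several patterns can be combined in one preview. The helper below is a minimal sketch; the name sync_excluding is our own, and '*.log' in the usage line is just an example pattern.

```shell
#!/bin/sh
# sync_excluding: hypothetical helper that previews a sync while
# excluding every glob pattern passed after the source and destination.
sync_excluding() {
    src="$1"
    dest="$2"
    shift 2
    n=$#
    # Append "--exclude PATTERN" for each pattern, then drop the originals
    for pat in "$@"; do
        set -- "$@" --exclude "$pat"
    done
    shift "$n"
    s3cmd sync --dry-run "$@" "$src" "$dest"
}

# Usage (uncomment to run); quote the patterns so the shell does not expand them:
# sync_excluding test/ s3://linoxide '*.txt' '*.log'
```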

10. Removing the bucket

To delete the bucket, first purge all the data, like we did before:

s3cmd del -f -r s3://linoxide/

Then remove it with the rb command, which stands for remove bucket:

s3cmd rb s3://linoxide/
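
Because purging a bucket cannot be undone, you may want to wrap the two steps in a function that refuses to run without an explicit confirmation. This is only a sketch; the name remove_bucket and the "yes" argument are our own conventions.

```shell
#!/bin/sh
# remove_bucket: hypothetical helper that purges a bucket and then
# removes it, but only when the second argument is exactly "yes".
remove_bucket() {
    bucket="$1"
    confirm="$2"
    if [ "$confirm" != "yes" ]; then
        echo "refusing to delete $bucket without confirmation" >&2
        return 1
    fi
    # Purge all objects first; remove the bucket only if that succeeded
    s3cmd del -f -r "$bucket" && s3cmd rb "$bucket"
}

# Usage (uncomment to run):
# remove_bucket s3://linoxide/ yes
```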

Conclusion

We have installed s3cmd and used it to back up data to the Amazon cloud. It is a quick and easy way to get your data backed up without having to keep a physical backup with you, provided you are OK with Amazon pricing. If not, you might still need to get an additional hard drive and back up your data the old-fashioned way. Thank you for reading.

About Mihajlo Milenovic

Miki is a long-time GNU/Linux user, Free Software advocate and freelance system administrator from Serbia. He was introduced to GNU/Linux in 2003 on an old AMD Duron computer and has been eager to learn new things about the system ever since. Since 2016 he has written for Linoxide to share his experiences with a wider audience.
