The Linux wget command-line tool is a nifty utility used to download files over the internet. It is typically used to download tarballs, zip archives, and deb or rpm packages from a website.
With the wget command, you can download files over the HTTP, HTTPS and FTP protocols. A number of options can be used alongside the command, which is what makes it so powerful.
In this tutorial, we will look at how to use the GNU wget command through examples.
Syntax of wget
The syntax for using wget is quite straightforward. Simply invoke the wget command, followed by the options and thereafter the website link as shown.
$ wget [ options ] url
Most modern Linux distributions ship with the wget utility, so you usually won't need to install it. However, installation may be required on older systems and some cloud instances, or to resolve the '-bash: wget: command not found' error message.
For Ubuntu & Debian flavours, run the following command using the APT package manager:
$ sudo apt-get install wget
To install wget command in RHEL / CentOS 8 flavors, execute:
$ sudo dnf install wget
For older releases (RHEL 7, CentOS 7 and earlier), use the yum package manager as follows:
$ sudo yum install wget
For Arch Linux and Arch-based distributions such as Manjaro and EndeavourOS, use the pacman package manager as follows:
$ sudo pacman -S wget
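Before installing anything, it can help to check whether wget is already available. The snippet below is a minimal sketch; the function name check_wget is only an illustrative choice and is not part of wget itself.

```shell
# Hypothetical helper: report whether wget is available on the PATH.
check_wget() {
    if command -v wget >/dev/null 2>&1; then
        # Print the first line of the version banner.
        echo "wget is installed: $(wget --version | head -n 1)"
    else
        echo "wget is not installed"
    fi
}

check_wget
```

If the second message is printed, use the install command for your distribution from the list above.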
Download a file from the Internet
In its most basic form, without using any options, wget command downloads a file from the internet as specified in the URL. The simplest syntax is shown below:
$ wget website_url
In the example below, we are downloading the latest WordPress tarball from the official WordPress site.
$ wget https://wordpress.org/latest.tar.gz
From the output, we can see that wget first resolves the website's domain name to an IP address, then connects to the server and initiates the file transfer. While the download is in progress, a progress bar shows the file name, the size, the download speed and the estimated time to completion.
Once the download completes, you will find the file in your present working directory. You can verify this by running the ls command.
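The check with ls can also be wrapped in a small function, which is handy in scripts. This is only a sketch; verify_download is a hypothetical name introduced here for illustration.

```shell
# Hypothetical helper: succeed only if the file exists and is not empty.
verify_download() {
    file=$1
    if [ -s "$file" ]; then
        # Show the size and timestamp of the downloaded file.
        ls -lh "$file"
    else
        echo "missing or empty: $file" >&2
        return 1
    fi
}
```

After the WordPress download above, you would call it like this:
$ verify_download latest.tar.gz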
Download file and save under specific name
If you prefer to save the file under a name different from the original, simply pass the -O option followed by your preferred name.
$ wget -O wordpress.tar.gz https://wordpress.org/latest.tar.gz
The above command downloads the original file 'latest.tar.gz' from the WordPress site and saves it as 'wordpress.tar.gz'.
Download to a specific directory
As discussed earlier, wget downloads the file to your present working directory by default. You can specify a different download location using the -P flag followed by the path to the destination directory.
$ wget -P /var/www/html https://wordpress.org/latest.tar.gz
In the example above, the file is saved in the /var/www/html directory.
Limit Download Speed
Using wget, you can also limit the speed at which files are downloaded. To put a cap on the download speed, use the --limit-rate option followed by the rate. The value is in bytes per second by default; append k for kilobytes or m for megabytes per second.
To set the limit rate to 500KB/s, run the following command:
$ wget --limit-rate=500k https://wordpress.org/latest.tar.gz
Resume a partially downloaded file
Sometimes during a file download, your connection can suddenly drop, leaving you with a partially downloaded file. Instead of restarting the download, use the -c option to resume it as shown.
$ wget -c https://wordpress.org/latest.tar.gz
In the above example, we are resuming the download of the WordPress tarball. Note that resuming depends on server support: an HTTP server must honor the Range header. If it does not, wget cannot pick up where it left off, and the file has to be downloaded again from the beginning.
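On a flaky connection, the -c option can be combined with a simple retry loop so that each new attempt continues from the partial file. The function below is a sketch, not a built-in wget feature; the name resume_download, the default of 5 attempts and the 2-second pause are all assumptions made for illustration.

```shell
# Hypothetical helper: keep resuming the same download until wget
# succeeds or the attempt limit is reached.
resume_download() {
    url=$1
    max_attempts=${2:-5}   # assumed default, adjust as needed
    attempt=1
    while [ "$attempt" -le "$max_attempts" ]; do
        # -c continues from the partial file if one exists.
        if wget -c "$url"; then
            return 0
        fi
        echo "attempt $attempt failed, retrying..." >&2
        attempt=$((attempt + 1))
        sleep 2
    done
    return 1
}
```

For example, to allow up to 10 attempts for the WordPress tarball:
$ resume_download https://wordpress.org/latest.tar.gz 10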
Download file in the background
To download a file in the background, pass the -b option as shown below.
$ wget -b https://osdn.net/projects/manjaro/storage/kde/20.0.3/manjaro-kde-20.0.3-200606-linux56.iso
In the example above, we are downloading the Manjaro KDE Plasma ISO file in the background. To monitor the status of the download, follow the wget-log file that wget writes in the current directory, as shown:
$ tail -f wget-log
Download multiple files
If you have multiple files to download, it's cumbersome to run the wget command repeatedly on the terminal. A better approach would be to run the wget command once to download the files one after the other.
To accomplish this, save the URLs in a text file, one per line. Thereafter, invoke the wget command with the -i option followed by the name of the text file.
In this example, we want to download the WordPress tarball and the ownCloud zip archive. First, we save the download links in a file called 'downloads.txt'.
Now, to download multiple files, run:
$ wget -i downloads.txt
These will be downloaded one after the other. And that's how you download multiple files.
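The contents of 'downloads.txt' are not reproduced here. As a rough sketch, such a list could be created as follows; the first URL is the WordPress tarball used throughout this tutorial, while the second is a hypothetical placeholder standing in for the real ownCloud download link.

```shell
# Write one URL per line to the list file.
cat > downloads.txt <<'EOF'
https://wordpress.org/latest.tar.gz
https://example.com/hypothetical-archive.zip
EOF

# Sanity check: count the lines that look like HTTP(S) URLs.
grep -Ec '^https?://' downloads.txt
```

The count printed by grep should match the number of files you expect wget -i to fetch.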
Download a mirror of a website
Another lesser-known use of wget is the ability to clone, or make a mirror copy of, a website. You can achieve this by passing the -m option as shown:
$ wget -m http://www.example.com
If you intend to clone a website for local browsing, pass extra arguments as shown.
$ wget -m -k -p http://www.example.com
The -k flag converts the links into a format suitable for local viewing, while the -p option instructs wget to download all the files needed to display an HTML page properly in a browser, such as images and stylesheets.
Ignore SSL certificate
To avoid running into an error when downloading a file over HTTPS from a web server whose SSL certificate is not trusted or valid, pass the --no-check-certificate option as shown:
$ wget --no-check-certificate https://domain-with-untrusted-ssl-cert
Download and Redirect output to log file
The output of the wget command can be written to a log file during the download using the -o option as shown:
$ wget -o download.log https://wordpress.org/latest.tar.gz
Thereafter, you can view the log file in real-time as the download progresses using the tail command as shown:
$ tail -f download.log
Limit download Retries
Sometimes, a slow or unstable connection may cause a download to fail. You can set the number of download attempts using the --tries=[no. of tries] option as shown:
$ wget --tries=10 https://wordpress.org/latest.tar.gz
In the example above, wget will attempt the download of the tarball file up to 10 times before giving up.
Download from FTP server
You can also download a file from an FTP server, even when it is password protected. All you need to do is to pass the ftp username and password in the wget command as shown:
$ wget --ftp-user=[ftp_user] --ftp-password=[ftp_password] ftp://ftp.ftp-site.com
Change wget User agent
A user agent is essentially a header field that a client, such as a browser, sends to a web server to identify itself. By default, wget identifies itself as 'Wget/version'. At times, you may get an error indicating that you have insufficient permissions to access the server. When this happens, chances are that the website or web server is blocking requests associated with a particular 'User-Agent'. In that case, you can have wget present the user agent string of a common web browser instead.
To change the user agent, use the syntax:
$ wget -U "user agent" URL
For example, to download the site example.com while emulating Google Chrome version 74 (the latest version at the time of writing), use:
$ wget -U "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/74.0.3729.169 Safari/537.36" "http://example.com"
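Typing a long user agent string on every invocation is error-prone, so it can be kept in a variable and wrapped in a function. This is a sketch only; BROWSER_UA and wget_as_browser are hypothetical names, and the string is the Chrome 74 example from above.

```shell
# Browser-like User-Agent string (the Chrome 74 example used above).
BROWSER_UA="Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/74.0.3729.169 Safari/537.36"

# Hypothetical wrapper: run wget with the browser-like User-Agent,
# passing any further options and URLs through unchanged.
wget_as_browser() {
    # -U replaces the default "Wget/version" identification.
    wget -U "$BROWSER_UA" "$@"
}
```

It is then used like the plain command:
$ wget_as_browser http://example.com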
To download a website for local viewing, turn on recursive downloads with the --recursive (-r) option and convert the links to local links with the --convert-links (-k) option:
$ wget -rk https://example.com
As you have observed, wget is a powerful and flexible tool that you can use to grab files from the internet. In this tutorial, I have shown you various ways to use wget to download files and web pages from web servers via HTTP, HTTPS and FTP. For more information about the usage of the wget command, consult its man page.