Web scraping is a powerful technique to extract data from websites, and you can perform basic web scraping tasks using Microsoft Excel.

Did you know that Excel was created by Microsoft in the 1980s and currently has over 1.1 billion users globally?

Although Excel is a common tool in many jobs, few people know its full capabilities.

In this guide, we will walk through the process of scraping data from a website using Excel's built-in features. As a worked example, we'll scrape the list of American presidents from Wikipedia.

Here is how you can use Excel to scrape data from any website:

Step 1: Open Excel and navigate to the Data tab

  1. Launch Microsoft Excel
  2. Click on the "Data" tab in the ribbon at the top of the screen

Step 2: Use the "From Web" feature

  1. In the "Data" tab, click on "From Web" (you may find this under "Get & Transform Data" or "Get External Data" depending on your Excel version)
  2. A "New Web Query" window will open

Step 3: Enter the Wikipedia URL

  1. In the address bar of the "New Web Query" window, enter the following URL: https://en.wikipedia.org/wiki/List_of_presidents_of_the_United_States
  2. Click "Go" or press Enter
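
For comparison, here is a minimal Python sketch of the same "enter a URL and load the page" step done outside Excel. It is not what Excel runs internally; the helper names `build_request` and `fetch_html` and the User-Agent string are assumptions made up for this illustration, and only the standard library is used.

```python
from urllib.request import Request, urlopen

# The Wikipedia page used throughout this guide.
URL = "https://en.wikipedia.org/wiki/List_of_presidents_of_the_United_States"


def build_request(url, user_agent="excel-guide-example/0.1"):
    """Build a GET request that identifies itself (hypothetical User-Agent value)."""
    return Request(url, headers={"User-Agent": user_agent})


def fetch_html(url, timeout=30):
    """Download a page and return its HTML as text."""
    with urlopen(build_request(url), timeout=timeout) as response:
        # Fall back to UTF-8 when the server does not declare a charset.
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset)
```

Calling `fetch_html(URL)` needs network access, so it is left out of the block itself.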

Note: If this is your first time connecting Excel to this web page, you will get the following prompt.

Step 4: Select the table to import

  1. The webpage will load in the "New Web Query" window
  2. You will see a list of the tables available on the page. Look for the table containing the list of presidents
  3. Once you select the desired table, you will see a preview of it. This is a good time to check the quality of the data and decide whether it needs additional cleaning

In our case, the data was fairly clean, but we need to get rid of some columns. To do so, select the 'Transform Data' option.

Otherwise, select 'Load' and the data will appear in Excel.
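
To make the idea of "a list of tables available on the page" concrete, here is a rough Python sketch that collects every table in an HTML document, much as the table list in Step 4 does. It is only an illustration, not Excel's implementation. The `TableExtractor` class, the `extract_tables` helper and the sample HTML are assumptions for this example; the sketch expects explicit closing tags and does not handle nested tables.

```python
from html.parser import HTMLParser


class TableExtractor(HTMLParser):
    """Collect every <table> as a list of rows, each row a list of cell texts."""

    def __init__(self):
        super().__init__()
        self.tables = []   # finished tables, in document order
        self._rows = None  # rows of the table being read, or None
        self._row = None   # cells of the row being read, or None
        self._cell = None  # text fragments of the cell being read, or None

    def handle_starttag(self, tag, attrs):
        if tag == "table":
            self._rows = []
        elif tag == "tr" and self._rows is not None:
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None and self._row is not None:
            # Join the fragments and collapse runs of whitespace.
            self._row.append(" ".join("".join(self._cell).split()))
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self._rows.append(self._row)
            self._row = None
            self._cell = None
        elif tag == "table" and self._rows is not None:
            self.tables.append(self._rows)
            self._rows = None


def extract_tables(html):
    """Return all tables found in the given HTML text."""
    parser = TableExtractor()
    parser.feed(html)
    parser.close()
    return parser.tables


# Hand-written sample standing in for a downloaded page.
SAMPLE_HTML = """
<table>
  <tr><th>No.</th><th>Name</th></tr>
  <tr><td>1</td><td>George Washington</td></tr>
  <tr><td>2</td><td>John Adams</td></tr>
</table>
<table>
  <tr><th>Key</th></tr>
</table>
"""

tables = extract_tables(SAMPLE_HTML)
print(len(tables), "tables found")
```

Picking `tables[0]` here plays the role of selecting the presidents table from the list.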

Step 5: Clean the data with Power Query Editor

  1. After you click 'Transform Data', the Power Query Editor will open
  2. Here you can add and remove columns and rows, and rename column headers to whatever you prefer
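
As an illustration of what removing and renaming columns does to the data, here is a small Python sketch under stated assumptions: the table is a list of rows with the header first, and the `select_and_rename` helper and the sample rows are made up for this example. It mirrors the effect of those editor actions, not how the Power Query Editor implements them.

```python
def select_and_rename(rows, keep, rename=None):
    """Keep only the named columns of a header-first table and optionally rename them."""
    rename = rename or {}
    header, *body = rows
    # header.index raises ValueError if a requested column does not exist.
    indexes = [header.index(name) for name in keep]
    new_header = [rename.get(name, name) for name in keep]
    return [new_header] + [[row[i] for i in indexes] for row in body]


# Hand-typed sample rows standing in for the imported table.
raw = [
    ["No.", "Portrait", "Name", "Party"],
    ["1", "", "George Washington", "Unaffiliated"],
    ["2", "", "John Adams", "Federalist"],
]

cleaned = select_and_rename(raw, keep=["No.", "Name"], rename={"No.": "Number"})
for row in cleaned:
    print(row)
```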

Step 6: Import the data

  1. Click "Import" at the bottom of the "New Web Query" window
  2. In the "Import Data" dialog box, choose where you want to place the data in your Excel workbook
  3. Click "OK"

Step 7: Clean and format the data

  1. Excel will import the table, but it may require some cleaning and formatting if you did not use 'Transform Data'
  2. Remove any unnecessary columns or rows
  3. Adjust column widths as needed
  4. Format cells appropriately (e.g., dates, numbers)
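
The cleaning and formatting steps above can also be sketched in Python. The example below assumes cell text may carry bracketed footnote markers such as `[a]`, which Wikipedia tables often do, and that dates are written like "April 30, 1789". The helper names `clean_text` and `parse_date` are hypothetical, and the date format is an assumption you would adjust to your own data.

```python
import re
from datetime import datetime

# Matches bracketed markers such as [a] or [17].
FOOTNOTE = re.compile(r"\[[^\]]*\]")


def clean_text(value):
    """Remove footnote markers and collapse extra whitespace."""
    return " ".join(FOOTNOTE.sub("", value).split())


def parse_date(value, fmt="%B %d, %Y"):
    """Return a date object, or None when the text does not match fmt."""
    try:
        return datetime.strptime(clean_text(value), fmt).date()
    except ValueError:
        return None


print(clean_text("George Washington[a]"))
print(parse_date("April 30, 1789[b]"))
print(parse_date("Incumbent"))
```

Returning `None` for text that is not a date keeps values such as "Incumbent" from breaking the whole column.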

Finally, save your Excel workbook with an appropriate name and location.

What is great about this Excel feature is that you can refresh the data by clicking "Refresh All" to get the latest version. This is particularly useful if you are connected to frequently changing data, such as stock market data.

And there you have it: we have gone through how you can use Excel to scrape a website in 7 simple steps.

Additional Tips:

  1. If the table doesn't import correctly, try using Power Query (available in newer versions of Excel) for more advanced data transformation options
  2. Remember that web page structures can change, so you may need to adjust your approach if the Wikipedia page is updated
  3. Be mindful of Wikipedia's terms of use and any applicable copyright laws when using the scraped data

This method can be applied to other similar tables on various websites, making it a valuable skill for data collection and analysis.

While Excel's built-in web query feature is great, there are limitations that arise when you try to scale.

Common Problems with Web Scraping Using Excel

While web scraping with Excel is a powerful and accessible tool, there are several challenges and limitations that users may encounter:

1. Blocking Due to Automation Detection

Many websites have measures in place to detect and block automated access to their content. When Excel (or any other automated tool) attempts to connect to a website repeatedly or at high speeds, the website might identify this behavior as automation and block access. This can prevent you from retrieving the data you need.
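
Outside Excel, a script can at least react to this kind of blocking instead of failing outright. The sketch below is one possible approach, not a guaranteed way around any site's protections: it retries with exponential backoff when the server answers with HTTP 429 (Too Many Requests) or 503 (Service Unavailable). The `fetch_with_backoff` function and its `fetch` callback, which returns a `(status, body)` pair, are assumptions made up for this example.

```python
import time

# Status codes treated as "slow down and try again later".
RETRY_STATUSES = {429, 503}


def fetch_with_backoff(fetch, url, max_attempts=4, base_delay=1.0, sleep=time.sleep):
    """Call fetch(url) -> (status, body), waiting longer after each retryable failure."""
    for attempt in range(max_attempts):
        status, body = fetch(url)
        if status not in RETRY_STATUSES:
            return status, body
        if attempt < max_attempts - 1:
            # Delays grow as base_delay, 2 * base_delay, 4 * base_delay, ...
            sleep(base_delay * 2 ** attempt)
    return status, body


# Demo with a stand-in fetch function so the example runs without network access.
_responses = iter([(429, ""), (200, "<html>ok</html>")])
status, body = fetch_with_backoff(
    lambda url: next(_responses),
    "https://example.com/data",
    sleep=lambda seconds: None,  # skip real waiting in the demo
)
print(status)
```

Passing `sleep` as a parameter is a design choice that keeps the function easy to try out without real delays.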

2. IP Address Restrictions

Websites can restrict access based on the IP address from which the requests are coming. If a website detects a high volume of requests from a single IP address, it may block that IP address temporarily or permanently. This is a common tactic to prevent scraping and to ensure that their servers are not overwhelmed by automated requests.
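
One common way to avoid sending a high volume of requests from a single IP address is simply to space them out. Here is a minimal Python sketch of that idea. The `Throttle` class is hypothetical, and the interval you choose is a judgment call, since sites rarely publish an exact limit.

```python
import time


class Throttle:
    """Enforce a minimum interval between consecutive calls to wait()."""

    def __init__(self, min_interval, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = min_interval
        self._clock = clock
        self._sleep = sleep
        self._last = None  # time of the previous call, or None before the first

    def wait(self):
        now = self._clock()
        if self._last is not None:
            remaining = self.min_interval - (now - self._last)
            if remaining > 0:
                self._sleep(remaining)
                now = self._clock()
        self._last = now


# Demo: placeholder URLs, with a short interval so the example finishes quickly.
throttle = Throttle(min_interval=0.2)
for url in ["https://example.com/page1", "https://example.com/page2"]:
    throttle.wait()
    print("would request", url)
```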

3. Limited Use of Power Query

Not all websites are compatible with Power Query in Excel. Websites like Amazon, Walmart, and Google Search have complex structures, dynamic content, and anti-scraping mechanisms that make it difficult or impossible to scrape data using Excel's Power Query. These sites often require more advanced web scraping techniques and tools that can handle JavaScript-rendered content and complex page structures.

If you are struggling with web scraping, let DataHen's expert solutions streamline your data extraction process. Tailored for small to medium businesses, our services empower data science teams to focus on insights, not data collection hurdles.

👉 Discover How DataHen Can Transform Your Data Journey!