Google Sheets

broken image


Let's assume you've stumbled on an actionable table on some website and want to scrape this useful tabular data into your spreadsheet for analysis. You may try to copy and paste it manually, but that's a layman's way. Google Sheets has a convenient function, IMPORTHTML, to do the job. It will import the table easily and refresh your data at regular intervals to keep it updated.

But how does it work? In this article, you'll learn how to use the IMPORTHTML function to fetch tables and lists from a web page easily. Sound interesting? Let's get started!

How to use IMPORTHTML formula in Google Sheets
How to use IMPORTHTML to import a portion of a range table data to Google Sheets
How to set a custom interval to automatically refresh IMPORTHTML in Google Sheets

Google Sheets is part of Google Workspace: where teams of any size can chat, create and collaborate. Google Workspace subscribers have access to additional Google Sheets features, including: – Easily add collaborators to projects, see changes as they occur, receive notifications for edits that happen while you're away and chat with. Google Sheets makes your data pop with colorful charts and graphs. Built-in formulas, pivot tables and conditional formatting options save time and simplify common spreadsheet tasks. Import API data into Google Sheets. Apipheny is a data analysis software and marketing automation tool that you can use to pull API data into Google Sheets from JSON and CSV sources. Make GET requests and POST requests in Google Sheets using a user-friendly app. The description of Google Sheets App. Create, edit, and collaborate on spreadsheets from your Android phone or tablet with the Google Sheets app. With Sheets, you can: - Share spreadsheets and collaborate in the same spreadsheet at the same time. Open, edit, and save Excel files. Google Sheets is part of Google Workspace: where teams of any. Open a spreadsheet in Google Sheets. Select the cell or cells you'd like to change. Click Data Data validation. To change the options listed, edit the items next to 'Criteria.' To delete a list, click Remove validation. If you change the contents of the range you've selected, the changes will be made in the list automatically.

How does the IMPORTHTML function work in Google Sheets?

The Google Sheets IMPORTHTML function looks for a specific HTML table or list and copies the data out of it. You can use it to scrape texts within a table or list. An HTML table is defined by the tag, while a list is defined by the

    (for unordered list) and
      (for ordered list) tags.

Google

How to use IMPORTHTML formula in Google Sheets

Before using the IMPORTHTML formula, let's understand its syntax.

  • URL — The URL of the page, including protocol (http:// or https://). Make sure to enclose the URL within double-quotes.
  • query_type — Use 'table' if you want to import a table, otherwise 'list' if you're going to import a list.
  • index — The index of the table or list on the web page. It starts at 1. A table with index = 1 means that it's the first table, index = 2 means that it's the second table, and so on.

How to get indexes of tables/lists to pull data from website to Google Sheets using IMPORTHTML

A page may contain one or more tables and/or lists. If you have no idea how to find out the indexes of tables on an HTML page, follow the steps below:

Step 1

Open your browser's Developer console. For most browsers on Windows, you can open the console by pressing F12. If you're using a Mac, use Cmd+Opt+J for Chrome, and Cmd+Opt+C for Safari. Note that, for Safari, you'll need to enable the 'Develop menu' first.

The exact look will depend on the version of Google Chrome you're using. It may change from time to time, but should be similar.

Step 2

Copy and paste the following code into the console to get indexes of all tables:

Postico 1 3 3 – a modern postgresql client tutorial. If you are looking for all lists' indexes instead, you need to get all elements with tag

    or
      . The following code may help you:

      Step 3

      Press Enter. You will see numbers that represent indexes shown in the results. Move your cursor over the elements in the result until the table/list you want to display is highlighted.

      As you can see in the screenshot above, the table highlighted has index = 6.

      How to import a table

      Let's see how we can import an HTML table. We will pull the latest currency exchange rates data from Yahoo! Finance's Currencies website to Google Sheets. The page only has one table, so we'll use 1 for the index value.

      Now, create a new blank Google spreadsheet and give it a name – for example, Currencies. Then, copy and paste the following formula into A1.

      Then, press Enter and wait until the entire table is populated in the spreadsheet.

      In the above image, we can see that the IMPORTHTML function successfully grabbed the latest currency rate data into Google Sheets.

      You may be interested in monitoring the exchange rate data. In that case, you may want to check our tutorial on how to build a currency exchange rate tracker in Google Sheets without coding.

      How to import a list

      You can import a list using the same method. The only change would be to replace the word 'table' with 'list' in the parameter. The following steps demonstrate how to pull data from a list containing programming languages starting with the letter 'C'.

      Create a new blank Google spreadsheet and give it a name. Then, copy and paste the following formula into C1:

      Press Enter and wait for the data to populate, as the following screenshot shows:

      Other options for scraping data into Google Sheets

      If you're looking for another method to retrieve data from different structure besides HTML tables and lists, here are some Google Sheets functions you may want to try:

      Many financial services share their data in JSON format through their APIs. If you need to scrape JSON data into Google Sheets without coding, we recommend using the JSON Client importer by Coupler.io. Coupler.io is an integration solution for importing data from different sources like Airtable, Shopify, HubSpot, WordPress, and many others.

      How to reference a cell in IMPORTHTML in Google Sheets

      You may want to put the URL and other params in cells, then refer to them when using the IMPORTHTML formula. In this case, you can change the params more easily by editing the cells' values.

      Here's an example:

      All params for URL, query, and index are put in B1, B2, and B3. Thus, you can easily write the IMPORTHTML formula as follows:

      Let's look at another example. Suppose you want to get the latest historical rates of the EUR/USD currency pair from this page:

      You can put the string EURUSD in a cell – for example, B1. In this case, if you want to fetch other currency data, you'll just need to change the value in B1. Here's an example of how to refer tothe B1 cell in the Google Sheets IMPORTHTML formula:

      Now, let's add the above formula into A3:

      If you want to pull historical data for AUD/USD, change B1‘s value to AUDUSD, and your data will refresh automatically.

      Tip: You can avoid typing B1 multiple times by using the SUBSTITUTE function. Here's what the updated formula looks like:

      How to use IMPORTHTML to import a portion of a range table data to Google Sheets

      Want to pull just a few columns? Or filter only rows with specific criteria? You can achieve these things by using the QUERY function in combination with IMPORTHTML.

      Importing specific columns

      Suppose you have a sheet with an IMPORTHTML function that pulls the latest EUR/USD rate data from a website to Google Sheets.

      Now, you only want to retrieve the Date and Close columns that are the 1st and 5th columns. To do that, you can combine your existing formula with the QUERY function — here's an example:

Sign

How to use IMPORTHTML formula in Google Sheets

Before using the IMPORTHTML formula, let's understand its syntax.

  • URL — The URL of the page, including protocol (http:// or https://). Make sure to enclose the URL within double-quotes.
  • query_type — Use 'table' if you want to import a table, otherwise 'list' if you're going to import a list.
  • index — The index of the table or list on the web page. It starts at 1. A table with index = 1 means that it's the first table, index = 2 means that it's the second table, and so on.

How to get indexes of tables/lists to pull data from website to Google Sheets using IMPORTHTML

A page may contain one or more tables and/or lists. If you have no idea how to find out the indexes of tables on an HTML page, follow the steps below:

Step 1

Open your browser's Developer console. For most browsers on Windows, you can open the console by pressing F12. If you're using a Mac, use Cmd+Opt+J for Chrome, and Cmd+Opt+C for Safari. Note that, for Safari, you'll need to enable the 'Develop menu' first.

The exact look will depend on the version of Google Chrome you're using. It may change from time to time, but should be similar.

Step 2

Copy and paste the following code into the console to get indexes of all tables:

Postico 1 3 3 – a modern postgresql client tutorial. If you are looking for all lists' indexes instead, you need to get all elements with tag

    or
      . The following code may help you:

      Step 3

      Press Enter. You will see numbers that represent indexes shown in the results. Move your cursor over the elements in the result until the table/list you want to display is highlighted.

      As you can see in the screenshot above, the table highlighted has index = 6.

      How to import a table

      Let's see how we can import an HTML table. We will pull the latest currency exchange rates data from Yahoo! Finance's Currencies website to Google Sheets. The page only has one table, so we'll use 1 for the index value.

      Now, create a new blank Google spreadsheet and give it a name – for example, Currencies. Then, copy and paste the following formula into A1.

      Then, press Enter and wait until the entire table is populated in the spreadsheet.

      In the above image, we can see that the IMPORTHTML function successfully grabbed the latest currency rate data into Google Sheets.

      You may be interested in monitoring the exchange rate data. In that case, you may want to check our tutorial on how to build a currency exchange rate tracker in Google Sheets without coding.

      How to import a list

      You can import a list using the same method. The only change would be to replace the word 'table' with 'list' in the parameter. The following steps demonstrate how to pull data from a list containing programming languages starting with the letter 'C'.

      Create a new blank Google spreadsheet and give it a name. Then, copy and paste the following formula into C1:

      Press Enter and wait for the data to populate, as the following screenshot shows:

      Other options for scraping data into Google Sheets

      If you're looking for another method to retrieve data from different structure besides HTML tables and lists, here are some Google Sheets functions you may want to try:

      Many financial services share their data in JSON format through their APIs. If you need to scrape JSON data into Google Sheets without coding, we recommend using the JSON Client importer by Coupler.io. Coupler.io is an integration solution for importing data from different sources like Airtable, Shopify, HubSpot, WordPress, and many others.

      How to reference a cell in IMPORTHTML in Google Sheets

      You may want to put the URL and other params in cells, then refer to them when using the IMPORTHTML formula. In this case, you can change the params more easily by editing the cells' values.

      Here's an example:

      All params for URL, query, and index are put in B1, B2, and B3. Thus, you can easily write the IMPORTHTML formula as follows:

      Let's look at another example. Suppose you want to get the latest historical rates of the EUR/USD currency pair from this page:

      You can put the string EURUSD in a cell – for example, B1. In this case, if you want to fetch other currency data, you'll just need to change the value in B1. Here's an example of how to refer tothe B1 cell in the Google Sheets IMPORTHTML formula:

      Now, let's add the above formula into A3:

      If you want to pull historical data for AUD/USD, change B1‘s value to AUDUSD, and your data will refresh automatically.

      Tip: You can avoid typing B1 multiple times by using the SUBSTITUTE function. Here's what the updated formula looks like:

      How to use IMPORTHTML to import a portion of a range table data to Google Sheets

      Want to pull just a few columns? Or filter only rows with specific criteria? You can achieve these things by using the QUERY function in combination with IMPORTHTML.

      Importing specific columns

      Suppose you have a sheet with an IMPORTHTML function that pulls the latest EUR/USD rate data from a website to Google Sheets.

      Now, you only want to retrieve the Date and Close columns that are the 1st and 5th columns. To do that, you can combine your existing formula with the QUERY function — here's an example:

      Google Sheets

      By defining 'SELECT Col1, Col5' in the QUERY function, you will get this result:

      Importing specific rows

      You can also retrieve specific rows. For example, here's how to add a filter to our previous formula to fetch just the data with Close values higher than 1.2250:

      Now, let's add one more filter to fetch just the top 3 highest rates. Here's the formula:

      How to set a custom interval to automatically refresh IMPORTHTML in Google Sheets

      By default, the Google Sheets IMPORTHTML refresh period is every 1 hour. However, you can speed up the refresh interval if you want. As the formula is recalculated when its arguments change, you can use this to force the refresh interval. The idea is to concatenate the original URL with a query string that changes periodically based on the time we set – for example, every 5 minutes. Here are the steps:

      First, add a query string in the original URL

      Google Sheets Login

      Suppose we have the following values in B1-B5. The IMPORTHTML function is defined in B5. Notice that a query string '?refresh=' & B4 is added to the original URL.

      The sheet looks as follows:

      We're not done yet. Let's continue to the next step.

      Next, use script and trigger to automate refresh

      We are going to refresh the value of B4 every 5 minutes using a script and trigger. As a result, the Google Sheets IMPORTHTML formula will also refresh at the same interval. Follow these instructions:

      Step 1. Go to the Script editor (either Tools > Script Editor or Extensions > App Script).

      Step 2. Copy and paste the following code in the Code.gs. Then, save your changes by pressing the Disk icon in the toolbar.

      Step 3. Open the Triggers menu on the left, then click the Add Trigger button.

      Step 4. Set a trigger for myFunction so that it runs every 5 minutes. Optionally, you can set the Failure notification settings to Notify me immediately so that you receive a notification immediately when an error occurs.

      Step 5. Click the Save button. If you are asked to authorize the script to access your data, grant the permission.

      Step 6. Run your script for the first time.

      Now, you'll be able to see the data on your sheet refresh every 5 minutes. Even when your Google Sheet is closed, it will continue to refresh.

      How many IMPORTHTMLs can Google Sheets handle?

      You can use the IMPORTHTML in a Google spreadsheet as many times as you want. Before, the limit was 50 per Google spreadsheet for external data, but Google removed this limitation in 2015. As Google Sheets is web-based, you may experience a drop in speed if you have lots of IMPORTHTML formulas in your spreadsheet especially if your internet connection is slow.

      How to pull non-public data from a website into Google Sheets using IMPORTHTML function

      You may want to pull data from a non-public URL on a website into Google Sheets. Unfortunately, you can't do that using the IMPORTHTML function. See the following screenshot, which shows what happens if you try scraping your LinkedIn network list.

      The formula only works if the page is publicly available and does not require you to log in to access the data. You'll get an error message #N/A Could not fetch url for accessing non-public URLs.

      What to do if IMPORTHTML formula suddenly not working in your Google Sheets

      If your formula suddenly stops working, we recommend you to check the following things:

      • Check for URL change. Although it's a rare case, it's possible that the page you scrape has been moved to another URL.
      • Check for protocol change. For example, the site you're scraping is now using https instead of http, but the auto-redirect to https is not set up yet by the website owner.
      • Check for index change. The table or list with index = 9 could have index = 8 now.

      If you still can't pull the data you want, then it could be that the website owner now blocks bots/crawlers from reading their web content. Check the website's robots.txt by navigating through /robots.txt.

      Back to Blog



broken image