Blast Analytics and Marketing

Analytics Blog

Supporting Leaders to EVOLVE
Google Big Query
Category: Digital Analytics

How to Export Google Analytics BigQuery Clickstream Data into Your Data Warehouse

June 18, 2015

Google Analytics Premium clients have the option to export clickstream (hit-level) data into Google BigQuery through a native integration. As we’ve seen in the previous post, clickstream data empowers analysts to answer much more complex (and valuable) business questions, namely by integration with other data sources (e.g. CRM).

Many technical teams are new to Google BigQuery and how to work with it. On top of that, the structure of how the Google Analytics (GA) Premium data is stored in BigQuery may not be obvious at first glance.

By following the instructions below, you’ll jump-start your knowledge and more quickly arrive at an ETL process that can bring this valuable clickstream data into your own data warehouse.

Why export Google Analytics BigQuery clickstream data?

In one word: data enrichment – but here are three specific points:

1) Business Logic
Let’s say you have custom rules to determine a channel (combination of source, medium, campaign, etc) and you need to update the clickstream data with this new field. Google BigQuery doesn’t currently support UPDATE or DELETE operations. For some use cases, this can be problematic.

2) Analysis
Analysis on Google Analytics BigQuery data can be a little cumbersome since data is scattered across multiple tables and is in a nested structure. While nesting is useful to express hierarchical data (reducing duplication) it can pose its own set of challenges to the average SQL user.

3) Data Integration
You may have your own data warehouse already and you would like to analyze the clickstream data using your own infrastructure. Furthermore, you may want to join this data with your other data sources (CRM, Email, Social, etc) to increase context for improved business outcomes.

Diagram Export Google Analytics BigQuery Clickstream Data

Below, we’ll give you the keys to the castle to jump-start your journey towards exporting of clickstream data out of BigQuery and importing into your own data warehouse.

How to export Google Analytics clickstream data from BigQuery

Google Analytics Premium can provide daily exports of clickstream data into Google BigQuery and stores it using a nested JSON format (to avoid duplication and save space). Take a look at the nested schema that Google uses.

To export, the first step is to create a SQL query to select all hits for a given day without nesting. One caveat though, Google Analytics clickstream data doesn’t have a column with a hit timestamp, we’ll need to create one by adding the hit time (hits.time) converted to seconds and adding the visitStartTime (visitStartTime+ hits.time/1000). This new column is named hit_timestamp.

Below is a snippet of the SQL query. You can find the complete SQL query on the BlastAM GitHub Repository.

SQL query to export clickstream data from BigQuery

Step-by-Step Instructions

  1. Go to the Google Analytics BigQuery project and create a new dataset (in our example, we’ll call this ETL). This dataset will hold your clickstream data exports.
  2. Grab the SQL query from our GitHub.
  3. Edit the SQL query to tailor your Google Analytics implementation (customize the fields for dimensions, enhanced ecommerce, etc.) Don’t add more dimensions/metrics than the ones your current implementation is using.
  4. Paste your query into the query window.
  5. Select ‘Show Option’ and configure a destination table (in our example sessions_20150101). Select the dataset you created in ‘step 1’ and name the table appropriately. While you are testing you can add a ‘LIMIT’ to the SQL query to limit the number of returned rows.Export Google Analytics Bigquery Clickstream Data
  6. Once the query is finished, you will have your flattened clickstream data in the new table under the new dataset. You can append other tables to this table for analysis or export it to a CSV for storage or analysis in another platform.

Exporting Google Analytics Clickstream Data to Your Own Data Warehouse

  1. Create a bucket in Google Storage under the same project.
  2. Export the table to Google Storage and download it for analysis or load into your data warehouse. The file pattern will divide the file into multiples and compress them using gzip. This is handy for analysis with Hive (Hadoop). The format itself is a CSV.

Exporting Google Analytics Bigquery data to Google Storage

The exported data is now in Google Storage:
Google Analytics BigQuery data on Google Storage

You can now create an ETL job to automate any of the processes described above to keep your data warehouse data up-to-date.

Interested in archiving your Google Analytics clickstream data? Contact Us

Want to know more about web analytics with clickstream data? Subscribe to our RSS feed and follow us on Twitter.

Joao Correia
About the Author

Joao Correia is the Director of Data Insights at Blast Analytics & Marketing. His mission is to improve business performance through data, actionable insights and thoughtful strategy.

Joao Correia has written on the Blast Digital Customer Experience and Analytics Blog.