Analytics & Digital Marketing Tips

How to Export Google Analytics BigQuery Clickstream Data into Your Data Warehouse

How to Export Google Analytics BigQuery Clickstream Data into Your Data Warehouse

June 18, 2015           Analytics, Google Analytics

Google Analytics Premium clients have the option to export clickstream (hit-level) data into Google BigQuery through a native integration. As we’ve seen in the previous post, clickstream data empowers analysts to answer much more complex (and valuable) business questions, namely by integration with other data sources (e.g. CRM).

Many technical teams are new to Google BigQuery and how to work with it.  On top of that, the structure of how the Google Analytics (GA) Premium data is stored in BigQuery may not be obvious at first glance.

By following the instructions below, you’ll jump-start your knowledge and more quickly arrive at an ETL process that can bring this valuable clickstream data into your own data warehouse.

Why export Google Analytics BigQuery clickstream data?

In one word: data enrichment – but here are three specific points:

1) Business Logic
Let’s say you have custom rules to determine a channel (combination of source, medium, campaign, etc) and you need to update the clickstream data with this new field. Google BigQuery doesn’t currently support UPDATE or DELETE operations. For some use cases, this can be problematic.

2) Analysis
Analysis on Google Analytics BigQuery data can be a little cumbersome since data is scattered across multiple tables and is in a nested structure. While nesting is useful to express hierarchical data (reducing duplication) it can pose its own set of challenges to the average SQL user.

3) Data Integration
You may have your own data warehouse already and you would like to analyze the clickstream data using your own infrastructure. Furthermore, you may want to join this data with your other data sources (CRM, Email, Social, etc) to increase context for improved business outcomes.

Diagram Export Google Analytics BigQuery Clickstream Data

Below, we’ll give you the keys to the castle to jump-start your journey towards exporting of clickstream data out of BigQuery and importing into your own data warehouse.

How to export Google Analytics clickstream data from BigQuery

Google Analytics Premium can provide daily exports of clickstream data into Google BigQuery and stores it using a nested JSON format (to avoid duplication and save space). Take a look at the nested schema that Google uses.

To export, the first step is to create a SQL query to select all hits for a given day without nesting. One caveat though, Google Analytics clickstream data doesn’t have a column with a hit timestamp, we’ll need to create one by adding the hit time (hits.time) converted to seconds and adding the visitStartTime (visitStartTime+ hits.time/1000). This new column is named hit_timestamp.

Below is a snippet of the SQL query. You can find the complete SQL query on the BlastAM GitHub Repository.

SQL query to export clickstream data from BigQuery

Step-by-Step Instructions

  1. Go to the Google Analytics BigQuery project and create a new dataset (in our example, we’ll call this ETL). This dataset will hold your clickstream data exports.
  2. Grab the SQL query from our GitHub.
  3. Edit the SQL query to tailor your Google Analytics implementation (customize the fields for dimensions, enhanced ecommerce, etc.) Don’t add more dimensions/metrics than the ones your current implementation is using.
  4. Paste your query into the query window.
  5. Select ‘Show Option’ and configure a destination table (in our example sessions_20150101). Select the dataset you created in ‘step 1’ and name the table appropriately. While you are testing you can add a ‘LIMIT’ to the SQL query to limit the number of returned rows.Export Google Analytics Bigquery Clickstream Data
  6. Once the query is finished, you will have your flattened clickstream data in the new table under the new dataset. You can append other tables to this table for analysis or export it to a CSV for storage or analysis in another platform.

Exporting Google Analytics Clickstream Data to Your Own Data Warehouse

  1. Create a bucket in Google Storage under the same project.
  2. Export the table to Google Storage and download it for analysis or load into your data warehouse. The file pattern will divide the file into multiples and compress them using gzip. This is handy for analysis with Hive (Hadoop). The format itself is a CSV.

Exporting Google Analytics Bigquery data to Google Storage

The exported data is now in Google Storage:
Google Analytics BigQuery data on Google Storage

You can now create an ETL job to automate any of the processes described above to keep your data warehouse data up-to-date.

Interested in archiving your Google Analytics clickstream data? Contact Us

Want to know more about web analytics with clickstream data? Subscribe to our RSS feed and follow us on Twitter.

 

  • Hi Joao, will standard GA account + API(by R or ect ) could be a cheaper choice for the plan ?

  • Hi Derek,

    With a non-premium GA account we can’t export clickstream data, only a combination of up to 7 dimensions and 10 metrics (via API).
    If you want clickstream data without GA premium you do have a couple of options:
    http://www.clickstreamr.com
    – Snowplow: http://www.blastam.com/snowplow-consulting.aspx

  • Sandra Tender

    Hi João
    Nice work! Many thanks for sharing.
    Can I ask you how would you automate this in an ETL job. Can I use Bigquery or some other Google product?

  • Hi @sandratender:disqus,

    Sorry the delay in the response, I was on vacation. You will have to write code to interact with the BigQuery and Cloud Storage APIs, or use an ETL tool like Talend open source version.

    http://www.talend.com/download/talend-open-studio

  • Sébastien Brodeur

    What about sessions and users custom dimensions?

  • Greg Armstrong

    Hi @s_bastien_brodeur:disqus ,

    The best way to view hits at a session level is to combine the fullVisitorId (unique visitor ID) with the visitId (unique session ID). Like this:

    CONCAT(fullVisitorId,”-“,string(visitId)) as SessionId

    For custom dimensions (you can also refer to Joao’s SQL recipe, it’s close to the bottom), you essentially need to lookup the custom dimension index and return the variable. Like this:

    MAX(CASE WHEN hits.customDimensions.index = 7 THEN hits.customDimensions.value END) WITHIN RECORD AS customDimension7

 

Analytics Blog

We're here to help with tips and insight on the following topics:

Subscribe to RSS


Optimize your website and marketing campaigns

Get a constant flow of Google Analytics help and digital marketing tips, case studies and more from Google Certified Partner Blast Analytics & Marketing.




Connect with Blast Analytics & Marketing




 

4 Shares
Share2
Tweet
Share1
+11