Analytics and Search Marketing Tips
Google Analytics Premium clients have the option to export clickstream (hit-level) data into Google BigQuery through a native integration. As we’ve seen in the previous post, clickstream data empowers analysts to answer much more complex (and valuable) business questions, namely by integration with other data sources (e.g. CRM).
Many technical teams are new to Google BigQuery and how to work with it. On top of that, the structure of how the Google Analytics (GA) Premium data is stored in BigQuery may not be obvious at first glance.
By following the instructions below, you’ll jump-start your knowledge and more quickly arrive at an ETL process that can bring this valuable clickstream data into your own data warehouse.
In one word: data enrichment – but here are three specific points:
1) Business Logic
Let’s say you have custom rules to determine a channel (combination of source, medium, campaign, etc) and you need to update the clickstream data with this new field. Google BigQuery doesn’t currently support UPDATE or DELETE operations. For some use cases, this can be problematic.
Analysis on Google Analytics BigQuery data can be a little cumbersome since data is scattered across multiple tables and is in a nested structure. While nesting is useful to express hierarchical data (reducing duplication) it can pose its own set of challenges to the average SQL user.
3) Data Integration
You may have your own data warehouse already and you would like to analyze the clickstream data using your own infrastructure. Furthermore, you may want to join this data with your other data sources (CRM, Email, Social, etc) to increase context for improved business outcomes.
Below, we’ll give you the keys to the castle to jump-start your journey towards exporting of clickstream data out of BigQuery and importing into your own data warehouse.
Google Analytics Premium can provide daily exports of clickstream data into Google BigQuery and stores it using a nested JSON format (to avoid duplication and save space). Take a look at the nested schema that Google uses.
To export, the first step is to create a SQL query to select all hits for a given day without nesting. One caveat though, Google Analytics clickstream data doesn’t have a column with a hit timestamp, we’ll need to create one by adding the hit time (hits.time) converted to seconds and adding the visitStartTime (visitStartTime+ hits.time/1000). This new column is named hit_timestamp.
Below is a snippet of the SQL query. You can find the complete SQL query on the BlastAM GitHub Repository.
The exported data is now in Google Storage:
You can now create an ETL job to automate any of the processes described above to keep your data warehouse data up-to-date.
Interested in archiving your Google Analytics clickstream data? Contact Us
Want to know more about web analytics with clickstream data? Subscribe to our RSS feed and follow us on Twitter.
Share this Post
Digital Analytics Blog
We're here to help with tips and insight on the following topics:
Optimize your website and marketing campaigns