Ingesting Google Analytics Data Into Kafka With Python

Kevin Feasel

2018-07-06

Hadoop

Bill Ward shows us an example using Python of reading Google Analytics data into a Kafka topic:

Google Analytics is a very powerful platform for monitoring your web site’s metrics including top pages, visitors, bounce rate, etc. As more and more businesses start using Big Data processes, the need to compile as much data as possible becomes more advantageous. The more data you have available, the more options you have to analyze that data and produce some very interesting results that can help you shape your business.

This article assumes that you already have a running Kafka cluster. If you don’t, then please follow my article Kafka Tutorial for Fast Data Architecture to get a cluster up and running. You will also need to have a topic already created for publishing the Google Analytics metrics to. The aforementioned article covers this procedure as well. I created a topic called admintome-ga-pages since we will be collection Google Analytics (ga) metrics about my blog’s pages.

In this article, I will walk you through how to pull metrics data from Google Analytics for your site then take that data and push it to your Kafka Cluster. In a later article, we will cover how to take that data that is consumed in Kafka and analyze it to give us some meaningful business data.

This is a good tutorial and I’m looking forward to part two.

Related Posts

Auto ML With SQL Server 2019 Big Data Clusters

Marco Inchiosa has a model scenario for using Big Data Clusters to scale out a machine learning problem: H2O provides popular open source software for data science and machine learning on big data, including Apache SparkTM integration. It provides two open source python AutoML classes: h2o.automl.H2OAutoML and pysparkling.ml.H2OAutoML. Both APIs use the same underlying algorithm implementations, […]

Read More

Erasure Coding In Hadoop

Guy Shilo explains erasure coding, a new feature in Hadoop 3: The benefits are, of course, space-saving, and for large files also improved performance (blocks striped across datanodes can be read in parallel, and less blocks are written because there is no x3 replication). The larger the file the more notable is the performance gain. […]

Read More

Categories

July 2018
MTWTFSS
« Jun Aug »
 1
2345678
9101112131415
16171819202122
23242526272829
3031