The Hadoop in Real World team shreds some URLs:
Hive offers two functions for working with URLs: parse_url and parse_url_tuple.
With both functions you can extract parts of a URL such as PROTOCOL, HOST, PATH, and QUERY, as well as individual query parameters.
Let’s see them in action.
Let’s, shall we?
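
As a quick sketch of the difference between the two, the queries below pull the same parts out of a URL column with each function. The table `web_logs`, its column `url`, and the query parameter `id` are hypothetical names chosen for illustration; they do not come from the original post.

```sql
-- Assumption: web_logs(url STRING) is a hypothetical table, not from the original post.

-- parse_url extracts one part per call, so each part needs its own invocation.
SELECT
  parse_url(url, 'PROTOCOL')    AS protocol,
  parse_url(url, 'HOST')        AS host,
  parse_url(url, 'PATH')        AS path,
  parse_url(url, 'QUERY')       AS query_string,
  parse_url(url, 'QUERY', 'id') AS id_param   -- third argument names a single query parameter
FROM web_logs;

-- parse_url_tuple is a table-generating function: it returns several parts in one call.
-- A single query parameter is requested with the 'QUERY:<name>' form.
SELECT t.protocol, t.host, t.path, t.query_string, t.id_param
FROM web_logs
LATERAL VIEW parse_url_tuple(url, 'PROTOCOL', 'HOST', 'PATH', 'QUERY', 'QUERY:id') t
  AS protocol, host, path, query_string, id_param;
```

When several parts of the same URL are needed, parse_url_tuple is usually the more convenient choice, since it names every part in a single call instead of repeating parse_url once per part.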