User-Defined Functions In Hive

Kevin Feasel

2016-06-17

Hadoop

Tim Spann talks about user-defined functions in Hive:

When you start using Hive you may miss some of the functions you are used to from Oracle, MySQL or elsewhere. Or you might just want a profanity filter. Whatever the case you can browse our list below for a large selection of UDF libraries. You can also use the pointers listed to write your own.

The Brickhouse Collection of UDFs from Klout includes functions for collapsing multiple rows into one, generating top K lists, a distributed cache, bloom counters, JSON functions, and HBase tools.

Coming from a SQL Server background, UDFs might be something you instinctively avoid (or at least that’s the case with me).  In practice, though, they’re a really good addition to the product.

Related Posts

Getting Started With Zeppelin

Sangeeta Gulia shows us how to get started building notebooks with Apache Zeppelin on top of Spark: There are 3 interpreter modes available in Zeppelin. 1) Shared Mode In Shared mode, a SparkContext and a Scala REPL is being shared among all interpreters in the group. So every Note will be sharing single SparkContext and single […]

Read More

How Per-Second AWS Billing Helps With Data Processing

Prakash Chockalingam explains how AWS per-second billing can make resource allocation easier: Because of the hourly increments in billing, users spend a lot of time playing a giant game of Tetris with their big data workloads — figuring out how to pack jobs to use every minute of the compute hour. Examples: If a job […]

Read More

Categories

June 2016
MTWTFSS
« May Jul »
 12345
6789101112
13141516171819
20212223242526
27282930