Log Tokenization and Reduction in Azure Data Explorer

Brian Bønk tries out some new functions:

Before the release described below, the ADX service already had a good handful of features to help with anomaly detection and clustering on semi-structured data.

With functions like basket() and autocluster(), the service can find patterns based on common values across the columns. The problem with these functions is that they cannot parse free-text columns and extract tokens and repeatable patterns.

Yes, you could use the diffpatterns_text() function, but it is not strong enough to cover the real diversity of free-text log records.

It’s interesting that the end result is looking for log entries whose shape differs from the norm. That’s a clever approach to log file analysis.
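
To make the "shape" idea concrete, here is a minimal Python sketch. It is not the ADX implementation and does not use any Kusto function; the masking rules, the `to_pattern` and `rare_patterns` helpers, and the sample log lines are all hypothetical, chosen only for illustration. It replaces variable tokens (IP addresses, hex codes, numbers) with placeholders so that each line reduces to a pattern, then reports the patterns that occur rarely.

```python
import re
from collections import Counter

# Hypothetical masking rules (an assumption for illustration, not ADX's rules).
# Order matters: IP addresses and hex codes are masked before plain numbers.
_MASKS = [
    (re.compile(r"\b\d{1,3}(?:\.\d{1,3}){3}\b"), "<IP>"),
    (re.compile(r"\b0x[0-9a-fA-F]+\b"), "<HEX>"),
    (re.compile(r"\b\d+\b"), "<NUM>"),
]


def to_pattern(line: str) -> str:
    """Reduce a free-text log line to its pattern by masking variable tokens."""
    for regex, placeholder in _MASKS:
        line = regex.sub(placeholder, line)
    # Collapse runs of whitespace so spacing differences do not split patterns.
    return " ".join(line.split())


def rare_patterns(lines, max_count=1):
    """Return the patterns that occur at most max_count times."""
    counts = Counter(to_pattern(line) for line in lines)
    return [pattern for pattern, count in counts.items() if count <= max_count]


# Made-up sample data.
logs = [
    "Connection from 10.0.0.1 closed after 35 ms",
    "Connection from 10.0.0.7 closed after 120 ms",
    "Connection from 10.0.0.9 closed after 8 ms",
    "Disk /dev/sda1 failed with code 0x1F",
]

print(rare_patterns(logs))
```

With this sample, the three connection lines collapse into a single common pattern, so only the disk failure pattern is reported. A real log reduction feature would need far more robust tokenization than a handful of regular expressions.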