Hive offers two functions for working with URLs: parse_url and parse_url_tuple.
With both functions you can extract information such as PROTOCOL, HOST, PATH, QUERY, and individual query parameters.
Let’s see them in action.
Let’s, shall we?
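For readers who want a feel for what those URL parts look like before clicking through, here is a minimal Python sketch using the standard library's urllib.parse. It is an analogy, not Hive code: in Hive itself the calls take the form parse_url(url, 'HOST') or parse_url_tuple(url, 'HOST', 'PATH', 'QUERY'). The helper name url_parts and the example URL are made up for illustration.

```python
from urllib.parse import urlsplit, parse_qs


def url_parts(url):
    """Split a URL into the same kinds of parts the Hive functions expose.

    url_parts is a hypothetical helper; the dictionary keys simply mirror
    the part names mentioned in the excerpt above.
    """
    parts = urlsplit(url)
    return {
        "PROTOCOL": parts.scheme,
        "HOST": parts.hostname,
        "PATH": parts.path,
        "QUERY": parts.query,
        # parse_qs maps each query key to a list of its values
        "PARAMS": parse_qs(parts.query),
    }


# Made-up URL, used only to show the shape of the result
example = url_parts("https://example.com/docs/page?id=42&lang=en")
print(example["HOST"])
print(example["PARAMS"]["id"])
```

Note that parse_qs returns a list per key, since a query string can repeat a parameter.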
One of the most common, and sometimes boring, tasks when working with datasets is writing some code to profile the data. Most data scientists will have built a set of tools/scripts to help them with this regular and slightly boring task. As with most IT tasks we should be trying to automate what we can, to allow us to spend more time on more important tasks, such as deriving insights and delivering value to the business, instead of repeatedly writing code to produce various statistics about the data and drawing pretty pictures.
I’ve written previously about automating and using some data profiling libraries to help us with this task. There are lots of packages available on pypi.org and on GitHub. Below I give examples of 5 Python Data Profiling libraries, with links to their GitHub repositories.
Brendan includes some good examples of libraries here so check it out.
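To make "profiling" concrete, here is a rough sketch of the kind of hand-written script those libraries are meant to replace. It uses only the Python standard library and does not represent any of the five packages in the post; profile_rows and the sample data are invented for illustration.

```python
import csv
import io
import statistics


def profile_rows(rows):
    """Summarize each column: row count, missing values, distinct values,
    and basic numeric statistics when every non-missing value is numeric.

    profile_rows is a hypothetical helper, not part of any profiling library.
    """
    collected = {}
    for row in rows:
        for column, value in row.items():
            stats = collected.setdefault(
                column, {"count": 0, "missing": 0, "values": []}
            )
            stats["count"] += 1
            if value is None or value == "":
                stats["missing"] += 1
            else:
                stats["values"].append(value)

    summary = {}
    for column, stats in collected.items():
        values = stats["values"]
        entry = {
            "count": stats["count"],
            "missing": stats["missing"],
            "distinct": len(set(values)),
        }
        try:
            numbers = [float(v) for v in values]
        except ValueError:
            numbers = []  # at least one value is not numeric; skip numeric stats
        if numbers:
            entry["mean"] = statistics.mean(numbers)
            entry["min"] = min(numbers)
            entry["max"] = max(numbers)
        summary[column] = entry
    return summary


# Made-up sample data with one missing age
sample_csv = "name,age\nAnn,30\nBob,\nCy,40\n"
summary = profile_rows(csv.DictReader(io.StringIO(sample_csv)))
print(summary["age"])
```

Even this small version shows why people reach for a library: type detection, histograms, and correlations all add up quickly.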
Recently, we hosted Allision Kennedy on the Raw Data by P3 Adaptive Podcast. During the course of the conversation, our co-host Thomas LaRock expressed his frustration at the lack of a simple method to complete what should be a simple task in Power Query. In Tom’s example, he explained he wanted to replace a given value within the column names without individually renaming all of the columns. He pointed out that this has been possible in Office for 20+ years but requires learning some M to complete in Power Query due to the fact that column headers are not considered data.
It turns out, however, that there is an answer here. Read on for that answer.
I hit an error recently on a server that caused backups to fail. The database was backing up to a UNC path. Looking in the SQL Log file and Event Viewer, I found the following error:
The operating system returned the error ‘121(The semaphore timeout period has expired.)’ while attempting ‘DiskChangeFileSize’ on ‘\\uncpath\folder\databasename.bak’.
Read on to see what caused this error.
Paginated Reports have been available in Power BI since 2019. They serve an important purpose, but they are not easy for the average business user to learn, plus they require Power BI Premium to use. In my blog and video today, I will show you how you can use Excel as a substitute for Paginated Report Builder to build simple paginated reports from your Power BI Desktop data model.
Click through to see how.
When I’m working in a client’s Azure environment and they don’t have a delete lock on their production environment, I always work on getting them to have one.
This doesn’t always play nicely with everything in Azure, so read on for Denny’s advice when working with Azure Migrate.
Let’s talk about how to make something that’s already super exciting even more fun by using PowerShell. Why bother with fancy GUIs and polished tools when you can do it the fun way?
Yes, there are lots of good options now when it comes to logging, like structured logs, AWS CloudWatch, Azure Monitor, ELK, etc. Tools that give you a lot of power when it comes to filtering, alerts, and monitoring. However, I still often find myself digging through good ol’ *.log files on a server.
Read on for some good information about how to analyze a log file using nothing more than PowerShell.
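The linked post does this in PowerShell; as a rough illustration of the same idea in Python, the sketch below counts log lines by severity. The line layout ("date time LEVEL message"), the pattern, and the sample lines are all assumptions made for the example, not something taken from the post.

```python
import re
from collections import Counter

# Assumed line layout: "<date> <time> <LEVEL> <message>"
LINE_PATTERN = re.compile(r"^\S+ \S+ (?P<level>[A-Z]+) (?P<message>.*)$")


def count_levels(lines):
    """Count how many lines were logged at each level, skipping lines
    that do not match the assumed layout."""
    counts = Counter()
    for line in lines:
        match = LINE_PATTERN.match(line.strip())
        if match:
            counts[match.group("level")] += 1
    return counts


# Made-up sample lines; in practice these would come from open("app.log")
sample_lines = [
    "2024-01-01 10:00:00 INFO Service started",
    "2024-01-01 10:00:05 ERROR Connection refused",
    "2024-01-01 10:00:09 ERROR Connection refused",
    "not a log line",
]
level_counts = count_levels(sample_lines)
print(level_counts.most_common())
```

Real log formats vary a lot, so the regular expression is the part most likely to need adjusting.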