Press "Enter" to skip to content

Month: May 2022

Importing Data into R

Sebastian Sauer shows off several ways of loading data into R:

Importing data into R can cause headaches for newbies. For some, the concept of relative and absolute paths is new. That’s why I compiled here some recommendations on how to important data into R and on how to ditch the “what’s my path” problem.

Click through for some notes. This post focused on files rather than databases, though that’s a very common way of loading data as well.

Comments closed

Negative Identity Reseeding

Andrea Allred hits identity integer absolute zero:

Recently we had a system go down because we ran out of integers. (Mental note to create an alert when we are getting near to running out of integers.) If we upped the column to a bigint we were going to have to drop all the indexes. This server doesn’t have the capacity needed to do an operation of that size and it was estimated that it would be down for 8 hours while we dropped indexes, upped the column type and added back the indexes. This was way too long as it was early in the work day (had it been evening, it would have been fine).

There was also a concern about how many stored procedures were expecting an int but would need to be modified to a bigint, along with any code. That was a big undertaking and we were in an emergency down.

This reseeding typically works well, though it’s important to know if the code will fail upon getting negative numbers.

Comments closed

Office Hours Text Version

Brent Ozar does some Q&A:

Q: WhatsUpDocs?: Hi Brent, have you ever needed to look at business documentation (check business rules/logic) when consulting or as an employee, but it was severely lacking? Recently joined a different team in work and trying to find simple answers to questions is an uphill struggle…

The vast, vast majority of companies don’t document their technology. The tech is in a constant state of flux, and it’s a miracle if the tech even works, let alone is documented accurately. If you’re the kind of person who needs accurate, up-to-date documentation on the tools you use, you’ll be happier working for very large, slow-moving companies with compliance needs. Think giant global financial corporations.

Click through for the full list.

Comments closed

Generating a Report for Known Issues in Power BI

Gilbert Quevauvilliers builds a report with daily alerts:

I was actually working on my own version of the Known Issues Power BI report, when the Power BI team released their blog post New Power BI Known Issue page | Microsoft Power BI Blog | Microsoft Power BI

In the steps below I am going to show you the report I created, and then how I configured it to alert me when there are updates to new known issues.

I also subscribed to this report on a daily basis, this allows me to quickly and easily be aware of any known issues in Power BI.

Read on to see how Gilbert did this and click through for a copy of the report for your own purposes.

Comments closed

Azure Shared Disk with Zone-Redundant Storage

Dave Bermingham runs some tests:

What makes this interesting is that you can now build shared storage based failover cluster instances that span Availability Zones (AZ).  With cluster nodes residing in different AZs, users can now qualify for the 99.99% availability SLA. Prior to support for ZRS, Azure Shared Disks only supported Locally Redundant Storage (LRS), limiting cluster deployments to a single AZ, leaving users susceptible to outages should an AZ go offline.

There are however a few limitations to be aware of when deploying an Azure Shared Disk with ZRS.

Dave also checks to see how their performance compares to locally-redundant storage.

Comments closed

Shipping Kafka Logs to Kibana via Filebeat

Shivani Sarthi uses Filebeat to perform log shipping:

To ship the Kafka logs, we will be using the filebeat agent. A filebeat agent is a lightweight shipper whose purpose is to forward and centralize the log data.

For filebeat to work, you need to install it as an agent on the desired servers. Filebeat then monitors the log files, collects the log events, and forwards them to the ElasticSearch or LogStash for indexing.

Click through for an Ansible script to install Filebeat, integrate with Kafka, and communicate with Logstash for eventual querying via Kibana.

Comments closed

Optimizing Hive Performance with Tez

Jay Desai has some recommendations around tuning Tez queries:

Tuning Hive on Tez queries can never be done in a one-size-fits-all approach. The performance on queries depends on the size of the data, file types, query design, and query patterns. During performance testing, evaluate and validate configuration parameters and any SQL modifications. It is advisable to make one change at a time during performance testing of the workload, and would be best to assess the impact of tuning changes in your development and QA environments before using them in production environments. Cloudera WXM can assist in evaluating the benefits of query changes during performance testing.

Click through for several configuration and query considerations.

Comments closed

Context Transition in DAX

Marco Russo and Alberto Ferrari explain the idea of context transition in DAX:

Let us state this from the very beginning: context transition is a simple concept. It is a powerful feature aiming to simplify the authoring of DAX code. That said, most new DAX developers find context transition hard to understand, and they consider it to be the major reason behind incorrect results. There are two reasons for developers to feel this way:

– A solid understanding of the difference between the row context and the filter context is an important prerequisite to understand and master the concept of context transition.

– You need to remember when and how the context transition works. Most errors involving context transition are due to the developer forgetting to take the context transition into account, rather than not knowing how it works. Once they realize that context transition is happening, the code suddenly makes sense.

Read on to understand more about the idea of context transition.

Comments closed

Thoughts on Code Obfuscation

Joy George Kunjikkur reminds us of worse times:

Long long ago I was given the special task of hiding code. Hiding code..what? Yes, we have to deliver code in such a way nobody should be able to reverse engineer.

I’ve run into problems around this in modern code as well. For example, using client-side React means that you aren’t going to be able to hide secrets like credentials, connection strings, etc. in a way that users absolutely won’t be able to see them. In the SQL Server world, some companies use encrypted stored procedures, which is a joke considering that you also need to ship the keys to decrypt those procedures, meaning that an enterprising user can get around your obfuscation attempt in moments.

Comments closed