January 30, 2019

Password Protect Everything, Including Hadoop

George Leopold summarizes a recent Securonix report:

The malware spreads via brute-force attacks on weak passwords “or by exploiting one of three vulnerabilities found on Hadoop YARN Resource Manager, Redis [in-memory key-value store service] and ActiveMQ,” Securonix said. Once logged into database services, the malware can for example delete existing databases stored on a server and create another with a ransom note specifying a bitcoin payment.

The security analyst recommends continuous review of cloud-based services like Hadoop and YARN instances and their exposure to the Internet. Along with strong passwords, companies should “restrict access whenever possible to reduce the potential attack surface.”

It’s pretty standard advice: secure your data, password-protect your systems, and minimize the number of computers that get to touch your computers.

Preparing Text Data For Natural Language Processing

Shirin Glander takes us through the process of preparing natural language data for machine learning using Keras:

As with any neural network, we need to convert our data into a numeric format; in Keras and TensorFlow we work with tensors. The IMDB example data from the keras package has been preprocessed to a list of integers, where every integer corresponds to a word arranged by descending word frequency.

So, how do we make it from raw text to such a list of integers? Luckily, Keras offers a few convenience functions that make our lives much easier.

This is a very nice tutorial if you’re new to the process.
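
If you want a concrete picture of that raw-text-to-integers step, here is a minimal sketch using the Python tensorflow.keras Tokenizer (the linked tutorial works in R's keras package, so treat this as an illustration of the idea, not Shirin's code):

# A minimal sketch: raw text to padded integer sequences, assuming the
# tensorflow.keras preprocessing API.
from tensorflow.keras.preprocessing.text import Tokenizer
from tensorflow.keras.preprocessing.sequence import pad_sequences

texts = [
    "the movie was wonderful",
    "the movie was terrible, truly terrible",
]

# Keep the 10,000 most frequent words; anything rarer maps to the
# out-of-vocabulary token.
tokenizer = Tokenizer(num_words=10_000, oov_token="<OOV>")
tokenizer.fit_on_texts(texts)

# Each text becomes a list of integers; more frequent words get
# smaller integers, as in the preprocessed IMDB data.
sequences = tokenizer.texts_to_sequences(texts)

# Pad to a fixed length so everything fits in a 2-D tensor.
padded = pad_sequences(sequences, maxlen=10)
print(tokenizer.word_index)
print(padded)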

Daylight Savings Time Calculations In Power BI

Fred Kaffenberger shows us how to convert UTC to local time zones with daylight savings time:

Quick tip for a DST Refresh Date function in the Power BI Service. I’ll put the code up front, and explain it below. I’ll also say a bit about how to use it at the end. The United States and other places, like Australia, have a pesky thing called Daylight Savings Time. This means that in Central Time US, the offset from Universal Time Coordinated (UTC) is sometimes -6 and other times it’s -5. While Power Query can convert time zones, it doesn’t handle DST. And, my users like to see when the reports were refreshed as a step in evaluating data quality. In 2019, US DST is from March 10 – November 3 (2 AM local time). So, the functions here need to be updated every year.

As promised, here’s the custom function. 

Click through for the custom function and a nice explanation of how it works.
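
For a rough idea of the logic without clicking through: Fred's function is written in Power Query M, but the same calculation sketches out like this in Python (the function names and the Central Time offsets are my own framing; US DST runs from the second Sunday in March to the first Sunday in November):

from datetime import datetime, timedelta

def nth_sunday(year, month, n):
    """Return the nth Sunday of a month as a datetime at midnight."""
    d = datetime(year, month, 1)
    d += timedelta(days=(6 - d.weekday()) % 7)  # advance to the first Sunday
    return d + timedelta(weeks=n - 1)

def utc_to_central(utc):
    """Convert a naive UTC datetime to US Central, honoring DST."""
    # DST starts at 2 AM CST (08:00 UTC) on the second Sunday in March
    # and ends at 2 AM CDT (07:00 UTC) on the first Sunday in November.
    dst_start = nth_sunday(utc.year, 3, 2) + timedelta(hours=8)
    dst_end = nth_sunday(utc.year, 11, 1) + timedelta(hours=7)
    offset = -5 if dst_start <= utc < dst_end else -6
    return utc + timedelta(hours=offset)

print(utc_to_central(datetime(2019, 7, 1, 12, 0)))   # 07:00, CDT (UTC-5)
print(utc_to_central(datetime(2019, 1, 30, 12, 0)))  # 06:00, CST (UTC-6)

Deriving the boundary Sundays from the rule, rather than hardcoding March 10 and November 3, is one way to avoid the yearly maintenance Fred mentions.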

Reporting Services Scale-Out With Docker

Paul Stanton lays out an architecture which uses Windocks to create cloned Reporting Services containers for scale-out:

Database cloning is a key aspect of the SSRS scale-out architecture, with database clones providing each container a complete set of databases. Two or more VMs operated behind a load balancer deliver a highly available and scalable reporting service. This article focuses on Windows SQL Server containers and Windows Virtual Hard Drive (VHD) based cloning, but the same architecture can support SQL Server Linux containers or conventional instances (Windows or Linux). Redgate SQL Clone, for example, would support SQL Server instances. Other options include the use of storage arrays instead of Windows VHD based clones. The trade-offs between SQL containers and instances, and between VHDs and storage arrays, are covered in separate sections below.

The combination of SSRS containers with database cloning is appealing for simplicity and operational savings.  SSRS containers are also drawing interest as part of public cloud strategies, as SSRS containers can be integrated with AWS RDS or SQL Azure databases to provide a horizontally scalable reporting solution.

This is a bit more complex than Reporting Services scale-out with Enterprise Edition, but if you’re on Standard Edition and can’t use scale-out, it’s an interesting alternative.

QueryMemoryLimit In SSAS 2019

Shabnam Watson covers a new setting in Analysis Services 2019:

The purpose of this setting is to limit the amount of memory any single query can take. This setting is extremely useful when you want to limit the amount of memory consumption per query for queries across the board. Before this setting, it was possible to have an extremely poorly written query eat up all of a server’s memory and bring all other queries to a halt. You can see an example of such a query and SSAS memory settings in my previous post here.

Read on for details about what it does and what happens when a query reaches the memory limit.

The Importance of Cardinality

Bert Wagner shows us why cardinality is important to understand when indexing data:

When building indexes for your queries, the order of your index key columns matters.  SQL Server can make the most effective use of an index if the data in that index is stored in the same order as what your query requires for a join, where predicate, grouping, or order by clause.

But if your query requires multiple key columns because of multiple predicates (e.g. WHERE Color = ‘Red’ AND Size = ‘Medium’), what order should you define the columns in your index key column definition?

One of my favorite books on query tuning is a bit long in the tooth at this point but remains quite relevant, and a key point there is to look for ways to drop the largest percentage of rows as soon as possible. The same idea applies to good indexes: they let you ignore as large a share of your irrelevant data as possible, as early as possible.
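
As a back-of-the-envelope illustration of that point (in Python, with made-up Color and Size data echoing Bert's predicates, not his actual example), you can compare the selectivity of each equality predicate to see which column eliminates more rows:

from collections import Counter
import random

random.seed(42)
rows = 10_000
colors = [random.choice(["Red", "Blue", "Green", "Black"]) for _ in range(rows)]
sizes = [random.choice(["Small", "Medium", "Large"]) for _ in range(rows)]

def selectivity(values, match):
    """Fraction of rows an equality predicate keeps; lower is more selective."""
    return Counter(values)[match] / len(values)

print(f"Color = 'Red':    keeps {selectivity(colors, 'Red'):.1%} of rows")
print(f"Size  = 'Medium': keeps {selectivity(sizes, 'Medium'):.1%} of rows")

All else being equal, the predicate that keeps the smaller fraction of rows is the stronger candidate to lead the index key, since it discards the most data earliest.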

Could Not Clear Differential Bitmap

Jack Vamvas takes us through a reason why you might get error 3041:

An error message has started appearing in the SQL Server Error Logs during a nightly full backup.

Could not clear ‘DIFFERENTIAL’ bitmap in database ‘Database1’ because of error 9002. As a result, the differential or bulk-logged bitmap overstates the amount of change that will occur with the next differential or log backup. This discrepancy might slow down later differential or log backup operations and cause the backup sets to be larger than necessary. Typically, the cause of this error is insufficient resources. Investigate the failure and resolve the cause. If the error occurred on a data backup, consider taking a data backup to create a new base for future differential backups.

Click through for the root cause and solution.
