Data Lake Organization Tips

Melissa Coates has some great advice for people working with data lakes:

Q: Partitioning by date is common. Where should the dates go in the folder hierarchy?

Almost always, you will want the dates to be at the end of the folder path. This is because we often need to set security at specific folder levels (such as by subject area), but we rarely set up security based on time elements.

Optimal for folder security: \SubjectArea\DataSource\YYYY\MM\DD\FileData_YYYY_MM_DD.csv

Tedious for folder security: \YYYY\MM\DD\SubjectArea\DataSource\FileData_YYYY_MM_DD.csv

Click through for all of Melissa’s advice in FAQ form.

Name Conflicts In DAX

Kevin Feasel

2019-01-17

DAX, Naming

Marco Russo takes us through an issue with naming in DAX:

You deploy this model and start creating reports using the Sales Returning Customers measure. So far, so good. One day, you need to extend the data model importing a new table that you decide to name ReturningCustomers. As soon as you import the new table named ReturningCustomers, your measure Sales Returning Customers stops working. The reason is that the ReturningCustomers variable generates a name conflict with the table that has the same name, as you can see from the error message.

‘ReturningCustomers’ is a table name and cannot be used to define a variable.

Marco has some advice if you’re in a situation where you are liable to see this pop up.

Naming Conventions In SQL Server

Kevin Feasel

2019-01-14

Naming

Phil Factor explains naming requirements for SQL Server and gives suggestions for conventions to follow:

There are no generally accepted standards for naming SQL objects. Although ISO/IEC 11179 has been referred to as a standard for naming, it actually only sets a standard for defining naming conventions. There is a sample standard in the ‘Naming principles’ document (ISO/IEC 11179-5), but this is merely an example of how a standard should be defined. However, it is quite close to a general good-practice in programming.

When naming a table, it is a good idea to use a collective name or ‘object class term’ for the entity if one exists ( such as Employee, Cost, Tree, component, member, audience, staff or faculty) but use the singular rather than the plural form where possible. For the sake of maintenance, use a consistent naming convention that is informative but brief. It helps greatly to start with a dictionary of the correct nouns and verbs associated with the application domain and use that. If it proves inadequate, then the team can build on it. If a data model has been created as part of the design phase, this dictionary should be an end-product of this work.

As Phil notes at the end, consistency is the most important virtue here. It’s hard to work with a database where you have tables named Employees, employee_dates, and tblFiredEmployee.

Naming Is Hard

Kevin Feasel

2018-12-13

Naming

Tim Mitchell calls out some bad naming practices:

Spaces in object names. The bane of my existence. Most relational databases will allow you to use spaces in object names, requiring anyone accessing that object to put brackets around the name. My rule of thumb is that if it’s something that I will interact with programmatically, it doesn’t get a space in the name. Spaces in object names tend to break things, so please stop doing this.

My favorite naming convention oddity is the idea that if something has “Number” in the name, it’s never a number.

Finding Dependencies On Tables

Kevin Feasel

2018-07-09

Naming

Lori Brown shows us a few methods for finding references to tables:

I use sys.sql_modules to check for references to strings in stored procedures. You might think that this would not be necessary since you can find that in sys.dm_sql_referencing_entities for those. However, what if you do a lot of stuff in stored procedures that uses dynamic SQL to create objects or other stored procs on the fly? sys.dm_sql_referencing_entities might miss those. For example, when I look in sys.sql_modules for all occurrences of Sales.Orders I find more.

This is necessarily an incomplete list, but at least it gets you started.

Missing @@SERVERNAME On Linux

Steve Jones fixes a naming issue on his SQL on Linux installation:

I setup a new instance of SQL Server on Linux some time ago. At the time, the Linux machine didn’t have any Samba running, and no real “name” on the network. As a result, after installing SQL Server I got a NULL when running SELECT @@SERVERNAME.

The fix is easy. It’s what you’d do if you had the wrong name.

Read on for the command, and don’t forget to restart the database engine afterward.

Naming Indexes

Monica Rathbun hits one of my hobby horses:

As you can see from above, none of the names gave a complete indication of what the index encompassed. The first one did indicate it was a Non Clustered Index so that was good, but includes the date which to me is not needed. At least I knew it was not a Clustered Index. The second index did note it is a “Covering Index” which gave some indication that many columns could be included I also know it was created with the Data Tuning Advisor due to the dta prefix. The third index was also created with dta but it was left with the default dta naming convention. Like the first one I know the date it was created but instead of the word Cover2, I know there are 16 key columns are noted by the “K#” and regular numbers tell me those are included columns. However, I still have no idea exactly what these numbers denote without looking deeper into the index. The last index is closer to what I like however the name only tells me one column name when in fact it encompasses five key columns and two included columns.

I absolutely love seeing lots and lots of “_dta_” indexes; it’s a sign that I have a long day ahead of me.

Altering Procedures And Object Definitions

Kevin Feasel

2018-01-11

Naming

Solomon Rutzky shows what metadata doesn’t get updated when you call sp_rename on a T-SQL procedure, function, view, or trigger:

This behavior is noted in the Microsoft documentation for sp_rename:

Renaming a stored procedure, function, view, or trigger will not change the name of the corresponding object name in the definition column of the sys.sql_modules catalog view. Therefore, we recommend that sp_rename not be used to rename these object types. Instead, drop and re-create the object with its new name.

Ok, so we have all been warned, at least when it comes to using sp_rename. But that is not the end of the story. There is, indeed, another way to change the object such that the definition does not reflect its current state. And that other way has to do with something missing from the examples shown thus far, something that wouldn’t be missing had I been following best-practices.

Click through to see what else doesn’t get updated.

Generating Fake Company Names

Kevin Feasel

2017-09-19

Naming

Daniel Hutmacher has a great way of generating fake company names:

This query is actually a lot simpler than it first appears. Here’s how it breaks down:

  • Pick 100 words at random (table “a”)

  • For each word in “a”, if possible, pick a single random word (“b”) that doesn’t start or end with the same three letters as the “a” word.

  • For each word in “b”, if possible, pick a single random word (“c”) that doesn’t start or end with the same three letters as the “a” nor the “b” word.

  • The UPPER(), LEFT() and SUBSTRING() stuff is just to turn the names into title case.

  • As before, the ORDER BY NEWID() randomizes the order in which the TOP (1) row is returned.

My favorite name when running this was Disaster Votes, followed closely by Fail Users Vendor and Terminated Enterprise.  Apparently my SQL Server instance has a very negative impression of my made up companies’ leadership skills.

Names Matter

Kevin Feasel

2017-09-15

Naming

Chris Webb is concerned about the under-use of the name “Power Query”:

In more recent times I’ve written posts with unwieldy names like “Introduction to Insert Topic Name Here in Power Query/Power BI/Excel 2016 Get & Transform” and in the future I suppose this will have to grow to “Introduction to Insert Topic Name Here in Power Query/Power BI/Excel 2016 Get & Transform/Analysis Services Data Loading/Common Data Service”. Tagging and categorising blog posts can help here, I know, but it’s the title of a blog post that’s the main determining factor as to whether it gets read or not when someone is looking at a list of search results. It’s getting ridiculous, but how else can I ensure that someone searching for the solution to a data loading problem in Excel 2016 Get & Transform will find a post I’ve written that contains the answer but shown in Power BI?

Commenters agree that there’s a lot of ambiguity and multiple naming, and that this hurts end users who don’t necessarily know that all of these technologies are really the same Power Query engine.

Categories

February 2019
MTWTFSS
« Jan  
 123
45678910
11121314151617
18192021222324
25262728