Press "Enter" to skip to content

Day: September 27, 2019

HBase and S3

Krishna Maheshwari, et al, explain how we can allow Apache HBase to use S3 for storage:

Cloudera Data Platform (CDP) provides an out-of-the-box solution that allows Apache HBase deployments to use Amazon Simple Storage Service (S3) as its main persistence layer for saving table data. Amazon S3 is an object store which offers a high degree of durability with a pay-per-use cost structure. There is no server-side component to run or manage for S3 — all that is needed is the S3 client library and AWS credentials. However, HBase requires a consistent and atomic filesystem which means that it cannot directly use S3 because it is an eventually consistent object store. Both CDH and HDP have only provided HBase solely using HDFS because there have been long-standing impediments that prevented HBase from natively using S3. To address these issues, we’ve built an out-of-the-box solution which we are delivering for the first time via CDP. When you launch an Operational Database (HBase) cluster on CDP, HBase StoreFiles (the backing files for HBase tables) are stored in S3 and HBase write-ahead-logs (WAL) are stored in an HDFS instance run alongside HBase per usual.

I hadn’t thought of using S3, but it’s an interesting post.

Comments closed

Data Warehouse Concepts

Katrine Spirina takes us through several basic data warehousing concepts:

There are three basic types of modeling. Conceptual Data Model describes all entities a business needs information about. It provides facts about real-world things, customers, and other business-related objects and relations.

The goal of creating this data model is to synthesize and store all the data needed to gain an understanding of the whole business. The model is designed for the business audience.
Logical Data Model suits more in-depth data. It describes the structure of data elements, their attributes, and ways these elements interrelate. For instance, the model can be used to identify relationships between customers and products of interest for them. This model is characterized by a high level of clarity and accuracy.

Physical Data Model describes specific data and relationships needed for a particular case as well as the way data model is used in database implementation. It provides a wealth of meta-data and facilitates visualizing the structure of a database. Meta-data can involve accesses, limitations, indexes, and other features.

Click through for the whole story.

Comments closed

TensorFlow Changes

Ajit Jaokar summarizes key changes from TensorFlow from 1.x to 2.0:

The Data pipeline simplified:  TensorFlow2.0 has a separate module TensorFlow DataSets that can be used to operate with the model in more elegant way. Not only it has a large range of existing datasets, making your job of experimenting with a new architecture easier – it also has well defined way to add your data to it.
In TensorFlow 1.x for building a model we would first need to declare placeholders. These were the dummy variables which will later (in the session) used to feed data to the model. There were many built-in APIs for building the layers like tf.contrib, tf.layers and tf.keras, one could also build layers by defining the actual mathematical operations.
TensorFlow 2.0 you can build your model defining your own mathematical operations, as before you can use math module (tf.math) and linear algebra (tf.linalg) module. However, you can take advantage of the high level Keras API and tf.layers module. The important part is we do not need to define placeholders any more.

These look like some nice improvements.

Comments closed

Get Lock Details Against a Database

David Fowler has a new procedure:

Have you ever wanted a quick and easy way to see who was holding (and waiting on) locks on a particular database? Perhaps you’ve got some blocking issues going on and you want to see exactly which rows the row level locks were taken out on?

sp_LockDetails will return some handy information about the locks held on a specific database, including SPID, login, database, lock type and resource.

David includes the script in the post as well.

Comments closed

Azure Data Factory Pipeline Hierarchies

Paul Andrew explains the idea of pipeline hierarchies with respect to Azure Data Factory:

Next, even if the concept isn’t new, where I’d like to call out two big differences in my approach to orchestration with ADF comes from working within Microsoft Azure. The highly scalable cloud platform presents some new challenges that SSIS simply didn’t. For me these are:

– Needing to consider our wider solution and what things now cost. I’m fairly sure I’ve said it before. When working with ‘Pay-as-you-go’ services we need to think about designing for cost/consumption as well as all our other data transformation and output requirements. In Azure it is so easy to just leave resources running night and day, when only a short window of compute is needed.
– We need to consider the scale out capabilities of the other services that ADF is going to invoke. Or, to put it another way, how much parallel activity execution do we want ADF to achieve? As you may know the ADF ForEach activity by default allows us to execution inner activities in parallel, but is that enough?

It’s a very interesting idea; read the whole thing.

Comments closed

Listing Windows Users with Powershell

Jack Vamvas shows us how we can use Powershell to list Windows users in an Active Directory group:

Question: How can I use Powershell to list out Windows users? Are there Powershell cmdlets which can report on Windows users ?
Answer: There are “out of the box” Powershell cmdlets which will support the requirement . How you apply the powershell cmdlets will depend on how much detail is required.

Jack has a few examples here as well.

Comments closed

Troubles with Dropping Logins

Pamela Mooney takes us through a scenario involving dropping a user and login, and some of the difficulties which might arise:

I had to obscure a lot, but the bottom query results correlate to the top results.  The first line of the bottom query results show the grantor of the permissions, and the bottom line is the grantee.  In this case, a login was explicitly denied impersonation on a server role.  I’m using this example because it is really quirky to fix.  Most often, you’ll just reverse the permissions, using pretty standard syntax. Even easier, right click on the login, go to the “Securables” tab, and remove the permissions.  However, if you are a fan of the TSQL approach, this one is not so straightforward, so it’s a good one to show.  

Click through for a demonstration.

Comments closed