R Internals: Data Sizes With Nullable Columns

Niels Berglund digs into the Binary Exchange Langage (BXL) and notices something weird about data sizes:

When looking at the data sent, the size of the packages and “drilling” into the TCP packets we could deduct that: :

  • Each column has an over-head of 32 bytes (at least for non nullable data)

  • The size of the column in one row is the size of the data type for numeric types.

  • For decimal and numeric an extra byte is added to each column, where this byte indicates the precision.

  • Columns of alpha numeric type all had 2 bytes pre-pended to the bytes, except max types.

  • For char and nchar the storage size was 2 bytes plus the size the column was defined as.

  • For varchar and nvarchar the storage size was 2 bytes plus the size of the data stored.

  • For the varmax data types the number of bytes that were pre-pended varied dependent on the data size.

Read the whole thing.

Related Posts

Using DALEX To Explain Black-Box Models

Przemyslaw Biecek explains that there’s more than LIME for explaining black-box models: I’ve heard about a number of consulting companies, that decided to use simple linear model instead of a black box model with higher performance, because ,,client wants to understand factors that drive the prediction’’. And usually the discussion goes as following: ,,We have tried LIME […]

Read More

Comparing Keras In Python Versus R

Dmitry Kisler performs image classification using Keras in both Python and R: From the plots above, one can see that: the accuracy of your model doesn’t depend on the language you use to build and train it (the plot shows only train accuracy, but the model doesn’t have high variance and the bias accuracy is […]

Read More

Categories

November 2017
MTWTFSS
« Oct Dec »
 12345
6789101112
13141516171819
20212223242526
27282930