Category Archives: Notes

All Articles

Database Concepts or Database Keywords

Data Mart – A single star schema, i.e. dimensions and a fact table that stores data at the detail level. A Data Mart covers a specific area of the business: billing, inventory, transactions, claims, etc.

Data Modeling – Performing analysis on the business processes and data to discover attributes of, and relationships between, data elements. An Entity Relationship Diagram (ERD) is the implementation of a data model. Another way to think of it is to discover the business rules of the data elements.

Data Warehouse – The data warehouse to me is more of a logical concept. All of your data marts (with, hopefully conformed dimensions) plus any ODS or active staging areas. By active, I mean updated in real-time by operational systems primarily in support of the warehouse. To me the warehouse is all of the systems that are required to answer the end users’ decision support questions across an enterprise. This is different than the Enterprise Data Warehouse (EDW) that is the granddaddy data source for all decision support. A data warehouse can reside in a single data or in many databases.

Dimension – A textual table in a star schema. The textual data describes a fact. Dimensions are the context of a measurement (fact). The dimensions included in the primary key of a fact table implement the grain of the fact table.

Dimensional Modeling – Data Modeling with the goal of extracting facts and defining descriptions of those facts from business systems. The relationships between data elements should be known before Dimensional Modeling starts (by previously performing Data Modeling to, at least, the entity level).

DML – Data Manipulation Language. Language used to add (insert), update, or delete records in a database table.

Fact – Measurement. In general, it is the numeric value that is acted upon by a user using a data mart.

Foreign Key – A foreign key (FK) is a link between two tables, i.e. a child or parent record. A foreign key is usually link to a primary key in a different table.

Index – A narrow table, usually containing a subset of a table’s columns, to speed access to a particular record or set of records. An example would be an index on a name field if you frequently search on the name column.

Natural Key – A natural key is a key that has meaning. An employee’s social security number might be used to uniquely identify employees in an HR system (I would NOT recommend this but I have seen it). That would be a natural key.

ODS – Operational Data Store. An ODS exists primarily when data quality is less than optimal. When an operational system cannot provide clean data, it can feed an ODS where the data is cleansed and/or transformed. An ODS is optional and is not a focal point of the warehouse. An ODS can have many sources of data and there can be many ODSs.

OLAP – Online Analytical Processing. Now called Business Intelligence (BI) for the most part. This includes data mining. OLAP used to be the queries that took hours to run but more and more OLAP is happening in real-time. OLAP is extracting historical, statistical, etc. information from the raw data.

OLTP – Online Transaction Processing. This used to be synonymous with real-time processing but now everything needs to happen in real-time. OLTP tends to be single, small statements from end point applications, i.e. POS, Entry Screens, embedded devices, log files, etc.

Oracle Designer – CASE tool developed by oracle. Designer is an end-to-end database design tool. I personally use it for its ERD functionality. At times I feel like even that is way too cumbersome.

Primary Key – A primary Key (PK) uniquely identifies a single row in a table.

Relational Database – RDBMS. A data store designed to store data in tables and columns with pre-defined relationships between tables. Primary and foreign keys define relationships.

Replication – Copying data from one database to another (Publisher and Subscriber Concept).

Schema – A set of related database objects. An HR schema would store personnel and payroll information; a GL schema would store the chart of accounts, etc.

Surrogate Key – A surrogate key is a sequential number that is meaningless. In oracle it’s usually created by a sequence.

.NET INTEGRATION WITH SQL SERVER 2005

 

 

• What are steps to load a .NET code in SQL SERVER 2005?
Write the managed code and compile it to a DLL/Assembly.
After the DLL is compiled using the “CREATE ASSEMBLY” command you can load the assemby into SQL SERVER. Below is the create command which is loading “mycode.dll” into SQL SERVER using the “CREATE ASSEMBLY” command
Syntax
CREATE ASSEMBLY AssemblyName FROM ‘C:/MyAssmbly.dll’

• How can we drop an essembly from SQL SERVER?
DROP ASSEMBLY AssemblyName

• Are changes made to assembly updated automatically in database?
No, it will not synchronize the code automatically. For that you have to drop the assembly (Using DROP ASSEMBLY) and create (Using the CREATE ASSEMBLY again)

• Why do we need to drop assembly for updateing changes?
When we load the assembly into SQL SERVER, it stores into the sys.assemblies system table. So any changes after that to the external DLL/ASSEMBLY will not reflect in SQL SERVER. So we have to DROP and CREATE assembly again in SQL SERVER.

• How to see assemblies loaded in SQL SERVER?
SELECT * FROM sys.assemblies_files

• If I want to see which files are linked with which assemblies?
Use sys.Assemblies_files system tables have the track about which files are associated with what assemblies.
SELECT * FROM sys.assemblies_files

• Does .NET CLR and SQL SERVER run in different process?
.NET CLR engine (hence all the .NET applications) and SQL SERVER run in the same process or address space. This “Same address space architecture” is implemeted so that there no speed issues. If the architecture was implemented the other way (i.e. SQL SERVER and .NET CLR engine running in different memory process area) there would have been reasonable speed issue.

• Does .NET controls SQL SERVER or is it vice-versa?
SQL SERVER controls the way .NET application will run. Normally .NET framework controls the way application should run. But in order that we have high stability and good security SQL SERVER will control the way .NET frame work works with SQL SERVER environment. So lot of things will be controlled through SQL SERVER example: threads, memory allocations, security etc.

SQL SERVER can control .NET framework by “Host Control” mechanism provided by .NET Framework 2.0. Using the “Host Control” framework external application’s can control the way memory management is done, thread allocation’s are done and lot more. SQL SERVER uses “host Control” mechanism exposed by .NET 2.0 and controls the framework.

• Is SQLCLR configured by default?
SQLCLR is not configured by default. If Developers want to use the CLR integration feature of SQL SERVER it has to be enabled by DBA (From the Surface Area Configuration).

• How to configigure CLR for SQL SERVER?
It’s a advanced option you will need to run the following query Analyzer

EXEC Sp_configure ‘show advaced option’, ‘1”; go reconfigure go
EXEC Sp_configure ‘clr enabled’, ‘1’ go reconfigure; go
EXEC Sp_configure ‘show advanced options’, ‘1’; reconfigure
EXEC sp_configure’ clr enabled’ ‘ ‘1’ reconfigure

• Is .NET feature loaded by default in SQL SERVER?
NO it will not be loaded, CLR is lazy loaded that means its only loaded when needed. It goes one step ahead. where the database Administrator has to turn the feature and using the “SP_Configure”.

NOTE: Loading .NET programming consumes some memory resources around 20 to 30 MB(it may vary depending on lot of situations). So if you really need .NET Integration then only go for this option.

• How does SQL Server control .NET at run-time?
.NET CLR exposes interfaces by which an external host can control the way .NET rum time runs. In previous versions of .NET it was done via COM interface “ICorRuntimeHost”.
In previous version you can only do the following with COM interface.

* Specify that whether its server or work station DLL
* Specify version of the CLR (e.g.version 1.1 or 2.0)
* Specify garbage collection behavior
* Specify whether or not jitted cod emay be shared across AppDomains.

In .NET 2.0 it’s done by “ICLRRuntimeHOST” But in .NET 2.0 you can do much above that was provided by the previous COM interface

* Exceptional conditions
* Code loading
* Class loading
* Security particulars
* Resource allocation

SQL Server uses the “ICLRRuntimeHOST” to control .NET run time as the flexibility provided by the interface is far beyond what is given by the previous .NET version, and that ‘s what exactly SQL Server needs, a full control of the .NET run time.

• What’s a “SAND BOX” in SQL SERVER 2005?
“Sand box is a safe place for running semi-trusted programs or scripts, often originationg from the a third party”

Now for SQL SERVER it’s .NET the external third party which is running and SQL SERVER has to be sure that .NET runtime crashes does not affect his working. So in order that SQL Server runs properly there are three sandboxes that user code can run:

Safe Access sandbox: This will be the favorite setting of DBA’s if they are compelled to run CLR- Safe access Safe means you have only access to in-proc data access functionalities. So you can create stored procedures, triggers, functions, data types, triggers etc. But you can not access memory, disk, create files etc. In short you ca not hang the SQL Server.

External access sandbox: In External access you can use some real cool features of .NET like accessing file systems outside box, you can leverage your classes etc. But here you are not allowed to play around with threading, memory allocation etc.

Unsafe access sandbox:In Unsafe access you have access to memory management, threading,etc. So here developers can write unreliable and unsafe code which destabilizes SQL Server. In the first two access levels of sand box its difficult to write unreliable and unsafe code.

Setup Replication MySQL

(for more database related articles)

MySQL replication allows you to have an exact copy of a database from a master server on another server (slave), and all updates to the database on the master server are immediately replicated to the database on the slave server so that both databases are in sync. This is not a backup policy because an accidentally issued DELETE command will also be carried out on the slave; but replication can help protect against hardware failures though.

Steps for setting up replication. The first step is to set up a user account to use only for replication. It’s best not to use an existing account for security reasons. To do this, enter an SQL statement like the following on the master server, logged in as root or a user that has GRANT OPTION privileges:

GRANT REPLICATION SLAVE, REPLICATION CLIENT
    ON *.*
    TO 'replicant'@'slave_host'
    IDENTIFIED BY 'newpassowrd';

In this SQL statement, the user account replicant is granted only what’s needed for replication. The user name can be almost anything. The host name (or IP address) is given in quotes. You have to enter this same statement on the slave server with the same user name and password, but with the master’s host name or IP address.

GRANT REPLICATION SLAVE, REPLICATION CLIENT
    ON *.*
    TO 'replicant'@'master_host'
    IDENTIFIED BY 'newpassowrd';

This way, if the master fails and will be down for a while, you could redirect users to the slave with DNS or by some other method. When the master is back up, you can then use replication to get it up to date by temporarily making it a slave to the former slave server.

Configuring the Servers

Once the replication user(replicant) is set up on both servers, we will need to add some lines to the MySQL configuration file on the master and on the slave server(my.cnf/my.ini). Depending on the type of operating system, the file will probably be called my.cnf or my.ini. On Unix-type systems, the configuration file is usually located in the /etc directory. On Windows systems, it’s usually located in c:\ or in c:\Windows. Using a text editor, add the following lines to the configuration file, under the [mysqld] group heading:

server-id = 1
log-bin = /var/log/mysql/bin.log

The server identification number is an arbitrary number to identify the master server. Almost any whole number is fine. A different one should be assigned to the slave server to keep them straight. The second line above instructs MySQL to perform binary logging to the path and file given. The actual path and file name is mostly up to you. Just be sure that the directory exists and the user mysql is the owner, or at least has permission to write to the directory. Also, for the file name use the suffix of “.log” as shown here. It will be replaced automatically with an index number (e.g., “.000001”) as new log files are created when the server is restarted or the logs are flushed.

For the slave server, we will need to add a few more lines to the configuration file. We’ll have to provide information on connecting to the master server, as well as more log file options. We would add lines similar to the following to the slave’s configuration file:

server-id = 2

master-host = masterservernameoripaddress master-port = 3306
master-user = replicant
master-password = newpassword

log-bin = /var/log/mysql/bin.log
log-bin-index = /var/log/mysql/log-bin.index
log-error = /var/log/mysql/error.log

relay-log = /var/log/mysql/relay.log
relay-log-info-file = /var/log/mysql/relay-log.info
relay-log-index = /var/log/mysql/relay-log.index

This may seem like a lot, but it’s pretty straightforward once you pick it apart. The first line is the identification number for the slave server. If you set up more than one slave server, give them each a different number. If you’re only using replication for backing up your data, though, you probably won’t need more than one slave server. The next set of lines provides information on the master server: the host name as shown here, or the IP address of the master may be given. Next, the port to use is given. Port 3306 is the default port for MySQL, but another could be used for performance or security considerations. The next two lines provide the user name and password for logging into the master server.

The last two stanzas above set up logging. The second to last stanza starts binary logging as we did on the master server, but this time on the slave. This is the log that can be used to allow the master and the slave to reverse roles, as mentioned earlier. The binary log index file (log-bin.index) is for recording the name of the current binary log file to use. As the server is restarted or the logs are flushed, the current log file changes and its name is recorded here. The log-error option establishes an error log. If you don’t already have this set up, you should, since it’s where any problems with replication will be recorded. The last stanza establishes the relay log and related files mentioned earlier. The relay log makes a copy of each entry in the master server’s binary log for performance’s sake, the relay-log-info-file option names the file where the slave’s position in the master’s binary log will be noted, and the relay log index file is for keeping track of the name of the current relay log file to use for replicating.

For the failover in Linux we can use the High Availability Services. We will talk about it later.

Data Types in SQL Server

(for more database related articles)

Explain Varchar and Nvarchar

VARCHAR and NVARCHAR data types are both character data types that are variable-length. Below is the summary of the differences between these 2 data types:

 

 

VARCHAR(n)

NVARCHAR(n)

Character Data Type

Non-Unicode Data

Unicode Data

Maximum Length

8,000

4,000

Character Size

1 byte

2 bytes

Storage Size

Actual Length (in bytes)

2 times Actual Length (in bytes)

 

We would use NVARCHAR data type for columns that store characters from more than one character set or when we will be using characters that require 2-byte characters, which are basically the Unicode characters such as the Japanese Kanji or Korean Hangul characters.

 

 Explain Varchar(MAX), Nvarchar(MAX) and Varbinary(MAX)

 MAX indicates that the maximum storage size is 2^31-1 bytes. The storage size is the actual length of data entered + 2 bytes. The data entered can be 0 characters in length.

We can also declare local variables of VARCHAR(MAX), NVARCHAR(MAX) and VARBINARY(MAX) data types. This is unlike the data types (TEXT, NTEXT and IMAGE). These enhanced data types are replacing because declaring local variables of TEXT, NTEXT and IMAGE data types is not allowed in the previous versions of SQL Server.

Explain XML Data type
We can create variables, parameters, and columns of the xml type. We can optionally associate a collection of XML schemas with a variable, parameter, or column of xml type and this type of xml instance is call typed and if we will not associate xml schemas with the XML Data Type than it is called untyped.