Thursday, December 24, 2015

Library Functions vs System Calls

The functions available to a C program can be classified into two major categories:

1. Library function calls
2. System function calls

Library Functions vs System Calls:

- Library functions are part of the standard C library and execute in user space (e.g. strcmp(), malloc()). System calls switch the execution mode of the program from user mode to kernel mode (e.g. open(), read()).
- Library functions may or may not make system calls. System calls are invoked by library functions (or directly by the program) and are executed by the kernel.
- Library functions are portable: an application using only standard library functions will run on any system with a standard C library. An application relying on system calls may not run on every system, since the system call interface varies from system to system.
- Library functions can be stepped through with a debugger. System calls cannot be debugged the same way, as they execute inside the kernel.


How do system calls switch modes?

Traditionally, the mechanism was to raise the software interrupt 'int $0x80' to the kernel. After trapping the interrupt, the kernel processes it and changes the execution mode from user to kernel mode. Today, the sysenter/sysexit (or, on x86-64, syscall/sysret) instructions are used for switching the execution mode.
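
As a minimal sketch of what this looks like from user space, the following invokes the getpid system call both through its libc wrapper and directly through the generic syscall(2) wrapper; both calls trap into the kernel and return to user mode with the result:

#define _GNU_SOURCE           /* for syscall() on glibc */
#include <stdio.h>
#include <unistd.h>
#include <sys/syscall.h>

int main(void)
{
    /* The libc wrapper and the raw system call should agree. */
    pid_t pid1 = getpid();
    pid_t pid2 = (pid_t) syscall(SYS_getpid);

    printf("getpid() = %d, syscall(SYS_getpid) = %d\n", (int) pid1, (int) pid2);
    return 0;
}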



Monday, November 9, 2015

insmod: ERROR: could not insert module hello.ko: Invalid parameters

I encountered this problem when I tried passing variables from the command line using module_param for the first time.

This problem has a simple solution: the assignment operator used to pass the value must not have any spaces before or after it.

I am using the following code:

#include <linux/module.h>   /* Needed by all modules */

#include <linux/kernel.h>   /* Needed for KERN_INFO */
#include <linux/moduleparam.h>

static int myvar = 1;
module_param(myvar, int, 0);
MODULE_PARM_DESC(myvar, "An integer");

static int __init init_hello(void)
{
        printk(KERN_INFO "Hello world 1.\n");
        printk(KERN_INFO "myvar is an integer: %d \n", myvar);
        return 0;
}

static void __exit cleanup_hello(void)
{
        printk(KERN_INFO "Goodbye world 1.\n");
}
module_init(init_hello);
module_exit(cleanup_hello);

MODULE_LICENSE("GPL");
MODULE_AUTHOR("Sandesh Shrestha");

After compiling with make, I used the following command to insert the module:

ubuntu@sandesh:~/cpractice[00:44]$  sudo insmod hello.ko myvar=10

Note that there are no spaces before or after the assignment operator.
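
For reference, a minimal kbuild Makefile of the usual form builds this module (a sketch, assuming the source file is hello.c; recipe lines must be indented with a tab):

obj-m := hello.o

all:
	make -C /lib/modules/$(shell uname -r)/build M=$(PWD) modules

clean:
	make -C /lib/modules/$(shell uname -r)/build M=$(PWD) clean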

Sunday, October 4, 2015

Process priority in Linux

Linux uses priority-based scheduling: it ranks processes based on their worth and need for processor time. The Linux kernel implements two separate priority ranges.

1. Nice Values:
             A nice value is a number between -20 and +19 with a default of 0. A larger nice value corresponds to a lower priority, i.e. the process is being "nice" to the other processes in the system. Thus, processes with lower nice values receive a greater proportion of the system processor than processes with higher nice values. In Mac OS X, the nice value controls the absolute size of the time slice; in Linux, it controls the proportion of the time slice.

The ps -el command lists processes along with their corresponding nice values.

The getpriority and setpriority functions get and set nice values. (http://linux.die.net/man/2/setpriority)
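
As a minimal sketch of these two calls, the following reads and then raises the calling process's own nice value (raising is always allowed; lowering below 0 requires privileges):

#include <stdio.h>
#include <sys/resource.h>

int main(void)
{
    /* which = PRIO_PROCESS, who = 0 means "the calling process". */
    printf("current nice value: %d\n", getpriority(PRIO_PROCESS, 0));

    if (setpriority(PRIO_PROCESS, 0, 10) == -1)   /* be nicer: nice = 10 */
        perror("setpriority");

    printf("new nice value: %d\n", getpriority(PRIO_PROCESS, 0));
    return 0;
}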

Nice values can be set from the Linux command line using the nice command. To run a job at low priority so it does not slow down other processes, we can use the following command.

$ nice -n 19 tar cvzf archive.tgz largefile

2. Real-time priority:
         These priority values range from 0 to 99 inclusive, with the default being 0. Unlike nice values, higher real-time priority values mean higher priority. All real-time processes run at a higher priority than normal processes.

The command below gives the list of processes and their real-time priority values under the column marked RTPRIO.

ps -eo state,uid,pid,ppid,rtprio,time,comm

A value of "-" means the process is not real time.
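
Real-time priority can be assigned programmatically with sched_setscheduler(); a minimal sketch (the priority value 50 is an arbitrary choice, and running this requires root):

#include <stdio.h>
#include <sched.h>

int main(void)
{
    struct sched_param sp = { .sched_priority = 50 };

    /* pid 0 = the calling process; SCHED_FIFO and SCHED_RR are the
     * two classic real-time policies. */
    if (sched_setscheduler(0, SCHED_FIFO, &sp) == -1) {
        perror("sched_setscheduler");
        return 1;
    }
    printf("running with real-time priority %d\n", sp.sched_priority);
    return 0;
}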






Friday, October 2, 2015

Types of Kernel

Kernels can be classified into the following categories:

1. Monolithic
                In this type of architecture, all the basic system services, like process and memory management, interrupt handling, etc., are packaged into a single module in kernel space.

Earlier versions of this architecture had the following drawbacks:

      a. The size of the kernel was huge.
      b. Bug fixes or the addition of new features required recompilation of the whole kernel.

However, the modern architecture that Linux uses is much better: it can load modules dynamically, facilitating easy extension of the OS's capabilities.

Linux follows the monolithic modular approach.

2. Microkernels:
           In this type of architecture, only the bare minimum is implemented in kernel space: memory protection, process scheduling and inter-process communication (IPC). This decreases the kernel size and increases the security and stability of the OS. Other basic services, like device driver management, protocol stacks, file systems, graphics, etc., are implemented in user space and run as servers. These servers differ from other user space programs in that the kernel grants them permission to interact with physical memory, which is usually off limits to most programs; thus these servers can interact directly with the hardware. These services are started at system startup.

QNX follows the microkernel approach.

3. Hybrid Kernel:
            The hybrid kernel is the most successful kernel implementation and is used in operating systems like Windows NT and above and Mac OS X. This type of kernel is an extension of the microkernel with some properties of a monolithic kernel: some non-essential code is kept in kernel space so that it runs faster than it would in user space.

The hybrid kernel approach combines the speed and simpler design of a monolithic kernel with the modularity and execution safety of a microkernel.

Wednesday, September 23, 2015

Named Pipe(FIFO) and Unnamed Pipe(Pipe) : Differences


Pipes are a synchronized way of passing information between processes.

Differences:


Unnamed Pipe (PIPE) vs Named Pipe (FIFO):

- Unnamed pipes are unidirectional; named pipes can be bidirectional.
- Unnamed pipes can only be used by related processes (parent/child, or children of the same parent); named pipes can be used by any processes, including unrelated ones.
- Unnamed pipes are opened at the time of creation only; named pipes are not open at the time of creation.
- Unnamed pipes exist only as long as their file descriptors are open; named pipes exist as directory entries.
- Unnamed pipes are created with pipe(); named pipes are created with mkfifo().
  Eg: int pipe(int fd[2]), where fd[0] is the descriptor used for reading and fd[1] is the descriptor used for writing.
  Eg: int mkfifo(const char *path, mode_t mode)
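
As a small illustration of the unnamed-pipe API, here is a sketch in which a parent sends one message to its child (error handling trimmed for brevity):

#include <stdio.h>
#include <unistd.h>
#include <sys/wait.h>

int main(void)
{
    int fd[2];

    if (pipe(fd) == -1) { perror("pipe"); return 1; }

    if (fork() == 0) {            /* child: the reading end */
        char buf[32];
        ssize_t n;

        close(fd[1]);             /* close the unused write end */
        n = read(fd[0], buf, sizeof(buf) - 1);
        if (n > 0) {
            buf[n] = '\0';
            printf("child read: %s\n", buf);
        }
        close(fd[0]);
        return 0;
    }

    close(fd[0]);                 /* parent: the writing end */
    write(fd[1], "hello", 5);
    close(fd[1]);
    wait(NULL);
    return 0;
}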

Sunday, September 13, 2015

Character devices and Block devices


There are two main types of devices under Linux systems, block devices and character devices.

Character Devices vs Block Devices:

- Character devices convey data character by character between the device and the user program; block devices read/write data in fixed-size blocks.
- Eg: mouse, keyboard (character devices); USB drive, hard disk (block devices).
- No buffering is performed for character devices; block devices are accessed through a buffer.
- In the output of ls -l (cd to /dev and run ls -l), the first character is 'c' for a character device and 'b' for a block device.
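
A small sketch that asks stat(2) which kind of device a given /dev entry is, using the S_ISCHR and S_ISBLK test macros:

#include <stdio.h>
#include <sys/stat.h>

int main(int argc, char *argv[])
{
    struct stat st;

    if (argc < 2 || stat(argv[1], &st) == -1) {
        perror("stat");
        return 1;
    }

    if (S_ISCHR(st.st_mode))
        printf("%s is a character device\n", argv[1]);
    else if (S_ISBLK(st.st_mode))
        printf("%s is a block device\n", argv[1]);
    else
        printf("%s is neither\n", argv[1]);
    return 0;
}

Running it as ./a.out /dev/sda should report a block device, while ./a.out /dev/tty should report a character device.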
 

Saturday, September 5, 2015

Understanding Recursion

Recursion is a method where the solution to a problem depends on solutions to smaller instances of the same problem.

The most common example of recursion is the factorial. Let us see this with an example.

#include <stdio.h>

int factorial(int n){
    int F;
    printf("I am calculating the factorial of %d\n",n);
    if(n==0)
        return 1;
   
    F = n*factorial(n-1);
    printf("The calculation of factorial is done for %d\n",n);
    return F;
}

int main(){
  int n,result;

  printf("Enter the number to calculate the factorial \n");
  scanf("%d",&n);

  result = factorial(n);
  printf("The factorial is %d \n",result);
}

Can you guess the output of this program and the order in which the print statements appear?

Let's assume the number for which the factorial is to be calculated is 5.

The factorial function is called with the value 5 --> the first printf statement is printed --> the recursive call is then reached.

The calculation of 5*factorial(4) is paused, as factorial(4) needs to be calculated first.

So it goes to the beginning of the function: the second printf statement is printed with the value 4.

The calculation 4*factorial(3) is paused, as factorial(3) needs to be calculated first.

So it goes to the beginning of the function: the third printf statement is printed with the value 3.

The calculation 3*factorial(2) is paused, as factorial(2) needs to be calculated first.

So it goes to the beginning of the function: the fourth printf statement is printed with the value 2.

The calculation 2*factorial(1) is paused, as factorial(1) needs to be calculated first.

So it goes to the beginning of the function: the fifth printf statement is printed with the value 1.

The calculation 1*factorial(0) is paused, as factorial(0) needs to be calculated first.

So it goes to the beginning of the function: the sixth printf statement is printed with the value 0. At this point n == 0, so factorial(0) returns the value 1.

Now the paused calculations take place as follows:

Since factorial(0) =1,
The fifth calculation:factorial(1) = 1*factorial(0) =1 and the print statement is printed with n=1
The fourth calculation:factorial(2) = 2* factorial(1)=2 and the print statement is printed with n=2
The third calculation:factorial(3) = 3*factorial(2) =6 and the print statement is printed with n=3
The second calculation:factorial(4) = 4*factorial(3) = 24 and the print statement is printed with n=4
The first calculation:factorial(5) = 5*factorial(4) =120 and the print statement is printed with n=5

Finally F =120 and the value is printed.

The final output looks like this:

Enter the number to calculate the factorial: 5
I am calculating the factorial of 5
I am calculating the factorial of 4
I am calculating the factorial of 3
I am calculating the factorial of 2
I am calculating the factorial of 1
I am calculating the factorial of 0
The calculation of factorial is done for 1
The calculation of factorial is done for 2
The calculation of factorial is done for 3
The calculation of factorial is done for 4
The calculation of factorial is done for 5
The factorial is 120


Saturday, August 29, 2015

C System Calls and FILE I/O library function

System calls are special functions available in some programming languages that programs use to communicate directly with the operating system. The OS talks back to the program through the return value of the function. When a system call is made, control is relinquished to the operating system to perform the call, and the program is blocked until the call has finished. We should always check the return value, as this is the only way the operating system communicates with the program. These functions are declared in the header files unistd.h, sys/stat.h and sys/types.h. The return value is an integer, and the operations take place on file descriptors.

There are more than 100 system calls wrapped by the C library. A few of them are:

1. close : This closes the file descriptor.
    Function Definition : int close(int fildes);
2. dup : It provides an alias (a duplicate descriptor) for the provided file descriptor.
    Function Definition : int dup(int fildes);
3. dup2 : It makes a second, given descriptor an alias for the provided file descriptor, first closing that second descriptor if it was open.
    Function Definition : int dup2(int fildes, int fildes2);
4. fstat : It is used to determine information about a file based on its file descriptor.
    Function Definition : int fstat(int fildes, struct stat *buf); The second parameter stores the data about the file.
5. lseek : changes the location of the read/write pointer of a file descriptor. The location can be set in either absolute or relative terms.
    Function Definition: off_t lseek(int fildes, off_t offset, int whence);
    whence : the way in which the offset is to be interpreted (relative, absolute, etc.):
 
    Value       Meaning
    SEEK_SET    Offset is measured in absolute terms.
    SEEK_CUR    Offset is measured relative to the current location of the pointer.
    SEEK_END    Offset is measured relative to the end of the file.
6. lstat : determines information about a file based on its filename (without following symbolic links).
    Function Definition: int lstat(const char *path, struct stat *buf);
7. open : opens a file (creating it if requested) and obtains its file descriptor.
    Function Definition:
                        int open(const char *path, int oflags);
                        int open(const char *path, int oflags, mode_t mode);
8. read : reads data into a buffer.
    Function Definition : ssize_t read(int fildes, void *buf, size_t nbytes);
9. stat :  used to determine information about a file based on its file path.
    Function Definition:  int stat(const char *path, struct stat *buf);
10. write : used to write data out of a buffer.
    Function Definition : ssize_t write(int fildes, const void *buf, size_t nbytes);
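
Putting a few of these together, here is a sketch that opens a file (test.txt is a placeholder name), seeks past the first 5 bytes, and reads what follows:

#include <stdio.h>
#include <fcntl.h>
#include <unistd.h>

int main(void)
{
    char buf[64];
    ssize_t n;
    int fd = open("test.txt", O_RDONLY);

    if (fd == -1) { perror("open"); return 1; }

    lseek(fd, 5, SEEK_SET);            /* absolute offset 5 */
    n = read(fd, buf, sizeof(buf) - 1);
    if (n >= 0) {
        buf[n] = '\0';
        printf("read %zd bytes: %s\n", n, buf);
    }
    close(fd);
    return 0;
}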


In contrast to these system call wrappers, the C library provides other functions for file I/O. These operate on pointers of type FILE (i.e. streams) rather than file descriptors. They are:

1. fopen: opens the file named by filename using the given mode.
    Function Definition :  FILE *fopen(const char *filename, const char *mode);
2. fprintf: sends formatted output to a stream.
    Function Definition : int fprintf(FILE *stream, const char *format, ...)
3. fscanf: reads formatted input from a stream.
    Function Definition: int fscanf(FILE *stream, const char *format, ...)
4. fputc: writes a character (an unsigned char) to the specified stream and advances the position indicator for the stream.
    Function Definition : int fputc(int c, FILE *stream)
5. fgetc: gets the next character (an unsigned char) from the specified stream and advances the position indicator for the stream.
    Function Definition: int fgetc(FILE *stream)

Two other functions are used for I/O operations on binary data via streams:

1. fread : reads data from the given stream into the array pointed to by ptr.
    Function Definition : size_t fread(void *ptr, size_t size, size_t nmemb, FILE *stream)
2. fwrite : writes data from the array pointed to by ptr to the given stream.
    Function Definition : size_t fwrite(const void *ptr, size_t size, size_t nmemb, FILE *stream)
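
As a sketch of this binary I/O pair, the following round-trips a small array through a file (data.bin is a placeholder name):

#include <stdio.h>

int main(void)
{
    int out[3] = { 10, 20, 30 }, in[3] = { 0 };
    FILE *fp;

    fp = fopen("data.bin", "wb");          /* write the array out */
    if (fp == NULL) { perror("fopen"); return 1; }
    fwrite(out, sizeof(int), 3, fp);
    fclose(fp);

    fp = fopen("data.bin", "rb");          /* read it back */
    if (fp == NULL) { perror("fopen"); return 1; }
    fread(in, sizeof(int), 3, fp);
    fclose(fp);

    printf("%d %d %d\n", in[0], in[1], in[2]);
    return 0;
}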

Binary Files and Text Files

http://fileinfo.com/help/binary_vs_text_files

Tuesday, August 25, 2015

Dynamic Memory Allocation in C

Many programmers think that dynamic memory allocation is difficult to learn. I was one of them. However, after going through some tutorials I found it to be quite easy.

Why dynamic memory allocation ?

Allocating large data objects at compile time is not always practical, especially if they are used infrequently and for a short time. So, such data objects can be allocated at run time instead.

What is dynamic in dynamic memory allocation ?

It is called dynamic because the memory is allocated at run-time as opposed to compile time. One more important thing to note about dynamic memory allocation is that the memory is allocated in heap as opposed to stack.

How to implement dynamic memory allocation ?

C provides four different functions for dynamic memory allocation. They are:

1. malloc : The malloc() function returns a void pointer to the allocated memory, or NULL if the
memory could not be allocated. You should always check the returned pointer for NULL. Please note that there is a big difference between a void pointer and a NULL pointer: a void pointer has no associated type but still holds a valid address, whereas a NULL pointer does not point to any address at all.

Syntax :  data_type *p = (data_type *) malloc(size);
(data_type *) is the cast; malloc returns a void pointer, so it is cast to the correct pointer type. (In C the cast is optional, since void * converts implicitly.)

Eg : int *data;
        data = (int *) malloc(sizeof(int));

  •  The pointer returned by malloc() should be saved in a static variable, unless you are sure that the memory block will be freed before the pointer variable is discarded at the end of the block or the function.
  •  You should always free a block of memory that has been allocated by malloc() when you are finished with it. If you rely on the operating system to free the block when your program ends, there may be insufficient memory to satisfy additional requests for memory allocation during the rest of the program’s run.
  • Avoid allocating small blocks (that is, less than 25 or 50 bytes) of memory. There is always some overhead when malloc() allocates memory—16 or more bytes are allocated in addition to the requested memory.


2. calloc : The calloc library function also allocates memory, but with two major differences:
     a. calloc takes two parameters, the number of elements and the size of each element. The product of these determines the size of the memory block to allocate.
     b. It initializes the memory it allocates to zero.

      Syntax : data_type *p = (data_type *) calloc(n, size);
      Eg : int *data;
              data = (int *) calloc(n, sizeof(int));

     There is a limit to the number of bytes that malloc or calloc can allocate at one time; with certain (16-bit) compilers that limit is 65,510 bytes.


3. free : This function returns allocated memory back to the heap. The free() function is almost foolproof. Setting a pointer to NULL after freeing it is a safe practice, because calling free() on a NULL pointer has no effect. Errors could occur, however, when you try to free memory that

  • Was not allocated with one of the memory allocation functions,
  • Has been released through a prior call to free() or a call to realloc(),
  • Is currently in use by another thread in a multi-threaded operating system,
  • Is not yours to free.
4. realloc : When we are not sure how much memory a certain operation requires, we can first allocate a small amount using calloc() or malloc() and later call realloc() to make the block larger.

Syntax : void *realloc(void *ptr, size_t size) 

The pointer ptr points to a memory block previously allocated with malloc, calloc or realloc. If it is NULL, a new block of memory is allocated and a pointer to that block is returned.

size is the new size for the memory block, in bytes. If it is 0 and ptr points to an existing block of memory, the block pointed to by ptr is deallocated and a NULL pointer is returned.
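
A minimal sketch of the usual realloc pattern, growing a buffer while guarding against failure (on failure the old block remains valid, so its pointer must not be lost):

#include <stdio.h>
#include <stdlib.h>

int main(void)
{
    int *data = malloc(4 * sizeof(int));   /* start with room for 4 ints */
    int *tmp;

    if (data == NULL) return 1;

    tmp = realloc(data, 8 * sizeof(int));  /* grow to 8; old contents kept */
    if (tmp == NULL) {
        free(data);                        /* old block is still ours */
        return 1;
    }
    data = tmp;

    data[7] = 42;
    printf("data[7] = %d\n", data[7]);
    free(data);
    return 0;
}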

memset: This library function does not allocate or deallocate memory but initializes a block of memory to the value passed as its parameter.

Syntax :  void *memset(void *str, int c, size_t n)


Parameters
  • str -- This is a pointer to the block of memory to fill.
  • c -- This is the value to be set. The value is passed as an int, but the function fills the block of memory using the unsigned char conversion of this value.
  • n -- This is the number of bytes to be set to the value.
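
For example, zeroing a freshly malloc'd buffer with memset gives it the same all-zero start that calloc would have provided:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int main(void)
{
    char *buf = malloc(16);

    if (buf == NULL) return 1;
    memset(buf, 0, 16);          /* set all 16 bytes to the value 0 */
    printf("buf[0] = %d, buf[15] = %d\n", buf[0], buf[15]);
    free(buf);
    return 0;
}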

Sunday, August 23, 2015

Array of Structure

We can use a structure to store the values of one particular object, but if we want to store the values of 100 such objects we need an array of structures.

For eg: if we want to store the details of a book, like its author, number of pages and price, we can define such a structure as follows:

struct bookinfo{
    char author[20];
    int pages;
    float price;
};

Now, to define three such records, we can declare an array of the above structure as follows:

struct bookinfo record[3];

Let's see this using a C program:

 #include <stdio.h>

 struct bookinfo{
          char author[20];
          int pages;
          float price;
 };

 int main(){
      int i;
      struct bookinfo record[3];
      char suffix[3][3]={"st","nd","rd"};

      for(i=0;i<3;i++)
      {
             printf("Enter the author of first book: ");
             scanf("%s",record[i].author);
             printf("Enter the number of pages in the book: ");
             scanf("%d",&record[i].pages);
             printf("Enter the price of the book: ");
             scanf("%f",&record[i].price);
      }

      for(i=0;i<3;i++)
      {
         printf("\nThe author of %d%s book is: %s \n",i+1,suffix[i],record[i].author);
         printf("The number of pages in %d%s book is: %d \n",i+1,suffix[i],record[i].pages);
         printf("The price of %d%s book is: %f \n",i+1,suffix[i],record[i].price);
      }
      return 0;
 }


OUTPUT:

Enter the author of first book: fasdf
Enter the number of pages in the book: 32
Enter the price of the book: 34
Enter the author of first book: phy
Enter the number of pages in the book: 45
Enter the price of the book: 56.5
Enter the author of first book: math
Enter the number of pages in the book: 54
Enter the price of the book: 45.7

The author of 1st book is: fasdf
The number of pages in 1st book is: 32
The price of 1st book is: 34.000000

The author of 2nd book is: phy
The number of pages in 2nd book is: 45
The price of 2nd book is: 56.500000

The author of 3rd book is: math
The number of pages in 3rd book is: 54
The price of 3rd book is: 45.700001 

Tuesday, August 18, 2015

GCC Compilation Process

GCC compiles a C/C++ program into an executable in 4 steps. For example, "gcc -o hello.exe hello.c" is carried out as follows:

Let's understand this with a simple example.

#include <stdio.h>
int main(){
   printf("This is an example of compilation process using gcc");
   return 0;
}

1. Preprocessing: via the GNU C Preprocessor (cpp.exe), which includes the headers (#include) and expands the macros (#define). The command is: cpp hello.c > hello.i . Note that the command is cpp, not gcc, as we are invoking the C preprocessor directly.

The resulting intermediate file "hello.i" contains the expanded source code. Click on the link below to see what the file looks like.

http://pastebin.com/YBr6yt1s

2. Compilation: The compiler compiles the pre-processed source code into assembly code for the target processor.

The command is : gcc -S hello.i

The -S option tells gcc to stop after producing assembly code instead of object code. The resulting assembly file is "hello.s".

3. Assembly : The assembler (as.exe) converts the assembly code into machine code (object code).

The command is : as -o hello.o hello.s

4. Linking: Finally, the linker links the object code with the library code to produce an executable file.

ld -o hello.exe hello.o ...libraries...

We can see the detailed compilation process by enabling the -v (verbose) option.

gcc -v hello.c -o hello.exe

Saturday, August 15, 2015

Function Pointers in C

Pointers point to the address of some variable and can be dereferenced to get the value of the data they point to. In addition to data, pointers can also store the address of a function, and such a pointer can be dereferenced to execute the function.

Let's understand this with the following example:

#include <stdio.h>

int add(int x, int y){
 return x+y;
}

int main(){

    int c;
    int (*p)(int,int);
    p = add;  // p = &add can also be used to assign the address of  add function to p.
    c=  p(2,3); // c= (*p)(2,3) can also be used to de-reference the pointer p and run the function.
    printf("%d\n",c);
}

The add function adds two integers x and y and returns their sum.

In the function main, the function pointer is declared using the syntax: int (*p)(int, int).
This means that the pointer variable p can point to any function that takes two integers as input and returns an integer.

The syntax p = add; or p = &add; stores the address of the add function in the pointer variable p.

The syntax c = p(2,3) or c = (*p)(2,3) calls the function add by dereferencing the pointer p; the return value is stored in c, as c is an int variable.
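
A common real-world use of function pointers is passing callbacks. As a sketch, the standard library's qsort receives its comparator through a function pointer parameter:

#include <stdio.h>
#include <stdlib.h>

/* qsort calls this comparator through a function pointer. */
static int cmp_int(const void *a, const void *b)
{
    int x = *(const int *) a, y = *(const int *) b;
    return (x > y) - (x < y);
}

int main(void)
{
    int v[] = { 3, 1, 2 };

    qsort(v, 3, sizeof(int), cmp_int);   /* cmp_int decays to a pointer */
    printf("%d %d %d\n", v[0], v[1], v[2]);
    return 0;
}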


Sunday, August 9, 2015

Windows Programming in C

The best tutorial that I have come across so far for getting started with Windows programming in C is:

http://www.winprog.org/tutorial/simple_window.html

The program compiles perfectly, and the output is a basic window.
Although we rarely use plain C for Windows programming, it's good to understand what happens behind the scenes when we create Windows applications using frameworks like .NET.

The code is as follows:

#include <windows.h>

const char g_szClassName[] = "myWindowClass";

// Step 4: the Window Procedure
LRESULT CALLBACK WndProc(HWND hwnd, UINT msg, WPARAM wParam, LPARAM lParam)
{
    switch(msg)
    {
        case WM_CLOSE:
            DestroyWindow(hwnd);
        break;
        case WM_DESTROY:
            PostQuitMessage(0);
        break;
        default:
            return DefWindowProc(hwnd, msg, wParam, lParam);
    }
    return 0;
}

int WINAPI WinMain(HINSTANCE hInstance, HINSTANCE hPrevInstance,
    LPSTR lpCmdLine, int nCmdShow)
{
    WNDCLASSEX wc;
    HWND hwnd;
    MSG Msg;

    //Step 1: Registering the Window Class
    wc.cbSize        = sizeof(WNDCLASSEX);
    wc.style         = 0;
    wc.lpfnWndProc   = WndProc;
    wc.cbClsExtra    = 0;
    wc.cbWndExtra    = 0;
    wc.hInstance     = hInstance;
    wc.hIcon         = LoadIcon(NULL, IDI_APPLICATION);
    wc.hCursor       = LoadCursor(NULL, IDC_ARROW);
    wc.hbrBackground = (HBRUSH)(COLOR_WINDOW+1);
    wc.lpszMenuName  = NULL;
    wc.lpszClassName = g_szClassName;
    wc.hIconSm       = LoadIcon(NULL, IDI_APPLICATION);

    if(!RegisterClassEx(&wc))
    {
        MessageBox(NULL, "Window Registration Failed!", "Error!",
            MB_ICONEXCLAMATION | MB_OK);
        return 0;
    }

    // Step 2: Creating the Window
    hwnd = CreateWindowEx(
        WS_EX_CLIENTEDGE,
        g_szClassName,
        "The title of my window",
        WS_OVERLAPPEDWINDOW,
        CW_USEDEFAULT, CW_USEDEFAULT, 240, 120,
        NULL, NULL, hInstance, NULL);

    if(hwnd == NULL)
    {
        MessageBox(NULL, "Window Creation Failed!", "Error!",
            MB_ICONEXCLAMATION | MB_OK);
        return 0;
    }

    ShowWindow(hwnd, nCmdShow);
    UpdateWindow(hwnd);

    // Step 3: The Message Loop
    while(GetMessage(&Msg, NULL, 0, 0) > 0)
    {
        TranslateMessage(&Msg);
        DispatchMessage(&Msg);
    }
    return Msg.wParam;
}

Saturday, August 8, 2015

printf in C

Format:
int printf ( const char * format, ... );

Function:
This prints formatted output to stdout.
It writes the C string pointed to by format to the standard output (stdout). If format includes format specifiers (subsequences beginning with %), the additional arguments following format are formatted and inserted in the resulting string, replacing their respective specifiers.

Return Value:
printf is a function and like any other function it has input parameters and a return value.
On success, the total number of characters written is returned. On failure, a negative number is returned. Thus the following C program makes perfect sense.

#include <stdio.h>

int main(void); // Declare main() and the fact that this program doesn't use any passed parameters
int main(void)
{

    int i;
    int nCount = 0; // Always initialize your auto variables
    char szString[] = "We want to impress you %d\n";

    for (i = 0; i < 5; i++)
    {
        nCount += printf(szString, i + 1);   //The return value of printf is 25 which is added to nCount
    }
    return (nCount); // Brackets around all return values
}

Output:

We want to impress you 1
We want to impress you 2
We want to impress you 3
We want to impress you 4
We want to impress you 5

Each line is 25 characters long, so nCount (and the program's exit status) ends up as 125.

Scanset in C

The scanf family of functions supports scanset specifiers, represented by %[]. Inside a scanset, we can specify a single character or a range of characters. While processing a scanset, scanf will consume only those characters which are part of the scanset. We define a scanset by putting characters inside square brackets. Please note that scansets are case-sensitive.

For Eg:

If we want to read only capital letters from stdin, we can use the following:

scanf("%[A-Z]", str);

If we want to read only small letters from stdin, we can use the following:

scanf("%[a-z]", str);

If the first character of the scanset is '^', the specifier matches everything up to the first occurrence of any character in the set. For example, the scanset below will read all characters but stop at the first occurrence of 'o':

scanf("%[^o]", str);

To read a line from stdin into the buffer pointed to by str, stopping at a terminating newline (or EOF), we can use:

scanf("%[^\n]", str);
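
A small complete sketch using this last form, with a width of 99 so the 100-byte buffer cannot overflow:

#include <stdio.h>

int main(void)
{
    char str[100];

    /* %99[^\n] reads up to 99 characters, stopping at a newline. */
    if (scanf("%99[^\n]", str) == 1)
        printf("you typed: %s\n", str);
    return 0;
}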

Thursday, August 6, 2015

How are integers stored in binary

In a C program, when we use the following statements:

int a = 12;

The compiler sets aside 4 bytes of memory for a (on most modern platforms an int is 32 bits) and stores 12 in binary in those 4 bytes.

In binary 12 = 1100, which when expanded to store in 4 bytes will be as follows:
00000000 00000000 00000000 00001100.

Let's assume the memory addresses used to store these 4 bytes are 1001, 1002, 1003 and 1004.
The rightmost bit is called the least significant bit and the leftmost bit the most significant bit. Each byte is stored at a different memory address. If we store the least significant byte at the lowest memory address, i.e. store 00001100 at 1001, the scheme is called little endian. If, on the other hand, we store the most significant byte at the lowest memory address, i.e. store 00000000 at 1001, it is called big endian.

Integers can be positive or negative. One way to store negative numbers in binary (sign-magnitude representation) is to set the most significant bit to 1 to denote a negative number. So -12 is stored as follows:

10000000 00000000 00000000 00001100.
So, we have only 31 bits left for the magnitude. Thus the largest integer that can be stored is 2^31 - 1 = 2,147,483,647 and the smallest is -(2^31 - 1) = -2,147,483,647.

However, there are two major problems when we store integers like this.

1. Let's suppose we want to store values from -3 to +3 using 3 bits. There are 7 numbers in this range, but 8 possible bit patterns:

0 - 000
1 - 001
2 - 010
3 - 011
-0 - 100
-1 - 101
-2 - 110
-3 - 111

Here we can see that 0 has two representations (000 and 100), which creates confusion.

2. Arithmetic operations do not give the correct results.

If we add +1 and -1, it should output 0. Here,

 001 + 101 = 110 which is -2 and that is wrong.

So, to avoid these problems, we use 2's complement to store negative numbers. In 2's complement, to get the binary representation of a negative number, its positive counterpart is first complemented (1's complement) and then 1 is added.

The 2's complement of 1(001) is 111(110+1).

The above range can now be represented as follows:


0 - 000
1 - 001
2 - 010
3 - 011
-1 - 111
-2 - 110
-3 - 101
-4 - 100

Thus, we can see that both of the above problems are solved by using 2's complement. We can also store one more number, as there is only one representation for 0. For a 32-bit int, the range of numbers that can be represented this way is -2^31 to 2^31 - 1.
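
A quick way to observe two's complement on your own machine is to print a negative int through an unsigned conversion; on a 32-bit int this sketch prints fffffff4, which is the bit pattern of ~12 + 1:

#include <stdio.h>

int main(void)
{
    int a = -12;

    printf("%x\n", (unsigned int) a);   /* fffffff4 on a 32-bit int */
    return 0;
}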

Friday, June 26, 2015

Linux Kernel Module - Creating, Inserting and Removing Kernel Object

Consider the following C program stored in the file simple_module.c


#include <linux/init.h>
#include <linux/module.h>

int simple_module_init(void){
  printk(KERN_ALERT "Inside the init function");
  return 0;
}

void simple_module_exit(void){
  printk(KERN_ALERT "Inside the exit function");
}

module_init(simple_module_init);
module_exit(simple_module_exit);

The above example is the simplest possible kernel module. It has an init and an exit function: the init function is called when the module is loaded, and the exit function is called when the module is removed.

To build this module, we create a Makefile containing the following line:

obj-m := simple_module.o

To compile this module, we run the following command:

"make -C /lib/modules/$(uname -r)/build M=$PWD modules", where -C points make at the kernel build directory and M=$PWD tells kbuild that the module sources are in the present working directory.

When this command is run, it creates a simple_module.ko file (ko = kernel object).

To load this module into the kernel, the command is :

sudo insmod simple_module.ko

We can see if the module is loaded using the following command:

lsmod

To remove the module from the kernel, we use the following command:

sudo rmmod simple_module

The log messages can be seen in /var/log/syslog or /var/log/messages.

Use the following command to see the tail of the file and follow the changes in file

tail -f /var/log/syslog

Thursday, May 14, 2015

EMC VPLEX : Extending VMWare Functionality Across Data Centers

VPLEX is a storage virtualization appliance. It sits between the storage arrays and hosts and virtualizes the presentation of storage arrays, including non-EMC arrays. Storage is then configured and presented to the host. It delivers data mobility and availability across arrays and sites. VPLEX is a unique virtual storage technology that enables mission critical applications to remain up and running during any of a variety of planned and unplanned downtime scenarios. VPLEX permits painless, nondisruptive data movement, taking technologies like VMware and other clusters that were built assuming a single storage instance and enabling them to function across arrays and across distance. 


VPLEX key use cases comprise:

  • Continuous operations – VPLEX enables active/active data centers with zero downtime
  • Migration/tech refresh – VPLEX provides accelerated and nondisruptive migrations and technology refresh
  • Oracle RAC functionality – VPLEX extends Oracle Real Application Clusters (RAC) and other clusters over distance
  • VMware functionality – VPLEX extends VMware functionality across distance while enhancing availability
  • MetroPoint Topology – VPLEX with EMC RecoverPoint delivers a 3-site continuous protection and operational recovery solution
The EMC VPLEX family includes three models:



EMC VPLEX Local:

EMC VPLEX Local delivers availability and data mobility across arrays. VPLEX is a continuous availability and data mobility platform that enables mission-critical applications to remain up and running during a variety of planned and unplanned downtime scenarios.


EMC VPLEX Metro:

EMC VPLEX Metro delivers availability and data mobility across sites. VPLEX Metro with AccessAnywhere enables active-active, block-level access to data between two sites within synchronous distances. Host application stability needs to be considered: depending on the application, it is recommended that Metro latency be <= 5 ms. The combination of virtual storage with VPLEX Metro and virtual servers allows for the transparent movement of VMs and storage across longer distances and improves utilization across heterogeneous arrays and multiple sites.

EMC VPLEX Geo:

EMC VPLEX Geo delivers availability and data mobility across sites. VPLEX Geo with AccessAnywhere enables active-active, block-level access to data between two sites within asynchronous distances. Geo improves the cost efficiency of resources and power. It provides the same distributed device flexibility as Metro but extends the distance up to 50 ms of network latency.

Monday, May 11, 2015

Overlay Networks

The idea of an "overlay network" is that some form of encapsulation is used to decouple a network service from the underlying infrastructure. Per-service state is kept at the edge of the network, and the underlying physical infrastructure of the core has little or no visibility into the actual services offered. This layering approach enables the core network to scale and evolve independently of the offered services.

The best example of this is the Internet itself: the Internet is an overlay network on top of a solid optical infrastructure, and the underlying infrastructure is called the "underlay network". The majority of paths in the Internet are now formed over a DWDM infrastructure that creates a virtual topology between routers and uses several forms of switching to interconnect them. Likewise, the idea of MPLS L2/L3 VPNs is essentially an overlay network of services on top of an MPLS transport network. The label edge routers (LERs) encapsulate every packet arriving from an enterprise site with two labels: a VPN label identifying the enterprise context, and a transport label identifying how the packet should be forwarded through the core MPLS network. In this sense, it is a double overlay.

One of the main advantages of overlays is that they provide the ability to rapidly and incrementally deploy new functions through edge-centric innovations. New solutions or applications can be organically added to the existing underlay infrastructure by adding intelligence to the edge nodes of the network.  This is why overlays often emerge as solutions to address the requirements of specific applications over an existing basic infrastructure, where either the required functions are missing in the underlay network, or the cost of total infrastructure upgrade is prohibitive from an economic standpoint.

Saturday, May 9, 2015

Software Defined Storage Solution from Coho Data

For a long time, networking was defined by some distributed protocols like BGP, OSPF, MPLS, STP and so on. Each network device in the topology would run these protocols and collectively they made the internet work. They accomplished the miraculous job of connecting the plethora of devices that make up the internet. However, the amount of effort required to configure, troubleshoot and maintain these devices was enormous. Add to that the cost of upgrading these devices every few years. Collectively, these costs compelled the networking industry to come up with a solution to these problems.

SDN was introduced a few years back. The concept of separating the brain from the device was a radical idea which spread very fast across the networking industry. SDN introduced centralized control to the network: the whole network can now be controlled from a single point. The centralized controller evaluates the entire topology and pushes instructions down to individual devices, making sure each device works as efficiently as possible. The SDN controller is also able to single-handedly track resource utilization and respond to failures, minimizing downtime.

SDN simplified networking to a great extent. However, storage, which is complementary to networking, was still implemented in the same old way. Coho Data, based out of Sunnyvale, California, took on the effort to redefine storage using the concepts of software-defined networking. It has introduced a control-centric architecture to storage.

Here's a graphical representation of the storage controller:


SDSC (Software Defined Storage Controller) is the central decision-making engine that runs within the Coho cluster. It evaluates the system and makes decisions regarding two specific points of control: data placement and connectivity. At any point, the SDSC can respond to change by either moving client connections or by migrating data. These two knobs turn out to be remarkably powerful tools in making the system perform well.

The strong aspect of this solution is its modular nature. Not only are the storage devices new and innovative; innovation has also been done in the switching fabric, facilitating the migration of data. The solution makes sure that performance does not degrade as the storage capacity scales.

Tiering in Coho Architecture:



Coho's microarrays are directly responsible for implementing automatic tiering of the data stored on them. Tiering happens in response to workload characteristics; a simple characterization of what happens is that as the PCIe flash device fills up, the coldest data is written out to the lower tier.

All new data writes go to NVMe flash. Effectively, this top tier of flash has the ability to act as an enormous write buffer, with the potential to absorb burst writes that are literally terabytes in size. Data in this top tier is stored sparsely at a variable block size.

As data in the top layer of flash ages and that layer fills, Coho’s operating environment (called Coast) actively migrates cold data to the lower tiers within the microarray. The policy for this demotion is device-specific: on our hybrid (HDD-backed) DataStore nodes, data is consolidated into linear 512K regions and written out as large chunks.  On repeated access, or when analysis tells us that access is predictive of future re-access, disk-based data is “promoted,” or copied back into flash so that additional reads to the chunk are served faster.

Source :http://www.cohodata.com/blog/2015/03/18/software-defined-storage/

Wednesday, April 22, 2015

Thin and Thick Provisioning Storage

Thick Provisioning:

In this type of storage provisioning, an estimate is made about the storage requirements for a virtual machine for its entire life cycle. Then, a fixed amount of space is provisioned to its virtual disk in advance and have the entire space committed to the virtual disk. The virtual disk takes up the entire provisioned space.

Types of thick provisioning:

Thick Provision Lazy Zeroed:
A thick provisioned lazy zeroed VMDK is similar to the eager zeroed format except that the zeroing operation is performed just before a write operation, not at creation. The space is still allocated to the VMDK, so after creating a VMDK with this format the datastore shows that the space is no longer available, but there is the additional overhead of zeroing out at write time.

Thick Provision Eager Zeroed:
When a thick provisioned eager zeroed disk is created, the maximum size of the disk is allocated to the VMDK and all of that space is zeroed out. Creating this disk format takes a while because of the zeroing process.

Thin Provisioning:

Thin provisioned VMDKs do not allocate or zero out space when they are created; they do so only at write time. When an 80GB VMDK is created thin provisioned, only a little metadata is written to the datastore. The 80GB does not show up in the datastore as in use the way it does with thick provisioning; only when data is actually written does a thin provisioned VMDK take up space. At write time, space is allocated on the datastore, the metadata of the VMDK is updated, the block or blocks are zeroed out, and then the data is written. Because of all this overhead at write time, thin provisioned VMDKs have the lowest performance of the three disk formats. The overhead is very small, though, and most environments will not notice it until they run very write-intensive VMs.

Types of thin provisioning:

The difference between the two types of thin provisioning lies not in features, as with thick provisioning, but in the level at which provisioning is done. Both types work in the same way as described above.

Virtual Disk Thin Provisioning:

In this type the provisioning is done at the virtual disk level. For a thin virtual disk, ESXi provisions the entire space required for the disk’s current and future activities, for example 40GB. However, the thin disk uses only as much storage space as the disk needs for its initial operations. As the disk requires more space, it can grow into its entire 40GB provisioned space.

Array Thin Provisioning:

In this type, provisioning is done at the storage array level, i.e. with LUNs. Space allocated as devices (volumes or LUNs) is created on the storage device, but the consumption of this space happens only as required. Storage array thin provisioning requires ESXi 5 and a storage device with a firmware version that supports T10-based Storage APIs: Array Integration (Thin Provisioning).

When Storage APIs - Array Integration is used, the host can integrate with physical storage and become aware of underlying thin-provisioned LUNs and their space usage.
Using thin provision integration, host can perform these tasks:
  • Monitor the use of space on thin-provisioned LUNs to avoid running out of physical space. As your datastore grows or if you use Storage vMotion to migrate virtual machines to a thin-provisioned LUN, the host communicates with the LUN and warns you about breaches in physical space and about out-of-space conditions.
  • Inform the array about the datastore space that is freed when files are deleted or removed from the datastore by Storage vMotion. The array can then reclaim the freed blocks of space. 
Both of these approaches allow for the overprovisioning of storage resources. This can be a powerful feature and can provide cost savings, but it must be used with caution. If a thin-provisioned storage device runs out of space, the results are never good. Because of this, monitoring is essential with both forms of thin provisioning.


Tuesday, April 21, 2015

Software Defined Storage

SDS is a class of storage solutions that can be used with commodity storage media and compute hardware, where the storage media and compute hardware have no special intelligence embedded in them. All the intelligence for data management and access is provided by a software layer. A solution may provide some or all of the features of modern enterprise storage systems: scale-up and scale-out architecture, reliability and fault tolerance, high availability, unified storage management and provisioning, geographically distributed data center awareness and handling, disaster recovery, QoS, resource pooling, integration with existing storage infrastructure, etc. It may provide some or all data access methods: file, block and object.

A generic data flow in a SDS solution is explained in the figure below:



VMware defines the Software-defined Storage Architecture as follows:

SDS is a new approach to storage that enables a fundamentally more efficient operational model. We can accomplish this by:
  • Virtualizing the underlying hardware through the Virtual Data Plane
  • Automating storage operations across heterogeneous tiers through the Policy-Driven Control Plane

Virtual Data Plane


In the VMware SDS model, the data plane, responsible for storing data and applying data services (snapshots, replication, caching, and more), is virtualized by abstracting physical hardware resources and aggregating them into logical pools of capacity (virtual datastores) that can be flexibly consumed and managed. By making the virtual disk the fundamental unit of management for all storage operations in the virtual datastores, exact combinations of resources and data services can be configured and controlled independently for each VM.

The VMware implementation of the virtual data plane is delivered through:
  • Virtual SAN – for x86 hyperconverged storage
  • vSphere Virtual Volumes – for external storage (SAN/NAS)

Policy-Driven Control Plane

In the VMware SDS model, the control plane acts as the bridge between applications and infrastructure, providing standardized management and automation across different tiers of storage. Through SDS, storage classes of service become logical entities controlled entirely by software and interpreted through policies. Policy-driven automation simplifies provisioning at scale, enables dynamic control over individual service levels for each VM and ensures compliance throughout the lifecycle of the application. 

The policy-driven control plane is programmable via public APIs used to control policies via scripting and cloud automation tools, which in turn enable self-service consumption of storage for application tenants. 

The VMware implementation of the policy-driven control plane is delivered through:

  • Storage Policy-Based Management – provides management over external storage (SAN/NAS) through vSphere Virtual Volumes and over x86 storage through Virtual SAN.

Nutanix, another player in the field of software-defined storage, follows a similar approach, but the controller there is a separate VM on top of the hypervisor, and the approach requires Nutanix hardware.


You can read more on software defined storage in this ebook written by Scott Lowe


Coho Data, based out of Sunnyvale, California, uses an SDN-enabled data stream switch to connect the VMs to storage implemented as MicroArray nodes containing PCIe flash and hard drives.

Data Hypervisor Software on the MicroArray virtualizes storage hardware to create a high performance, bare metal object store that scales to support different application needs without static storage tiers.

Coho Data Architecture: http://www.cohodata.com/coho-scale-out-storage-architecture





Sunday, April 19, 2015

Storage Area Network

Challenges with Directly Attached Storage:

1.  Storage remains isolated and underutilized.
2.  Complexity in sharing storage resources across multiple servers.
3.  High cost of managing information.
4.  Challenges in scalability.

An effective information management system must provide:

1. Timely information to business users
2. Flexible and resilient storage infrastructure.

A storage area network(SAN) provides such a solution.

A storage area network is a high-speed, dedicated network designed to deliver block-level storage to computers that are not directly connected to the storage devices or drive arrays. Unlike DAS (Directly Attached Storage), the storage in a SAN is not owned by any one server but is accessible by all of the servers on the network.

Advantages of SAN:

Enables sharing of storage resources across multiple servers.
Centralizes storage and management
Meets increasing storage demands efficiently with better economies of scale.

SAN Classification:

  • Fibre Channel (FC) SAN: uses the Fibre Channel protocol for communication.
  • IP SAN: uses IP-based protocols for communication.
  • Fibre Channel over Ethernet (FCoE) SAN: uses the FCoE protocol for communication.
Understanding Fibre Channel:

High-speed network technology: supports up to 16 Gbps.
Highly scalable: accommodates approximately 15 million devices.

Components of FC SAN:

Node (server and storage) ports: Provide physical interface for communicating with other nodes.
Exist on 
    - HBA in server
    - Front-end adapters in storage

Each port has a transmit(Tx) link and a receive (Rx) link

Cables:  

SAN implementation uses
    - Optical fiber cables for long distances
    - Copper cables for short distance

Two types of optical cables: 

Single-mode: carries a single beam of light; carries signals up to 10 km.
Multimode   : can carry multiple beams of light simultaneously; used for short distances.

Connectors: 

Attached at the end of a cable
Enable swift connection and disconnection of the cable to and from a port
Commonly used connectors for fiber optic cables are:
      Standard Connector(SC): Duplex connectors
      Lucent Connector(LC) : Duplex connectors
      Straight Tip(ST) : Patch panel connectors and Simplex connectors.

Interconnecting Devices:

Commonly used interconnecting devices in FC SAN are:
     - Hubs, switches and directors

Hubs provide limited connectivity and scalability
Switches and directors are intelligent devices
    - Switches are available with a fixed port count or a modular design.
    - Directors are always modular, and their port count can be increased by inserting additional 'line cards' or 'blades'.
    - High-end switches and directors contain redundant components.
    - Both switches and directors have a management port for connecting to SAN management servers.

SAN Management Software:

    - A suite of tools used in a SAN to manage interfaces between hosts and storage arrays
    - Provides integrated management of the SAN environment
    - Enables web-based management using a GUI or CLI

FC Interconnectivity Options:

Point-to-Point Connectivity: 
  • Simplest FC configuration which enables direct connection between nodes.
  • Offers limited connectivity and scalability
  • Used in DAS environment
FC-AL Connectivity:
  • Provides shared loop to attached nodes: Nodes must arbitrate to gain control
  • Implemented using ring or star topology. May also use hub which uses star topology.
  • Limitations of FC-AL :
                - Only one device can perform I/O operations at a time
                - Uses 8 bits of the 24-bit Fibre Channel address; 1 address is reserved for connecting to an FC switch port, so up to 126 nodes are supported
                - Addition or removal of a node causes a momentary pause in loop traffic

FC- SW Connectivity :
  • Creates a logical space (called a fabric) in which all nodes communicate using switches. Inter-switch links (ISLs) enable switches to be connected.
  • Provides dedicated path between nodes.
  • Addition/removal of node does not affect traffic of other nodes.
  • Each port has unique 24 bit FC address.
Port Types in Switch Fabric:

Port provides physical interface to a device to connect to other devices. The types are:

N_port : or Node port, is a port on a node, typically a host (HBA) or storage array port.
E_port : or Expansion port, connects to the E_port of another switch.
F_port : or Fabric port, is a switch port that connects to an N_port.
G_port : or Generic port, can automatically operate as either an F_port or an E_port.

Fibre Channel Protocol (FCP) Overview:

  • Traditional technologies such as SCSI have limited scalability and distance.
  • Network technologies provide greater scalability and distance but have high protocol overhead.
  • FCP provides the benefits of both channel and network technologies:
         - High performance with low protocol overhead
         - High scalability with long-distance capability
  • Implements SCSI over an FC network.
  • Storage devices attached to the SAN appear as local storage devices to the host operating system.


Addressing in switched Fabric:

The server or disk array, which has an HBA, reports itself to the network using Fabric Login (FLOGI). It advertises its NWWN (Node World Wide Name), and the FC switch replies with the FC ID for that device. This functionality is similar to that of DHCP.


An FC switch has a block of addresses assigned to it, represented by its Domain ID.

The Domain ID is a unique number given to each switch in the fabric. Domain IDs can be statically or dynamically configured. Since permission is required to assign a domain ID, domain IDs never overlap. One switch is elected as the principal switch, based on priority value and system WWN; the lowest one wins. No backup principal switch is elected, unlike DR/BDR election in OSPF. If the principal switch dies, a new one is elected, and the failover is fast.
    - 239 addresses are available for domain IDs.
Maximum possible number of node ports in a switched fabric:
    - 239 domains * 256 areas * 256 ports = 15,663,104

When there are multiple switches, FSPF (Fabric Shortest Path First) is used for routing between switch domains. The Fibre Channel routing table is consulted to route frames.

Address Resolution using Fiber Channel Name Server(FCNS):

The FCNS holds a list of PWWNs and their FC IDs. This service is run by the principal switch, and the FCNS database is distributed across all switches, so no backup is needed. As soon as a device receives the reply from the switch with its FC ID, the host sends a PLOGI message with its PWWN and FC ID, registering itself with the name server. For address resolution, a host sends a query to the FCNS with a PWWN, and the FCNS replies with the corresponding FC ID. Routing is thus based on the FC ID, which is a logical address; in this sense Fibre Channel behaves like a layer-3 protocol.