Debugging Techniques

Finding bug is a process of confirming the many things you believe are true, until you find the one, which is not true.
Below are some of the things, which developers believe they are true

When the function is called, it receives its parameters correctly.

At a certain line or function in your source file, a certain variable has a certain value.

In a given if-then-else statement, the “else” part is the one that is executed.

Divide and Conquer

The simple technique for debugging is monitoring, write printf statements in the source code at which you know the data is good and where it becomes corrupted.
As we do not know where the data is good and bad, usually ad hoc addition of lots of printf statements are added to track the control flow and data values during the execution of code. After adding printf statements, run the program and keep adding printf statements until you isolate the area in the program that contains the wrong data. Then concentrate on why and how this data gets modified.
In this way, you can start zeroing in on the section of code that contains the error.

Debug Macros

The preprocessor conditional compilation directives can be used as switch to include or not to include the debug messages.

For example:

#ifdef DEBUG
    printf(“sum %d\n”, sum);
#endif

The program can be compiled with DEBUG undefined for normal use;

you can define it when debugging is needed as follows

#define DEBUG
The following program prints the debug messages

#define DEBUG
int main()
{
    char *iter = (char*) malloc (100);
    strcpy(iter, “This is a simple program”);
    int sum;
    for (int i=0; i <100; i++) {
        #ifdef DEBUG
            printf(“char is %c and sum is %d\n”, *iter, sum);
        #endif
        sum = sum + *iter;
    }
    return 0;
}

You shall redirect the output to a file what would normally go to the screen. For example:

$test >tmp.out

runs the program test with debugging and sends the output to the file tmp.out.

You can use your text editor’s search capabilities to look for the information you want to find.

For large programs which has millions of code, the above mechanism will dump many millions of debug messages which makes the bug finding process difficult. We can use multiple debug level to tackle this issue. Based on the code size of the software, you shall define many debugging levels. Lower debugging levels produce less messages and higher the level, more messages it produces. When debugging an issue, start with lower level and then if you cannot figure out the cause, move to the next level and so on.

The following set of macros defines three debugging levels, DLEVEL3, DLEVEL2 and DLEVEL1.

#ifdef DLEVEL3
   #define DLEVEL2
#endif
#ifdef DLEVEL2
    #define DLEVEL1
#endif

If you software constitutes many modules, then module specific debug messages can be enabled to see only messages related to that module.

Run Time Tracing

The above techniques require source code access to identify the problem. Run time tracing allows the debugger to identify the problem on executable. This technique shall also be used to understand the code flow of the program. There are tools which traces all system calls or library routines.

strace utility provides the capability to trace the execution of an application from the perspective of system calls. Along with the system call routine, it also displays their arguments, return value, etc

Linux permits to access 4 GB virtual address space for a process and that 4 GB virtual space is divided into two parts such as user space ( 3 GB) and kernel space (1 GB). When we run any application program along with library , they allocates memory from user space . Here in the following example printf is a library function and internally uses a standard library libc.so. The function printf writes data “Hello” into monitor , but how it happens.

Ex.

void main()

{

printf(“Hello”);

}

Operating system uses system call to copy the data from user space to kernel space and write into device file which is associated with a device ( example stdout for monitor) . The driver known as modules works in kernel space which reads the data from a device file and write into the port .

To list out the system call maintained by o/s there is a development tool called strace .Ittrace the system calls and signals . It intercepts and records the system calls made by a running process. strace can print a record of each system call, its arguments, and its return value. strace is a system call tracer, i.e. a debugging tool which prints out a trace of all the system calls .

$strace ./singo

output:

open(“/etc/ld.so.cache”, O_RDONLY) = 3

fstat64(3, {st_mode=S_IFREG|0644, st_size=111023, …}) = 0

mmap2(NULL, 111023, PROT_READ, MAP_PRIVATE, 3, 0) = 0xb7f91000

close(3) = 0

open(“/lib/libc.so.6”, O_RDONLY) = 3

read(3, “\177ELF\1\1\1\3\3\1\360\3247004″…, 512) = 512

write(1, “Hello”, 5Hello) = 5

Each line starts with a system call name, is followed by its arguments in the parenthesis & then has return value at the end of the line. The important system call made in the above output are open,close,read,write. The last line write(1, “Hello”, 5Hello) = 5 line corresponding to the system call that cause Hello to be printed on the screen. The first argument -1 is the file descriptor of the file i.e. stdout to write to. So it writes the message on the terminal instead of writing in some other file on disk. The second argument tells the data is a string & third argument display the total number of character in the string.

$ strace -c ./singo

Hello% time seconds usecs/call calls errors syscall

—— ———– ———– ——— ——— —————-

nan 0.000000 0 1 read

nan 0.000000 0 1 write

nan 0.000000 0 2 open

nan 0.000000 0 2 close

nan 0.000000 0 1 execve

nan 0.000000 0 1 1 access

nan 0.000000 0 1 brk

nan 0.000000 0 1 munmap

nan 0.000000 0 2 mprotect

nan 0.000000 0 7 mmap2

nan 0.000000 0 3 fstat64

nan 0.000000 0 1 set_thread_area

—— ———– ———– ——— ——— —————-

100.00 0.000000 23 1 total

From the above output, it is evident that write system call is used 1time. This is initiated by printf() statement on stdout file descriptor.

One another important option of strace is -tt, it causes strace to print out the time at which each call finished.

ltrace can be used to monitor user library calls. Most of the option present in strace is also present in ltrace. For the complete set of options run `man ltrace`.

Interactive Debugging

Interactive debuggers give you the ability to stop the program at any point, examine and change variables, and continue the program to identify the code which changes the behavior and makes the program buggy. The GNU Debugger (also known as gdb) is an interactive debugging tool which allows you to see what is going on inside the running program while it executes.

Introspection

Linux provides lot of information and tools to investigate the running process (your executable). It stores information about each of the process running in the system at /proc file system.
This directory has lot of subdirectories each denoting the process ID of the running process in the system.

$ls /proc
1     204   2481  2927   30479  399   9            kallsyms    swaps
10    2040  2518  2996   30535  4     acpi         kcore       sys
11    2047  2526  3      3094   400   asound       keys        sysrq-trigger
1477  205   2538  3001   3096   404   buddyinfo    key-users   sysvipc
1478  2092  2582  3002   3102   411   bus          kmsg        timer_list
1502  2173  2654  3003   3103   418   cmdline      loadavg     timer_stats
1653  2207  2690  3004   3114   429   cpuinfo      locks       tty
1654  2226  2711  3005   3115   457   crypto       mdstat      uptime
167   2237  2733  3115  3137   494   devices      meminfo     version
168   2249  2766  5218  3163   5     diskstats    misc        vmcore
169   2264  2778  30146  3165   5051  dma          modules     vmstat
172   2302  2779  30283  3523   5052  driver       mounts      zaptel
174   2319  2796  30285  3525   51    execdomains  mtrr        zoneinfo
1990  2335  2797  30367  3526   52    fb           net
1992  2354  2808  30398  355    53    filesystems  partitions
2     2359  2809  30421  388    5555  fs           schedstat
201   2381  2838  30442  389    5557  interrupts   scsi
202   2395  2839  30444  390    6     iomem        self
2020  2408  2847  30445  397    7     ioports      slabinfo
203   2431  2859  30477  398    8     irq          stat

Find out the process ID of you program using the ps command, lets say your executable name is csqlserver, then

$ps -ef |grep csqlserver
csql      5218  3115  0 13:35 pts/1    00:00:00 csqlserver
csql      5370  3526  0 13:42 pts/3    00:00:00 grep csqlserver

For the above output, pid is 5218.
Under this directory (/proc/5218) you will find lots of useful information which helps in debugging your application
Some of the important ones are
cmdline – contains the command line options with the executable name with which you started the process
environ – contains the environment variables like PATH, HOME,etc this process uses
maps – contains the memory map of the process(similar to output of pmap command)
fd – directory which contains all the file descriptors(you will see minimum 0,1 and 2 file descriptors which refers to stdin, stdout, stderr respectively)
cwd – link to current working directory
status – process status

Linux Tools
ps , pmap, top, vmstat
tools displays information about running process in the system which helps developers to identify issues such as CPU consumption, high disk I/O, high memory usage by the programs.

Advertisements

Pointer Swizzling Technique

When we send structures containing pointers, data copy is involved to flatten the structure into contiguous buffer at the sender side and another copy is involved at the receiver side.

We can avoid one data copy at the receiver side using pointer swizzling

pointer swizzling

The contigous buffer should include space for the pointer member also during the marshall process as shown above. For the request structure, contigous buffer size would be

Sizeof(int) + sizeof(void*) + strlen(“Sample String”);

As the pointer value sent over the socket would not make any sense at the reciver end, it is sent with the buffer. At the receiver end, after receiving the  buffer, it would not copy the data to structure by creating a new object. Instead it would recalculate the pointer member and typecast the buffer to (struct Request*) and use it directly. It is illustrated in the below code snippet.

Sender Side
The below code snippet marshals the Request structure as per the pointer swizzling requirement. It copies the actual structure value at the beginning of the buffer and the actual values pointed by pointer after that.

Marshalling code:

   char buf[512];

   char *ptr = buf;

   memcpy(ptr, &request, sizeof(request)); //copy the struct at the beginning of the buffer

   ptr = ptr + sizeof(request);//copy the string value after the structure value

   strcpy(ptr, request.sVal);

   ret = write(sock, &buf, sizeof(buf));

   if(ret == -1){

      printf(“unable to send the request\n”);

      return 1;

   }

Receiver Side
The below code snippet unmarshals the Request structure. It resets the pointer value and sets it correctly to the actual value.

Unmarshalling Code:

      char buf[512];

      ret = read(new_sock, &buf, sizeof(buf));

      if(ret == -1){

          printf(“unable to receive the request\n”);

          return 1;

      }

      char *ptr = buf+ sizeof(int);

      //swizzle the pointer to the actual value(after the structure value in buffer)

      *(void**)ptr = buf + sizeof(struct Request);

      struct Request *request = (struct Request*) buf;

      printf(“Request got: %d %s\n”, request->iVal, request->sVal);

Analyzing code dump

Most of the operating systems produce core file if any process attempts illegal memory access, subsequently the process crashes. In case of linux, this core file is not generated by default. We need to set the limit of core file size to unlimited to get a core file in case of crash. The following command sets that

$ulimit -c unlimited
The following software problems lead to process crash,

Invalid Array Indexing
Out of bound array indexing will corrupt data structures that allocated memory before/after the array.

int a;
int b[100];
int c;

When array b is indexed with -1, variable a will be corrupted and when it is indexed with value greater than 99, then c will be corrupted.

Un-initialized Pointer Dereference

If a pointer is dereferenced before it is initialized, it can result in corrupting, any area of the memory. This shall create nightmares for debuggers as the pointer causing memory corruption might be located in completely unrelated area of the code.

A special case of this problem is dereferencing a NULL pointer.

void *p;

printf(“%d”, *p);
Dangling Pointer Access
After memory is freed, application should not access the memory as it might have already been allocated to different thread. This is dangerous in case of multi threaded programs.

free(buf);
//do something else
buf->a = NULL;


Illegal pointer type dereference

This happens when a program passes a pointer of wrong type to a function. This leads to stack corruption and it is one of the hardest bugs to fix.

int retrieve(int *cnt)
{
    //code to compute the value comes here
    *cnt = value;
    return 0;
}
int main()
{
   short a;
   retrieve(&a);
}
Invalid operations

Processors detect various exception conditions and abort program execution when they detect an error condition. A few of these conditions are:

Divide by zero

Program attempted access to an illegal address. The address might be out of range or the program might not have the privilege to perform the access. This creates SIGSEGV signal whose default action is to dump the core file.

Misaligned access to memory like long word operation attempted at addresses that is not divisible by 4.
Consider the following program(test.c) which tries to dereference NULL pointer,

#include <stdio.h>
int main()
{
   int *p = NULL;
   printf(“%d\n”, *p);
}

$gcc -o test test.c

$./test
Segmentation fault (core dumped)

If you do not get the core dumped message, try setting core file size to unlimited

You can verify this by running the ls command

$ls
core.24391  test  test.c

It generates core file with suffix containing the process ID. 
You can use debugger to open the core file and see what happened when program crashed as follows

$gdb ./test core.24391

GNU gdb Red Hat Linux (6.6-8.fc7rh)
Copyright (C) 2006 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type “show copying” to see the conditions.
There is absolutely no warranty for GDB.  Type “show warranty” for details.
This GDB was configured as “i386-redhat-linux-gnu”…
Using host libthread_db library “/lib/libthread_db.so.1”.

warning: Can’t read pathname for load map: Input/output error.
Reading symbols from /lib/libc.so.6…done.
Loaded symbols for /lib/libc.so.6
Reading symbols from /lib/ld-linux.so.2…done.
Loaded symbols for /lib/ld-linux.so.2
Core was generated by `./test’.
Program terminated with signal 11, Segmentation fault.
#0  0x080483cf in main () at test.c:5
5          printf(“%d\n”, *p);
It says that the error occurred at test.c line number 5.
If you program is large containing multiple functions, then you can try ‘where’ command in gdb prompt to display the function call stack with its arguments. We can inspect the arguments and understand the flow of code using the stack trace.

objdump
This tool gives the content of any section in the executable file including the assembly instructions in the .text segment.It comes handy when you do not have access to the source code and have only the executable file.
This tool is also used for core dump analysis to figure out which instruction (at some particular address displayed by backtrace of gdb) dumped the core. From this we shall figure out which C source code caused the core dump.

Consider the following example program (dump.c)

#include<stdio.h>
typedef struct element{
    int data;
}Element;
void function()
{
    printf(“p is initialized to null first\n”);
    Element *p = NULL;
    printf(“I forgot to allocate memory\n”);
    p->data = 10;
    printf(“It would have dumped already\n”);
}
int main()
{
    function();
    return 0;
}

compile and get the executable

$gcc -O2 -o dump dump.c
set the core file size to unlimited to get the core file in case of dump as follows

$ulimit -c unlimited
Run the executable

$./dump

It will generate the core file in your current directory.

run gdb to figure out which instruction dumped core

$gdb ./dump core*
GNU gdb Red Hat Linux (6.6-8.fc7rh)
Copyright (C) 2006 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type “show copying” to see the conditions.
There is absolutely no warranty for GDB.  Type “show warranty” for details.
This GDB was configured as “i386-redhat-linux-gnu”…
(no debugging symbols found)
Using host libthread_db library “/lib/libthread_db.so.1”.

warning: Can’t read pathname for load map: Input/output error.
Reading symbols from /lib/libc.so.6…(no debugging symbols found)…done.
Loaded symbols for /lib/libc.so.6
Reading symbols from /lib/ld-linux.so.2…(no debugging symbols found)…done.
Loaded symbols for /lib/ld-linux.so.2
(no debugging symbols found)
Core was generated by `./dump’.
Program terminated with signal 11, Segmentation fault.
#0  0x080483de in function ()
(gdb)
gdb output tells that segmentation fault(core dump) occured at address 0x080483de.
Now let us see which instruction is present at this address using the objdump command

$objdump -S ./dump >out

Open the out file using vi editor and search for 80483de. The function() will contain following assembly instructions

 080483c0 <function>:
 80483c0:       55                      push   %ebp
 80483c1:       89 e5                   mov    %esp,%ebp
 80483c3:       83 ec 08                sub    $0x8,%esp
 80483c6:       c7 04 24 00 85 04 08    movl   $0x8048500,(%esp)
 80483cd:       e8 fe fe ff ff          call   80482d0 <puts@plt>
 80483d2:       c7 04 24 20 85 04 08    movl   $0x8048520,(%esp)
 80483d9:       e8 f2 fe ff ff          call   80482d0 <puts@plt>
 80483de:       c7 05 00 00 00 00 0a    movl   $0xa,0x0
 80483e5:       00 00 00
 80483e8:       c7 04 24 3c 85 04 08    movl   $0x804853c,(%esp)
 80483ef:       e8 dc fe ff ff          call   80482d0 <puts@plt>
 80483f4:       c9                      leave
 80483f5:       c3                      ret
 80483f6:       8d 76 00                lea    0x0(%esi),%esi
 80483f9:       8d bc 27 00 00 00 00    lea    0x0(%edi),%edi

Look at address 80483de, it shows that NULL (0x0) is being moved. Now we have to figure out where this occurs in our source code. Note the call to puts() function from address 80483cd,  80483d9 and 80483ef. The dumping address comes between the 2nd and 3rd puts() function.
In our code we have three printf() statements, which means that core dump occurred between the 2nd and 3rd printf() statement, which is at line p->data = 10; Now we can walk through the code and identify the memory for p is not allocated before we access an element in it.

The most obvious memory errors result in a “Segmentation violation” message. This may alert the programmer to the location of the memory error when the program is run in gdb. The following errors discussed are the not so obvious errors.

Memory errors:

Heap memory errors:

Attempting to free memory already freed.

Freeing memory that was not allocated.

Attempting to write to memory already freed.

Attempting to write to memory which was never allocated.

Memory allocation error.

Reading/writing to memory out of the bounds of a dynamically allocated array

stack (local variables) memory errors:

Reading/writing to memory out of the bounds of a static array. (array index overflow – index too large/underflow – negative index)

Function pointer corruption: Invalid passing of function pointer and thus a bad call to a function.