Category Archives: Linux Kernel Analysis

linux 0.11 kernel – __NR_write and sys_write

Linux-0.11/include/linux/sys.h

1
2
3
4
fn_ptr sys_call_table[] = { sys_setup, sys_exit, sys_fork, sys_read,
sys_write, sys_open, sys_close, sys_waitpid, sys_creat, sys_link,
sys_unlink, sys_execve, sys_chdir, sys_time, sys_mknod, sys_chmod,
sys_chown, sys_break, sys_stat, sys_lseek, sys_getpid, sys_mount, ...}

Linux-0.11/include/unistd.h

1
2
3
4
5
6
7
#define __NR_setup	0	/* used only by init, to get system going */
#define __NR_exit	1
#define __NR_fork	2
#define __NR_read	3
#define __NR_write	4
#define __NR_open	5
#define __NR_close	6

Linux-0.11/lib/open.c

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
__asm__("int $0x80"
		:"=a" (res)
		:"0" (__NR_open),"b" (filename),"c" (flag),
		"d" (va_arg(arg,int)));
 
#define _syscall0(type,name) \
  type name(void) \
{ \
long __res; \
__asm__ volatile ("int $0x80" \
	: "=a" (__res) \
	: "0" (__NR_##name)); \
if (__res >= 0) \
	return (type) __res; \
errno = -__res; \
return -1; \
}
 
set_system_gate(0x80,&system_call);

Linux Kernel – Data structure List in include/linux/list.h

Linux Kernel has lots of data structure such as list (including linked list, double linked lists), red-black tree, B Tree , Interval tree, Heap, Radix tree, spin locks and so on.
I will put some articles for these interesting data structures.

List is most simple data structure we could start with.

Even for such simple data structure, Linux Kernel still has some operation which User-Space data structure didn’t use.

the definition of list_head

1
2
3
struct list_head {
    struct list_head *next, *prev;
};

It is a double linked list.

1
2
3
4
5
static inline void
INIT_LIST_HEAD(struct list_head *list)
{
    list->next = list->prev = list;
}
1
2
3
4
5
6
7
8
9
static inline void
__list_add(struct list_head *entry,
                struct list_head *prev, struct list_head *next)
{
    next->prev = entry;
    entry->next = next;
    entry->prev = prev;
    prev->next = entry;
}

what this function todo is put entry between next and prev.

prev –> entry <--- next

1
2
3
4
static inline void list_add_tail(struct list_head *new, struct list_head *head)
{
	__list_add(new, head->prev, head);
}

append something into tail is to add entry between head and tail(head and tail are linked together)

1
2
3
4
5
static inline void __list_del(struct list_head * prev, struct list_head * next)
{
	next->prev = prev;
	prev->next = next;
}

del is to delete all items between prev and next. therefore, set next’s prev to prev node.
and prev node’s next to be next node.

1
2
3
4
5
6
7
8
9
static inline void list_replace(struct list_head *old,
				struct list_head *new)
{
	new->next = old->next;
	new->next->prev = new;
	new->prev = old->prev;
	new->prev->next = new;
}
INIT_LIST_HEAD(old); // let old's all pointer pointed to itself.

delete old one and linked new one to old one’s postion.

1
2
3
4
5
static inline void list_move(struct list_head *list, struct list_head *head)
{
	__list_del_entry(list);
	list_add(list, head);
}

delete from one list and add as another’s head

1
2
3
4
5
6
7
8
9
static inline void list_rotate_left(struct list_head *head)
{
	struct list_head *first;
 
	if (!list_empty(head)) {
		first = head->next;
		list_move_tail(first, head);
	}
}

move head->next to head and tail.

some other functions such as list_cut_position is to split list into two.
and list_splice is to join two lists

1
2
3
4
#define list_for_each_prev_safe(pos, n, head) \
	for (pos = (head)->prev, n = pos->prev; \
	     pos != (head); \
	     pos = n, n = pos->prev)

some macro are defined in such format, there for developer don’t have to write “for” every time

Linux Kernel – start_kernel in main.c and task_struct for process management

Linux start with start_kernel function in main.c. It has lots of function, such mm_init for kernel memory allocators. Let’s start with some very basic components firstly.

task_struct is an basic and important struct for process.
It is defined in “include/linux/sched.h”

One process will be linked with others, for example, parent process and child process are linked together.

each member of task_struct is very critical and it is better to go though all of them to understand its better.

Check the definition of task_struct, you could print more information you want,

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
#include <linux/module.h>
#include <linux/kernel.h> /* printk() */
#include <linux/errno.h>   /* error codes */
#include <linux/sched.h>
 
 
MODULE_LICENSE("Dual BSD/GPL");
 
/* Declaration of functions */
void device_exit(void);
int device_init(void);
 
/* Declaration of the init and exit routines */
module_init(device_init);
module_exit(device_exit);
 
int device_init(void)
{
    struct task_struct *task = current; // getting global current pointer
    printk(KERN_NOTICE "assignment: current process: %s, PID: %d", task->comm, task->pid);
    do
    {
        task = task->parent;
        printk(KERN_NOTICE "assignment: parent process: %s, PID: %d", task->comm, task->pid);
    } while (task->pid != 0);
    return 0;
}
 
void device_exit(void) {
  printk(KERN_NOTICE "assignment: exiting module");
}

Screenshot from 2016-02-12 01:39:43

Linux Kernel Analysis: Enable Linux Kernel Stack Dumping

As we know, if we want to know a special function and who will call it.
One of way is to use “cscope” to static analysis.

The following step is how we do it:

Enabling Stack Dumping in Linux Kernel
Enabling in Kernel Config
To enable the dump_stack() function in the kernel config the following options must be set. You can use make menuconfig or make xconfig to do this.

Kernel hacking -> Kernel debugging
Kernel hacking -> Verbose kernel error messages

1

Enabling these two options will change the dump_stack() function from a do nothing function to dumping the stack.

You need to rebuild your Linux kernel image after enabling these options.

Using dump_stack()
Using the dump_stack() function is as easy as calling dump_stack() wherever you wish to print out the stack. This will cause a stack trace to be printed at that point.