[Linux Kernel] Character Device Driver (Blocking IO)

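This post assumes some familiarity with the virtual file system, writing character device drivers, and semaphores. It introduces the wait queue interface and explains how to implement blocking IO with it. Non-blocking IO is quite simple, but I don't feel like studying it right now, so I'll cover it next time in a combined Non-blocking IO & Async IO post.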

Blocking vs Non-blocking IO

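Normally, operations on a file opened with open are handled in blocking mode. On a blocking file, when read is called but there is nothing to read yet, when write is called but the data cannot be placed in the buffer yet, or when open is called but initialization has not finished, the process blocks (stops) until the operation can proceed. The typical case is "the system buffer is empty right now, so read cannot return immediately, but data will arrive if we wait." This happens, for example, when a socket waits for a client to send data, or when a file is being read from disk.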

#include <unistd.h>
#include <fcntl.h>

/* fd is a file descriptor (regular file, pipe, socket, ...) */

int flags = fcntl(fd, F_GETFL, 0); /* get the current file status flags */
fcntl(fd, F_SETFL, flags | O_NONBLOCK); /* set them again with O_NONBLOCK ORed in */

However, fcntl makes it easy to switch to non-blocking mode. Non-blocking means exactly what it says: do not wait for data to arrive in the buffer. What happens instead? Taking the read function of a driver's file_operations as an example, it returns -EAGAIN when no data can be read right now. (In the user program, read then returns -1 and errno is set to EAGAIN.)
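To see the -1 / EAGAIN behaviour from user space, here is a minimal sketch that uses an ordinary pipe rather than a driver. The helper names set_nonblocking and try_read are made up for this example.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical helper: OR O_NONBLOCK into the file status flags of fd. */
static int set_nonblocking(int fd)
{
	int flags = fcntl(fd, F_GETFL, 0);

	if (flags == -1)
		return -1;
	return fcntl(fd, F_SETFL, flags | O_NONBLOCK);
}

/*
 * Hypothetical helper: read without blocking.
 * Returns whatever read() returns and sets *would_block to 1 when the
 * call failed only because no data is available yet.
 */
static ssize_t try_read(int fd, void *buf, size_t len, int *would_block)
{
	ssize_t n;

	assert(would_block != NULL);
	n = read(fd, buf, len);
	*would_block = (n == -1 && (errno == EAGAIN || errno == EWOULDBLOCK));
	return n;
}
```

Calling try_read on an empty non-blocking pipe returns -1 right away with would_block set, instead of putting the process to sleep.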

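With blocking IO a process cannot do anything else while it is blocked, which is inefficient when several files must be handled at once. In that case it is much more efficient to use non-blocking IO and handle only the files that are ready right now; a web server is a good example. (With 100 clients, making 99 of them wait for one client's packet would be very inefficient.)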
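One common way to "handle only the files that are ready" from user space is poll(). The sketch below is an illustration of that idea and not part of the driver; read_first_ready and MAX_FDS are names invented here.

```c
#include <assert.h>
#include <poll.h>
#include <unistd.h>

#define MAX_FDS 16 /* arbitrary limit for this sketch */

/*
 * Hypothetical helper: wait up to timeout_ms for any descriptor in fds
 * to become readable, then read from the first ready one.
 * Returns the index of that descriptor (and stores the read() result
 * in *nread), or -1 on timeout or error.
 */
static int read_first_ready(const int *fds, int nfds, int timeout_ms,
			    char *buf, size_t len, ssize_t *nread)
{
	struct pollfd pfds[MAX_FDS];
	int i;

	assert(nfds > 0 && nfds <= MAX_FDS);
	for (i = 0; i < nfds; i++) {
		pfds[i].fd = fds[i];
		pfds[i].events = POLLIN;
		pfds[i].revents = 0;
	}

	/* Sleep until at least one descriptor is ready or the timeout expires. */
	if (poll(pfds, (nfds_t)nfds, timeout_ms) <= 0)
		return -1;

	for (i = 0; i < nfds; i++) {
		if (pfds[i].revents & POLLIN) {
			*nread = read(fds[i], buf, len);
			return i;
		}
	}
	return -1;
}
```

A server loop would call something like this repeatedly, so a silent client never holds up the ones that already have data waiting.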

Blocking IO

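I used to take blocking IO completely for granted: of course a read on a socket blocks until data arrives, right? But then who actually does the blocking? My program certainly contains no code that says "sleep until data arrives". In the end, blocking IO is one of the features provided on the device (driver) side. This post implements that feature.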

wait queue

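Both kernel and user processes sometimes sleep in order to wait. In particular, when one process waits for a specific event and another process generates that event, a wait queue can be used. The waiting process sleeps on the wait queue, and the process that generates the event wakes up the processes sleeping there. One thing to be careful about: before going to sleep on a wait queue, you must be sure that someone will wake you up later.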

/* statically define and initialize a wait queue head */
DECLARE_WAIT_QUEUE_HEAD(name);

/* initialize a wait queue head at run time */
wait_queue_head_t queue;
init_waitqueue_head(&queue);

A wait queue head can be created either statically with DECLARE_WAIT_QUEUE_HEAD or dynamically with init_waitqueue_head.

Using wait queues

/* include/linux/wait.h */

wait_event(queue, condition)
wait_event_interruptible(queue, condition)
wait_event_timeout(queue, condition, timeout)
wait_event_interruptible_timeout(queue, condition, timeout)

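The queue parameter is the wait_queue_head defined above, and condition is what the process waits for inside the wait queue. For example, if condition is (count == 0), the process waits until count becomes 0. The wait_event_XXXX family are macros; condition is checked before sleeping and again every time the process is woken up, until it becomes true.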

wait_event puts the process on the wait queue and waits in the TASK_UNINTERRUPTIBLE state. wait_event_interruptible waits in the TASK_INTERRUPTIBLE state until condition becomes true, and returns -ERESTARTSYS if it is interrupted by a signal. wait.h documents the details. This post uses wait_event_interruptible.

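Once a process is asleep on a wait queue via wait_event_XXXX, someone has to wake it up. That is the job of the wake_up_XXXX functions. In short, wake_up wakes processes sleeping in either TASK_INTERRUPTIBLE or TASK_UNINTERRUPTIBLE, while wake_up_interruptible wakes only those in TASK_INTERRUPTIBLE. For the remaining details, see the implementation of __wake_up_common.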

#define wake_up(x)                  __wake_up(x, TASK_NORMAL, 1, NULL)          
#define wake_up_nr(x, nr)           __wake_up(x, TASK_NORMAL, nr, NULL)         
#define wake_up_all(x)              __wake_up(x, TASK_NORMAL, 0, NULL)          
#define wake_up_locked(x)           __wake_up_locked((x), TASK_NORMAL, 1)       
#define wake_up_all_locked(x)       __wake_up_locked((x), TASK_NORMAL, 0)       
                                                                                
#define wake_up_interruptible(x)    __wake_up(x, TASK_INTERRUPTIBLE, 1, NULL)   
#define wake_up_interruptible_nr(x, nr)   __wake_up(x, TASK_INTERRUPTIBLE, nr, NULL)
#define wake_up_interruptible_all(x)      __wake_up(x, TASK_INTERRUPTIBLE, 0, NULL)                                                                  
#define wake_up_interruptible_sync(x)     __wake_up_sync((x), TASK_INTERRUPTIBLE)

/**                                                                             
 * __wake_up - wake up threads blocked on a waitqueue.                          
 * @wq_head: the waitqueue                                                      
 * @mode: which threads                                                         
 * @nr_exclusive: how many wake-one or wake-many threads to wake up             
 * @key: is directly passed to the wakeup function                              
 *                                                                              
 * If this function wakes up a task, it executes a full memory barrier before   
 * accessing the task state.                                                    
 */                                                                             
void __wake_up(struct wait_queue_head *wq_head, unsigned int mode,                                                                                   
                  int nr_exclusive, void *key)                                  
{                                                                               
      __wake_up_common_lock(wq_head, mode, nr_exclusive, 0, key);               
}
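The wait/wake pattern can be tried out in user space. The sketch below is only an analogy built on POSIX threads, not kernel code: a condition variable stands in for the wait queue head, and the names event_queue, event_queue_wait, event_queue_post and poster_thread are invented for the illustration. The point it shows is the same as with wait_event: the sleeper re-checks its condition after every wakeup, and the waker makes the condition true before waking anyone.

```c
#include <assert.h>
#include <pthread.h>

/* User-space stand-in for a wait queue head plus the state it guards. */
struct event_queue {
	pthread_mutex_t lock;
	pthread_cond_t wait; /* plays the role of wait_queue_head_t */
	int count;           /* the "condition" is count != 0 */
};

static void event_queue_init(struct event_queue *q)
{
	pthread_mutex_init(&q->lock, NULL);
	pthread_cond_init(&q->wait, NULL);
	q->count = 0;
}

/* Roughly like wait_event(queue, count != 0): sleep, re-check on wakeup. */
static int event_queue_wait(struct event_queue *q)
{
	int seen;

	pthread_mutex_lock(&q->lock);
	while (q->count == 0)
		pthread_cond_wait(&q->wait, &q->lock);
	assert(q->count != 0); /* the condition holds once the loop exits */
	seen = q->count;
	pthread_mutex_unlock(&q->lock);
	return seen;
}

/* Roughly like wake_up(): change the state first, then wake the sleepers. */
static void event_queue_post(struct event_queue *q)
{
	pthread_mutex_lock(&q->lock);
	q->count++;
	pthread_mutex_unlock(&q->lock);
	pthread_cond_broadcast(&q->wait);
}

/* Thread entry point for the waker side. */
static void *poster_thread(void *arg)
{
	event_queue_post(arg);
	return NULL;
}
```

If nobody ever calls event_queue_post, event_queue_wait sleeps forever, which is the user-space version of the warning above about making sure someone will wake you up.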

Blocking IO example

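#include <linux/kernel.h>
#include <linux/module.h>
#include <linux/cdev.h>
#include <linux/device.h>	/* class_create, device_create */
#include <linux/err.h>		/* IS_ERR, PTR_ERR */
#include <linux/fs.h>
#include <linux/sched.h>
#include <linux/semaphore.h>	/* struct semaphore, down_interruptible, up */
#include <linux/slab.h>
#include <linux/uaccess.h>	/* copy_to_user, copy_from_user */
#include <linux/wait.h>		/* wait queues */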

#define MINOR_BASE 0
#define DEVICE_NAME "pipe"
#define BUFFER_SIZE 4096

int pipe_open(struct inode *inode, struct file *filp);
int pipe_release(struct inode *inode, struct file *filp);
ssize_t pipe_read(struct file *filp, char __user *buf, size_t count, loff_t *f_pos);
ssize_t pipe_write(struct file *filp, const char __user *buf, size_t count, loff_t *f_pos);

struct pipe_device {
	wait_queue_head_t in_queue, out_queue;
	char *buffer;
	int buffer_size;
	int read_pos, write_pos;
	struct semaphore sem;
	struct cdev cdev;
	struct class *class;
	struct device *device;
	dev_t dev;
};

struct pipe_device pdev;

static struct file_operations fops = {
	.owner = THIS_MODULE,
	.open = pipe_open,
	.release = pipe_release,
	.read = pipe_read,
	.write = pipe_write
};

int pipe_open(struct inode *inode, struct file *filp)
{
	struct pipe_device *dev;

	/* save pointer of pipe_device to private data.
	 * this is not needed, but for scalability when
	 * there's more than one device.
	 */
	dev = container_of(inode->i_cdev, struct pipe_device, cdev);
	filp->private_data = dev;
	return 0;
}

int pipe_release(struct inode *inode, struct file *filp)
{
	return 0;
}

static int free_space_len(struct pipe_device *dev)
{
	/*
	 * read_pos == write_pos means "empty", so one slot is always left
	 * unused and at most buffer_size - 1 bytes can be stored.
	 */
	return (dev->read_pos - dev->write_pos - 1 + dev->buffer_size)
		% dev->buffer_size;
}

ssize_t pipe_read(struct file *filp, char __user *buf, size_t count, loff_t *f_pos)
{
	struct pipe_device *dev = filp->private_data;

	if (down_interruptible(&dev->sem))
		return -ERESTARTSYS;

	/* nothing to read */
	while (dev->read_pos == dev->write_pos) {
		up(&dev->sem);

		/* sleep (waiting for writers); bail out if a signal arrives */
		if (wait_event_interruptible(dev->in_queue,
					     dev->read_pos != dev->write_pos))
			return -ERESTARTSYS;

		if (down_interruptible(&dev->sem))
			return -ERESTARTSYS;
	}

	/* there's something to read */

	if (dev->write_pos > dev->read_pos)
		count = min(count, (size_t)(dev->write_pos - dev->read_pos));
	else /* wrapped: read only up to the end of the buffer */
		count = min(count, (size_t)(dev->buffer_size - dev->read_pos));

	/* copy from the current read position, not from the buffer start */
	if (copy_to_user(buf, dev->buffer + dev->read_pos, count)) {
		/* failed to copy */
		up(&dev->sem);
		return -EFAULT;
	}

	dev->read_pos += count;
	/* wrapped */
	if (dev->read_pos == dev->buffer_size)
		dev->read_pos = 0;

	/* wake up writers */
	up(&dev->sem);
	wake_up_interruptible(&dev->out_queue);
	return count;
}

ssize_t pipe_write(struct file *filp, const char __user *buf, size_t count, loff_t *f_pos)
{
	struct pipe_device *dev = filp->private_data;

	if (down_interruptible(&dev->sem))
		return -ERESTARTSYS;

	while (free_space_len(dev) == 0) {
		up(&dev->sem);

		/* sleep (waiting for readers); bail out if a signal arrives */
		if (wait_event_interruptible(dev->out_queue,
					     free_space_len(dev) != 0))
			return -ERESTARTSYS;

		if (down_interruptible(&dev->sem))
			return -ERESTARTSYS;
	}

	/* there's space to write */

	if (dev->write_pos >= dev->read_pos)
		/*
		 * write up to the end of the buffer, but keep one slot free
		 * when read_pos is 0 so that "full" never looks like "empty"
		 */
		count = min(count, (size_t)(dev->buffer_size - dev->write_pos
					    - (dev->read_pos == 0 ? 1 : 0)));
	else /* wrapped */
		count = min(count, (size_t)(dev->read_pos - dev->write_pos - 1));

	/* copy to the current write position, not to the buffer start */
	if (copy_from_user(dev->buffer + dev->write_pos, buf, count)) {
		/* failed to copy */
		up(&dev->sem);
		return -EFAULT;
	}

	dev->write_pos += count;
	if (dev->write_pos == dev->buffer_size)
		dev->write_pos = 0;
	up(&dev->sem);

	/* wake up readers */
	wake_up_interruptible(&dev->in_queue);
	return count;
}
int __init pipe_init(void)
{
	int ret;

	pdev.buffer_size = BUFFER_SIZE;
	pdev.buffer = kmalloc(pdev.buffer_size, GFP_KERNEL);
	if (!pdev.buffer)
		return -ENOMEM;
	pdev.read_pos = 0;
	pdev.write_pos = 0;
	sema_init(&pdev.sem, 1);

	init_waitqueue_head(&pdev.in_queue);
	init_waitqueue_head(&pdev.out_queue);

	ret = alloc_chrdev_region(&pdev.dev, MINOR_BASE, 1, DEVICE_NAME);
	if (ret) {
		printk(KERN_ERR "[%s] error while allocating device\n", DEVICE_NAME);
		goto free_buffer;
	}

	cdev_init(&pdev.cdev, &fops);
	ret = cdev_add(&pdev.cdev, pdev.dev, 1);
	if (ret)
		goto unreg_device;

	/* two-argument form for kernels before 6.4; newer ones take only the name */
	pdev.class = class_create(THIS_MODULE, DEVICE_NAME);
	if (IS_ERR(pdev.class)) {
		ret = PTR_ERR(pdev.class);
		goto del_cdev;
	}

	/* creates /dev/pipe1 */
	pdev.device = device_create(pdev.class, NULL, pdev.dev, NULL, "%s%d", DEVICE_NAME, 1);
	if (IS_ERR(pdev.device)) {
		ret = PTR_ERR(pdev.device);
		goto unreg_class;
	}

	printk(KERN_INFO "[%s] successfully registered device!\n", DEVICE_NAME);
	return 0;

 unreg_class:
	class_destroy(pdev.class);

 del_cdev:
	cdev_del(&pdev.cdev);

 unreg_device:
	unregister_chrdev_region(pdev.dev, 1);

 free_buffer:
	kfree(pdev.buffer);

	return ret;
}

void __exit pipe_exit(void)
{
	/* tear down in the reverse order of pipe_init; free the buffer last */
	device_destroy(pdev.class, pdev.dev);
	class_destroy(pdev.class);
	cdev_del(&pdev.cdev);
	unregister_chrdev_region(pdev.dev, 1);
	kfree(pdev.buffer);
	printk(KERN_INFO "[%s] successfully unregistered device!\n", DEVICE_NAME);
}


module_init(pipe_init);
module_exit(pipe_exit);


MODULE_LICENSE("GPL");

A brief walkthrough of how it works

To test it, run cat /dev/pipe1 in one shell (sh1), which blocks, and then run echo hello > /dev/pipe1 in another shell (sh2). The sequence is as follows.

1. sh1 runs 'cat /dev/pipe1' -> pipe_read is called -> there is no data yet, so it goes to sleep on the wait queue via wait_event_interruptible.

2. sh2 runs 'echo hello > /dev/pipe1' -> pipe_write is called -> data has been written, so it wakes up sh1 on the wait queue via wake_up_interruptible.

3. sh1 reads the data from the circular buffer.

Beyond that, a semaphore handles synchronization, and the data is read from and written to a circular buffer.
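The circular buffer arithmetic is easy to get wrong, so here is a small user-space model of it. It is a sketch rather than the driver's code: the struct and function names (ring, ring_free, ring_used, ring_put, ring_get) and the tiny RING_SIZE are chosen for illustration. As in the driver, read_pos == write_pos means empty, one slot always stays unused, and a single call never copies past the end of the array, so callers may see short reads and writes.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define RING_SIZE 8 /* small on purpose, for illustration */

struct ring {
	char data[RING_SIZE];
	int read_pos, write_pos; /* equal means empty */
};

/* Bytes that can still be written; one slot always stays unused. */
static int ring_free(const struct ring *r)
{
	return (r->read_pos - r->write_pos - 1 + RING_SIZE) % RING_SIZE;
}

/* Bytes waiting to be read. */
static int ring_used(const struct ring *r)
{
	return (r->write_pos - r->read_pos + RING_SIZE) % RING_SIZE;
}

/* Copy in at most len bytes, stopping at the end of the array. */
static size_t ring_put(struct ring *r, const char *src, size_t len)
{
	size_t n = (size_t)ring_free(r);
	size_t to_end = (size_t)(RING_SIZE - r->write_pos);

	assert(r->write_pos >= 0 && r->write_pos < RING_SIZE);
	if (len < n)
		n = len;
	if (to_end < n)
		n = to_end;
	memcpy(r->data + r->write_pos, src, n);
	r->write_pos = (r->write_pos + (int)n) % RING_SIZE;
	return n;
}

/* Copy out at most len bytes, stopping at the end of the array. */
static size_t ring_get(struct ring *r, char *dst, size_t len)
{
	size_t n = (size_t)ring_used(r);
	size_t to_end = (size_t)(RING_SIZE - r->read_pos);

	assert(r->read_pos >= 0 && r->read_pos < RING_SIZE);
	if (len < n)
		n = len;
	if (to_end < n)
		n = to_end;
	memcpy(dst, r->data + r->read_pos, n);
	r->read_pos = (r->read_pos + (int)n) % RING_SIZE;
	return n;
}
```

Because a copy stops at the end of the array, a caller that wants to move everything simply calls ring_put or ring_get again, just as cat and echo call read and write again after a short transfer.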

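When the driver returns -ERESTARTSYS, the kernel either restarts the read or write system call transparently after the signal has been handled (for example, when the handler was installed with SA_RESTART), or makes the call fail in user space with errno set to EINTR.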
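For the EINTR case, user programs commonly retry the call themselves. A minimal sketch, where read_retry is a hypothetical helper name:

```c
#include <assert.h>
#include <errno.h>
#include <unistd.h>

/* Hypothetical helper: repeat read() for as long as it fails with EINTR. */
static ssize_t read_retry(int fd, void *buf, size_t len)
{
	ssize_t n;

	assert(buf != NULL);
	do {
		n = read(fd, buf, len);
	} while (n == -1 && errno == EINTR);
	return n;
}
```

Any other error, and any successful or end-of-file result, is passed straight back to the caller.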

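Today's blunder

I ran rmmod while cat /dev/pipe1 was waiting, and the machine died. (Probably a kernel panic, but no log was left...) It makes sense that removing the module while a process is blocked in read would cause an oops. So how do I prevent it? It turned out that I had not set owner in the device's file_operations. With owner set, the kernel knows the module is in use and rmmod refuses with a message like 'module xxxx is in use'. I think I need to understand the device model properly to really understand this.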

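What I didn't understand today

When I ran cat /dev/pipe1, sending SIGINT with CTRL + C did not kill it. Why? I only used wait_event_interruptible and down_interruptible, so at first I had no idea. The likely culprit is the return value of wait_event_interruptible: when a signal arrives it returns -ERESTARTSYS, and if the read loop ignores that and simply goes around again, the process never gets back to user space to handle the signal. The return value has to be checked and -ERESTARTSYS returned right away.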

References

Linux Device Drivers book - Bootlin

bootlin.com