Exercise 1

lapic_init() calls mmio_map_region() at first to map 4K memory at physical address lapicaddr to the MMIO area of virtual memory. So we need to use boot_map_region() to implement the mmio_map_region() in kern/pmap.c.

Although it is called only by lapic_init() and the size it passes equals to PGSIZE, we still need to round size up to a multiple of PGSIZE. But here rounding base or pa is not needed as they are always page-aligned. A temporary variable is needed as we are required to return the base of the reserved region.

Follow the comment we can write some code like this:

void *
mmio_map_region(physaddr_t pa, size_t size)
{
    size_t rounded_size = ROUNDUP(size, PGSIZE);

    if (base + rounded_size >= MMIOLIM) {
        panic("mmio_map_region: requested size overflow MMIOLIM");
    }
    boot_map_region(kern_pgdir, base, rounded_size, pa, PTE_PCD | PTE_PWT | PTE_W);

    uintptr_t curr_base = base;
    base += rounded_size;
    return (void *)curr_base;
}

Exercise 2

The control flow transfer is, the bootstrap processor (BSP) executes boot_aps() to copy code in kern/mpentry.S to MPENTRY_PADDR, then send STARTUP IPIs to the APs. The APs run the entry code, then mp_main() will initialize it and notice the BSP they are up.

The code needs to be implemented here in page_init() is quite simple as we know that MPENTRY_PADDR is 0x7000, it is above PGSIZE and below IOPHYSMEM. What we need to do here is to initialize the page at MPENTRY_PADDR as a special case in the corresponding position.

@@ -310,5 +310,10 @@ page_init(void)
        for (i = 1; i < npages_basemem; i++) {
-               pages[i].pp_ref = 0;
-               pages[i].pp_link = page_free_list;
-               page_free_list = &pages[i];
+               // mark the physical page at MPENTRY_PADDR as in use
+               if (i == MPENTRY_PADDR / PGSIZE) {
+                       pages[i].pp_ref = 1;
+               } else {
+                       pages[i].pp_ref = 0;
+                       pages[i].pp_link = page_free_list;
+                       page_free_list = &pages[i];
+               }
        }

Now QEMU should pass tests till check_page_free_list().

Question 1

MPBOOTPHYS is used to calculate the corresponding physical address of its symbols at MPENTRY_PADDR since APs are in real mode when starting and it can only run codes below 640K and boot_aps() has made a copy of the code at MPENTRY_PADDR, so the code can be accessed by the APs with the help of this macro.

The bootloader in boot/boot.S doesn't need such macro because it is an independent module linked to 0x7c00 which is addressable in real mode.

If it were omitted in kern/mpentry.S. The APs will try to load code at high address, which is unaddressable in real mode.

Exercise 3

The comment provides all the information needed, even some of the code. It's very easy to implement.

static void
mem_init_mp(void)
{
    int i;

    for (i = 0; i < NCPU; i++) {
        uint32_t kstacktop_i = KSTACKTOP - i * (KSTKSIZE + KSTKGAP);

        boot_map_region(kern_pgdir, kstacktop_i - KSTKSIZE, KSTKSIZE, PADDR(percpu_kstacks[i]), PTE_W);
    }
}

Now QEMU should pass the check_kern_pgdir() test.

Exercise 4

This one is also easy as the comment also gives lots of hints. What we need to do is to replace ts with thiscpu->cpu_ts, change gdt[(GD_TSS0 >> 3)] to gdt[(GD_TSS0 >> 3) + i], set the esp point to the CPU's own kernel and load the TSS selector for each CPU. The TSS selector may be a little hard to find. Since its bottom 3 bits are 0 and GDT uses (GD_TSS0 >> 3) + i as its index, its value is GD_TSS0 + (i << 3) here.

void
trap_init_percpu(void)
{
    int id = thiscpu->cpu_id;

    // Setup a TSS so that we get the right stack
    // when we trap to the kernel.
    thiscpu->cpu_ts.ts_esp0 = (uint32_t)percpu_kstacks[id] + KSTKSIZE;
    thiscpu->cpu_ts.ts_ss0 = GD_KD;
    thiscpu->cpu_ts.ts_iomb = sizeof(struct Taskstate);

    // Initialize the TSS slot of the gdt.
    gdt[(GD_TSS0 >> 3) + id] = SEG16(STS_T32A, (uint32_t) (&thiscpu->cpu_ts),
                    sizeof(struct Taskstate) - 1, 0);
    gdt[(GD_TSS0 >> 3) + id].sd_s = 0;

    // Load the TSS selector (like other segment selectors, the
    // bottom three bits are special; we leave them 0)
    ltr(GD_TSS0 + (id << 3));

    // Load the IDT
    lidt(&idt_pd);
}

Exercise 5

This is the easiest one so far. Simply add a call to lock_kernel() or unlock_kernel() to the location specified.

In kern/init.c:i386_init():

@@ -44,2 +44,2 @@ i386_init(void)
        // Acquire the big kernel lock before waking up APs
-       // Your code here:
+       lock_kernel();

In kern/init.c:i386_init():

@@ -107,5 +107,5 @@ mp_main(void)
        // Now that we have finished some basic setup, call sched_yield()
        // to start running processes on this CPU.  But make sure that
        // only one CPU can enter the scheduler at a time!
-       //
-       // Your code here:
+       lock_kernel();
+       sched_yield();

In kern/trap.c:trap():

@@ -276,5 +275,5 @@ trap(struct Trapframe *tf)
                // Trapped from user mode.
                // Acquire the big kernel lock before doing any
                // serious kernel work.
-               // LAB 4: Your code here.
+               lock_kernel();
                assert(curenv);

In kern/env.c:env_run():

@@ -547,5 +547,6 @@ env_run(struct Env *e)
        curenv = e;
        e->env_status = ENV_RUNNING;
        e->env_runs++;
+       unlock_kernel();
        lcr3(PADDR(e->env_pgdir));
        env_pop_tf(&e->env_tf);

Question 2

When an exception or interrupt is triggered, the kernel will push some information onto the stack (here it is TrapFrame), before the trap handler acquires the big kernel lock. So other CPUs can push a TrapFrame when one CPU is in kernel mode. If all CPUs share one kernel stack, something like this may happen. CPU0 is handling a trap, at this time, a trap on CPU1 is triggered and it pushes its TrapFrame onto the stack. When CPU0 has done the job, it will pop the stack pushed by CPU1 instead of its own. In this situation, the kernel stack will be corrupted, and the control flow will not be transferred as expected.

Exercise 6

Implement round-robin scheduling in sched_yield() in kern/sched.c is not hard, we need to find the first runnable environment after curenv in envs[], if not found, search from 0 to curenv, as the comment indicates. Notice that the curenv may be null. Since env_run() does not return, a flag variable is not needed here.

void
sched_yield(void)
{
    struct Env *idle = curenv;
    int idle_envid = (idle == NULL) ? -1 : ENVX(idle->env_id);
    int i;

    // search envs after idle
    for (i = idle_envid + 1; i < NENV; i++) {
        if (envs[i].env_status == ENV_RUNNABLE) {
            env_run(&envs[i]);
        }
    }

    // find from 1st env if not found
    for (i = 0; i < idle_envid; i++) {;
        if (envs[i].env_status == ENV_RUNNABLE) {
            env_run(&envs[i]);
        }
    }

    // if still not found, try idle
    if(idle != NULL && idle->env_status == ENV_RUNNING) {
        env_run(idle);
    }

    // sched_halt never returns
    sched_halt();
}

Then modify syscall() in kern/syscall.c to dispatch sys_yield(). Simply add a case will do this.

case SYS_yield:
    sys_yield();
    return 0;

And don't forget to remove this in kern/init.c:mp_main(), though it won't cause any trouble if it is not removed.

// Remove this after you finish Exercise 6
for (;;);

To test this, we are asked to create three environments that all run the program user/yield.c. So this code should be added to kern/init.c:i386_init(). Remember to comment out the user_primes test as fork() is not implemented so far.

#if defined(TEST)
    // Don't touch -- used by grading script!
    ENV_CREATE(TEST, ENV_TYPE_USER);
#else
    // Touch all you want.
    ENV_CREATE(user_yield, ENV_TYPE_USER);
    ENV_CREATE(user_yield, ENV_TYPE_USER);
    ENV_CREATE(user_yield, ENV_TYPE_USER);
    // ENV_CREATE(user_primes, ENV_TYPE_USER);
#endif // TEST*

Now run make qemu CPUS=x (x is the number of CPUs) will create 3 environments, each one will be switched 5 times, and finally invokes the kernel monitor.

Question 3

In env_setup_vm(), the comment says the virtual address space of all environments is identical from UTOP to UVPT, as well as the address space of the kernel. The virtual address of e is always the same whatever the address space it is, so it can be dereferenced both before and after the addressing switch.

Question 4

The context switch needs to ensure the environment can resume the execution at exactly where it stops as the switch has never happened. So all the registers need to be saved. They are pushed onto the stack when it triggers sys_yield() syscall, and then the trap handler (here it is kern/trap.c:trap()) will save them in env_tf. And they are restored by env_pop_tf() when env_run() is executed.

Exercise 7

This exercise requires us to implement several functions. The codes are long but not very difficult since most of them are for handling errors.

Let's start with sys_exofork(), possibly the most difficult one here because this syscall is called once but returns twice and it should be tweaked to return 0 to the child.

To figure out how this syscall is used, we need to take a look at user/dumbfork.c. This parent will call sys_exofork, the kernel will make a copy of the Env structure of itself for the child, and set the child's status to ENV_NOT_RUNNABLE. Now the child is created and stopped. sys_exofork will put the child's envid in the eax register according to the calling convention for return value. Then the parent will make a full copy of its memory to the child and set the child to be runnable. After that, the round-robin scheduling will run the child using env_run() at some time. When the child process is resumed, it will try to read the return value from eax, and eax is saved in its Env structure. So to return 0 to the child, we need to modify the value of saved eax register of the child.

With the analysis above and the comment, it's not hard to implement this function.

static envid_t
sys_exofork(void)
{
    struct Env *e;
    int r;

    if ((r = env_alloc(&e, curenv->env_id)) != 0) {
        return r;
    }
    e->env_status = ENV_NOT_RUNNABLE;
    e->env_tf = curenv->env_tf;
    e->env_tf.tf_regs.reg_eax = 0;  // return 0 to child
    return e->env_id;
}

sys_env_set_status() is a easy one. What need to do is to check the parameters and simply alter the status.

static int
sys_env_set_status(envid_t envid, int status)
{
    struct Env *e;
    int r;

    if (status != ENV_RUNNABLE && status != ENV_NOT_RUNNABLE) {
        return -E_INVAL;
    }
    if ((r = envid2env(envid, &e, 1)) != 0) {
        return r;
    }
    e->env_status = status;
    return 0;
}

sys_page_alloc() is also easy. Here we need to use page_alloc() and page_insert() implemented in Lab 2 to allocate a page for the env, and we need to check the parameters first, as the comment says.

static int
sys_page_alloc(envid_t envid, void *va, int perm)
{
    struct Env *e;
    struct PageInfo *pp;
    int r;

    if ((uint32_t)va >= UTOP || PGOFF(va) != 0) {
        return -E_INVAL;
    }
    if ((perm & (PTE_U | PTE_P)) != (PTE_U | PTE_P)) {
        return -E_INVAL;
    }
    if ((perm & ~(PTE_SYSCALL)) != 0) {
        return -E_INVAL;
    }
    if ((r = envid2env(envid, &e, 1)) != 0) {
        return r;
    }
    if((pp = page_alloc(perm)) == NULL) {
        return -E_NO_MEM;
    }
    if((r = page_insert(e->env_pgdir, pp, va, perm)) != 0) {
        page_free(pp);
        return -E_NO_MEM;
    }
    return 0;
}

sys_page_map() is a little long, but not difficult. Here we need to use page_lookup() and page_insert() to map a page to another position. Follow the comment and make sure you have checked all the errors provided.

static int
sys_page_map(envid_t srcenvid, void *srcva,
         envid_t dstenvid, void *dstva, int perm)
{
    struct Env *srcenv, *dstenv;
    struct PageInfo *pp;
    pte_t *pte;
    int r;

    if ((uint32_t)srcva >= UTOP || PGOFF(srcva) != 0) {
        return -E_INVAL;
    }
    if ((uint32_t)dstva >= UTOP || PGOFF(dstva) != 0) {
        return -E_INVAL;
    }
    if ((perm & (PTE_U | PTE_P)) != (PTE_U | PTE_P)) {
        return -E_INVAL;
    }
    if ((perm & ~(PTE_SYSCALL)) != 0) {
        return -E_INVAL;
    }
    if ((r = envid2env(srcenvid, &srcenv, 1)) != 0) {
        return r;
    }
    if ((r = envid2env(dstenvid, &dstenv, 1)) != 0) {
        return r;
    }
    if ((pp = page_lookup(srcenv->env_pgdir, srcva, &pte)) == NULL) {
        return -E_INVAL;
    }
    if ((*pte & PTE_W) == 0 && (perm & PTE_W) == PTE_W) {
        return -E_INVAL;
    }
    if ((r = page_insert(dstenv->env_pgdir, pp, dstva, perm)) != 0) {
        return r;
    }
    return 0;
}

The last one sys_page_unmap() is another easy one, just write follow the comment.

static int
sys_page_unmap(envid_t envid, void *va)
{
    struct Env *e;
    int r;

    if ((uint32_t)va >= UTOP || PGOFF(va) != 0) {
        return -E_INVAL;
    }
    if ((r = envid2env(envid, &e, 1)) != 0) {
        return r;
    }
    page_remove(e->env_pgdir, va);
    return 0;
}

At last, don't forget to modify syscall() in kern/syscall.c to dispatch the syscalls above.

case SYS_exofork:
    return sys_exofork();
case SYS_env_set_status:
    return sys_env_set_status(a1, a2);
case SYS_page_alloc:
    return sys_page_alloc(a1, (void *)a2, a3);
case SYS_page_map:
    return sys_page_map(a1, (void *)a2, a3, (void *)a4, a5);
case SYS_page_unmap:
    return sys_page_unmap(a1, (void *)a2);

Now dumbfork test in make grade will be passed.

Exercise 8

Now it comes to Part B, the hardest part of this lab, also the hardest part so far. But this exercise is as simple as the previous one. Read the comment thoroughly will give you all you need.

static int
sys_env_set_pgfault_upcall(envid_t envid, void *func)
{
    struct Env *e;
    int r;

    if ((r = envid2env(envid, &e, 1)) != 0) {
        return r;
    }
    e->env_pgfault_upcall = func;
    return 0;
}

Also, don't forget to add a case in syscall().

case SYS_env_set_pgfault_upcall:
    return sys_env_set_pgfault_upcall(a1, (void *)a2);

Exercise 9

Now we need to set up the UTrapframe structure and save the trap-time state in it for the copy-on-write page fault handler. Though the comment in page_fault_handler() is very long, it worth reading as such a long comment contains almost everything we need. You should also read the lab page thoroughly. What we need to do can be concluded into the 5 steps following:

Check whether the environment's page fault upcall exists. If not, destroy the environment.
Find where the page fault stack frame UTrapframe should be located. If it is not caused by another page fault, it is right below UXSTACKTOP. Or it is below the previous stack frame, a 32-bit should be pushed first. To test whether tf->tf_esp is already on the user exception stack, check whether it is in the range between UXSTACKTOP-PGSIZE and UXSTACKTOP-1, inclusive.
Check whether the environment allocated a page for its exception stack, the permission to write it and whether the exception stack overflows. If one of these situations occurs, destroy the environment. This can be done with calling user_mem_assert() as the hint says.
Set up the UTrapframe, copy the register values, error code and fault_va into it. Most of these values are in the TrapFrame and fault_va is provided.
Run the page fault handler with UTrapframe in user mode. So we need to modify the esp and eip of TrapFrame and call env_run() to give control back to the user environment and run the page fault handler.

Here is my code:

void
page_fault_handler(struct Trapframe *tf)
{
    uint32_t fault_va;

    // Read processor's CR2 register to find the faulting address
    fault_va = rcr2();

    // Handle kernel-mode page faults.
    if ((tf->tf_cs & 0x3) == 0) {
        panic("page_fault_handler: page fault in kernel mode");
    }

    // We've already handled kernel-mode exceptions, so if we get here,
    // the page fault happened in user mode.
    if (curenv->env_pgfault_upcall) {
        struct UTrapframe *utf;

        // Determine the location
        if (tf->tf_esp >= UXSTACKTOP - PGSIZE && tf->tf_esp < UXSTACKTOP) {
            *(uint32_t *)(tf->tf_esp - 4) = 0;  // push an empty 32-bit word
            utf = (struct UTrapframe *)(tf->tf_esp - 4 - sizeof(struct UTrapframe));
        } else {
            utf = (struct UTrapframe *)(UXSTACKTOP - sizeof(struct UTrapframe));
        }

        // Check permission
        user_mem_assert(curenv, (void *)utf, sizeof(struct UTrapframe), PTE_W | PTE_U);

        // Set up the user trap frame
        utf->utf_esp = tf->tf_esp;
        utf->utf_eflags = tf->tf_eflags;
        utf->utf_eip = tf->tf_eip;
        utf->utf_regs = tf->tf_regs;
        utf->utf_err = tf->tf_err;
        utf->utf_fault_va = fault_va;

        // Switch the environment
        tf->tf_esp = (uint32_t)utf;
        tf->tf_eip = (uint32_t)curenv->env_pgfault_upcall;
        env_run(curenv);
    }

    // Destroy the environment that caused the fault.
    cprintf("[%08x] user fault va %08x ip %08x\n",
        curenv->env_id, fault_va, tf->tf_eip);
    print_trapframe(tf);
    env_destroy(curenv);
}

There is also a simple question to answer, "What happens if the user environment runs out of space on the exception stack?". The comment of page_fault_handler() has already given us the answer. If it runs out of space, it cannot pass the user_mem_assert() and the function will destroy the environment that caused the fault.

Exercise 10

This is really a hard one since what we need to write here is all x86 assembly code here. The _pgfault_upcall routine in lib/pfentry.S has already implemented the procedure to call the page fault handler. What we need to write here is the procedure to return to the original point in the user code that caused the page fault, so we must have a clear view of how a control transfer is performed at the machine level.

The following figure from Figure 3.26 of CS:APP3e shows how a usual call-return procedure is performed on x86-64. x86 is very similar to this, except for the register name.

The call instruction will push the next instruction of the caller to be executed in eip onto the stack, then create a new stack frame for the callee. The callee will push registers to save like ebp onto the stack. After the callee has done its job, it will pop these saved registers back, and use a ret instruction to give control back to the caller. The ret works the same as pop %eip.

Here since it is not a usual call-return. We need to modify the caller's stack frame to "push" the eip onto the stack frame of the caller. That is to say, we need to enlarge the stack of the caller for 4 bytes to put the saved eip. And that's why we need the empty 32-bit word for nested page faults. Then we need to manually pop back the saved registers in a user exception stack frame, and finally, give control back to where the page fault is encountered.

So what we need to adjust the stack like this:

Before:
   Previous Frame                 User Trap Frame
+------------------+            +------------------+
| stack data       |      +---- | trap-time esp    |
| ...              |      |     | trap-time eflags |
+------------------+ <----+     | trap-time eip    |
                                | trap-time eax    |  
                                | ...              |
                                | trap-time esi    |
                                | trap-time edi    |  
                                | tf_err           |
                                | fault_va         |
                                +------------------+  <-- %esp
After:
   Previous Frame                 User Trap Frame
+------------------+            +------------------+
| stack data       |      +---- | trap-time esp-4  |
| ...              |      |     | trap-time eflags |
| trap-time eip    |      |     | trap-time eip    |
+------------------+ <----+     | trap-time eax    |  
                                | ...              |
                                | trap-time esi    |
                                | trap-time edi    |  
                                | tf_err           |
                                | fault_va         |
                                +------------------+  <-- %esp

Beware that previous stack frame can either a normal user stack frame or an exception stack frame. So we cannot use move the trap-time eip directly to 0x34(%esp).

And this is my implementation.

// Save trap-time eip next to previous stack (that's why we need the empty dword)
movl 0x30(%esp), %ecx    // save trap-time esp in ecx
subl $4, %ecx            // enlarge the previous stack for 4 bytes
movl %ecx, 0x30(%esp)    // write the modified esp back
movl 0x28(%esp), %edx    // save trap-time eip in edx
movl %edx, (%ecx)        // save eip at new esp for return

// Restore the trap-time registers.  After you do this, you
// can no longer modify any general-purpose registers.
addl $8, %esp            // skip fault_va and tf_err
popal                    // pop PushRegs

// Restore eflags from the stack.  After you do this, you can
// no longer use arithmetic operations or anything else that
// modifies eflags.
addl $4, %esp            // skip eip
popfl                    // pop eflags

// Switch back to the adjusted trap-time stack.
pop %esp

// Return to re-execute the instruction that faulted.
ret

Exercise 11

This exercise is not difficult. Since it is in user mode, we need to use the syscalls implemented in Exercise 7 & 8 to implement this. Also remember to handle errors.

void
set_pgfault_handler(void (*handler)(struct UTrapframe *utf))
{
    int r;

    if (_pgfault_handler == 0) {
        // First time through!
        if ((r = sys_page_alloc(thisenv->env_id, (void *)(UXSTACKTOP - PGSIZE), PTE_W | PTE_U | PTE_P)) != 0) {
            panic("set_pgfault_handler: %e", r);
        }
        if ((r = sys_env_set_pgfault_upcall(thisenv->env_id, _pgfault_upcall)) != 0) {
            panic("set_pgfault_handler: %e", r);
        }
    }

    // Save handler pointer for assembly to call.
    _pgfault_handler = handler;
}

Now run make grade we shall pass all tests in Part B except for forktree.

Exercise 12

After doing so much prepare work, now we are going to implement the copy-on-write fork.

Before coding, it is suggested to take a look at the uvpt[] and uvpd[] we will use later here. Here, JOS uses a clever mapping trick which maps the page table to UVPT so that user code can read the PTE and PDE contents. x86 processors will read the 2 level page table to access a virtual memory address. If we access the address at UVPT+i, it will return us the PTE, and the offset i is the page number. For example, if we access 0xef410100, the processor will first read the PDE, the 0x3bd entry of page directory, then the PTE, the 0x10 entry of page table, then the content, the 0x100 entry of the page. Here the PDE points to the page table, so the PTE being read is the same as PDE, and the content is the PTE of 0x04040000~0x04040fff. The offset is the DIR and the PAGE part of the virtual address, so it is the page number. In the example, uvpt[0x10100] is the PTE for 0x04040000~0x04040fff. Specially, if we access uvpt + UVPT >> 10, it will return the PDE since the content is still the PDE, and the offset here is the page directory index.

Below shows what is at UVPT:

     0                 4                 8                                ...
     +-----------------+-----------------+-----+----------------------+-----+
UVPT | PTEs for 0x0000 | PTEs for 0x1000 | ... | PTEs for UVPT / PDEs | ... | UVPT+PGSIZE
     +-----------------+-----------------+-----+----------------------+-----+
     ^ uvpt[]                                  ^ uvpd[]

It is very confusing and hard to understand. The following figure may help you understand it.'

In this figure, the circular arrow is followed once for uvpt[] and twice for uvpd[].

Now we can proceed to coding. The first one to implement is fork(). It is very similar to dumbfork(), except how the copy is performed. As the comment says we also need to set up page fault handler, create a child and allocate a new page for the child's user exception stack. A page fault upcall also needs to be set, and that's we will easily neglect here, including me. And here is my implemention.

envid_t
fork(void)
{
    envid_t envid;
    uint32_t addr;
    int r;

    set_pgfault_handler(pgfault);
    envid = sys_exofork();
    if (envid < 0) {
        panic("sys_exofork: %e", envid);
    }
    if (envid == 0) {
        // fix thisenv in child
        thisenv = &envs[ENVX(sys_getenvid())];
        return 0;
    }

    // copy the address space mappings to child
    for (addr = 0; addr < USTACKTOP; addr += PGSIZE) {
        if ((uvpd[PDX(addr)] & PTE_P) == PTE_P && (uvpt[PGNUM(addr)] & PTE_P) == PTE_P) {
            duppage(envid, PGNUM(addr));
        }
    }

    // allocate new page for child's user exception stack
    void _pgfault_upcall();

    if ((r = sys_page_alloc(envid, (void *)(UXSTACKTOP - PGSIZE), PTE_W | PTE_U | PTE_P)) != 0) {
        panic("fork: %e", r);
    }
    if ((r = sys_env_set_pgfault_upcall(envid, _pgfault_upcall)) != 0) {
        panic("fork: %e", r);
    }

    // mark the child as runnable
    if ((r = sys_env_set_status(envid, ENV_RUNNABLE)) != 0)
        panic("fork: %e", r);

    return envid;
}

duppage() is not difficult. Here we map the page for the child environment. If the page is writable, we need to map it in both environments as a COW page so that they won't interfere with each other, and if it is read-only, simply map it in the child. Here we should use the syscalls we implemented in Exercise 8.

static int
duppage(envid_t envid, unsigned pn)
{
    envid_t parent_envid = sys_getenvid();
    void *va = (void *)(pn * PGSIZE);
    int r;

    if ((uvpt[pn] & PTE_W) == PTE_W || (uvpt[pn] & PTE_COW) == PTE_COW) {
        if ((r = sys_page_map(parent_envid, va, envid, va, PTE_COW | PTE_U | PTE_P)) != 0) {
            panic("duppage: %e", r);
        }
        if ((r = sys_page_map(parent_envid, va, parent_envid, va, PTE_COW | PTE_U | PTE_P)) != 0) {
            panic("duppage: %e", r);
        }
    } else {
        if ((r = sys_page_map(parent_envid, va, envid, va, PTE_U | PTE_P)) != 0) {
            panic("duppage: %e", r);
        }
    }

    return 0;
}

The last one pgfault() uses syscalls to copy the page on page faults. The comment provides many hints so it is easy to implement.

static void
pgfault(struct UTrapframe *utf)
{
    void *addr = (void *) utf->utf_fault_va;
    uint32_t err = utf->utf_err;
    pte_t pte = uvpt[PGNUM(addr)];
    envid_t envid = sys_getenvid();
    int r;

    if ((err & FEC_WR) == 0 || (pte & PTE_COW) == 0) {
        panic("pgfault: bad faulting access\n");
    }
    if ((r = sys_page_alloc(envid, PFTEMP, PTE_W | PTE_U | PTE_P)) != 0) {
        panic("pgfault: %e", r);
    }
    memcpy(PFTEMP, ROUNDDOWN(addr, PGSIZE), PGSIZE);
    if ((r = sys_page_map(envid, PFTEMP, envid, ROUNDDOWN(addr, PGSIZE), PTE_W | PTE_U | PTE_P)) != 0) {
        panic("pgfault: %e", r);
    }
    if ((r = sys_page_unmap(envid, PFTEMP)) != 0) {
        panic("pgfault: %e", r);
    }
}

Now forktree test in make grade will be passed.

Exercise 13

This part is much simpler than Part B. This exercise requires to add more interrupt handler in kern/trapentry.S and kern/trap.c:trap(). Adding a handler is very simple. If you forget how to do this, see Exercise 4 of Lab 3.

Section 6.15 of Intel® 64 and IA-32 Architectures Software Developer's Manual: Volume 3 indicates that user-defined interrupts don't contain an error code.

In kern/trapentry.S:

@@ -66,3 +66,9 @@ TRAPHANDLER(th_align, T_ALIGN)
 TRAPHANDLER_NOEC(th_mchk, T_MCHK)
 TRAPHANDLER_NOEC(th_simderr, T_SIMDERR)
 TRAPHANDLER_NOEC(th_syscall, T_SYSCALL)
+TRAPHANDLER_NOEC(th_irq_timer, IRQ_OFFSET + IRQ_TIMER)
+TRAPHANDLER_NOEC(th_irq_kbd, IRQ_OFFSET + IRQ_KBD)
+TRAPHANDLER_NOEC(th_irq_serial, IRQ_OFFSET + IRQ_SERIAL)
+TRAPHANDLER_NOEC(th_irq_spurious, IRQ_OFFSET + IRQ_SPURIOUS)
+TRAPHANDLER_NOEC(th_irq_ide, IRQ_OFFSET + IRQ_IDE)
+TRAPHANDLER_NOEC(th_irq_error, IRQ_OFFSET + IRQ_ERROR)

In kern/trap.c

@@ -90,3 +90,9 @@ trap_init(void)
        void th_mchk();
        void th_simderr();
        void th_syscall();
+       void th_irq_timer();
+       void th_irq_kbd();
+       void th_irq_serial();
+       void th_irq_spurious();
+       void th_irq_ide();
+       void th_irq_error();
@@ -110,3 +116,9 @@ trap_init(void)
        SETGATE(idt[T_MCHK], 0, GD_KT, &th_mchk, 0);
        SETGATE(idt[T_SIMDERR], 0, GD_KT, &th_simderr, 0);
        SETGATE(idt[T_SYSCALL], 0, GD_KT, &th_syscall, 3);
+       SETGATE(idt[IRQ_OFFSET + IRQ_TIMER], 0, GD_KT, &th_irq_timer, 0);
+       SETGATE(idt[IRQ_OFFSET + IRQ_KBD], 0, GD_KT, &th_irq_kbd, 0);
+       SETGATE(idt[IRQ_OFFSET + IRQ_SERIAL], 0, GD_KT, &th_irq_serial, 0);
+       SETGATE(idt[IRQ_OFFSET + IRQ_SPURIOUS], 0, GD_KT, &th_irq_spurious, 0);
+       SETGATE(idt[IRQ_OFFSET + IRQ_IDE], 0, GD_KT, &th_irq_ide, 0);
+       SETGATE(idt[IRQ_OFFSET + IRQ_ERROR], 0, GD_KT, &th_irq_error, 0);

We also need to enable interrupts in user mode. This can be done by adding this to kern/env.c:env_alloc().

@@ -256,2 +256,2 @@ env_alloc(struct Env **newenv_store, envid_t parent_id)
        // Enable interrupts while in user mode.
-       // LAB 4: Your code here.
+       e->env_tf.tf_eflags |= FL_IF;

And don't forget to uncomment the sti instruction in kern/sched.c:sched_halt() so that idle CPUs unmask interrupts.

Exercise 14

Now we are asked to modify kern/trap.c:trap_dispatch() to call sched_yield() to handle a clock interrupt. Simply add a case would do this. Remember to call lapic_eoi() as the comment says.

case IRQ_OFFSET + IRQ_TIMER:
    lapic_eoi();
    sched_yield();
    return;

Now it will pass spin and stresssched test if we run make grade.

Exercise 15

Implementing sys_ipc_recv() in kern/syscall.c is relativly easy. Althogh the comment says we need to give up the CPU, we do not need to explicitly call sched_yield() to do this since marking the environment not runnable will do this implicitly, and if we call it, the return value of syscall cannot be passed to the user environment.

static int
sys_ipc_recv(void *dstva)
{
    if ((uint32_t)dstva < UTOP && PGOFF(dstva) != 0) {
        return -E_INVAL;
    }
    curenv->env_ipc_recving = 1;
    curenv->env_ipc_dstva = dstva;
    curenv->env_status = ENV_NOT_RUNNABLE;
    return 0;
}

sys_ipc_try_send() is very simlar to sys_page_alloc() and sys_page_map() we implemented in Exercise 7. And we should carefully check all the situations as the comment says. Here it should not panic if srcva >= UTOP.

One error was not clearly described in the comment. -E_INVAL should be returned if (perm & PTE_W) and srcva < UTOP, but srcva is read-only in the current environment's address space. Though here it won't make any trouble, it will lead to a bug in Exercise 6 of Lab 5, and it is not easy to find this bug. (Luckily, I found it :)

static int
sys_ipc_try_send(envid_t envid, uint32_t value, void *srcva, unsigned perm)
{
    struct Env *e;
    struct PageInfo *pp;
    pte_t *pte;
    int r;

    if ((r = envid2env(envid, &e, 0)) != 0) {
        return r;
    }
    if (e->env_ipc_recving == 0) {
        return -E_IPC_NOT_RECV;
    }
    if ((uint32_t)srcva < UTOP) {
        if (PGOFF(srcva) != 0) {
            return -E_INVAL;
        }
        if ((perm & (PTE_U | PTE_P)) != (PTE_U | PTE_P)) {
            return -E_INVAL;
        }
        if ((perm & ~(PTE_SYSCALL)) != 0) {
            return -E_INVAL;
        }
        if ((pp = page_lookup(curenv->env_pgdir, srcva, &pte)) == NULL) {
            return -E_INVAL;
        }
        if ((*pte & PTE_W) == 0 && (perm & PTE_W) == PTE_W) {
            return -E_INVAL;
        }
        if ((r = page_insert(e->env_pgdir, pp, e->env_ipc_dstva, perm)) != 0) {
            return r;
        }
        e->env_ipc_perm = perm;
    } else {
        e->env_ipc_perm = 0;
    }
    e->env_ipc_recving = 0;
    e->env_ipc_from = curenv->env_id;
    e->env_ipc_value = value;
    e->env_status = ENV_RUNNABLE;
    return 0;
}

Then add some dispatchers in syscall().

case SYS_ipc_recv:
    return sys_ipc_recv((void *)a1);
case SYS_ipc_try_send:
    return sys_ipc_try_send(a1, a2, (void *)a3, a4);

Now we are going to implement the APIs for user programs in lib/ipc.c. ipc_recv() is the wrapper function for sys_ipc_recv(). The comment provides the detailed procedure. And if pg is null, we should pass UTOP as pg since the syscalls will just ignore the address and it will not panic the kernel.

int32_t
ipc_recv(envid_t *from_env_store, void *pg, int *perm_store)
{
    int r;

    if (pg == NULL) {
        pg = (void *)UTOP;
    }
    if ((r = sys_ipc_recv(pg)) < 0) {
        if (from_env_store != NULL) {
            *from_env_store = 0;
        }
        if (perm_store != NULL) {
            *perm_store = 0;
        }
        return r;
    }
    if (from_env_store != NULL) {
        *from_env_store = thisenv->env_ipc_from;
    }
    if (perm_store != NULL) {
        *perm_store = thisenv->env_ipc_perm;
    }
    return thisenv->env_ipc_value;
}

ipc_send() is very similar to ipc_recv(). Because it keeps trying until it succeeds and it should try at least once, a do-while loop is perfectly suitable here, though do-while loops are seldom used in most cases.

void
ipc_send(envid_t to_env, uint32_t val, void *pg, int perm)
{
    int r;

    if (pg == NULL) {
        pg = (void *)UTOP;
    }
    do {
        r = sys_ipc_try_send(to_env, val, pg, perm);
        if (r < 0 && r != -E_IPC_NOT_RECV) {
            panic("ipc_send: %e", r);
        }
        sys_yield();
    } while(r != 0);
}

Now run make grade we shall pass all the tests and get 80 points like this:

dumbfork: OK (3.0s) 
Part A score: 5/5

faultread: OK (2.1s) 
faultwrite: OK (2.1s) 
faultdie: OK (2.1s) 
faultregs: OK (2.1s) 
faultalloc: OK (2.0s) 
faultallocbad: OK (2.9s) 
faultnostack: OK (3.1s) 
faultbadhandler: OK (3.0s) 
faultevilhandler: OK (3.1s) 
forktree: OK (2.2s) 
Part B score: 50/50

spin: OK (2.9s) 
stresssched: OK (3.6s) 
sendpage: OK (3.1s) 
pingpong: OK (3.2s) 
primes: OK (6.9s) 
Part C score: 25/25

Score: 80/80

MIT 6.828 labs walkthroughs: Lab 4 Preemptive Multitasking