Exercise 1
A pretty easy one: all we need to do is add a call to time_tick() in the clock interrupt handler. time_tick() increases the system-wide tick count by one, which means 10 ms have passed. The comment says that clock interrupts are triggered on every CPU, so only one CPU should call time_tick(). To tell which logical CPU we are on, we can use thiscpu->cpu_id or cpunum(). Here I let CPU0 call time_tick(), since CPU0 is always present.
@@ -252,4 +252,10 @@ trap_dispatch(struct Trapframe *tf)
case IRQ_OFFSET + IRQ_TIMER:
lapic_eoi();
+ // Add time tick increment to clock interrupts.
+ // Be careful! In multiprocessors, clock interrupts are
+ // triggered on every CPU.
+ if (thiscpu->cpu_id == 0) {
+ time_tick();
+ }
sched_yield();
return;
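To see why the CPU check matters, here is a minimal user-space sketch, not JOS code. The names sim_ticks, sim_clock_interrupt() and sim_elapsed_msec() are made up for illustration, and it assumes one tick equals 10 ms as described above.

```c
#include <stdint.h>

/* Hypothetical stand-in for the kernel's tick counter. */
static uint32_t sim_ticks;

/* Called once per CPU per timer interrupt; only CPU 0 advances the clock. */
static void sim_clock_interrupt(int cpu_id)
{
    if (cpu_id == 0)
        sim_ticks++;
}

/* Deliver `rounds` timer interrupts to each of `ncpu` CPUs and return the
 * elapsed time in milliseconds, assuming one tick is 10 ms. */
uint32_t sim_elapsed_msec(int ncpu, int rounds)
{
    sim_ticks = 0;
    for (int r = 0; r < rounds; r++)
        for (int cpu = 0; cpu < ncpu; cpu++)
            sim_clock_interrupt(cpu);
    return sim_ticks * 10;
}
```

Without the cpu_id check, four CPUs would make the clock run four times too fast.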
Then implement sys_time_msec() in kern/syscall.c.
// Return the current time.
static int
sys_time_msec(void)
{
return time_msec();
}
And add a dispatcher case in syscall().
case SYS_time_msec:
return sys_time_msec();
Exercise 3
JOS uses a pci_driver array, pci_attach_vendor, to keep track of the PCI devices to attach. If it finds a device whose vendor ID (VEN) and device ID (DEV) both match an entry in pci_attach_vendor, it calls that entry's attachfn to attach it. To attach the Intel E1000 NIC, we need three things: its VEN, its DEV, and an attach function.
From Exercise 2, we know QEMU emulates the 82540EM, and, of course, the vendor is Intel. We can just copy the corresponding definitions from the Linux e1000 kernel module, more exactly from e1000_hw.h, and paste them into our e1000.h.
#define E1000_VEN_ID 0x8086
#define E1000_DEV_ID_82540EM 0x100E
Then we need to implement the attach function. For now, it only calls pci_func_enable() to enable the E1000. Here I put it in e1000.c.
int
e1000_attach(struct pci_func *pcif)
{
pci_func_enable(pcif);
return 0;
}
Then add an entry to pci_attach_vendor.
@@ -33,3 +33,4 @@ struct pci_driver pci_attach_class[] = {
struct pci_driver pci_attach_vendor[] = {
+ { E1000_VEN_ID, E1000_DEV_ID_82540EM, &e1000_attach },
{ 0, 0, 0 },
};
That's it. Now if we boot the kernel, we will see this message:
PCI function 00:03.0 (8086:100e) enabled
And it will pass the pci attach test of make grade.
Exercise 4
pci_func_enable() has set up the MMIO region for the E1000, which is the physical memory area starting at reg_base[0] with a size of reg_size[0]. To access this area from our driver, we must map it into the kernel virtual address space. Similar to the LAPIC mapping in kern/lapic.c, we can map this area by simply calling mmio_map_region(). Here I save the E1000 MMIO base address in a global variable, e1000_base, and create the mapping in my attach function, e1000_attach().
volatile void *e1000_base;
int
e1000_attach(struct pci_func *pcif)
{
pci_func_enable(pcif);
e1000_base = mmio_map_region(pcif->reg_base[0], pcif->reg_size[0]);
return 0;
}
The exercise says we can test the mapping by printing the device status register and checking that its value is 0x80080783. To read a register, we first need to know its offset. As in the last exercise, we can copy the definition from e1000_hw.h and paste it into our e1000.h.
#define E1000_STATUS 0x00008 /* Device Status - RO */
Then I defined a macro, E1000_REG(), in e1000.c to access a register's value through the corresponding MMIO address.
#define E1000_REG(offset) (*(volatile uint32_t *)(e1000_base + offset))
Casting the address to uint32_t * is fine, as all the E1000 registers are 32 bits wide. The cast must also be to a volatile pointer. The reason is simple: the volatile qualifier on e1000_base is dropped by the cast, so unless the target type is itself volatile-qualified, the access at e1000_base + offset is not treated as volatile.
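The register-access pattern can be tried outside the kernel with a fake register file. This is only a sketch: fake_mmio, reg_base, REG32() and demo_status_roundtrip() are names I made up, and an ordinary array stands in for the window that mmio_map_region() would return. It uses a byte pointer as the base so that the pointer arithmetic is standard C instead of the GNU void * extension.

```c
#include <stdint.h>

#define STATUS_OFFSET 0x00008  /* same offset as E1000_STATUS */

/* Fake register file standing in for the mapped MMIO window (assumption:
 * on real hardware this would be the mapped device memory). */
static uint32_t fake_mmio[16];

/* Byte-addressed base pointer, so that base + offset counts in bytes. */
static volatile uint8_t *reg_base = (volatile uint8_t *)fake_mmio;

/* Every access goes through a volatile-qualified lvalue, so the compiler
 * must perform it exactly as written and may not cache or drop it. */
#define REG32(offset) (*(volatile uint32_t *)(reg_base + (offset)))

/* Write a value as the "device" would, then read it back as the driver. */
uint32_t demo_status_roundtrip(uint32_t value)
{
    REG32(STATUS_OFFSET) = value;
    return REG32(STATUS_OFFSET);
}
```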
Now the value of the device status register can be printed easily. I print it in e1000_attach() by adding this line.
cprintf("e1000: status 0x%08x\n", E1000_REG(E1000_STATUS));
And the kernel will print this on boot, meaning that the mapping is OK.
e1000: status 0x80080783
Exercise 5
To initialize transmission, as the lab instructions and section 14.5 of the 8254x Developer's Manual say, we need the following steps:
- Allocate memory for transmit packet buffers.
- Allocate memory for the E1000's transmit descriptor list, and set up the buffer addresses. Each address must be a physical address because the hardware performs DMA directly to and from physical RAM without going through the MMU.
- Set the Transmit Descriptor Base Address (TDBAL/TDBAH) registers to the starting address of the transmit descriptor list. This is also a physical address. TDBAL holds the lower 32 bits and TDBAH the upper 32 bits. Since JOS is a 32-bit OS, TDBAH should be set to 0.
- Set the Transmit Descriptor Length (TDLEN) register to the size of the transmit descriptor list in bytes. Since TDLEN must be 128-byte aligned and each transmit descriptor is 16 bytes, the number of transmit descriptors must be a multiple of 8. The lab instructions say it should be <= 64, so here I will use 64.
- Set both the Transmit Descriptor Head and Tail (TDH/TDT) registers to 0.
- Initialize the Transmit Control Register (TCTL) with the following steps.
  - Set the Enable (TCTL.EN) bit to 1.
  - Set the Pad Short Packets (TCTL.PSP) bit to 1.
  - Set the Collision Threshold (TCTL.CT) to 10h.
  - Set the Collision Distance (TCTL.COLD) for full-duplex operation, which is 40h.
- Program the Transmit IPG (TIPG) register for the IEEE 802.3 standard IPG, which is:
  - Set the IPG Transmit Time (TIPG.IPGT) to 10.
  - Set the IPG Receive Time 1 (TIPG.IPGR1) to 2/3 of IPGR2, which is 4.
  - Set the IPG Receive Time 2 (TIPG.IPGR2) to 6.
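As a sanity check on the numbers in this list, the register values can be computed by hand. The sketch below uses shortened, hypothetical macro names, but the bit positions are the same ones used by the e1000.h definitions in this post. It only does the arithmetic and does not touch any hardware.

```c
#include <stdint.h>

/* Field positions matching the e1000.h definitions used in this post. */
#define TCTL_EN     0x00000002u
#define TCTL_PSP    0x00000008u
#define CT_SHIFT    4
#define COLD_SHIFT  12
#define IPGR1_SHIFT 10
#define IPGR2_SHIFT 20
#define TX_DESC_SIZE 16u   /* one legacy transmit descriptor, in bytes */

/* Compose TCTL from the enable bits plus threshold and distance fields. */
uint32_t tctl_value(uint32_t ct, uint32_t cold)
{
    return TCTL_EN | TCTL_PSP | (ct << CT_SHIFT) | (cold << COLD_SHIFT);
}

/* Compose TIPG from its three inter-packet-gap fields. */
uint32_t tipg_value(uint32_t ipgt, uint32_t ipgr1, uint32_t ipgr2)
{
    return ipgt | (ipgr1 << IPGR1_SHIFT) | (ipgr2 << IPGR2_SHIFT);
}

/* TDLEN is a byte count and must be a multiple of 128. */
uint32_t tdlen_bytes(uint32_t ndesc)
{
    return ndesc * TX_DESC_SIZE;
}
```

With the values from the list, tctl_value(0x10, 0x40) gives 0x0004010A, tipg_value(10, 4, 6) gives 0x0060100A, and 64 descriptors give a TDLEN of 1024 bytes, which is a multiple of 128.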
First, we need lots of definitions. Luckily, the lab page provides the definition of the legacy transmit descriptor, and most of the others can be copied from the Linux e1000 kernel module. We can just paste them into our e1000.h.
#define E1000_TCTL 0x00400 /* TX Control - RW */
#define E1000_TIPG 0x00410 /* TX Inter-packet gap -RW */
#define E1000_TDBAL 0x03800 /* TX Descriptor Base Address Low - RW */
#define E1000_TDBAH 0x03804 /* TX Descriptor Base Address High - RW */
#define E1000_TDLEN 0x03808 /* TX Descriptor Length - RW */
#define E1000_TDH 0x03810 /* TX Descriptor Head - RW */
#define E1000_TDT 0x03818 /* TX Descriptor Tail - RW */
/* Transmit Control */
#define E1000_TCTL_EN 0x00000002 /* enable tx */
#define E1000_TCTL_PSP 0x00000008 /* pad short packets */
#define E1000_TCTL_CT 0x00000ff0 /* collision threshold */
#define E1000_TCTL_COLD 0x003ff000 /* collision distance */
/* Collision related configuration parameters */
#define E1000_COLLISION_THRESHOLD 0x10
#define E1000_CT_SHIFT 4
/* Collision distance is a 0-based value that applies to half-duplex-capable hardware only. */
#define E1000_COLLISION_DISTANCE 0x40
#define E1000_COLD_SHIFT 12
/* Default values for the transmit IPG register */
#define E1000_DEFAULT_TIPG_IPGT 10
#define E1000_DEFAULT_TIPG_IPGR1 4
#define E1000_DEFAULT_TIPG_IPGR2 6
#define E1000_TIPG_IPGT_MASK 0x000003FF
#define E1000_TIPG_IPGR1_MASK 0x000FFC00
#define E1000_TIPG_IPGR2_MASK 0x3FF00000
#define E1000_TIPG_IPGR1_SHIFT 10
#define E1000_TIPG_IPGR2_SHIFT 20
Then we need to allocate memory for both the transmit packet buffers and the transmit descriptor list. Here I simply use global variables to store them. They have a fixed size, are placed in the .bss section, and are initialized to 0. Although the lab says each transmit packet buffer only needs to fit an Ethernet frame (1518 bytes), I use 1536 bytes, which is a multiple of 16, for better performance.
#define TX_BUF_SIZE 1536 // 16-byte aligned for performance
#define NTXDESC 64
static struct e1000_tx_desc e1000_tx_queue[NTXDESC] __attribute__((aligned(16)));
static uint8_t e1000_tx_buf[NTXDESC][TX_BUF_SIZE];
Then write the initialization function e1000_tx_init().
static void
e1000_tx_init()
{
// initialize tx queue
int i;
memset(e1000_tx_queue, 0, sizeof(e1000_tx_queue));
for (i = 0; i < NTXDESC; i++) {
e1000_tx_queue[i].addr = PADDR(e1000_tx_buf[i]);
}
// initialize transmit descriptor registers
E1000_REG(E1000_TDBAL) = PADDR(e1000_tx_queue);
E1000_REG(E1000_TDBAH) = 0;
E1000_REG(E1000_TDLEN) = sizeof(e1000_tx_queue);
E1000_REG(E1000_TDH) = 0;
E1000_REG(E1000_TDT) = 0;
// initialize transmit control registers
E1000_REG(E1000_TCTL) &= ~(E1000_TCTL_CT | E1000_TCTL_COLD);
E1000_REG(E1000_TCTL) |= E1000_TCTL_EN | E1000_TCTL_PSP |
(E1000_COLLISION_THRESHOLD << E1000_CT_SHIFT) |
(E1000_COLLISION_DISTANCE << E1000_COLD_SHIFT);
E1000_REG(E1000_TIPG) &= ~(E1000_TIPG_IPGT_MASK | E1000_TIPG_IPGR1_MASK | E1000_TIPG_IPGR2_MASK);
E1000_REG(E1000_TIPG) |= E1000_DEFAULT_TIPG_IPGT |
(E1000_DEFAULT_TIPG_IPGR1 << E1000_TIPG_IPGR1_SHIFT) |
(E1000_DEFAULT_TIPG_IPGR2 << E1000_TIPG_IPGR2_SHIFT);
}
And call it in the attach function.
int
e1000_attach(struct pci_func *pcif)
{
pci_func_enable(pcif);
e1000_base = mmio_map_region(pcif->reg_base[0], pcif->reg_size[0]);
e1000_tx_init();
return 0;
}
Now run make E1000_DEBUG=TXERR,TX qemu-nox, and it will print an e1000: tx disabled message.
Exercise 6
Before coding, we need to know how the E1000's transmit descriptor list works. Figure 3-4 in the 8254x Developer's Manual shows its structure.
The next transmit descriptor to use is indicated by the TDT register. To tell whether a descriptor is unused or can be reused, we check its status field. When transmitting a packet, we set the RS bit in the command field; once the E1000 has transmitted the packet, it sets the DD bit in the status field, meaning the descriptor can be reused. If a descriptor has never been used, its RS bit is 0. These definitions can also be copied from the e1000 kernel module.
/* Transmit Descriptor bit definitions */
#define E1000_TXD_CMD_EOP 0x01 /* End of Packet */
#define E1000_TXD_CMD_RS 0x08 /* Report Status */
#define E1000_TXD_STAT_DD 0x01 /* Descriptor Done */
Now we can implement the transmit function. Here I call it e1000_transmit().
int
e1000_transmit(const void *buf, size_t size)
{
int tail = E1000_REG(E1000_TDT);
if (size > ETH_PKT_SIZE) {
return -E_PKT_TOO_LARGE;
}
if ((e1000_tx_queue[tail].cmd & E1000_TXD_CMD_RS) && !(e1000_tx_queue[tail].status & E1000_TXD_STAT_DD)) {
return -E_TX_FULL;
}
e1000_tx_queue[tail].status &= ~E1000_TXD_STAT_DD;
memcpy(e1000_tx_buf[tail], buf, size);
e1000_tx_queue[tail].length = size;
e1000_tx_queue[tail].cmd |= E1000_TXD_CMD_RS | E1000_TXD_CMD_EOP;
E1000_REG(E1000_TDT) = (tail + 1) % NTXDESC;
return 0;
}
This function accepts two arguments: a pointer to the packet to send and its size. First, it checks whether the packet fits in an Ethernet frame (1518 bytes). If not, it returns an E_PKT_TOO_LARGE error. Then it checks whether the descriptor is free. If not, it returns an E_TX_FULL error. Both errors are defined in inc/error.h. Then it clears the DD bit, copies the packet to the transmit packet buffer, sets its length and the RS and EOP command bits, and moves TDT to the next descriptor. Now the packet is ready to be sent by the NIC.
Although it is not mentioned in the lab instructions, the EOP bit of CMD must be set to tell the NIC that the descriptor contains a complete packet to send. If the EOP bit is not set, we won't get the expected result when running make E1000_DEBUG=TXERR,TX qemu-nox.
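The RS/DD bookkeeping can be exercised without a NIC. The following is a toy model, not driver code: all sim_* names and the 8-slot ring are hypothetical, and sim_hw_complete_one() plays the part of the hardware finishing one packet. The free-slot test is the same one e1000_transmit() uses.

```c
#include <stdint.h>
#include <string.h>

#define SIM_NTXDESC 8      /* hypothetical small ring for illustration */
#define SIM_CMD_EOP 0x01
#define SIM_CMD_RS  0x08
#define SIM_STAT_DD 0x01

struct sim_tx_desc {
    uint16_t length;
    uint8_t  cmd;
    uint8_t  status;
};

static struct sim_tx_desc sim_ring[SIM_NTXDESC];
static int sim_tdt;      /* tail: next slot software fills */
static int sim_tdh;      /* head: next slot the pretend NIC sends */
static int sim_pending;  /* packets queued but not yet sent */

void sim_tx_reset(void)
{
    memset(sim_ring, 0, sizeof(sim_ring));
    sim_tdt = sim_tdh = sim_pending = 0;
}

/* Returns 0 on success, -1 if the slot at the tail is still in flight. */
int sim_tx_enqueue(uint16_t len)
{
    struct sim_tx_desc *d = &sim_ring[sim_tdt];

    /* RS set but DD clear: queued earlier and not yet completed. */
    if ((d->cmd & SIM_CMD_RS) && !(d->status & SIM_STAT_DD))
        return -1;
    d->status &= (uint8_t)~SIM_STAT_DD;
    d->length = len;
    d->cmd |= SIM_CMD_RS | SIM_CMD_EOP;
    sim_tdt = (sim_tdt + 1) % SIM_NTXDESC;
    sim_pending++;
    return 0;
}

/* Pretend NIC: send the packet at the head and report it via DD. */
void sim_hw_complete_one(void)
{
    if (sim_pending == 0)
        return;
    sim_ring[sim_tdh].status |= SIM_STAT_DD;
    sim_tdh = (sim_tdh + 1) % SIM_NTXDESC;
    sim_pending--;
}

/* Enqueue with no completions until the ring refuses; returns the count. */
int sim_tx_fill(void)
{
    int sent = 0;
    sim_tx_reset();
    while (sim_tx_enqueue(64) == 0)
        sent++;
    return sent;
}
```

In this model the ring accepts exactly SIM_NTXDESC packets before reporting full, and one completion frees exactly one slot.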
Now we can send a packet from the kernel. For example, add the following code to the attach function e1000_attach() after e1000_tx_init().
char *str = "hello";
e1000_transmit(str, 6);
Then run make E1000_DEBUG=TXERR,TX qemu-nox, and we will see something like
e1000: index 0: 0x2f4e80 : 9000006 0
This means the code to transmit packets works fine.
Exercise 7
Adding a syscall is easy; we have done it many times.
First, add a syscall number in inc/syscall.h. Here I call it SYS_net_send.
Then add a syscall handler in kern/syscall.c. I call it sys_net_send(). Inside the handler, I use user_mem_assert() to check whether buf is a valid pointer into user space. If not, the kernel simply kills the user process.
// Transmit a packet from user space
//
// Return 0 on success, < 0 on error. Errors are:
// -E_PKT_TOO_LARGE if packet size exceeds tx buffer size (1536 bytes).
// -E_TX_FULL if tx queue is full.
static int
sys_net_send(const void* buf, size_t size)
{
// segfault when address of buf is invalid
user_mem_assert(curenv, buf, size, PTE_U);
return e1000_transmit(buf, size);
}
Then add a dispatcher case in kern/syscall.c:syscall().
case SYS_net_send:
return sys_net_send((void *)a1, a2);
Then add the user-space syscall wrapper in lib/syscall.c.
int
sys_net_send(const void* buf, size_t size)
{
return syscall(SYS_net_send, 0, (uint32_t)buf, size, 0, 0, 0);
}
Also declare it in inc/lib.h; otherwise, it cannot be used in user programs.
@@ -60,3 +60,4 @@ // syscall.c
int sys_ipc_try_send(envid_t to_env, uint32_t value, void *pg, int perm);
int sys_ipc_recv(void *rcv_pg);
unsigned int sys_time_msec(void);
+int sys_net_send(const void* buf, size_t size);
Exercise 8
The lab instructions describe in detail what we need to code in net/output.c. The output user program accepts a packet from another user process and sends it to the E1000 NIC with the syscall we implemented in the last exercise. fs/serv.c:serve() provides an example of accepting IPC messages, and we can write ours in a very similar way.
void
output(envid_t ns_envid)
{
binaryname = "ns_output";
uint32_t req, whom;
int r;
while (1) {
// read a packet from the network server
req = ipc_recv((int32_t *) &whom, &nsipcbuf, NULL);
// ignore non-NSREQ_OUTPUT IPC requests
if (req != NSREQ_OUTPUT) {
continue;
}
// send the packet to the device driver
// if tx queue is full, simply wait
while ((r = sys_net_send(nsipcbuf.pkt.jp_data, nsipcbuf.pkt.jp_len)) == -E_TX_FULL) {
sys_yield();
}
if (r < 0) {
// ignore oversized packets
if (r == -E_PKT_TOO_LARGE) {
cprintf("%s: packet too large (%d bytes), ignored\n", binaryname, nsipcbuf.pkt.jp_len);
continue;
} else {
panic("%s: sys_net_send(): unexpected return value %d", binaryname, r);
}
}
}
}
Here I implemented relatively elaborate error handling. If the NIC's tx queue is full, output gives up the CPU by calling sys_yield() and then retries. If the packet is too large (larger than a 1518-byte Ethernet frame), it drops the packet. For any other unexpected error, it simply panics.
Now it will pass the testoutput tests of make grade.
Question 1
This has already been answered in the exercises above. The output program keeps yielding the CPU while the transmit ring is full, until a slot becomes free. Whether the transmit ring is full is indicated by the return value of sys_net_send(). When there are no more free transmit descriptors, the driver, more exactly e1000_transmit(), returns an E_TX_FULL error, and the syscall passes it on to output.
Exercise 10
Receiving packets is very similar to sending packets, so from now on there will be less detail.
To initialize reception, as the lab instructions and section 14.4 of the 8254x Developer's Manual say, we need the following steps. Since our driver doesn't support long packets or multicast, and it is a poll-mode driver that doesn't use interrupts, the related parts are skipped.
- Program the Receive Address Registers (RAL/RAH) with our desired MAC address and set the Address Valid (RAH.AV) bit to 1. RAL contains the lower 32 bits of the MAC address, while RAH contains the upper 16 bits. Here the MAC address is 52:54:00:12:34:56.
- Allocate memory for receive packet buffers.
- Allocate memory for the E1000's receive descriptor list, and set up the buffer addresses. Each address must also be a physical address.
- Set the Receive Descriptor Base Address (RDBAL/RDBAH) registers. RDBAL should be set to the physical address of the receive descriptor list, and RDBAH should be set to 0.
- Set the Receive Descriptor Length (RDLEN) register to the size of the receive descriptor list in bytes. It must also be 128-byte aligned. The lab instructions say the number of descriptors should be >= 128, so here I will use 128.
- Set the Receive Descriptor Head (RDH) register to the index of the first valid receive descriptor, which is 0.
- Set the Receive Descriptor Tail (RDT) register to the index of the descriptor beyond the last valid descriptor in the descriptor ring, which is NRXDESC - 1. This will be explained later.
- Program the Receive Control (RCTL) register with the following steps.
  - Set the Enable (RCTL.EN) bit to 1.
  - Set the Long Packet Enable (RCTL.LPE) bit to 0.
  - Set Loopback Mode (RCTL.LBM) to 00b.
  - Set the Receive Descriptor Minimum Threshold Size (RCTL.RDMTS) to its default value by clearing it.
  - Set the Receive Buffer Size (RCTL.BSIZE) bits and the Buffer Extension Size (RCTL.BSEX) bit to their default values by clearing them, which means the buffer is 2KB.
  - Set the Strip Ethernet CRC (RCTL.SECRC) bit to 1.
The manual doesn't make it clear how RDT should be initialized. I cannot tell what "the descriptor beyond the last valid descriptor" is. I tried NRXDESC and 0 (that is how the Linux e1000 driver and the FreeBSD em driver initialize it), and neither works. DPDK (a well-known framework with poll-mode NIC drivers for Linux) initializes RDT to the index of the last receive descriptor. I tried this, and it just works. The reason is probably what the comment in the DPDK e1000 driver says: the E1000 hardware believes the descriptor ring is full when RDT equals RDH. The OSDev.org forum also has a topic about this issue.
The lab instructions don't provide the struct definition of the receive descriptor, so we must write our own. The receive descriptor layout is given in section 3.2 of the manual.
 63           48 47   40 39    32 31           16 15           0
+---------------------------------------------------------------+
|                        Buffer address                         |
+---------------+-------+--------+---------------+--------------+
|    Special    | Error | Status |   Checksum    |    Length    |
+---------------+-------+--------+---------------+--------------+
The corresponding C struct can be:
struct e1000_rx_desc {
uint64_t addr;
uint16_t length;
uint16_t chksum;
uint8_t status;
uint8_t err;
uint16_t special;
} __attribute__((packed));
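Since the hardware reads this struct directly from memory, it is worth checking that the compiler lays it out exactly as the diagram says. A small sketch, assuming a C11 compiler that supports GCC-style attributes (GCC or Clang). The struct name rx_desc_sketch is made up so it doesn't clash with the real one.

```c
#include <stddef.h>
#include <stdint.h>

/* Same field order as the receive descriptor above. */
struct rx_desc_sketch {
    uint64_t addr;     /* bits 63..0 of the first quadword */
    uint16_t length;   /* bits 15..0 of the second quadword */
    uint16_t chksum;   /* bits 31..16 */
    uint8_t  status;   /* bits 39..32 */
    uint8_t  err;      /* bits 47..40 */
    uint16_t special;  /* bits 63..48 */
} __attribute__((packed));

/* Compile-time checks: the build fails if the layout ever drifts. */
_Static_assert(sizeof(struct rx_desc_sketch) == 16,
               "receive descriptor must be 16 bytes");
_Static_assert(offsetof(struct rx_desc_sketch, length) == 8,
               "length must start the second quadword");
_Static_assert(offsetof(struct rx_desc_sketch, status) == 12,
               "status must sit at byte 12");
_Static_assert(offsetof(struct rx_desc_sketch, special) == 14,
               "special must sit at byte 14");
```

The same kind of check could be applied to the transmit descriptor from the lab page.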
Then copy the definitions we need from the Linux e1000 kernel module and paste them into e1000.h.
#define E1000_RDBAL 0x02800 /* RX Descriptor Base Address Low - RW */
#define E1000_RDBAH 0x02804 /* RX Descriptor Base Address High - RW */
#define E1000_RDLEN 0x02808 /* RX Descriptor Length - RW */
#define E1000_RDH 0x02810 /* RX Descriptor Head - RW */
#define E1000_RDT 0x02818 /* RX Descriptor Tail - RW */
/* Receive Control */
#define E1000_RCTL_EN 0x00000002 /* enable */
#define E1000_RCTL_LBM 0x000000c0 /* loopback mode */
#define E1000_RCTL_RDMTS 0x00000300 /* rx desc min threshold size */
#define E1000_RCTL_SZ 0x00030000 /* rx buffer size */
#define E1000_RCTL_SECRC 0x04000000 /* strip ethernet CRC */
#define E1000_RCTL_BSEX 0x02000000 /* Buffer size extension */
#define E1000_RCTL_LBM_NO 0x00000000 /* no loopback mode */
#define E1000_RCTL_LBM_SHIFT 6
#define E1000_RCTL_RDMTS_HALF 0x00000000
#define E1000_RCTL_RDMTS_SHIFT 8
#define E1000_RCTL_SZ_2048 0x00000000 /* rx buffer size 2048 */
#define E1000_RCTL_SZ_SHIFT 16
Then allocate memory for the receive packet buffers and the receive descriptor list. Here each receive packet buffer is 2048 bytes.
#define RX_BUF_SIZE 2048
#define NRXDESC 128
static struct e1000_rx_desc e1000_rx_queue[NRXDESC] __attribute__((aligned(16)));
static uint8_t e1000_rx_buf[NRXDESC][RX_BUF_SIZE];
Then write the initialization function e1000_rx_init().
#define JOS_DEFAULT_MAC_LOW 0x12005452
#define JOS_DEFAULT_MAC_HIGH 0x00005634
static void
e1000_rx_init()
{
// initialize rx queue
int i;
memset(e1000_rx_queue, 0, sizeof(e1000_rx_queue));
for (i = 0; i < NRXDESC; i++) {
e1000_rx_queue[i].addr = PADDR(e1000_rx_buf[i]);
}
// initialize receive address registers
// by default, it comes from EEPROM
E1000_REG(E1000_RAL) = JOS_DEFAULT_MAC_LOW;
E1000_REG(E1000_RAH) = JOS_DEFAULT_MAC_HIGH;
E1000_REG(E1000_RAH) |= E1000_RAH_AV;
// initialize receive descriptor registers
E1000_REG(E1000_RDBAL) = PADDR(e1000_rx_queue);
E1000_REG(E1000_RDBAH) = 0;
E1000_REG(E1000_RDLEN) = sizeof(e1000_rx_queue);
E1000_REG(E1000_RDH) = 0;
E1000_REG(E1000_RDT) = NRXDESC - 1;
// initialize receive control registers
E1000_REG(E1000_RCTL) &= ~(E1000_RCTL_LBM | E1000_RCTL_RDMTS | E1000_RCTL_SZ | E1000_RCTL_BSEX);
E1000_REG(E1000_RCTL) |= E1000_RCTL_EN | E1000_RCTL_SECRC;
}
Then add a call to e1000_rx_init() in e1000_attach() and run make E1000_DEBUG=TX,TXERR,RX,RXERR,RXFILTER run-net_testinput-nox. It will print an e1000: unicast match[0]: 52:54:00:12:34:56 message.
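The two magic constants JOS_DEFAULT_MAC_LOW and JOS_DEFAULT_MAC_HIGH are just the MAC bytes packed in little-endian order. Here is a sketch of that packing. The helper names mac_to_ral() and mac_to_rah() are hypothetical, and RAH_AV assumes the Address Valid bit is bit 31 of RAH, as in the Linux driver's E1000_RAH_AV.

```c
#include <stdint.h>

#define RAH_AV 0x80000000u  /* Address Valid, bit 31 of RAH */

/* The default QEMU MAC address used in this lab. */
const uint8_t jos_default_mac[6] = {0x52, 0x54, 0x00, 0x12, 0x34, 0x56};

/* RAL holds the first four bytes of the MAC, lowest byte first. */
uint32_t mac_to_ral(const uint8_t mac[6])
{
    return (uint32_t)mac[0] | ((uint32_t)mac[1] << 8) |
           ((uint32_t)mac[2] << 16) | ((uint32_t)mac[3] << 24);
}

/* RAH holds the last two bytes in its low half, plus the valid bit. */
uint32_t mac_to_rah(const uint8_t mac[6])
{
    return (uint32_t)mac[4] | ((uint32_t)mac[5] << 8) | RAH_AV;
}
```

For 52:54:00:12:34:56 this gives 0x12005452 for RAL and 0x80005634 for RAH, which matches the constants above once the AV bit is set.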
Exercise 11
As the picture above shows, the E1000's receive descriptor ring has the same structure as the transmit descriptor ring, but using it can be a bit confusing. The following figure explains this.
Here an empty buffer is also considered a processed packet. RDH is the index of the last unprocessed packet, that is, the last received packet, and RDT is the index of the last processed packet. When RDH catches up with RDT, the list is filled with unprocessed packets, and the NIC drops all incoming frames until some packet in the buffer has been processed. Conversely, when RDT catches up with RDH, all packets have been processed, and the driver should return an error to say that the list is empty.
Whether a packet is processed or not is indicated by its descriptor's DD bit. When the E1000 receives a packet, it sets the DD bit, so after reading the packet we need to clear the DD bit and move the tail pointer forward to tell the NIC that this buffer can be reused for an incoming packet. The first packet to process is the one right after the descriptor the tail points to, so the driver finds the next received packet at (RDT + 1) % NRXDESC.
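The head/tail rules are easier to follow in a toy model. This sketch is a simplification under my own assumptions, not a description of the real hardware: rx_head is the slot the pretend NIC writes next, the NIC refuses to write when the head equals the tail, and all rx_sim_* names and the 4-slot ring are made up.

```c
#include <stdint.h>
#include <string.h>

#define RX_RING 4       /* hypothetical tiny ring */
#define RX_DD   0x01

static uint8_t rx_status[RX_RING];
static int rx_head;     /* slot the pretend NIC writes next (RDH) */
static int rx_tail;     /* last slot handed back by software (RDT) */

void rx_sim_init(void)
{
    memset(rx_status, 0, sizeof(rx_status));
    rx_head = 0;
    rx_tail = RX_RING - 1;   /* same choice as RDT = NRXDESC - 1 */
}

/* Pretend NIC: returns 0 if the packet was stored, -1 if it was dropped. */
int rx_sim_hw_deliver(void)
{
    if (rx_head == rx_tail)
        return -1;           /* no buffer available */
    rx_status[rx_head] |= RX_DD;
    rx_head = (rx_head + 1) % RX_RING;
    return 0;
}

/* Driver side: returns the consumed slot index, or -1 if nothing arrived. */
int rx_sim_receive(void)
{
    int next = (rx_tail + 1) % RX_RING;

    if (!(rx_status[next] & RX_DD))
        return -1;
    rx_status[next] &= (uint8_t)~RX_DD;
    rx_tail = next;          /* hand the slot back to the NIC */
    return next;
}

/* Deliver packets with no reads until one is dropped; returns the count. */
int rx_sim_capacity(void)
{
    int stored = 0;
    rx_sim_init();
    while (rx_sim_hw_deliver() == 0)
        stored++;
    return stored;
}
```

In this model a ring of N descriptors buffers at most N - 1 unread packets, and each successful read frees one slot for the NIC.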
Now we can implement the receive function. Here I call it e1000_receive().
int
e1000_receive(void *buf, size_t size)
{
int tail = E1000_REG(E1000_RDT);
int next = (tail + 1) % NRXDESC;
int length;
if (!(e1000_rx_queue[next].status & E1000_RXD_STAT_DD)) {
return -E_RX_EMPTY;
}
if ((length = e1000_rx_queue[next].length) > size) {
return -E_PKT_TOO_LARGE;
}
memcpy(buf, e1000_rx_buf[next], length);
e1000_rx_queue[next].status &= ~E1000_RXD_STAT_DD;
E1000_REG(E1000_RDT) = next;
return length;
}
This function first checks whether there is any unprocessed packet. If not, it returns an E_RX_EMPTY error. Then it checks whether the buffer buf is large enough to hold the packet. If not, it returns an E_PKT_TOO_LARGE error. Then it copies the packet to the buffer, clears the DD bit, and moves the tail pointer forward.
The syscall is implemented in the same way as sys_net_send(). Remember that to add a syscall, we need to add code in 4 different places.
// Receive a packet from network in user space
//
// Return the packet length on success, < 0 on error. Errors are:
// -E_PKT_TOO_LARGE if packet size exceeds buffer size.
// -E_RX_EMPTY if no packet is received.
static int
sys_net_recv(void* buf, size_t size)
{
// segfault when buf is not a valid, writable user address
user_mem_assert(curenv, buf, size, PTE_U | PTE_W);
return e1000_receive(buf, size);
}
Exercise 12
This program fetches a packet from the E1000 NIC and sends it to the network server with an NSREQ_INPUT IPC message. It can be implemented in a way similar to net/output.c.
#define INPUT_BUFSIZE 2048
void
input(envid_t ns_envid)
{
binaryname = "ns_input";
uint8_t inputbuf[INPUT_BUFSIZE];
int r, i;
while (1) {
// clear the buffer
memset(inputbuf, 0, sizeof(inputbuf));
// read a packet from the device driver
while ((r = sys_net_recv(inputbuf, sizeof(inputbuf))) == -E_RX_EMPTY) {
sys_yield();
}
// panic if inputbuf is too small
if (r < 0) {
panic("%s: inputbuf too small", binaryname);
}
// send it to the network server
nsipcbuf.pkt.jp_len = r;
memcpy(nsipcbuf.pkt.jp_data, inputbuf, r);
ipc_send(ns_envid, NSREQ_INPUT, &nsipcbuf, PTE_P | PTE_U);
// Hint: When you IPC a page to the network server, it will be
// reading from it for a while, so don't immediately receive
// another packet in to the same physical page.
sys_yield();
}
}
To avoid a packet being overwritten before the network server reads it, this program gives up the CPU after sending the IPC message. This works fine since JOS only uses one CPU in this lab by default, so input won't run again immediately after it calls sys_yield(). In a multi-CPU environment, it should call sys_yield() at least [number of CPUs] times to ensure there is no race.
Now it will pass the testinput tests of make grade.
Question 2
It works the same way as transmitting packets. When the receive queue is empty, the driver returns an E_RX_EMPTY error.
Exercise 13
In send_file(), we need to open the requested file with open(), check whether it exists and whether it is a directory, and call different functions to send the response. The requested file path is req->url, and we can get the file type and size by calling fstat(). These APIs are declared in inc/lib.h.
static int
send_file(struct http_request *req)
{
int r;
struct Stat stat;
int fd;
// open the requested url for reading
// if the file does not exist, send a 404 error using send_error
// if the file is a directory, send a 404 error using send_error
// set file_size to the size of the file
fd = open(req->url, O_RDONLY);
if (fd < 0) {
return send_error(req, 404);
}
if ((r = fstat(fd, &stat)) < 0)
goto end;
if (stat.st_isdir) {
close(fd);
return send_error(req, 404);
}
if ((r = send_header(req, 200)) < 0)
goto end;
if ((r = send_size(req, stat.st_size)) < 0)
goto end;
if ((r = send_content_type(req)) < 0)
goto end;
if ((r = send_header_fin(req)) < 0)
goto end;
r = send_data(req, fd);
end:
close(fd);
return r;
}
send_data() reads the file into a buffer and writes it to the socket fd until it reaches the end of the file, in a Unix-like way.
static int
send_data(struct http_request *req, int fd)
{
int r;
char buf[BUFFSIZE];
while ((r = read(fd, buf, BUFFSIZE)) > 0) {
if (write(req->sock, buf, r) != r) {
die("Failed to send bytes to client");
}
}
return 0;
}
Now run make grade, and we should pass all the tests and get all 105 points, like this:
testtime: OK (8.0s)
pci attach: OK (1.3s)
testoutput [5 packets]: OK (1.6s)
testoutput [100 packets]: OK (1.7s)
Part A score: 35/35
testinput [5 packets]: OK (2.5s)
testinput [100 packets]: OK (1.6s)
tcp echo server [echosrv]: OK (2.3s)
web server [httpd]:
http://localhost:26002/: OK (1.5s)
http://localhost:26002/index.html: OK (2.6s)
http://localhost:26002/random_file.txt: OK (1.4s)
Part B score: 70/70
Score: 105/105
Question 3
We can visit the web page with curl http://localhost:<JOS HTTP Port>/index.html.
<html>
<head>
<title>jhttpd on JOS</title>
</head>
<body>
<center>
<h2>This file came from JOS.</h2>
<marquee>Cheesy web page!</marquee>
</center>
</body>
</html>
Challenge! Read MAC address from EEPROM
This is the easiest challenge in this lab, I think. Section 5.3 of the 8254x Developer's Manual explains how to read the EEPROM, and section 5.6 describes its contents.
The E1000 NIC keeps its hardware MAC address in EEPROM words 00h~02h, from the lowest byte to the highest. Each EEPROM word is 16 bits long. The easiest way to read from the EEPROM is to use the EEPROM Read (EERD) register.
31 16 15 2 1 0
+-------------------------------+---------------+------+-------+
| Data | Address | DONE | START |
+-------------------------------+---------------+------+-------+
Reading EEPROM with the EERD register needs the following steps.
- Write the address to read in the Read Address field (EERD.ADDR).
- Set the Start Read bit (EERD.START) to 1.
- Wait until the NIC sets the Read Done bit (EERD.DONE) to 1.
- Read data in the Read Data field (EERD.DATA).
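These four steps boil down to building one command word and picking fields out of the value read back. A sketch of just that bit manipulation, with hypothetical helper names and shortened macro names. The bit positions are the ones from this post's EERD definitions (START at bit 0, DONE at bit 4, address from bit 8, data from bit 16), which is what works on QEMU's emulated 82540EM.

```c
#include <stdint.h>

/* Bit positions matching the E1000_EERD_* definitions in this post. */
#define EERD_START      0x00000001u
#define EERD_DONE       0x00000010u
#define EERD_ADDR_SHIFT 8
#define EERD_DATA_SHIFT 16

/* Steps 1 and 2: the value to write to EERD to start reading one word. */
uint32_t eerd_command(uint8_t word_addr)
{
    return ((uint32_t)word_addr << EERD_ADDR_SHIFT) | EERD_START;
}

/* Step 3: has the NIC finished the read? */
int eerd_is_done(uint32_t eerd)
{
    return (eerd & EERD_DONE) != 0;
}

/* Step 4: extract the 16-bit EEPROM word from the register value. */
uint16_t eerd_data(uint32_t eerd)
{
    return (uint16_t)(eerd >> EERD_DATA_SHIFT);
}
```

For example, reading word 2 means writing 0x00000201, then polling until eerd_is_done() is true for the value read back.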
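Before writing the EEPROM reader, some definitions are needed. Note that they follow the 82540EM layout, with DONE at bit 4 and the address field starting at bit 8, rather than the bit positions drawn above.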
#define E1000_EERD 0x00014 /* EEPROM Read - RW */
/* EEPROM Read */
#define E1000_EERD_START 0x00000001 /* Start Read */
#define E1000_EERD_DONE 0x00000010 /* Read Done */
#define E1000_EERD_ADDR_SHIFT 8
#define E1000_EERD_ADDR_MASK 0x0000FF00 /* Read Address */
#define E1000_EERD_DATA_SHIFT 16
#define E1000_EERD_DATA_MASK 0xFFFF0000 /* Read Data */
Now we can write the reader function e1000_eeprom_read().
static uint16_t
e1000_eeprom_read(uint8_t addr)
{
E1000_REG(E1000_EERD) &= ~(E1000_EERD_ADDR_MASK | E1000_EERD_DATA_MASK);
E1000_REG(E1000_EERD) |= (addr << E1000_EERD_ADDR_SHIFT) | E1000_EERD_START;
while (!(E1000_REG(E1000_EERD) & E1000_EERD_DONE)) // wait until read is done
;
return (E1000_REG(E1000_EERD) & E1000_EERD_DATA_MASK) >> E1000_EERD_DATA_SHIFT;
}
To get the hardware MAC address, we need to read the EEPROM 3 times. I wrote a function, e1000_get_hwaddr(), to read the hardware address into a uint64_t.
uint64_t
e1000_get_hwaddr()
{
// mac address is returned in little-endian order
// for example, 12:34:56:78:90:ab will return 0x0000ab90 78563412
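// widen each word to 64 bits before shifting, so a word with its top
// bit set is not sign-extended into the upper half
return (uint64_t)e1000_eeprom_read(E1000_EEPROM_ETHADDR_WORD0) |
((uint64_t)e1000_eeprom_read(E1000_EEPROM_ETHADDR_WORD1) << 16) |
((uint64_t)e1000_eeprom_read(E1000_EEPROM_ETHADDR_WORD2) << 32);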
}
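The word assembly can be checked on its own, away from the hardware. In this sketch, words_to_hwaddr() and hwaddr_byte() are hypothetical helpers. Each 16-bit word is widened to 64 bits before it is shifted, because shifting a promoted int left by 16 can turn negative and smear ones over the upper bytes once it is converted to 64 bits.

```c
#include <stdint.h>

/* Pack three little-endian EEPROM words into one 64-bit value. */
uint64_t words_to_hwaddr(uint16_t w0, uint16_t w1, uint16_t w2)
{
    return (uint64_t)w0 | ((uint64_t)w1 << 16) | ((uint64_t)w2 << 32);
}

/* Byte i of the MAC address (0 is the first byte on the wire). Shifting
 * instead of casting a pointer keeps this independent of host endianness. */
uint8_t hwaddr_byte(uint64_t hwaddr, int i)
{
    return (uint8_t)(hwaddr >> (8 * i));
}
```

For 12:34:56:78:90:ab the EEPROM words are 0x3412, 0x7856 and 0xab90, and the packed value is 0x0000ab9078563412, as in the comment of e1000_get_hwaddr().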
Since we now use the hardware MAC address, these lines in e1000_rx_init() should be removed.
@@ -76,5 +76,3 @@ e1000_rx_init()
// initialize receive address registers
// by default, it comes from EEPROM
- E1000_REG(E1000_RAL) = JOS_DEFAULT_MAC_LOW;
- E1000_REG(E1000_RAH) = JOS_DEFAULT_MAC_HIGH;
E1000_REG(E1000_RAH) |= E1000_RAH_AV;
To test if it works, I print the hardware MAC address in e1000_attach().
int
e1000_attach(struct pci_func *pcif)
{
pci_func_enable(pcif);
e1000_base = mmio_map_region(pcif->reg_base[0], pcif->reg_size[0]);
uint64_t hwaddr = e1000_get_hwaddr();
uint8_t *mac = (uint8_t *)&hwaddr;
cprintf("e1000: hwaddr %02x:%02x:%02x:%02x:%02x:%02x\n", mac[0], mac[1], mac[2], mac[3], mac[4], mac[5]);
cprintf("e1000: status 0x%08x\n", E1000_REG(E1000_STATUS));
e1000_tx_init();
e1000_rx_init();
return 0;
}
QEMU's E1000 MAC address can be changed in GNUmakefile by adding a macaddr option to QEMUOPTS. Here I take 00:11:45:14:19:19 as an example.
@@ -160,6 +160,6 @@ IMAGES = $(OBJDIR)/kern/kernel.img
QEMUOPTS += -smp $(CPUS)
QEMUOPTS += -drive file=$(OBJDIR)/fs/fs.img,index=1,media=disk,format=raw
IMAGES += $(OBJDIR)/fs/fs.img
-QEMUOPTS += -net user -net nic,model=e1000 -redir tcp:$(PORT7)::7 \
+QEMUOPTS += -net user -net nic,model=e1000,macaddr=00:11:45:14:19:19 -redir tcp:$(PORT7)::7 \
-redir tcp:$(PORT80)::80 -redir udp:$(PORT7)::7 -net dump,file=qemu.pcap
QEMUOPTS += $(QEMUEXTRA)
Now run make qemu-nox, and it will print e1000: hwaddr 00:11:45:14:19:19, meaning it works.
The next step is to add a syscall. I will call it sys_net_hwaddr(). Since JOS is a 32-bit OS and its syscalls use the 32-bit register eax for the return value, we cannot directly return a uint64_t from a syscall. As a workaround, I return the hardware MAC address through a pointer parameter.
// Return the network adapter hardware address.
//
// Returns 0 on success, < 0 on error.
// Errors are:
// -E_INVAL if buf is too small to hold a MAC address (size < 6)
#define HWADDR_SIZE 6
static int
sys_net_hwaddr(void* buf, size_t size)
{
if (size < HWADDR_SIZE) {
return -E_INVAL;
}
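// segfault when buf is not a valid, writable user address
user_mem_assert(curenv, buf, HWADDR_SIZE, PTE_U | PTE_W);
uint64_t hwaddr = e1000_get_hwaddr();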
memcpy(buf, (uint8_t *)&hwaddr, HWADDR_SIZE);
return 0;
}
There are two places that use a hard-coded MAC address, and modifying them is not difficult. One is net/lwip/jos/jif/jif.c:low_level_init().
@@ -59,11 +59,8 @@ low_level_init(struct netif *netif)
netif->hwaddr_len = 6;
netif->mtu = 1500;
netif->flags = NETIF_FLAG_BROADCAST;
- // MAC address is hardcoded to eliminate a system call
- netif->hwaddr[0] = 0x52;
- netif->hwaddr[1] = 0x54;
- netif->hwaddr[2] = 0x00;
- netif->hwaddr[3] = 0x12;
- netif->hwaddr[4] = 0x34;
- netif->hwaddr[5] = 0x56;
+ // Challenge! Read MAC address from nic via syscall
+ if (sys_net_hwaddr(netif->hwaddr, netif->hwaddr_len) < 0) {
+ panic("jif: failed to read mac address from nic");
+ }
The other is net/testinput.c:announce().
@@ -19,7 +19,11 @@ announce(void)
- uint8_t mac[6] = {0x52, 0x54, 0x00, 0x12, 0x34, 0x56};
+ uint8_t mac[6];
uint32_t myip = inet_addr(IP);
uint32_t gwip = inet_addr(DEFAULT);
int r;
+ // Challenge! Read MAC address from nic via syscall
+ if ((r = sys_net_hwaddr(mac, sizeof(mac))) < 0)
+ panic("sys_net_hwaddr: %e", r);
+
if ((r = sys_page_alloc(0, pkt, PTE_P|PTE_U|PTE_W)) < 0)
panic("sys_page_map: %e", r);
Now run make run-httpd-nox, and the HTTP server will run just fine. make grade still passes all tests.