Syllabus for

Harvard Extension School CSCI E-95

Compiler Design and Implementation (26179)

Spring 2022
Site last revised 1:35 PM 28-Apr-2022 ET

Dr. James L. Frankel

 

After the upcoming Spring 2022 semester, the next time CSCI E-95 will be offered is in the Fall 2023 semester.

Final Project Presentation Slots on Tuesday, May 10, 2022:
Time Slot Available
6:30 PM ET - 6:50 PM ET Slot 1 Filled by Luis
6:50 PM ET - 7:10 PM ET Slot 2 Filled by Nathan
7:10 PM ET - 7:30 PM ET Slot 3 Filled by Pierre-David
7:30 PM ET - 7:50 PM ET Slot 4 Filled by Eric
7:50 PM ET - 8:10 PM ET Slot 5 Filled by Rix
8:10 PM ET - 8:30 PM ET Slot 6 Filled by Srijesh
8:30 PM ET - 8:50 PM ET Slot 7 Filled by Julian
8:50 PM ET - 9:10 PM ET Slot 8 Filled by Adam
9:10 PM ET - 9:30 PM ET Slot 9 Filled by Justin

We will be holding our first section meeting on Tuesday, January 25, 2022 at 6:30 PM ET. Note that this section meeting is *before* our first class meeting. Both section and class meetings are live streamed and are also available after class for later viewing and reviewing.

Many links in the course website are not yet available. They will be activated as the course progresses.

 

Quick Links:

Class Hours and Location:

Tuesdays 7:40-10:15 PM ET via web conference or on demand online. Students may attend at the scheduled meeting time or watch recorded sessions on demand. The recorded sessions are available within 24 hours of the lecture.

Distance Learning Links including Video Streaming, Chat, and the Midterm Exam:

During Class:

Video Streaming:

Both section & class are are live streamed and also recorded. Students are encouraged to share their video feed and to ask questions verbally using their audio/video link in Zoom. The section & class live video stream is available through the class Canvas web site under Zoom.

Chat:

Questions can also be asked using the text Chat facility in Zoom during class meetings & during section meetings. Please use the chat feature available in Zoom rather than Canvas chat. We will *not* be monitoring Canvas chat.

After Class:

Videos of class and section are available on the course's Canvas web site under Course Videos.

Midterm Exam

Our midterm exam is a three hour long proctored exam that will be available on-line through Canvas with integrity protected by Proctorio. Students must start the exam within the 24-hour period that begins at the designated start time of our midterm exam and ends 24 hours later. Students will have three hours to complete the exam independent of when they start. More information will be available about the exam later in the semester.

The exam does not allow access to any books or notes, but relevant course slides will be made available. No electronic devices are allowed.

Prerequisites:

Knowledge of data structures and programming experience, such as is taught in CSCI E-22 (formerly CSCI E-119) (Data Structures), is required. An advanced algorithms course, such as CSCI E-124 (Data Structures and Algorithms) or equivalents, is preferred, but not required. Students must have sufficient experience to write large programming projects in the C Programming Language that utilize a wide variety of data structures. This course does *not* teach programming.

Brief abstract:

This course is a study of the theory and practice required for the design and implementation of interpreters and compilers for programming languages. Coursework will range from the abstract, such as categorization of grammars and languages, to the concrete, such as specific algorithms used in compilers and practical performance issues. Topics include lexical analysis, parsing, symbol table generation, type checking, error detection, code generation, optimization, and run-time support. Techniques for top-down and bottom-up parsing both with and without the use of automated tools will be studied. Local and global optimization will be covered. An extensive lab project will be required of all students.

4 credits. Graduate credit.

Overview:

Computer Science E-95 is a comprehensive introduction to the theory and practice of compiler design and implementation. Students are expected to be already comfortable with designing, coding, and debugging large programs of modest complexity while employing good programming style and structured techniques. In particular, familiarity with terminal and text file I/O, iterative and conditional control structures, parameter passing and recursion, data structures, classes and object-oriented design in Java or C++ is presumed.

A majority of the class will be focused on the design and implementation of the term project. The project will be developed by students working alone. That project is the creation of a compiler for a significant subset of the C Programming Language (ISO C89) that produces code for the MIPS instruction set. The project will include the lexer, parser, symbol table manager, simple optimizer, and code generator. The programming assignments and the final compiler project will be written in the C Programming Language (ISO C89). Initially, both the classroom lectures and the section meetings will be covering material important to the design and implementation of the final project. Later in the semester, advanced compiler techniques will be covered in class; however, both the class and sections will continue to support students as term projects progress. For the term project, students will continue working on and debugging their projects leading to their complete implementation and a final demonstration.

Because the course includes a required and significant term project involving a large amount of programming, the assignments will be time-consuming; therefore, a significant time commitment to the course is necessary. Although the relevant experience of students in the class is usually quite diverse, depending on background, it is not unusual for students to spend 15-20 hours per week or more completing the readings and homework assignments. Although the computers are available more-or-less around the clock, occasionally they will suddenly become unavailable (this is known as a crash). As with all such events, they always seem to occur at the worst possible time. Plan your computer work so that it is complete in advance of the deadlines. Check in your code to the required class git repository frequently. You have now been forewarned!

Books/Course Bibliography:

All course books are available from the Harvard Coop and are available for on-line ordering. A direct link to the books at the Coop is available for on-line purchasing. Keep in mind that Coop members receive a 10% discount. There are links available on Canvas to find the Library Reserves and, for some books, these include View online versions. In addition, all registered students will be eligible for library services (access, borrowing privileges, group study rooms) in FAS libraries, just like any other student in FAS. The Harvard Library website includes information about how to access library materials remotely as well as the reopening schedule. All registered students will continue to have access to Harvard Library on-line resources.

Textbook:

Compilers: Principles, Techniques, and Tools, 2/e; Alfred V. Aho, Monica S. Lam, Ravi Sethi, Jeffrey D. Ullman; Addison-Wesley, 2007; ISBN-10 0-321-48681-1; ISBN-13 978-0-321-48681-3; Errata for the Second Edition

C Language Reference Manual:

C: A Reference Manual, Fifth Edition; Samuel P. Harbison and Guy L. Steele, Jr.; Prentice Hall, 2002; ISBN-10 0-13-089592-X; ISBN-13 978-0-13-089592-9; Errata for the Third Edition from Sam Harbison; Our additional errata

Optional MIPS Architecture Book:

MIPS RISC Architecture, 2/e; Gerry Kane and Joseph Heinrich; Prentice Hall, 1992; ISBN-10 0-13-590472-2; ISBN-13 978-0-13-590472-5

Optional git Books:

Dangit, Git!: Recipes for Gitting out of a Git Mess; Katie Sylor-Miller and Julia Evans; This short on-line book describes git fundamentals.

Pro Git, 2nd Edition; Scott Chacon and Ben Straub; Apress, 2014; ISBN-13 978-1484200773; ISBN-10 9781484200773. This is a comprehensive book about git. It is also available for free download at the web site above.

There will also be other handouts & supplementary readings

Instructor:
Dr. James L. Frankel Dr. Frankel's Photo,

Teaching Assistants:

We have two Teaching Assistants (TAs) for this course. The TAs hold a weekly section meeting and office hours as described below. Attendance at the TAs' section meeting is strongly recommended. Course material will be covered in section -- either in toto or in more detail -- that time does not permit to be covered in class meetings. For example, this material includes use of git and GitHub; general approaches to solving the problem sets; overviews of algorithms, code snippets, and data structures. Also, the section meetings provide a venue in which it may be easier to ask more lengthy questions.

TA Section Meeting Time/Place Office Hours Time/Place Phone
Alex Shopov
Alex's Photo,
Section Site
Tuesday,
6:30-7:30 PM ET,
Via web conference
Wednesday,
6:30-7:30 PM ET,
Via web conference
E-mail: Alex's e-mail address. Questions whose answers would be relevant to the whole class should be posed via Ed Discussion. When e-mail is appropriate (for grading questions, personal issues, etc.), e-mail should be sent to both TAs and also to the professor.
Max Snyder
Max's Photo,
Section Site
Tuesday,
6:30-7:30 PM ET,
Via web conference
Monday,
7:30-8:30 PM ET,
Via web conference
E-mail: Max's e-mail address. Questions whose answers would be relevant to the whole class should be posed via Ed Discussion. When e-mail is appropriate (for grading questions, personal issues, etc.), e-mail should be sent to both TAs and also to the professor.

Questions and Issues:

When posing questions or bringing up issues of a non-personal nature, please use the class Ed Discussion Forum. Answers to questions posed on Ed Discussion benefit the whole class and allow the course staff to answer questions once for all students. Questions that include code or other information that shouldn't be shared with other students should be sent via e-mail to all course staff at the same time in order to increase the probability of a rapid response.

Ed Discussion Wiki/Forum:

An Ed Discussion Wiki/Forum (on-line discussion list) for CSCI E-95 is set up at Harvard Extension School CSCI E-95 Ed Discussion Forum.

Record a Say Hello! Video:

Using any tool of your choosing (perhaps a cell phone selfie or the camera on your laptop), please record and post a short video (maybe just one to three minutes in length) in Canvas Discussions as a reply to my Say Hello! discussion to introduce yourself to the class. With everyone being remote, anything we can do to create a community for our class would be great. Please tell us a little about yourself possibly including where you are located, your background, what you do when you're not taking classes, and your goals for this class.

Enter Your Location:

Please enter your location in Canvas.

Using git and GitHub:

When using "git" and GitHub, make sure to follow the information on using "git" and setting up your GitHub repository that is available on the section web site.

Grading:

Graduate credit students:

Problem Sets:

All problem sets and programming assignments are due at midnight Eastern Time on Sunday night (i.e., midnight between Sunday and Monday) unless otherwise stated in the assignment or in the syllabus. Unless otherwise stated, all programming assignment solutions must be written in C (ISO C89) because the C Programming Language is nearly the same language that your compiler will accept as input. With special permission, a student may be allowed to write their compiler in C++. All code must build, be tested, and run on cscie95.dce.harvard.edu; be submitted using "git" on GitHub (or, in dire circumstances, via e-mail only if agreed to by the course staff); be well-written (clear coding style, modular structure, appropriately commented and documented in English); and tested (include any programs and/or shell scripts used in testing your solution as part of your submission). Remember, in addition to handing in all parts of the problem set solution or programming assignment program, sample runs of the program which demonstrate that the program works must be attached. In addition, each submission must include a makefile to build the assignment. The grade for programming assignments will include all of these attributes.

Of course, the solutions may be written and tested using any system of the student's choosing; however, when the solution is complete, it must be tested on the cscie95.dce.harvard.edu computer and pushed to the git code repository on GitHub. You may choose to develop under your own Unix/Linux system or under Cygwin under Windows, but testing and grading of your programming assignments will take place on the cscie95.dce.harvard.edu computer. To reiterate, we will be grading the solutions based on their behavior on the cscie95.dce.harvard.edu computer.

The first step to begin to use our computer resources is to ensure that your HarvardKey account has been established. Use a browser to connect to https://key.harvard.edu. From that page, you can claim a new HarvardKey if you don't already have one. There, you can also manage your HarvardKey account. On the "Manage Your HarvardKey Account" page, in addition to your "Login Name," you can see your NetID. Your NetID will be required to login to our instance named cscie95.dce.harvard.edu.

Access to our cscie95.dce.harvard.edu instance requires that a VPN (Virtual Private Network) connection to the Harvard network is established first. The Cisco AnyConnect VPN application can be downloaded by logging into https://vpn.harvard.edu. Then, you can run the Cisco AnyConnect to complete the VPN connection. For creating the VPN, use vpn.harvard.edu as the hostname and click on "Connect." On the two-step verification pop-up window, enter your HarvardKey Login Name as your Username, and for the Password, enter your HarvardKey Password. Do *not* enter a Two-Step Verification Code. You should receive a request for approval via your Duo Mobile app before the VPN can be established.

After establishing a VPN connection to the Harvard network, you can set up an account on cscie95.dce.harvard.edu. Access our cscie95.dce.harvard.edu instance for remote login using "ssh" over the Internet. Your username is your HarvardKey NetID and your password is your HarvardKey password. On your first attempt to login to our instance, your account will be configured. Files may be transferred to these systems using "secure ftp" (SFTP). If you are using a Windows system, the SecureCRT and SecureFX programs are available from the Science Center at http://downloads.fas.harvard.edu/download; these programs implement "ssh" and "secure ftp," respectively. On Unix/Linux systems, the shell commands "ssh" and "sftp"/"scp" can be used for ssh and SFTP, respectively.

Separate documentation is available describing how to install and use git and GitHub on the section web site.

Some assignments may include Extra Credit programming problems. The Extra Credit programming problems can be completed to earn points that can increase the overall grade on the programming portion of your problem set; however, the grade on the programming portion of a problem set including extra credit will never exceed the full credit possible grade on the programming portion. That is, the Extra Credit programming problem(s) can be used to make up for deficiencies in other programming portions of the problem set to allow a higher grade to be earned. Extra Credit points from one problem set are not transferrable and may not be used on any other problem sets.

Late Policy:

All problem sets except for Problem Set 0 may be submitted late for partial credit. A late homework will lose 5% of its original grade for each day it is late (e.g., an assignment handed in two and a half days late will receive its original grade multiplied by 0.85). Late assignments may be submitted via "git" and an e-mail message notifying the instructor and your teaching assistant should be sent immediately after the late assignment is submitted. In addition, each student is given five free late days that may be used freely during the semester. However, keep in mind that almost all of the assignments are built on the previous assignments; handing in one assignment late does not extend the due date for subsequent assignments. The scope and difficulty level of the assignments increases during the class; therefore, we recommend against using the five free late days early in the class.

After a programming assignment has been initially submitted, we will award additional partial credit for corrections made to that assignment. We encourage students to correct any errors found in their code and to make improvements and enhancements. This will improve your grade and, in many cases, will be required to allow the next phase of your compiler to function correctly. No additional partial credit will be awarded for Problem Set 0 or for book problems.

Commented and Documented:

In the "Grading: Problem Sets" section above, the phrase "commented and documented" is used; this paragraph will clarify the necessary comments and documentation that should be provided with all programs. First, there should be a description of the entire application. This should include the user interface (i.e., how a user interacts with the program) and an explanation of what the program does. This documentation may be in a separate file from the program itself. Second, there should be a description at the beginning of each file which outlines the contents of that file. Third, each routine, function, method, etc. must be preceded by a section describing: (1) the name of the routine, (2) the purpose/function of the routine, (3) the parameters to the routine (name, type, meaning), (4) the return value from the routine (type, meaning), and (5) any side-effects (including modifying global variables, performing I/O, modifying heap-based storage, etc.) that the routine may cause. Fourth, declarations of variables should be commented with their purpose. Fifth, blocks of code should be commented to describe the purpose of the code section. Sixth, any complex or difficult to understand code statements or fragments should be commented to clarify their behavior.

Using git:

When using "git" and https://github.com/, make sure to follow the information on using "git" and setting up your repository that is available on the section web site. Create a named branch for each of your problem sets as follows: specify "problem-set-0" for Problem Set 0 (the course questionnaire, fix this program, and word count), specify "problem-set-1" for Problem Set 1, "problem-set-2" for Problem Set 2, etc., and specify "term-project" for the Term Project.

Midterm Exam:

See Distance Learning Links: Midterm Exam for information on the Midterm Exam.

Accessibility:

Harvard Extension School Accessibility Services Office is responsible for providing accommodations to students with disabilities. You can find more information about the process for applying for accommodations by visiting their website at https://www.extension.harvard.edu/resources-policies/accessibility-services-office-aso.

Plagiarizing:

All work should be the personal creation of the individual student. Students are free to consult with each other and to study together, but all problem set solutions, programming assignments, exams, and the final project must be the personal contribution of each individual student. More explicitly, whenever a concept is reduced to a detailed algorithm or a program, no collaboration is allowed. If a paper, assignment, exam, program, or final project contains any information, algorithms, program fragments or other intellectual property taken from another source, that source and material must be explicitly identified and credit given. If you have any questions about this policy, it is the student's responsibility to clarify whether their activity is considered plagiarism.

You are responsible for understanding Harvard Extension School policies on academic integrity (https://www.extension.harvard.edu/resources-policies/student-conduct/academic-integrity) and how to use sources responsibly. Not knowing the rules, misunderstanding the rules, running out of time, submitting the wrong draft, or being overwhelmed with multiple demands are not acceptable excuses. There are no excuses for failure to uphold academic integrity. To support your learning about academic citation rules, please visit the Harvard Extension School Tips to Avoid Plagiarism ((https://www.extension.harvard.edu/resources-policies/resources/tips-avoid-plagiarism), where you'll find links to the Harvard Guide to Using Sources and two free online 15-minute tutorials to test your knowledge of academic citation policy. The tutorials are anonymous open-learning tools.

MIPS Assembly Language Programming and SPIM:

In addition to programming in a conventional language (C -- or C++, with permission), students will learn how to write code in MIPS32 assembly language. This is the low-level language used by MIPS32 computers. All students are required to use the newest version of SPIM, named QtSpim, currently version 9.1.23. This software should be downloaded from SourceForge at https://sourceforge.net/projects/spimsimulator/files/ and is free under the BSD license. The software runs on Microsoft Windows, Apple Mac OS X, or Linux. A command line version of SPIM (named "spim") is also installed and available on our cscie95.dce.harvard.edu instance.

Further information is available about SPIM on SourceForge and Old Versions of SPIM, the MIPS assembly language simulator that is used in class and needed for problem sets and for the term project. Documentation about the MIPS32 instruction set and SPIM is available at http://pages.cs.wisc.edu/~larus/spim.html#information

Sample MIPS assembly code is available:
printint.s
printstring.s
readstring.s
count.s
count2.s
count3.s
squares.s
storedints.s
argcargv.s

Course Outline: Approximate Schedule:

January Description
17 Martin Luther King Jr. Day
20 Registration deadline (Last day to register for spring term courses. Late registration is not permitted after this date.).
21-30 Course change period (For registered students only).
24 Classes begin.
25 First class meeting. Introduction, course information & policies, outline, schedule. Present Problem Sets 0 and 1. Overview of Compiler. Review of the C Programming Language.
30 at Midnight Problem Set 0 (using git, the course questionnaire, fix this program, and word count) due.
30 Course changes deadline.
Course drop deadline for full-tuition refund.

 

February Description
1 Second class meeting. Introduction to Compiling. A Simple One-Pass Compiler. Regular Expressions. Presentation about Lex. Go over lexer-standalone.lex combined with lexer.c. Continue with Review of the C Programming Language. For today's class meeting, read Harbison/Steele chapter 2 & Aho/Lam/Sethi/Ullman chapters 1-3.
6 at Midnight Problem Set 1 due.
6 Course drop deadline for half-tuition refund.
8 Third class meeting. Complete the Review of the C Programming Language. Present Problem Set 2. Syntax Analysis. Context-Free grammars. Ambiguity in grammars. Elimination of Left Recursion. Left Factoring. Presentation about Yacc. Go over lexer.lex combined with parser.y. Go over the grammar for our subset C Language. For today's class meeting, read Aho/Lam/Sethi/Ullman chapter 4.
15 Fourth class meeting. Lexical Analysis theory. Transition Diagrams. Present NFA and DFA. How to create an NFA from a regex. How to convert an NFA into a DFA. Complete Syntax Analysis. Top-Down Parsing. Show recursiveDescentParser.c. Present Problem Set 3. Symbol Table Management. Types in the C Programming Language. Representation of Types in a Compiler. For today's class meeting, read Aho/Lam/Sethi/Ullman chapter 5.
20 at Midnight Problem Set 2 due.
21 President's Day
22 Fifth class meeting. Parse Tree, AST, Type Tree. Generation of Symbol Tables (Incl. Scope & Overloading Classes). FIRST and FOLLOW Functions. LL(1) Grammars. How to construct a Predictive Parsing Table, M. Table-driven Predictive Parsing. Type Checking: Integral and Floating-Point Number Representations. For today's class meeting, read Aho/Lam/Sethi/Ullman chapter 6.

 

March Description
1 Sixth class meeting. Type Checking. Complete IEEE 754 Floating-Point Number Representation. C Standard Conversions. Bottom-Up Parsing. Shift-Reduce Parsing. MIPS Architecture and Instruction Set. Overview of CPU, Registers, Memory. Instruction Formats: I-Type (Immediate), J-Type (Jump), and R-Type (Register). Instruction Set Presentation: Arithmetic R-Type, Arithmetic Immediate, Load/Store, Jump and Branch. For today's class meeting, read Aho/Lam/Sethi/Ullman chapter 6.
6 at Midnight Problem Set 3 due.
8 Seventh class meeting. Syntax-Directed Translation. Run-Time Environments. Storage organization (code/text, static storage, heap, stack). Stack frames/activation records. Nested procedure definitions. Heap management. For today's class meeting, read Aho/Lam/Sethi/Ullman chapter 7.
12-20 Spring Break
15-April 15 Degree program application period for spring.
22 Midterm exam. Eighth class meeting.
27 at Midnight Problem Set 4 due.
29 Ninth class meeting. Intermediate Code Generation. Three-address (quadruples) IR notation. Examples of IR code generation from all parse trees (expressions, accessing user variables for load and store, branching, conditional branching, function calling/return sequence, subscripting, pointer creation and dereferencing, casting). Dealing with types (including size and signedness) in IR. Single assignment form. Correctly handling lvalues and rvalues in the creation of IR nodes. Lvalues as a result of pointer dereferencing. For today's class meeting, read Aho/Lam/Sethi/Ullman chapter 6.

 

April Description
5 Tenth class meeting. Code Generation. MIPS Architecture and Instruction Set. Overview of CPU, Registers, Memory. Instruction Formats: I-Type (Immediate), J-Type (Jump), and R-Type (Register). Instruction Set Presentation: Arithmetic R-Type, Arithmetic Immediate, Load/Store, Jump and Branch. For today's class meeting, read Aho/Lam/Sethi/Ullman chapter 8.
10 at Midnight Problem Set 5 due.
12 Eleventh class meeting. Code Generation continued. MIPS Architecture and Instruction Set. Instruction Set Presentation: Jump and Branch (continued), Shift, Multiply/Divide. Review sample programs on the class web site. Present algorithm using graph coloring for register allocation. For today's class meeting, read Aho/Lam/Sethi/Ullman chapter 8.
19 Twelfth class meeting. Code Optimization. Basic Blocks. Use/Def Analysis. Liveness/Next Use Analysis. Reaching Definitions. Register Allocation & Spilling using graph coloring. Optimizations: Common subexpression elimination, copy propagation, dead code elimination, constant folding, code motion, reduction in strength, induction variables and reduction in strength, identities, inlining of functions, loop reordering, loop unrolling, array alignment/padding/layout, jump threading, instruction scheduling, tail recursion elimination. Order of optimizations. For today's class meeting, read Aho/Lam/Sethi/Ullman chapter 9.
22 Withdrawal deadline (no tuition refund).
24 at Midnight Problem Set 6 due.
26 Thirteenth class meeting. Code Optimization continued. Complete discussion of tail recursion. Types of dependencies: flow/data/true, anti-, output, and control. Loop carried dependencies. Instruction-Level Parallelism. Data flow computing model. VLIW "Trace" scheduling. For today's class meeting, read Aho/Lam/Sethi/Ullman chapter 9.

 

May Description
3 Fourteenth class meeting. Assertions vs. Assumptions. Optimizing for Parallelism and Data Locality. Massively-parallel computing model. Additional topics. Today we will discuss open issues for the Term Projects. For today's class meeting, read Aho/Lam/Sethi/Ullman chapter 10-12.
9-14 Final exams and last class meetings.
10 Final Class Meeting during usual section and class time. Student project presentations/demonstrations.
13 by 2 PM ET Term Project report, slides, code, makefiles, test programs, etc. are due.
24 Grades available online in Online Services
26 Commencement
30 Memorial Day

 

June Description

 

July Description
1-August 1 Degree program application period for summer.

 

Software and Course Documents On-Line:

Slides used in class are available on-line:
The Course Questionnaire and Problem Sets

Useful software is available on-line using the following links: Harvard University Information Technology Programs used in class as examples are also available: Grammar for the C Programming Language: Online Papers Used in Class The C* Slides are available here in PDF format: The C* Language.

Section Home Page