Syllabus for

Harvard Extension School CSCI E-95 (formerly CSCI E-295)

Compiler Design and Implementation (16364)

Fall 2020
Site last revised 10:17 AM 9-Sep-2020 ET

Dr. James L. Frankel

 

After the current Fall 2020 semester, the next time CSCI E-95 will be offered is in the Spring 2022 semester.

 

Tuesdays 7:40-10:15 PM ET via web conference. Students may attend at the scheduled meeting time or watch recorded sessions on demand. The recorded sessions are available within 24 hours of the lecture.

Distance Learning Links including Video Streaming, Chat, and the Midterm Exam:

During Class:

Video Streaming:

Both section & class are live streamed and also recorded. Students are encouraged to share their video feed and to ask questions verbally using their audio/video link in Zoom. The section & class live video stream is available through the class Canvas web site under Zoom.

Chat:

Questions can also be asked using the text Chat facility in Zoom during class meetings & during section meetings. Please use the chat feature available in Zoom rather than Canvas chat. We will *not* be monitoring Canvas chat.

After Class:

Videos of class and section are available on the course's Canvas web site under Course Videos.

Midterm Exam

Our midterm exam is a three-hour proctored exam that will be available on-line through Proctorio. More information about the exam will be available later in the semester.

The exam allows open-book access to *only* the two required textbooks. No notes are allowed. No electronic devices are allowed.

Prerequisites:

Knowledge of data structures and programming experience, such as is taught in CSCI E-22 (formerly CSCI E-119) (Data Structures), is required. An advanced algorithms course, such as CSCI E-124 (Data Structures and Algorithms) or an equivalent, is preferred, but not required. Students must have sufficient experience to write large programming projects in the C Programming Language that utilize a wide variety of data structures. This course does *not* teach programming.

Brief abstract:

A study of the theory and practice required for the design and implementation of interpreters and compilers for programming languages. Coursework will range from the abstract, such as categorization of grammars and languages, to the concrete, such as specific algorithms used in compilers and practical performance issues. Topics include lexical analysis, parsing, symbol table generation, type checking, error detection, code generation, optimization, and run-time support. Techniques for top-down and bottom-up parsing both with and without the use of automated tools will be studied. Local and global optimization will be covered. An extensive lab project will be required of all students.

4 credits. Graduate credit.

Overview:

Computer Science E-95 is a comprehensive introduction to the theory and practice of compiler design and implementation. Students are expected to be already comfortable with designing, coding, and debugging large programs of modest complexity while employing good programming style and structured techniques. In particular, familiarity with terminal and text file I/O, iterative and conditional control structures, parameter passing and recursion, data structures, classes and object-oriented design in Java or C++ is presumed.

A majority of the class will be focused on the design and implementation of the term project. The project will be developed by students working alone. That project is the creation of a compiler for a significant subset of the C Programming Language (ISO C89) that produces code for the MIPS instruction set. The project will include the lexer, parser, symbol table manager, simple optimizer, and code generator. The programming assignments and the final compiler project will be written in the C Programming Language (ISO C89) -- or possibly in C++, but only with permission from the course staff. Initially, both the classroom lectures and the section meetings will be covering material important to the design and implementation of the final project. Later in the semester, advanced compiler techniques will be covered in class; however, both the class and sections will continue to support students as term projects progress. For the term project, students will continue working on and debugging their projects leading to their complete implementation and a final demonstration.

Because the course includes a required and significant term project involving a large amount of programming, the assignments will be time-consuming; therefore, a significant time commitment to the course is necessary. Although the relevant experience of students in the class is usually quite diverse, depending on background, it is not unusual for students to spend 15-20 hours per week or more completing the readings and homework assignments. Although the computers are available more-or-less around the clock, occasionally they will suddenly become unavailable (this is known as a crash). As with all such events, they always seem to occur at the worst possible time. Plan your computer work so that it is complete in advance of the deadlines. Check in your code to the required class git repository frequently. You have now been forewarned!

Books/Course Bibliography:

All course books are available from the Harvard Coop and are available for on-line ordering. A direct link to the books at the Coop is available for on-line purchasing. Keep in mind that Coop members receive a 10% discount. There are links available on Canvas to find the Library Reserves and, for some books, these include View online versions. In addition, all registered students will be eligible for library services (access, borrowing privileges, group study rooms) in FAS libraries, just like any other student in FAS. Although Harvard's physical libraries are closed during the COVID-19 pandemic, the libraries are open on-line. All registered students will continue to have access to Harvard Library on-line resources.

Textbook:

Compilers: Principles, Techniques, and Tools, 2/e; Alfred V. Aho, Monica S. Lam, Ravi Sethi, Jeffrey D. Ullman; Addison-Wesley, 2007; ISBN-10 0-321-48681-1; ISBN-13 978-0-321-48681-3; Errata for the Second Edition

C Language Reference Manual:

C: A Reference Manual, Fifth Edition; Samuel P. Harbison and Guy L. Steele, Jr.; Prentice Hall, 2002; ISBN-10 0-13-089592-X; ISBN-13 978-0-13-089592-9; Errata for the Third Edition from Sam Harbison; Our additional errata

Optional MIPS Architecture Book:

MIPS RISC Architecture, 2/e; Gerry Kane and Joseph Heinrich; Prentice Hall, 1992; ISBN-10 0-13-590472-2; ISBN-13 978-0-13-590472-5

There will also be other handouts & supplementary readings.

Instructor:

Dr. James L. Frankel

Teaching Assistants:

We have two Teaching Assistants (TAs) for this course. The TAs hold a weekly section meeting and office hours as described below. Attendance at the TAs' section meeting is strongly recommended. Section covers course material, either in toto or in more detail, that time does not permit us to cover in class meetings. For example, this material includes use of git and GitHub; general approaches to solving the problem sets; and overviews of algorithms, code snippets, and data structures. Also, the section meetings provide a venue in which it may be easier to ask more lengthy questions.

TA: Mark Ford (Section Site)
Section Meeting Time/Place: Tuesday, 6:30-7:30 PM ET, via web conference
Office Hours Time/Place: Monday, 6:30-7:30 PM ET by appt. only, via phone and/or web conference
Contact: E-mail: Mark's e-mail address; Phone: +1.978.496.7213 (1:00 PM - 9:00 PM ET). If there's no answer, please leave a message with your name and a call-back number.

TA: Nate Guerin (Section Site)
Section Meeting Time/Place: Tuesday, 6:30-7:30 PM ET, via web conference
Office Hours Time/Place: Wednesday, 6:00-7:00 PM ET by appt. only, via phone and/or web conference
Contact: E-mail: Nate's e-mail address; Phone: +1.919.789.1659 (9:00 AM - 5:00 PM ET). If there's no answer, please leave a message with your name and a call-back number.

Questions and Issues:

When posing questions or bringing up issues of a non-personal nature, please use the class Piazza Forum. Answers to questions posed on Piazza benefit the whole class and allow the course staff to answer questions once for all students. Questions that include code or other information that shouldn't be shared with other students should be sent via e-mail to all course staff at the same time in order to increase the probability of a rapid response.

Piazza Wiki/Forum:

A Piazza Wiki/Forum (on-line discussion list) for CSCI E-95 is set up at Harvard Extension School CSCI E-95 Piazza Forum.

Record a Say Hello! Video:

Using any tool of your choosing (perhaps a cell phone selfie or the camera on your laptop), please record and post a short video (maybe just one to three minutes in length) in Canvas Discussions as a reply to my Say Hello! discussion to introduce yourself to the class. With everyone being remote, anything we can do to create a community for our class would be great. Please tell us a little about yourself possibly including where you are located, your background, what you do when you're not taking classes, and your goals for this class.

Enter Your Location:

Please enter your location in Canvas.

Using git and GitHub:

When using "git" and GitHub, make sure to follow the information on using "git" and setting up your GitHub repository that is available on the section web site.

Grading:

Graduate credit students:

Problem Sets:

All problem sets and programming assignments are due at midnight Eastern Time on Sunday night (i.e., midnight between Sunday and Monday) unless otherwise stated in the assignment or in the syllabus. Unless otherwise stated, all programming assignment solutions must be written in C (ISO C89) because the C Programming Language is nearly the same language that your compiler will accept as input. With special permission, a student may be allowed to write their compiler in C++. All code must build, be tested, and run on cscie95.dce.harvard.edu; be submitted using "git" on GitHub (or, in dire circumstances, via e-mail only if agreed to by the course staff); be well-written (clear coding style, modular structure, appropriately commented and documented in English); and tested (include any programs and/or shell scripts used in testing your solution as part of your submission). Remember, in addition to handing in all parts of the problem set solution or programming assignment program, sample runs of the program which demonstrate that the program works must be attached. In addition, each submission must include a makefile to build the assignment. The grade for programming assignments will include all of these attributes.

Of course, the solutions may be written and tested using any system of the student's choosing; however, when the solution is complete, it must be tested on the cscie95.dce.harvard.edu computer and pushed to the git code repository on GitHub. You may choose to develop under your own Unix/Linux system or under Cygwin under Windows, but testing and grading of your programming assignments will take place on the cscie95.dce.harvard.edu computer. To reiterate, we will be grading the solutions based on their behavior on the cscie95.dce.harvard.edu computer.

You can establish an account on cscie95.dce.harvard.edu by accessing the website at URL https://ac-web.dce.harvard.edu/ and clicking on "Reset Password." Please note that the username shown on this screen is the username you will use to login to our server. Our cscie95.dce.harvard.edu computer may be accessed for remote login using "ssh" over the Internet. Files may be transferred to these systems using "secure ftp" (SFTP). If you are using a Windows system, the SecureCRT and SecureFX programs are available from the Science Center at http://downloads.fas.harvard.edu/download; these programs implement "ssh" and "secure ftp," respectively. On Unix/Linux systems, the shell commands "ssh" and "sftp"/"scp" can be used for ssh and SFTP, respectively.

Separate documentation describing how to install and use git and GitHub is available on the section web site.

Some assignments may include Extra Credit programming problems. The Extra Credit programming problems can be completed to earn points that can increase the overall grade on the programming portion of your problem set; however, the grade on the programming portion of a problem set including extra credit will never exceed the full credit possible grade on the programming portion. That is, the Extra Credit programming problem(s) can be used to make up for deficiencies in other programming portions of the problem set to allow a higher grade to be earned. Extra Credit points from one problem set are not transferable and may not be used on any other problem sets.
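The cap described above can be illustrated with a small C sketch. The function name and its parameters are hypothetical and are not part of any course tooling; the sketch only assumes that extra credit is added to the earned points and the sum is then limited to the full-credit grade.

```c
#include <assert.h>

/*
 * Name:         programming_grade (hypothetical helper)
 * Purpose:      Combine regular and Extra Credit points for the programming
 *               portion of one problem set, capped at full credit.
 * Parameters:   earned (int)      - points earned on the regular problems
 *               extra (int)       - Extra Credit points from the same set
 *               full_credit (int) - maximum grade for the programming portion
 * Return value: (int) the capped grade for the programming portion.
 * Side effects: None.
 */
int programming_grade(int earned, int extra, int full_credit)
{
    int total;  /* uncapped sum of regular and Extra Credit points */

    assert(earned >= 0 && extra >= 0 && full_credit >= 0);

    total = earned + extra;
    /* Extra Credit can make up for lost points but never exceeds the cap. */
    return total > full_credit ? full_credit : total;
}
```

Under these assumptions, 80 earned points plus 10 Extra Credit points out of 100 gives 90, while 95 plus 10 is capped at 100.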

Late Policy:

All problem sets except for Problem Set 0 may be submitted late for partial credit. A late assignment will lose 5% of its original grade for each day it is late (e.g., an assignment handed in two and a half days late will receive its original grade multiplied by 0.85). Late assignments may be submitted via "git" and an e-mail message notifying the instructor and your teaching assistant should be sent immediately after the late assignment is submitted. In addition, each student is given five free late days that may be used freely during the semester. However, keep in mind that almost all of the assignments are built on the previous assignments; handing in one assignment late does not extend the due date for subsequent assignments. The scope and difficulty level of the assignments increase during the class; therefore, we recommend against using the five free late days early in the class.
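As a rough illustration of the late-penalty arithmetic, here is a minimal C sketch. It assumes that any partial day counts as a full day, which is inferred from the example above (two and a half days late gives 0.85) rather than stated outright. The function names are hypothetical, and the way free late days are subtracted is likewise an assumption, not the course staff's actual procedure.

```c
#include <assert.h>

/*
 * Whole days charged for a submission that arrives hours_late hours after
 * the deadline. ASSUMPTION: a partial day is rounded up to a full day.
 */
int late_days_charged(int hours_late)
{
    if (hours_late <= 0) {
        return 0;  /* on time or early: nothing is charged */
    }
    return (hours_late + 23) / 24;  /* integer ceiling of hours / 24 */
}

/*
 * Percentage of the original grade that is retained: 5% is lost per charged
 * day. free_days_used (hypothetical parameter) is the number of free late
 * days the student applies. The result is clamped to the range 0..100.
 */
int late_percent_retained(int hours_late, int free_days_used)
{
    int days;     /* days still penalized after free late days */
    int percent;  /* percentage of the original grade retained */

    assert(free_days_used >= 0);

    days = late_days_charged(hours_late) - free_days_used;
    if (days < 0) {
        days = 0;
    }
    percent = 100 - 5 * days;
    return percent < 0 ? 0 : percent;
}
```

For the example in the policy, 60 hours late with no free late days applied is charged as three days, so late_percent_retained(60, 0) returns 85, matching the 0.85 multiplier.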

After a programming assignment has been initially submitted, we will award additional partial credit for corrections made to that assignment. We encourage students to correct any errors found in their code and to make improvements and enhancements. This will improve your grade and, in many cases, will be required to allow the next phase of your compiler to function correctly. No additional partial credit will be awarded for Problem Set 0 or for book problems.

Commented and Documented:

In the "Grading: Problem Sets" section above, the phrase "commented and documented" is used; this paragraph will clarify the necessary comments and documentation that should be provided with all programs. First, there should be a description of the entire application. This should include the user interface (i.e., how a user interacts with the program) and an explanation of what the program does. This documentation may be in a separate file from the program itself. Second, there should be a description at the beginning of each file which outlines the contents of that file. Third, each routine, function, method, etc. must be preceded by a section describing: (1) the name of the routine, (2) the purpose/function of the routine, (3) the parameters to the routine (name, type, meaning), (4) the return value from the routine (type, meaning), and (5) any side-effects (including modifying global variables, performing I/O, modifying heap-based storage, etc.) that the routine may cause. Fourth, declarations of variables should be commented with their purpose. Fifth, blocks of code should be commented to describe the purpose of the code section. Sixth, any complex or difficult to understand code statements or fragments should be commented to clarify their behavior.
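The following C sketch shows one way the file description, routine header, variable comments, and block comments described above might look. The file name, the routine, and the header layout are illustrative assumptions only; the course does not prescribe this exact format.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/*
 * File:     wordcount.c (hypothetical example file)
 * Contents: a single helper routine that counts the words in a string.
 */

/*
 * Name:         count_words
 * Purpose:      Count the whitespace-separated words in a string.
 * Parameters:   text (const char *) - NUL-terminated string to scan;
 *               must not be NULL.
 * Return value: (int) the number of words found in text.
 * Side effects: None. No global variables, I/O, or heap storage are touched.
 */
int count_words(const char *text)
{
    int count = 0;    /* number of words seen so far */
    int in_word = 0;  /* nonzero while the cursor is inside a word */
    const char *p;    /* cursor over the characters of text */

    assert(text != NULL);

    /* Each transition from whitespace to non-whitespace starts a new word. */
    for (p = text; *p != '\0'; p++) {
        /* The cast avoids undefined behavior for negative char values. */
        if (isspace((unsigned char)*p)) {
            in_word = 0;
        } else if (!in_word) {
            in_word = 1;
            count++;
        }
    }
    return count;
}
```

The description of the entire application (the first item above) would normally live in a separate file and is not shown here.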

Using git:

When using "git" and https://github.com/, make sure to follow the information on using "git" and setting up your repository that is available on the section web site. Create a named branch for each of your problem sets as follows: specify "problem-set-0" for Problem Set 0 (the course questionnaire, fix this program, and word count), specify "problem-set-1" for Problem Set 1, "problem-set-2" for Problem Set 2, etc., and specify "term-project" for the Term Project.

Midterm Exam:

See Distance Learning Links: Midterm Exam for information on the Midterm Exam.

Plagiarizing:

All work should be the personal creation of the individual student. Students are free to consult with each other and to study together, but all problem set solutions, programming assignments, exams, and the final project must be the personal contribution of each individual student. More explicitly, whenever a concept is reduced to a detailed algorithm or a program, no collaboration is allowed. If a paper, assignment, exam, program, or final project contains any information, algorithms, program fragments or other intellectual property taken from another source, that source and material must be explicitly identified and credit given. If you have any questions about this policy, it is your responsibility to clarify whether your activity is considered plagiarism.

MIPS Assembly Language Programming and SPIM:

In addition to programming in a conventional language (C -- or C++, with permission), students will learn how to write code in MIPS32 assembly language. This is the low-level language used by MIPS32 computers. All students are required to use the newest version of SPIM, named QtSpim, currently version 9.1.20. This software should be downloaded from SourceForge at https://sourceforge.net/projects/spimsimulator/files/ and is free under the BSD license. The software runs on Microsoft Windows, Apple Mac OS X, or Linux.

Further information about SPIM, the MIPS assembly language simulator that is used in class and needed for problem sets and for the term project, is available at SPIM on SourceForge and Old Versions of SPIM. Documentation about the MIPS32 instruction set and SPIM is available at http://pages.cs.wisc.edu/~larus/spim.html#information

Sample MIPS assembly code is available:
printint.s
printstring.s
readstring.s
count.s
count2.s
count3.s
squares.s
storedints.s
argcargv.s

Course Outline:

Approximate Schedule:

August Description
27 Registration deadline (Last day to register for fall term courses. Late registration is not permitted after this date.).
28-September 8 Course change period (For registered students only).
31 Classes begin.

 

September Description
1 First class meeting. Introduction, course information & policies, outline, schedule. Present Problem Sets 0 and 1. Overview of Compiler. Review of the C Programming Language.
6 at Midnight Problem Set 0 (using git, the course questionnaire, fix this program, and word count) due.
7 Labor Day
8 Course changes deadline.
Course drop deadline for full-tuition refund.
8 Second class meeting. Introduction to Compiling. A Simple One-Pass Compiler. Regular Expressions. Presentation about Lex. Go over lexer-standalone.lex combined with lexer.c. Continue with Review of the C Programming Language. For today's class meeting, read Harbison/Steele chapter 2 & Aho/Lam/Sethi/Ullman chapters 1-3.
13 at Midnight Problem Set 1 due.
15 Course drop deadline for half-tuition refund.
15 Third class meeting. Complete the Review of the C Programming Language. Present Problem Set 2. Syntax Analysis. Context-Free grammars. Ambiguity in grammars. Elimination of Left Recursion. Left Factoring. Presentation about Yacc. Go over lexer.lex combined with parser.y. Go over the grammar for our subset C Language. For today's class meeting, read Aho/Lam/Sethi/Ullman chapter 4.
22 Fourth class meeting. Lexical Analysis theory. Transition Diagrams. Present NFA and DFA. How to create an NFA from a regex. How to convert an NFA into a DFA. Complete Syntax Analysis. Top-Down Parsing. Show recursiveDescentParser.c. Present Problem Set 3. Symbol Table Management. Types in the C Programming Language. Representation of Types in a Compiler. For today's class meeting, read Aho/Lam/Sethi/Ullman chapter 5.
27 at Midnight Problem Set 2 due.
29 Fifth class meeting. Parse Tree, AST, Type Tree. Generation of Symbol Tables (Incl. Scope & Overloading Classes). FIRST and FOLLOW Functions. LL(1) Grammars. How to construct a Predictive Parsing Table, M. Table-driven Predictive Parsing. Type Checking: Integral and Floating-Point Number Representations. For today's class meeting, read Aho/Lam/Sethi/Ullman chapter 6.

 

October Description
6 Sixth class meeting. Type Checking. Complete IEEE 754 Floating-Point Number Representation. C Standard Conversions. Bottom-Up Parsing. Shift-Reduce Parsing. MIPS Architecture and Instruction Set. Overview of CPU, Registers, Memory. Instruction Formats: I-Type (Immediate), J-Type (Jump), and R-Type (Register). Instruction Set Presentation: Arithmetic R-Type, Arithmetic Immediate, Load/Store, Jump and Branch. For today's class meeting, read Aho/Lam/Sethi/Ullman chapter 6.
11 at Midnight Problem Set 3 due.
12 Indigenous Peoples' Day
13 Seventh class meeting. Syntax-Directed Translation. Run-Time Environments. Storage organization (code/text, static storage, heap, stack). Stack frames/activation records. Nested procedure definitions. Heap management. For today's class meeting, read Aho/Lam/Sethi/Ullman chapter 7.
20 Midterm exam. Eighth class meeting.
25 at Midnight Problem Set 4 due.
27 Ninth class meeting. Intermediate Code Generation. Three-address (quadruples) IR notation. Examples of IR code generation from all parse trees (expressions, accessing user variables for load and store, branching, conditional branching, function calling/return sequence, subscripting, pointer creation and dereferencing, casting). Dealing with types (including size and signedness) in IR. Single assignment form. Correctly handling lvalues and rvalues in the creation of IR nodes. Lvalues as a result of pointer dereferencing. For today's class meeting, read Aho/Lam/Sethi/Ullman chapter 6.

 

November Description
1-December 1 Degree program application dates for fall.
3 Tenth class meeting. Code Generation. MIPS Architecture and Instruction Set. Overview of CPU, Registers, Memory. Instruction Formats: I-Type (Immediate), J-Type (Jump), and R-Type (Register). Instruction Set Presentation: Arithmetic R-Type, Arithmetic Immediate, Load/Store, Jump and Branch. For today's class meeting, read Aho/Lam/Sethi/Ullman chapter 8.
8 at Midnight Problem Set 5 due.
10 Eleventh class meeting. Code Generation continued. MIPS Architecture and Instruction Set. Instruction Set Presentation: Jump and Branch (continued), Shift, Multiply/Divide. Review sample programs on the class web site. Present algorithm using graph coloring for register allocation. For today's class meeting, read Aho/Lam/Sethi/Ullman chapter 8.
17 Twelfth class meeting. Code Optimization. Basic Blocks. Use/Def Analysis. Liveness/Next Use Analysis. Reaching Definitions. Register Allocation & Spilling using graph coloring. Optimizations: Common subexpression elimination, copy propagation, dead code elimination, constant folding, code motion, reduction in strength, induction variables and reduction in strength, identities, inlining of functions, loop reordering, loop unrolling, array alignment/padding/layout, jump threading, instruction scheduling, tail recursion elimination. Order of optimizations. For today's class meeting, read Aho/Lam/Sethi/Ullman chapter 9.
20 Withdrawal deadline (no tuition refund).
22 at Midnight Problem Set 6 due.
24 Thirteenth class meeting. Code Optimization continued. Complete discussion of tail recursion. Types of dependencies: flow/data/true, anti-, output, and control. Loop carried dependencies. Instruction-Level Parallelism. Data flow computing model. VLIW "Trace" scheduling. For today's class meeting, read Aho/Lam/Sethi/Ullman chapter 9.
25-29 Thanksgiving Break

 

December Description
1 Fourteenth class meeting. Assertions vs. Assumptions. Optimizing for Parallelism and Data Locality. Massively-parallel computing model. Additional topics. Today we will discuss open issues for the Term Projects. For today's class meeting, read Aho/Lam/Sethi/Ullman chapters 10-12.
8 Fifteenth class meeting. Additional topics. Today we will discuss open issues for the Term Projects.
14-19 Final exams and last class meetings.
15 Final Class Meeting during usual section and class time. Student project presentations/demonstrations.
18 by 2 PM ET Term Project report, slides, code, makefiles, test programs, etc. are due.
20-January 3 Winter Break

 

January 2021 Description
7 Grades available online in Online Services
18 Martin Luther King Jr. Day

 

Software and Course Documents On-Line:

Slides used in class are available on-line:

The course questionnaire is available on-line.

The class problem sets are also available:

Look here for information about the GNU Project and the Free Software Foundation.

Look here for information about getting GNU Emacs for Windows 95/98/ME/NT/XP and 2000.

Look here for information about getting the Cygwin Linux-like environment for Windows.

Look here for a List of Linux distributions.

There are computers available for Extension student use in 53 Church Street and at Grossman Library.

Software is available for free download from Harvard Information Technology

Harvard University Information Technology:

Programs used in class as examples are also available:

The grammar for the C Programming Language is available here.
The HTML version of the grammar for the C Programming Language is available here.

The C* Slides are available here in PDF format: The C* Language.

Section Home Page