Problem Set 0
Harvard Extension School
CSCI E-95: Compiler Design and Implementation
Fall 2023

Due: September 10, 2023 at Midnight ET (Eastern Time)
Total of 20 Points

This problem set requires several different activities to be completed as detailed below:

  o A GitHub account needs to be established.
  o A HarvardKey account needs to be established.
  o A GitHub CSCI E-95 repository needs to be established.
  o Connection to the Harvard network using vpn.harvard.edu needs to be established.
  o Successful login to cscie95.dce.harvard.edu using SSH needs to be verified.
  o git needs to be installed on computers on which you'll be developing code.
  o The course questionnaire needs to be completed.
  o The fix-this-program.c needs to be corrected so that it builds and runs correctly on cscie95.dce.harvard.edu.
  o word-count.c needs to be written and tested so that it builds and runs correctly on cscie95.dce.harvard.edu.
  o The questionnaire, corrected fix-this-program.c, and word-count.c need to be submitted to your GitHub repository.

Unlike all other Problem Sets, additional partial credit will *not* be awarded for corrections made after this problem set has been submitted.

The total number of points for Problem Set 0 is 20 points.

1. (5 Points) Follow the directions given in the section web site at https://cscie95.dce.harvard.edu/fall2023/section/index.html to:

   (1) create a repository on https://github.com/ which is a clone of the CSCI E-95 sample code repository,
   (2) install git on the computers on which you'll be developing code,
   (3) create a branch named problem-set-0,
   (4) complete the course questionnaire,
   (5) on the cscie95.dce.harvard.edu computer, use make to build src/fix-this-program/c/fix-this-program.c (or, only with special permission, the C++ version, src/fix-this-program/c++/fix-this-program.cpp); then, working on the problem-set-0 branch and *without* changing the corresponding makefile, modify the appropriate source file so that it builds without any warnings or errors on the cscie95 computer,
   (6) also on the cscie95 instance, write a program named word-count in C (or in C++, but only with special permission) to accomplish the tasks given in part 2 below in this PS0,
   (7) commit your filled-in questionnaire, corrected fix-this-program.c (or fix-this-program.cpp), and the word-count program, push the branch, create a pull request, add a comment with the text:

          @frankelharvard @dwillens @massfords

       and accept the pull request.

Tasks 1 through 5 above (considered together) are worth 5 points. Task 6 (writing and testing word-count.c) is worth 15 points.

The course questionnaire is available in the sample code repository at doc/Questionnaire.txt.
For C, fix-this-program.c is available in the sample code repository at src/fix-this-program/c/fix-this-program.c.
For C++, fix-this-program.cpp is available in the sample code repository at src/fix-this-program/c++/fix-this-program.cpp.

2. (15 Points) Write a program named word-count.c in C (or word-count.cpp in C++, but only with special permission) that uses file I/O to read a text file whose name is specified on the command line to the program. All code for word-count should be placed in a src/word-count directory.
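As an illustration of the command-line and file I/O handling described above, here is a minimal sketch in C. It is not an official or required approach. The function name read_named_file and its nchars parameter are hypothetical names chosen for this example. It only counts the characters it reads, which stands in for the real per-character processing.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper (not part of the assignment): opens the file named by
 * argv[1], reads it one character at a time, and stores the number of
 * characters read in *nchars. Returns 0 on success, 1 on any failure. */
int read_named_file(int argc, char *argv[], long *nchars)
{
    FILE *in;
    int c;
    long n = 0;

    assert(argv != NULL && nchars != NULL);

    /* Exactly one argument is expected: the input file name. */
    if (argc != 2) {
        fprintf(stderr, "usage: %s input-file\n",
                argc > 0 ? argv[0] : "word-count");
        return 1;
    }

    in = fopen(argv[1], "r");
    if (in == NULL) {
        fprintf(stderr, "%s: cannot open %s\n", argv[0], argv[1]);
        return 1;
    }

    /* fgetc returns an int so that EOF can be told apart from any char. */
    while ((c = fgetc(in)) != EOF) {
        n++; /* a real word-count would classify c here */
    }

    /* EOF is returned for both end-of-file and read errors; tell them apart. */
    if (ferror(in)) {
        fprintf(stderr, "%s: error reading %s\n", argv[0], argv[1]);
        (void)fclose(in);
        return 1;
    }

    if (fclose(in) != 0) {
        fprintf(stderr, "%s: error closing %s\n", argv[0], argv[1]);
        return 1;
    }

    *nchars = n;
    return 0;
}
```

A main function could simply pass its own argc and argv to this helper and turn a nonzero result into a failing exit status.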
The program will parse the input file into words and will use a data structure of your choice to keep track of the number of occurrences of each unique word that is found in the input file.

Words will be delimited by white space, which for our purposes is defined to be any mix of spaces (' '), horizontal tabs ('\t'), or newlines ('\n') -- including multiple occurrences of white space. You are welcome to include a carriage-return ('\r') as another white space character -- this facilitates code development on Windows computers. Lines are delimited by a newline character (or additionally and optionally for development on Windows computers, by a carriage-return immediately followed by a linefeed). No other characters should be treated specially (i.e., punctuation, hyphens, etc. should just be considered as non-white-space characters that should be treated as part of words). Do not convert the case of letters (e.g., your program should consider the words "case" and "Case" to be different words).

Upon reaching end-of-file, the program will output to stdout:

   (1) the number of lines in the input file (include blank lines in the count of the number of lines),
   (2) the total number of words in the input file (*not* the number of unique words),
   (3) a list of each unique word in the input file along with the number of times that word appears in the file.

The list of unique words need not be sorted in any particular order.

Your solution must compile and execute correctly on our cscie95 instance. Compilation must not produce any warnings (or errors!). Remember to include a makefile to complete the build. You *must* check return values from *all* functions with meaningful return values.

Depending on your programming experience, you should choose an appropriate data structure in which the words and word counts are stored. The following are all not worthy of full credit:

   (1) using an array of fixed size (i.e., not allowing an arbitrary number of unique words),
   (2) reading the entire text file into memory,
   (3) making more than one pass through the input file, and
   (4) accepting words of a compile-time maximum length.

Using a linked list -- perhaps sorted in alphabetical order -- would be worthy of full credit even though the performance of searching and inserting into a linked list may be inefficient. Do not spend your time implementing a more efficient solution unless you are sure that you can complete this problem set on time.

Last revised 12-Sep-23
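For illustration only, here is a minimal sketch in C of the linked-list approach mentioned above, combined with a single pass over the input and a word buffer that grows on demand. All names in it (word_node, word_list_add, word_list_count, word_list_free, count_stream) are hypothetical and chosen for this example. It treats '\r' as white space, which the text permits but does not require. It also assumes that the line count is the number of newline characters read, so a final line with no trailing newline is not counted. Treat it as one possible shape under those assumptions, not as the expected solution.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* One node per unique word (hypothetical layout). */
struct word_node {
    char *word;
    unsigned long count;
    struct word_node *next;
};

/* White space as defined in the problem text, plus the optional '\r'. */
static int is_space(int c)
{
    return c == ' ' || c == '\t' || c == '\n' || c == '\r';
}

/* Records one occurrence of word. Returns 0 on success, -1 if out of memory. */
int word_list_add(struct word_node **head, const char *word)
{
    struct word_node *n;

    assert(head != NULL && word != NULL);
    for (n = *head; n != NULL; n = n->next) {
        if (strcmp(n->word, word) == 0) {
            n->count++;
            return 0;
        }
    }
    n = malloc(sizeof *n);
    if (n == NULL)
        return -1;
    n->word = malloc(strlen(word) + 1);
    if (n->word == NULL) {
        free(n);
        return -1;
    }
    strcpy(n->word, word);
    n->count = 1;
    n->next = *head; /* unsorted: new words go at the front */
    *head = n;
    return 0;
}

/* Returns how many times word was recorded (0 if never seen). */
unsigned long word_list_count(const struct word_node *head, const char *word)
{
    for (; head != NULL; head = head->next)
        if (strcmp(head->word, word) == 0)
            return head->count;
    return 0;
}

void word_list_free(struct word_node *head)
{
    while (head != NULL) {
        struct word_node *next = head->next;
        free(head->word);
        free(head);
        head = next;
    }
}

/* Single pass over in: counts newlines and words, recording each word.
 * Returns 0 on success, -1 on a read or allocation failure. */
int count_stream(FILE *in, struct word_node **head,
                 unsigned long *lines, unsigned long *words)
{
    size_t cap = 16, len = 0;
    char *buf;
    int c, rc = 0;

    assert(in != NULL && head != NULL && lines != NULL && words != NULL);
    buf = malloc(cap);
    if (buf == NULL)
        return -1;
    *lines = 0;
    *words = 0;
    for (;;) {
        c = fgetc(in);
        if (c != EOF && !is_space(c)) {
            if (len + 1 >= cap) { /* keep room for the terminating '\0' */
                char *tmp = realloc(buf, cap * 2);
                if (tmp == NULL) { rc = -1; break; }
                buf = tmp;
                cap *= 2;
            }
            buf[len++] = (char)c;
            continue;
        }
        if (len > 0) { /* white space or end-of-file ends a pending word */
            buf[len] = '\0';
            if (word_list_add(head, buf) != 0) { rc = -1; break; }
            (*words)++;
            len = 0;
        }
        if (c == '\n')
            (*lines)++;
        if (c == EOF)
            break;
    }
    if (ferror(in))
        rc = -1;
    free(buf);
    return rc;
}
```

A complete program would still need to open the file named on the command line, walk the list to print each word with its count, and check the return value of each output call. Doubling the buffer keeps the number of reallocations small without imposing a compile-time limit on word length.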