Problem Set 0

		 Harvard Extension School CSCI E-95:
		  Compiler Design and Implementation

			      Fall 2023

	Due: September 10, 2023 at Midnight ET (Eastern Time)

Total of 20 Points

This problem set requires several different activities to be
completed, as detailed below:
  o  A GitHub account needs to be established.
  o  A HarvardKey account needs to be established.
  o  A GitHub CSCI E-95 repository needs to be established.
  o  Connection to the Harvard network using vpn.harvard.edu needs to be
     established.
  o  Successful login to cscie95.dce.harvard.edu using SSH needs to be
     verified.
  o  git needs to be installed on computers on which you'll be
     developing code.
  o  The course questionnaire needs to be completed.
  o  fix-this-program.c needs to be corrected so that it builds and
     runs correctly on cscie95.dce.harvard.edu.
  o  word-count.c needs to be written and tested so that it builds and
     runs correctly on cscie95.dce.harvard.edu.
  o  The questionnaire, corrected fix-this-program.c, and word-count.c
     need to be submitted to your GitHub repository.

Unlike all other Problem Sets, additional partial credit will *not* be
awarded for corrections made after this problem set has been
submitted.  Problem Set 0 is worth a total of 20 points.

1. (5 Points) Follow the directions given in the section web site at
  https://cscie95.dce.harvard.edu/fall2023/section/index.html to:
  (1) create a repository on https://github.com/ which is a clone of
  the CSCI E-95 sample code repository, (2) install git on your
  computers on which you'll be developing code, (3) create a branch
  named problem-set-0, (4) complete the course questionnaire, (5) on
  the cscie95.dce.harvard.edu computer, use make to build:
  
  src/fix-this-program/c/fix-this-program.c

  or, only with special permission, the C++ version:

  src/fix-this-program/c++/fix-this-program.cpp

  *without* changing the corresponding makefile; working on your
  problem-set-0 branch, modify the appropriate source file so that it
  builds without any warnings or errors on the cscie95 computer, (6)
  also on the cscie95 computer, write a program named word-count in C
  (or in C++, but only with special permission) to accomplish the
  tasks given in part 2 of this problem set, (7) commit your filled-in
  questionnaire, corrected fix-this-program.c (or
  fix-this-program.cpp), and the word-count program, push the branch,
  create a pull request, add a comment with the text:

    @frankelharvard @dwillens @massfords

  and accept the pull request.

  Tasks 1 through 5 above (considered together) are worth 5 points.
  Task 6 (writing and testing word-count.c) is worth 15 points.

  The course questionnaire is available in the sample code repository
  at doc/Questionnaire.txt.

  For C, fix-this-program.c is available in the sample code repository
  at src/fix-this-program/c/fix-this-program.c.  For C++,
  fix-this-program.cpp is available in the sample code repository at
  src/fix-this-program/c++/fix-this-program.cpp.

2. (15 Points) Write a program named word-count.c in C (or
  word-count.cpp in C++, but only with special permission), that uses
  file I/O to read a text file whose name is specified on the command
  line to the program.  All code for word-count should be placed in a
  src/word-count directory.  The program will parse the input file
  into words and will use a data structure of your choice to keep
  track of the number of occurrences of each unique word that is found
  in the input file.  Words will be delimited by white space, which for
  our purposes is defined to be any mix of spaces (' '), horizontal
  tabs ('\t'), or newlines ('\n') -- including multiple occurrences of
  white space.  You are welcome to include a carriage-return ('\r') as
  another white space character -- this facilitates code development
  on Windows computers.  Lines are delimited by a newline character
  (or additionally and optionally for development on Windows
  computers, by a carriage-return immediately followed by a linefeed).
  No other characters should be treated specially (i.e., punctuation,
  hyphens, etc. should just be considered as non-white-space
  characters that should be treated as part of words).  Do not convert
  the case of letters (e.g., your program should consider the words
  "case" and "Case" to be different words).  Upon reaching
  end-of-file, the program will output to stdout: (1) the number of
  lines in the input file (include blank lines in the count of the
  number of lines), (2) the total number of words in the input file
  (*not* the number of unique words), (3) a list of each unique word
  in the input file along with the number of times that word appears
  in the file.  The list of unique words need not be sorted in any
  particular order.
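
  As one illustration of the delimiting rules above, the following is
  a minimal sketch of a single-pass counter for lines and total words.
  The names struct counts, is_white_space, and count_lines_and_words
  are hypothetical and are not required by this problem set.  The
  sketch assumes that the number of lines equals the number of newline
  characters, so a final line with no terminating newline is not
  counted; check that assumption against your own reading of the
  requirements.  It does not track unique words.

```c
#include <stdio.h>

/* Hypothetical result type; the names here are illustrative only. */
struct counts {
    unsigned long lines;   /* number of '\n' characters seen */
    unsigned long words;   /* total (not unique) words seen */
};

/* Returns nonzero if c is one of the white-space characters named in
 * the problem statement.  '\r' is the optional addition that eases
 * development on Windows computers. */
int is_white_space(int c)
{
    return c == ' ' || c == '\t' || c == '\n' || c == '\r';
}

/* Makes one pass over `in`, counting newlines and white-space-delimited
 * words.  Returns 0 on success or -1 if a read error occurred. */
int count_lines_and_words(FILE *in, struct counts *out)
{
    int c;
    int in_word = 0;       /* nonzero while inside a word */

    out->lines = 0;
    out->words = 0;
    while ((c = getc(in)) != EOF) {
        if (c == '\n') {
            out->lines++;
        }
        if (is_white_space(c)) {
            in_word = 0;
        } else if (!in_word) {
            in_word = 1;   /* first character of a new word */
            out->words++;
        }
    }
    return ferror(in) ? -1 : 0;
}
```

  Because punctuation and hyphens are not white space, they simply
  remain part of whichever word they appear in.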

  Your solution must compile and execute correctly on our cscie95
  instance.  Compilation must not produce any warnings (or errors!).
  Remember to include a makefile to complete the build.  You *must*
  check return values from *all* functions with meaningful return
  values.
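
  The requirement to check return values can be sketched as follows.
  The function count_characters is a hypothetical helper, not part of
  the assignment; it only shows one way to check fopen, the read loop,
  and fclose.  The use of perror assumes that the failing call sets
  errno, which POSIX systems do but ISO C does not guarantee.

```c
#include <stdio.h>

/* Hypothetical helper: counts the characters in the file named `path`.
 * Every call with a meaningful return value is checked.
 * Returns 0 on success or -1 on any failure. */
int count_characters(const char *path, unsigned long *count)
{
    FILE *in;
    int c;
    int status = 0;

    in = fopen(path, "r");
    if (in == NULL) {
        perror(path);              /* perror itself returns nothing */
        return -1;
    }

    *count = 0;
    while ((c = getc(in)) != EOF) {
        (*count)++;
    }
    /* getc returns EOF for both end-of-file and errors, so ask which. */
    if (ferror(in)) {
        perror(path);
        status = -1;
    }
    /* fclose can fail too; its return value is checked on every path. */
    if (fclose(in) == EOF) {
        perror(path);
        status = -1;
    }
    return status;
}
```

  A caller would typically turn a -1 result into a nonzero exit status
  from main.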

  Depending on your programming experience, you should choose an
  appropriate data structure in which the words and word counts are
  stored.  None of the following is worthy of full credit: (1) using
  an array of fixed size (i.e., not allowing an arbitrary number of
  unique words), (2) reading the entire text file into memory, (3)
  making more than one pass through the input file, or (4) accepting
  only words up to a compile-time maximum length.  Using a linked
  list -- perhaps sorted in
  alphabetical order -- would be worthy of full credit even though the
  performance of searching and inserting into a linked list may be
  inefficient.  Do not spend your time implementing a more efficient
  solution unless you are sure that you can complete this problem set
  on time.
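
  The linked-list approach described above, combined with words of
  arbitrary length, might be sketched as shown here.  All names
  (struct word_node, read_word, add_word, free_words) are hypothetical.
  The list is unsorted, and the initial buffer capacity of 16 is only
  a starting size that doubles on demand, not a maximum.  This sketch
  discards newlines while skipping white space, so a complete program
  would need to count lines as part of the same pass.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical node type: one list entry per unique word. */
struct word_node {
    char *word;              /* heap-allocated, NUL-terminated */
    unsigned long count;     /* occurrences seen so far */
    struct word_node *next;
};

static int is_white_space(int c)
{
    return c == ' ' || c == '\t' || c == '\n' || c == '\r';
}

/* Reads the next word from `in` into a heap buffer that doubles as
 * needed.  Returns the word (the caller owns it), or NULL at
 * end-of-file, on a read error, or if memory runs out; the caller can
 * use ferror(in) to distinguish a read error. */
char *read_word(FILE *in)
{
    size_t cap = 16, len = 0;
    char *buf, *tmp;
    int c;

    do {                                 /* skip leading white space */
        c = getc(in);
    } while (is_white_space(c));
    if (c == EOF) {
        return NULL;
    }
    buf = malloc(cap);
    if (buf == NULL) {
        return NULL;
    }
    while (c != EOF && !is_white_space(c)) {
        if (len + 1 == cap) {            /* keep room for the terminator */
            tmp = realloc(buf, cap * 2);
            if (tmp == NULL) {
                free(buf);
                return NULL;
            }
            buf = tmp;
            cap *= 2;
        }
        buf[len++] = (char)c;
        c = getc(in);
    }
    buf[len] = '\0';
    return buf;
}

/* Increments the count for `word`, appending a node if the word is new.
 * Takes ownership of `word`.  Returns 0 on success, or -1 if memory
 * runs out (in which case `word` has been freed). */
int add_word(struct word_node **head, char *word)
{
    struct word_node **link = head;
    struct word_node *node;

    while (*link != NULL) {
        if (strcmp((*link)->word, word) == 0) {   /* case-sensitive */
            (*link)->count++;
            free(word);
            return 0;
        }
        link = &(*link)->next;
    }
    node = malloc(sizeof *node);
    if (node == NULL) {
        free(word);
        return -1;
    }
    node->word = word;
    node->count = 1;
    node->next = NULL;
    *link = node;
    return 0;
}

/* Releases every node and the word it owns. */
void free_words(struct word_node *head)
{
    while (head != NULL) {
        struct word_node *next = head->next;
        free(head->word);
        free(head);
        head = next;
    }
}
```

  Each lookup walks the whole list, so the cost grows with the number
  of unique words; that is the inefficiency the paragraph above accepts
  in exchange for simplicity.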


			Last revised 12-Sep-23