Homework 3
[Solutions]
Due Date
This assignment is due before class on Wednesday, June 25th. Please refer to the page on electronic submission for more information. If you choose to use a single late day, the deadline is extended to 11:59pm on June 26th (almost 30 hours later). After that, each additional late day extends the deadline by another 24 hours. Remember that you can use at most 2 late days on this assignment.
Programming
- (60 Points) Search Engine: Your task is to
write a simple search engine that works over a collection of
news articles. The overall functionality is as follows:
- The user is asked to type in a filename containing a list of news articles. The specified file is opened by the program (and the program quits if an invalid filename is entered).
- The user is asked to type in a single keyword or phrase.
- Armed with these two pieces of information, your program
reads through the file trying to find the top three news
articles that contain the
highest number of instances of your keyword. So if you enter "economy" as your keyword, the program should retrieve the three articles in which the word "economy" occurs most often.
Parse the file contents to figure out where articles start and end. As you are reading the input, look for "</REUTERS>". That string will always signify the end of an article.
Once you have read through all the articles and tallied the number of keywords in each, it is time to print out the article text of the top three articles. If this part becomes too difficult, try printing out only the single top article rather than three.
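As a rough sketch of the tallying step, the code below assumes the whole file is already sitting in one null-terminated character array. It uses strstr from the standard string library to find both the keyword and the "</REUTERS>" marker. The names count_keyword, rank_articles, TOP_N and END_TAG are made up for illustration and are not part of the assignment. Matching here is case-sensitive and counts substrings, so "economy" is also counted inside a longer word; whether that is what you want is a design decision left to you.

```c
#include <stdio.h>
#include <string.h>

#define END_TAG "</REUTERS>"
#define TOP_N 3

/* Count occurrences of keyword between start (inclusive) and end
   (exclusive). Both pointers must point into the same
   null-terminated buffer. */
int count_keyword(const char *start, const char *end, const char *keyword)
{
    int count = 0;
    size_t klen = strlen(keyword);
    const char *hit;

    if (klen == 0)
        return 0;
    hit = strstr(start, keyword);
    while (hit != NULL && hit + klen <= end) {
        count++;
        hit = strstr(hit + klen, keyword);
    }
    return count;
}

/* Scan every article in buf and record the TOP_N articles with the
   most matches. starts[i] points at the beginning of an article and
   counts[i] is its tally, in descending order of count. Unused slots
   keep starts[i] == NULL. Returns the number of articles seen. */
int rank_articles(const char *buf, const char *keyword,
                  const char *starts[TOP_N], int counts[TOP_N])
{
    const char *start = buf;
    const char *end;
    int seen = 0;
    int n, i, j;

    for (i = 0; i < TOP_N; i++) {
        starts[i] = NULL;
        counts[i] = -1;
    }
    while ((end = strstr(start, END_TAG)) != NULL) {
        end += strlen(END_TAG);     /* article includes its end tag */
        n = count_keyword(start, end, keyword);
        /* find the first slot this article beats (ties keep the
           earlier article ahead) */
        for (i = 0; i < TOP_N && counts[i] >= n; i++)
            ;
        if (i < TOP_N) {
            for (j = TOP_N - 1; j > i; j--) {   /* shift lower ranks down */
                starts[j] = starts[j - 1];
                counts[j] = counts[j - 1];
            }
            starts[i] = start;
            counts[i] = n;
        }
        start = end;
        seen++;
    }
    return seen;
}
```

Because each starts[i] is just a pointer into the big array, printing an article amounts to writing characters from that pointer up to the next "</REUTERS>". Setting TOP_N to 1 gives the simpler single-article version.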
This program is longer and more difficult than the others since it is a combination of coding tricks as well as logic problems. There is not a whole lot of code to write, but you want to use the input functions we've seen along with arrays to get all of this done. Also, keeping pointers to, say, the beginning of articles in an internal array would be useful. So this serves as an example where pointers actually make the problem easier. You will want to start this program by opening the file and reading from it (try using fgetc(FILE *fp) rather than scanf to read from the file). Read the file one character at a time and keep adding the characters to an array. Then look into the functions in
for searching, comparison, etc. I will go over these, but try to get comfortable reading technical documentation. I can't post the file on the web since it's not really for anonymous public use. Instead, if you enter cunix and type
cp ~gmw51/cs1003/reut2-000.sgm .
then it will copy the news file (reut2-000.sgm) to your cunix directory. From there you can copy/download the file to anywhere you like. Beware, it's 1.3MB.
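Since the file is over a megabyte, a fixed-size array is easy to overflow. One possible approach, sketched below, reads with fgetc one character at a time and doubles a heap buffer whenever it fills up. The function name read_all and the starting capacity of 4096 are arbitrary choices for this example, and malloc/realloc are one option among several (a large static array would also work for this particular file).

```c
#include <stdio.h>
#include <stdlib.h>

/* Read an entire open file into a freshly allocated, null-terminated
   buffer, one character at a time. If len_out is not NULL it receives
   the number of characters read. Returns NULL if memory runs out.
   The caller must free() the result. */
char *read_all(FILE *fp, size_t *len_out)
{
    size_t cap = 4096;
    size_t len = 0;
    char *buf = malloc(cap);
    char *bigger;
    int c;      /* int, not char, so EOF can be told apart from data */

    if (buf == NULL)
        return NULL;
    while ((c = fgetc(fp)) != EOF) {
        if (len + 1 >= cap) {       /* keep one byte for the '\0' */
            cap *= 2;
            bigger = realloc(buf, cap);
            if (bigger == NULL) {
                free(buf);
                return NULL;
            }
            buf = bigger;
        }
        buf[len++] = (char)c;
    }
    buf[len] = '\0';
    if (len_out != NULL)
        *len_out = len;
    return buf;
}
```

A caller would open the user's filename with fopen, quit if that returns NULL, pass the FILE pointer to read_all, and then fclose the file. The null terminator at the end is what lets the string library functions work on the result.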
Don't get paranoid about finishing this project. Get as much completed as possible. Work in such a way that you can build the code incrementally and always have code that compiles and runs (but might not do everything required). Compile/test often. Write pieces of code that exercise and test your code even if you will later delete the testing code. Even if your code only reads the articles and correctly locates the end of articles, you will get substantial credit.
- (40 Points) Addressbook: For this, implement a simple addressbook program. Users should be able to both add and look up contacts in the addressbook. On exiting the program, you should SAVE the addressbook to a file. When the program starts up, load the addressbook information from that file.
Users should be able to:
- Search (by lastname only)
- List all entries
- Add an entry
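The three operations above could be laid out roughly as follows, using a fixed-size array of structs. Everything here is an assumption made for illustration: the assignment does not say which fields a contact has, so first name, last name and phone number are placeholders, and the names add_contact, find_by_last, list_contacts and the size limits are invented for this sketch.

```c
#include <stdio.h>
#include <string.h>

#define MAX_CONTACTS 100
#define NAME_LEN 32
#define PHONE_LEN 20

struct contact {
    char first[NAME_LEN];
    char last[NAME_LEN];
    char phone[PHONE_LEN];
};

struct addressbook {
    struct contact entries[MAX_CONTACTS];
    int count;                  /* number of entries in use */
};

/* Copy src into a fixed-size field, always null-terminating
   (over-long input is silently truncated). */
static void copy_field(char *dst, const char *src, size_t size)
{
    strncpy(dst, src, size - 1);
    dst[size - 1] = '\0';
}

/* Append a contact; returns 1 on success, 0 if the book is full. */
int add_contact(struct addressbook *book, const char *first,
                const char *last, const char *phone)
{
    struct contact *c;

    if (book->count >= MAX_CONTACTS)
        return 0;
    c = &book->entries[book->count++];
    copy_field(c->first, first, NAME_LEN);
    copy_field(c->last, last, NAME_LEN);
    copy_field(c->phone, phone, PHONE_LEN);
    return 1;
}

/* Return the index of the first entry at or after position from whose
   last name matches exactly, or -1 if there is none. */
int find_by_last(const struct addressbook *book, const char *last, int from)
{
    int i;

    for (i = from; i < book->count; i++)
        if (strcmp(book->entries[i].last, last) == 0)
            return i;
    return -1;
}

/* Print every entry, one per line. */
void list_contacts(const struct addressbook *book)
{
    int i;

    for (i = 0; i < book->count; i++)
        printf("%s, %s: %s\n", book->entries[i].last,
               book->entries[i].first, book->entries[i].phone);
}
```

Start with count set to 0 for an empty book. The from argument lets a caller keep searching after a hit (call again with the returned index plus one) in case several people share a last name.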
Concentrate on getting this to work without a file first. So design a program that always starts with an empty address book. Then build the add/search/list features. Next, work on saving that array to a file (in a format of your design). Lastly, add reading the records back in on startup.
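For the save and load steps, one simple file format (purely an assumption for this sketch, since the format is yours to design) is one contact per line with tab-separated fields. The struct fields and the names save_contacts and load_contacts are hypothetical. The sketch also assumes that no field is empty and that no field contains a tab or newline.

```c
#include <stdio.h>

#define MAX_CONTACTS 100
#define NAME_LEN 32
#define PHONE_LEN 20

struct contact {
    char first[NAME_LEN];
    char last[NAME_LEN];
    char phone[PHONE_LEN];
};

/* Write n contacts to the file at path, one per line, fields separated
   by tabs. Returns 1 on success, 0 on failure. */
int save_contacts(const char *path, const struct contact *list, int n)
{
    FILE *fp = fopen(path, "w");
    int i;

    if (fp == NULL)
        return 0;
    for (i = 0; i < n; i++)
        fprintf(fp, "%s\t%s\t%s\n",
                list[i].first, list[i].last, list[i].phone);
    return fclose(fp) == 0;
}

/* Read up to max contacts from the file at path into list. Returns the
   number read; a missing file counts as an empty book (returns 0).
   The widths 31 and 19 must stay one less than NAME_LEN and PHONE_LEN. */
int load_contacts(const char *path, struct contact *list, int max)
{
    FILE *fp = fopen(path, "r");
    int n = 0;

    if (fp == NULL)
        return 0;
    while (n < max &&
           fscanf(fp, " %31[^\t]\t%31[^\t]\t%19[^\n]",
                  list[n].first, list[n].last, list[n].phone) == 3)
        n++;
    fclose(fp);
    return n;
}
```

Treating a missing file as an empty book means the very first run of the program needs no special case. A text format like this is also easy to inspect and fix by hand while debugging.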
Again, this is an open-ended problem. Get as much done as you can, but I am not expecting a polished result. The point of these problems is to tackle the difficult algorithmic parts, solve them, and incrementally work to a robust and usable solution.
For both of these, I will be providing helpful code along the way (we'll see some in class Wednesday). For each of these assignments, you can write all your code in one file if you choose (submitting a total of two .c files upon completion).
Grading
Grading will be based on correctness as mentioned in each problem. Read each problem very carefully to ensure that you've completed all that is required. A programmer needs to be able to read a paragraph description and implement the requirements rigorously. (80%)
COMMENTS and description will account for 10% of each assignment. You must use comments to describe succinctly what is happening in your code. Also, your name, ID, etc. should be placed in a comment at the very top of all your files.
Your code must be ANSI compliant. If you follow what the books say, this should not be an issue, but bad habits are known to pop up at this stage. (5%)
The last 5% is based on elegance/efficiency/style. For this project, efficiency won't be weighted heavily, but formatting your code as we've been doing will be. Don't write code that completely defies any reasonable formatting convention (even if it is correct). For example, don't do things like:
int main(void) { int x;int y; int z; int c,d;c=5; if (c == 1) { d = 7; c = 4; } return 0; }
You are submitting two files: search.c and addressbook.c, each with a description of how your code works either in comments or in a separate README file. Don't submit binaries (like a.out). Place your .c files (and perhaps README file as plain text) in a directory on the cunix machine and see the submission instructions. For this assignment, a README file is recommended to describe exactly what you completed for both assignments.