CS1003 Homework #6
Due by Wednesday, May 5, at
5:00pm
There are two parts to this homework assignment: a theory
portion, worth 10 points, and a programming portion,
worth 15 points. Please be sure to review the submission instructions in
advance, and make sure to include a README.
Written questions
- (2 points) Explain, in a few sentences and in the context of
the operating system's memory manager and the memory hierarchy
in a computer, why having low amounts of RAM in your computer
leads to a degradation in performance.
- (6 points) The "distance" between an Internet client and
server is often an important metric in determining how responsive
a particular host will be. Often, that distance is measured in
"hops" -- and the traceroute program (which you can run
from the Windows NT/2000/XP command line as tracert, or
/usr/sbin/traceroute on CUNIX) helps you to measure
this. traceroute takes one important command-line
parameter: the name (or IP) of the remote host to trace a route
to. (You can also type in man traceroute on CUNIX
for a detailed description of the tool.)
- (2 points) Try to find a computer "as near as possible"
to yours without actually being your computer. On CUNIX,
there are a number of machines that are exactly 1 hop away.
Show, using your traceroute results, that it is as
near as possible, and suggest why that machine is "so near".
- (2 points) Conversely, try to find a machine "as far
away as possible" from yours, and include your
traceroute results to demonstrate this. Note that,
at times, firewalls will make it difficult to obtain a
complete path: if the nth hop only shows stars
without actually finishing the traceroute, you'll have to
try another host that's not firewalled. Can you find
something that is at least 15 hops away? What "part" of the
Internet, topologically, might be "far away" from an
institution like Columbia?
- (2 points) Based on the characterizations of the two
subproblems above, what can you conclude about Internet
topology?
- (2 points) Brookshear Chapter Review Problem 10.16 (page
448).
Programming assignment: text formatter
In this assignment, you're going to build a program that reads a
text file via file I/O functions, stores each word in a unique
array cell, and then prints out the text formatted nicely for a
screen. The key is that the program will be able to handle
files of a non-predetermined size, so we will have to use
malloc to dynamically size the array.
I define "nice formatting" as a form of plain text
word-wrapping. In other words, if you have something like:
the
quick
brown fox jumped over
the lazy dog!
... nicely formatted would suggest:
the quick brown fox
jumped over the lazy dog!
Obviously, we can't always fit all the text on one line. So,
we'll use the old typewriter convention for word-wrapping: 80
columns, which is standard for a letter-size piece of paper if you're
using a fixed-width font.
- (4 points) Write a function called countWords that
takes a FILE pointer as an argument, and returns an int -- the
number of words found in this file. A UNIX tool called
wc -- short for word count -- provides similar
functionality, although it goes a bit further and tells you
the number of lines, words and characters encountered;
we're only concerned with the number of words. (Nevertheless,
you may choose to use wc as a sanity check to make
sure your countWords function is working.)
There are several strategies to do this. The easiest way is
to process the file, one character (char) at a
time, and see if it's whitespace or not. If it's whitespace,
it's generally separating two words. If it's not, it's part
of a word. In theory, all we'd have to do is to count every
character that happens to be whitespace, and we'd be done.
Unfortunately, you may have multiple spaces between words, and
counting the exact number of whitespace characters will give
you a skewed answer. So here's the strategy:
- Read a character using fgetc. If it's equal to
EOF (end-of-file), you're finished -- return the number of words
you've counted up so far in the next step.
Otherwise...
- If it's whitespace, increment a local counter variable
that serves as the number of words by one. Then,
continue reading whitespace until you either reach
EOF or a non-whitespace character, at which point you're
done with this step. (If you encounter EOF, return just
like you did in the previous step.)
- Go back to the first step.
The strategy here is that if there are multiple spaces between
words, they get "slurped up" when they're encountered, without
repeatedly increasing the word count. "Slurping" works
because that's how FILE streams are structured -- they
repeatedly feed you characters as you grab them.
Once you've finished counting the number of
words, simply return the result.
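The steps above can be sketched roughly as follows. This is a minimal, literal reading of the strategy rather than a definitive solution: it counts runs of whitespace, so it should agree with wc only when the file has no leading whitespace and ends in whitespace (for example, a trailing newline). The variable names are my own.

```c
#include <ctype.h>
#include <stdio.h>

/* Counts words by counting runs of whitespace, as in the steps above. */
int countWords(FILE *fp)
{
    int count = 0;
    int c; /* int, not char, so EOF can be told apart from real data */

    while ((c = fgetc(fp)) != EOF) {
        if (isspace(c)) {
            count++;
            /* "Slurp up" the rest of this whitespace run. */
            while ((c = fgetc(fp)) != EOF && isspace(c))
                ;
            if (c == EOF)
                return count;
            /* Otherwise c starts the next word; keep reading. */
        }
    }
    return count;
}
```

Storing the result of fgetc in an int matters here: EOF is a negative int, and squeezing it into a char can make it indistinguishable from a legitimate character.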
- (3 points) Write a function called readWord that
takes a FILE pointer as an argument, and returns a pointer to
char (i.e., a string) containing a single word. The
strategy for this is similar to countWords, except that
here you'll need a temporary string (100 characters should be
sufficient) that you'll copy non-whitespace characters into.
Since this function only needs to read a single word, you can
follow a simplified process:
- Read a character. If it's whitespace, keep on reading
until you reach the first non-whitespace character.
- As long as you haven't reached EOF, start copying the
word, one character at a time, into the temporary
array.
- As soon as you reach whitespace or EOF, stop. Put a
\0 as the last character in your temporary array
(to terminate it properly), and then return a malloc'ed
duplicate of this word. Making a duplicate is necessary
because the temporary array is a local variable which will
disappear as soon as you leave this function. You can
either malloc a string with the length of the word in the
temporary array (i.e., strlen) and then use
strcpy or you can use the function specially
designed for this: strdup. (The latter is much
easier to use!)
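One possible shape for readWord, following the three steps above. Treat it as a sketch under a few assumptions of my own: the MAX_WORD constant is a hypothetical name for the 100-character buffer, characters beyond the buffer are silently dropped, and an empty string is returned when no word is left. It uses malloc plus strcpy; where strdup is available, strdup(temp) would do the same job.

```c
#include <ctype.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define MAX_WORD 100 /* hypothetical name for the temporary buffer size */

char *readWord(FILE *fp)
{
    char temp[MAX_WORD];
    int len = 0;
    int c;

    /* Step 1: skip any leading whitespace. */
    while ((c = fgetc(fp)) != EOF && isspace(c))
        ;

    /* Step 2: copy characters until whitespace or EOF. */
    while (c != EOF && !isspace(c)) {
        if (len < MAX_WORD - 1) /* assumption: drop overflow characters */
            temp[len++] = (char)c;
        c = fgetc(fp);
    }

    /* Step 3: terminate the string and return a heap-allocated duplicate. */
    temp[len] = '\0';
    char *copy = malloc(strlen(temp) + 1); /* +1 for the '\0' */
    if (copy != NULL)
        strcpy(copy, temp);
    return copy;
}
```

The caller owns the returned string and should eventually free it.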
- (3 points) Write a function called loadWords that
takes three parameters: a FILE pointer, a pointer to a string
array (i.e., a pointer to a pointer to char, or a "double
pointer"), and an int suggesting the number of words. This
function is very simple: it just calls readWord
repeatedly, storing the results in the string array, one word at
a time, and repeats this for the number of words provided as a
parameter. No return value is needed. (Note that when I say
"storing the results", I don't mean a strcpy
operation; rather, we've already allocated new memory for
this string, and all that you need to do is to make a literal
pointer assignment to the appropriate array cell in the string
array.)
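A minimal sketch of loadWords might look like the following. So that the block compiles on its own, it includes a stand-in readWord built on fscanf; that stand-in is my assumption, not the version the assignment asks you to write. The point to notice is the plain pointer assignment in the loop.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Stand-in for readWord (assumption), so this sketch is self-contained. */
char *readWord(FILE *fp)
{
    char temp[100] = "";
    if (fscanf(fp, "%99s", temp) != 1) /* %s skips leading whitespace */
        temp[0] = '\0';
    char *copy = malloc(strlen(temp) + 1);
    if (copy != NULL)
        strcpy(copy, temp);
    return copy;
}

/* Fills words[0..n-1] with freshly allocated strings read from fp. */
void loadWords(FILE *fp, char **words, int n)
{
    for (int i = 0; i < n; i++)
        words[i] = readWord(fp); /* pointer assignment, no strcpy */
}
```

Because each cell only stores a pointer, the array itself needs room for n pointers, not for the characters of the words.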
- (3 points) Write a function called printWords that
takes two parameters: a pointer to a string array and the number
of words in that string array, and prints them out properly
formatted to standard out. It'll walk through the words array,
one word at a time, and do some simple string length
calculations to see if it'll fit on the current line being
printed out (in other words, you'll need to keep a line length
counter tracking where "your character is" on the line, and
check to see if the next word would still fit within the
80-column limit). If there's enough space, it should just print
it out without a newline at the end, and update the line
length counter by using strlen on the string being
printed out. Otherwise, it should print a newline first, then the
word, and reset the counter to that word's length (i.e., no
hyphenating needed!). Make sure to include a space between each
word.
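The line-length bookkeeping described above could be sketched like this. Several details are assumptions on my part: the helper fprintWords (hypothetical, so the same logic can write to any stream), the LINE_WIDTH name, counting the separating space toward the 80 columns, and ending the output with a final newline.

```c
#include <stdio.h>
#include <string.h>

#define LINE_WIDTH 80 /* hypothetical name for the 80-column limit */

/* Hypothetical helper: the printWords logic, writing to any stream. */
void fprintWords(FILE *out, char **words, int n)
{
    size_t col = 0; /* characters already printed on the current line */

    for (int i = 0; i < n; i++) {
        size_t len = strlen(words[i]);

        if (col == 0) {
            /* First word on a line: no leading space. */
            fputs(words[i], out);
            col = len;
        } else if (col + 1 + len <= LINE_WIDTH) {
            /* Fits: print a separating space, then the word. */
            fputc(' ', out);
            fputs(words[i], out);
            col += 1 + len;
        } else {
            /* Does not fit: start a new line with this word. */
            fputc('\n', out);
            fputs(words[i], out);
            col = len;
        }
    }
    if (n > 0)
        fputc('\n', out); /* assumption: finish the last line */
}

void printWords(char **words, int n)
{
    fprintWords(stdout, words, n);
}
```

A word longer than 80 characters still ends up on a line of its own in this sketch, since no hyphenation is attempted.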
- (2 points) Write the main function. The main
function will take one command-line argument -- the name of the
file -- and will attempt to open it. (If the user neglects to
supply a command-line filename, or if it's unopenable, the
program should print an error and exit.) Once the file is open,
the main function should compute the number of words and print
it out on the screen. It should then malloc an array
of strings (i.e., a double char-pointer) so that it has enough
memory space for n char pointers (i.e., single
char-pointers) where n is the number of words. Finally,
it should call functions to load the words into this array and
print out the nicely-formatted result.
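Two pieces of main tend to cause trouble: checking the command line and sizing the array. The sketch below isolates them in two hypothetical helpers, openInput and allocWordArray (both names are mine, as are the error messages). A real main would call them, then count, rewind, load, and print.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: returns an open file, or NULL after printing an error. */
FILE *openInput(int argc, char *argv[])
{
    if (argc != 2) {
        fprintf(stderr, "usage: %s filename\n",
                argc > 0 ? argv[0] : "formatter");
        return NULL;
    }
    FILE *fp = fopen(argv[1], "r");
    if (fp == NULL)
        perror(argv[1]); /* reports why the file could not be opened */
    return fp;
}

/* Hypothetical helper: room for n char pointers, not for the words themselves. */
char **allocWordArray(int n)
{
    if (n <= 0)
        return NULL;
    char **words = malloc((size_t)n * sizeof *words);
    if (words == NULL)
        fprintf(stderr, "out of memory\n");
    return words;
}
```

Writing sizeof *words rather than sizeof(char) is what makes each cell large enough to hold a pointer.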
Here are some hints you may find helpful.
- I strongly suggest you test each part separately.
For example, once you've written the countWords
function, write a simple main that does an fopen
immediately followed by countWords, and compare the
results to an execution of the UNIX wc command on a
simple text file.
- To determine if a character is whitespace, use the
isspace() function as declared in ctype.h (which you'll
have to include). It takes a single character (passed as an int) and
returns 0 if it's not whitespace, and a nonzero value if it is.
- There is a function called rewind in C that takes
one parameter -- a FILE pointer -- and resets it back to the
beginning of the file. You may find this useful as a way to
return to the beginning of the file after the words have been
counted so that they can then be rescanned for loading
purposes.
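To see isspace and rewind working together, here is a small sketch. The helper countWhitespace is hypothetical and exists only for this illustration: it reads from the current position to EOF, so a second call finds nothing unless the stream is rewound first.

```c
#include <ctype.h>
#include <stdio.h>

/* Hypothetical helper: counts whitespace characters from the current
   file position to EOF. */
int countWhitespace(FILE *fp)
{
    int n = 0;
    int c;

    while ((c = fgetc(fp)) != EOF)
        if (isspace(c)) /* nonzero for space, tab, newline, etc. */
            n++;
    return n;
}
```

Calling countWhitespace(fp) twice in a row returns 0 the second time, because the stream is already at EOF. Calling rewind(fp) in between makes the second pass see the whole file again, which is the same pattern main needs between countWords and loadWords.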
(5 points extra credit) Modify the logic of
readWord and printWords so that the formatter
keeps paragraphs separate (that is, it rejustifies each paragraph,
but prints them out separately). In order for this to work,
you'll have to encode a newline inside individual word strings,
and then printWords must check to see if such a newline occurs as
it's printing data out. (Hint: if, while reading, you encounter
two newlines in a row, you've found a paragraph break; store one
in the word you're reading.)