Integrating Email, Text to Speech Synthesis and Advanced Telephony Services
Contents
-
Abstract
-
Software Overview
-
Related Work and Products
-
Overview of the system architecture
-
Group Task List
-
Software Documentation
-
Source Code
-
References
Abstract
The Email By Phone system allows the end user to dial in via a regular
touch-tone phone, enter their login ID and password, retrieve their
email messages, and have these messages read to them over the phone.
After dialing up the system, the user interacts with it only through the
touch-tone keys on a regular telephone. At the other end, these tones
are received by a Teltone T-311 Telephone Access Unit. The Teltone
unit converts the signals into the ASCII codes which are used to control
the system.
The system interacts with a POP mail server. It authenticates
the user's login name and password on that POP server, reports the
number of messages received, and allows the user to listen to message
headers, listen to the text of messages, and delete messages. The user
may also navigate forwards, backwards or to a specific message. One
important feature is that the user may interrupt the reading of any
message at any point while it is being read back and cancel the reading,
delete the message, or back up.
The messages are read back using the Bell Labs text-to-speech (TTS)
system. The speech output is directed to the Line-Out jack of the system,
which is connected directly back into the Teltone T-311.
The system is interesting for its integration of pre-existing products:
the Teltone Telephone Access Unit and the Bell Labs Text to Speech
C programming library.
Software Overview
On start-up the system initializes the Teltone to answer calls and starts
up the TTS system as a daemon process to accept text data and output audio
data.
When a call is received, the caller is prompted for their user ID and
password and enters them using the digits on the phone, following a
pre-defined translation from ASCII characters to the handset's digits.
The Teltone converts the tones to ASCII.
After receiving this information, the system authenticates the user
via the user's default POP server and retrieves the messages.
The user is told the number of messages and prompted for an action:
listen to all messages or listen to a specific message number.
For each message played back, the user first hears the sender, subject
and date header information, and then the system begins to play back
the message text.
The user may interrupt the play-back of message or header information
at any time and choose a new action.
The server plays back the retrieved text messages using the AT&T
Text to Speech software.
After hearing a message the user has the option to delete the message.
Related Work and Products
The Email By Phone system is not based on any single new technology.
Rather, it brings together a collection of existing disparate technologies
to create a new functionality for users. Its utility is that it brings
mobility to email. Email no longer requires the use of a computer,
just a touch-tone telephone.
A number of private companies have begun to develop and market similar
systems. These companies are all trying to provide a single, simple,
integrated solution for handling a variety of communication services:
voicemail, email and in some cases fax.
Pure Speech's SpeechMail software, which allows users to dial in and
retrieve email from their PC using voice commands as well as touch-tone
keys, was recently licensed by Compaq and is being packaged for sale with
their new PCs. A number of services have also appeared which offer email
by telephone and other features to their subscribers. E-Now is a
California-based start-up which offers access to user email accounts via
a 1-800 number and voice or touch-tone input. E-Now users don't need to
install the software or the server themselves; rather, they access E-Now's
system for a flat monthly fee plus additional per-transaction charges.
VirtualOffice clients receive their voice mail and faxes on their telephone
number, and can retrieve all their mail, including e-mail, via touch-tone
phone or the Internet. In Germany, EteX Software has developed a similar
product for email by phone, which has also been licensed for use.
In general, the main technical challenge presented by these systems
is the quality of the text-to-speech system and, if offered, the voice
recognition. For our system, we used the TTS synthesizer developed
by Bell Labs. This is discussed in more detail in the Bell Labs TTS section.
Architecture
The following scheme provides a high level view of the software modules
developed in the project:
The User Interface
User Interface Overview:
One of the main design challenges of this project was to fit a user
interface for a technology which normally uses a full computer keyboard
and monitor into a telephone handset. We needed to design an interface
which was consistent and easy to use, yet powerful and secure.
Our main difficulty was the entry of the user's login ID and password.
In our design, the user is required to enter their full login information
in order to be authenticated. Although this is not the most user-friendly
option, it provides a high level of security.
This required that our system support the mapping of every character
that is acceptable in a login ID or password to a touch-tone key or
sequence of keys. The following implementation attempts to map phone keys
to keyboard characters in a way that is easy to remember (the user will
mainly have to remember their password), and to assign shorter key
sequences to more frequently used characters (based on our opinion
only).
Telephone keys are mapped as follows:
-
A number in the password will be mapped to the same number.
-
A lower case letter is mapped to two keys:
-
The touch-tone key on which the letter appears. For example: a->2, b->2, d->3.
-
Since q and z do not appear on the touch-tone keypad, the "1" key is
used for them.
-
A second key, "1", "2" or "3", which differentiates among the 3
letters that appear on each key.
-
For example: "21" is used for a, "22" for b and "23" for
c.
-
An upper case letter is represented using:
-
The 2 keys that are used for the lower case form of the letter, followed
by a "1" key, which indicates upper case.
-
Since punctuation can be used in passwords, "." is represented by
"*" and ";" by "**".
-
Finally, every character (letter, number or punctuation mark) is
terminated by the # key.
Here we summarize the mappings from a keyboard character to a touch tone
telephone key:
1 | 1# | a | 21# | k | 52# | u | 82# | A | 211# | K | 521# | U | 821#
2 | 2# | b | 22# | l | 53# | v | 83# | B | 221# | L | 531# | V | 831#
3 | 3# | c | 23# | m | 61# | w | 91# | C | 231# | M | 611# | W | 911#
4 | 4# | d | 31# | n | 62# | x | 92# | D | 311# | N | 621# | X | 921#
5 | 5# | e | 32# | o | 63# | y | 93# | E | 321# | O | 631# | Y | 931#
6 | 6# | f | 33# | p | 71# | z | 12# | F | 331# | P | 711# | Z | 121#
7 | 7# | g | 41# | q | 11# |   |     | G | 411# | Q | 111# | . | *#
8 | 8# | h | 42# | r | 72# |   |     | H | 421# | R | 721# | ; | **#
9 | 9# | i | 43# | s | 73# |   |     | I | 431# | S | 731# |   |
0 | 0# | j | 51# | t | 81# |   |     | J | 511# | T | 811# |   |
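The mapping rules above can be sketched as a small decoder that turns a
string of received keys back into login characters. This is an
illustration only, not the project's actual code: the names KEY_LETTERS,
decode_token and decode_keys are hypothetical, and the sketch assumes the
Teltone delivers the keys as the ASCII characters 0-9, * and #.

```c
#include <stddef.h>
#include <string.h>

/* Letters assigned to each first key, following the table above.
 * Key 1 carries q and z, which are not printed on the keypad. */
static const char *const KEY_LETTERS[10] = {
    "", "qz", "abc", "def", "ghi", "jkl", "mno", "prs", "tuv", "wxy"
};

/* Decode one token (the keys before a '#'). Returns 0 if it is invalid. */
static char decode_token(const char *tok, size_t len)
{
    if (len == 1 && tok[0] == '*') return '.';
    if (len == 2 && tok[0] == '*' && tok[1] == '*') return ';';
    if (len == 1 && tok[0] >= '0' && tok[0] <= '9') return tok[0];
    if (len == 2 || len == 3) {
        if (tok[0] < '1' || tok[0] > '9') return 0;
        if (tok[1] < '1' || tok[1] > '3') return 0;
        const char *letters = KEY_LETTERS[tok[0] - '0'];
        size_t pos = (size_t)(tok[1] - '1');
        if (pos >= strlen(letters)) return 0;
        char c = letters[pos];
        if (len == 3) {
            if (tok[2] != '1') return 0;     /* third key "1" marks upper case */
            c = (char)(c - 'a' + 'A');
        }
        return c;
    }
    return 0;
}

/* Decode a whole key string such as "21#211#1#" into out.
 * Returns the number of characters decoded, or -1 on any error. */
int decode_keys(const char *keys, char *out, size_t outsz)
{
    size_t n = 0;
    if (outsz == 0) return -1;
    while (*keys != '\0') {
        const char *end = strchr(keys, '#');
        if (end == NULL) return -1;          /* token not terminated by '#' */
        char c = decode_token(keys, (size_t)(end - keys));
        if (c == 0 || n + 1 >= outsz) return -1;
        out[n++] = c;
        keys = end + 1;
    }
    out[n] = '\0';
    return (int)n;
}
```

Because every character ends with #, a token's length alone separates
digits (one key), lower case letters (two keys) and upper case letters
(three keys), so "1#", "11#" and "111#" decode unambiguously to 1, q and Q.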
User Interface Implementation:
At each point in the interaction, the user is offered only certain
options. For example, before retrieving any messages the user must
first pass through the authentication procedure. The system therefore
needs to maintain state information, and so we have modeled it as a
state machine.
Each state is processed as follows: the state message is played out;
if any user input is required, the input is accepted from the user;
and if the input calls for any extra processing, it is performed by
the system.
The following table summarizes the fields in the state table.
Each state may specify a message to be played out, the user input, any
extra processing which needs to occur, and the next state chosen
according to the result of that extra processing. To change the
messages or other entries, edit the corresponding information in the
source code.
State Name | Message to play | Input Digit/s | Extra Process | Extra Process Result | Next State
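One way such a state table could be laid out in C is sketched below. It
is not taken from the project source: the struct layout, the state names,
the menu digits and the demo_auth hook are all hypothetical, chosen only
to show how the six fields above can drive the choice of the next state.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical hook type for the "Extra Process" column. */
typedef int (*extra_fn)(const char *input);

struct state_row {
    const char *state;        /* State Name */
    const char *message;      /* Message to play */
    const char *input;        /* Input Digit/s selecting this row; NULL = any */
    extra_fn    extra;        /* Extra Process; NULL = none */
    int         extra_result; /* Extra Process Result selecting this row */
    const char *next;         /* Next State */
};

/* Stand-in for authentication; the real system would ask the POP server. */
static int demo_auth(const char *input)
{
    return strcmp(input, "1234") == 0;
}

/* Illustrative rows only; the states and digits are assumptions. */
static const struct state_row DEMO_TABLE[] = {
    { "LOGIN", "Please enter your login ID and password.",
      NULL, demo_auth, 1, "MENU" },
    { "LOGIN", "Please enter your login ID and password.",
      NULL, demo_auth, 0, "LOGIN" },
    { "MENU", "Press 1 for all messages or 2 for one message.",
      "1", NULL, 0, "PLAY_ALL" },
    { "MENU", "Press 1 for all messages or 2 for one message.",
      "2", NULL, 0, "PICK_ONE" },
};
#define DEMO_ROWS (sizeof DEMO_TABLE / sizeof DEMO_TABLE[0])

/* Return the next state for (state, input), or NULL if no row matches.
 * The hook is re-run for each candidate row, so it should have no
 * side effects. */
const char *next_state(const struct state_row *table, size_t rows,
                       const char *state, const char *input)
{
    for (size_t i = 0; i < rows; i++) {
        const struct state_row *r = &table[i];
        if (strcmp(r->state, state) != 0) continue;
        if (r->input != NULL && strcmp(r->input, input) != 0) continue;
        if (r->extra != NULL && r->extra(input) != r->extra_result) continue;
        return r->next;
    }
    return NULL;
}
```

Keeping the transitions in a table means that messages and menu choices
can be changed by editing rows rather than control flow.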
Security Considerations:
The security design goal is to allow each user to access their own
messages and to prevent anybody else from accessing them. Since mail
message security is provided by the user's login name and password,
these must be entered at the beginning of each email by phone session
before the user can retrieve messages. Each user is allowed three
attempts to log in, after which the system automatically disconnects
them. This is to prevent attempts at password guessing. To make
guessing harder still, the system reports only limited information
when a user enters invalid data: it does not say whether the problem
is with the login name or with the password.
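The three-attempt rule and the deliberately vague failure report might
look like the following sketch. The names login_guard, login_record and
login_failure_message, and the wording of the message, are hypothetical;
only the limit of three attempts comes from the description above.

```c
#define MAX_LOGIN_ATTEMPTS 3

enum login_outcome { LOGIN_OK, LOGIN_RETRY, LOGIN_DISCONNECT };

/* Per-call counter of failed attempts; start with failures = 0. */
struct login_guard {
    int failures;
};

/* Record the result of one authentication attempt and decide what the
 * caller should do next. */
enum login_outcome login_record(struct login_guard *g, int authenticated)
{
    if (authenticated) {
        g->failures = 0;
        return LOGIN_OK;
    }
    g->failures++;
    return (g->failures >= MAX_LOGIN_ATTEMPTS) ? LOGIN_DISCONNECT
                                               : LOGIN_RETRY;
}

/* One message for every kind of failure, so the caller cannot learn
 * whether the login name or the password was wrong. */
const char *login_failure_message(void)
{
    return "Login failed. Please try again.";
}
```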
Security Limitations
Anyone listening on the user's phone line can compromise the security
of the mail messages. The communication over this medium cannot be
encrypted, since no decryption tool is available at the phone end.
This puts both the retrieved messages and the user's password at risk,
but it is the price of access from a very common device: the phone.
This problem cannot be addressed by this project, since its intent is
to deliver mail messages over a regular phone and the problem is
inherent in the phone system.
Scalability - Support of users from multiple pop servers:
The default mail server name is currently "cs.columbia.edu". The system
can easily be extended to use a different single POP server by storing
the default mail server in a configuration file. It could also be
extended to support users from multiple POP servers using a database that
maps each "Email-by-Phone login ID" to a "POP server login ID" and "POP
server name". The purpose of the "Email-by-Phone login ID" is to resolve
the possible conflict when the same login ID exists on different POP
servers. This database would have to be populated with the users'
information (Email-by-Phone login ID, POP server login ID and POP server
name). Users that belong to the default mail server would still be able
to use the system without prior configuration (or database population).
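The proposed extension could be sketched as a lookup with a fallback to
the default server. Nothing below exists in the current system: the
struct names, the resolve_account function and the example entries (which
use reserved example domains) are assumptions made for illustration, and
a real deployment would load the table from a file or database.

```c
#include <stddef.h>
#include <string.h>

#define DEFAULT_POP_SERVER "cs.columbia.edu"

/* One row of the proposed mapping database. */
struct pop_account {
    const char *phone_login;  /* Email-by-Phone login ID */
    const char *pop_login;    /* POP server login ID */
    const char *pop_server;   /* POP server name */
};

/* Where to authenticate a caller. */
struct pop_target {
    const char *login;
    const char *server;
};

/* Made-up entries: two users who share the login "jsmith" on
 * different servers get distinct Email-by-Phone IDs. */
static const struct pop_account DEMO_ACCOUNTS[] = {
    { "jsmith2", "jsmith", "mail.example.org" },
    { "jsmith3", "jsmith", "pop.example.net" },
};
#define DEMO_ACCOUNT_COUNT (sizeof DEMO_ACCOUNTS / sizeof DEMO_ACCOUNTS[0])

/* Look up the caller's ID; unknown IDs fall back to the default server
 * with the ID used unchanged as the POP login. */
struct pop_target resolve_account(const struct pop_account *db, size_t count,
                                  const char *phone_login)
{
    struct pop_target t;
    for (size_t i = 0; i < count; i++) {
        if (strcmp(db[i].phone_login, phone_login) == 0) {
            t.login = db[i].pop_login;
            t.server = db[i].pop_server;
            return t;
        }
    }
    t.login = phone_login;
    t.server = DEFAULT_POP_SERVER;
    return t;
}
```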
Teltone Interface
The Teltone Access Unit provides the communication between the phone
line and the computer. The Teltone unit is responsible for answering
phone calls and passing the user input to the server and, in the other
direction, for passing the spoken messages to the caller. The Teltone
accepts AT commands through an RS-232C port.
A small Teltone library was implemented in teltone.c to support the
project. It includes low level routines that write to and listen on the
RS-232 computer port, sending standard AT telephony commands as well as
some commands specific to the Teltone, and checking the return codes of
these commands. It also includes high level routines that hide the
complexity of the above and can be used without any prior knowledge of
AT telephony commands. The teltoneInit() function should be called to
initiate the connection, and teltoneEnd() to end it. Some examples of
the high level routines are:
-
setToAnsweringMode(int ringNum)
-
acceptIncommingCall( int timeout)
-
disconnect()
-
acceptUserInputFromTeltone(char *data, int timeoutSec, int charCount)
The low level routines were designed to be generous in what they accept.
For example, when reading the return code of a command that returns OK
or ERR, the function reads a maximum of 10 lines until OK or ERR is
read. OK or ERR should normally appear on the next line or the one after
it, so if minor changes are introduced in future versions of the Teltone
unit these functions will still work.
Also, the Teltone is set to return its return codes as words rather than
number codes, to make debugging easier.
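The lenient reading of return codes described above might be sketched as
follows. This is not the code from teltone.c: the function and enum names
are hypothetical, and the serial port is assumed to be already open as a
stdio stream. Only the limit of 10 lines and the OK and ERR words come
from the description.

```c
#include <stdio.h>
#include <string.h>

#define MAX_REPLY_LINES 10

enum at_result { AT_OK, AT_ERR, AT_NONE };

/* Classify one line from the unit. Leading blanks and line endings are
 * skipped, and anything starting with "ERR" (including "ERROR") counts
 * as an error. */
enum at_result classify_at_line(const char *line)
{
    while (*line == ' ' || *line == '\t' || *line == '\r' || *line == '\n')
        line++;
    if (strncmp(line, "OK", 2) == 0) return AT_OK;
    if (strncmp(line, "ERR", 3) == 0) return AT_ERR;
    return AT_NONE;
}

/* Read up to MAX_REPLY_LINES lines, tolerating command echo and blank
 * lines, until OK or ERR is seen. Returns AT_NONE if neither arrives. */
enum at_result read_at_result(FILE *port)
{
    char line[128];
    for (int i = 0; i < MAX_REPLY_LINES; i++) {
        if (fgets(line, sizeof line, port) == NULL)
            break;                        /* end of input or read error */
        enum at_result r = classify_at_line(line);
        if (r != AT_NONE)
            return r;
    }
    return AT_NONE;
}
```

A real serial read would also need a timeout, which this sketch leaves
out.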
POP Mail Interface
The modules dealing with the POP mail server have two parts: interactions
with the POP mail server via TCP, and parsing of messages for headers
and MIME attached files. The POP RFC provides a simple framework
for interactions with the POP server. The pop.c module provides an
interface layer through which the email by phone system sends commands
to the POP server and receives back the information it needs.
The functions which directly interact with the POP server are intended
to be as generous as possible in what they accept from the POP server.
These functions were tested against the servers running on the Sun Solaris
systems in the CS Lab and on a Linux system. These functions should
also interact with an IMAP server.
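As an illustration of this interface layer, the sketch below formats a
POP3 command line and checks a server reply. The "+OK" status indicator,
the CRLF line ending and the shape of the STAT reply follow the POP3
specification; the function names are hypothetical and are not those of
pop.c, and the socket I/O is omitted.

```c
#include <ctype.h>
#include <stddef.h>
#include <stdio.h>

/* Build "VERB arg\r\n" (or "VERB\r\n" when arg is NULL) into buf.
 * Returns the command length, or -1 if it does not fit. */
int pop_format_command(char *buf, size_t bufsz,
                       const char *verb, const char *arg)
{
    int n = (arg != NULL) ? snprintf(buf, bufsz, "%s %s\r\n", verb, arg)
                          : snprintf(buf, bufsz, "%s\r\n", verb);
    return (n < 0 || (size_t)n >= bufsz) ? -1 : n;
}

static const char *skip_blanks(const char *s)
{
    while (*s == ' ' || *s == '\t') s++;
    return s;
}

/* Lenient success test: leading blanks and lower case are tolerated,
 * although servers are expected to send "+OK" in upper case. */
int pop_reply_ok(const char *line)
{
    line = skip_blanks(line);
    return line[0] == '+'
        && toupper((unsigned char)line[1]) == 'O'
        && toupper((unsigned char)line[2]) == 'K';
}

/* Parse a STAT reply such as "+OK 2 320" into the message count and
 * the mailbox size in octets. Returns 1 on success, 0 otherwise. */
int pop_parse_stat(const char *line, long *count, long *octets)
{
    if (!pop_reply_ok(line)) return 0;
    line = skip_blanks(line) + 3;         /* step past "+OK" */
    return sscanf(line, "%ld %ld", count, octets) == 2;
}
```

A login would send USER and PASS in turn, checking pop_reply_ok after
each, and then STAT to learn how many messages to announce.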
In retrieving messages from the POP server, the system places a limit
on the size of each individual message. This limits the delay between the
request for a message and playing it out. It also provides more stable
memory management.
The header parsing returns information extracted from the relevant
header fields in a form intended to be more easily read out by the TTS.
The RFCs describing mail messages and message headers allow a lot of
latitude to the sender's mail client, and the parsing functions attempt
to deal with this. These functions were tested against mail messages
composed in a variety of different clients.
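Two of the liberties that mail clients take are the case of field names
and the folding of long header values across lines. A minimal sketch of
a lookup that tolerates both is given below; header_value is a
hypothetical name, not a function from the project, and the sketch does
no further clean-up of the value for speech.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Case-insensitive test that line starts with "<name>:". */
static int is_field(const char *line, const char *name)
{
    size_t i;
    for (i = 0; name[i] != '\0'; i++) {
        if (tolower((unsigned char)line[i]) != tolower((unsigned char)name[i]))
            return 0;
    }
    return line[i] == ':';
}

/* Copy the value of header field name from msg into out, joining folded
 * continuation lines with single spaces. Only the header block (up to
 * the first blank line) is searched. Returns 1 if found, else 0. */
int header_value(const char *msg, const char *name, char *out, size_t outsz)
{
    const char *line = msg;
    if (outsz == 0) return 0;
    while (*line != '\0' && *line != '\r' && *line != '\n') {
        const char *eol = line + strcspn(line, "\r\n");
        if (is_field(line, name)) {
            const char *p = line + strlen(name) + 1;   /* just past ':' */
            size_t n = 0;
            for (;;) {
                while (p < eol && (*p == ' ' || *p == '\t')) p++;
                while (p < eol && n + 1 < outsz) out[n++] = *p++;
                p = eol;
                if (*p == '\r') p++;
                if (*p == '\n') p++;
                if (*p != ' ' && *p != '\t') break;    /* value ends here */
                eol = p + strcspn(p, "\r\n");          /* folded line */
                if (n + 1 < outsz) out[n++] = ' ';
            }
            out[n] = '\0';
            return 1;
        }
        line = eol;
        if (*line == '\r') line++;
        if (*line == '\n') line++;
    }
    out[0] = '\0';
    return 0;
}
```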
The POP mail interface finds MIME boundaries and the MIME content-type
and encoding fields. It also provides a separate function to decode
Base64 encoded attachments.
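For reference, Base64 decoding can be sketched in a few lines of C. This
is an independent illustration, not the project's decoding function; the
name base64_decode is hypothetical. Characters outside the Base64
alphabet, such as the line breaks inside an encoded attachment, are
skipped, and decoding stops at the first padding character.

```c
#include <stddef.h>

/* Map one Base64 alphabet character to its 6-bit value, or -1. */
static int b64_value(int c)
{
    if (c >= 'A' && c <= 'Z') return c - 'A';
    if (c >= 'a' && c <= 'z') return c - 'a' + 26;
    if (c >= '0' && c <= '9') return c - '0' + 52;
    if (c == '+') return 62;
    if (c == '/') return 63;
    return -1;
}

/* Decode the NUL-terminated Base64 text in into out.
 * Returns the number of bytes written, or -1 if out is too small. */
long base64_decode(const char *in, unsigned char *out, size_t outsz)
{
    unsigned long acc = 0;   /* bit accumulator */
    int bits = 0;            /* number of not yet emitted bits in acc */
    size_t n = 0;

    for (; *in != '\0' && *in != '='; in++) {
        int v = b64_value((unsigned char)*in);
        if (v < 0)
            continue;                     /* skip CR, LF and other noise */
        acc = (acc << 6) | (unsigned long)v;
        bits += 6;
        if (bits >= 8) {
            bits -= 8;
            if (n >= outsz) return -1;
            out[n++] = (unsigned char)((acc >> bits) & 0xFF);
        }
    }
    return (long)n;
}
```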
Bell Labs TTS
In using the TTS system, we were mainly constrained by the requirement
that the user should be able to cancel play-out of the mail data at any
point, without having to wait for all data to be played out. This
requires that the system implement some type of concurrency and that
the play-out of TTS data be interruptible by the system.
The TTS library has two options: allow the TTS server to handle the
speech output by sending it directly to the audio port, or send the data
back to the client. To be able to interrupt the speech output, we needed
to maintain control over the audio output within the email by phone
system, and so we needed the second option. Unfortunately, the current
TTS library did not always act as expected when sending the speech data
back to the client. In particular, the library is supposed to allow the
client to specify a function to handle the speech output. This
capability did not seem to be available. Instead, in order to control
the output we specified a file into which TTS writes the speech data,
the only other option TTS provides for returning data to a client.
After writing out the data, the TTS returns and the file can be processed
by our system. This allows the system to interrupt speech play-out
when the user presses a key on the phone.
The main drawback is increased latency between requesting a message and
hearing it. To be sure that the file is ready to be safely played out to
the audio port, the entire message is first fully processed by the TTS
server and saved to disk before any of it is output to the audio port.
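The file-based play-out can be pictured as a loop that copies the speech
file to the audio device in small chunks and checks for a key press
between chunks. The sketch below is an assumption about how such a loop
might look, not the project's code: play_interruptible, the chunk size
and the polling callback are hypothetical, and key_after_polls is only a
stand-in for asking the Teltone whether a tone has arrived.

```c
#include <stdio.h>

#define CHUNK_BYTES 512

/* Hypothetical callback: returns non-zero once the caller has pressed
 * a key. */
typedef int (*key_poll_fn)(void *ctx);

/* Copy speech data to the audio stream chunk by chunk, polling for a
 * key press before each chunk is written.
 * Returns 1 if play-out finished, 0 if it was interrupted by a key,
 * and -1 on a write error. */
int play_interruptible(FILE *speech, FILE *audio,
                       key_poll_fn key_pressed, void *ctx)
{
    unsigned char chunk[CHUNK_BYTES];
    size_t got;

    while ((got = fread(chunk, 1, sizeof chunk, speech)) > 0) {
        if (key_pressed(ctx))
            return 0;
        if (fwrite(chunk, 1, got, audio) != got)
            return -1;
    }
    return 1;
}

/* Stand-in poll for experiments: ctx points to an int holding how many
 * polls should pass before a key press is reported. */
int key_after_polls(void *ctx)
{
    int *left = ctx;
    if (*left <= 0)
        return 1;
    (*left)--;
    return 0;
}
```

Smaller chunks make the system respond to a key press sooner, at the
cost of more polling.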
Task List
Design user interface at the phone-end: (Miriam Tauil)
-
Design and implement security, so each user can access only his own messages.
-
Allow the user to navigate and manage their list of messages.
Communicate with the Teltone T-311 through the serial port. (Miriam Tauil)
-
Accept the ASCII commands received from the "Teltone T-311" hardware.
-
Send this information to the POP server.
Communication with the POP server, including:
User authentication, message text and header retrieval, parsing of messages
and headers. (Jeremy Blumenfeld)
Playback of user text messages into the "Teltone T-311" port using
the AT&T Text to Speech software. (Jeremy Blumenfeld)
Control of the system to allow user input to cancel play-back.
References
Last updated: May 6, 1998.