This paper studies and analyzes instant messaging (IM) behavior. Users are exploring various mediums of communication, and interaction between two users is no longer restricted to phone or e-mail. More and more users are moving to non-obtrusive forms of communication such as IM and text messaging. This paper analyzes a user's IM-based communication with other users to study how often the user chats with each contact, the amount of time spent on instant messaging with a contact, the frequency of messaging on a per-contact basis, the average duration of an IM conversation, and the distribution of total chat relative to chat per conversation. We analyzed the chat logs generated by the IBM Sametime 7.5 IM application. The logs were collected from IBM Sametime IM users who volunteered to share their chat data by running a chat-analyzer program on their chat transcripts. Analysis of user IM usage patterns can be used for server-side capacity planning based on IM usage trends, for automatically adjusting the presence subscription rate for each contact based on the amount of inter-user IM activity, and for providing personalized services to different users based on usage pattern profiling.
The use of instant messaging has been increasing over time with the addition of newer functionalities to IM in an effort to ease the pains of virtual communication.
In this paper, we analyze IM usage patterns and identify applications of such trends and data in other areas. Our work provides the tools and baseline measurements to support the analysis of cross-cultural communications for coalition command processes. We encountered a few problems while performing the IM usage analysis. IM usage data cannot be obtained from server logs because typical servers and service providers do not expose this data, so users must share privacy-sensitive chat transcripts or archives. In addition, different users use different, and often multiple, IM clients; each client's archive format is different and is sometimes stored encrypted, which makes the archives infeasible to analyze.
Our initial approach was to analyze client-side chat (IM conversation) logs from Microsoft's MSN Messenger [4], but not many MSN users were willing to share their chat transcripts. To obtain an unbiased analysis from real data, it was crucial that users continue to use their instant messaging clients as they normally would, without going out of their way to alter their behavior for this study. Hence, we switched to studying IM usage behavior using the logs from the corporate chat client IBM Sametime 7.5. We plan to make the chat-analyzer program for MSN Messenger available in the public domain so that interested users can run it against their MSN chat transcripts and send us the results.
The chat analyzer program written for Sametime gathers only non-privacy-sensitive data about each user's IM usage pattern, such as:
· Per day distribution of total chat with each buddy in contact list;
· Percentage of time a user chats with each individual buddy;
· Amount of time a user spends with a buddy per day;
· Per buddy distribution of total chat each day;
· Typical length of conversations.
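As an illustration of the per-buddy percentage listed above, the following is a minimal sketch, assuming the per-buddy message counts have already been extracted from the transcripts. The class name, method name, and buddy labels are hypothetical and are not taken from the original analyzer.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class ChatShare {
    /** Returns each buddy's share of the user's total message count, in percent. */
    static Map<String, Double> percentPerBuddy(Map<String, Integer> messagesPerBuddy) {
        long total = 0;
        for (int count : messagesPerBuddy.values()) {
            total += count;
        }
        Map<String, Double> result = new LinkedHashMap<>();
        for (Map.Entry<String, Integer> e : messagesPerBuddy.entrySet()) {
            // Guard against an empty log so we never divide by zero.
            double pct = (total == 0) ? 0.0 : 100.0 * e.getValue() / total;
            result.put(e.getKey(), pct);
        }
        return result;
    }

    public static void main(String[] args) {
        // Hypothetical counts, for illustration only.
        Map<String, Integer> counts = new LinkedHashMap<>();
        counts.put("buddy-1", 49);
        counts.put("buddy-2", 24);
        counts.put("buddy-3", 27);
        System.out.println(percentPerBuddy(counts));
    }
}
```

Hashed or numeric identifiers would be used as keys in practice so that no contact names leave the user's machine.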
Traditionally, overall IM volume has been used for server capacity planning, with no emphasis on the IM usage patterns of clients. As presence adds context that increases IM usage, more clients require real-time presence updates, and analyzing usage patterns helps to adjust the presence notification rate automatically. Other uses fall in the area of social networking.
There are multiple types of analysis, ranging from simple measures such as message frequency, number of messages, and type of data exchanged (URLs, file transfers, emoticons) to complex linguistic analysis. These may in turn be used to determine the primary use of IM, the trust level between users, and social or work relationship dynamics.
Such an analysis of IM usage patterns can be used to analyze trends in IM usage, to study whether a user is using IM instead of e-mail or phone over a period of time, and to determine whether phone usage decreases as IM use increases. Additionally, usage patterns in terms of number of messages and time spent on IM communication can be used to automatically adjust the presence subscription rate for a user's contacts [5].
There has been some previous work in this area, but none of it focused on determining inter-user communication timings.
Muller et al. [10] analyze IM usage across a large number of users, but their analysis is based on user self-reports and surveys. They studied the maturity of an IM network over a period of 24 months, showing the development of chat behaviors, social networks, and attitudes.
Another study, by Herbsleb et al. [8], describes experiences with introducing instant messaging and group chat in geographically diverse work groups. In such environments, the informal nature of the communication and cross-site communication are perceived to reduce the utility of IM-based communication.
Isaacs et al. [9] conducted a major study based on logging IM at the workplace. They found that workplace IM was used for complex work discussions. 28% of conversations were simple and single-purpose, and 31% were about scheduling and coordination. Heavy IM users and frequent IM partners were generally working together or collaborating, and their exchanges involved many fast-paced discussions. Light and infrequent IM users used IM more for scheduling and coordination.
There have also been studies on instant messaging, interruption, and productivity. This work proposes techniques for queuing such interruptions to avoid productivity loss [12]. Other studies include observations of IM usage when presence is introduced [11].
The software is written for Java 1.5 and comprises approximately 3400 lines of code. The program has a main invocation class, ChatParserANDWriter.java, which loads the chat folders and passes them to FilePersonHistory.java and FileChatTranscript.java to parse and analyze them. The analysis creates four files with data in XML format, which are stored in a chatdata folder. The XML files are then used to plot graphs using an IBM internal charting service and Microsoft Excel (see Figure 1).
The Sametime chat logs are stored in HTML format. The chat analyzer program creates three kinds of objects, Chat, User, and Date, which store all of the program's data. A Chat object holds information such as the start time, end time, initiator, and length of a conversation. A User object stores information such as the number of chats with each user, total chats, and the percentage of chats per user. A Date object stores data on the conversations for a day, to keep track of a user's monthly, weekly, and daily activity. The chat analyzer processes the HTML-based chat logs and generates four XML files. Example XML files are given below:
· chat_by_user.xml -
<userid>
  <user creationTime="24 Jan 2007 17:38:46 GMT" id="5" initiator="6" lastActivityTime="24 Jan 2007 17:57:10 GMT" percent="8.44" total="853">
    <date chatdate="20070124">20</date>
    <date chatdate="20070129">21</date>
    <date chatdate="20070214">35</date>
    ...
    ...
  </user>
</userid>
· chat_by_time.xml -
<userid>
  <tally distribution="9" length="15"></tally>
  <tally distribution="4" length="62"></tally>
  <tally distribution="2" length="79"></tally>
  <tally distribution="4" length="107"></tally>
  ...
  ...
  ...
</userid>
· chat_by_date.xml -
<userid>
  <date chatdate="20070122" total="0" />
  <date chatdate="20070123" total="84">
    <user id="1">49</user>
    <user id="2">24</user>
    <user id="3">0</user>
    ...
    ...
  </date>
</userid>
· chat_by_bytes.xml -
<userid>
  <user chatlength_in_bytes="7" name="1"></user>
  <user chatlength_in_bytes="7" name="2"></user>
  <user chatlength_in_bytes="9" name="3"></user>
</userid>
This XML data is formatted and passed to a plotting API that generates the histograms, line charts, and column charts shown in the measurements section.
Fig. 1 Program Architecture
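To illustrate how a downstream plotting step might consume one of these files, the following is a minimal sketch that reads the chat_by_user.xml structure shown in the example using the standard Java DOM parser. The class and method names are hypothetical, and the element and attribute names are assumed to match the example exactly; this is not the original analyzer code.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

class ChatByUserReader {
    /** Maps each user id to its per-date values (chatdate -> value). */
    static Map<String, Map<String, Integer>> read(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            Map<String, Map<String, Integer>> byUser = new LinkedHashMap<>();
            NodeList users = doc.getElementsByTagName("user");
            for (int i = 0; i < users.getLength(); i++) {
                Element user = (Element) users.item(i);
                Map<String, Integer> perDate = new LinkedHashMap<>();
                // Only the <date> elements nested under this <user> are read.
                NodeList dates = user.getElementsByTagName("date");
                for (int j = 0; j < dates.getLength(); j++) {
                    Element date = (Element) dates.item(j);
                    perDate.put(date.getAttribute("chatdate"),
                            Integer.parseInt(date.getTextContent().trim()));
                }
                byUser.put(user.getAttribute("id"), perDate);
            }
            return byUser;
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse chat_by_user XML", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<userid><user id=\"5\" total=\"853\">"
                + "<date chatdate=\"20070124\">20</date>"
                + "<date chatdate=\"20070129\">21</date>"
                + "</user></userid>";
        System.out.println(read(xml));
    }
}
```

The elided entries ("...") in the printed examples are not valid XML, so a real file is assumed to contain only complete elements.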
The goal of studying user instant messaging behavior was to find patterns in usage. Chat logs can contain anywhere from very little to a great deal of data, so a careful decision had to be made about what data could be shared for such an analysis without invading user privacy. Only non-confidential user data, such as timestamps, hashed unique user IDs, chat length, and number of chat bytes over a span of time, was used for this study. The analysis can answer questions that are useful for a user to know:
· Which users do I talk to more often? (A-list users)
· What percentage of users do I talk to more often?
· What is the typical length of my conversations?
· What days of the week do I use IM more?
· What month has more activity over others?
· Is there a pattern in my usage?
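The hashed unique user ID mentioned above can be produced in many ways. The following is a minimal sketch, assuming SHA-256 over the UTF-8 bytes of the contact ID. The paper does not state which hash function the analyzer used, so the algorithm, class name, and method name here are assumptions.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

class UserIdHasher {
    /** Returns a hex-encoded SHA-256 digest of the given user ID (assumed scheme). */
    static String hash(String userId) {
        try {
            MessageDigest md = MessageDigest.getInstance("SHA-256");
            byte[] digest = md.digest(userId.getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder();
            for (byte b : digest) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            // SHA-256 is required on every Java platform, so this is unexpected.
            throw new IllegalStateException("SHA-256 not available", e);
        }
    }

    public static void main(String[] args) {
        // Hypothetical contact ID, for illustration only.
        System.out.println(hash("contact@example.com"));
    }
}
```

A plain hash of a guessable ID can be reversed by trying candidate IDs, so a per-study secret salt would be a reasonable addition in practice.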
The data set covers a period of four months (Jan 2007 – Apr 2007). The XML files generated by the chat analyzer were used to plot data such as the length of chat conversations with users, the total number of chat messages (bytes and lines) with users, and the total chat pattern over the four-month span for a set of users.
In Fig. 2, we see how the chat data for a sample user is plotted against time. For this particular user, the moving average of the total number of messages increases with time (date) (chart 1). The chat distribution per day also reflects the distributed nature of co-workers. A user in Eastern Time zone, working with users from
Figure 2 Chat distribution with date
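The moving average referred to for Fig. 2 can be computed as in the following minimal sketch. It assumes a simple trailing moving average over daily message totals; the window size and the class and method names are assumptions, since the paper does not state how the average in the chart was computed.

```java
import java.util.Arrays;

class MovingAverage {
    /**
     * Simple trailing moving average. For the first (window - 1) days the
     * average is taken over the days available so far.
     */
    static double[] simple(int[] dailyTotals, int window) {
        if (window <= 0) {
            throw new IllegalArgumentException("window must be positive");
        }
        double[] out = new double[dailyTotals.length];
        long sum = 0;
        for (int i = 0; i < dailyTotals.length; i++) {
            sum += dailyTotals[i];
            if (i >= window) {
                sum -= dailyTotals[i - window]; // drop the day that left the window
            }
            int n = Math.min(i + 1, window);
            out[i] = (double) sum / n;
        }
        return out;
    }

    public static void main(String[] args) {
        // Hypothetical daily totals, for illustration only.
        int[] totals = {20, 21, 35, 40, 38};
        System.out.println(Arrays.toString(simple(totals, 3)));
    }
}
```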
Figure 3a shows chat usage (in number of lines) for five users over four months. We observed an overall increase in the usage of instant messaging over time. We also observed drops in usage over the weekend, which can be attributed to the fact that this is a workplace IM client. An interesting observation from chart 2b is that there were peaks at the beginning and towards the end of the week. For example, on
Figure 3a Total chat messages distribution with time
Figure 3b IM usage of 5 users over 2 weeks
The graph in Figure 4 shows that most users have short conversations, with few occurrences of longer conversations. A conversation is a sequence of chat messages in a thread, separated from other conversations by an interval of thirty minutes. Hence, this study breaks a user's conversations at 30-minute inactivity periods: after 30 minutes of no conversation with a user (determined from timestamps), a new chat with that user is considered a new topic or conversation. This rule is also used to determine the total number of conversations per day. The occurrences of chat conversations range from zero to ten conversations, and this is consistent across all users, as shown below. Zero conversations means that a user pinged another user and got no response from the other party for at least another 30 minutes.
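The 30-minute inactivity rule can be sketched as follows. This is a minimal illustration, assuming message timestamps for one contact are available in milliseconds and sorted in ascending order; the class and method names are hypothetical, and treating a gap of exactly 30 minutes as a break is an assumption.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class ConversationSplitter {
    /** Inactivity period that separates two conversations: 30 minutes. */
    static final long GAP_MILLIS = 30L * 60L * 1000L;

    /** Splits ascending message timestamps (ms) for one contact into conversations. */
    static List<List<Long>> split(List<Long> sortedTimestamps) {
        List<List<Long>> conversations = new ArrayList<>();
        List<Long> current = null;
        long previous = 0L;
        for (long t : sortedTimestamps) {
            // Start a new conversation on the first message or after a long gap.
            if (current == null || t - previous >= GAP_MILLIS) {
                current = new ArrayList<>();
                conversations.add(current);
            }
            current.add(t);
            previous = t;
        }
        return conversations;
    }

    public static void main(String[] args) {
        long minute = 60_000L;
        // Messages at 0, 5, 40, and 41 minutes: the 35-minute gap starts a new conversation.
        List<Long> stamps = Arrays.asList(0L, 5 * minute, 40 * minute, 41 * minute);
        System.out.println(split(stamps).size() + " conversations");
    }
}
```

Counting the resulting groups per calendar day gives the total number of conversations per day.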
Figure 4 Length of IM conversations vs. Occurrences
The graphs in Figures 5(a) and 5(b) show the distribution of the number of chat messages for each user (on the x axis). The total amount of messaging for a user is gathered by (1) counting the total lines typed and (2) counting the total number of bytes generated by the messages. Some users deliver an entire message in one line, while others type multiple lines of one or two words at a time. Although bytes and lines are related, we wanted to see whether the relationship holds for all of a user's IM contacts or whether a user has a different communication pattern for each contact. This may also depend on the relationship with the contact.
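The two volume measures can be sketched as follows. This is a minimal illustration, assuming each typed line is available as a string and that bytes are counted in UTF-8; the paper does not state the encoding used for the byte count, and the class and method names are hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.List;

class MessageVolume {
    /** Measure 1: the number of lines typed. */
    static int countLines(List<String> lines) {
        return lines.size();
    }

    /** Measure 2: the total number of bytes, assuming UTF-8 encoding. */
    static long countBytes(List<String> lines) {
        long bytes = 0;
        for (String line : lines) {
            bytes += line.getBytes(StandardCharsets.UTF_8).length;
        }
        return bytes;
    }

    public static void main(String[] args) {
        // One user sends a single long line; another splits the same text into short lines.
        List<String> oneLine = Arrays.asList("are you free for lunch");
        List<String> manyLines = Arrays.asList("are you", "free", "for lunch");
        System.out.println(countLines(oneLine) + " line(s), " + countBytes(oneLine) + " bytes");
        System.out.println(countLines(manyLines) + " line(s), " + countBytes(manyLines) + " bytes");
    }
}
```

Comparing the two measures per contact shows whether a high line count reflects more content or only a terser typing style.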
We observe that the number of lines and the number of bytes follow a similar pattern, as can be seen in Fig. 5(a) and 5(b). From Fig. 5(c) we observe that the pattern is similar for many IM users.
Another interesting observation is the distribution of total chat over the number of users. As can be seen, the majority of chat is with very few contacts. For the user under consideration, more than 10% of conversations are with fewer than 10 people, more than 20% are with fewer than 5 people, and 50% are with a single person.
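One way to quantify this concentration is the share of total chat accounted for by the k most active contacts. The following is a minimal sketch under that assumption; the class and method names and the sample counts are hypothetical and do not reproduce the paper's data.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

class ChatConcentration {
    /** Fraction (0 to 1) of all messages exchanged with the k most active contacts. */
    static double topKShare(List<Integer> messagesPerContact, int k) {
        List<Integer> sorted = new ArrayList<>(messagesPerContact);
        Collections.sort(sorted, Collections.reverseOrder()); // most active first
        long total = 0;
        long top = 0;
        for (int i = 0; i < sorted.size(); i++) {
            total += sorted.get(i);
            if (i < k) {
                top += sorted.get(i);
            }
        }
        return (total == 0) ? 0.0 : (double) top / total;
    }

    public static void main(String[] args) {
        // Hypothetical per-contact message counts, for illustration only.
        List<Integer> counts = Arrays.asList(10, 50, 10, 20, 10);
        System.out.println("Top-1 share: " + topKShare(counts, 1));
    }
}
```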
Figure 5(a) Length of IM conversations vs. Users (number of messages)
Figure 5(b) Length of IM conversations vs. Users (number of bytes)
Figure 5(c) Length of IM conversations vs. Users
Figure 6 Normalized total chats vs. Users
In Figure 6, we observe the distribution of the total chat (the number of instant messages exchanged) for multiple users. As can be seen, most messages are concentrated among a few users. This is an important observation, as it can be used to build presence profiles for the users with whom communication is highest. Figures 7(a) and 7(b) show the amount of chat (number of IMs, normalized over all users) vs. the number of users; Figure 7(b) is on a log scale.
Figure 7(a) and 7(b, log scale) Normalized total chats vs. Users
In the future, we plan to extend this work with studies using IM applications such as MSN, Google Talk, Yahoo, and Trillian. Since these are not workplace-specific, they should give better insight into instant messaging usage from a social communication point of view. Another plan is to study the variation of IM usage against phone and e-mail usage for a user over a period of time. IM is real-time compared with e-mail and non-intrusive compared with phone, so it would be interesting to see whether IM is replacing either of them to some extent. Linguistic analysis of instant messages can also be done to determine the purpose and type of IM use. Socially connected clouds can be created based on IM usage. We plan to provide tools for the measurement and analysis of such communication patterns, which in turn may improve the quality and effectiveness of communication.
IM-based communication patterns are very useful and can be correlated with situation analysis. A change in the set of users one communicates with could indicate a changing work or social relationship, or a tentative change in job or work environment. Our observations include the variation of chat usage with time on a daily basis, an increase in overall chat over a period of time, and more communication with specific users with whom a work relationship is shared. We measured the distribution of IM usage across users and found that the bulk of IM usage tends to be concentrated among a few users. We also found that some users have exceptionally high average chat usage. These could typically be project managers or program coordinators, which should be considered when analyzing average conversations for a job role. Other questions that such a study could help answer include: Does IM usage increase towards the end of a quarter? Does studying IM patterns demonstrate that no patterns exist in IM usage? Since IM communication, like text messaging, is non-obtrusive, and the newer generation (more so than other age groups) is learning to stay online wherever they go (work, school, or home), can this study be expanded to online behavior in general?
Most of the code was written by me, except:
ValueSortMap.java - http://www.programmersheaven.com/download/49349/download.aspx
[1] IBM Sametime, www.ibm.com/lotus/sametime
[2] Microsoft Communicator, office.microsoft.com/communicator
[3] IM Usage survey, http://www.aim.com/survey/
[4] MSN, www.msn.com
[5] Singh V. and Schulzrinne H., “Presence Traffic Optimizations”,
[6] Houri, A., “Problem Statement for SIP/SIMPLE”, draft-ietf-simple-interdomain-scaling-analysis-00.txt (work in progress), Feb 2007.
[7] Ljungstrand, P., Hård af Segerstad, Y., “Instant messaging with WebWho,” International Journal of Human-Computer Studies, Volume 56, Issue 1 (January 2002), Pages 147-171.
[8] James D. Herbsleb, David L. Atkins, David G. Boyer, Mark Handel and Thomas A. Finholt, “Introducing instant messaging and chat in the workplace”, CHI '02: Proceedings of the SIGCHI conference on Human factors in computing systems, Pages 171-178.
[9] Ellen Isaacs, Alan Walendowski, Steve Whittaker, Diane J. Schiano and Candace Kamm, “The character, functions, and styles of instant messaging in the workplace”, CSCW '02: Proceedings of the 2002 ACM conference on Computer supported cooperative work, Pages 11-20.
[10] Michael J. Muller, Mary Elizabeth Raven, Sandra Kogan, David R. Millen and Kenneth Carey, “Introducing chat into business organizations: toward an instant messaging maturity model”, GROUP '03: Proceedings of the 2003 international ACM SIGGROUP conference on Supporting group work, Pages 50-57.
[11] Peter Ljungstrand, Ylva Hård af Segerstad, “Awareness of presence, instant messaging and WebWho,” ACM SIGGROUP Bulletin, v.21 n.3, p.21-27, December 2000.
[12] Czerwinski, M., Cutrell, E., & Horvitz, E. (2000). Instant Messaging and Interruption: Influence of Task Type on Performance. Proceedings of OZCHI 2000, 356-361.