COMS 4419: Internet Technology, Economics and Policy (Fall 2022)
Projects and Term Papers
The categories below are approximations - all projects and term
papers may include quantitative analysis, interviews with experts,
literature surveys and software development. In all cases, the relevant
literature should be carefully considered and cited.
Any of the projects listed as part of the
Datasets
and Potential Research Questions 2013 would likely be suitable
as a project.
Project Plan
Each team must submit a project plan
outlining their goals for their project or term paper.
Progress Report
Each project team member should submit one updated progress report as a PDF file once every
two weeks, on Thursday evenings starting two weeks after the project
proposal, clearly indicating the work accomplished in those two weeks as
well as any obstacles. There should be some verifiable signs of
progress - "80% complete" is not helpful; "wrote functions to do X
(approx. 200 LoC)" is better.
Project Presentation
The presentation should be targeted to last no more than 12 minutes,
leaving 3 minutes for questions, for both one-person and group projects.
Since talks are back-to-back, we will have to cut short talks that exceed
their time allotment. For most speakers and slide styles, this
translates to (at most) 7-8 slides, including the title slide. For
group projects, you can either split the presentation or designate a
single speaker. The former is preferred, to give everyone a chance to
practice. You should consider your talk like a "pitch talk", i.e., get
the listener interested in your project. What problem did you tackle or
what area did you investigate? Why is this interesting or important?
What were your most surprising results? What approaches did not work
well? Briefly, what would be next steps? Please be sure to practice
your talk so that you are sure of timing, content and hand-offs.
See the talk hints; Writing
technical articles also links to materials related to talks.
Project Final Report (= Research Paper)
Project reports are typically 3,000 to 5,000 words per project team
member, i.e., 6,000 to 10,000 words for a two-person project.
Papers should be single-spaced, 11 or 12 pt font, and should conform
to the recommendations on writing
style and avoid
common mistakes. You can include any extensive graphs or tables as
appendices if needed. Use of the IEEE
templates is strongly suggested. The structure in this guidance
page should be followed, although it is somewhat less applicable for
analysis and review projects, where a standard "term paper" format is
called for. Please ignore the guidance about page limits and individual
project reports - the guidance is from a different university.
Experiments and Implementation
Some, but not all, of the projects below require computer networking background,
e.g., from a class like CSEE 4119.
- Privacy for the Internet of Things or smart TVs:
- What kind of data do Internet of Things devices, smart TVs or video
devices (Roku, Amazon Fire, Google TV) exchange with the outside world?
Who do they "talk" to? The project requires knowledge of Wireshark or
scapy.
- Measuring Internet port blocking:
- Most consumer Internet services block some Internet ports, sometimes
for historical reasons, sometimes for security reasons and sometimes for
reasons that are less obvious. Develop a tool that allows a user to
test which UDP and TCP ports can be used for both incoming and outgoing
packets, and whether other IP features such as IP options and IPv6 are
usable.
- Video quality:
- How do bandwidth and packet loss affect the video quality of
streaming and interactive applications such as Zoom, WebRTC
applications, YouTube, Netflix and Skype? Consider using a network
emulator to simulate various network conditions.
- Finding Internet bottlenecks:
- When Internet applications suffer from performance problems, it is
often difficult to tell whether the problem is found in the home (Wi-Fi)
network, in the first-hop access network (e.g., the shared cable
network), the middle-mile network, the Internet backbone or at the
server or CDN. Develop a tool for either a desktop or mobile OS to
estimate where the performance problem is likely to be found.
- Wi-Fi performance:
- It is not uncommon that Wi-Fi is slower than LTE. Map the
performance of Wi-Fi (e.g., the Columbia Wi-Fi network) vs. LTE in a
geographic area, including indoors, e.g., using the FCC mobile
measurement application.
- Wi-Fi congestion:
- Measure Wi-Fi spectrum usage in the 2.4 and 5.8 GHz bands in various
locations in New York City (or wherever you may want to travel...), both
indoors and outdoors. How many stations are visible on what frequency
channels? Where are publicly accessible access points, such as "Cable
Wi-Fi" visible and accessible? Measure the impairment due to
interference, i.e., how much lower throughput is between a mobile device
and the base station compared to a "silent" radio environment.
- Internet speed tests and broadband labels:
- There are a large number of speed tests available, including from
the FCC (Measuring Broadband America), Ookla (speedtest.net), and
Google. For the same network, they often provide very different
results. Compare these approaches and analyze how they measure speed
and latency. Can you explain the differences? How do you compare
proposals for broadband consumer labels?
- Location measurements:
- What is the reliability of handset-provided geographic location
data? Build a tool that allows users to indicate their true location
based on a map and compare it to the location provided by GPS, Wi-Fi or
cellular tower data. Explore the reliability systematically, both
outdoors and within a building.
- Indoor positioning:
- Can you determine the room (apartment, office, ...) you are in by
comparing Wi-Fi "fingerprints"? Can you apply machine learning
techniques to the task?
- Altitude information:
- Many modern smartphones have built-in altimeters based on barometric
pressure. Altitude (elevation) information can be very useful to
dispatch first responders after a 911 call. Conduct experiments that
allow you to evaluate the accuracy of this data in various
buildings.
- Emergency assistance:
- How can citizens be better integrated into emergency response
activities, e.g., after large-scale natural disasters? Consider an app
that allows citizens to volunteer, be vetted, and then be dispatched
similar to official first responders.
- TTY replacement:
- People who are Deaf or hard-of-hearing use text-based communication,
either directly or via a relay service. The first text-based
communication used TTYs,
using analog modems. Architect and design a system that replaces the
outdated TTY technology with an Internet-based system that can use
either a dial-up modem or broadband, but still communicate with relay
services (via IP) and existing analog TTYs (via a gateway).
- Speech-to-text meeting summary:
- Using recordings or after live participation by citizen reporters,
can we auto-transcribe local government or regulatory meetings and
provide summaries, to augment reporting by local journalists (who may no
longer be able to cover every borough or county meeting)?
- Ad tracking and cookie permissions:
- Many websites allow you to choose whether to accept cookies or
select among categories of cookies. Determine which cookies are
affected. Does the loading speed or data volume of the website
change?
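As a starting point for the port-blocking project above, the sketch below checks whether outgoing TCP connections to a set of ports succeed. It is only a minimal illustration: the function names are made up for this example, it covers outgoing TCP only (not UDP, incoming packets, IP options or IPv6), and it assumes you control a cooperating test server that listens on every port you probe. Without such a server, a failed connection cannot be distinguished from a port that is simply closed at the far end.

```python
import socket


def can_connect_tcp(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if an outgoing TCP connection to (host, port) succeeds.

    A False result means the connection was refused, timed out, or failed
    for another reason; it does not by itself prove that the port is blocked.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def probe_ports(host: str, ports, timeout: float = 2.0) -> dict:
    """Probe each port in turn and map port number -> reachable (bool)."""
    return {port: can_connect_tcp(host, port, timeout) for port in ports}
```

A call such as probe_ports("probe.example.net", [25, 80, 443]) would return a dictionary of results per port; the host name here is a placeholder for a server you operate yourself.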
Data Analysis
- Peering:
- Using routing and peering
data, characterize peering relationships between carriers, content
providers and CDNs. Who peers with whom? Under what conditions?
- Robocalls:
- I have access to a variety of data sources about robocalls, which
may address questions such as: How many calls pretend to be local? Is
this changing over time? Are there distinct campaigns that wane and wax (e.g., car
warranties, electric utility scams, medical insurance)? Are the same
numbers, real or fake, reused, or are numbers being rapidly rotated? Who are the
numbers assigned to? How many such calls are signed (STIR/SHAKEN)?
- Broadband metrics:
- The FCC now gathers a range of broadband performance indicators that
are highly correlated, e.g., as part of the Measuring Broadband
America data set. What is their relationship with each other?
Which of these are independent or dependent variables?
- Radio, TV:
- Using FCC databases and TVStudy (OET69) software, estimate the
number of TV or FM radio stations that can be received in various
places. Provide estimates of population averages by state and
population reach by station. (For example: "The average household in
North Dakota can receive 3.5 TV channels. The average TV station
reaches 150,000 households.")
- Broadband pricing:
- Try to estimate, based on online surveys (e.g., of your Facebook or
LinkedIn contacts), what people pay for wireless or wireline Internet
connectivity and compare by performance, region and country.
- Consumer expenditures:
- Gather all available data on consumer expenditures for telephone,
cellular and Internet services, comparing government data, industry
analysis and corporate annual reports. (The BLS consumer expenditures survey
provides some information, but may not map cleanly into current
categories.) Is the data consistent? Can it be compared against other
major OECD economies? How have expenditures changed?
- Broadband deployment:
- Analyze the FCC Form 477 broadband deployment data to show how
connectivity, technology and bandwidth (speed) have changed over time.
How do changes correlate to population density and household income or
other demographic variables? Using the Universal Service Fund data, how
does funding correlate to changes in broadband availability?
- Broadband subsidies:
- In the United States, both the Federal government and states
subsidize broadband and communication services (mobile phones, mainly).
Who benefits - consider rural vs. urban, richer vs. poorer areas, using
data provided by the FCC, Census data and other sources.
- Rural electric cooperatives:
- Analyze the service territories of rural electric cooperatives.
Using the FCC Form 477 data, how good (or bad) is broadband connectivity
in those areas? Has it changed recently?
- Network reliability:
- Can you determine network outages, both "sunny day" and "rainy day",
from the FCC Measuring Broadband America or ATLAS measurement
infrastructure data?
- Vaccine misinformation:
- Various tropes of vaccine (or, more generally, COVID-19)
misinformation seem to ebb and flow over time. Can one detect such
changes? Are they reflected in Google searches, traditional media
coverage or across multiple social media platforms such as Twitter and
Facebook?
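For the robocall question "How many calls pretend to be local?" above, one possible first step is to count calls whose caller number shares the area code (or area code plus exchange) with the called number. The sketch below assumes call records are available as (caller, callee) pairs of 10-digit North American numbers; that record format and the function names are assumptions for illustration, not a description of the actual data sources.

```python
def npa_nxx(number: str) -> str:
    """Return area code + exchange (first six digits) of a NANP number."""
    digits = "".join(ch for ch in number if ch.isdigit())
    if len(digits) == 11 and digits.startswith("1"):
        digits = digits[1:]  # drop the leading country code
    if len(digits) != 10:
        raise ValueError(f"not a 10-digit NANP number: {number!r}")
    return digits[:6]


def looks_local(caller: str, callee: str, match_exchange: bool = False) -> bool:
    """True if caller and callee share an area code (or area code + exchange)."""
    a, b = npa_nxx(caller), npa_nxx(callee)
    return a == b if match_exchange else a[:3] == b[:3]


def local_share(calls, match_exchange: bool = False) -> float:
    """Fraction of (caller, callee) pairs that look local; 0.0 if empty."""
    calls = list(calls)
    if not calls:
        return 0.0
    hits = sum(looks_local(c, d, match_exchange) for c, d in calls)
    return hits / len(calls)
```

Computing local_share per week or month would give a simple time series for the "Is this changing over time?" question. A matching prefix only suggests that a call appears local; it does not show that the number was spoofed.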
Literature Review and Analysis
The projects below summarize key resources in the topic area. They
may involve data, but are likely to require smaller volumes of data and
less advanced statistics. They may also draw on interviews you conduct
with domain experts.
- Digital "papers":
- Read Carpenter v. United States, United States v.
Jones, and Riley v. California and perhaps some lower court
cases, then summarize how courts are handling search warrants of digital
"papers". How has treatment changed? How do these decisions reflect (or
not) the differences between traditional and digital letters and other
personal documents?
- Web scraping:
- Read Van Buren v. United States and the CFAA (about the legality
of web scraping). Read some of the amicus briefs. What are the main
arguments? What are the equities involved?
- FCC Internet privacy rule:
- Read regulatory filings for the (now rescinded) FCC internet privacy
rule. Evaluate the technical arguments about CPNI, encryption and the
internet.
- CLOUD Act:
- Summarize the text of the CLOUD Act as well as the opinions on
either side.
- Social media monitoring:
- Do a survey of different police department policies on monitoring
social media. (You may need to contact the police department's public
affairs office or search legal cases.)
- Transparency report:
- Do a survey and data analysis of tech companies' transparency
reports. What do they cover? How do they differ in categories and
geographic detail? Do they indicate what they do not disclose? Can you
design a template similar to, say, a 10-Q disclosure?
- Mergers:
- Track data about mergers and acquisitions before and after FIRRMA
was passed. How did this impact foreign M&A?
- Media:
- For different TV and radio stations (e.g., in the NYC
area), determine their programming mix, e.g., children's programming,
local news, advertisements, syndicated programming, ...
- Data portability:
- For major consumer services for photos, messages, social media
posts, address books, and email (e.g., various Google services,
Facebook, Instagram, TikTok, WhatsApp, Yahoo Mail, Apple photos and
email), can you extract your data, e.g., to move to a new service? How
long does it take? How useful is the data you can extract? Can you
import the data (e.g., email or photos) to another service? Are there
tools to help?
- Ad blocking:
- Among popular websites, e.g., for news, which function well with ad
blockers and which fail or explicitly refuse to provide content?
- Content moderation:
- What kind of discussion forums, ranking and content moderation do
national and local news sites employ? Is there a way to measure the
quality of the discussion? Consider contacting newspaper staff to
gather their experiences.
- Rural broadband:
- Analyze the cost of deploying fiber in rural areas. What are the
cost components, such as planning, fiber, electronics and construction?
How does take-up affect cost and viability? What are financing
models?
- Cost of Internet access:
- Using bills gathered from (Facebook, LinkedIn, real-life) friends
and family, try to evaluate the typical cost structure of Internet and
phone service. How much variation is there for similar services? How
does this compare to the advertised rates?
- Communication networks during natural disasters:
- Using interviews with residents and public safety officials, as well
as various data sources, describe how well various communication
facilities held up during Hurricanes Harvey and Irma, including land mobile radio
("walkie-talkies"), cellular, landline and Internet access.
- Spectrum usage:
- Analyze what spectrum is used for, by whom and where, comparing use
for categories such as broadcast, communication and non-communication
(radar, medical, industrial) applications.
- Spectral efficiency:
- Compare the spectral efficiency of FM radio, digital over-the-air
(ATSC 3) TV, land-mobile radio and cellular systems. Consider the
encoding of information, the air interface, and how many bits of content
are delivered to users, or how much spectrum it would take to replace a
traditional service such as radio or TV with a cellular service. Note
that there is no single definition of spectral efficiency, so the
project should consider existing definitions in the literature and
justify choices.
- TV stations:
- Investigate whether one could put all TV stations on cable or
satellite, either generally or in more rural areas. How many stations
are must-carry vs. retransmission consent? What would be the costs,
potential sources of revenue and benefits?
- Cybersecurity:
- What are the principal causes of cybersecurity problems? Is there
quantitative evidence? What remedies are likely to reduce the frequency
or impact of such events? (Cite research to support your
arguments.)
- Cybersecurity:
- Consider developing a label similar to a nutrition label or
EnergyStar label that gives consumers basic information about the
cybersecurity and privacy of an Internet of Things device or an app.
You should at least informally evaluate your ideas with non-experts,
e.g., in interviews or surveys.
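For the spectral efficiency project above, the simplest of the definitions found in the literature is delivered bit rate divided by occupied bandwidth, in bit/s/Hz. The sketch below implements only that one definition and the inverse question of how much spectrum a replacement service would need; the function names are illustrative. It deliberately ignores frequency reuse, coverage area and audience size, which other definitions take into account.

```python
def spectral_efficiency(bitrate_bps: float, bandwidth_hz: float) -> float:
    """Link spectral efficiency in bit/s/Hz: bit rate over occupied bandwidth."""
    if bandwidth_hz <= 0:
        raise ValueError("bandwidth must be positive")
    return bitrate_bps / bandwidth_hz


def spectrum_needed_hz(bitrate_bps: float, efficiency_bps_per_hz: float) -> float:
    """Bandwidth needed to carry a given bit rate at a given efficiency."""
    if efficiency_bps_per_hz <= 0:
        raise ValueError("efficiency must be positive")
    return bitrate_bps / efficiency_bps_per_hz
```

For example, with placeholder values that are not measurements, spectral_efficiency(12e6, 6e6) gives 2.0 bit/s/Hz, and spectrum_needed_hz(4e6, 2.0) gives 2e6 Hz.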