COMS 4419: Internet Technology, Economics and Policy (Spring 2025)

Projects and Term Papers

The categories below are approximations - all projects and term papers may include quantitative analysis, interviews with experts, literature surveys and software development. In all cases, the relevant literature should be carefully considered and cited.

Many of the ideas listed in Datasets and Potential Research Questions 2013 may also be suitable as projects, although some of the datasets (e.g., Form 477) and issues are now outdated.

Project Plan

Each team must submit a project plan outlining the goals of its project or term paper.

Progress Report

Each project team member should submit an updated progress report, as a PDF file, every two weeks on Thursday evening, starting two weeks after the project proposal. The report should clearly indicate the work accomplished in those two weeks as well as any obstacles. There should be some verifiable signs of progress: "80% complete" is not helpful; "wrote functions to do X (approx. 200 LoC)" is better.

Project Presentation

The presentation should last no more than 12 minutes, leaving 3 minutes for questions, for both one-person and group projects. Since talks are back-to-back, we will have to cut short talks that exceed their time allotment. For most speakers and slide styles, this translates to (at most) 7-8 slides, including the title slide. For group projects, you can either split the presentation or designate a single speaker. The former is preferred, to give everyone a chance to practice. Think of your talk as a "pitch talk", i.e., get the listener interested in your project. What problem did you tackle or what area did you investigate? Why is this interesting or important? What were your most surprising results? What approaches did not work well? Briefly, what would be the next steps? Please be sure to practice your talk so that you are sure of timing, content and hand-offs.

See the talk hints; the Writing technical articles page also links to materials related to talks.

Project Final Report (= Research Paper)

Project reports are typically 3,000 to 5,000 words per project team member, i.e., 6,000 to 10,000 words for a two-person project.

Papers should be single-spaced, 11 or 12 pt font, and should conform to the recommendations on writing style and avoid common mistakes. You can include any extensive graphs or tables as appendices if needed. Use of the IEEE templates is strongly suggested. The structure in this guidance page should be followed, although it is somewhat less applicable for analysis and review projects, where a standard "term paper" format is called for. Please ignore the guidance about page limits and individual project reports - the guidance is from a different university.

Experiments and Implementation

Some, but not all, of the projects below require computer networking background, e.g., from a class like CSEE 4119.

Privacy for the Internet of Things or smart TVs:
What kind of data do Internet of Things devices, smart TVs or video devices (Roku, Apple TV, Google TV) exchange with the outside world? Who do they "talk" to? What countries does your data "visit"? The project requires knowledge of Wireshark or scapy.
Measuring Internet port blocking:
Most consumer Internet services block some Internet ports, sometimes for historical reasons, sometimes for security reasons and sometimes for reasons that are less obvious. Develop a tool that allows a user to test which UDP and TCP ports can be used for both incoming and outgoing packets, and whether other IP features such as IP options and IPv6 are usable.
Video quality:
How do bandwidth and packet loss affect the video quality of streaming and interactive applications such as Zoom, WebRTC applications, YouTube, Netflix, or TikTok? Consider using a network emulator to simulate various network conditions.
Finding Internet bottlenecks:
When Internet applications suffer from performance problems, it is often difficult to tell whether the problem lies in the home (Wi-Fi) network, in the first-hop access network (e.g., the shared cable network), in the middle-mile network, in the Internet backbone or at the server or CDN. Develop a tool for either a desktop or mobile OS to estimate where the performance problem most likely lies.
Wi-Fi performance:
It is not uncommon for Wi-Fi to be slower than LTE. Map the performance of Wi-Fi (e.g., the Columbia Wi-Fi network) vs. LTE in a geographic area, including indoors, e.g., using the FCC mobile measurement application.
Wi-Fi performance at home:
Characterize the performance of your Wi-Fi router (single or mesh) as a function of distance, time-of-day, interference (e.g., a second router) and other factors. (You do not need the highest speed service - you can use a local test server.)
Wi-Fi congestion:
Measure Wi-Fi spectrum usage in the 2.4, 5 and 6 GHz bands in various locations in New York City (or wherever you may want to travel...), both indoors and outdoors. How many stations are visible on which frequency channels? Where are publicly accessible access points, such as "Cable Wi-Fi," visible and accessible? Measure the impairment due to interference, i.e., how much lower the throughput between a mobile device and the base station is compared to a "silent" radio environment.
Internet speed tests:
There are a large number of speed tests available, including from the FCC (Measuring Broadband America), Ookla (speedtest.net), M-Lab, Netflix, and Google. For the same network, they often provide very different results. Compare these approaches and analyze how they measure speed and latency. Can you explain the differences?
Location measurements:
What is the reliability of handset-provided geographic location data? Build a tool that allows users to indicate their true location based on a map and compare it to the location provided by GPS, Wi-Fi or cellular tower data. Explore the reliability systematically, both outdoors and within a building.
Indoor positioning:
Can you determine the room (apartment, office, ...) you are in by comparing Wi-Fi "fingerprints"? Can you apply machine learning techniques to the task?
Altitude information:
Many modern smartphones have built-in altimeters based on barometric pressure. Altitude (elevation) information can be very useful to dispatch first responders after a 911 call. Conduct experiments that allow you to evaluate the accuracy of this data in various buildings.
Speech-to-text and AI-based meeting summary:
Using recordings, or live participation by citizen reporters, can we automatically transcribe local government or (FCC) regulatory meetings and provide structured summaries (e.g., aligned to agendas) to augment reporting by local journalists (who may no longer be able to cover every borough or county meeting)?
Ad tracking and cookie permissions:
Many websites allow you to choose whether to accept cookies or select among categories of cookies. Determine which cookies are affected. Does the loading speed or data volume of the website change?
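For the indoor positioning project, a natural baseline before trying more elaborate machine learning is k-nearest-neighbor matching: record labeled Wi-Fi scans in each room, then assign a new scan to the room whose recorded fingerprints it most resembles. The sketch below assumes each scan has already been collected as a mapping from access-point identifier (e.g., BSSID) to signal strength in dBm; the names `distance` and `locate`, the Euclidean metric, and the -100 dBm floor for access points not heard in a scan are illustrative assumptions, not part of the project description.

```python
from collections import Counter
from math import sqrt

# A fingerprint maps an access-point identifier (e.g., BSSID) to RSSI in dBm.
Fingerprint = dict[str, float]

# Assumed RSSI value for an access point that does not appear in a scan.
MISSING_RSSI = -100.0


def distance(a: Fingerprint, b: Fingerprint) -> float:
    """Euclidean distance over the union of access points seen in either scan."""
    aps = set(a) | set(b)
    return sqrt(sum((a.get(ap, MISSING_RSSI) - b.get(ap, MISSING_RSSI)) ** 2
                    for ap in aps))


def locate(scan: Fingerprint, survey: list[tuple[str, Fingerprint]], k: int = 3) -> str:
    """Return the room label that is most common among the k closest survey scans.

    Ties are broken in favor of the room whose fingerprint is closest, because
    Counter preserves insertion order and `nearest` is sorted by distance.
    """
    nearest = sorted(survey, key=lambda entry: distance(scan, entry[1]))[:k]
    votes = Counter(room for room, _ in nearest)
    return votes.most_common(1)[0][0]


if __name__ == "__main__":
    # Hypothetical survey values for illustration only, not measurements.
    survey = [
        ("office", {"ap1": -40.0, "ap2": -70.0}),
        ("office", {"ap1": -42.0, "ap2": -68.0}),
        ("kitchen", {"ap1": -75.0, "ap2": -45.0}),
        ("kitchen", {"ap1": -72.0, "ap2": -48.0}),
    ]
    print(locate({"ap1": -41.0, "ap2": -69.0}, survey))  # office
```

With real scans, RSSI fluctuates considerably over time and between devices, so a project would likely compare this baseline against averaged fingerprints or a library classifier such as scikit-learn's `KNeighborsClassifier`.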

Data Analysis

The projects below can use a variety of data analysis techniques, from SQL to statistics packages to ML, often in combination. For all projects, the instructor has pointers to data sources.

BEAD challenge process:
The Broadband Equity, Access, and Deployment (BEAD) program aims to deploy high-speed Internet to every location that is currently unserved or underserved. The BEAD challenge process aims to determine whether the public maps are accurate. Analyze the BEAD challenge process data to find out what kinds of corrections were made during the process, who participated and how different states handled the process.
Broadband deployment:
The FCC collects broadband deployment data through its Broadband Data Collection (BDC). Analyze which technologies were deployed where and which technologies disappeared. How does the data compare to the older Form 477 data?
Broadband subsidies:
In the United States, both the Federal government and states subsidize broadband and communication services (mobile phones, mainly). Who benefits - consider rural vs. urban, richer vs. poorer areas, "red" vs. "blue" areas, using data provided by the FCC, Census data and other sources.
Broadband pricing, broadband label:
Using the FCC broadband labels, collect pricing data for both promotional and long-term pricing. Does the price change where there is more competition? What other factors (e.g., rurality) explain pricing differences? Does price change linearly with speed, or does it follow some other relationship?
Peering:
Using routing and peering data, characterize peering relationships between carriers, content providers and CDNs. Who peers with whom? Under what conditions?
Broadband metrics:
The FCC now gathers a range of broadband performance indicators that are highly correlated, e.g., as part of the Measuring Broadband America data set. What is their relationship with each other? Which of these are independent or dependent variables?
Consumer expenditures:
Gather all available data on consumer expenditures for telephone, cellular and Internet services, comparing government data, industry analysis and corporate annual reports. (The BLS consumer expenditures survey provides some information, but may not map cleanly into current categories.) Is the data consistent? Can it be compared against other major OECD economies? How have expenditures changed?
Rural electric cooperatives:
Analyze the service territories of rural electric cooperatives. Using the FCC Form 477 data, how good (or bad) is broadband connectivity in those areas? Has it changed recently?
Network reliability:
Can you determine network outages, both "sunny day" and "rainy day", from the FCC Measuring Broadband America or ATLAS measurement infrastructure data?

Literature Review and Analysis

The projects below summarize key resources in the topic area. They may involve data, but are likely to require smaller volumes of data and less advanced statistics. They may also draw on interviews you conduct with domain experts.

BEAD policies:
The Broadband Equity, Access, and Deployment (BEAD) program aims to deploy high-speed Internet to all unserved and underserved locations. Analyze how different states and territories approached key facets, such as the low-cost option, the challenge process and subgrantee selection.
Mobile broadband and financial inclusion:
Explore the impact of mobile broadband accessibility on financial inclusion in the Global South. This project should entail a comparative analysis of mobile vs. fixed broadband adoption among unbanked or underbanked populations, with emphasis on pricing, customer experience, and service quality. (It will likely combine a literature survey, data analysis, and interviews.) Consider countries where mobile providers also serve as "banks" (e.g., M-Pesa). See also resources from J-PAL and a related study.
Content moderation and amplification for social media:
Survey the literature on content moderation for social media - current approaches, tools, effectiveness, transparency, requirements (e.g., in Europe).
Content moderation for discussion forums:
What kind of discussion forums, ranking and content moderation do national and local news sites employ? Is there a way to measure the quality of the discussion? Consider contacting newspaper staff to gather their experiences.
Digital "papers":
Read Carpenter v. United States, United States v. Jones, Riley v. California and possibly some lower-court cases, and summarize how courts are handling search warrants for digital "papers". How has their treatment changed? How do these decisions reflect (or not) the differences between traditional and digital letters and other personal documents?
Transparency report:
Do a survey and data analysis of tech companies' transparency reports. What do they cover? How do they differ in categories and geographic detail? Do they indicate what they do not disclose? Can you design a template similar to, say, a 10-Q disclosure?
Media:
For different TV and radio stations (e.g., in the NYC area), determine their programming mix, e.g., children's programming, local news, advertisements, syndicated programming, ...
Data portability:
For major consumer services for photos, messages, social media posts, address books, and email (e.g., various Google services, Facebook, Instagram, TikTok, WhatsApp, Yahoo Mail, Apple Photos and email), can you extract your data, e.g., to move to a new service? How long does it take? How useful is the data you can extract? Can you import the data (e.g., email or photos) to another service? Are there tools to help?
Ad blocking:
Among popular websites, e.g., for news, which function well with ad blockers and which fail or explicitly refuse to provide content? How effective is ad blocking?
Rural broadband:
Analyze the cost of deploying fiber in rural areas. What are the cost components, such as planning, fiber, electronics and construction? How does take-up affect cost and viability? What are financing models?
Cost of Internet access:
Using bills gathered from (Facebook, LinkedIn, real-life) friends and family, try to evaluate the typical cost structure of Internet and phone service. How much variation is there for similar services? How does this compare to the advertised rates?
Communication networks during natural disasters:
Using interviews with residents and public safety officials, as well as various data sources, describe how well various communication facilities held up during Hurricanes Harvey and Irma, including land mobile radio ("walkie-talkies"), cellular, landline and Internet access.
Spectrum usage:
Analyze what spectrum is used for, by whom and where, comparing use for categories such as broadcast, communication and non-communication (radar, medical, industrial) applications.
Spectral efficiency:
Compare the spectral efficiency of FM radio, digital over-the-air (ATSC 3) TV, land-mobile radio and cellular systems. Consider the encoding of information, the air interface, and how many bits of content are delivered to users, or how much spectrum it would take to replace a traditional service such as radio or TV with a cellular service. Note that there is no single definition of spectral efficiency, so the project should consider existing definitions in the literature and justify choices.
TV stations:
Investigate whether one could put all TV stations on cable or satellite, either generally or in more rural areas. How many stations are must-carry vs. retransmission consent? What would be the costs, potential sources of revenue and benefits?
Cybersecurity:
What are the principal causes of cybersecurity problems? Is there quantitative evidence? What remedies are likely to reduce the frequency or impact of such events? (Cite research to support your arguments.)