TCP Friendliness
There are mainly two transport layer protocols in use in the Internet today: TCP and UDP. Of these, TCP has built-in congestion control, while UDP does not. This implies that TCP reduces its sending rate when it has to compete with other connections for the bandwidth on the bottleneck link, regardless of what the application wants, while UDP sends packets as quickly as the application requests. This behaviour of TCP is called TCP-fairness: two TCP connections between the exact same hosts will on average achieve identical throughput.
A growing share of multimedia traffic leads to increased use of other transport protocols, which usually means UDP. Without rules, these multimedia applications can consume so much network bandwidth that TCP connections are unfairly penalized. Therefore, the idea of TCP friendliness was introduced: a data flow between two machines is considered TCP-friendly if a TCP connection between those two machines would achieve the same throughput on average.
Several TCP-friendly mechanisms have been designed that can be implemented by an application (or by a new transport protocol, for that matter). Widmer, Denda and Mauve wrote a survey of these protocols; you find the article here in IEEE's digital library.
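Most of these mechanisms bound their sending rate by a model of TCP's throughput. As an illustration, here is a minimal Python sketch of the simplified TCP throughput equation that TFRC builds on (the simplified form from the TFRC specification; the sample values below are made up):

    from math import sqrt

    def tfrc_rate(s, rtt, p, b=1):
        """TCP-friendly sending rate in bytes per second.

        s: packet size in bytes, rtt: round-trip time in seconds,
        p: loss event rate (0 < p <= 1), b: packets acknowledged per ACK.
        """
        t_rto = 4 * rtt  # common simplification for the retransmission timeout
        denom = (rtt * sqrt(2 * b * p / 3)
                 + t_rto * 3 * sqrt(3 * b * p / 8) * p * (1 + 32 * p ** 2))
        return s / denom

    # Example: 1460-byte packets, 100 ms RTT, 1% loss events.
    print(tfrc_rate(s=1460, rtt=0.1, p=0.01))  # roughly what TCP would get

A TCP-friendly sender measures rtt and p continuously and never sends faster than this bound.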
Assignment:
- Implement the algorithms of TFRC, pgmcc, TFRP and Rainbow. This should be done in user space on top of UDP. You will have to read the papers referenced in the text to do this appropriately.
- Use three computers (sender, network emulator, receiver) to test the transmission of a very large file from sender to receiver under several network conditions (different latencies, different packet loss rates).
- If that step has been completed successfully, replace random packet loss with the emulation of cross traffic. A possible traffic generator is 'tg', which we can supply. You can also search for another one.
Student group: Goran Karabeg, Mariam Pervez
New TCP implementations
TCP is a rather old transport protocol, but its congestion control mechanism is the fairness reference in the Internet today. TCP implements congestion control using the so-called AIMD (additive increase, multiplicative decrease) approach, implemented in such a way that the sending rate of a connection is halved if at least one packet was lost during one round-trip time of that connection. This leads to very unstable sending rates, especially when distances are long and sending rates are high. New TCP implementations and TCP competitors try to avoid this instability while staying fair.
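As a minimal sketch of the AIMD behaviour described above (window measured in packets; the loss signal is assumed to come from the ACK stream):

    def aimd_update(cwnd, loss_in_last_rtt):
        """One AIMD step per round-trip time, window measured in packets."""
        if loss_in_last_rtt:
            return max(1.0, cwnd / 2)  # multiplicative decrease: halve the rate
        return cwnd + 1.0              # additive increase: one packet per RTT

    # A single lossy RTT halves the window; regrowing it takes one RTT per
    # packet, which is what makes long fat pipes so unstable:
    cwnd = 64.0
    for loss in (False, False, True, False):
        cwnd = aimd_update(cwnd, loss)
        print(cwnd)                    # 65.0, 66.0, 33.0, 34.0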
- HSTCP (HighSpeed TCP) for Large Congestion Windows behaves like regular TCP in case of high or medium packet loss, but if there is hardly any packet loss, it will not reduce its rate as substantially as regular TCP when a loss occurs. A paper is here, an RFC here, and a web page with links to the actual implementation here.
- Scalable TCP changes the behaviour of TCP both in case of packet loss and in the absence of packet loss. An overview of Scalable TCP and links to the actual implementation are here.
- FAST TCP uses both queueing delay and packet loss as indications of congestion, while regular TCP uses only packet loss. Papers on the topic, as well as links to the actual implementation, are here.
- XCP (eXplicit Control Protocol) requires router support. Packets carry a congestion header that is used for communication among routers and end-systems. Katabi et al. wrote a paper about XCP that you find in ACM's digital library here. XCP is mentioned for your interest only, as it can't be compared in an INF5070 assignment.
Assignment:
- Download HSTCP, Scalable TCP and FAST TCP.
- Create kernels that support these TCP variations. It is rather likely that you will need one kernel per variation, since they modify code in the same places of the kernel.
- Use three computers (sender, network emulator, receiver) to test the transmission of very large files from sender to receiver under several network conditions (different latencies, different packet loss rates).
- If that step has been completed successfully, replace random packet loss with the emulation of cross traffic. A possible traffic generator is 'tg', which we can supply. You can also search for another one.
- The additional paper for presentation is the one on XCP that is referred to above.
Student group: Bjørn Olav Ruud, Ivar Stein Rasmussen, Espen Nilsen
Network emulators
Quantitative investigations in distributed systems can be performed in several ways, such as analysis, simulation, emulation, or field trials. In the case of emulation, large parts of a system are actually implemented, but some parts of the whole are simulated in real time. Network emulation is used rather often for testing higher-layer protocols.
Probably the best-known tools for network emulation are Dummynet and NistNet. Recently, the network simulator ns-2 has been extended to support passing on packets from externally connected hosts, thereby giving it emulation capability as well.
- Dummynet is described in a paper that is available from the ACM digital library here. Dummynet has been included in FreeBSD since version 3.4. A short overview of using it in FreeBSD is given here. A single diskette image can be downloaded from here.
- NistNet is described in a paper that is available from the ACM digital library here. NistNet runs on Linux and is available from here.
- The network simulator ns-2 is available from here. A note concerning its emulation capabilities can be found here. These capabilities have been developed for FreeBSD, and even though Linux code exists, it doesn't really seem to work.
Assignment:
- Use three computers (sender, network emulator, receiver) to test the various network emulators.
- Perform several experiments with the three emulators.
- Find out how they handle packet loss when you set random loss in the parameters (a probe sketch follows after this list).
- Find out how they handle network delay when you specify it (NistNet is said to introduce higher delays than you actually specify).
- Find out which network modeling features they have.
- Synchronize the clocks of sender and receiver computers before and after each experiment. An experiment is invalid if the clock drift has become too high.
- The additional paper is "EmuNET: a real-time network emulator", found here
Student group: Jarle Søberg, Martin N. Nielsen, Øyvind Stensby
Comparison of Linux schedulers
In Linux 2.6, the schedulers in various places of the OS are exchangeable: the CPU scheduler, the disk scheduler, and the network scheduler. Each has a default behaviour that can be changed if desirable.
Each of these schedulers has an effect only if the system is loaded with a mixed workload. To compare the schedulers, it is therefore necessary to first define an appropriate set of workload mixes, and then measure the system's scalability with respect to each mix.
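For example, the disk (I/O) scheduler in Linux 2.6 can be inspected and switched per device through sysfs; a small sketch (the device name is an example, and writing requires root):

    DEV = "sda"                 # example device
    path = "/sys/block/%s/queue/scheduler" % DEV

    # The current scheduler is shown in brackets,
    # e.g. "noop anticipatory deadline [cfq]".
    print(open(path).read().strip())

    # Switch to the deadline scheduler (root required).
    open(path, "w").write("deadline")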
Filesystem benchmarking
The performance of file systems has a large influence on the scalability of on-demand multimedia applications. These applications tend to deliver considerable amounts of data. Compared to High Performance Computing applications, the amount of data may not be extremely high, but multimedia servers have to serve more than one client at a time, and data has to be delivered to every one of them with timing restrictions.
It is therefore important that file systems for multimedia servers manage to handle large files, scale with respect to the number of concurrent accesses to a single file and to several files, and deliver data within timing bounds. The latter will usually be implemented through admission control, but either the OS or the application can implement that.
Two years ago, the two file systems XFS and JFS for Linux 2.4 were the clear winners of a comparison of Linux file systems when it came to large files. Today, ReiserFS seems to have caught up, according to this webpage.
Determine appropriate benchmarks for choosing a file system for multimedia servers. Other operating systems should also be included in the comparison.
Assignment (general part):
- Use one computer for testing. The supplied computer has 2 Seagate X15 disks in an external cabinet. It also has an IDE disk for the operating system, so the SCSI disks can be used for the benchmarks alone.
- Use Linux 2.6 kernels with the appropriate extensions. Consider modern Linux filesystems, including ReiserFS 4, Ext2, Ext3, JFS and XFS.
- When testing, use the same physical space on the disks for the various tests even though that means frequent reformatting.
- Run all tests with freshly formatted disks, with mild fragmentation, and with heavy fragmentation. Scripts or tools used for dirtying the disks should be shared between the two groups.
- Don't bother with networking experiments. Keep only statistics of the data that was read, and write junk to disk, either by copying in a loop from a RAM disk or by generating data randomly (a sketch follows below).
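A minimal sketch of such a junk-writing benchmark (path and sizes are examples); reads can be timed the same way, as long as the file is larger than RAM so the page cache doesn't falsify the numbers:

    import os, time

    BLOCK = 1 << 20                            # 1 MiB

    def write_junk(path, size_mb):
        junk = os.urandom(BLOCK)               # randomly generated data
        t0 = time.time()
        with open(path, "wb") as f:
            for _ in range(size_mb):
                f.write(junk)
            f.flush()
            os.fsync(f.fileno())               # force the data onto the disk
        return size_mb / (time.time() - t0)    # MiB/s

    def read_junk(path):
        t0, n = time.time(), 0
        with open(path, "rb") as f:
            chunk = f.read(BLOCK)
            while chunk:                       # keep statistics only, discard data
                n += len(chunk)
                chunk = f.read(BLOCK)
        return n / float(BLOCK) / (time.time() - t0)

    print("write MiB/s:", write_junk("/mnt/test/junk", 8192))   # 8 GiB > 4 GB
    print("read  MiB/s:", read_junk("/mnt/test/junk"))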
Assignment A:
- This test aims at the raw throughput of the file system implementation. This is an important factor for classical on-demand streaming media systems.
- Test reading and writing of different file sizes, including files that exceed 4 GB in size.
- Investigate the top speed for reading and writing of single files (no competition).
- Investigate concurrent reading from several large files, and concurrent writing to several large files.
- Test with and without a logical volume manager (either LVM2 or EVMS).
- The additional paper is about the Zebra file system, which performs operations similar to journaling. The paper is found here
Assignment B:
- This test aims at the throughput that can be achieved with many small files that are used together in an application. Look also at file system space waste when many small files are used or when many directories are used.
- Test reading and writing many small files to and from the disk concurrently. Make scripts/test programs that handle file reads and writes in repeating patterns (a sketch follows after this list).
- Try both the situation where all files are placed in the same, large directory, and the situation where files are placed in deep directory trees for structure.
- Some of the file systems are highly configurable (journaling, synchronous reading and writing, buffer sizes, "notail", "ordered", "writeback"); take these options into account when testing.
- The additional paper is "The Design and Implementation of a Log-Structured File System", found here
Student Group 1: Erling Ringen Elvsrud, Espen Ramton, Petter Mosebekk
Student Group 2: Alex Bai, Håkon Stensland, Frank Petter Larsen
Helix vs. Darwin
Helix is the open source version of RealNetworks' streaming server. It can be found here
Darwin is the open source version of Apple's streaming server. It can be found through this or this link.
Both servers are capable of supporting streaming via RTSP and RTP. There are interesting speed differences in the communication between various servers and players, and servers and players from the same vendor do not necessarily achieve the fastest communication. Here, we would like to find out where in the code the servers spend their time. Which of the servers uses fewer resources and would scale better than the other?
Assignment A:
- This test aims at a performance comparison of the servers themselves, under good network conditions.
- Insert probes into the servers' code to determine the time that they spend in the several phases of streaming (connection negotiation, lookup, setup of the data path, sending out data packets, ...). A client-side timing sketch follows after this list.
- Test several clients and encodings, but include standards-based protocols (that use RTSP and RTP) and content (such as ISO MPEG-4 as opposed to Microsoft MPEG-4).
- Examine the effect of concurrent clients.
- The additional paper is "Overview of fine granularity scalability in MPEG-4 video standard", found here
Assignment B:
- This test aims at the adaptability of the streaming servers to varying network conditions (or rather at examining whether the versions that are made available as open source offer adaptability at all).
- Try various clients, encoding formats and file formats.
- Examine whether and how the servers adapt their choice of delivered content and protocol to the network quality that clients report at first contact. Is there any additional checking of the quality of the connection?
- Check whether and how the servers adapt to changes of the connection quality while streams are being delivered.
- Examine also the effects of very long latency, strongly varying latency and packet loss.
Student group A: Andreas Viik, Kim Larsen, Cuong Huu Truong
Student group B: Wladimir Palant, Vidar Opseth
Aggregating game flows
Communication in networked computer games tends to be bursty. One way to deal with burstiness is overprovisioning: allocate enough network resources to accommodate the peak rate. However, overprovisioning is inefficient and may not be an option in some cases. Busse et al. propose a method to aggregate multiple game flows by adapting a model from queueing theory, and evaluate this method with a simulation using traffic data from the game GPL Arcade Volleyball. The idea is that the aggregate flow is less bursty than an individual game flow. Reservation can then be performed for the aggregate. The paper presents an admission control algorithm which determines, given the resources available to the aggregate flow, whether an additional individual flow can be accepted.
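For orientation, a generic admission-control skeleton of this kind is sketched below. It is not the algorithm from the paper: the effective_rate() placeholder stands in for the queueing-theory model that Busse et al. derive, and all rates are invented.

    def effective_rate(mean_rate, peak_rate, alpha=0.5):
        # Placeholder between mean and peak rate; the paper's model replaces this.
        return mean_rate + alpha * (peak_rate - mean_rate)

    class Aggregate:
        """Accept a new flow only if the summed effective rates of all
        flows stay within the capacity reserved for the aggregate."""

        def __init__(self, capacity):
            self.capacity = capacity           # reserved bandwidth in bit/s
            self.flows = []

        def admit(self, mean_rate, peak_rate):
            demand = sum(effective_rate(m, p) for m, p in self.flows)
            if demand + effective_rate(mean_rate, peak_rate) <= self.capacity:
                self.flows.append((mean_rate, peak_rate))
                return True
            return False

    agg = Aggregate(capacity=1000000)          # 1 Mbit/s reserved (example)
    print(agg.admit(mean_rate=40000, peak_rate=200000))   # True until full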
Assignment:
- Implement the algorithm presented in the paper
- Obtain the traffic data from the authors and reproduce their experimental results
- Time permitting, repeat the experiment using different data
Student group: Chris Majewski
Network monitoring methods
Network monitoring is a rather important task in distribution systems that can choose among several best-effort paths through a network from a sender to a receiver. It is useful both for live broadcast and for on-demand applications. Usually, the availability of several best-effort paths between sender and receivers is achieved by installing proxy servers at several locations and combining them into an overlay network. Alternatively, it can be achieved by using a peer-to-peer approach. When this is done, the sender can send directly to the receiver, or it can send to a proxy, which forwards ("reflects") to yet another proxy, or to the receiver.
For multicast applications, the usefulness of this can be explained easily, because IP multicast doesn't really work. It is slightly more difficult to explain for unicast communication. Since both proxy servers and peer-to-peer hosts are unlikely to be on the direct path between sender and receiver, it is not immediately clear why this should be relevant for unicast communication. However, routes are usually chosen based on link delay, with AS policy coming into play as well. Distribution systems have to maintain a certain bandwidth as well, so searching for a path from server to client that provides the required bandwidth is a task that can be addressed by an overlay network. Furthermore, distribution systems deliver to more than one receiver at a time, so several streams from the same server can compete for the same network resources if they take partially the same route. By routing through an overlay, this issue can be alleviated as well.
As a precondition for choosing such routes, it is necessary to measure the quality of the path between a pair of nodes that are in most cases not directly connected to each other.
- The packet train technique was investigated in several papers. Pathrate and BProbe are used for capacity estimation (a packet-pair sketch follows after this list).
- Pathload, PathChirp and PTR use packet trains for available bandwidth estimation.
- Netperf and TTCP measure the throughput that TCP would achieve.
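For illustration, a bare-bones packet-pair sketch of the capacity estimation idea: the sender emits packet pairs back to back, and the receiver estimates the bottleneck capacity from the smallest dispersion (inter-arrival gap) it observes. Addresses and counts are examples, and user-level timestamps are coarse compared to what the real tools do:

    import socket, time

    SIZE, PAIRS = 1500, 50
    DEST = ("10.0.0.2", 9000)       # example address behind the path under test

    def send_pairs():
        s = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
        payload = b"x" * SIZE
        for _ in range(PAIRS):
            s.sendto(payload, DEST)            # two packets back to back
            s.sendto(payload, DEST)
            time.sleep(0.05)                   # keep the pairs independent

    def receive_pairs(port=9000):
        s = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
        s.bind(("", port))
        s.settimeout(2.0)
        gaps = []
        try:
            while True:
                s.recvfrom(SIZE + 64)
                t1 = time.time()
                s.recvfrom(SIZE + 64)
                gaps.append(time.time() - t1)  # dispersion of one pair
        except socket.timeout:
            pass
        capacity = SIZE * 8 / min(gaps)        # C = packet size / min dispersion
        print("capacity estimate: %.2f Mbit/s" % (capacity / 1e6))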
Assignment:
- Look at the various measurement techniques mentioned above. Make one UDP implementation that can run the same tests as the capacity estimation techniques as well as the available bandwidth estimation techniques.
- Run tests over a network emulator. What do the various algorithms report when you have artificial packet loss? What do they report when you have cross traffic from a traffic generator?
- Run tests between various hosts in the Internet (e.g. IFI to Simula, IFI to modem at home, ...)
Student group: Sharath Babu Musunoori
Security & streaming
Assignment:
- Find various recent ciphers: 1) block ciphers, e.g. Rijndael (or AES), Twofish, Blowfish, RC6, Serpent, and 2) stream ciphers, e.g. RC4, SEAL, PIKE, Helix
- Make loopback tests with and without packet loss. Which cipher can recover from packet loss? How much data is lost when one bit, byte or packet is lost? (A sketch follows after this list.)
- Compare the raw performance of the stream ciphers.
- Compare the performance when several ciphers run concurrently (so that CPU cache flushes occur). Avoid swapping.
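To make the loss question concrete, the sketch below implements RC4 (simple enough to write out) and shows what happens when a packet disappears from a naive continuous keystream: every later packet decrypts to garbage, because the sender and receiver keystreams lose synchronization. Recovering from loss therefore requires re-keying or a per-packet keystream offset:

    def rc4(key):
        """Generate the RC4 keystream byte by byte."""
        S = list(range(256))
        j = 0
        for i in range(256):                   # key-scheduling algorithm
            j = (j + S[i] + key[i % len(key)]) % 256
            S[i], S[j] = S[j], S[i]
        i = j = 0
        while True:                            # pseudo-random generation
            i = (i + 1) % 256
            j = (j + S[i]) % 256
            S[i], S[j] = S[j], S[i]
            yield S[(S[i] + S[j]) % 256]

    def xor(data, ks):
        return bytes(b ^ next(ks) for b in data)

    packets = [("packet-%d" % n).encode() for n in range(5)]
    tx = rc4(b"secret key")
    wire = [xor(p, tx) for p in packets]       # one continuous keystream

    rx = rc4(b"secret key")
    for c in wire[:2] + wire[3:]:              # packet 2 is lost in transit
        print(xor(c, rx))                      # garbage from the loss onwards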
Student group: Thomas Kvalvåg