Task 1 - Java RMI

This task aims at basic understanding of distributed objects and remote procedure invocation (RPC).

Modify a simple provided application, which works with a graph data structure, to make use of remote objects. Measure and analyze how choices in working with distributed objects can affect the performance of the application.

Due to update to the task, the standard deadline is extended by one week (23th March)

Preparation

Download the example and local implementation from: https://d3s.mff.cuni.cz/teaching/nswi080/labs/

The task requires understanding of the following:

Definition of a remotely accessible interface
(interface java.rmi.Remote, exception java.rmi.RemoteException).
Implementation of a remotely accessible object
(class java.rmi.server.UnicastRemoteObject, inheriting from this class, method exportObject)
Connecting the client and server using the RMI registry
(class java.rmi.Naming, the rmiregistry application)
Creating charts with data from CSV files in a Jupyter notebook

Part 1 - Explore the provided code

The task is based on computing distances between pairs of nodes in a graph, simulating work with dynamic data structures.

The provided Java application consists of these parts:

Definition of a graph data structure.
Implementation of an algorithm to measure distance between nodes.
A benchmark application, which:
- Generates a random graph.
- Runs multiple searches between random nodes and measures the time each search takes.

Graph data structure

The node objects are instances of the NodeImpl class, and implement the interface Node:

public interface Node {
    Set<Node> getNeighbors();
    void addNeighbor(Node neighbor);
}

The addNeighbor(Node) method adds the argument node to the set of neighbors of the receiver node object. This method is used when generating a graph. The getNeighbors() method returns a set of all neighbors of the receiver node and is used by the distance computation algorithm.

Graph data structure

The actual distance computation is done through a Searcher interface:

public interface Searcher {
    public static final int DISTANCE_INFINITE = -1;
    public int getDistance(Node from, Node to);
}

This interface is implemented by the SearcherImpl class. The getDistance(Node, Node) method computes and returns the distance between nodes from and to. It uses a simple breadth-first-search algorithm to measure the distance between two nodes in the graph. If to is not reachable from from, it returns DISTANCE_INFINITE.

There is a variant of this method called getDistanceTransitive, which uses a modified algorithm described later.

Benchmark

The main method is in the Main class. First, it generates a graph with a fixed number of nodes and randomly generated edges. Then, it runs a benchmark, selecting random pairs of nodes in the graph, measuring the distance between them and showing the time.

Your tasks are

Convert the program into an RMI client-server application, implement several variants (local measurement, remote searcher, remote nodes, both remote, transitive algorithm) and measure their performance according to the following steps.

Notes:

The program should allow doing all measurements without needing to recompile.
When comparing different approaches on randomly generated graphs, use the same graphs. You can measure multiple variants within one run, but if you need to run your application multiple times to get all measurements, then use a fixed random seed.

1. Local measurement

Explore the provided implementation of the task that works locally.

Measure the speed of execution on several randomly generated graphs of different sizes (this is just the first of 5 variants).

Use the code provided in the Jupyter notebook to create a chart, which shows how the time of the query depends on the distance between the nodes and on the density of the graph.

UPDATED: Sizes of the graph

To see the differences between the five variants, it is best to try increasing the number of nodes and look separately at dense and sparse graphs. The following parameters are the recommended graph sizes. You can choose different parameters, as long as the differences between the variants are apparent in the results.

For sparse graphs, generate graphs with number of nodes ranging from 100 to 1000 in steps of 50, with number of edges being three times the number of nodes. For each size, run at least 10 searches.
For dense graphs, generate graphs with number of nodes ranging from 100 to 700 in steps of 50, with number of edges being a hundred times the number of nodes. For each size, run 10 searches.

Notes:

The provided implementation formats the output for viewing in the terminal. Change the output format to separate fields by commas instead of spaces to be able to read it in the Jupyter notebook.

2. Remote Searcher

Create a server that provides a remotely accessible object with the Searcher interface. Update the Searcher interface so that it can be used with Java RMI. Extend the provided benchmark implementation to measure searching using the remote searcher object through the Searcher remote interface. The server object with the Searcher interface should accept node objects from the client. The nodes implement Node interface, and must not be remotely accessible.

Measure the speed of the implemented variants and show how the times depend on the distance and density (sparse/dense).

Question: How does the server Searcher object access the Node objects and the set of their neighbors? What parts of RMI are involved in this process?

Notes:

The application will now consist of a client and a server. The client will look up a reference to the remote Searcher at startup using java.rmi.Naming.lookup(path).
Keep the code of both the server and the client in the same directory (no need for projects, packages). We want to avoid setting permissions for codebase etc.
Make sure the client application does not export any objects
- The client should terminate normally after doing the task. Do not use System.exit(0).
Use launcher scripts from the Example.
Run the rmiregistry application in background.
- If port is in use, choose a different port number (> 1024)
  - Edit path in calls to [re]bind() and lookup()
  - localhost becomes localhost:1234.
Modify the Searcher interface (see Example), using the java.rmi.Remote interface and the exception class java.rmi.RemoteException.
The remotely accessible object (see ExampleImpl) must be exported. There are 2 ways:
- Derive from java.rmi.server.UnicastRemoteObject.
- Call UnicastRemoteObject.exportObject(obj) manually.
You can use the same class to implement the local searcher on the client and the remote one on the server. But make sure that the client does not export any local objects.
Keep the possibility to measure searches using the local searcher. You can simply use both Searcher implementations after each other on the same pair of nodes in the graph to measure both variants. Add a call to remote Searcher.getDistance() with local Node objects to the searchBenchmark() method.
Just print an additional row or add a column to results in searchBenchmark().
You may encounter a StackOverflowError when using a large graph. In that case, you can increase the stack size limit using the -Xss option.

3. Remote Nodes

Update the server to provide remotely accessible objects with the Node interface that would be created upon client request. Update the provided implementation to allow computation of distance in the graph using a local Searcher working with server Node objects along the existing functionality.

To create and return Node instances for client requests, define and implement a new NodeFactory interface with method createNode().

Measure the speed again and show how the times depend on the distance and density (sparse/dense).

Question: How does the local Searcher object access the server Node objects? What exactly does the NodeFactory return to the client from createNode? What parts of RMI are involved in this process?

Notes:

Modify the interface Node to support RMI (like Searcher in the previous step).
NodeFactory is designed similarly to Searcher – it has an interface with RMI, an implementing class, create and call Naming.bind inside the existing server.
Client gets the reference to the NodeFactory using lookup, then it creates the remote Node objects together with the local graph.
The remote graph is just another array Node[] on the client, so it is easy to use both the local and remote graph.
Do not forget that it is necessary to measure the same graphs (local and remote ones) to get a relevant comparison. When generating the edges, add the same edges to both graphs.
Make sure you do not break how the previous variants work.
Do not create a standalone server, we want just one server for the next variant.

4. Remote Nodes and Searcher

Add another variant: compute the distance using the remote Searcher object on the server, to which you pass (from the client) the remote Node objects from the server graph.

Measure the speed again and show how the times depend on the distance and density (sparse/dense).

Question: How does the server Searcher access the server Node objects (on the same server)? What parts of RMI are involved in this process?

Notes:

Everything is ready, just add this variant to searchBenchmark() and compare the speed

5. Passing by Value vs. Passing by Reference

Results of the previous measurements in variants 2 and 3 help to distinguish when it is faster to pass dynamic data structures by value and when it is faster to pass them by reference. The previous variants in this assignemnt demonstrate this on extreme all-or-nothing cases when either everything is passed by value or everything is passed by reference. But often a combination of both is the right choice.

In the provided implementation of the Searcher interface, there is another method for computing the distance, getTransitiveDistance(int,Node,Node). This method in each step retrieves not only direct neighbors of a node, but a whole set of neighbors that are at most as far from the node as specified by the first argument.

This algorithm uses the getTransitiveNeighbors(int) method of the Node interface that returns all neighbors that are close enough -- i.e. that are accessible by at most n edges where n is an argument to that method.

Question: In which one of the four variants (both local, remote searcher, remote nodes, both remote) does this parameter have a significant effect on network traffic? Question: How does the parameter affect the amount of data passed through the network during execution of the alogrithm? Compare with the previous variants.

Measure this new variant - choose the best local/remote configuration and a reasonable value of n, which you expect to perform differently from the previous variants.

6. Impact of the Network

So far, client and server were running on the same machine, with the overhead of RMI communication, but no network latency.

Compare the speed of the five variants when both client and server are running on the same machine. Measure everything in one run to ease comparison. Plot the results into a chart with five data series corresponding to the five variants. You can make separate charts sparse and dense graphs.

Explain the cause of the differences in the times, based on the role of RMI in each variant.

Do the same for a situation when client and server are on different physical computers connected by a network. Test in a higher latency environment, e.g. between your computer and a computer in the school lab.

Compare the results of the two situations.

Explain the cause of the differences in the times, based on the role of RMI in each variant.

Notes:

Change paths in [re]bind() and lookup(), to the remote machine name instead of localhost. Ideally, use program argument to allow specifying the host name when starting the client.
Run rmiregistry and Server in SSH session on the remote machine.
Run the client locally.
Beware of CLASSPATH. Make sure the server has access to the required class files. You can simply compile the sources on the server.

Instructions

Submit by e-mail.
Zip file containing:
- A working implementation of the four variants
  - Include all required source code
  - Keep the implementation simple
  - Do not leave "commented-out" sections in the code without an explanation.
  - It shall be easy to start and shall work on the computer in the school labs
  - Use modified versions of the run-server, etc. scripts (no need for packages, project files, maven or ant scripts)
- Documentation
  - Answer all the questions from the assignment
  - Charts with measurement results
    - Explanation of the results
If something is unclear, ask on the mailing list.