Instructions


0. Web Application Information


* The sample graph displayed on the Run App page is not queryable and can't be used for anything.
* Elements that don't exist in queries will be ignored.
* The program will time-out after 1 minute due to limited server resources. The source code can be downloaded and used to run custom computations without a time-limit.
* If you receive an alert saying, "No paths were found with the selected parameters.", then either the vertices you are searching for don't exist in the dataset or were not found with the selected heuristic parameters. If you believe the vertices exist in the dataset, try re-running the query with an increased depth under basic parameters.
* If you receive an alert, "Invalid information was entered, please reformat and try again.", this means that one of the input parameters was entered incorrectly.
* Whenever possible use the right-click functionality to remove edges and nodes from the graph query, this will help prevent an invalid input alert.

1. General Parameters


Dataset Selection

Datasets can only be used in one of two formats:
(1) String based: Datasets containing strings can only contain the following characters: "()_-.a-zA-A".
(2) Integer based: Datasets containing integers can only characters "0-9".

There are 4 built-in datasets to choose: (1) DBLP, (2) DBLP w/ Names, (3) Youtube and (4) LiveJournal. Datasets (1), (3) and (4) contain only integers and therefore only integer values can be entered into the search fields. Dataset (2) contains only names and therefore accepts strings in place of numbers. Datasets (1), (3) and (4) are from the Stanford SNAP collection while (2) is a modified version of the information from the DBLP site, "dblp.uni-trier". In order to see datasets (1), (3) and (4) please visit either the Stanford SNAP website and download it. To view dataset (2), please download it from the following link or see a sample version on Github.

Alternatively, a dataset can be uploaded by the user if it is less than 10MB in size. If it is an integer dataset, it must be in this form:
3 (number of unique vertices)
2 (number of edges)
0 1 (undirected connection between two vertices)
1 3 (undirected connection between two vertices)

If the dataset contains strings instead of numbers, it must be in this form:
name1 name2
name1 name3
name4 name5
...


Node Removal

Nodes (vertices) can be removed from the graph when running a query. This can allow for situations when the user wants to avoid going through a particular node.

Up to five vertices may be removed from a query. Each removed node must be in the following form whether it is a number or a string:
vertex1, vertex2, vertex3,...


Edge Removal

Edges (connections) can be removed from the graph when running a query. This can allow for situations when the user wants to simulate a missing connection between two nodes.

Up to two connections may be removed from a query. Each removed edge must be in one of the following forms whether it is a number or a string:
vertex-vertex
vertex-vertex, vertex-vertex


Node Size

This parameter determines the size of the visualzed node. This parameter will dramatically change the way the graph looks. If a small node size is chosen then the user will need to zoom in to see the node text.


Color Scheme Selection

There are 3 types of nodes:
(1) A query vertex, which is provided by the user.
(2) A path vertex, which is part of a shortest path or minimum spanning tree but not a query vertex.
(3) A vertex which neighbors a path or query vertex.

Query vertices are colored red and path vertices are colored gold.

Neighboring vertices will take on a different color depending on how close they are to a path or query vertex.

Each of the 7 color schemes represents a different way of displaying the neighboring vertices. There are 4 colors in each color scheme and each color represents a distance away from the path and query vertices. The color at the far left is the closest to the path and query vertices while the color to the far right is the farthest away from them. Each color, from leff to right, signifies an additional degree of seperation away from the path and query vertices. Any vertex that is farther than four degrees of seperation will be the same color as the fourth degree of seperation.

There are also 2 edge schemes: (1) colored or (2) gray.


Prior Computation Reuse

This feature allows the user to re-query any results found from a dataset without having to run all the prior compuations again. This allows for a significant speed-up of algorithm run time at the cost of lost information. The only graph information that will be available for analysis by the algorithms is the visualized information. In addition, selecting this parameter overrides any chosen or uploaded network in favor of using the currently visualized graph.

Note: this does not apply to the initial graph visualization provided on the website.


Right Clicking Nodes and Edges

If a node is right clicked, a sub-menu will pop up giving the options to (1) add a node to the minimum spanning tree query vertices, (2) replace the k-simple shortest path start vertex, (3) replace the k-simple shortest path end vertex and (4) add a node to the list of nodes to be removed from the query.

If an edge is right clicked, a sub-menu will be displayed allowing the user to add the edge to the list of removed edges.




2. K-Simple Shortest Paths


Basic Controls

Start Vertex/End Vertex: these vertices are the start and end points for the paths

Depth to Search: this controls the Search Space 1 parameter, a higher depth means that the program looks deeper into the network for potential connections/paths. This higher the depth, the greater chance there is for a higher path count; the downside is that a higher depth also means a slower run time. The second search space reduction and the community identification depth to search default to a value of one.

Number of Paths: this determines the number of paths to find between the start and end vertices. The program will attempt to find the specified number of paths but will graph as many as were found.


Advanced Controls

Modification of the advanced controls is only recommended if you can't find relationships using the basic controls and you have fully read the advanced control instructions.


Search Space Reduction One:

Overview: this search space reduction heuristic is run on the adjacency list before the initial single source shortest path calculation. This is done to reduce the run time of calculating the first shortest path.

Number of Vertices to Include in 1st Iteration: this parameter controls how many vertices are included in the reduced adjacency list on the first depth level. E.g. if a depth level of 2 and a value of 20 is chosen for this parameter, then at a depth of 0 the top 20 vertices will be included in the reduced adjacency list. At each subsequent depth level, the parameter below this be used to determine the number of vertices included. All selected vertices are in sorted in decreasing vertex centrality.

Number of Vertices to Include in Subsequent Iterations: this parameter controls how many vertices are included in the reduced adjacency list on evey depth level except the first. E.g. if a depth level of 2 and a value of 5 is chosen for this parameter, then at a depth of 0 the above parameter will be used to determine the number of vertices included in the reduced adjacency list. At each subsequent depth level, the parameter here will used to determine the number of vertices included, in this case, 5. All selected vertices are in sorted in decreasing vertex centrality.

Number of Vertices to Include as Critical Vertices in Next Iteration: this parameter controls how many of the vertices in the current iteration are carried into the next iteration. E.g. if a user has a start vertex of 2 and an end vertex of 10, these are the vertices that are considered critical in the first iteration. To determine which vertices are to be considered critical in the next depth level, this parameter selects the top 'x' vertices in decreasing order of vertex centrality.

Depth to Search: this value comes from the basic depth control and is used to determine how many "hops" or edge connections away from a critial vertex to search. The higher the value, the greater the chance of finding a path. However, a high value also greatly increases the program run time.

Search Space Reduction Two:

Overview: this search space reduction heuristic is run on the original adjacency list before the k-1 shortest paths are calculated. This heuristic is run a second time to increase the odds of finding short paths between the start and end vertices. The reasoning is that now that a first path has been established, those vertices along the 1st shortest path can now be included to look for new key neighboring vertices.

Number of Vertices to Include in 1st Iteration: this parameter controls how many vertices are included in the reduced adjacency list on the first depth level. E.g. if a depth level of 2 and a value of 20 is chosen for this parameter, then at a depth of 0 the top 20 vertices will be included in the reduced adjacency list. At each subsequent depth level, the parameter below this be used to determine the number of vertices included. All selected vertices are in sorted in decreasing vertex centrality.

Number of Vertices to Include in Subsequent Iterations: this parameter controls how many vertices are included in the reduced adjacency list on evey depth level except the first. E.g. if a depth level of 2 and a value of 5 is chosen for this parameter, then at a depth of 0 the above parameter will be used to determine the number of vertices included in the reduced adjacency list. At each subsequent depth level, the parameter here will used to determine the number of vertices included, in this case, 5. All selected vertices are in sorted in decreasing vertex centrality.

Number of Vertices to Include as Critical Vertices in Next Iteration: this parameter controls how many of the vertices in the current iteration are carried into the next iteration. E.g. if a user has a start vertex of 2 and an end vertex of 10, these are the vertices that are considered critical in the first iteration. To determine which vertices are to be considered critical in the next depth level, this parameter selects the top 'x' vertices in decreasing order of vertex centrality.

Depth to Search: this value is used to determine how many "hops" or edge connections away from a critial vertex to search. The higher the value, the greater the chance of finding a path. However, a high value also greatly increases the program run time.

Community Identification:

Overview: the community identification uses the search space reduction heuristic on the original adjacency list with all of the k-shortest paths vertices as critical vertices. Depending on the depth selected, this allows for either a sparse tree containing only the path connections or a full graph containing many key vertices at each depth level.

Number of Vertices to Include in 1st Iteration: this parameter controls how many vertices are included in the reduced adjacency list on the first depth level. E.g. if a depth level of 2 and a value of 20 is chosen for this parameter, then at a depth of 0 the top 20 vertices will be included in the reduced adjacency list. At each subsequent depth level, the parameter below this be used to determine the number of vertices included. All selected vertices are in sorted in decreasing vertex centrality.

Number of Vertices to Include in Subsequent Iterations: this parameter controls how many vertices are included in the reduced adjacency list on evey depth level except the first. E.g. if a depth level of 2 and a value of 5 is chosen for this parameter, then at a depth of 0 the above parameter will be used to determine the number of vertices included in the reduced adjacency list. At each subsequent depth level, the parameter here will used to determine the number of vertices included, in this case, 5. All selected vertices are in sorted in decreasing vertex centrality.

Number of Vertices to Include as Critical Vertices in Next Iteration: this parameter controls how many of the vertices in the current iteration are carried into the next iteration. E.g. if a user has a start vertex of 2 and an end vertex of 10, these are the vertices that are considered critical in the first iteration. To determine which vertices are to be considered critical in the next depth level, this parameter selects the top 'x' vertices in decreasing order of vertex centrality.

Depth to Search: this value is used to determine how many "hops" or edge connections away from a critial vertex to search. At lower values you will only see path relationships and nearby key vertices. At higher values you will see key vertices at each depth level, color coded for intuitive visualiztion.

3. Shortest Paths MST


Basic Controls

Nodes: these are the vertices to find relationships between.

Depth to Search: this controls the Search Space 1 parameter, a higher depth means that the program looks deeper into the network for potential connections/paths. This higher the depth, the greater chance there is for a higher path count; the downside is that a higher depth also means a slower run time. The second search space reduction and the community identification depth to search default to one.


Advanced Controls

Modification of the advanced controls is only recommended if you can't find relationships using the basic controls and you have fully read the advanced control instructions.


Search Space Reduction:

Overview: this search space reduction heuristic is run on the adjacency list before the Shortest Paths MST algorithm. This is done to reduce the run time of calculating the inter-node connections.

Number of Vertices to Include in 1st Iteration: this parameter controls how many vertices are included in the reduced adjacency list on the first depth level. E.g. if a depth level of 2 and a value of 20 is chosen for this parameter, then at a depth of 0 the top 20 vertices will be included in the reduced adjacency list. At each subsequent depth level, the parameter below this be used to determine the number of vertices included. All selected vertices are in sorted in decreasing vertex centrality.

Number of Vertices to Include in Subsequent Iterations: this parameter controls how many vertices are included in the reduced adjacency list on evey depth level except the first. E.g. if a depth level of 2 and a value of 5 is chosen for this parameter, then at a depth of 0 the above parameter will be used to determine the number of vertices included in the reduced adjacency list. At each subsequent depth level, the parameter here will used to determine the number of vertices included, in this case, 5. All selected vertices are in sorted in decreasing vertex centrality.

Number of Vertices to Include as Critical Vertices in Next Iteration: this parameter controls how many of the vertices in the current iteration are carried into the next iteration. E.g. if a user has a start vertex of 2 and an end vertex of 10, these are the vertices that are considered critical in the first iteration. To determine which vertices are to be considered critical in the next depth level, this parameter selects the top 'x' vertices in decreasing order of vertex centrality.

Depth to Search: this value comes from the basic depth control and is used to determine how many "hops" or edge connections away from a critial vertex to search. The higher the value, the greater the chance of finding a path. However, a high value also greatly increases the program run time.

Community Identification:

Overview: the community identification uses the search space reduction heuristic on the original adjacency list with all of the k-shortest paths vertices as critical vertices. Depending on the depth selected, this allows for either a sparse tree containing only the path connections or a full graph containing many key vertices at each depth level.

Number of Vertices to Include in 1st Iteration: this parameter controls how many vertices are included in the reduced adjacency list on the first depth level. E.g. if a depth level of 2 and a value of 20 is chosen for this parameter, then at a depth of 0 the top 20 vertices will be included in the reduced adjacency list. At each subsequent depth level, the parameter below this be used to determine the number of vertices included. All selected vertices are in sorted in decreasing vertex centrality.

Number of Vertices to Include in Subsequent Iterations: this parameter controls how many vertices are included in the reduced adjacency list on evey depth level except the first. E.g. if a depth level of 2 and a value of 5 is chosen for this parameter, then at a depth of 0 the above parameter will be used to determine the number of vertices included in the reduced adjacency list. At each subsequent depth level, the parameter here will used to determine the number of vertices included, in this case, 5. All selected vertices are in sorted in decreasing vertex centrality.

Number of Vertices to Include as Critical Vertices in Next Iteration: this parameter controls how many of the vertices in the current iteration are carried into the next iteration. E.g. if a user has a start vertex of 2 and an end vertex of 10, these are the vertices that are considered critical in the first iteration. To determine which vertices are to be considered critical in the next depth level, this parameter selects the top 'x' vertices in decreasing order of vertex centrality.

Depth to Search: this value is used to determine how many "hops" or edge connections away from a critial vertex to search. At lower values you will only see path relationships and nearby key vertices. At higher values you will see key vertices at each depth level, color coded for intuitive visualiztion.