GDF Graph Loader for TinkerPop 2.x

Recently, we came across .gdf files, a CSV-like format for graphs used primarily by GUESS. Although the GDF file format is supported by Gephi, it was still missing from TinkerPop, one of the widely used graph computing frameworks.

Today, we are happy to release gdfpop, an open source implementation of a GDF file reader for TinkerPop 2.x, under the Apache License, Version 2.0. It allows you to import .gdf files directly into FORMCEPT’s FactorDB storage engine, which is compliant with the TinkerPop 2.x Blueprints APIs.

gdfpop APIs

gdfpop provides a method, GDFReader.inputGraph, that takes an existing com.tinkerpop.blueprints.Graph instance and an input stream for the GDF file. It also accepts three optional parameters:

  1. buf: Buffer size for BatchGraph. See BatchGraph for more details.
  2. quote: The quote character used for the values. Default is the double quote.
  3. eidp: Edge property to be used as the edge ID.
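
To illustrate what the quote parameter controls, here is a minimal sketch of quote-aware row splitting. GdfRowSplitter and its split method are hypothetical names used for illustration and are not part of gdfpop; the actual reader may treat whitespace and nested quotes differently.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper, not part of gdfpop: splits one GDF row on commas outside quotes. */
final class GdfRowSplitter {

    /** Splits a row into trimmed cells; commas between a pair of quote characters are kept. */
    static String[] split(String row, char quote) {
        List<String> cells = new ArrayList<>();
        StringBuilder cell = new StringBuilder();
        boolean inQuotes = false;
        for (int i = 0; i < row.length(); i++) {
            char c = row.charAt(i);
            if (c == quote) {
                inQuotes = !inQuotes;               // toggle state and drop the quote itself
            } else if (c == ',' && !inQuotes) {
                cells.add(cell.toString().trim());  // an unquoted comma ends the cell
                cell.setLength(0);
            } else {
                cell.append(c);
            }
        }
        cells.add(cell.toString().trim());          // last cell has no trailing comma
        return cells.toArray(new String[0]);
    }

    public static void main(String[] args) {
        String row = "a,'Hello \"world\" !',1,true,'114,116,177',10.10,20.24567";
        for (String cell : split(row, '\'')) {
            System.out.println("[" + cell + "]");
        }
    }
}
```

Because the quote character is a parameter, the same routine works for single-quoted and double-quoted files. This simplified version trims every cell and does not preserve a quote character nested inside a quoted value.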

The implementation handles missing values, data types, default values, and quotes gracefully. Here is a sample .gdf file that can be loaded via gdfpop:

nodedef>name VARCHAR,label VARCHAR2,class INT, visible BOOLEAN default false,color VARCHAR,width FLOAT,height DOUBLE
a,'Hello "world" !',1,true,'114,116,177',10.10,20.24567
b,'Well, this is',2, ,'219,116,251',10.98,10.986123
c,'A correct 'GDF' file',,,, ,
edgedef>node1 VARCHAR,node2 VARCHAR,directed BOOLEAN,color VARCHAR, weight LONG default 100
a, b,true,' 114,116,177',
b,c ,false,'219,116,251 ',300
c, a  , ,,
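
The header lines in the sample above declare a type, and sometimes a default, for each column. The following sketch shows one way such a header could be interpreted and how a blank cell could fall back to its default. GdfHeader, Column, parse, and coerce are hypothetical names, the VARCHAR fallback for untyped columns is an assumption, and none of this is taken from the gdfpop source.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

/** Hypothetical helper, not part of gdfpop: reads column definitions from a GDF header. */
final class GdfHeader {

    /** One column: name, declared type, and default value (null when none is declared). */
    static final class Column {
        final String name;
        final String type;
        final String defaultValue;

        Column(String name, String type, String defaultValue) {
            this.name = name;
            this.type = type;
            this.defaultValue = defaultValue;
        }
    }

    /** Parses a "nodedef>..." or "edgedef>..." line into its columns. */
    static Column[] parse(String header) {
        String body = header.substring(header.indexOf('>') + 1);
        List<Column> columns = new ArrayList<>();
        for (String part : body.split(",")) {
            String[] tokens = part.trim().split("\\s+");
            // Assumption: a column without a declared type is treated as VARCHAR.
            String type = tokens.length > 1 ? tokens[1].toUpperCase(Locale.ROOT) : "VARCHAR";
            String def = null;
            if (tokens.length > 3 && tokens[2].equalsIgnoreCase("default")) {
                def = tokens[3];
            }
            columns.add(new Column(tokens[0], type, def));
        }
        return columns.toArray(new Column[0]);
    }

    /** Converts a cell to the column's type; a blank cell falls back to the default. */
    static Object coerce(Column column, String cell) {
        String raw = (cell == null || cell.trim().isEmpty()) ? column.defaultValue : cell.trim();
        if (raw == null) {
            return null; // missing value and no default: leave the property unset
        }
        switch (column.type) {
            case "INT":     return Integer.valueOf(raw);
            case "LONG":    return Long.valueOf(raw);
            case "FLOAT":   return Float.valueOf(raw);
            case "DOUBLE":  return Double.valueOf(raw);
            case "BOOLEAN": return Boolean.valueOf(raw);
            default:        return raw; // VARCHAR and anything unrecognised stay as text
        }
    }

    public static void main(String[] args) {
        Column[] columns = parse(
            "edgedef>node1 VARCHAR,node2 VARCHAR,directed BOOLEAN,color VARCHAR, weight LONG default 100");
        Column weight = columns[4];
        System.out.println(coerce(weight, ""));    // blank cell, so the default applies
        System.out.println(coerce(weight, "300")); // explicit value
    }
}
```

The sketch does not handle default values that are quoted or contain commas.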

Example

For example, consider the following graph, taken from the default TinkerPop implementation:

[Figure: the default TinkerPop example graph]

It has 6 vertices and 6 edges. Each vertex has a label and either an age or a lang property, and each edge has a weight. The only change we made to convert it into a GDF file is renaming the name property to label, because name is used as the node/vertex ID in GDF. See GDF File Format for all the possible properties of a vertex. The .gdf file corresponding to the above graph is shown below:

nodedef>name VARCHAR,label VARCHAR,age INT,lang VARCHAR
1,marko,29,
2,vadas,27,
3,lop,,java
4,josh,32,
5,ripple,,java
6,peter,35,
edgedef>node1 VARCHAR,node2 VARCHAR,name VARCHAR,label VARCHAR,weight FLOAT
1,2,7,knows,0.5
1,4,8,knows,1.0
1,3,9,created,0.4
4,5,10,created,1.0
4,3,11,created,0.4
6,3,12,created,0.2

Although the GDF specification does not define an ID for edges, you can ask gdfpop to use a specific edge property as the edge ID via the eidp parameter.
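
As a rough illustration of the idea behind eidp, the sketch below pairs the edge columns with one row's cells and then reads the ID from the property named by eidp. EdgeIdSketch, toRow, and edgeId are hypothetical names, not gdfpop APIs. Returning null when no ID is available follows the Blueprints convention that a null ID lets the graph implementation assign one; whether gdfpop does exactly this is an assumption.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical helper, not part of gdfpop: picks an edge ID from a named property. */
final class EdgeIdSketch {

    /** Pairs each column name with the cell at the same position in the row. */
    static Map<String, String> toRow(String[] columns, String line) {
        String[] cells = line.split(",", -1); // -1 keeps trailing empty cells
        Map<String, String> row = new LinkedHashMap<>();
        for (int i = 0; i < columns.length; i++) {
            row.put(columns[i], i < cells.length ? cells[i].trim() : "");
        }
        return row;
    }

    /** Returns the value of the eidp property, or null when eidp is unset or the cell is blank. */
    static String edgeId(Map<String, String> row, String eidp) {
        if (eidp == null) {
            return null;
        }
        String value = row.get(eidp);
        return (value == null || value.isEmpty()) ? null : value;
    }

    public static void main(String[] args) {
        String[] columns = {"node1", "node2", "name", "label", "weight"};
        Map<String, String> row = toRow(columns, "1,2,7,knows,0.5");
        System.out.println(edgeId(row, "name")); // prints 7
    }
}
```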

Using gdfpop

Suppose you have an example.gdf file containing the above vertices and edges, and you wish to use all the awesomeness of the TinkerPop 2.x stack on it. To do so, follow these steps:

Step-1: Build gdfpop

Currently, gdfpop is not available on Maven Central, so you will have to either pick the latest release or build it from source using the following command:

mvn clean compile install

Once Maven builds gdfpop, it will be available in your local Maven repository and ready to be integrated with your existing code base using the following Maven dependency:

<dependency>
	<groupId>org.formcept</groupId>
	<artifactId>gdfpop</artifactId>
	<version>0.2.0</version>
</dependency>

Step-2: Load GDF files

Now you can use the org.formcept.gdfpop.GDFReader functions to process and load the above example.gdf file, as shown below:

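import java.io.File;
import java.io.FileInputStream;
import com.tinkerpop.blueprints.Graph;
import com.tinkerpop.blueprints.impls.tg.TinkerGraph;
import com.tinkerpop.blueprints.util.io.graphson.GraphSONWriter;
import org.formcept.gdfpop.GDFReader;
// initialize an in-memory graph
Graph graph = new TinkerGraph();
// load the gdf file, with the double quote as the quote character and the "name" edge property as the edge ID
GDFReader.inputGraph(graph, new FileInputStream(new File("example.gdf")), "\"", "name");
// write it out as GraphSON
GraphSONWriter.outputGraph(graph, System.out);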

The above code snippet creates a TinkerGraph, loads it with all the vertices and edges defined in example.gdf, and dumps the loaded graph in GraphSON format, which is easy to verify. For example, here is the JSON dump from a sample run of the above code:

{
    "mode": "NORMAL",
    "vertices": [{
        "name": "3",
        "label": "lop",
        "lang": "java",
        "_id": "3",
        "_type": "vertex"
    }, {
        "age": 27,
        "name": "2",
        "label": "vadas",
        "_id": "2",
        "_type": "vertex"
    }, {
        "age": 29,
        "name": "1",
        "label": "marko",
        "_id": "1",
        "_type": "vertex"
    }, {
        "age": 35,
        "name": "6",
        "label": "peter",
        "_id": "6",
        "_type": "vertex"
    }, {
        "name": "5",
        "label": "ripple",
        "lang": "java",
        "_id": "5",
        "_type": "vertex"
    }, {
        "age": 32,
        "name": "4",
        "label": "josh",
        "_id": "4",
        "_type": "vertex"
    }],
    "edges": [{
        "weight": 1.0,
        "node1": "4",
        "name": "10",
        "node2": "5",
        "_id": "10",
        "_type": "edge",
        "_outV": "4",
        "_inV": "5",
        "_label": "created"
    }, {
        "weight": 0.5,
        "node1": "1",
        "name": "7",
        "node2": "2",
        "_id": "7",
        "_type": "edge",
        "_outV": "1",
        "_inV": "2",
        "_label": "knows"
    }, {
        "weight": 0.4,
        "node1": "1",
        "name": "9",
        "node2": "3",
        "_id": "9",
        "_type": "edge",
        "_outV": "1",
        "_inV": "3",
        "_label": "created"
    }, {
        "weight": 1.0,
        "node1": "1",
        "name": "8",
        "node2": "4",
        "_id": "8",
        "_type": "edge",
        "_outV": "1",
        "_inV": "4",
        "_label": "knows"
    }, {
        "weight": 0.4,
        "node1": "4",
        "name": "11",
        "node2": "3",
        "_id": "11",
        "_type": "edge",
        "_outV": "4",
        "_inV": "3",
        "_label": "created"
    }, {
        "weight": 0.2,
        "node1": "6",
        "name": "12",
        "node2": "3",
        "_id": "12",
        "_type": "edge",
        "_outV": "6",
        "_inV": "3",
        "_label": "created"
    }]
}

Note that the output contains the 6 vertices and 6 edges that were defined in the example.gdf file earlier.
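
If you want an independent count to compare against, a few lines of plain Java can tally the data rows in each section of a .gdf file. This is only a sketch: GdfCounter and count are hypothetical names, not part of gdfpop, and the code assumes one record per line.

```java
/** Hypothetical helper, not part of gdfpop: counts node and edge rows in GDF text. */
final class GdfCounter {

    /** Returns {nodeRows, edgeRows}; header lines and blank lines are not counted. */
    static int[] count(String gdf) {
        int nodes = 0;
        int edges = 0;
        int section = 0; // 0 = before any header, 1 = nodes, 2 = edges
        for (String line : gdf.split("\n")) {
            String trimmed = line.trim();
            if (trimmed.isEmpty()) {
                continue;
            }
            if (trimmed.startsWith("nodedef>")) {
                section = 1;
            } else if (trimmed.startsWith("edgedef>")) {
                section = 2;
            } else if (section == 1) {
                nodes++;
            } else if (section == 2) {
                edges++;
            }
        }
        return new int[] {nodes, edges};
    }

    public static void main(String[] args) {
        String gdf = String.join("\n",
            "nodedef>name VARCHAR,label VARCHAR,age INT,lang VARCHAR",
            "1,marko,29,", "2,vadas,27,", "3,lop,,java",
            "4,josh,32,", "5,ripple,,java", "6,peter,35,",
            "edgedef>node1 VARCHAR,node2 VARCHAR,name VARCHAR,label VARCHAR,weight FLOAT",
            "1,2,7,knows,0.5", "1,4,8,knows,1.0", "1,3,9,created,0.4",
            "4,5,10,created,1.0", "4,3,11,created,0.4", "6,3,12,created,0.2");
        int[] counts = count(gdf);
        System.out.println(counts[0] + " vertices, " + counts[1] + " edges"); // prints "6 vertices, 6 edges"
    }
}
```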

Currently, gdfpop is compatible only with TinkerPop 2.x. Going forward, we may look into providing a plug-in for TinkerPop 3.x as well, depending on community interest. Feel free to give us a shout at gdfpop.

References

  1. GDF: A CSV Like Format For Graphs – http://datascholars.com/post/2013/03/09/gdf/
  2. GUESS: The Graph Exploration System – http://guess.wikispot.org/The_GUESS_.gdf_format
  3. Gephi: The Open Graph Viz Platform – http://gephi.github.io/
  4. TinkerPop: An Open Source Graph Computing Framework – http://www.tinkerpop.com/
  5. gdfpop: Open source GDF File Reader for TinkerPop 2.x – https://github.com/formcept/gdfpop
  6. Apache License, Version 2.0 – http://www.apache.org/licenses/LICENSE-2.0.html
  7. GraphSON Reader and Writer Library – https://github.com/tinkerpop/blueprints/wiki/GraphSON-Reader-and-Writer-Library