Recently, we came across .gdf files, a CSV-like format for graphs primarily used by GUESS. Although the GDF file format is supported by Gephi, support for it was still missing from TinkerPop, one of the most widely used graph computing frameworks.
Today, we are happy to release gdfpop, an open source implementation of a GDF file reader for TinkerPop 2.x, under the Apache License, Version 2.0. It allows you to directly import .gdf files into FORMCEPT's FactorDB storage engine, which is compliant with the TinkerPop 2.x Blueprints APIs.
gdfpop APIs
gdfpop provides a method, GDFReader.inputGraph, that takes an existing com.tinkerpop.blueprints.Graph instance and an input stream for the GDF file. There are three optional parameters:
- buf: Buffer size for BatchGraph. See BatchGraph for more details.
- quote: The quote character used around values. Defaults to double quotes.
- eidp: The edge property to be used as the edge ID.
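To illustrate what the quote parameter controls, here is a minimal, hypothetical sketch (GdfLineSplitter, split and unquote are illustrative names, not gdfpop's API or implementation). It splits a GDF data line on commas while keeping commas inside quoted values, such as an RGB color like '114,116,177', in a single field. It assumes quotes simply toggle on and off and does not handle escaped quotes.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of gdfpop: splits one GDF data line on commas
// while treating commas inside the given quote character as part of the value.
class GdfLineSplitter {

    static List<String> split(String line, char quote) {
        List<String> fields = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        boolean inQuotes = false;
        for (int i = 0; i < line.length(); i++) {
            char c = line.charAt(i);
            if (c == quote) {
                // Toggle the quoted state; the quote is kept so unquote() can strip it later.
                inQuotes = !inQuotes;
                current.append(c);
            } else if (c == ',' && !inQuotes) {
                // An unquoted comma ends the field; surrounding whitespace is dropped.
                fields.add(current.toString().trim());
                current.setLength(0);
            } else {
                current.append(c);
            }
        }
        fields.add(current.toString().trim());
        return fields;
    }

    // Strips one leading and one trailing quote character, if both are present.
    static String unquote(String field, char quote) {
        if (field.length() >= 2
                && field.charAt(0) == quote
                && field.charAt(field.length() - 1) == quote) {
            return field.substring(1, field.length() - 1);
        }
        return field;
    }

    public static void main(String[] args) {
        List<String> fields = split("a,'Hello \"world\" !',1,true,'114,116,177',10.10,20.24567", '\'');
        System.out.println(fields.size());                 // 7
        System.out.println(unquote(fields.get(4), '\''));  // 114,116,177
    }
}
```

With ' as the quote character, the first data row of the sample below splits into seven fields; with a double quote as the quote character, the same row would instead be split at the commas inside the quoted color value.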
The implementation handles missing values, datatypes, default values and quotes gracefully. Here is a sample .gdf file that can be loaded via gdfpop:
nodedef>name VARCHAR,label VARCHAR2,class INT, visible BOOLEAN default false,color VARCHAR,width FLOAT,height DOUBLE
a,'Hello "world" !',1,true,'114,116,177',10.10,20.24567
b,'Well, this is',2, ,'219,116,251',10.98,10.986123
c,'A correct 'GDF' file',,,, ,
edgedef>node1 VARCHAR,node2 VARCHAR,directed BOOLEAN,color VARCHAR, weight LONG default 100
a, b,true,' 114,116,177',
b,c ,false,'219,116,251 ',300
c, a , ,,
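The handling of datatypes, default values and missing values can be sketched as follows. This is a hypothetical illustration (GdfColumns, parseHeader and convert are made-up names, not gdfpop's code): it parses the column declarations of a nodedef> or edgedef> header, then converts a raw field into a Java value according to its declared type, substituting the declared default when the field is blank. The set of type names handled here is an assumption based on the sample above.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch, not gdfpop's code: parses a "nodedef>" / "edgedef>" header and
// converts a raw field into a typed value, falling back to the column's default.
class GdfColumns {

    static final class Column {
        final String name;
        final String type;          // upper-cased GDF type, e.g. "INT", "VARCHAR"
        final String defaultValue;  // null when the header declares no default

        Column(String name, String type, String defaultValue) {
            this.name = name;
            this.type = type;
            this.defaultValue = defaultValue;
        }
    }

    // Assumes declarations are comma-separated and default values contain no commas.
    static List<Column> parseHeader(String header) {
        String body = header.substring(header.indexOf('>') + 1);
        List<Column> columns = new ArrayList<>();
        for (String decl : body.split(",")) {
            String[] parts = decl.trim().split("\\s+");
            String type = parts.length > 1 ? parts[1].toUpperCase() : "VARCHAR";
            String def = (parts.length > 3 && parts[2].equalsIgnoreCase("default")) ? parts[3] : null;
            columns.add(new Column(parts[0], type, def));
        }
        return columns;
    }

    // Blank values fall back to the default; with no default, null means "leave unset".
    static Object convert(String raw, Column column) {
        String v = (raw == null || raw.trim().isEmpty()) ? column.defaultValue : raw.trim();
        if (v == null) {
            return null;
        }
        switch (column.type) {
            case "INT":
            case "INTEGER": return Integer.valueOf(v);
            case "LONG":    return Long.valueOf(v);
            case "FLOAT":   return Float.valueOf(v);
            case "DOUBLE":  return Double.valueOf(v);
            case "BOOLEAN": return Boolean.valueOf(v);
            default:        return v;  // VARCHAR, VARCHAR2 and unknown types stay strings
        }
    }
}
```

For instance, under this sketch the blank visible field of row b would become false, and the empty weight of the first edge row would become 100L.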
Example
For example, consider the classic TinkerPop toy graph, which has 6 vertices and 6 edges: each vertex has two properties (name, plus either age or lang) and each edge has a label and a weight. The only change we made to convert it into a GDF file is renaming the vertex property name to label, because name is used as the node/vertex ID in GDF. See GDF File Format for all the possible properties of a vertex. The GDF file corresponding to this graph is shown below:
nodedef>name VARCHAR,label VARCHAR,age INT,lang VARCHAR
1,marko,29,
2,vadas,27,
3,lop,,java
4,josh,32,
5,ripple,,java
6,peter,35,
edgedef>node1 VARCHAR,node2 VARCHAR,name VARCHAR,label VARCHAR,weight FLOAT
1,2,7,knows,0.5
1,4,8,knows,1.0
1,3,9,created,0.4
4,5,10,created,1.0
4,3,11,created,0.4
6,3,12,created,0.2
Although the GDF specification does not define an ID for edges, you can ask gdfpop to use a specific edge property as the edge ID via the eidp parameter. In the example above, the name column of the edgedef section holds the edge IDs.
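A hypothetical sketch of the eidp idea follows (EdgeIdSketch and toEdge are illustrative names, not gdfpop's API). Given an already-split edgedef row, the column named by eidp becomes the edge ID, node1 and node2 become the endpoints, and every other non-empty column is kept as a property. For simplicity it treats label as an ordinary property; a Blueprints loader would more likely use it as the edge label.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch, not gdfpop's code: turns one already-split edgedef row into an
// edge record, using the column named by eidp as the edge ID instead of as a property.
class EdgeIdSketch {

    static final class EdgeRecord {
        final String id;          // null when eidp is null or the ID column is empty
        final String outVertex;   // value of node1
        final String inVertex;    // value of node2
        final Map<String, String> properties;

        EdgeRecord(String id, String outVertex, String inVertex, Map<String, String> properties) {
            this.id = id;
            this.outVertex = outVertex;
            this.inVertex = inVertex;
            this.properties = properties;
        }
    }

    static EdgeRecord toEdge(List<String> columns, List<String> values, String eidp) {
        String id = null;
        Map<String, String> properties = new LinkedHashMap<>();
        // Columns 0 and 1 are node1/node2; the remaining columns are edge attributes.
        for (int i = 2; i < columns.size(); i++) {
            String value = i < values.size() ? values.get(i).trim() : "";
            if (columns.get(i).equals(eidp)) {
                id = value.isEmpty() ? null : value;
            } else if (!value.isEmpty()) {
                properties.put(columns.get(i), value);
            }
        }
        return new EdgeRecord(id, values.get(0).trim(), values.get(1).trim(), properties);
    }

    public static void main(String[] args) {
        List<String> columns = Arrays.asList("node1", "node2", "name", "label", "weight");
        EdgeRecord e = toEdge(columns, Arrays.asList("1", "2", "7", "knows", "0.5"), "name");
        System.out.println(e.id + " " + e.outVertex + "->" + e.inVertex + " " + e.properties);
    }
}
```

For the row 1,2,7,knows,0.5 with eidp set to name, main prints 7 1->2 {label=knows, weight=0.5}; without eidp, name would simply remain an ordinary edge property.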
Using gdfpop
Suppose the vertices and edges above are saved in a file named example.gdf, and you wish to use all the awesomeness of the TinkerPop 2.x stack on it. To do so, follow these steps:
Step-1: Build gdfpop
Currently, gdfpop is not available on Maven Central, so you will have to download the latest release or build it from source using the following command:
mvn clean compile install
Once Maven builds gdfpop, it will be installed in your local Maven repository, and you can integrate it with your existing code base using the following Maven dependency:
<dependency>
    <groupId>org.formcept</groupId>
    <artifactId>gdfpop</artifactId>
    <version>0.2.0</version>
</dependency>
Step-2: Load GDF files
Now you can use the org.formcept.gdfpop.GDFReader functions to process and load the example.gdf file above, as shown below:
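import java.io.FileInputStream;
import java.io.InputStream;
import com.tinkerpop.blueprints.Graph;
import com.tinkerpop.blueprints.impls.tg.TinkerGraph;
import com.tinkerpop.blueprints.util.io.graphson.GraphSONWriter;
import org.formcept.gdfpop.GDFReader;
Graph graph = new TinkerGraph(); // initialize an in-memory Blueprints graph
// load example.gdf with a double quote as the quote character and "name" as the edge ID property
try (InputStream in = new FileInputStream("example.gdf")) {
    GDFReader.inputGraph(graph, in, "\"", "name");
}
GraphSONWriter.outputGraph(graph, System.out); // write the loaded graph out as GraphSON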