CS 59000-GDS: New Trends in Graph Data Systems

Cr. Hours:   3

Instructor:  Walid G. Aref

Time:        MWF 9:30AM-10:20AM

Location:    LWSN 1106

 

In this course, we will study the various proposed big data systems for graph data management, including native graph systems, systems built on top of relational data systems, and hybrid approaches. We will cover centralized and cluster-based big data systems for graph handling. The course will involve a semester-long project. Each student is expected to study and present in class 2-4 papers depending on the size of the class and conduct a semester-long project. Students can partner in a group project depending on the scope of the project and the prior approval of the instructor. A list of papers will be provided. Students will pick from this list to study and later on present these papers in class.

 

This course assumes basic knowledge of under-the-hood development techniques of data management systems, mainly, relational query processing and data indexing techniques. Coverage of CS448 or CS541 or CS542 or equivalent courses would be sufficient. Also, this course assumes basic knowledge of graph operations and their algorithms, e.g., shortest path and reachability, minimum cut, vertex cover, and maxflow.

 

Course Workload:

There are no midterm or final exams for the course. Each student is expected to study and present in class 2-4 papers from the tentative list of papers below or other papers agreed upon with the instructor. Students, individually or in groups of two or three, will participate in a semester-long project. Each student/group will present a weekly (or bi-weekly) progress report of the project in a separate meeting to be scheduled with the course instructor. Each student/group is expected to give a final project presentation in class. A list of tentative course projects is provided below. Students can propose a project of their own but pre-approval of the instructor is required before adopting a project for the course. Successful projects are expected to result in a conference publication (although having a conference paper submission is not required for passing and succeeding in the course).

 

Tentative List of Papers:

1. Andrew V. Goldberg, Haim Kaplan, and Renato F. Werneck. 2007. Better landmarks within reach. In Proceedings of the 6th international conference on Experimental algorithms (WEA'07), Camil Demetrescu (Ed.). (To be presented By: )

 

2. Daniel Delling and Dorothea Wagner. 2007. Landmark-based routing in dynamic graphs. In Proceedings of the 6th international conference on Experimental algorithms (WEA'07), Camil Demetrescu (Ed.). (To be presented By: Guo Li)

 

3. Robert Kallman, Hideaki Kimura, Jonathan Natkins, Andrew Pavlo, Alexander Rasin, Stanley Zdonik, Evan P. C. Jones, Samuel Madden, Michael Stonebraker, Yang Zhang, John Hugg, and Daniel J. Abadi. 2008. H-store: a high-performance, distributed main memory transaction processing system. Proc. VLDB Endow. 1, 2 (August 2008), 1496-1499. (To be presented By: Abdullah Almamun)

 

4. Hanan Samet, Jagan Sankaranarayanan, and Houman Alborzi. 2008. Scalable network distance browsing in spatial databases. In Proceedings of the 2008 ACM SIGMOD international conference on Management of data (SIGMOD '08). (To be presented By: Abdullah Almamun)

 

5. Robert Geisberger, Peter Sanders, Dominik Schultes, and Daniel Delling. 2008. Contraction hierarchies: faster and simpler hierarchical routing in road networks. In Proceedings of the 7th international conference on Experimental algorithms (WEA'08), Catherine C. McGeoch (Ed.). (To be presented By: )

 

6. ttai Abraham, Amos Fiat, Andrew V. Goldberg, and Renato F. Werneck. 2010. Highway dimension, shortest paths, and provably efficient algorithms. In Proceedings of the twenty-first annual ACM-SIAM symposium on Discrete algorithms (SODA '10). (To be presented By: )

 

7. Grzegorz Malewicz, Matthew H. Austern, Aart J.C Bik, James C. Dehnert, Ilan Horn, Naty Leiser, and Grzegorz Czajkowski. 2010. Pregel: a system for large-scale graph processing. In Proceedings of the 2010 ACM SIGMOD International Conference on Management of data (SIGMOD'10). (To be presented By: )

 

8. Wenfei Fan, Jianzhong Li, Shuai Ma, Nan Tang, and Yinghui Wu. 2011. Adding regular expressions to graph reachability and pattern queries. In Proceedings of the 2011 IEEE 27th International Conference on Data Engineering (ICDE '11). (To be presented By: Xilun Wu)

 

9. Jun Gao, Ruoming Jin, Jiashuai Zhou, Jeffrey Xu Yu, Xiao Jiang, and Tengjiao Wang. 2011. Relational approach for shortest path discovery over large graphs. Proc. VLDB Endow. 5, 4 (December 2011), 358-369. (To be presented By: )

 

10.         Jeppe Rishede Thomsen, Man Lung Yiu, and Christian S. Jensen. 2012. Effective caching of shortest paths for location-based services. In Proceedings of the 2012 ACM SIGMOD International Conference on Management of Data (SIGMOD '12). (To be presented By: )

 

11.         Sherif Sakr, Sameh Elnikety, and Yuxiong He. 2012. G-SPARQL: a hybrid engine for querying large attributed graphs. In Proceedings of the 21st ACM international conference on Information and knowledge management (CIKM '12). (To be presented By: Lu Xing)

 

12.         Ahmad Ghazal, Dawit Seid, Alain Crolotte, and Mohammed Al-Kateb. 2012. Adaptive optimizations of recursive queries in teradata. In Proceedings of the 2012 ACM SIGMOD International Conference on Management of Data (SIGMOD '12). (To be presented By: )

 

13.         Semih Salihoglu and Jennifer Widom. 2013. GPS: a graph processing system. In Proceedings of the 25th International Conference on Scientific and Statistical Database Management (SSDBM), Alex Szalay, Tamas Budavari, Magdalena Balazinska, Alexandra Meliou, and Ahmet Sacan (Eds.). ACM, New York, NY, USA,  Article 22 , 12 pages. (To be presented By: )

 

14.         Cristian Diaconu, Craig Freedman, Erik Ismert, Per-Ake Larson, Pravin Mittal, Ryan Stonecipher, Nitin Verma, and Mike Zwilling. 2013. Hekaton: SQL server's memory-optimized OLTP engine. In Proceedings of the 2013 ACM SIGMOD International Conference on Management of Data (SIGMOD '13). (To be presented By: , Abdullah Almamun)

 

15.         David Simmen, Karl Schnaitter, Jeff Davis, Yingjie He, Sangeet Lohariwala, Ajay Mysore, Vinayak Shenoi, Mingfeng Tan, and Yu Xiao. 2014. Large-scale graph analytics in Aster 6: bringing context to big data discovery. Proc. VLDB Endow. 7, 13 (August 2014), 1405-1416. (To be presented By: )

 

16.         Andy Diwen Zhu, Wenqing Lin, Sibo Wang, and Xiaokui Xiao. 2014. Reachability queries on large dynamic graphs: a total order approach. In Proceedings of the 2014 ACM SIGMOD International Conference on Management of Data (SIGMOD '14). (To be presented By: Spencer Pearson)

 

17.         Joseph E. Gonzalez, Reynold S. Xin, Ankur Dave, Daniel Crankshaw, Michael J. Franklin, and Ion Stoica. 2014. GraphX: graph processing in a distributed dataflow framework. In Proceedings of the 11th USENIX conference on Operating Systems Design and Implementation (OSDI'14). (To be presented By: )

 

18.         Minyang Han and Khuzaima Daudjee. 2015. Giraph unchained: barrierless asynchronous parallel execution in pregel-like graph processing systems. Proc. VLDB Endow. 8, 9 (May 2015), 950-961. (To be presented By: )

 

19.         Avery Ching, Sergey Edunov, Maja Kabiljo, Dionysios Logothetis, and Sambavi Muthukrishnan. 2015. One trillion edges: graph processing at Facebook-scale. Proc. VLDB Endow. 8, 12 (August 2015), 1804-1815. (To be presented By: Guo Li)

 

20.         Wen Sun, Achille Fokoue, Kavitha Srinivas, Anastasios Kementsietsidis, Gang Hu, and Guotong Xie. 2015. SQLGraph: An Efficient Relational-Based Property Graph Store. In Proceedings of the 2015 ACM SIGMOD International Conference on Management of Data (SIGMOD '15). (To be presented By: )

 

21.         Alekh Jindal, Samuel Madden, Malu Castellanos, and Meichun Hsu. 2015. Graph analytics using vertica relational database. In Proceedings of the 2015 IEEE International Conference on Big Data (Big Data) (BIG DATA '15). (To be presented By: )

 

22.         J. Fan, A. G. S. Raj, and J. M. Patel, The case against specialized graph analytics engines," in Online Proceedings of the 2015 Biennial Conference on Innovative Data Systems Research, Asilomar, CA, USA, January 4-7, 2015. (To be presented By: Spencer Pearson)

 

23.         Marcus Paradies, Wolfgang Lehner, and Christof Bornhövd. 2015. GRAPHITE: an extensible graph traversal framework for relational database management systems. In Proceedings of the 27th International Conference on Scientific and Statistical Database Management (SSDBM '15), Amarnath Gupta and Susan Rathbun (Eds.). (To be presented By: )

 

24.         Mohamed S. Hassan, Walid G. Aref, and Ahmed M. Aly. 2016. Graph Indexing for Shortest-Path Finding over Dynamic Sub-Graphs. In Proceedings of the 2016 International Conference on Management of Data (SIGMOD '16). (To be presented By: Walid Aref)

 

25.         Christopher R. Aberger, Susan Tu, Kunle Olukotun, and Christopher . 2016. EmptyHeaded: A Relational Engine for Graph Processing. In Proceedings of the 2016 International Conference on Management of Data (SIGMOD '16). (To be presented By: Xilun Wu)

 

26.         Ankur Dave, Alekh Jindal, Li Erran Li, Reynold Xin, Joseph Gonzalez, and Matei Zaharia. 2016. GraphFrames: an integrated API for mixing graph and relational queries. In Proceedings of the Fourth International Workshop on Graph Data Management Experiences and Systems (GRADES '16). (To be presented By: )

 

27.         Konstantinos Xirogiannopoulos and Amol Deshpande. 2017. Extracting and Analyzing Hidden Graphs from Relational Databases. In Proceedings of the 2017 ACM International Conference on Management of Data (SIGMOD '17). (To be presented By: Spencer Pearson)

 

28.         Anil Pacaci, Alice Zhou, Jimmy Lin, and M. Tamer Özsu. 2017. Do We Need Specialized Graph Databases?: Benchmarking Real-Time Social Networking Applications. In Proceedings of the Fifth International Workshop on Graph Data-management Experiences & Systems (GRADES'17). (To be presented By: Guo Li)

 

29.         Renzo Angles, Marcelo Arenas, Pablo Barcelo, Peter Boncz, George Fletcher, Claudio Gutierrez, Tobias Lindaaker, Marcus Paradies, Stefan Plantikow, Juan Sequeda, Oskar van Rest, and Hannes Voigt. 2018. G-CORE: A Core for Future Graph Query Languages. In Proceedings of the 2018 International Conference on Management of Data (SIGMOD '18). (To be presented By: Xilun Wu)

 

30.         M. S. Hassan, T. Kuznetsova, H. C. Jeong, W. G. Aref, and M. Sadoghi, Extending in-memory relational database engines with native graph support, EDBT 2018. (To be presented By: Walid Aref)

 

31.          Udrea, O., Pugliese, A., & Subrahmanian, V.S, GRIN: A Graph Based RDF Index. AAAI, 2007.

 

32.         Kai Zeng, Jiacheng Yang, Haixun Wang, Bin Shao, and Zhongyuan Wang. 2013. A distributed graph engine for web scale RDF data. Proc. VLDB Endow. 6, 4 (February 2013)

Source Code: https://www.microsoft.com/en-us/research/project/trinity/

 

33.         Lei Zou, M. Tamer Özsu,Lei Chen, Xuchuan Shen, Ruizhe Huang, Dongyan Zhao, gStore: A Graph-based SPARQL Query Engine, VLDB Journal , 23(4): 565-590, 2014.

Source Code: https://github.com/Caesar11/gStore

 

34.         Giannis Agathangelos1, Georgia Troullinou1, Haridimos Kondylakis1, Kostas Stefanidis2, Dimitris Plexousakis, RDF Query Answering Using Apache Spark: Review and Assessment. 2018 IEEE 34th International Conference on Data Engineering Workshops (ICDEW), Paris, 2018, pp. 54-59.

 

35.         Dong Wen, Lu Qin, Ying Zhang, Lijun Chang, and Xuemin Lin, Efficient Structural Graph Clustering: An Index-Based Approach, VLDB 2018.

 

36.         Siddhartha Sahu, Amine Mhedhbi, Semih Salihoglu, Jimmy Lin, M. Tamer Özsu, The Ubiquity of Large Graphs and Surprising Challenges of Graph Processing, VLDB 2017.

 

37.         Ammar, Khaled, and Tamer Ozsu. Experimental Analysis of Distributed Graph Systems, VLDB 2018.

 

Tentative Graph-Related Projects:

1. Cost model for traversing a graph using either

§  a graph representation of adjacency lists, or

§  self joins of the edges table (consider hash-joins, nested-loop joins, merger-sort joins)

 

2. Distributed vertex-centric approach for computing

§  Shortest paths

§  Page rank

§  Reachability

 

3. Efficient partitioning of a road network and efficient evaluation of shortest paths

 

4. Evaluating different graph partitioning approaches over a cluster

§  Can lead to a survey paper

§  Can consider generic graphs and graphs for specific domains (e.g., road networks)

 

5. Efficient processing of shortest paths over dynamic graphs

§  i.e., having precomputations to boost the query-time performance while considering online graph-updates that might invalidate the precomputations (do not assume a labeled graph)

 

6. Support advanced predicates on GRFusion’s paths

§  Will use GRFusion codebase

§  Implement filtering predicates on the edges and the vertexes of paths

·      E.g., P.Edges[1..5].PropertyName > Value Or P.Vertexes[2..*].Property = Value

 

7. Support graph partitioning and distributed query evaluation in GRFusion

§  The vertexes table and the edges table can be horizontally partitioned based on user-preference (the way VoltDB users define table partitioning)

§  The graph view (i.e., topology) can be replicated

§  Distributed query evaluation should guarantee correctness and consider low-latency evaluation (and/or other optimization criteria like network communication)

 

8. Adding a Create Graph Index statement in GRFusion to create EDP index on edge-labeled graphs to support efficient evaluation of constrained shortest path queries

§  EDP: Mohamed S. Hassan, Walid G. Aref, and Ahmed M. Aly. 2016. Graph Indexing for Shortest-Path Finding over Dynamic Sub-Graphs. In Proceedings of the 2016 International Conference on Management of Data (SIGMOD '16).

 

Class Schedule:

8/20/2018 : Walid Aref         – Discussion of Syllabus

8/22/2018 : Walid Aref         – Graph Data Systems – An Introduction

8/24/2018 : Walid Aref         – Graph Data Systems – Declarative Querying

8/27/2018 : Walid Aref         – Graph Data Systems – The Relational Core and the Graph Core Approaches

8/29/2018 : Walid Aref         GRFusion (Paper 30)          (Slides)

8/31/2018 : Walid Aref         GRFusion (Paper 30) (2)

9/5/2018  : Xilun Wu           EmptyHeaded (Paper 25)       (Slides)

9/7/2018  : Lu Xing      - GRIN (Paper 31)              (Slides)

9/10/2018 : Spencer Pearson    - Reachability queries on large dynamic graphs (Paper 16) (Slides)

9/12/2018 : Xilun Wu           – G-Core (Paper 29)            (Slides)

9/12/2018 : Walid Aref         - Project Discussion (Extra Class – LWSN 2149 2:30-3:30)

9/14/2018 : Walid Aref         - Indexing Graphs

9/17/2018 : Walid Aref         - Indexing Dynamic Labeled Graphs (Paper 24) (Slides)

9/19/2018 : Walid Aref        - Project Discussions

9/21/2018 : Lu Xing            - Trinity (Paper 32)      (Slides)

9/24/2018 : Spencer Pearson    - Grail (Paper 22)

9/26/2018 : In-class Project Discussions

9/28/2018 : Abdullah Almamun   - H-Store (Paper 3) (Slides)

10/1/2018 : Spencer Pearson    - Extracting and Analyzing Hidden Graphs (Paper 27)

10/3/2018 : In-class Project Discussions

10/5/2018 : Abdullah Almamun   - Hekaton (Paper 24)

10/8/2018 : October Break (No Class)

10/10/2018: In-class Project Discussions

10/12/2018: Xilun Wu           - Adding regular expressions to graph reachability and pattern queries (Paper 8) (Slides)

10/15/2018: Abdullah Almamun   - Hekaton (Paper 24) Cont’d

10/17/2018: In-class Project Discussions

10/19/2018: Spencer Pearson    - Extracting and Analyzing Hidden Graphs (Paper 27)

10/22/2018: Xilun Wu           - Adding regular expressions to graph reachability and pattern queries (Paper 8) Cont’d (Slides)

10/24/2018: In-class Project Discussions

10/26/2018: Lu Xing            - G-SPARQL (Paper 11)

10/29/2018: Abdullah Almamun   - GPS (Paper 13)

10/31/2018: In-class Project Discussions

11/2/2018 : Abdullah Almamun   - Experimental Analysis (Paper 37)

11/5/2018 : Professor Saeed Salem        - Structural Graph Clustering (Paper 35)

11/7/2018 : Project Preparation (No Class)

11/9/2018 : Project Preparation (No Class)

11/12/2018: Xulin Wu           - (Paper 36)

11/14/2018: In-class Project Discussions

11/16/2018: Lu Xing            - GraphFrames (Paper 26)

11/19/2018: Spencer Pearson    - Pregel (Paper 7)

11/21/2018: Thanksgiving Break  

11/23/2018: Thanksgiving Break

11/26/2018: Professor Saeed Salem

11/28/2018: In-class Project Discussions

11/30/2018: Project Final Presentations

12/3/2018 : Professor Semih Salihoglu

12/5/2018 : Project Final Presentations

12/7/2018 : Project Final Presentations