Two papers accepted at ACM CIKM 2023 (full papers)


1
Title: ELTRA: An Embedding Method based on Learning-to-Rank to Preserve Asymmetric Information in Directed Graphs
Authors: Masoud Reyhani Hamedani, Jin-Su Ryu, and Sang-Wook Kim
Abstract
Double-vector embedding methods first capture the asymmetric information in directed graphs and then preserve it in the embedding space by providing two latent vectors, i.e., source and target, per node. Although these methods are known to be superior to single-vector ones (i.e., those providing a single latent vector per node), we point out three of their drawbacks: inability to preserve asymmetry on NU-paths, inability to preserve global node similarity, and impairment of in/out-degree distributions. To address these, we first propose CRW, a novel similarity measure for graphs that considers the contributions of both in-links and out-links in similarity computation without ignoring their directions. Then, we propose ELTRA, an effective double-vector embedding method to preserve asymmetric information in directed graphs. ELTRA computes asymmetry-preserving proximity scores (AP-scores) by employing CRW, in which the contribution of out-links and in-links to similarity computation is upgraded and downgraded, respectively. Then, for every node u, ELTRA selects its top-t closest nodes based on AP-scores and conforms the ranks of their corresponding target vectors w.r.t. u's source vector in the embedding space to their original ranks. Our extensive experimental results with seven real-world datasets and sixteen embedding methods show that (1) CRW significantly outperforms Katz and RWR in computing node similarity in graphs and (2) ELTRA outperforms existing state-of-the-art methods in graph reconstruction, link prediction, and node classification tasks.
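To make the ranking idea concrete, below is a minimal Python sketch of a double-vector, learning-to-rank training loop in the spirit of the abstract: each node u gets a source vector S[u] and a target vector T[u], and for u's top-t nodes under a precomputed proximity matrix, pairs whose embedding-space order violates the original ranking are nudged apart with a pairwise hinge loss. This is illustrative only; the CRW measure, ELTRA's actual loss, and all names and hyperparameters here (ap_scores, t, dim, lr, margin) are assumptions, not the paper's implementation.

import numpy as np

def train_double_vector(ap_scores, t=10, dim=64, lr=0.01, epochs=100, margin=0.1):
    # ap_scores: (n, n) matrix of precomputed (asymmetric) proximity scores.
    n = ap_scores.shape[0]
    rng = np.random.default_rng(0)
    S = rng.normal(scale=0.1, size=(n, dim))  # source vector per node
    T = rng.normal(scale=0.1, size=(n, dim))  # target vector per node
    for _ in range(epochs):
        for u in range(n):
            # top-t closest nodes to u by the precomputed scores
            top = np.argsort(-ap_scores[u])[:t]
            for i in range(len(top) - 1):
                hi, lo = top[i], top[i + 1]  # hi should outrank lo w.r.t. u
                diff = S[u] @ T[hi] - S[u] @ T[lo]
                if diff < margin:  # ranking violated: apply hinge-loss update
                    su = S[u].copy()
                    S[u] += lr * (T[hi] - T[lo])
                    T[hi] += lr * su
                    T[lo] -= lr * su
    return S, T

Because source and target vectors are learned separately, the score S[u] @ T[v] need not equal S[v] @ T[u], which is what lets a double-vector method represent asymmetric proximity at all.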


2
Title: SAGE: A Storage-Based Approach for Scalable and Efficient Sparse Generalized Matrix-Matrix Multiplication
Authors: Myung-Hwan Jang*, Yunyong Ko*, Hyuck-Moo Gwon, Ikhyeon Jo, Yongjun Park, and Sang-Wook Kim
Abstract
Sparse generalized matrix-matrix multiplication (SpGEMM) is a fundamental operation for real-world network analysis. With the increasing size of real-world networks, the single-machine-based SpGEMM approach cannot perform SpGEMM on large-scale networks that exceed the size of main memory (i.e., it is not scalable). Although the distributed-system-based approach could handle large-scale SpGEMM using multiple machines, it suffers from severe inter-machine communication overhead when aggregating the results from multiple machines (i.e., it is not efficient). To address this dilemma, in this paper we propose SAGE, a novel storage-based SpGEMM approach that stores given networks in storage (e.g., SSD) and, via a 3-layer architecture, loads into main memory only the parts of the networks required for processing. Furthermore, we point out three challenges that could degrade the overall performance of SAGE and propose three effective strategies to address them: (1) block-based workload allocation for balancing workloads across threads, (2) in-memory partial aggregation for reducing unnecessary storage-to-memory I/Os, and (3) distribution-aware memory allocation for preventing unexpected buffer overflows in main memory. Via extensive evaluation, we verify the superiority of SAGE over existing SpGEMM methods in terms of scalability and efficiency.
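As a rough illustration of the storage-based idea (not SAGE's actual 3-layer architecture), the Python sketch below multiplies two block-partitioned sparse matrices kept on disk, loading only the two blocks each partial product needs and aggregating partial results in memory per output block before writing it back. The file layout and helper names are assumptions made for the example.

from scipy import sparse

def load_block(name, i, j):
    # Read one sparse block from storage (hypothetical .npz-per-block layout).
    return sparse.load_npz(f"{name}_{i}_{j}.npz").tocsr()

def store_block(name, i, j, blk):
    # Write one finished output block back to storage.
    sparse.save_npz(f"{name}_{i}_{j}.npz", blk.tocsr())

def block_spgemm(n_blocks):
    # C[i][j] = sum over k of A[i][k] @ B[k][j], one output block at a time.
    for i in range(n_blocks):
        for j in range(n_blocks):
            acc = None  # in-memory partial aggregation for output block (i, j)
            for k in range(n_blocks):
                a = load_block("A", i, k)  # load only the blocks needed now
                b = load_block("B", k, j)
                part = a @ b               # sparse block product
                acc = part if acc is None else acc + part
            store_block("C", i, j, acc)

Keeping the running sum acc in memory until block (i, j) is finished mirrors the in-memory partial aggregation strategy: intermediate partial products never have to be spilled to storage and read back, which is the main source of avoidable storage-memory I/O in a naive out-of-core loop.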
