Edmon Begoli, PhD, HCISPP/CAP, is the director of Scalable Protected Data Facilities (SPDF) at Oak Ridge National Laboratory (ORNL). In this role, Edmon is responsible for the research and design of systems for computing on protected data. He also serves as the Principal Investigator (PI) for the joint DOE and VA precision medicine program (MVP CHAMPION) and for a program with the Centers for Medicare and Medicaid Services (CMS) focused on applying HPC and AI technologies.

During his tenure at ORNL, Edmon also led several major national projects in healthcare and defense, and was a chief architect for the Knowledge Discovery Initiative (KDI) for CMS, a large national program aimed at developing a platform for comprehensive, longitudinal analysis of large, structured healthcare datasets (CMS data).

Prior to serving as the Chief Data Architect at ORNL, Edmon was the Chief Data Officer at the Joint Institute for Computational Sciences/National Institute for Computational Sciences (JICS/NICS), an NSF-funded XSEDE national supercomputing facility and a joint institute between ORNL and the University of Tennessee (UT). While at JICS, Edmon was one of the core members of the team that won and established the NSF-funded Southeast “Big Data” Hub, and was a principal investigator on a research project for the Intel Parallel Computing Center (PCC). Before working at research institutes, Dr. Begoli held technology leadership positions at a technology startup and at large commercial organizations.

Edmon is a member of the IEEE, ACM, and Apache Software Foundation (ASF).

Edmon holds undergraduate (East Tennessee State University), graduate (University of Colorado-Boulder), and doctoral (University of Tennessee) degrees in Computer Science. He is currently a visiting research scholar with the UC Berkeley EECS RISE Lab and a joint faculty professor of computer science in the University of Tennessee EECS department.


University of Tennessee, Computer Science, Doctor of Philosophy (Ph.D.)
University of Colorado, Computer Science, Master of Science (M.S.)
East Tennessee State University, Computer Science, Bachelor of Science (B.S.)

Staff Activities

Open Source Contribution:


Apache Calcite

Apache Calcite is a foundational software framework that provides query processing, optimization, and query language support to many popular open-source data processing systems such as Apache Hive, Apache Storm, Apache Flink, Druid, and MapD. Calcite’s architecture consists of a modular and extensible query optimizer with hundreds of built-in optimization rules, a query processor capable of processing a variety of query languages, an adapter architecture designed for extensibility, and support for heterogeneous data models and stores (relational, semi-structured, streaming, and geospatial). This flexible, embeddable, and extensible architecture makes Calcite an attractive choice for adoption in big data frameworks. It is an active project that continues to add support for new types of data sources, query languages, and approaches to query processing and optimization.


2018 - Begoli, Edmon, Kris Brown, Sudarshan Srinivas, and Suzanne Tamang. "SynthNotes: A Generator Framework for High-volume, High-fidelity Synthetic Mental Health Notes." In 2018 IEEE International Conference on Big Data (Big Data), pp. 951-958. IEEE, 2018.

2018 - Begoli, Edmon, Jesús Camacho-Rodríguez, Julian Hyde, Michael J. Mior, and Daniel Lemire. "Apache Calcite: A foundational framework for optimized query processing over heterogeneous data sources." In Proceedings of the 2018 International Conference on Management of Data, pp. 221-230. ACM, 2018.

2016 - Begoli, Edmon, Pragneshkumar Patel, and J. Blair Christian. "Storage and Read-Optimized Data Placement Structures for High-Performance Analysis." In Optimization Challenges in Complex, Networked and Risky Systems, pp. 171-184. INFORMS, 2016.

2016 - Begoli, Edmon, Derek Kistler, and Jack Bates. "Towards a heterogeneous, polystore-like data architecture for the US Department of Veteran Affairs (VA) enterprise analytics." In 2016 IEEE International Conference on Big Data (Big Data), pp. 2550-2554. IEEE, 2016.

2016 - Begoli, Edmon, Ted Dunning, and Charlie Frasure. "Real-time discovery services over large, heterogeneous and complex healthcare datasets using schema-less, column-oriented methods." In 2016 IEEE Second International Conference on Big Data Computing Service and Applications (BigDataService), pp. 257-264. IEEE, 2016.

2015 - Baer, Troy, Paul Peltz, Junqi Yin, and Edmon Begoli. "Integrating Apache Spark into PBS-based HPC environments." In Proceedings of the 2015 XSEDE Conference: Scientific Advancements Enabled by Enhanced Cyberinfrastructure, p. 34. ACM, 2015.

2012 - Begoli, Edmon, and James Horey. "Design principles for effective knowledge discovery from big data." In 2012 Joint Working IEEE/IFIP Conference on Software Architecture and European Conference on Software Architecture, pp. 215-218. IEEE, 2012.