Easier than Excel: Social Network Analysis of DocGraph with Gephi
The DocGraph dataset was released at Strata RX 2012. It is the result of a FOI request to CMS by healthcare data activist Fred Trotter (co-presenter). The dataset is minimal: each row consists of just three numbers, two healthcare provider identifiers and a weighting factor. By combining these three numbers with other publicly available information sources, novel conclusions can be drawn about the delivery of healthcare to Medicare members. As an example of this approach see: http://tripleweeds.tumblr.com/post/42989348374/visualizing-the-docgraph-for-wyoming-medicare-providers
The DocGraph dataset consists of 49,685,810 relationships between 940,492 different Medicare providers. The complete dataset is too big to analyze with traditional tools, but useful subsets of it can be analyzed with Gephi. Gephi is an open-source tool for visually exploring and analyzing graphs. This tutorial will teach participants how to use Gephi for social network analysis on the DocGraph dataset.
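As a rough illustration of the three-number row format and of cutting out a subset, the following Python sketch reads rows of two provider identifiers and a weight, then keeps only the relationships whose providers both belong to a chosen set. The comma-separated layout, the absence of a header row, the sample identifiers, and the names `Edge`, `read_edges`, and `subset` are assumptions made for illustration, not details of the released files.

```python
import csv
import io
from typing import Iterable, Iterator, List, NamedTuple, Set


class Edge(NamedTuple):
    source: str   # first provider identifier
    target: str   # second provider identifier
    weight: int   # weighting factor for the relationship


def read_edges(lines: Iterable[str]) -> Iterator[Edge]:
    """Parse rows of the assumed form 'provider_a,provider_b,weight'."""
    for row in csv.reader(lines):
        if len(row) != 3:
            continue  # skip blank or malformed rows
        source, target, weight = (field.strip() for field in row)
        yield Edge(source, target, int(weight))


def subset(edges: Iterable[Edge], providers: Set[str]) -> List[Edge]:
    """Keep only edges whose two endpoints are both in `providers`."""
    return [e for e in edges if e.source in providers and e.target in providers]


# Made-up identifiers, for illustration only.
SAMPLE = """\
1000000001,1000000002,25
1000000001,1000000003,40
1000000003,1000000004,12
"""

if __name__ == "__main__":
    wanted = {"1000000001", "1000000002", "1000000003"}
    for edge in subset(read_edges(io.StringIO(SAMPLE)), wanted):
        print(edge.source, edge.target, edge.weight)
```

In practice the set of providers would come from the helper data, for example all providers in one state, and the rows would be streamed from the file rather than from an in-memory string.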
Outline of the tutorial:
Part 1: DocGraph and the network data model (30% of the time)
The DocGraph dataset
The raw data
Helper data (NPI associated data)
The graph / network data model
Nodes versus edges
How graph models are integral to social networking
Other Healthcare graph data sets
Part 2: Using Gephi to perform analysis (70% of the time)
Basic usage of Gephi
Saving and reading the GraphML format
Laying out edges and nodes of a graph
Navigating and exploring the graph
Generating graph metrics on the network
Filtering a subset of the graph
Producing the final output of the graph
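For readers who have not seen GraphML, here is a minimal sketch of how a small list of weighted relationships could be written in that format using only the Python standard library. The function name `to_graphml`, the choice of a directed graph, and the use of an edge attribute named `weight` are assumptions for illustration; the files used in the tutorial may be structured differently.

```python
import xml.etree.ElementTree as ET
from typing import Iterable, Tuple

GRAPHML_NS = "http://graphml.graphdrawing.org/xmlns"


def _tag(name: str) -> str:
    """Qualify an element name with the GraphML namespace."""
    return f"{{{GRAPHML_NS}}}{name}"


def to_graphml(edges: Iterable[Tuple[str, str, float]],
               edgedefault: str = "directed") -> str:
    """Serialize (source, target, weight) triples as a GraphML string."""
    ET.register_namespace("", GRAPHML_NS)  # emit GraphML as the default namespace
    root = ET.Element(_tag("graphml"))
    # Declare one edge attribute, assumed here to be called "weight".
    ET.SubElement(root, _tag("key"), {
        "id": "weight", "for": "edge",
        "attr.name": "weight", "attr.type": "double",
    })
    graph = ET.SubElement(root, _tag("graph"), {"edgedefault": edgedefault})

    edge_list = list(edges)
    seen = set()
    # Emit each node once, in order of first appearance.
    for source, target, _ in edge_list:
        for node_id in (source, target):
            if node_id not in seen:
                seen.add(node_id)
                ET.SubElement(graph, _tag("node"), {"id": node_id})
    # Emit edges, each carrying its weight as a <data> child.
    for source, target, weight in edge_list:
        edge = ET.SubElement(graph, _tag("edge"),
                             {"source": source, "target": target})
        data = ET.SubElement(edge, _tag("data"), {"key": "weight"})
        data.text = str(weight)
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    # Made-up identifiers, for illustration only.
    print(to_graphml([("1000000001", "1000000002", 25.0)]))
```

Writing the returned string to a file with a `.graphml` extension should give something Gephi can open, though that step is not shown here.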
Janos Hajagos
Stony Brook School of Medicine
Dr. Janos G. Hajagos is the lead data analyst for a unique partnership between SUNY and the New York State Department of Health. He has a Ph.D. in Ecology and Evolutionary Biology and has published widely, on topics ranging from risk analysis to applications of the semantic web to healthcare. He is a participant in the CTSAConnect project.
FredTrotter.com
Fred Trotter is the leading consultant and advocate for Free/Libre and Open Source (FOSS) Health Software. In recognition of his role within the Open Source Health Informatics community, Trotter was the only Open Source representative invited by the NCVHS to testify on the definition of ‘meaningful use’.
For more information, check out O’Reilly’s StrataRX Boston Site.