How Knowledge Graphs Bring Order to the HRA's Data Diversity
It takes a lot of data to construct the Human Reference Atlas (HRA).
But not all data is created equal!
HRA data comes from many different sources which may use different technologies
and
follow different protocols.
The data itself comes in many different formats, some of which may require a
particular code to read.
Some of it is old data, and some of it is new.
It may have been mixed with other data or repurposed.
Some data might be open research data, available to all.
While other data might have restrictions that limit access, use, and distribution.
For the Human Reference Atlas (HRA), we need the ability to easily find the data
we
want, utilize it for our purposes, and share it as widely as possible.
Of course, that data needs to be structured in a way that it can be readable by
machines.
Ideally, though, that data structure would also be understandable to humans.
It would not only show what data exists in the HRA but also how pieces of that
data
relate to each other.
By labeling our data and connecting our labeled nodes with
relational
links,
we put our data into context and create a framework for moving from data
to knowledge
to insight.
The type of data structure we are moving towards here is known as a "knowledge
graph," and they are a lot more common than you think.
Google was the first to introduce the term back in 2012.
But now major companies like
Facebook, Amazon, and Netflix—they all utilize knowledge graphs to represent relationships
between
people,
products, and concepts.
A knowledge graph gathers all the things that are important to a particular group
or
organization.
These things can be people, places, entities, concepts, databases,
documents—really just about anything.
Each of those data entities is assigned a node. Then, it organizes
all
those things into a network of interrelations.
In the case of the Human Reference Atlas, we are interested in things like
biological
data, research metadata, and data about the digital objects within the HRA.
Using the Resource Description Framework (RDF), each of these are
expressed as
a subject, predicate, and an object.
The predicate expresses the relationship between the entities.
This grouping is called a triple, and the relation between an anatomical structure
and
its parent organ might look like this.
Let's see how this might look for a particular digital object created for the
Human
Reference Atlas.
Here's a 3D reference organ for the left female kidney.
And here's how it appears in the knowledge graph.
The subject entries in the left column all point to the same
thing:
the HRA's 3D reference organ of the left female kidney.
The object column lists all the other data in the HRA that the reference
organ
is
connected to.
And the predicate column indicates the nature of that relationship.
A closer look at these predicates reveals relationships such as the creation
date,
A closer look at these predicates reveals relationships such as the creation date,
version number,
A closer look at these predicates reveals relationships such as the creation date,
version number, the raw data the 3D kidney was derived from, and many more.
What we see here is actually a network of nodes and edges, with our kidney
reference
organ as the central node with all its related data connected to it by labeled edges.
Of course, this is only one network. There are over 500 digital objects currently
in
the HRA, each with its own network. And each network is connected to all the others.
Utilizing a knowledge graph not only helps us structure the massive amount of
different types of data that power the HRA.
It will also allow us to link up with other information
networks to create a wide and radically open web of knowledge about the human body.
External Links
Explore HRA Data
Want to learn how the HRA puts all that data to use? Check out the apps
overview of some of the neat things the HRA can do!