NASA reaches for graph DB to find people, skills for Moon and Mars missions – The Register



NASA has been set the ambitious target of returning humans to the Moon by 2024, and later making the order-of-magnitude leap to Mars. Even for the globally renowned space agency, it is a struggle.

According to a recent report by the Government Accountability Office (GAO), NASA still isn't quite sure how much this will cost, nor how long it will take to get the equipment ready. Hardware challenges abound, including testing an electric ion propulsion system for the lunar-orbiting outpost dubbed Gateway.

On top of all that, there is the question of whether NASA has the right people in place to take on the Moon mission and the trip to the Red Planet beyond. Trying to come up with some answers is David Meza, senior data scientist within the agency’s people analytics group, who has begun employing graph database technologies to try and help.

Like many HR and personnel IT estates, NASA maintains a mishmash of applications and analytical systems, each reliant on relational databases.

“We have SAP, ServiceNow and we've got some of the different RDBMSs that hold various different types of information about the employees, about our work roles, about our jobs, and so on,” Meza said.

But connecting the information across those systems, and understanding the relationships within it, was the tough part. Data about individuals might sit on one system, while training was managed on another and project databases were somewhere else. “It made it difficult to find the right relationship very easily,” he said.

Step forward graph databases. Meza has been working with graph databases for more than 10 years during his time as chief knowledge officer at NASA, applying them to a “lessons learned” database problem.

The same features would help with the skills challenge NASA faces: “This is a graph problem,” he said.

It involved finding the relationships between descriptions of knowledge, skills, abilities, tasks and technologies (KSATTs), and occupations, roles, and training. Bringing those relationships together would help the organisation find its gaps, weaknesses or strengths, he said.

NASA is under pressure to make progress on understanding the skills of the people it directly employs: around 1,700 individuals. In its report, the GAO said:

We found that NASA made some progress addressing workforce challenges… but the directorate has remaining challenges to address. For example, the programme status assessment identified that the mission directorate had key personnel serving in acting roles and identified high levels of vacancies, especially in the Advanced Exploration Systems division. The directorate made progress permanently filling a number of these vacancies. However, as of December 2020, the AES division still had eight out of 25 leadership positions filled in an acting capacity.

But Meza’s team’s challenges go beyond the high-profile missions to go back to the Moon and on to Mars.

“We do have other missions: we try to support the Earth sciences and climate change. We do a lot of work in aeronautics and aviation. We also do a lot of both software and technology development that we make available as well as our medical information that we get from experiments we run on the International Space Station,” he said.

In building a graph system to help understand the people, roles and skills it would need to achieve such scientific and engineering challenges, Meza turned to Neo4j, which he had been working with for many years in earlier NASA roles.

He said the vendor's approach to labelled property graphs was “more intuitive” than the resource description framework favoured by rival TigerGraph. “For me, it made the relationships and the connections easier. In my mind, coming from a SQL background, it was easier for me to see those relationships within a label property graph model.”
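To make the data-model distinction concrete, here is a minimal, hypothetical sketch in plain Python (no database involved) storing the same fact both ways. In a labelled property graph, nodes carry labels and relationships can carry their own properties; in plain RDF, everything is a subject-predicate-object triple, so attaching a property such as a proficiency level to the link itself needs an intermediate node (an n-ary relation pattern). The identifiers `emp:42`, `HAS_SKILL` and `proficiency` are invented for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Node:
    id: str
    labels: set[str]
    props: dict = field(default_factory=dict)


@dataclass
class Relationship:
    start: str
    rel_type: str
    end: str
    props: dict = field(default_factory=dict)


# Labelled property graph (hypothetical data): the proficiency level is a
# property of the HAS_SKILL relationship itself.
lpg_nodes = [
    Node("emp:42", {"Person", "Employee"}, {"name": "A. Engineer"}),
    Node("skill:cfd", {"Skill"}, {"name": "Computational fluid dynamics"}),
]
lpg_rels = [Relationship("emp:42", "HAS_SKILL", "skill:cfd", {"proficiency": 4})]

# RDF-style triples (hypothetical data): to say something about the link,
# an intermediate node ("assertion:1") stands in for the relationship.
rdf_triples = [
    ("emp:42", "rdf:type", "Employee"),
    ("skill:cfd", "rdf:type", "Skill"),
    ("assertion:1", "subject", "emp:42"),
    ("assertion:1", "skill", "skill:cfd"),
    ("assertion:1", "proficiency", 4),
]


def lpg_proficiency(person: str, skill: str) -> int | None:
    # One hop: read the property straight off the matching relationship.
    for r in lpg_rels:
        if r.start == person and r.rel_type == "HAS_SKILL" and r.end == skill:
            return r.props.get("proficiency")
    return None


def rdf_proficiency(person: str, skill: str) -> int | None:
    # Two steps: find the assertion node linking person and skill,
    # then read the proficiency triple hanging off that node.
    by_person = {s for s, p, o in rdf_triples if p == "subject" and o == person}
    by_skill = {s for s, p, o in rdf_triples if p == "skill" and o == skill}
    links = by_person & by_skill
    for s, p, o in rdf_triples:
        if s in links and p == "proficiency":
            return o
    return None


print(lpg_proficiency("emp:42", "skill:cfd"))  # 4
print(rdf_proficiency("emp:42", "skill:cfd"))  # 4
```

Both return the same answer; the difference is that the property graph keeps the relationship's attributes on the relationship, which is closer to how someone from a SQL background might picture a join table with extra columns.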


He also said Neo4j's wider technology, such as its data science algorithms, was attractive.

When approaching a problem, Meza broke it down, at the most basic level, to information about knowledge, skills and tasks, each of which might be present in an occupation or role, an individual, or a piece of training.
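The article describes the model only at this level, so the following is a hypothetical sketch in plain Python rather than Neo4j or Cypher. Knowledge, skill and task items are linked to roles, people and training as typed edges (the names `REQUIRES`, `HAS` and `TEACHES`, and every node identifier, are invented for illustration), and a simple traversal finds items a role needs that nobody in the graph has yet, along with training that covers them. This is the kind of gap-finding Meza describes.

```python
from __future__ import annotations

from collections import defaultdict

# Hypothetical edges: (source, relationship type, target).
# Not NASA's actual schema; the identifiers are illustrative only.
edges = [
    ("role:flight_dynamics", "REQUIRES", "skill:orbital_mechanics"),
    ("role:flight_dynamics", "REQUIRES", "skill:python"),
    ("role:flight_dynamics", "REQUIRES", "knowledge:spacecraft_systems"),
    ("role:flight_dynamics", "REQUIRES", "task:trajectory_design"),
    ("person:ada", "HAS", "skill:python"),
    ("person:ada", "HAS", "knowledge:spacecraft_systems"),
    ("person:bo", "HAS", "skill:orbital_mechanics"),
    ("training:astro101", "TEACHES", "skill:orbital_mechanics"),
    ("training:traj201", "TEACHES", "task:trajectory_design"),
]


def build_index(edges):
    """Index edges in both directions, keyed by (node, relationship type)."""
    outgoing = defaultdict(set)
    incoming = defaultdict(set)
    for src, rel, dst in edges:
        outgoing[(src, rel)].add(dst)
        incoming[(dst, rel)].add(src)
    return outgoing, incoming


outgoing, incoming = build_index(edges)


def gaps(role: str) -> set[str]:
    """Items the role requires that no person in the graph currently HAS."""
    required = outgoing.get((role, "REQUIRES"), set())
    return {item for item in required if not incoming.get((item, "HAS"))}


def training_for(item: str) -> set[str]:
    """Training nodes that TEACH the given item (a copy, safe to modify)."""
    return set(incoming.get((item, "TEACHES"), set()))


for item in sorted(gaps("role:flight_dynamics")):
    print(item, "->", sorted(training_for(item)))
# task:trajectory_design -> ['training:traj201']
```

In a graph database the same question becomes a pattern match over relationships rather than hand-built indexes, which is the appeal Meza describes of treating this as a graph problem.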

To build the corpus of information about the skills that go into occupations, the team used O*NET, a database from the US Department of Labor, as well as ESCO, the European Skills, Competences, Qualifications and Occupations database.

To understand connections across sets of data in training manuals, project papers, and CVs or résumés, as well as in the corpus of skills information, Meza employed Doc2Vec, an extension of the Word2Vec NLP algorithm developed by Tomas Mikolov at Google and published in 2013.

Doc2Vec represents a whole document as a multi-dimensional vector, so it can be compared with documents from other databases. That allows the algorithm to take a paragraph describing a piece of knowledge and compare it with a paragraph in a résumé or position description.
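Training a Doc2Vec model and inferring vectors needs a library (for example gensim's `Doc2Vec` model and its `infer_vector` method) plus a real corpus, so this minimal sketch assumes each paragraph has already been turned into a fixed-length vector. The four-dimensional vectors below are made-up stand-ins, not real Doc2Vec output (real vectors usually have a hundred or more dimensions). The code shows only the comparison step, ranking résumé paragraphs against a skill description by cosine similarity, a common way to compare such embeddings; the article does not say which similarity measure NASA uses.

```python
from __future__ import annotations

import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity: 1.0 means same direction, 0.0 means orthogonal."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


# Hypothetical, pre-computed document vectors (assumed, not real model output).
skill_vec = [0.9, 0.1, 0.0, 0.3]  # e.g. a paragraph describing a piece of knowledge
resume_paragraphs = {
    "resume:p1": [0.8, 0.2, 0.1, 0.4],
    "resume:p2": [0.0, 0.9, 0.7, 0.1],
}

# Rank résumé paragraphs by similarity to the skill description.
ranked = sorted(
    resume_paragraphs.items(),
    key=lambda kv: cosine(skill_vec, kv[1]),
    reverse=True,
)
for doc_id, vec in ranked:
    print(doc_id, round(cosine(skill_vec, vec), 3))
```

Here `resume:p1` points in nearly the same direction as the skill vector and ranks first; `resume:p2` scores close to zero.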

“We still have to validate and verify it, so we're working with our industrial and organisational psychologists on that,” Meza said.

Employees will also validate the model themselves, partly to improve its training and partly to help them understand the programme.

The team, made up of Meza and three others, is planning to build an interface that will allow employees to start validating their profiles and adding information to them in the summer.

Managers will have access to it, with a view to building a recommendation engine that lets people search through the various opportunities at NASA and find the ones most appropriate to their knowledge and skills. The Neo4j back-end database is deployed on the Google Cloud Platform.
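The article doesn't describe how the planned recommendation engine will score matches. One simple baseline, assumed here purely for illustration, is to rank opportunities by the Jaccard similarity between an employee's validated skills and each opportunity's required skills. The employee profile, opportunity titles and skill names below are all hypothetical.

```python
from __future__ import annotations


def jaccard(a: set[str], b: set[str]) -> float:
    """Shared skills divided by all skills mentioned on either side."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def recommend(employee_skills: set[str],
              opportunities: dict[str, set[str]],
              top_n: int = 3) -> list[tuple[str, float]]:
    """Return up to top_n opportunities with a non-zero score, best first.

    Ties are broken alphabetically by opportunity name so output is stable.
    """
    scored = [(jaccard(employee_skills, req), name)
              for name, req in opportunities.items()]
    scored.sort(key=lambda s: (-s[0], s[1]))
    matches = [(name, round(score, 2)) for score, name in scored if score > 0]
    return matches[:top_n]


# Hypothetical data for illustration only.
employee = {"python", "orbital_mechanics", "data_analysis"}
opportunities = {
    "Trajectory analyst": {"orbital_mechanics", "python", "trajectory_design"},
    "Earth science data engineer": {"python", "data_analysis", "remote_sensing", "sql"},
    "Flight test pilot": {"flight_test", "piloting"},
}

print(recommend(employee, opportunities))
# [('Trajectory analyst', 0.5), ('Earth science data engineer', 0.4)]
```

Jaccard ignores proficiency levels and treats every skill equally; a production engine would more likely combine graph relationships with the document-similarity scores described earlier, but this shows the basic shape of matching people to opportunities.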

The project runs on an annual budget of around $150,000, and Meza hopes spending on it will grow along with its use cases.

“As we get more into full-blown production and we’re showing the value of these capabilities, already I'm starting to get in queries right now from individuals asking how we can expand upon this for different data sources. So I can see this budget growing significantly over the next year,” Meza said.

Whether it will help NASA get to the Moon by 2024, and then on to Mars, is not entirely clear. But it might just be one small step in the right direction. ®
