MilKBQA
Facts in the military domain often involve elements such as time, space, quantity, and status. Existing methods that represent knowledge as triples fail to express these facts adequately and also hinder knowledge storage and updating. Furthermore, question answering over these facts introduces new dimensions of complexity that existing corpora do not support well. We therefore construct MilKB, a Chinese knowledge base for the military domain covering entity-centric and event-centric knowledge. It consists of 965 entities and 3,017 facts. In addition, we classify natural questions into 26 types and construct MilKBQA, a complex question answering dataset derived from MilKB. It consists of 2,829 questions, of which 600 are event-centric.
Resource structure
MilKB
MilKBQA
- Entity-centric Calculation Reasoning.json
- Entity-centric Logic Reasoning.json
- Entity-centric Quantity Reasoning.json
- Entity-centric Simple.json
- Event-centric Quantity Reasoning.json
- Event-centric Simple.json
- Event-centric-Probability Reasoning.json
Distribution of questions in MilKBQA
| Question Type | Count |
| --- | --- |
| Entity-centric Simple | 1749 |
| Entity-centric Logic Reasoning | 82 |
| Entity-centric Quantity Reasoning | 302 |
| Entity-centric Calculation Reasoning | 96 |
| Event-centric Simple | 509 |
| Event-centric Quantity Reasoning | 83 |
| Event-centric-Probability Reasoning | 8 |
| Total | 2829 |
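Since MilKBQA ships one file per question type, a subset with a chosen number of questions per type can be assembled by sampling each file. The sketch below is one possible way to do this, not part of the released code. It assumes that every question file is a JSON array of question records (the internal layout of the files is not described here), and the names `load_question_files`, `sample_by_type`, and the quota values are hypothetical.

```python
import json
import random
from pathlib import Path


def load_question_files(data_dir):
    """Map each question type (the file name without .json) to its questions."""
    by_type = {}
    for path in sorted(Path(data_dir).glob("*.json")):
        with open(path, "r", encoding="utf-8") as f:
            # Assumption: the top-level JSON value is a list of question records
            by_type[path.stem] = json.load(f)
    return by_type


def sample_by_type(by_type, quotas, seed=0):
    """Draw up to quotas[type] questions from each requested type."""
    rng = random.Random(seed)  # fixed seed keeps the subset reproducible
    subset = {}
    for qtype, quota in quotas.items():
        pool = by_type[qtype]
        # Never ask for more questions than the type actually contains
        subset[qtype] = rng.sample(pool, min(quota, len(pool)))
    return subset
```

A call could look like `sample_by_type(load_question_files("MilKBQA"), {"Entity-centric Simple": 200, "Event-centric Simple": 100})`, where the directory name and the quotas are placeholders. Types with few questions, such as Event-centric-Probability Reasoning, are simply returned in full when the quota exceeds their size.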
Resource usage
Place the following code in the same directory as node.json and relation.json, and change the username and password to those of your Neo4j account. Running the code imports MilKB into a local Neo4j instance.
As for MilKBQA, users can configure the number of questions of each type according to their needs.
Note that Neo4j does not support numerical operations on values with units, so questions involving numerical calculation or comparison cannot be answered directly by executing Cypher queries. For such questions, part of the query results needs further processing. The answers in the corpus were calculated manually.
import json
from py2neo import Graph, Node, Relationship, NodeMatcher
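def get_graph():
    """
    Connect to Neo4j.
    :return: py2neo Graph object, or None if the connection fails
    """
    try:
        # Replace the credentials with those of your own Neo4j account.
        # Depending on the py2neo version, they may need to be passed
        # as auth=("user", "password") instead.
        graph = Graph("http://localhost:7474", username="neo4j", password="neo4j")
        print("Successfully connected to neo4j.")
        return graph
    except Exception as e:
        print(e)
        return None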
def deal_json(file_name):
    """
    Load a JSON file.
    :param file_name: json file name
    :return: list of dicts
    """
    with open(file_name, 'r', encoding='utf-8') as f:
        load_dict = json.load(f)
    return load_dict
def trans_nodes(graph, file_name):
    """
    Transfer the nodes to the new database.
    :param graph: neo4j object
    :param file_name: json file name
    :return: None
    """
    load_dict = deal_json(file_name)
    for row in load_dict:
        # select by json structure
        old_id = row['n']['identity']
        node_type = row['n']['labels'][0]
        properties = row['n']['properties']
        # keep the original id for now so that relations can find their end nodes
        node = Node(node_type, old_id=old_id)
        for key, val in properties.items():
            node[key] = val
        print('creating-', node)
        graph.create(node)
def trans_relations(graph, file_name):
    """
    Transfer the relations to the new database.
    :param graph: neo4j object
    :param file_name: json file name
    :return: None
    """
    load_dict = deal_json(file_name)
    for row in load_dict:
        segment = row['p']['segments'][0]
        start_id = segment['start']['identity']
        start_label = segment['start']['labels'][0]
        end_id = segment['end']['identity']
        end_label = segment['end']['labels'][0]
        relation_type = segment['relationship']['type']
        relation_properties = segment['relationship']['properties']
        # Labels and relationship types cannot be Cypher parameters, so they are
        # formatted into the query; ids and properties are passed as parameters,
        # which keeps quotes inside property values from breaking the query.
        cypher = ("match (m:{0}), (n:{1}) where m.old_id=$start_id and n.old_id=$end_id "
                  "create p=(m)-[:{2} $props]->(n) return p").format(start_label, end_label, relation_type)
        # property values are stored as strings, as in the string-built query
        props = {k: str(v) for k, v in relation_properties.items()}
        print(cypher, props)
        graph.run(cypher, {'start_id': start_id, 'end_id': end_id, 'props': props})
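if __name__ == '__main__':
    graph = get_graph()
    if graph is None:
        raise SystemExit("Could not connect to neo4j.")
    # warning: this removes all existing data from the target database
    graph.delete_all()
    trans_nodes(graph, 'node.json')
    trans_relations(graph, 'relation.json')
    # delete old id
    graph.run('MATCH (n) REMOVE n.old_id')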
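Because Neo4j cannot compute on values that carry units, the further processing mentioned above can be done in Python after the raw values have been fetched with Cypher. The following is a minimal sketch under the assumption that such a value is a string made of a number followed by its unit (for example `300 km`); the actual value format in MilKB may differ, and the helpers `parse_quantity` and `compare_quantities` are hypothetical names, not part of the released code. No unit conversion is attempted.

```python
import re

# A leading number (optional sign and decimals) followed by free-text unit
_QUANTITY = re.compile(r"^\s*([-+]?\d+(?:\.\d+)?)\s*(.*?)\s*$")


def parse_quantity(text):
    """Split a string such as '300 km' into its number and its unit."""
    match = _QUANTITY.match(str(text))
    if match is None:
        raise ValueError("no leading number in {!r}".format(text))
    return float(match.group(1)), match.group(2)


def compare_quantities(a, b):
    """Return the difference a - b; both values must use the same unit."""
    value_a, unit_a = parse_quantity(a)
    value_b, unit_b = parse_quantity(b)
    if unit_a != unit_b:
        # Assumption: differing units are reported, not converted
        raise ValueError("unit mismatch: {!r} vs {!r}".format(unit_a, unit_b))
    return value_a - value_b
```

A positive result from `compare_quantities` means the first value is larger. Values whose units differ raise an error here, so any conversion between units would have to be added separately.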
Links
The resources can be downloaded from the following links: MilKB, MilKBQA.