MilKBQA

Facts in the military field often involve elements such as time, space, quantity, and status. Existing triple-based knowledge representations cannot express such facts adequately, and they also make the knowledge harder to store and update. Furthermore, question answering over these facts introduces new dimensions of complexity that existing corpora do not support. We therefore construct MilKB, a Chinese knowledge base for the military field that covers both entity-centric and event-centric knowledge. It consists of 965 entities and 3,017 facts. Moreover, we classify natural questions into 26 types and construct MilKBQA, a complex question answering dataset derived from MilKB. It consists of 2,829 questions, 600 of which are event-centric.
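To illustrate why plain triples fall short, the sketch below contrasts a triple with a fact that carries qualifiers, in the way a property-graph relationship (as used by the import script further down) can hold its own properties. The `QualifiedFact` class and the example values are hypothetical; they are not MilKB's actual schema or data.

```python
from dataclasses import dataclass, field
from typing import Dict, Tuple

# A plain (subject, predicate, object) triple has no slot for qualifiers.
Triple = Tuple[str, str, str]


@dataclass
class QualifiedFact:
    """A fact whose relation carries qualifiers such as time, place, quantity or status,
    similar to a property-graph relationship that holds its own properties."""
    subject: str
    predicate: str
    obj: str
    qualifiers: Dict[str, str] = field(default_factory=dict)

    def as_triple(self) -> Triple:
        # Projecting onto a triple drops every qualifier.
        return (self.subject, self.predicate, self.obj)


# Hypothetical facts (not from MilKB): same entities, different time/quantity qualifiers.
f1 = QualifiedFact("Unit A", "deployed_to", "Region B", {"time": "2019", "quantity": "2"})
f2 = QualifiedFact("Unit A", "deployed_to", "Region B", {"time": "2021", "quantity": "5"})

print(f1 == f2)                          # False: the qualified facts differ
print(f1.as_triple() == f2.as_triple())  # True: as triples they collapse into one
```

Because the two distinct facts map to the same triple, a triple store would either lose one of them or need extra reification machinery, which is the storage and updating problem described above.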

Resource structure

MilKB

node.json
relation.json

MilKBQA

Entity-centric Calculation Reasoning.json
Entity-centric Logic Reasoning.json
Entity-centric Quantity Reasoning.json
Entity-centric Simple.json
Event-centric Quantity Reasoning.json
Event-centric Simple.json
Event-centric-Probability Reasoning.json

Distribution of questions in MilKBQA

Question Type	Count
Entity-centric Simple	1749
Entity-centric Logic Reasoning	82
Entity-centric Quantity Reasoning	302
Entity-centric Calculation Reasoning	96
Event-centric Simple	509
Event-centric Quantity Reasoning	83
Event-centric-Probability Reasoning	8
Total	2829
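The table agrees with the overview: the four entity-centric types sum to 2,229 questions and the three event-centric types sum to the 600 event-centric questions, giving 2,829 in total. A quick check using only the counts from the table:

```python
# Counts copied from the distribution table above.
COUNTS = {
    "Entity-centric Simple": 1749,
    "Entity-centric Logic Reasoning": 82,
    "Entity-centric Quantity Reasoning": 302,
    "Entity-centric Calculation Reasoning": 96,
    "Event-centric Simple": 509,
    "Event-centric Quantity Reasoning": 83,
    "Event-centric-Probability Reasoning": 8,
}


def subtotal(prefix: str) -> int:
    """Sum the counts of all question types whose name starts with `prefix`."""
    return sum(n for name, n in COUNTS.items() if name.startswith(prefix))


entity_total = subtotal("Entity-centric")
event_total = subtotal("Event-centric")
print(entity_total, event_total, entity_total + event_total)  # 2229 600 2829
```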

Resource usage

To import MilKB into a local neo4j instance, place the following script in the same directory as node.json and relation.json, change the username and password to those of your neo4j account, and run it. Note that the script first deletes everything already stored in that database.
As for MilKBQA, users can choose how many questions of each type to use according to their needs.
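One way to build such a subset is sketched below. It assumes, hypothetically, that each MilKBQA file is a JSON array of question records and that the file name is the question type plus `.json` (as in the resource structure above); `sample_questions` is an illustrative helper, not part of the released resources.

```python
import json
import random
from pathlib import Path
from typing import Dict, List


def sample_questions(data_dir: str, counts: Dict[str, int], seed: int = 0) -> Dict[str, List[dict]]:
    """Draw up to counts[qtype] questions from '<data_dir>/<qtype>.json'.

    Assumption: each file holds a JSON array of question records.
    """
    rng = random.Random(seed)  # fixed seed -> the same subset on every run
    subset = {}
    for qtype, n in counts.items():
        path = Path(data_dir) / f"{qtype}.json"
        with path.open(encoding="utf-8") as f:
            questions = json.load(f)
        # Never request more questions than the file contains.
        subset[qtype] = rng.sample(questions, min(n, len(questions)))
    return subset


# Example (hypothetical directory and counts):
# subset = sample_questions("MilKBQA", {"Entity-centric Simple": 200, "Event-centric Simple": 100})
```

Using a seeded `random.Random` instance rather than the module-level functions keeps the selection reproducible without affecting other code that uses `random`.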

Note that neo4j does not support numerical operations on values that carry units, so questions involving numerical calculation or comparison cannot be answered directly by executing Cypher queries. For these questions, the query results must be processed further; the answers in the corpus were calculated manually.
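A minimal sketch of such post-processing, assuming quantities are stored as strings made of a number followed by a unit, such as `1200千米` or `30 km`. The regular expression and the `UNIT_FACTORS` table are illustrative assumptions, not the actual value format used in MilKB; extend them to the units that really occur in the data.

```python
import re
from typing import Tuple

# Hypothetical conversion table: unit -> (dimension, factor to the base unit).
UNIT_FACTORS = {
    "m": ("length", 1.0), "米": ("length", 1.0),
    "km": ("length", 1000.0), "千米": ("length", 1000.0), "公里": ("length", 1000.0),
    "kg": ("mass", 1.0), "千克": ("mass", 1.0),
    "t": ("mass", 1000.0), "吨": ("mass", 1000.0),
}

# A signed integer or decimal number, optional whitespace, then a unit token.
_VALUE_RE = re.compile(r"^\s*([-+]?\d+(?:\.\d+)?)\s*(\S+)\s*$")


def to_base(value: str) -> Tuple[float, str]:
    """Parse e.g. '1200千米' or '30 km' into (amount in base unit, dimension)."""
    m = _VALUE_RE.match(value)
    if m is None:
        raise ValueError(f"cannot parse quantity: {value!r}")
    number, unit = float(m.group(1)), m.group(2)
    if unit not in UNIT_FACTORS:
        raise ValueError(f"unknown unit: {unit!r}")
    dimension, factor = UNIT_FACTORS[unit]
    return number * factor, dimension


def difference(a: str, b: str) -> float:
    """Return a - b in base units; refuse to mix dimensions such as length and mass."""
    (x, dim_a), (y, dim_b) = to_base(a), to_base(b)
    if dim_a != dim_b:
        raise ValueError(f"incompatible units: {a!r} vs {b!r}")
    return x - y


print(difference("1.5 km", "300米"))  # 1200.0
```

The same `to_base` result can drive comparisons (e.g. "which is longer") by comparing the converted amounts once the dimensions are known to match.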

import json
import sys

from py2neo import Graph, Node


def get_graph():
    """
    Connect to the local neo4j instance.
    :return: py2neo Graph object, or None if the connection fails
    """
    try:
        # Replace the credentials with those of your own neo4j account.
        graph = Graph("http://localhost:7474", auth=("neo4j", "neo4j"))
        graph.run("RETURN 1")  # force a round trip so bad credentials fail here
        print("success for neo4j connection.")
        return graph
    except Exception as e:
        print(e)
        return None


def deal_json(file_name):
    """
    Load a JSON export file.
    :param file_name: json file name
    :return: list of dicts
    """
    with open(file_name, 'r', encoding='utf-8') as f:
        return json.load(f)


def trans_nodes(graph, file_name):
    """
    Recreate the exported nodes in the target database. The exported neo4j id
    is kept temporarily as `old_id` so relationships can find their endpoints.
    :param graph: neo4j object
    :param file_name: json file name
    :return: None
    """
    for row in deal_json(file_name):
        # fields follow the structure of the neo4j JSON export
        old_id = row['n']['identity']
        node_type = row['n']['labels'][0]
        node = Node(node_type, old_id=old_id)
        for key, val in row['n']['properties'].items():
            node[key] = val
        print('creating-', node)
        graph.create(node)


def trans_relations(graph, file_name):
    """
    Recreate the exported relationships between the imported nodes.
    :param graph: neo4j object
    :param file_name: json file name
    :return: None
    """
    for row in deal_json(file_name):
        seg = row['p']['segments'][0]
        start, end, rel = seg['start'], seg['end'], seg['relationship']
        # Labels and relationship types cannot be query parameters, so they are
        # inserted into the query text (backtick-quoted). Ids and property values
        # are passed as parameters, so quotes inside the data cannot break the query.
        cypher = (
            "MATCH (m:`{0}`), (n:`{1}`) "
            "WHERE m.old_id = $start_id AND n.old_id = $end_id "
            "CREATE p = (m)-[r:`{2}`]->(n) SET r = $props RETURN p"
        ).format(start['labels'][0], end['labels'][0], rel['type'])
        # As in the original string-built query, every property value is stored as a string.
        props = {k: str(v) for k, v in rel['properties'].items()}
        print(cypher, props)
        graph.run(cypher, start_id=start['identity'], end_id=end['identity'], props=props)


if __name__ == '__main__':
    graph = get_graph()
    if graph is None:
        sys.exit("could not connect to neo4j; check the address and credentials")
    graph.delete_all()  # WARNING: removes every node and relationship in the database
    trans_nodes(graph, 'node.json')
    trans_relations(graph, 'relation.json')
    # the temporary old_id property is no longer needed
    graph.run('MATCH (n) REMOVE n.old_id')
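Since `trans_relations` relies on a `MATCH` over label and `old_id`, a relationship whose endpoint is missing from node.json (or carries a different label) is silently not created, because a `MATCH` with no result creates nothing. The hypothetical helper below checks the two export files offline before importing; it reads exactly the fields the import script reads and does not need neo4j.

```python
import json
from typing import Dict, List


def check_exports(nodes: list, relations: list) -> List[str]:
    """Return problems that would make the import skip or fail on some rows."""
    problems = []
    labels: Dict[int, str] = {}  # exported identity -> first label (what trans_nodes uses)
    for row in nodes:
        node = row["n"]
        if not node.get("labels"):
            problems.append(f"node {node['identity']} has no label")
            continue
        if node["identity"] in labels:
            problems.append(f"duplicate node identity {node['identity']}")
        labels[node["identity"]] = node["labels"][0]
    for i, row in enumerate(relations):
        seg = row["p"]["segments"][0]
        for side in ("start", "end"):
            ident = seg[side]["identity"]
            if ident not in labels:
                problems.append(f"relation #{i}: {side} node {ident} is missing from node.json")
            elif seg[side]["labels"][0] != labels[ident]:
                problems.append(f"relation #{i}: {side} node {ident} has label "
                                f"{seg[side]['labels'][0]!r}, expected {labels[ident]!r}")
    return problems


def check_files(node_file: str = "node.json", relation_file: str = "relation.json") -> List[str]:
    """Load both export files and run check_exports on them."""
    with open(node_file, encoding="utf-8") as f:
        nodes = json.load(f)
    with open(relation_file, encoding="utf-8") as f:
        relations = json.load(f)
    return check_exports(nodes, relations)
```

Calling `check_files()` in the same directory as the export files returns an empty list when every relationship can be matched to its endpoints.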

Links

The resources can be downloaded from the following links: MilKB, MilKBQA.
