LC-QuAD 2.0
Large-Scale Complex Question Answering Dataset

What?

LC-QuAD 2.0 is a large question answering dataset with 30,000 pairs of questions and their corresponding SPARQL queries. The target knowledge bases are Wikidata and DBpedia, specifically the 2018 version of DBpedia. Please see our paper for details about the dataset creation process and framework.

Leaderboard

Rank    System    Exact Match    F1 Score
-       TBD       TBD            TBD
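
The page does not spell out how the two leaderboard metrics are computed. As a rough sketch, assuming a system's output and the gold standard for one question are each a set of answers (for example, the result bindings of the SPARQL query), exact match and F1 could be computed as below. The function names and the answer identifiers are illustrative assumptions, not part of any official evaluation script.

```python
def exact_match(predicted, gold):
    """Return True when the predicted answer set equals the gold answer set."""
    return set(predicted) == set(gold)


def f1_score(predicted, gold):
    """Set-based F1 between predicted and gold answers for one question."""
    predicted, gold = set(predicted), set(gold)
    if not predicted and not gold:
        # Assumption: two empty answer sets count as a perfect match.
        return 1.0
    overlap = len(predicted & gold)
    if overlap == 0:
        return 0.0
    precision = overlap / len(predicted)
    recall = overlap / len(gold)
    return 2 * precision * recall / (precision + recall)


# Hypothetical answer identifiers, used only to exercise the functions.
predicted_answers = ["A1", "A2"]
gold_answers = ["A2", "A3"]

em = exact_match(predicted_answers, gold_answers)
f1 = f1_score(predicted_answers, gold_answers)
print(em, f1)
```

Averaging these per-question values over the test split would give one score per metric; whether the leaderboard uses macro or micro averaging is not stated here.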

Usage

  • The train and test splits are available at our GitHub repo.
  • Use Wikidata to benchmark your system on LC-QuAD 2.0.
  • Use DBpedia2018 to benchmark your system on LC-QuAD 2.0. Here's a guide on setting up your own endpoint.
  • We're in the process of creating a one-click benchmarking process. For the time being, please contact us to report your results.
Every data item in the dataset consists of the following fields:

NNQT_question: "The automatically verbalized equivalent of the SPARQL Query.",
question: "Human corrected version of the verbalized question.",
paraphrased_question: "Human paraphrased version of the corrected version.",
sparql_dbpedia18: "Valid corresponding SPARQL query on DBpedia.",
sparql_wikidata: "Valid corresponding SPARQL query on Wikidata.",
uid: "Unique ID to the datapoint.",
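
A minimal sketch of reading data items with these fields, assuming each split is stored as a JSON array of objects (the file layout is an assumption; check the repo for the actual format). The sample record is made up for illustration and is not taken from the dataset; `load_items` is a hypothetical helper name.

```python
import json

# Field names as listed in the dataset description.
EXPECTED_FIELDS = (
    "NNQT_question",
    "question",
    "paraphrased_question",
    "sparql_dbpedia18",
    "sparql_wikidata",
    "uid",
)


def load_items(json_text):
    """Parse a JSON array of data items and check each one has every field."""
    items = json.loads(json_text)
    for index, item in enumerate(items):
        missing = [name for name in EXPECTED_FIELDS if name not in item]
        if missing:
            raise ValueError(f"item {index} is missing fields: {missing}")
    return items


# Made-up record for illustration only; the values are placeholders.
SAMPLE = json.dumps([
    {
        "NNQT_question": "placeholder verbalized question",
        "question": "placeholder corrected question",
        "paraphrased_question": "placeholder paraphrase",
        "sparql_dbpedia18": "SELECT ?answer WHERE { ... }",
        "sparql_wikidata": "SELECT ?answer WHERE { ... }",
        "uid": 1,
    }
])

items = load_items(SAMPLE)
first = items[0]
print(first["uid"], first["question"])
```

To read a real split, the same function could be fed the contents of the downloaded file, for example `load_items(open(path, encoding="utf-8").read())`, where `path` points at the train or test file.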
v0.1.0 - 01-07-2019
  • [RELEASE] First version of the dataset released with 30,000 datapoints.
  • lc-quad.sda.tech published
@inproceedings{dubey2017lc2,
title={LC-QuAD 2.0: A Large Dataset for Complex Question Answering over Wikidata and DBpedia},
author={Dubey, Mohnish and Banerjee, Debayan and Abdelkawi, Abdelrahman and Lehmann, Jens},
booktitle={Proceedings of the 18th International Semantic Web Conference (ISWC)},
year={2019},
organization={Springer}
}