LC-QuAD 2.0
Large-Scale Complex Question Answering Dataset


LC-QuAD 2.0 is a large question answering dataset with 30,000 pairs of questions and their corresponding SPARQL queries. The target knowledge bases are Wikidata and DBpedia, specifically the 2018 versions. Please see our paper for details about the dataset creation process and framework.


Leaderboard
Rank | System | Exact Match | F1 Score


  • Find the train and test splits at our GitHub repo.
  • Use Wikidata to benchmark your system on LC-QuAD 2.0.
  • Use DBpedia 2018 to benchmark your system on LC-QuAD 2.0. Here's a guide on setting up your own endpoint.
  • We're in the process of creating a one-click benchmarking process. In the meantime, please contact us to report your results.
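Until the one-click benchmarking process is available, the leaderboard metrics can be computed locally. A minimal sketch, assuming a set-based interpretation of Exact Match and F1 over the answer entities returned by the gold and predicted SPARQL queries (the function names are illustrative, not part of any official evaluation script):

```python
def exact_match(gold, pred):
    """1 if the predicted answer set equals the gold answer set, else 0."""
    return int(set(gold) == set(pred))

def f1_score(gold, pred):
    """Set-based F1 between gold and predicted answer entities."""
    gold_set, pred_set = set(gold), set(pred)
    if not gold_set and not pred_set:
        return 1.0  # both empty: treat as a perfect match
    overlap = len(gold_set & pred_set)
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_set)
    recall = overlap / len(gold_set)
    return 2 * precision * recall / (precision + recall)
```

Averaging these two scores over all test questions yields the aggregate Exact Match and F1 columns of the leaderboard.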
Every data item in the dataset consists of the following fields:

{
  "NNQT_question": "The automatically verbalized equivalent of the SPARQL query.",
  "question": "Human-corrected version of the verbalized question.",
  "paraphrased_question": "Human-paraphrased version of the corrected question.",
  "sparql_dbpedia18": "Valid corresponding SPARQL query on DBpedia.",
  "sparql_wikidata": "Valid corresponding SPARQL query on Wikidata.",
  "uid": "Unique ID of the data point."
}
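A data item with these fields can be consumed like any JSON record. A minimal sketch, where the sample values below are invented for illustration and are not drawn from the actual dataset:

```python
import json

# A hypothetical data item mirroring the field schema described above
sample = json.loads("""
{
  "NNQT_question": "What is the {occupation} of {Douglas Adams}?",
  "question": "What was Douglas Adams' job?",
  "paraphrased_question": "Which profession did Douglas Adams have?",
  "sparql_dbpedia18": "SELECT ?o WHERE { dbr:Douglas_Adams dbo:occupation ?o }",
  "sparql_wikidata": "SELECT ?o WHERE { wd:Q42 wdt:P106 ?o }",
  "uid": 1
}
""")

# Pick the query that matches the knowledge base you benchmark against
query = sample["sparql_wikidata"]
print(sample["uid"], query)
```

Systems targeting DBpedia would read `sparql_dbpedia18` instead, and could train on either `question` or `paraphrased_question` as the natural-language input.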
v0.1.0 - 01-07-2019
  • [RELEASE] First version of the dataset released with 30,000 datapoints.
  • [PAPER] Accompanying paper published.
@inproceedings{lcquad2,
  title={LC-QuAD 2.0: A Large Dataset for Complex Question Answering over Wikidata and DBpedia},
  author={Dubey, Mohnish and Banerjee, Debayan and Abdelkawi, Abdelrahman and Lehmann, Jens},
  booktitle={Proceedings of the 18th International Semantic Web Conference (ISWC)},
  year={2019}
}