ShExStatements: Documentation
ShExStatements lets users generate shape expressions (ShEx) from simple CSV statements and files. shexstatements can also be used from the command line.
Objectives
- Easily generate shape expressions (ShEx) from CSV files
- Simple syntax, with 5 columns
  - Node name
  - Property
  - Allowed values
  - Cardinality (optional)
  - Comments (optional)
Quick start
Clone the ShExStatements repository.
$ git clone https://github.com/johnsamuelwrites/ShExStatements.git
Go to the ShExStatements directory.
$ cd ShExStatements
Install modules required by ShExStatements (here: installing into a virtual environment).
$ python3 -m venv .venv
$ source ./.venv/bin/activate
$ pip3 install .
Run the following command with an example CSV file. The file contains an example description of a language on Wikidata and uses a comma as the delimiter to separate the values.
$ ./shexstatements.sh examples/language.csv
There are five columns in the CSV file. Column 1 specifies the node name, column 2 the property, column 3 the allowed values, column 4 the cardinality (for example + or *), and column 5 the comments. Comments start with #. Columns 1, 2 and 3 are mandatory. Column 3 can hold a special value such as . (a period, meaning 'any' value). For prefix declarations, columns 3, 4 and 5 are empty.
- Cardinality can be any one of the following values:
  - * : zero or more values
  - + : one or more values
  - m : exactly m values
  - m,n : between m and n values (including m and n)
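The cardinality notation above can be read as a pair of bounds. The following is a minimal Python sketch of that reading, not the parser used by ShExStatements: `parse_cardinality` is a hypothetical helper, `-1` stands for "unbounded" (as in the ShExJ output shown later on this page), and treating an empty cell as "exactly one" is an assumption based on the ShEx default.

```python
def parse_cardinality(text):
    """Return (min, max) for a cardinality cell; max == -1 means unbounded."""
    text = text.strip()
    if text == "":
        # Assumption: an empty cell falls back to the ShEx default of exactly one.
        return (1, 1)
    if text == "*":
        return (0, -1)  # zero or more values
    if text == "+":
        return (1, -1)  # one or more values
    if "," in text:
        # m,n : between m and n values, both bounds included
        low, high = (int(part) for part in text.split(","))
        if low > high:
            raise ValueError(f"lower bound exceeds upper bound: {text!r}")
        return (low, high)
    # m : exactly m values
    count = int(text)
    return (count, count)


for cell in ["*", "+", "2", "1,3", ""]:
    print(repr(cell), "->", parse_cardinality(cell))
```

Returning a plain tuple keeps the sketch easy to compare with the "min" and "max" fields of the JSON output.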
CSV files can use other delimiters, such as ;. For example, the following command works with a file that uses a semicolon as the delimiter.
$ ./shexstatements.sh examples/languagedelimsemicolon.csv --delim ";"
Sometimes users may want the CSV file to include a header. In that case, they can use -s or --skipheader to tell the generator to skip the header (the first line of the CSV file).
$ ./shexstatements.sh --skipheader examples/header/languageheader.csv
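To illustrate what a custom delimiter and a skipped header mean for parsing, here is a small sketch using Python's standard csv module. It is not taken from the ShExStatements source: `read_statements` and its parameters are hypothetical names, and the sample text is invented for the example.

```python
import csv
import io


def read_statements(source, delim=",", skip_header=False):
    """Return the non-empty rows of CSV text, split on `delim`."""
    reader = csv.reader(io.StringIO(source), delimiter=delim)
    # csv.reader yields an empty list for blank lines; drop those.
    rows = [[cell.strip() for cell in row] for row in reader if row]
    # With skip_header set, the first remaining row is treated as a header.
    return rows[1:] if skip_header else rows


# Hypothetical sample data, not copied from the examples folder.
sample = "node;property;value\nshape;ex:p;.\n"
rows = read_statements(sample, delim=";", skip_header=True)
print(rows)
```

Only the header row is dropped here; every other row is passed through unchanged, whatever its number of columns.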
In all the above cases, the shape expression generated by ShExStatements will look like this:
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
PREFIX wd: <http://www.wikidata.org/entity/>
PREFIX wdt: <http://www.wikidata.org/prop/direct/>
start = @<language>
<language> {
  wdt:P31 [ wd:Q34770 ] ;# instance of a language
  wdt:P1705 LITERAL ;# native name
  wdt:P17 .+ ;# spoken in country
  wdt:P2989 .+ ;# grammatical cases
  wdt:P282 .+ ;# writing system
  wdt:P1098 .+ ;# speakers
  wdt:P1999 .* ;# UNESCO language status
  wdt:P2341 .+ ;# indigenous to
}
Use -j or --shexj to generate ShEx JSON Syntax (ShExJ) instead of the default ShEx Compact Syntax (ShExC).
$ ./shexstatements.sh --shexj examples/language.csv
The output will be similar to:
{
  "type": "Schema",
  "start": "language",
  "shapes": [
    {
      "type": "Shape",
      "id": "language",
      "expression": {
        "type": "EachOf",
        "expressions": [
          {
            "type": "TripleConstraint",
            "predicate": "http://www.wikidata.org/prop/direct/P31",
            "valueExpr": {
              "type": "NodeConstraint",
              "values": [
                "http://www.wikidata.org/entity/Q34770"
              ]
            }
          },
          {
            "type": "TripleConstraint",
            "predicate": "http://www.wikidata.org/prop/direct/P1705",
            "valueExpr": {
              "type": "NodeConstraint",
              "nodeKind": "literal"
            }
          },
          {
            "type": "TripleConstraint",
            "predicate": "http://www.wikidata.org/prop/direct/P17",
            "min": 1,
            "max": -1
          },
          {
            "type": "TripleConstraint",
            "predicate": "http://www.wikidata.org/prop/direct/P2989",
            "min": 1,
            "max": -1
          },
          {
            "type": "TripleConstraint",
            "predicate": "http://www.wikidata.org/prop/direct/P282",
            "min": 1,
            "max": -1
          },
          {
            "type": "TripleConstraint",
            "predicate": "http://www.wikidata.org/prop/direct/P1098",
            "min": 1,
            "max": -1
          },
          {
            "type": "TripleConstraint",
            "predicate": "http://www.wikidata.org/prop/direct/P1999",
            "min": 0,
            "max": -1
          },
          {
            "type": "TripleConstraint",
            "predicate": "http://www.wikidata.org/prop/direct/P2341",
            "min": 1,
            "max": -1
          }
        ]
      }
    }
  ]
}
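ShExJ output of this kind can be inspected with any JSON library. The sketch below uses Python's json module to list each triple constraint with its bounds; `list_cardinalities` is a hypothetical helper, and the embedded schema is a shortened copy of the output above. A missing "min" or "max" is read as 1, which is the ShEx default of exactly one value, and -1 means unbounded.

```python
import json

# Shortened copy of the ShExJ output shown above (three of the eight constraints).
shexj = """
{
  "type": "Schema",
  "start": "language",
  "shapes": [
    {
      "type": "Shape",
      "id": "language",
      "expression": {
        "type": "EachOf",
        "expressions": [
          {"type": "TripleConstraint",
           "predicate": "http://www.wikidata.org/prop/direct/P31",
           "valueExpr": {"type": "NodeConstraint",
                         "values": ["http://www.wikidata.org/entity/Q34770"]}},
          {"type": "TripleConstraint",
           "predicate": "http://www.wikidata.org/prop/direct/P17",
           "min": 1, "max": -1},
          {"type": "TripleConstraint",
           "predicate": "http://www.wikidata.org/prop/direct/P1999",
           "min": 0, "max": -1}
        ]
      }
    }
  ]
}
"""


def list_cardinalities(schema):
    """Return (shape id, predicate, min, max) for every triple constraint."""
    result = []
    for shape in schema.get("shapes", []):
        expression = shape.get("expression", {})
        # EachOf groups several constraints; a lone constraint stands by itself.
        constraints = expression.get("expressions", [expression])
        for constraint in constraints:
            if constraint.get("type") != "TripleConstraint":
                continue
            result.append((
                shape["id"],
                constraint["predicate"],
                constraint.get("min", 1),  # absent bound: ShEx default of 1
                constraint.get("max", 1),
            ))
    return result


for entry in list_cardinalities(json.loads(shexj)):
    print(entry)
```

The helper only looks one level deep, which is enough for flat shapes like the one generated here; nested expressions would need a recursive walk.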
It’s also possible to use application profiles with columns of the following form:
Entity_name,Property,Property_label,Mand,Repeat,Value,Value_type,Annotation
Shape expressions can then be generated from such a file with the following command:
$ ./shexstatements.sh -ap --skipheader examples/languageap.csv
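One plausible reading of the Mand and Repeat columns is that they act as "mandatory" and "repeatable" flags that together determine a cardinality. The sketch below shows that reading with Python's csv.DictReader. Everything in it is an assumption made for illustration: the helper names, the true/false spelling of the flags, the sample row, and the mapping itself, which may differ from how ShExStatements interprets these columns.

```python
import csv
import io


def is_true(cell):
    """Interpret a flag cell; the accepted spellings are an assumption."""
    return cell.strip().lower() in {"true", "yes", "1"}


def cardinality_from_flags(mandatory, repeatable):
    """Map two boolean flags onto the cardinality notation described earlier."""
    if mandatory and repeatable:
        return "+"    # one or more values
    if mandatory:
        return "1"    # exactly one value
    if repeatable:
        return "*"    # zero or more values
    return "0,1"      # optional, at most one value


header = "Entity_name,Property,Property_label,Mand,Repeat,Value,Value_type,Annotation"
# Hypothetical row for illustration; not copied from examples/languageap.csv.
row = "language,wdt:P17,country,true,true,.,,# spoken in country"

records = list(csv.DictReader(io.StringIO(header + "\n" + row + "\n")))
for record in records:
    symbol = cardinality_from_flags(is_true(record["Mand"]), is_true(record["Repeat"]))
    print(record["Property"], symbol)
```

Reading rows by column name rather than by position keeps the sketch independent of the column order in the header.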
There are example CSV files in the examples folder.