• July 14th 2016

    Changes to &if Policy

    This post is in reference to recent events on and around the Euphoria &if channel; it’s not really of interest to people who don’t have that context. Some community issues have to be addressed, but I don’t want to expand the circle of anger by supplying a recap. Chances are, if you need to read this, you have context already. This is the first post I’m making on the issue, written in my capacity as &if moderator.

    Going forward, within &if, I’m going to be discouraging unalloyed vitriol towards members of the broader IF community. I don’t want to delegitimize the anger that some might express towards certain figures, but I ask that if you are going to express anger, it be your own anger rather than someone else’s. A pattern has developed where person A says something to the effect of “Bob has said some hurtful and stupid things, and I find that upsetting” and then persons B, C, and D choose to take this as an opportunity to dunk on Bob.

    Ultimately, regardless of my own personal feelings or whether or not Bob actually is an asshole, this has created discomfort and a less than welcoming environment. In trying to manage the various functions that &if performs, I need to identify things (in this case, expressing anger and disgust at certain figures) that threaten to overtake the space, and curtail them.

    I don’t anticipate banning anyone over this, and this is not a request for people to police their own feelings or self-censor; rather, I want to ask people to be thoughtful of the overall tone of the room before joining in on tearing someone apart, even if that person seems to deserve it. I’m not entirely innocent in this, so this is also a necessary shift in my own behavior. Most of all, I ask that users of &if respond to venting or callouts by being supportive of the aggrieved party, not venomous towards the guilty party. Ultimately, I think that situations where someone is publicly frustrated or upset are better served by building that person up than by trying to tear their aggressor down.

    If you feel like you want a space where those expressions of anger are validated (and believe me, I feel you), my DMs are open.

    Also, to reiterate: There is not, and never will be, any change in policy regarding hateful speech; I expect users to be considerate of others’ identities and humanity, and bigotry will never be tolerated in that space.

  • June 29th 2016

    Scraping DBPedia for Fun and Corpora

    One of the main challenges in procedural text generation is obtaining big enough corpora to produce surprising results. Hand-writing corpora is a good approach, but sometimes too time-consuming or unlikely to produce surprising enough results.

    Another common approach is the use of machine learning to make use of unstructured data as a corpus. Markov chains and neural networks have their uses, but they’re not for every application either.

    The third approach (which Emily Short aligns with the principle of Beeswax) is scraping open access data. Wikipedia editors have done a lot of work structuring information about the world, and that data exists in a surprisingly machine-friendly format, assuming one knows how to coax it out.

    Writing ad-hoc web scraping scripts is a valid and useful technique, but there’s a more convenient (well, for a certain value of “convenient”) alternative: SPARQL queries.

    DBPedia is a “semantic web” collation of Wikipedia, joining together Wikipedia’s information into a database of machine-friendly relationships. It uses RDF as a format, which can be queried through SPARQL.

    SPARQL is a query language for RDF databases. For those of you with database experience, this is similar to the much more broadly used SQL language used to manage relational databases. For those of you without database experience (like yours truly), you can rest assured that RDF and SPARQL are totally unlike relational databases or key-value stores, so you’re on the same footing as the MongoDB nerds.

    RDF, or resource description framework, is a format for describing metadata. I realise your eyes are glazing over by now but bear with me. An RDF database, like DBPedia, is a big unordered pile of triples.

    A triple is essentially a statement in the subject-predicate-object form we’re used to from English. However, all three components can be resources, ie web URLs that represent something — in the case of DBPedia, Wikipedia pages or “ontologies” that are used as predicates. The same resource can be the object in one triple and the subject in another, forming a web of interconnected statements which can be searched.
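    As a toy illustration (plain JavaScript, not real RDF tooling, and the resource names below are made up for the example), a triple store is just a flat pile of statements that you search by pattern:

```javascript
// A toy triple store: an unordered pile of subject-predicate-object
// statements. (Resource names here are illustrative, not real DBPedia data.)
const triples = [
  ['dbr:HMS_Plantagenet', 'rdf:type', 'dbo:Ship'],
  ['dbr:HMS_Plantagenet', 'dbo:country', 'dbr:United_Kingdom'],
  ['dbr:United_Kingdom', 'rdf:type', 'dbo:Country']
];

// Find every triple matching a pattern; null acts as a wildcard,
// playing the same role as a variable in a SPARQL query.
function match(s, p, o) {
  return triples.filter(([ts, tp, to]) =>
    (s === null || s === ts) &&
    (p === null || p === tp) &&
    (o === null || o === to));
}

// "Find everything that is a ship", keeping just the subjects:
const ships = match(null, 'rdf:type', 'dbo:Ship').map(([ts]) => ts);
```

    A real query engine does the same kind of matching, just over millions of triples and with clever indexing.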

    A SPARQL query is a series of conditions, such as “find me the names of British sailing ships launched after 1820, with their launch dates”. SPARQL is a language for expressing that.

    SPARQL is also clumsy, not very intuitive even for technologists from outside the database realm, and obscurely documented. So this is my attempt at wresting it out of the hands of dedicated data nerds. I’ll be going step by step until we have a list of British sailing ships launched in the 19th century, in the form of a JSON file that looks like this:

    "name": "HMS Plantagenet",
    "launched": 1801

    Making queries

    You can use dedicated software to talk to SPARQL endpoints (ie, the servers that receive and respond to queries for a database) but DBPedia has a number of web interfaces to endpoints that are very convenient, such as this one.
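    If you’d rather script your queries than paste them into a form, the endpoint also answers plain HTTP requests. A sketch of building such a request (the endpoint URL and parameter names are DBPedia’s public SPARQL interface as I understand it; double-check against the endpoint if it has changed):

```javascript
// Sketch: querying the DBPedia SPARQL endpoint over plain HTTP.
// The query itself and the desired results format are URL parameters.
const query = `
  select distinct ?ship
  where { ?ship rdf:type dbo:Ship }
  limit 10
`;

const url = 'https://dbpedia.org/sparql?' + new URLSearchParams({
  query,
  format: 'application/sparql-results+json'
});

// In a browser or a recent node, something like:
// fetch(url).then(res => res.json()).then(data => { /* data.results.bindings */ });
```

    For this tutorial, though, the web form is all we need.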

    First, a note about prefixes: In reality, every part of a triple is either a resource (ie, a URL) or a literal value (a string, number, or date). But typing out fully qualified URLs by hand gets tiresome fast. As such, SPARQL queries often start with a list of prefixes, shorthand for naming resources in specific domains. The DBPedia web query interface comes with a preloaded list of prefixes, and we’ll mostly be using that.
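    Mechanically, prefix expansion is nothing but string concatenation; a toy version (the prefix URLs below are the standard ones, but verify against the interface’s preloaded list):

```javascript
// Toy prefix expansion: a prefixed name like dbo:Ship is shorthand
// for a full URL, produced by simple concatenation.
const prefixes = {
  dbo: 'http://dbpedia.org/ontology/',
  dbr: 'http://dbpedia.org/resource/',
  rdf: 'http://www.w3.org/1999/02/22-rdf-syntax-ns#'
};

function expand(name) {
  const [prefix, local] = name.split(':');
  return `<${prefixes[prefix]}${local}>`;
}

expand('dbo:Ship'); // '<http://dbpedia.org/ontology/Ship>'
```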

    So when I write dbo:Ship, what that really means is <http://dbpedia.org/ontology/Ship>; when using a literal URL in SPARQL, we enclose it in angle brackets. Note that these names are case-sensitive, even though the keywords in the SPARQL language themselves aren’t. So let’s start with a simple query:

    select distinct ?ship
    where { ?ship rdf:type dbo:Ship }

    This will get us a long list of every ship in Wikipedia, which unfortunately also includes things such as ship classes — so you’ll find specific U-Boats listed alongside models of U-Boats. Wikipedia’s data is often messy and noisy, and going one step at a time helps in not missing anything as you filter data.

    Let’s go over this line by line, since SPARQL is probably unfamiliar even to programmers.

    select distinct ?ship

    This first select statement tells the database what we are looking for, that is, the columns in the table we’ll get as a result. For now, we’re looking only at ships; eventually we’ll want to connect ship names to ship launch dates. That isn’t as simple as finding a list of triples. It’s essentially finding a list of paths through the database that satisfy the query, because the name (a literal value) and the date (another literal value) are not in fact directly connected to each other; they are the objects of two different predicates with the same subject, the resource for a given ship.

    ?ship is a variable; variables in SPARQL are prefixed with ?, because the W3C designed this thing therefore using a character that was already in common use as a variable sigil was out of the question.

    where { ?ship rdf:type dbo:Ship }

    The where statement contains a list of conditions that have to be fulfilled for a valid path to be found. This one simply states that we’re looking for ?ship where every possible value of ?ship relates to dbo:Ship via the rdf:type predicate.

    rdf:type is a commonly-used predicate used to mean “is a”; dbo:Ship is an ontology, one of many objects created in DBPedia for the purpose of acting as categories. I’ll talk about how to figure out what resources to reference at the end of this tutorial.

    We can add another column to our table:

    select distinct ?ship ?propulsion
    where {
    ?ship rdf:type dbo:Ship .
    ?ship dbp:shipPropulsion ?propulsion
    }
    Note the . used as a separator between statements. This won’t refine the search, but it’ll give us a table of ships with their propulsion methods. This is useful for finding out how that’s specified in the data. Looking over the entries, we find that both “Sail” and “Sails” are often used to denote a sailing vessel. We don’t need our corpora to be perfectly comprehensive (Wikipedia scraping won’t get you that anyway), so let’s just treat either value as our qualification.

    select distinct ?ship
    where {
    ?ship rdf:type dbo:Ship .
    ?ship dbp:shipPropulsion "Sails"@en
    }
    "Sails"@en is a string literal. Strings in RDF come with a specified language, so just Sails wouldn’t match; we need the language tag (@en) in there. This is only half the equation, though; Sails” isn’t Sail”; curse Wikipedia editors for their inconsistency.

    Here’s how we look up both together:

    select distinct ?ship
    where {
    ?ship rdf:type dbo:Ship .
    { ?ship dbp:shipPropulsion "Sails"@en } union
    { ?ship dbp:shipPropulsion "Sail"@en }
    }
    union is a SPARQL operator. It means a set union, of course, and it’s infix, because why would the syntax make sense. This gets us all the sailing ships, at last.

    By looking at the data, we can find the right names to use in order to further select only British ships:

    select distinct ?ship
    where {
    ?ship rdf:type dbo:Ship .
    { ?ship dbp:shipPropulsion "Sails"@en } union
    { ?ship dbp:shipPropulsion "Sail"@en } .
    ?ship dbo:country dbr:United_Kingdom_of_Great_Britain_and_Ireland
    }
    Finally, we want to know when those ships were launched, and filter out the ones that were launched before or after the 19th century:

    select distinct ?ship ?launched
    where {
    ?ship rdf:type dbo:Ship .
    { ?ship dbp:shipPropulsion "Sails"@en } union
    { ?ship dbp:shipPropulsion "Sail"@en } .
    ?ship dbo:country dbr:United_Kingdom_of_Great_Britain_and_Ireland .
    ?ship dbo:shipLaunch ?launched .
    filter (
    ?launched > xsd:dateTime('1820-1-1') &&
    ?launched < xsd:dateTime('1900-1-1')
    )
    }
    Note how we can have two variables in a single statement: ?ship dbo:shipLaunch ?launched. This lets us traverse the network of triples, going arbitrarily far across the relationships; it’s possible to ask elaborate questions such as “football players under 25 who play for countries that took part in WWII”, because we can chain indirect relationships like that.

    The contents of the filter statement should make sense to people with some programming familiarity; the one notable thing is that to write out a date literal, we use a function to create it from a string. Simply writing “1820-1-1” wouldn’t work.

    Now we have a table of ships (that is, web resources representing ships) and their launch dates. But we want a table of ships’ names and their launch dates, information that we can actually use. For neatness’ sake, we’ll also sort the results by date:

    select distinct ?ship ?name ?launched
    where {
    ?ship rdf:type dbo:Ship .
    { ?ship dbp:shipPropulsion "Sails"@en } union
    { ?ship dbp:shipPropulsion "Sail"@en } .
    ?ship dbo:country dbr:United_Kingdom_of_Great_Britain_and_Ireland .
    ?ship dbo:shipLaunch ?launched .
    filter (
    ?launched > xsd:dateTime('1820-1-1') &&
    ?launched < xsd:dateTime('1900-1-1')
    ) .
    ?ship dbp:shipName ?name
    }
    order by asc(?launched)

    asc means ascending, of course. At this point, we can change the “results format” setting on the web interface to JSON and download a nice machine-readable JSON file.
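    What you get follows the W3C SPARQL query results JSON format: the column names live under head.vars, and each result row is a “binding” object that wraps every cell in type/value metadata. Roughly like this (the row values here are illustrative):

```javascript
// The shape of the downloaded JSON (abridged): each row in
// results.bindings wraps its cells in { type, value } objects.
const results = {
  head: { vars: ['ship', 'name', 'launched'] },
  results: {
    bindings: [
      {
        ship: { type: 'uri', value: 'http://dbpedia.org/resource/HMS_Plantagenet' },
        name: { type: 'literal', value: 'HMS Plantagenet' },
        launched: { type: 'typed-literal', value: '1801-01-01' }
      }
    ]
  }
};

// Pulling out just the usable values:
const rows = results.results.bindings.map(b => ({
  name: b.name.value,
  launched: b.launched.value.split('-')[0]
}));
```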

    The JSON includes a lot of metadata we don’t need, but it’s easy to clean that up with a simple script. You can use whatever tool you like for this; I wrote a dirty ES6 script that runs on babel-node:

    import jetpack from 'fs-jetpack'

    // The SPARQL results wrap each row's cells in metadata objects,
    // so we dig into results.bindings and keep only the values we want.
    const ships = jetpack.read('ships.json', 'json')
    .results.bindings
    .map(entry => ({
    name: entry.name.value,
    launched: entry.launched.value.split('-')[0]
    }))

    jetpack.write('ships_clean.json', ships)
    You can see the final result in this gist.

    Finding Resources

    Here’s the problem with SPARQL: Even if you know the syntax and semantics of it, you don’t necessarily know what resources to use in queries, which is to say the right names to express the relationships you want to search for.

    So far, the best way I’ve found of figuring this out is the DBPedia faceted browser. With it, you can search for the DBPedia resources that are counterparts to Wikipedia pages, and see how their relationships are structured and what predicates are used. For instance, when I started writing this example, I first looked at the page for the HMS Trafalgar, which is where I found out how the different relationships are structured in the data: that dbo:country is used to express country of origin, for instance, and that ships relate to dbo:Ship via rdf:type. Some experimentation is required to get useful queries, and I’m still figuring out myself how to best use this tool.
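    You can also do this exploration without leaving the query interface, by asking for every triple that has a given resource as its subject. Something like this (the exact resource URL is a guess on my part; Wikipedia page names for ships often carry a disambiguating year, so adjust as needed):

```
select distinct ?predicate ?object
where { <http://dbpedia.org/resource/HMS_Trafalgar_(1820)> ?predicate ?object }
```

    The resulting table is essentially a raw dump of everything DBPedia knows about that resource, predicates included, which is exactly what you need to write more specific queries.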

    Now go out there and make some twitter bots.

  • February 17th 2016


    Yesterday, on &if, someone asked whether we were attracted to IF because of its status as "outsider art."

    I don't really want to define outsider art, or get into the discussion over whether IF qualifies. But I responded that I felt I was attracted to IF because it's unsettled.

    And then I had to go and write a post about what, exactly, I mean by that.

  • February 15th 2016

    The Future of Raconteur

    I'm not really ready for a release of this just yet -- it'll be a while, probably at least a week -- but I wanted to give people an update on where I'm at with Raconteur. Here's the current (rough) roadmap.

  • January 27th 2016

    Improv, a javascript library for generative text

    I’m currently working on a project involving some fairly demanding procedural generation of text. While that project isn’t ready to be announced yet, one of the first core pieces of functionality I wrote for it was a text-generating library. Said library had to be powerful, flexible, and fulfil the following needs:

    • Like Tracery, it needs to randomly choose text from nested webs of corpora, recursing itself.
    • Also like Tracery, it needs some basic templating functionality.
    • Unlike Tracery, it needs to run with the backing of a world model that can guide text generation.

    Most of the ideas used to build this initial version of the tool were taken from Emily Short’s Annals of the Parrigues, which contains a long and extremely useful discussion of generative text in its epilogue.

    Since this library is a separate module, I’ve decided to open source it. Improv has been released under the MIT license and can be viewed on Github. It’s an npm module, but it’s built so that it will work in a browser environment using a module-bundling tool like webpack or browserify. Improv is currently in active development, but the latest (0.4.2) version is one I consider to be reasonably usable.

    Assuming you have node (v4 or newer), npm, and gulp installed, you can see a demo of Improv in action by doing:

    $ git clone https://github.com/sequitur/improv.git
    $ cd improv
    $ npm install
    $ gulp demo
    $ node demo_build/hms.js

    This demo produces descriptions of fictional ships, along the lines of:

    The HMS Reliable is a clipper commissioned 6 years ago.

    Using a whale oil engine, she can reach speeds upwards of 32 knots. The Reliable is one of the new generation of vessels built to fight against the Arkodian fleet in the Short War. Her crew is known to be one of the more disciplined in the Navy. She is currently serving as a colonial troop transport.

    The most obvious place to play around with Improv, at first, is Raconteur projects, since those are already friendly to including npm modules. NanoGenMo and ProcJam are some time away, but I look forward to seeing what people do with this tool in the meantime. Bug reports and pull requests are welcome.

  • January 6th 2016

    Impressions: What Fuwa Bansaku Found

    What Fuwa Bansaku Found (Chandler Groover), released today through sub-Q magazine, is a free-verse ghost story set in an abandoned shrine in Sengoku Japan.

  • December 31st 2015

    2015 in Review: Thank-yous and shout-outs

    As the year draws to a close I have been busy sending thank-you notes (well, emails). This list is in no particular order and, inevitably, incomplete; if you feel like I have missed you, I am sorry.

    • My testers, too numerous to name but incredibly important; any remaining bugs and typos are entirely my fault.
    • Tory Hoke, Devi Acharya, Kerstin Hall, and the rest of the sub-Q team: You’ve made a dramatic change to how I look at writing IF. This has been an incredible year, and sub-Q is responsible for a lot of that.
    • Carolyn van Eseltine, Aaron Reed, Neil Butters, and Jason McIntosh: That is, the people whose competitions I entered this year. People consistently underestimate how much work organising those events is, and the least I can do is thank the people who inexplicably continue to do it, expecting no reward whatsoever.
    • The good people at &if, including furkle, Brendan Patrick Hennessy, Chandler Groover, Emily Short and others, who’ve made the last couple of months a terrifying delight. There’s a million things we haven’t done; but just you wait.
    • greenie, chromakode, and intortus, the Euphoria crew, for giving me this wonderful space to do terrible things with.
    • Last but not least: Cat Manning. You know what you did (and continue to do).
  • December 30th 2015

    2015 in Review: Things to highlight

    Following on from my previous post, a look at some of the IF and IF-adjacent games that came out this year which I thought were important enough to bring up.

  • December 29th 2015

    2015 in Review: My Work

    I was originally going to release this year-end rundown all in one piece, but I realised it's very long and therefore probably best split into three parts. First, the most skippable part: A look back at games and stories I released this year.

  • December 24th 2015

    The World Turned Upside Down

    The World Turned Upside Down is a tiny bit of parser fiction I wrote as a sort of thank you note/Christmas special. It’s very short and straightforward, so I’ll just direct you to the game page where you can play or download it.

    Happy holidays, everyone.