
Introducing MetaPipe

Update: MetaPipe has been renamed GLAMpipe.


I have previously made a tool called Flickr2GWToolset. It is a simple tool for editing the metadata of Flickr images and exporting the data to an XML file for the GLAM-Wiki Toolset. The tool was aimed mainly at GLAM collection metadata. As you can see below, the user interface of Flickr2GWToolset is rather complicated.

The lesson learned from that project was that the hard part of designing this kind of tool is making all the functionality available without scaring the user away. The user interface becomes complicated, and every new feature makes it even more complicated.


Flickr2GWToolset user interface

For me, it seems obvious that extending user interfaces like the one seen above to include more and more functionality is a dead end (or it would require a super-talented designer). Still, even if there were such a designer, one fundamental problem remains.

The remaining problem is that after the metadata has been processed, there is no clue about what was done with the data. The history of actions is not there. If someone asks "what did you do with the metadata?", one can only try to remember the actual steps. What if I could simply show what I did? Or, even better, re-run my process on a different dataset?

At this point programmers raise their hands and say: "Just write scripts and then you can process any number of datasets." That is true. Still, this approach has some problems.

The first one is obvious: how do you write scripts if you are not a programmer? The second problem is re-usability. When someone writes scripts, for example for processing metadata for a Wikimedia Commons upload, the result is often a "hack until it works" script. This means awkward hacks, hardcoded values and no documentation (at least this is what my scripts look like). This makes re-using other programmers' scripts very difficult, and people keep re-inventing the wheel over and over again.

The third problem is more profound. It is related to the origin of the data, and I will deal with it in the next chapter.

Collection metadata vs. machine data

When speaking of tools for (meta)data manipulation, it is important to define what kind of data we are dealing with. I make here a distinction between machine data and collection data.

A server log file (a file that records, for example, which web pages were viewed and when) is machine data. It is produced by a computer, and it has a consistent structure *and* content. You can rely on that structure when you are manipulating the data. If there is a date field, then it contains a date in a certain format, with no exceptions. When processing is needed, a script is created, tested, and finally executed. The data is now processed. There is no need to edit this data by hand at any point. Actually, hand editing would endanger the reliability of the data.

On the contrary, collection data has a "human nature". It is produced by humans over some period of time, and that period can include several changes in the way the data was produced and structured. When this data is made publicly accessible, it usually has a consistent structure, but there can be various inconsistencies in the content's structure and semantics.

For example, an "author" field can contain a name or several names, but it can also contain dates of birth or death, or even descriptions of the authors. Or a "description" field can contain just a few words, or it can include every piece of information about the object that could not be fitted anywhere else in the data structure (and that should be placed somewhere else in an upload).

This kind of data can be an algorithmic nightmare. There are special cases and special cases of special cases, and it would be almost impossible to write an algorithm that could deal with every one of them. Often you can deal with 90 or 99 percent of the cases. For the rest, it might be easiest to just edit the data manually.
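To make this concrete, here is a minimal sketch (not GLAMpipe code; the function name and patterns are invented for illustration) of what handling a messy "author" field might look like: the script covers the common patterns and flags everything else for manual editing rather than guessing.

```javascript
// Hypothetical handler for an inconsistent "author" field.
// Covers two common patterns; anything else is flagged for manual review.
function parseAuthor(raw) {
  // Pattern 1: "Lastname, Firstname (1850-1920)" — a name plus life dates
  const m = raw.match(/^(.+?)\s*\((\d{4})\s*[-–]\s*(\d{4})\)\s*$/);
  if (m) return { name: m[1], born: m[2], died: m[3], manual: false };
  // Pattern 2: a plain name with no digits — accept as-is
  if (!/\d/.test(raw)) return { name: raw.trim(), manual: false };
  // Everything else (free-text descriptions, stray dates…) → manual edit
  return { name: raw.trim(), manual: true };
}
```

The point is not the regexes themselves but the shape of the solution: the algorithm handles the 90–99 percent, and the `manual` flag marks the records that a human still needs to touch.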

When working with this kind of data, it is important to be able to make manual edits during the process, which is difficult when the data is processed with scripts alone.

GLAMpipe by WMFI

GLAMpipe (we are still searching for a better name) relies on the concepts of visual programming and node-based editing in its user interface. Both are based on visual blocks that the user can add, remove and re-arrange. The result is both a visual presentation and an executable program. A node-based user interface is not a new idea, but for some use cases it is a good one. There is a good analysis of node-based (actually flow-based, which is a slightly different thing) interfaces here: Below you can see an example of visual programming with Scratch. Can you figure out what happens when you click the cat?

Visual programming with Scratch. Image by, CC BY-SA 3.0,


GLAMpipe combines visual programming and node-based editing very loosely. The result of using nodes in GLAMpipe is a program that processes data in a certain way. You can think of nodes as modifiers or scripts that are applied to the current dataset.


A simple GLAMpipe project.

Above you can see a screenshot of GLAMpipe showing a simple project. There is a Flickr source node (blue), which brings data into the collection (black). Then there are two transform nodes (brownish). The first one extracts the year from the "datetaken" field and puts the result into a field called "datetaken_year". The second one combines "title" and the extracted year with a comma and saves the result to a new field called "commons_title".

Here is one record after the transform nodes have been executed:

ORIGINAL title: “Famous architect feeding a cat”
ORIGINAL datetaken: 1965-02-01 00:00:00
NEW datetaken_year: 1965
NEW commons_title: “Famous architect feeding a cat, 1965”

Note that the original data remains intact. This means one can re-run transform nodes any number of times. Let's say that you want to have the year in the Commons title inside brackets, like this: "Famous architect feeding a cat (1965)". You can add brackets around "datetaken_year" in the transform node's settings. Then just re-run the node, and the new values are written to the "commons_title" field.
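The two transform nodes above can be sketched roughly as follows (hypothetical code, not GLAMpipe's own implementation; function names and the `sep` setting are invented). The key property is that each transform reads existing fields and writes to a *new* field, leaving the originals untouched, which is what makes re-running safe.

```javascript
// Sketch of the two transforms from the screenshot above.
// Extract the first 4-digit run from "datetaken" into "datetaken_year".
function extractYear(record) {
  const m = record.datetaken.match(/\d{4}/);
  return { ...record, datetaken_year: m ? m[0] : "" };
}

// Combine "title" and the extracted year into "commons_title".
// "sep" plays the role of the node's settings: change it and re-run.
function buildCommonsTitle(record, sep = ", ") {
  return { ...record, commons_title: record.title + sep + record.datetaken_year };
}

const rec = { title: "Famous architect feeding a cat", datetaken: "1965-02-01 00:00:00" };
const out = buildCommonsTitle(extractYear(rec));
// out.commons_title === "Famous architect feeding a cat, 1965"
// out.title and out.datetaken are unchanged
```

Re-running with different settings, for instance `buildCommonsTitle(extractYear(rec), " (")` plus a closing bracket, would overwrite only "commons_title", exactly as described above.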

The nodes have their own parameters and settings. This means that all information about the editing process is packed into the project's node setup. Now, if someone asks "How did you create the Commons titles for your data?", I can share this setup. Even better, one can change the dataset and re-run the nodes by replacing the source node with a new source node with a similar data structure. So if one wants to process some other Flickr image album, this can be done by replacing the source node with one that points to a different album.

However, we are still experimenting with different options for how the UI should work.


Nodes are the building blocks of data manipulation in GLAMpipe. Nodes can import data, transform data, download files, upload content or export content to a certain format.

Some examples of currently available nodes:

  • Flickr source node. It needs a Flickr API key (which is free) and an album id. When executed, it imports the data into the collection.
  • File source node. The node reads data from a file and imports it into the collection. It currently accepts data in CSV or TSV format.
  • Wikitext transform node. This maps your data fields to the Photograph or Map template and writes the wikitext to a new field in the collection.
  • Flickr image URL lookup node. This fetches the URLs for different image sizes from Flickr and writes the info to the collection.
  • Search and replace transform node. This node searches for a string, replaces it, and writes the result back to the collection in a new field.

Below is a screencast of the georeferencer node in use. Georeferencer is a view node. A view node is basically a web page that can fetch and alter data via the GLAMpipe API.

Technically, nodes are JSON files that include several scripts. You can find more in-depth information here:
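As a rough illustration of the idea (the field names and the host loop below are invented for this sketch, not GLAMpipe's actual schema), a node could be a JSON document whose "scripts" entries hold the code the application runs for each record:

```javascript
// Hypothetical node definition in the spirit of "a JSON file with scripts".
const searchReplaceNode = {
  name: "search_and_replace",
  type: "transform",
  params: { in_field: "", search: "", replace: "", out_field: "" },
  scripts: {
    // Run once per record; writes the result to a new field,
    // leaving the original field intact.
    run: "out[params.out_field] = record[params.in_field].split(params.search).join(params.replace);"
  }
};

// A minimal host loop that executes the node's "run" script on each record.
function runNode(node, records, params) {
  return records.map(record => {
    const out = { ...record };
    new Function("record", "out", "params", node.scripts.run)(record, out, params);
    return out;
  });
}
```

Because the whole definition is plain JSON plus script strings, sharing a node (or a whole project setup) is just sharing a file, which is the re-usability point made earlier.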

How to test?

GLAMpipe is a work in progress. Do not expect things to just work.

Installing GLAMpipe currently requires some technical skills. GLAMpipe is server software, but the only way to test it now is to install it on your own computer. There are directions for installation on Linux. Installation on other operating systems should also be possible, but it has not been tested.


My assignment (Ari) in the Wikimaps 2.0 project is to improve the map upload process. This is done by adjusting a separate tool development project by WMFI, called MetaPipe (working title), so that it helps with map handling. The development of the tool is funded by the Finnish Ministry of Education and Culture.

Old maps of Jerusalem released

“Kidron Monuments 1868” by Alexander von Wartensleben-Schwirsen – National Library of Israel, Eran Laor Cartographic Collection. Licensed under CC BY 3.0 via Wikimedia Commons.

The National Library of Israel has released a collection of 200 unique high-resolution maps of Jerusalem in collaboration with Wikimedia Israel. This collection of old maps, spanning from 1486 to 1947, contains a variety of styles and languages.

To celebrate this, Wikimedia Israel will hold an editathon / mapathon in the National Library. It will be a social event to create, update, and improve Wikipedia articles about maps, cartographers, and locations. The editathon will take place on December 15th and will include a guided tour in the rare map collection of the National Library.

Wikimedia Israel encourages the Wikimaps community to use the maps in Wikimedia initiatives and other open source projects and wishes for a wonderful Holiday Season and a Happy New Year!

Public art from streets to the net

DroneArt Helsinki! was an event organized in collaboration with Wikimedia Finland, Maptime Helsinki and AvoinGLAM, where we experimented with bringing public artworks and statues into the open through the Wikimedia sites. At the same time, Wikimedia Sweden's Jan Ainali, John Andersson and André Costa were visiting us. The aim was to photograph statues with drones, place them on a map and model them in 3D.

Little seminar

We put together a great program on presenting public art in Wikimedia. Heikki Kastemaa opened the event by describing how public art is written about in Wikipedia. According to him, there is no established method for it, but articles about public art often contain some or all of the following: a description of the artwork, its provenance, and its reception. Images of the artworks are intended to communicate information about the works.

John Andersson spoke about the Wiki Loves Public Art project that was initiated by Wikimedia Sweden. Thanks to the project, our colleagues in Sweden have been able to gather a public art database covering the whole country. André Costa presented productions that were created around the project.

The database has been turned into a map service. In another project, images from Wiki Loves Monuments and Mapillary have been brought together.

Copyright of public art

Finland and Sweden have basically similar copyright legislation based on pan-European practice. A work of art is protected by copyright for 70 years from the death of the artist. Reproductions of the work, whether images or 3D models, may not be distributed freely. Works placed permanently in public space are an exception, but this exception is handled with slight differences in different EU countries.

The exception is called the freedom of panorama. In Finland, all artworks placed permanently in public space or its vicinity may be photographed freely, but the images must not be used commercially.

In Sweden, the copyright organization BUS has demanded compensation from Wikimedia Sweden for publishing images on the Internet, based on the claim that the database can be used free of charge and that the distribution of the images is large-scale compared to traditional postcard printing. The Wikimedia Sverige and BUS case is in the Supreme Court. (Mikael Mildén, Konstnärer stämmer Wikimedia, Fria Tidningen 26.3.2015)

In addition, the images stored in Wikimedia must comply with the copyright legislation of both the originating country and the United States. Because of this, even more images fall outside Wikimedia Commons.

In the Finnish Wikipedia, images are stored locally, and they can be used in the respective article about the artwork under Finnish copyright law, based on the right to quote.

Representatives of the Wikimedia movement strive to influence the ongoing changes to copyright legislation in Brussels, and one of the most important reforms is the unification of the freedom of panorama. Local representatives are welcome to join this work.

Unleash the drones!

We wanted to try out many different ways to photograph the statues to create the 3D models.

Each statue is photographed from all sides with many images. A drone or a long monopod helps in reaching the heights. All corners of the statue must be covered from different shooting angles.

The weather forecast promised stormy wind and rain. There were clouds in the sky. The drone was tossed around by the wind, but finally we went happily back to our workshop with plenty of images.


As a result of working with different Structure from Motion programs, we were happy to note that the stormy clouds were not the only clouds: we managed to create a point cloud! A point cloud is a computer-generated 3D model that consists of points situated in 3D space. By comparing images taken from different angles, matching points can be found and mapped into 3D space, almost as mystically as surveyors reveal distances.

The point cloud was further elaborated into a polygon mesh, out of which a 3D printer was able to create an object. We shall attach the image here as soon as we get it. The copyright of the work has expired. The artist is Bertel Nilsson, and the work is Eagles, made in 1913.