Prepared Database

From SONIVIS:Wiki


On this page, we provide some preprocessed wiki databases and give an overview of further available data sets.


Wikiversity based data sets

Data set for tool version 0.9

We are currently working on structural changes to the data model. As soon as this work is finished, we will publish new data sets for version 0.9 (SVN trunk).

For a first test, you can use our read-only JUnit test database:

  • server: sonivis.org
  • schema: wikiextract
  • username: wikiextract
  • password: s#wiki%@HFC
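If you connect from your own code using a URL-style connection string, keep in mind that the password contains `#`, `%` and `@`. All three have a special meaning in URLs and must be percent-encoded. The following is a minimal sketch. The SQLAlchemy-style driver prefix `mysql+pymysql` and MySQL's default port 3306 are assumptions that this page does not state.

```python
from urllib.parse import quote

# Parameters of the read-only JUnit test database listed above.
HOST = "sonivis.org"
SCHEMA = "wikiextract"
USER = "wikiextract"
PASSWORD = "s#wiki%@HFC"


def connection_url(driver="mysql+pymysql", host=HOST, port=3306,
                   schema=SCHEMA, user=USER, password=PASSWORD):
    """Build a URL-style connection string with percent-encoded credentials.

    Assumption: driver prefix and port 3306 (MySQL default) are illustrative.
    safe="" makes quote() encode every reserved character, including '@'.
    """
    return (f"{driver}://{quote(user, safe='')}:{quote(password, safe='')}"
            f"@{host}:{port}/{schema}")


print(connection_url())
# mysql+pymysql://wikiextract:s%23wiki%25%40HFC@sonivis.org:3306/wikiextract
```

Without the encoding, the `@` inside the password would be read as the separator between the credentials and the host name.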

Data set for tool version 0.8

About the Wikiversity project

Wikiversity is a project of the Wikimedia Foundation. Its objective is "to further the discovery and distribution of knowledge in a very natural way, by helping each other to learn". You can find further information on the German or English websites.

If you have loaded the German Wikiversity data, the following categories are among the more interesting ones for a WikiLink network, because they contain more than 50 nodes (articles):

  1. Vorlage:Babel-Sprache (55 nodes)
  2. Vorlage:Projektarbeit (60 nodes)
  3. Projekt:Mathematik_ist_überall/Medien (68 nodes)
  4. Fachbereich_Mathematik (73 nodes)
  5. Schulprojekt:Hallo_Rohstoff!/Miniwikipedia (73 nodes)
  6. Wikiversity (85 nodes)
  7. Kurs (109 nodes)
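You can check which categories in your own loaded data pass the 50-page threshold by counting category members directly in the database. The following is a minimal sketch. It assumes the data sits in the standard MediaWiki `categorylinks` table (`cl_from` = ID of the member page, `cl_to` = category name without namespace prefix) rather than in the SONIVIS data model. To run on its own, it uses an in-memory SQLite database with toy rows. With a MySQL driver, the placeholder is typically `%s` instead of `?`.

```python
import sqlite3


def large_categories(conn, min_pages=50):
    """Return (category, page_count) pairs for categories with more than
    min_pages member pages, largest first.

    Assumes the MediaWiki table categorylinks(cl_from, cl_to).
    """
    query = """
        SELECT cl_to, COUNT(DISTINCT cl_from) AS n
        FROM categorylinks
        GROUP BY cl_to
        HAVING COUNT(DISTINCT cl_from) > ?
        ORDER BY n DESC, cl_to
    """
    return conn.execute(query, (min_pages,)).fetchall()


# Toy data only; the real Wikiversity data lives in MySQL.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE categorylinks (cl_from INTEGER, cl_to TEXT)")
conn.executemany("INSERT INTO categorylinks VALUES (?, ?)",
                 [(i, "Kurs") for i in range(3)] + [(10, "Wikiversity")])
print(large_categories(conn, min_pages=1))  # [('Kurs', 3)]
```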

For interesting collaboration networks, try category "Fach:Physik" or namespace 106.
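SONIVIS builds these networks itself. The sketch below only illustrates one common way to derive a collaboration network from edit data: projecting the page-author graph onto authors. It assumes the input is a list of (page, author) pairs taken from the revision history, already restricted to, for example, Fach:Physik or namespace 106. This is not necessarily how SONIVIS defines its collaboration networks.

```python
from collections import Counter
from itertools import combinations


def collaboration_edges(revisions):
    """Project (page, author) edit records onto an author-author network.

    Two authors are linked if they edited at least one common page;
    the edge weight is the number of such shared pages.
    """
    authors_per_page = {}
    for page, author in revisions:
        # A set, so repeated edits by the same author count once per page.
        authors_per_page.setdefault(page, set()).add(author)

    weights = Counter()
    for authors in authors_per_page.values():
        # sorted() turns each undirected edge into a canonical (a, b) with a < b.
        for a, b in combinations(sorted(authors), 2):
            weights[(a, b)] += 1
    return dict(weights)


revs = [("Physik", "Anna"), ("Physik", "Ben"), ("Optik", "Anna"),
        ("Optik", "Ben"), ("Optik", "Carl"), ("Physik", "Anna")]
print(collaboration_edges(revs))
# {('Anna', 'Ben'): 2, ('Anna', 'Carl'): 1, ('Ben', 'Carl'): 1}
```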

Size-Reduced version of Wikiversity

If this is your first time using SONIVIS, please use this database.

Because the network and text analysis algorithms implemented in SONIVIS can be very time-consuming on large networks, you should use a small-scale network for your first steps with the tool.

For this purpose, we provide a size-reduced version of the precalculated Wikiversity databases. Download:

An explanation of how to use these databases can be found here, and instructions for installing the database directly with MySQL here.
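A dump can be loaded with the mysql command-line client, as in the shell idiom `mysql -u user -p database < dump.sql`. The sketch below does the same from Python. The function names, user, database and file name are hypothetical, and the target database is assumed to exist already (created with `CREATE DATABASE`).

```python
import subprocess


def import_command(user, database, charset="utf8"):
    """Argument list for the mysql client; -p makes it prompt for the password."""
    return ["mysql", "-u", user, "-p",
            f"--default-character-set={charset}", database]


def import_dump(dump_path, user, database):
    """Feed an SQL dump to mysql on stdin, like `mysql ... database < dump.sql`.

    check=True raises CalledProcessError if mysql exits with an error.
    """
    with open(dump_path, "rb") as dump:
        subprocess.run(import_command(user, database), stdin=dump, check=True)


# Hypothetical usage:
# import_dump("wikiversity_small.sql", "root", "wikiversity")
```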

Full Wikiversity data sets

If you are an experienced user, please use these databases.

If you have a concrete idea of what to analyze in a wiki, start carrying out your analysis on a real data set.

We have prepared two complete data sets with unmodified data of the Wikiversity project:

Please note: Both databases need a very long processing time if you calculate all metrics at once. Before beginning an analysis, enable only the metrics you need in the Metrics Preferences dialog. Additionally, limit your analyses by filtering the network by category. This can save you a lot of time.
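Filtering by category saves time because most network metrics scale with the number of nodes and edges. For example, exact betweenness centrality with Brandes' algorithm takes O(nm) time on an unweighted graph. The sketch below shows the idea of such a filter as an induced subgraph: only links whose source and target both belong to the category are kept. The input format (pairs of page titles) is an assumption, and this is not SONIVIS's own filter implementation.

```python
def filter_by_category(edges, category_members):
    """Return the WikiLink edges of the subgraph induced by a category.

    edges: iterable of (source_page, target_page) pairs
    category_members: page titles belonging to the chosen category
    """
    members = set(category_members)  # O(1) membership tests
    return [(s, t) for s, t in edges if s in members and t in members]


links = [("A", "B"), ("B", "C"), ("C", "D")]
print(filter_by_category(links, {"A", "B", "C"}))  # [('A', 'B'), ('B', 'C')]
```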

(There is also a very helpful article on the MediaWiki website that describes standard use cases for querying a MediaWiki database.)

Wikipedia based data sets

We have the whole Wikipedia (English version) on our server. If you need this data for your research, just contact us to get access.


How to define your own data set using DBpedia

DBpedia "is a community effort to extract structured information from Wikipedia and to make this information available on the Web." With SPARQL queries, you can generate lists of articles that can then be extracted by the MediaWiki API extractor (de.sonivis.tool.mwapiconnector.extractors.ApiExtractor). For example, to extract all scientists who were born or died after 01.01.1900:

PREFIX owl: <http://www.w3.org/2002/07/owl#>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
PREFIX dc: <http://purl.org/dc/elements/1.1/>
PREFIX : <http://dbpedia.org/resource/>
PREFIX dbpedia2: <http://dbpedia.org/property/>
PREFIX dbpedia: <http://dbpedia.org/>
PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
PREFIX onto: <http://dbpedia.org/ontology/>

SELECT ?page ?birth ?death ?person WHERE {
     ?person onto:birthdate ?birth .
     ?person foaf:page ?page .
     ?person onto:deathdate ?death .
     ?person dbpedia2:wikiPageUsesTemplate <http://dbpedia.org/resource/Template:infobox_scientist> .
     FILTER (?birth > "1900-01-01"^^xsd:date || ?death > "1900-01-01"^^xsd:date)
}
ORDER BY ?page

Please note: You have to bind foaf:page to the variable ?page; otherwise the extraction will not work.
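To collect the ?page values without a browser, you can send the query to a SPARQL endpoint and read the standard SPARQL 1.1 JSON results format. The following is a minimal sketch. It assumes the public endpoint https://dbpedia.org/sparql is available and honours the `query` parameter and `Accept` header of the SPARQL 1.1 Protocol. The helper names are illustrative, and whether the ApiExtractor expects page URLs or titles is not covered here.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request

ENDPOINT = "https://dbpedia.org/sparql"  # assumption: public DBpedia endpoint


def build_request(query, endpoint=ENDPOINT):
    """GET request asking the endpoint for SPARQL JSON results."""
    url = endpoint + "?" + urlencode({"query": query})
    return Request(url, headers={"Accept": "application/sparql-results+json"})


def page_values(results_json):
    """Extract the ?page bindings from a SPARQL 1.1 JSON result document.

    Rows without a ?page binding are skipped.
    """
    data = json.loads(results_json)
    return [row["page"]["value"]
            for row in data["results"]["bindings"] if "page" in row]


# Usage (network access required):
# from urllib.request import urlopen
# with urlopen(build_request(query_text)) as resp:
#     pages = page_values(resp.read().decode("utf-8"))
```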

How to define your own data set using the Wikipedia "Export pages" function

A very easy way to export a predefined set of Wikipedia pages is the special page "Export pages" (Special:Export) in Wikipedia:

Additional parameters can be defined. An overview of these parameters is given here:
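For a short, predefined list of pages, you can also request each page's XML export directly through the Special:Export/<title> URL form. The following is a minimal sketch. The helper name and the default wiki address (English Wikipedia) are illustrative. When downloading, Wikimedia sites expect a descriptive User-Agent header.

```python
from urllib.parse import quote


def export_url(title, wiki="https://en.wikipedia.org"):
    """URL of the Special:Export XML for a single page title.

    Spaces become underscores (MediaWiki title convention); other
    characters are percent-encoded as UTF-8, keeping '/' and ':' intact.
    """
    return f"{wiki}/wiki/Special:Export/{quote(title.replace(' ', '_'), safe='/:')}"


for t in ["Albert Einstein", "Schrödinger equation"]:
    print(export_url(t))
# https://en.wikipedia.org/wiki/Special:Export/Albert_Einstein
# https://en.wikipedia.org/wiki/Special:Export/Schr%C3%B6dinger_equation
```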


MWApiConnector Extracts (applicable to tool version 0.9)

The following items were extracted with the MWAPIExtractor included in the SONIVIS software, and the data were transformed to fit our data model. They are intended for testing purposes.

  • Latest extract of a locally installed de.wikiversity.org dump
    • The extracted source dump is dated 2008-06-13.
    • Dump loaded into MySQL 5.0.75 (on Ubuntu).
    • Extracted from the API of a local MediaWiki 1.13.3 installation (without any extensions).
    • The registered users were extracted from the API of the original de.wikiversity.org on 2009-07-05, before the local installation was extracted, because dumps in general do not contain any users.
    • The Media:DeWikiversity_local_20090806.7z archive
      • has md5sum e7d560458e1e685470e8b8f04e4a0210,
      • contains the database dump file DeWikiversity_local_20090806.sql (size: 2.3 GByte, md5sum: 3aff7591d9b449ef9a8441539f7b4e7f) of the extract, and
      • includes the extraction log file DeWikiversity_local_20090806.log
    • Known issues are:
      • The process produced several errors as can be seen from the log. These should mainly be due to missing extensions in the local wiki installation.
      • If you find anything else, please contact us right away.
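Before unpacking, you can check the download against the md5sum listed above. This does the same job as `md5sum` on the command line. The following is a minimal sketch; the function names and the local file path are hypothetical.

```python
import hashlib
from typing import BinaryIO

# Checksum published above for the Media:DeWikiversity_local_20090806.7z archive.
EXPECTED_ARCHIVE_MD5 = "e7d560458e1e685470e8b8f04e4a0210"


def md5_of_stream(stream: BinaryIO, chunk_size: int = 1 << 20) -> str:
    """MD5 hex digest of a binary stream, read in 1 MiB chunks so that large
    files (the SQL dump in the archive is 2.3 GByte) are never held in
    memory at once."""
    digest = hashlib.md5()
    for chunk in iter(lambda: stream.read(chunk_size), b""):
        digest.update(chunk)
    return digest.hexdigest()


def verify_file(path: str, expected_md5: str) -> bool:
    """True if the file at path has the expected MD5 checksum."""
    with open(path, "rb") as f:
        return md5_of_stream(f) == expected_md5.lower()


# Hypothetical usage:
# verify_file("DeWikiversity_local_20090806.7z", EXPECTED_ARCHIVE_MD5)
```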


General hints

How to export a database

mysqldump -u {User} -p --opt --default-character-set=utf8 {database} > {database}.sql
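If you want to script the export, the same mysqldump call can be issued from Python. In the sketch below, `-p` makes mysqldump prompt for the password, and the output goes to {database}.sql just as the shell redirection does. The function names are hypothetical. If mysqldump fails, a partial file may remain, so check the exit status, which `check=True` turns into an exception.

```python
import subprocess


def dump_command(user, database, charset="utf8"):
    """Argument list for mysqldump with the options shown above, plus -p."""
    return ["mysqldump", "-u", user, "-p", "--opt",
            f"--default-character-set={charset}", database]


def export_database(user, database):
    """Write {database}.sql, like the shell redirection `> {database}.sql`.

    check=True raises CalledProcessError if mysqldump exits with an error.
    """
    with open(f"{database}.sql", "wb") as out:
        subprocess.run(dump_command(user, database), stdout=out, check=True)


# Hypothetical usage:
# export_database("root", "wikiversity")
```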