Prepared Database
From SONIVIS:Wiki
On this page, we provide some preprocessed databases.
Beginning an Analysis with Wikiversity
About the Wikiversity project
Wikiversity is a project of the Wikimedia Foundation. Its objective is "to further the discovery and distribution of knowledge in a very natural way, by helping each other to learn." You can find further information on the German or English websites.
If you have loaded the German Wikiversity data, the following categories are among the more interesting ones for a WikiLink network, as they contain more than 50 nodes (articles):
- Vorlage:Babel-Sprache (55 nodes)
- Vorlage:Projektarbeit (60 nodes)
- Projekt:Mathematik_ist_überall/Medien (68 nodes)
- Fachbereich_Mathematik (73 nodes)
- Schulprojekt:Hallo_Rohstoff!/Miniwikipedia (73 nodes)
- Wikiversity (85 nodes)
- Kurs (109 nodes)
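To find categories of a similar size in your own copy of the data, you can count member pages per category directly in the database. The following is a minimal sketch, assuming the data set follows the standard MediaWiki `categorylinks` table layout (`cl_from` = page_id of the member page, `cl_to` = category name with underscores). The function name `large_categories` is hypothetical, and an in-memory SQLite table with made-up rows stands in for the real MySQL dump so the example runs on its own.

```python
import sqlite3


def large_categories(conn, min_nodes=50):
    """Return (category, page_count) pairs for categories with more than
    min_nodes distinct member pages, smallest category first.

    Assumes the MediaWiki `categorylinks` table: cl_from is the page_id of a
    member page, cl_to is the category name without the "Category:" prefix.
    """
    rows = conn.execute(
        """
        SELECT cl_to, COUNT(DISTINCT cl_from) AS n
        FROM categorylinks
        GROUP BY cl_to
        HAVING COUNT(DISTINCT cl_from) > ?
        ORDER BY n, cl_to
        """,
        (min_nodes,),
    )
    return rows.fetchall()


# Hypothetical toy rows standing in for a real dump (not actual Wikiversity data).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE categorylinks (cl_from INTEGER, cl_to TEXT)")
conn.executemany(
    "INSERT INTO categorylinks VALUES (?, ?)",
    [(1, "Kurs"), (2, "Kurs"), (3, "Kurs"), (1, "Wikiversity"), (4, "Wikiversity")],
)
print(large_categories(conn, min_nodes=2))  # [('Kurs', 3)]
```

Against the MySQL database you would pass a MySQL connection instead; drivers such as PyMySQL use `%s` rather than `?` as the placeholder.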
For interesting collaboration networks, try category "Fach:Physik" or namespace 106.
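A collaboration network links two authors when they have edited the same page. The sketch below shows one way to derive such edges for a single namespace (for example 106). It assumes the older MediaWiki `page` (`page_id`, `page_namespace`) and `revision` (`rev_page`, `rev_user_text`) columns; newer MediaWiki releases have reorganized some of these columns, and SONIVIS may build its collaboration network differently. The function name and the toy rows are hypothetical.

```python
import sqlite3
from collections import Counter
from itertools import combinations


def collaboration_edges(conn, namespace):
    """Return a Counter mapping (user_a, user_b) to the number of pages in
    `namespace` that both users edited. Assumes pre-actor-table MediaWiki
    columns (revision.rev_user_text)."""
    rows = conn.execute(
        """
        SELECT DISTINCT r.rev_page, r.rev_user_text
        FROM revision AS r
        JOIN page AS p ON p.page_id = r.rev_page
        WHERE p.page_namespace = ?
        """,
        (namespace,),
    )
    # Collect the set of distinct authors for every page.
    authors_per_page = {}
    for page_id, user in rows:
        authors_per_page.setdefault(page_id, set()).add(user)
    # Every pair of co-authors on a page gets one unit of edge weight.
    edges = Counter()
    for authors in authors_per_page.values():
        for a, b in combinations(sorted(authors), 2):
            edges[(a, b)] += 1
    return edges


# Hypothetical toy data: pages 10 and 11 are in namespace 106, page 12 is not.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE page (page_id INTEGER, page_namespace INTEGER)")
conn.execute("CREATE TABLE revision (rev_page INTEGER, rev_user_text TEXT)")
conn.executemany("INSERT INTO page VALUES (?, ?)", [(10, 106), (11, 106), (12, 0)])
conn.executemany(
    "INSERT INTO revision VALUES (?, ?)",
    [(10, "Alice"), (10, "Bob"), (10, "Alice"),
     (11, "Alice"), (11, "Bob"), (11, "Carol"),
     (12, "Alice"), (12, "Dave")],
)
print(collaboration_edges(conn, 106))
```

Weighting each edge by the number of shared pages is one design choice; a simple unweighted network would only keep the keys of the Counter.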
Size-Reduced version of Wikiversity
If this is your first time using SONIVIS, please use this database.
Because the network and text analysis algorithms implemented in SONIVIS can be very time-intensive on large networks, you should use a small-scale network for your first steps with the tool.
For this purpose, we provide size-reduced versions of the precalculated Wikiversity databases. Download:
- the size-reduced German Wikiversity or
- the size-reduced English Wikiversity.
An explanation of how to use these databases can be found here.
Full Wikiversity data sets
If you are an experienced user, please use these databases.
If you have a concrete idea of what to analyze in a wiki, carry out your analysis on a real data set.
We have prepared two complete data sets with unmodified data of the Wikiversity project:
Please note: Both databases need a very long processing time if you calculate all metrics at once. Before beginning an analysis, activate only the metrics you need in the Metrics Preferences dialog. Additionally, limit your analysis by filtering the network by category. This can save you a lot of time.
(By the way, there is a very helpful article on MediaWiki that describes standard use cases for querying a MediaWiki database.)
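Filtering by category can also be done before the data reaches the tool, by extracting only the WikiLinks whose source and target both belong to one category. This is a minimal sketch, assuming the older MediaWiki `page` (`page_id`, `page_namespace`, `page_title`), `pagelinks` (`pl_from`, `pl_namespace`, `pl_title`) and `categorylinks` (`cl_from`, `cl_to`) layout; newer MediaWiki releases store link targets differently. The function name and toy rows are hypothetical, and SQLite stands in for the MySQL database.

```python
import sqlite3


def category_wikilinks(conn, category):
    """Return (source_title, target_title) WikiLink edges between pages that
    are both members of `category`, sorted by source then target title.
    Links to non-existent pages are dropped by the join on `page`."""
    rows = conn.execute(
        """
        SELECT src.page_title, dst.page_title
        FROM pagelinks AS pl
        JOIN page AS src ON src.page_id = pl.pl_from
        JOIN page AS dst ON dst.page_namespace = pl.pl_namespace
                        AND dst.page_title = pl.pl_title
        JOIN categorylinks AS c1 ON c1.cl_from = src.page_id AND c1.cl_to = ?
        JOIN categorylinks AS c2 ON c2.cl_from = dst.page_id AND c2.cl_to = ?
        ORDER BY src.page_title, dst.page_title
        """,
        (category, category),
    )
    return rows.fetchall()


# Hypothetical toy data, not taken from the Wikiversity dumps.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE page (page_id INTEGER, page_namespace INTEGER, page_title TEXT)")
conn.execute("CREATE TABLE pagelinks (pl_from INTEGER, pl_namespace INTEGER, pl_title TEXT)")
conn.execute("CREATE TABLE categorylinks (cl_from INTEGER, cl_to TEXT)")
conn.executemany(
    "INSERT INTO page VALUES (?, ?, ?)",
    [(1, 0, "Analysis"), (2, 0, "Algebra"), (3, 0, "Chemie")],
)
conn.executemany(
    "INSERT INTO categorylinks VALUES (?, ?)",
    [(1, "Fachbereich_Mathematik"), (2, "Fachbereich_Mathematik"), (3, "Kurs")],
)
conn.executemany(
    "INSERT INTO pagelinks VALUES (?, ?, ?)",
    [(1, 0, "Algebra"), (2, 0, "Analysis"), (1, 0, "Chemie"), (2, 0, "Missing_Page")],
)
print(category_wikilinks(conn, "Fachbereich_Mathematik"))
# [('Algebra', 'Analysis'), ('Analysis', 'Algebra')]
```

Restricting both endpoints to the category keeps the resulting network small; dropping the second category join would instead keep all outgoing links of the category's pages.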
Further datasets
Enron
- Enron
- Enron March 2, 2004 Version of dataset
- A subset of about 1700 labeled email messages (4.5M) from UC Berkeley Enron Email Analysis
- A set of categories from UC Berkeley Enron Email Analysis
- Prepared MySQL data set of the emails
Articles
- A paper describing the Enron data was presented at the 2004 CEAS conference
- Document Classification on Enron Email Dataset
Weblogs
Misc
- Openbeacon Project: RFID-User-Tracking at 24C3
- Data sets SONIA-Project
- StudiVZ statistics
- Sputnik Project at 23C3
- MIT Reality project (100 mobile phone users over one year)

