Saturday, May 27, 2006

Proposal for a Guided Book Reading Web Application

Introduction


When I came across an online chapter of a book in which words carry footnotes, with translations of those words in the margin, I got the following idea.

Wouldn't it be nice if there were a web application that takes a text from, say, Project Gutenberg or Wikisource, and then creates such a margin automatically? For this, one needs to know for each word how difficult it is, or how early it should be learned. These ratings could be edited manually. The reader could then choose how much help to get by setting a threshold parameter, the minimal difficulty: only words rated above it get a translation. A next step would be to include sayings and the like, but this is of course much more difficult and not really necessary.

Before I start implementing it, I would like to have some feedback from other people.

The Texts


First of all there are several questions we need to ask ourselves concerning the text.

  • What should the format of the text be? It is hard to start from an unstructured text, and many texts are available in HTML, so it might be a good idea to start from HTML/XHTML.

  • Examples of interesting free texts can be found at:
    1. Project Gutenberg: contains many documents in HTML, ASCII and Plucker (probably the biggest and therefore the most interesting). Project Gutenberg can also be downloaded as a whole, so we can avoid using too much of their bandwidth.

    2. Wikisource: contains quite a few documents in many languages, but the text is rather unstructured.
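
To illustrate starting from HTML, here is a minimal sketch that pulls the visible words out of an HTML page using only Python's standard library. The names `TextExtractor` and `words_from_html` are made up for this example, and the word pattern (runs of letters, optionally with one apostrophe) is an assumption that would need tuning per language.

```python
import re
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect visible text, skipping the contents of <script> and <style>."""

    SKIP = {"script", "style"}

    def __init__(self):
        super().__init__()
        self.chunks = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0:
            self.chunks.append(data)


def words_from_html(html):
    """Return the words of an HTML document in reading order."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    # Joining with a space keeps adjacent paragraphs apart; the price is that
    # a word split by inline markup (e.g. <b>qu</b>ick) is cut in two.
    text = " ".join(parser.chunks)
    # Assumed word pattern: letters, optionally one apostrophe plus letters.
    return re.findall(r"[^\W\d_]+(?:'[^\W\d_]+)?", text)
```

For example, `words_from_html("<p>The <b>quick</b> fox.</p>")` yields the three words of the sentence without the markup.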

Rated Word List


There are different approaches to making a rated word list, and here I shall discuss the one that I think is the best.
  • Rather than using a word list based on frequency, it is better to build the rated word list by hand. A list of frequencies is not precisely what one wants. What one really wants is a rating that tells you how early you should learn the word. Because a list of frequencies is static, it leaves no room for improvement. When every word is rated by hand, the list will gradually improve. Moreover, different people can rate, so a combined, averaged list will represent a general opinion. Furthermore, it is not that much work to get the most important words rated, especially when one does it while reading.

  • My idea is the following: We make a scale from 1 to 10, in which 1 corresponds to the words one should learn first and 10 to the least used words. For each number we then make a list of sample words that can be used as a reference, for instance in the English language:
    1. the, a, an, it, ...

    2. before, under, after, because, ...

    3. etc.

    There is also the possibility of putting all articles in category 1, all prepositions in category 2, et cetera, which makes it simpler to decide for some words which category they belong to. Of course I chose the number 10 rather arbitrarily here; does anyone have a better suggestion?

  • There might be an advantage in keeping user accounts with user-rated lists. A general rated list can then be formed from these user-rated lists. Furthermore, each user can keep a list of words that might have a very high rating but that the user just happens to know. If translations of these words keep showing up, the user can remove this redundant information by adding them to his known-words list.

  • Sometimes a word might not be in the dictionary. Then we can show the word in red and give the user the opportunity to either add a translation of the word to the dictionary (and give it a rating), or add the word to the known-words list.
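
The combination of user-rated lists and a personal known-words list can be sketched as follows. This is only an illustration under assumptions: the function names are invented, the general list is taken to be a plain average of all user ratings, and ratings are assumed to be numbers on the 1 to 10 scale proposed above.

```python
from statistics import mean


def combined_ratings(user_ratings):
    """Average each word's rating over all users who rated it.

    user_ratings maps a user name to that user's {word: rating} list.
    """
    collected = {}
    for ratings in user_ratings.values():
        for word, rating in ratings.items():
            if not 1 <= rating <= 10:
                raise ValueError(f"rating for {word!r} must be between 1 and 10")
            collected.setdefault(word, []).append(rating)
    return {word: mean(values) for word, values in collected.items()}


def needs_translation(word, general, known_words, threshold):
    """True if the word is rated above the threshold and not marked as known."""
    if word in known_words:
        return False
    rating = general.get(word)
    return rating is not None and rating > threshold
```

A word that the user has put on the known-words list is never translated, no matter how high its general rating is; an unrated word is left for the separate "not rated yet" handling.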

The Dictionary


This section contains some thoughts about what kind of dictionary we can use and where we can find it.
  • It would save some work in making the rated list if we could find a way to deal with the morphology of words, that is, the different forms of verbs, the adjective-adverb pairs formed by adding or removing 'ly', etc. The key thing here is to identify a group of words that, due to some abstract grammatical relation, necessarily must have the same rating. The best thing would be a dictionary that has this information and in this way helps to identify words that differ only in form.

  • We can use several dictionaries at the same time, since we'll digest them into a word list with translations (and morphology). For each dictionary we can write a separate harvesting program.

  • There are several things we are looking for in the dictionary:
    1. It should be possible to digest the right information from it. That is, word translations and possibly morphology.

    2. It should be free, allowing us to use it.

    3. It would be nice if it supported many languages.

    4. It would be nice if it was relatively complete.

  • Possible dictionaries are:
    1. Wiktionary

    2. Loco

    3. Dictionaries on Gutenberg
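
The idea of giving all forms of a word the same rating can be sketched as a lookup from each form to a base form. The table `lemma_forms` and both function names are hypothetical; in practice this information would have to be harvested from one of the dictionaries, and none of them is known here to provide it in this shape.

```python
def build_form_index(lemma_forms):
    """Map every word form (and the base form itself) to its base form.

    lemma_forms maps a base form to a list of its other forms.
    If a form belongs to several base forms, the first one seen is kept.
    """
    index = {}
    for lemma, forms in lemma_forms.items():
        index.setdefault(lemma, lemma)
        for form in forms:
            index.setdefault(form, lemma)
    return index


def rating_for(word, form_index, lemma_ratings):
    """Look up the rating shared by all forms of a word, or None if unrated."""
    key = word.lower()
    lemma = form_index.get(key, key)
    return lemma_ratings.get(lemma)
```

With this, rating "walk" once is enough to rate "walks", "walked" and "walking" as well.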

The Screen Layout


In this section I want to discuss the layout of the screen. This is important because a reader will spend a lot of time looking at it.
  • We need the following elements on the screen:
    1. Text block: a block with the foreign text that we want to read.

    2. Translation block: a block containing the translations of the difficult words.

    3. Threshold field: a field in which the user can specify the desired threshold.

    4. Dictionary/language button: a button where the reader can specify the language of the text.

  • We need the following elements in a pop-up edit window.
    1. Sample words block: a block containing sample words to guide the user while choosing a category.

    2. A rating field: a field in which the user rating of the word can be specified, together with the common rating.

    3. Translation field: a field that contains the possible translations of the word.

    4. Isknown field: a field with a boolean value that states if the word is known or unknown to the user.

  • Should we go for a dynamic or a static layout? That is, should the user decide where to place the blocks? I don't think this is necessary, because there is a small and fixed number of blocks.

  • The text block should be rather narrow for easy reading.

  • If the text in the text block is chopped up into pages that fit on the screen, then it is easier to read, and the text and the translations won't drift out of line. On the other hand, it is much easier to make a layout that does not chop the text up into pages. Therefore I propose to start with the latter, and change to the former when everything works.

  • The words with a rating higher than the threshold should be marked with a 'footnote,' a number in superscript. This won't be too distracting, and it links the words with their translations.

  • We have several different types of words occurring in the text. This is a proposal for distinguishing between these words (see the picture):
    1. Unrated words that appear in the dictionary (gray).

    2. Unrated words that don't appear in the dictionary (gray, underlined).

    3. Rated words whose rating is lower than the threshold (regular).

    4. Rated words whose rating is higher than the threshold (footnote).



  • The block with sample words in the category can be placed in a pop-up, thereby only showing them when it is necessary.

  • The following picture shows a proposal for a screen layout.
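
Separately from the layout itself, the four word types and the footnote marking described in this section can be sketched in code. The names below are invented for illustration, the sample data is made up, and one detail is an assumption: a word whose rating exactly equals the threshold is treated as regular, since the proposal only speaks of "lower" and "higher".

```python
from enum import Enum


class WordKind(Enum):
    UNRATED_IN_DICTIONARY = "gray"
    UNRATED_NOT_IN_DICTIONARY = "gray, underlined"
    BELOW_THRESHOLD = "regular"
    ABOVE_THRESHOLD = "footnote"


def classify(word, ratings, dictionary, threshold):
    """Decide how a word should be displayed."""
    key = word.lower()
    rating = ratings.get(key)
    if rating is None:
        if key in dictionary:
            return WordKind.UNRATED_IN_DICTIONARY
        return WordKind.UNRATED_NOT_IN_DICTIONARY
    if rating > threshold:
        return WordKind.ABOVE_THRESHOLD
    return WordKind.BELOW_THRESHOLD  # assumption: rating == threshold is regular


def annotate(words, ratings, dictionary, threshold):
    """Return the text with numbered footnote marks, plus the list of notes."""
    marked, notes = [], []
    for word in words:
        if classify(word, ratings, dictionary, threshold) is WordKind.ABOVE_THRESHOLD:
            # "?" stands in for a rated word that has no translation yet.
            notes.append(dictionary.get(word.lower(), "?"))
            marked.append(f"{word}[{len(notes)}]")
        else:
            marked.append(word)
    return " ".join(marked), notes
```

In a real page the bracketed number would become a superscript and the notes would fill the translation block.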


General Remarks


There are some general remarks that I would like to make.
  • For the sake of accessibility I want it to be a web application. Does anyone have an idea what programming tools would be suitable? Do I hear AJAX?
  • It might be rather easy to make printable PDF versions by exporting to a TeX file that adds the translations as footnotes at the bottom of the page. This of course lacks the dynamic options of rating words and changing the threshold on the spot, but might serve as a prototype while 'getting things to work.'

  • We could make a showcase of books that have been read thoroughly before and therefore have no unrated or unknown words, perhaps with an audio file.

  • Gutenberg also contains many human-read and computer-read audio books. We could, for such books, add a button which allows one to read and listen at the same time.

  • It might be easier to start with the English language and an English dictionary.

  • If this idea became very popular, people might submit their own texts. These could, in some cases, be sent to either Project Gutenberg or Wikisource. More importantly, translations of words in the dictionary could be sent back to the dictionaries we use.

A Road Map


In this section I want to propose a road map that leads to a useful product in the end, but also gives us something that works early on.
  1. A version in the English language that exports to TeX.

  2. A web application that doesn't permit changing the dictionary, changing the threshold or adding a rating, and doesn't chop up the text into pages.

  3. A web application that allows changing the dictionary, the threshold and the rating, chops up the text into pages, and adds a button to play an optional accompanying audio file from Project Gutenberg.
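
The first step of the road map, exporting to TeX with translations as footnotes, could look roughly like the sketch below. It assumes a plain LaTeX `article` document and the standard `\footnote` command; the function names and the sample data are invented, and a real export would also need paragraph breaks and language-specific setup.

```python
# Characters with a special meaning in TeX, and a safe replacement for each.
TEX_SPECIALS = {
    "\\": r"\textbackslash{}",
    "&": r"\&",
    "%": r"\%",
    "$": r"\$",
    "#": r"\#",
    "_": r"\_",
    "{": r"\{",
    "}": r"\}",
    "~": r"\textasciitilde{}",
    "^": r"\textasciicircum{}",
}


def tex_escape(text):
    """Escape TeX special characters one character at a time."""
    return "".join(TEX_SPECIALS.get(ch, ch) for ch in text)


def to_tex(words, ratings, dictionary, threshold):
    """Build a LaTeX document in which difficult words carry a footnote."""
    parts = []
    for word in words:
        key = word.lower()
        rating = ratings.get(key)
        piece = tex_escape(word)
        # Only rated words above the threshold that have a translation get a note.
        if rating is not None and rating > threshold and key in dictionary:
            piece += r"\footnote{" + tex_escape(dictionary[key]) + "}"
        parts.append(piece)
    lines = [
        r"\documentclass{article}",
        r"\begin{document}",
        " ".join(parts),
        r"\end{document}",
    ]
    return "\n".join(lines) + "\n"
```

Writing the result of `to_tex(...)` to a `.tex` file and running it through pdflatex gives a printable page with the translations at the bottom.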

Questions


Several questions to get some input.
  1. Which dictionaries are suitable to use?

  2. Where can we find suitable online free texts?

  3. Does anyone have a different idea of the screen layout?

  4. In which language should the application be written?

  5. Is there anything else we need to ask ourselves?


Friday, May 26, 2006

Abel Prize

The Swedish mathematician Lennart Carleson received the 2006 Abel Prize from Queen Sonja at an award ceremony in the University Aula in Oslo on 23 May. The Norwegian Minister of Education and Research Oeystein Djupedal also attended the event.
One can read more at the website of the Abel Prize.


The Abel Prize is a famous prize in mathematics, in some sense the runner-up to the Fields Medal. On Tuesday I was at the Abel Prize ceremony. It was rather formal, and I was afraid that the organizers wouldn't let me in because I was completely soaked by the rain, but they were very nice to me. On Wednesday I went to the Abel Lectures, an interesting series of one-hour lectures on several topics connected to the research of Professor Carleson. Because the University of Oslo just happened to be on strike (nice timing), the schedule had to be changed somewhat to make sure that not everybody would be removed from the building by the guards... I think the organizers solved it very well by speeding up a little bit, thereby making sure everybody would be inside by the time the guards came...


Sunday, May 21, 2006

The Sound of Windmills

In my first year, in 2001, I did a research project together with Joost Massolt, Derek Land and Herman Kloosterman. The goal was to measure how, with little turbulence, the wind speed near the ground depends on height and how much sound windmills make at night, and to establish a connection between the two. The project was devised by Frits van den Berg, who recently completed his PhD dissertation in this area. Links and more details can be found at Joost's blog post.


17th of May

Last Wednesday I experienced the national day of Norway, the 17th of May. On this day almost all Norwegians get their national costume out of the closet and go to the center of their local city. There they wave Norwegian flags at the king and the queen. Those who don't believe me (which is very understandable, I personally didn't believe it before I saw it) can take a look at Annett's picture page. Annett wrote more about this.



In the afternoon we went, together with Klaus and Ingvild, to Kanutte and Tor Helge. There we grilled sausages and ate them with bread and lumpe, a Norwegian pancake (as I see it) made of potatoes. In the evening we played two games of Settlers. It was a great day.


Sunday, May 14, 2006

Maze of Galious Remake

When I was seven years old my parents bought an MSX. One of the games I used to play was Maze of Galious, Konami's sequel to Knightmare. Now I found out that some Spanish guy made a remake, and I'm hooked once again!

For lazy Debian users, typing apt-get install mazeofgalious as root suffices, but there is also a port for Windows and a port for the Mac. You have to download the data files separately though.

If you did not play Maze of Galious, you can find several other remakes at Classical Retro Games. I wish they made a remake of The Goonies...


Friday, May 12, 2006

Back in The Netherlands for Some Days

There and back again. Wednesday night I returned from a short but busy trip to Groningen, where I stayed at my parents' place. Apart from being perhaps rather windy, the weather has been invigorating, which allowed me to bike back and forth to the university almost every day. At least my face and my arms are now not as white as they were ten days ago, and it even seems that my endurance is gradually recovering from the low at which it had been stuck for several months. This was due to a persistent flu and probably also to a high level of stress from all the things I was doing: finishing my master's thesis in mathematics, doing the course evaluations, arranging all the documents for my PhD application, arranging my trip to Japan, moving to Oslo, learning Norwegian, etc. And of course the fear that one of these things wouldn't work out. Basically I am just very greedy.

When you are applying for a position you need to obtain many "official documents," that is, documents with a signature. Often things you have done in the past are not even taken into consideration if you don't attach something that makes it official. So one should figure out what documents one needs, and then try to get a hold of the people for the signatures. These people were very nice by the way, but I just don't really like to make so many requests...

Most important of all, I finished my master's thesis. Last week on Wednesday I had the discussion with my supervisors, and on Thursday I signed up for the diploma ceremony, which will take place on the 12th of June at 16:30 at the Academiegebouw, Senaatskamer in Groningen. I am so glad I finally finished my thesis. It is written in the dirtiest TeX code I have written so far. It doesn't even compile cleanly, because of more than twenty errors that I did not have the time or patience to fix. Anyway, stubbornly ignoring all the error messages in pdflatex gave me the following documents: a pdf-file suitable for printing and a pdf-file suitable for reading on the screen. A preprint of a corresponding article has been sent to Indagationes Mathematicae.

It was really nice to see my family and so many of my friends again. Twice I stayed over at Laurens's place so that I didn't need to bike back to my parents late at night. Together with Laurens, Edwin and Ramon we barbecued on Friday the 5th, the Dutch Liberation Day. That was really cozy. Monday I had dinner with Maarten and Johan, and Tuesday we had dinner and drank a beer with Wicher, whom I hadn't seen for almost a year because he studied in Italy. Although it was really nice to see my family and friends again, I missed Annett a lot. It is good to be back home. In Oslo, that is. :)