Proposal for a Guided Book Reading Web Application
When I came across an online chapter of a book in which footnotes were added to words and their translations given in the margin, I got the following idea.
Wouldn't it be nice if there were a web application that takes a text from, say, Project Gutenberg or Wikisource and creates such a margin automatically? For this, one needs to know for each word how difficult it is, or how early it should be learned. These ratings could be edited manually. The reader could then specify the desired difficulty by setting a threshold parameter called the minimal difficulty. A next step would be to include sayings and the like, but this is of course much more difficult and not really necessary.
Before I start implementing it, I would like to get some feedback from other people.
First of all, there are several questions we need to ask ourselves concerning the text.
- What should the format of the text be? It is hard to start from unstructured text, and many texts are available in HTML, so it might be a good idea to start from HTML/XHTML.
- Examples of interesting free texts can be found at:
  - Project Gutenberg: contains many documents in HTML, ASCII and Plucker format (probably the biggest collection and therefore the most interesting). Project Gutenberg can also be downloaded as a whole, which lets us avoid using too much of their bandwidth.
  - Wikisource: contains quite a few documents in many languages, but the text is rather unstructured.
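Starting from HTML/XHTML means the first processing step is pulling the words out of the markup. The sketch below uses only Python's standard `html.parser` module; the names `WordExtractor` and `extract_words`, the choice of tags to skip, and the word pattern (runs of letters, optionally joined by an apostrophe or hyphen) are assumptions for illustration, not a settled design.

```python
import re
from html.parser import HTMLParser

# Assumed word pattern: runs of letters, optionally joined by an apostrophe
# or hyphen ("don't", "well-known"). Real texts need language-specific rules.
WORD_RE = re.compile(r"[^\W\d_]+(?:['’-][^\W\d_]+)*")


class WordExtractor(HTMLParser):
    """Collects the words of the readable text in an HTML/XHTML document."""

    # Tags whose content is not part of the text to be read (an assumption).
    SKIP_TAGS = {"script", "style", "head"}

    def __init__(self):
        super().__init__()  # convert_charrefs=True, so "&amp;" arrives as "&"
        self.words = []
        self._skip_depth = 0  # > 0 while inside a tag listed in SKIP_TAGS

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP_TAGS:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP_TAGS and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0:
            self.words.extend(WORD_RE.findall(data))


def extract_words(html_text):
    """Return the list of words in the readable text of `html_text`."""
    parser = WordExtractor()
    parser.feed(html_text)
    parser.close()  # flush any buffered text
    return parser.words
```

For example, `extract_words("<p>It was the <i>best</i> of times.</p>")` gives `['It', 'was', 'the', 'best', 'of', 'times']`; inline markup such as `<i>` does not split or drop words.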
Rated Word List
There are different approaches to making a rated word list; here I discuss the one that I think is best.
- Rather than using a word list based on frequency, it is better to build the rated word list by hand. A list of frequencies is not precisely what one wants; what one really wants is a rating that tells you how early you should learn a word. Moreover, a list of frequencies is static, so it leaves no room for improvement, whereas a list in which every word is rated by hand will gradually improve. Different people can rate words, so a combined, averaged list will represent a general opinion. Finally, it is not that much work to get the most important words rated, especially when one does it while reading.
- My idea is the following: we make a scale from 1 to 10, in which 1 corresponds to the words one should learn first and 10 to the least used words. For each number we then make a list of sample words that can be used as a reference; for instance, in English:
  - Category 1: the, a, an, it, ...
  - Category 2: before, under, after, because, ...
One could also put all articles in category 1, all prepositions in category 2, et cetera, which makes it easier to decide which category some words belong to. Of course, I chose the number 10 rather arbitrarily here; does anyone have a better suggestion?
- There might be an advantage in keeping user accounts with user-rated lists. A general rated list can then be formed from these user-rated lists. Furthermore, each user can keep a list of words that might have a very high rating but that the user happens to know. If translations of these words keep showing up, the user can remove this redundant information by adding them to their known-words list.
- Sometimes a word might not be in the dictionary. Then we can show the word in red and give the user the opportunity to either add a translation of the word to the dictionary (and give it a rating), or add the word to the known-words list.
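The combined list described above can be sketched in a few lines. This is a hypothetical layout, not a decided one: ratings are kept per user in nested dictionaries, the general rating is the plain mean of all user ratings, and a reader's known-words list takes precedence over everything else. The function names `add_rating`, `general_rating` and `word_status` are invented for illustration.

```python
from statistics import mean

# Proposed scale: 1 = words to learn first, 10 = least used words.
MIN_RATING, MAX_RATING = 1, 10


def add_rating(user_ratings, user, word, rating):
    """Store one user's rating of a word (hypothetical storage: nested dicts)."""
    if not MIN_RATING <= rating <= MAX_RATING:
        raise ValueError(f"rating must be between {MIN_RATING} and {MAX_RATING}")
    user_ratings.setdefault(user, {})[word] = rating


def general_rating(user_ratings, word):
    """Average over all users who rated `word`; None if nobody has rated it."""
    ratings = [r[word] for r in user_ratings.values() if word in r]
    return mean(ratings) if ratings else None


def word_status(word, user_ratings, known_words, dictionary):
    """Classify a word for one reader (hypothetical status names)."""
    if word in known_words:
        return "known"     # the reader knows it: never show a translation
    if word not in dictionary:
        return "missing"   # offer to add a translation or mark it as known
    if general_rating(user_ratings, word) is None:
        return "unrated"   # in the dictionary, but nobody has rated it yet
    return "rated"
```

The plain mean is used only because it is the simplest choice; a median or a weighted average would be equally reasonable ways to form the general rating.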
Dictionary
This section contains some thoughts about which dictionaries we can use and where we can find them.
- It would save some work in making the rated list if we found a way to deal with the morphology of words, that is, the different forms of verbs, adjective-adverb pairs formed by adding or removing 'ly', etc. The key thing here is to identify groups of words that, due to some grammatical relation, must necessarily have the same rating. The best thing would be a dictionary that contains this information and thereby helps us identify the different forms of a word.
- We can use several dictionaries at the same time, since we'll digest them into a word list with translations (and morphology). For each dictionary we can write a separate harvesting program.
- There are several things we are looking for in the dictionary:
  - It should be possible to extract the right information from it, that is, word translations and possibly morphology.
  - It should be free, so that we are allowed to use it.
  - It would be nice if it supported many languages.
  - It would be nice if it were relatively complete.
- Possible dictionaries are:
  - Dictionaries on Gutenberg
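One way to combine several dictionaries and their morphology is sketched below, under assumptions that are not part of the proposal itself: each harvesting program yields entries with a base form (lemma), its other forms and its translations, and ratings are stored per lemma so that all forms of a word automatically share one rating. `Entry`, `Harvester`, `ListHarvester`, `build_form_index` and `rating_of` are hypothetical names.

```python
from typing import Dict, Iterable, Mapping, NamedTuple, Optional, Protocol, Tuple


class Entry(NamedTuple):
    """One entry produced by a harvesting program (hypothetical format)."""
    lemma: str                     # base form, e.g. "go"
    forms: Tuple[str, ...]         # other forms, e.g. ("goes", "went", "gone")
    translations: Tuple[str, ...]  # e.g. ("aller",)


class Harvester(Protocol):
    """Interface that each per-dictionary harvesting program would implement."""
    def entries(self) -> Iterable[Entry]: ...


class ListHarvester:
    """Stand-in harvester serving entries from a list, in place of a real dictionary."""
    def __init__(self, entries: Iterable[Entry]):
        self._entries = list(entries)

    def entries(self) -> Iterable[Entry]:
        return iter(self._entries)


def build_form_index(harvesters: Iterable[Harvester]) -> Dict[str, str]:
    """Map every known word form to its lemma, merging several dictionaries."""
    index: Dict[str, str] = {}
    for harvester in harvesters:
        for entry in harvester.entries():
            # setdefault keeps the first mapping: earlier dictionaries win conflicts.
            index.setdefault(entry.lemma, entry.lemma)
            for form in entry.forms:
                index.setdefault(form, entry.lemma)
    return index


def rating_of(word: str, form_index: Mapping[str, str],
              lemma_ratings: Mapping[str, int]) -> Optional[int]:
    """Look up a rating through the lemma, so all forms of a word share it."""
    return lemma_ratings.get(form_index.get(word, word))
```

With this layout, rating "go" once also rates "went" and "gone", and rating "quick" also rates "quickly", which is the saving the morphology point is after. When two dictionaries disagree about a form, the first one simply wins here; a real implementation would need a better policy.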
The Screen Layout
In this section I want to discuss the layout of the screen. This is important because a reader will spend a lot of time looking at it.
- We need the following elements on the screen:
  - Text block: a block with the foreign text that we want to read.
  - Translation block: a block containing the translations of the difficult words.
  - Threshold field: a field in which the user can specify the desired threshold.
  - Dictionary/language button: a button with which the reader can specify the language of the text.
- We need the following elements in a pop-up edit window:
  - Sample words block: a block containing sample words to guide the user while choosing a category.
  - Rating field: a field in which the user's rating of the word can be specified, shown together with the common rating.
  - Translation field: a field that contains the possible translations of the word.
  - Isknown field: a field with a boolean value that states whether the word is known to the user.
- Should we go for a dynamic or a static layout? That is, should the user decide where to place the blocks? I don't think this is necessary, because there is a small and fixed number of blocks.
- The text block should be rather narrow for easy reading.
- If the text in the text block is chopped up into pages that fit on the screen, it is easier to read, and it also ensures that the text and the translations don't drift out of line. On the other hand, a layout that does not chop the text into pages is much easier to make. Therefore I propose to start with the latter and switch to the former once everything works.
- Words with a rating higher than the threshold should be marked with a 'footnote', a superscript number. This won't be too distracting and links the words to their translations.
- We have several different types of words occurring in the text. This is a proposal for distinguishing between these words (see the picture):
  - Unrated words that appear in the dictionary (gray).
  - Unrated words that don't appear in the dictionary (gray, underlined).
  - Rated words whose rating is lower than the threshold (regular).
  - Rated words whose rating is higher than the threshold (footnote).
- The block with sample words for each category can be placed in a pop-up, so that it is only shown when necessary.
- The following picture shows a proposal for a screen layout.
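The four word types and the footnote marking could be produced roughly as follows. This is a sketch under several assumptions: the text has already been split into words, ratings and translations are plain dictionaries, a word gets a footnote when its rating is at or above the threshold (the proposal leaves the case of equality open), known words are always shown as regular text, and the CSS class names `unrated` and `missing` as well as the function name `annotate` are invented. Punctuation and original spacing are not preserved.

```python
from html import escape


def annotate(words, ratings, translations, threshold, known_words=frozenset()):
    """Render words as HTML and build the translation block for difficult words.

    words:        tokens of the text block
    ratings:      word -> rating on the 1-10 scale (absent = unrated)
    translations: word -> translation (absent = not in the dictionary)
    threshold:    words rated at or above this value get a footnote
                  (assumption: the proposal leaves rating == threshold open)
    Returns (text_html, translation_block_html).
    """
    html_parts, notes = [], []
    for word in words:
        rating = ratings.get(word)
        if word in known_words:
            html_parts.append(escape(word))               # known: regular text
        elif rating is None:
            # Unrated: gray, and additionally underlined if not in the dictionary.
            css = "unrated" if word in translations else "unrated missing"
            html_parts.append(f'<span class="{css}">{escape(word)}</span>')
        elif rating >= threshold and word in translations:
            notes.append((word, translations[word]))
            html_parts.append(f"{escape(word)}<sup>{len(notes)}</sup>")
        else:
            html_parts.append(escape(word))               # easy word: regular text
    items = "".join(f"<li>{escape(w)}: {escape(t)}</li>" for w, t in notes)
    return " ".join(html_parts), f"<ol>{items}</ol>"
```

A stylesheet could then render the classes as proposed, for instance `.unrated { color: gray; }` and `.missing { text-decoration: underline; }`. Because footnote numbers are assigned in the same order as the items of the `<ol>`, each superscript matches its entry in the translation block.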
There are some general remarks that I would like to make.
- For accessibility, I want this to be a web application. Does anyone have an idea which programming tools would be suitable? Do I hear AJAX?
- It might be rather easy to make printable PDF versions by exporting to a TeX file that adds the translations as footnotes at the bottom of the page. This of course lacks the dynamic options of rating words and changing the threshold on the spot, but it might serve as a prototype while we get things working.
- We could make a showcase of books that have already been read thoroughly, and that therefore contain no unrated or unknown words, possibly accompanied by an audio file.
- Project Gutenberg also contains many audio books, read by humans or generated by computer. For such books we could add a button that allows one to read and listen at the same time.
- It might be easier to start with the English language and an English dictionary.
- If this idea becomes very popular, people might submit their own texts. In some cases these could be sent on to Project Gutenberg or Wikisource. More importantly, translations of words could be sent back to the dictionaries we use.
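The TeX prototype needs little more than the footnote rule from the screen layout. The sketch below, again with hypothetical names (`tex_escape`, `to_tex`) and the assumption that a word is footnoted when its rating is at or above the threshold, wraps the text in a minimal `article` document and attaches each difficult word's translation with LaTeX's standard `\footnote` command, which places it at the bottom of the page.

```python
# Characters with a special meaning in TeX, mapped to their escaped forms.
TEX_SPECIALS = {
    "\\": r"\textbackslash{}",
    "&": r"\&",
    "%": r"\%",
    "$": r"\$",
    "#": r"\#",
    "_": r"\_",
    "{": r"\{",
    "}": r"\}",
    "~": r"\textasciitilde{}",
    "^": r"\textasciicircum{}",
}


def tex_escape(text):
    """Escape TeX special characters one character at a time."""
    return "".join(TEX_SPECIALS.get(ch, ch) for ch in text)


def to_tex(words, ratings, translations, threshold):
    """Return a LaTeX document in which difficult words get a translation footnote.

    Assumption: a word is difficult if it is rated at or above `threshold`
    and a translation is available.
    """
    body = []
    for word in words:
        piece = tex_escape(word)
        rating = ratings.get(word)
        if rating is not None and rating >= threshold and word in translations:
            piece += r"\footnote{" + tex_escape(translations[word]) + "}"
        body.append(piece)
    return (
        "\\documentclass{article}\n"
        "\\usepackage[utf8]{inputenc}\n"
        "\\begin{document}\n"
        + " ".join(body)
        + "\n\\end{document}\n"
    )
```

The resulting file can then be compiled with, for example, `pdflatex` to get a printable PDF.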
A Road Map
In this section I want to propose a road map that leads to a useful product in the end but also gives us something that works from the beginning.
- A version for the English language that exports to TeX.
- A web application that does not yet allow changing the dictionary or the threshold or adding ratings, and that does not chop the text into pages.
- A web application that allows changing the dictionary, the threshold and the ratings, chops the text into pages, and adds a button to play an optional accompanying audio file from Project Gutenberg.
Finally, here are several questions to get some input:
- Which dictionaries are suitable to use?
- Where can we find suitable online free texts?
- Does anyone have a different idea of the screen layout?
- In which programming language should the application be written?
- Is there anything else we need to ask ourselves?