Wednesday, March 21, 2007

About making onigiri, sushi and dumplings, and our puppy

Annett has written a couple of posts about us making onigiri (the pictures), making sushi and dumplings with my friends from the CMA (the pictures), and the birth of our puppy! In two weeks we can go and look at her, and six weeks later we can go get her!


Wednesday, March 14, 2007

The SAGE Computer Algebra System

For a long time I've been struggling with the following problem. When you start working with a certain Computer Algebra System (CAS), you spend a lot of time becoming familiar with it, adjusting it to your preferences, writing your own functions etc. Because of this, you automatically trap yourself into using one and only one CAS for most purposes, and with time it becomes increasingly difficult to switch to any other CAS. Therefore this choice, this decision about which CAS I would "go for", became a very important one for me.

Unfortunately it is not only very important, but also very difficult. There are many CAS out there, each with its own particularly attractive features. Mathematica and Maple, for instance, have a huge user base, and Axiom works with structures that are very closely related to mathematical structures (e.g. categories). In order to make the right choice, an obvious strategy is to make a list of features that cannot be compromised on. Of course this is very difficult, because typically you only discover what is essential when you find yourself lacking it. This is a rather hopeless situation if you, like me, have relatively limited experience with CAS.

About half a year ago I became acquainted with a CAS that tries to be distinctly different from the rest: SAGE. In short, it is a free multi-purpose CAS that interacts strongly with other existing CAS. It is mostly this "strong interaction" that solves my problem, and in my opinion it is a revolutionary piece of software that deserves to be better known. Therefore I will try to give an overview of the features and design decisions that make SAGE, for me, the best option to use and contribute to. Before this large post scares you away, let me remark that you can try SAGE in your browser (at least in Firefox, Safari and Opera).

So what is SAGE?

SAGE aims to be the following:

  1. A complete self contained distribution of free open source mathematics and other special-purpose software for Linux, OS X, Microsoft Windows, and Solaris.

  2. A new library of functionality to fill in the many gaps in functionality in existing CAS. That is, new implementations of algorithms that are generally missing in other CAS.

  3. A collection of interfaces to most other mathematical software systems (both free and commercial), bringing all these CAS under the same roof.

For a CAS whose development started in early 2005, the results are amazing. This is a direct consequence of the third point: the idea of letting SAGE borrow functionality from other existing CAS. SAGE developers therefore like to say that they don't reinvent the wheel, but build the car.

Figure 1: Don't reinvent the wheel. (Image created by Martin Albrecht.)

Free software and mathematics

Long term followers of this blog know that I am generally a proponent of free software and open source. As I mentioned for instance in my post on Granule, I strongly prefer working with free software when there is the chance to contribute something back to the project, which is almost always the case.

Furthermore I believe that free software projects have the most potential in the long term. Linus Torvalds, initiator of the Linux kernel, said something sensible about this:

I think, fundamentally, open source does tend to be more stable software. It's the right way to do things. I compare it to science vs. witchcraft. In science, the whole system builds on people looking at other people's results and building on top of them. In witchcraft, somebody had a small secret and guarded it -- but never allowed others to really understand it and build on it.

Traditional software is like witchcraft. In history, witchcraft just died out. The same will happen in software. When problems get serious enough, you can't have one person or one company guarding their secrets. You have to have everybody share in knowledge.

Apart from these general reasons to work with free software, there are some special reasons in the case of mathematical software. In particular, some people think that a mathematical result obtained from a CAS can only be accepted if it comes from an open source program. Already in 1993, Joachim Neubüser, who initiated the free software CAS GAP in 1986, made the following interesting remark:

You can read Sylow's Theorem and its proof in Huppert's book in the library [...] then you can use Sylow's Theorem for the rest of your life free of charge, but for many computer algebra systems license fees have to be paid regularly [...]. You press buttons and you get answers in the same way as you get the bright pictures from your television set but you cannot control how they were made in either case.

With this situation two of the most basic rules of conduct in mathematics are violated: In mathematics information is passed on free of charge and everything is laid open for checking. Not applying these rules to computer algebra systems that are made for mathematical research [...] means moving in a most undesirable direction. Most important: Can we expect somebody to believe a result of a program that he is not allowed to see?

David Joyner, professor at the United States Naval Academy and long term contributor to SAGE, formulated this second point as follows:

A result computed by a computer algebra system, whose source code is not "open source", can not be accepted as part of a mathematical proof. Within the general mathematical community, it seems fair to say that a mathematical truth is not a theorem unless its proof is written down for public scrutiny (i.e., "open source") and generally accepted as correct. Just as to verify the correctness of a theorem you can go through the proofs of all the results your theorem depends on, one should be able to verify the correctness of an algorithm by reading the programming code of all the algorithms your algorithm depends on.

Moreover, in the mathematical community that uses a CAS, a very large share of the users are potential contributors. Such a contribution could be reporting a bug and making a patch, requesting a feature, improving an algorithm or its implementation, or improving the documentation. The possibility to contribute back, to be more than just a user, is an attractive feature for a CAS.

To fully benefit from such an eager audience, one should make it easy for them to contribute. That is, one should give them direct access to the source code. In SAGE this is achieved by typing a command followed by two question marks, for example plot??, which immediately brings up a box with the source code of the function (though at the moment this only works for native SAGE functions, and it doesn't give you the source code of a function that is reached through an interface to, say, GAP). For these reasons, I consider being free software a requirement for a CAS that cannot be compromised on.
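To give an idea of what such a source lookup amounts to, here is a minimal sketch in plain Python rather than in SAGE itself. It uses the standard library module inspect to fetch the source text of a function, which is roughly the information a double question mark shows. The choice of textwrap.dedent as the function to inspect is only an illustration, and the sketch says nothing about how SAGE implements the feature internally.

```python
import inspect
import textwrap

# Fetch the source text of a pure-Python function from the standard library.
# textwrap.dedent is an arbitrary example; any function written in Python works.
source = inspect.getsource(textwrap.dedent)

# Show only the first few lines, much like the top of a source box.
for line in source.splitlines()[:5]:
    print(line)
```

Note that this only works for functions written in Python. For a function implemented in C, inspect.getsource raises a TypeError, which is a small-scale version of the limitation mentioned above for functions that live behind an interface.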

Interpreter versus compiler

William Stein, founder and leader of the SAGE project, explains his choice for using Python and Pyrex in the SAGE Programming Guide better than I can:

Every serious computer algebra system, e.g., MAGMA, PARI, Mathematica, Maple, GAP, Singular, etc., is implemented as a combination of compiled and interpreted code. For Mathematica, Singular, MAGMA, and PARI, most of the implementation is in compiled C (or C++) code; some of these systems tend to be very optimized (subject to the constraints of the algorithms they implement). In contrast, Maple and GAP have a relatively small compiled "kernel" that defines the underlying programming language; most of the system is then implemented in this language. If you do benchmarks you'll discover that Mathematica is much faster than Maple at some basic operations. Likewise, benchmarks reveal that MAGMA is often faster than GAP.

This fusion of interpreted and compiled code is extremely natural for mathematics software; some algorithms are much better implemented in an interpreter because all time-critical steps involve low level arithmetic -- other algorithms, e.g., matrix multiplication, must be implemented in a compiled language in order to perform optimally. Also, existing compiled libraries are sometimes of very high quality, and a compiled language is needed to create the best possible interface to such libraries. It's crucial that both approaches to programming be fully supported. When deciding how to implement SAGE, I searched for months for an environment that could support both approaches. Python and Pyrex provide exactly this in a way that I believe is much easier conceptually than the implementation models of any of the other systems mentioned above.

Personally I love Python, and I couldn't think of any better choice for a scripting language. At the moment I am programming in R for a course in applied statistics that I'm taking. Although it is a wonderful and amazingly complete free statistics environment, its scripting language cannot compare to Python, and I find myself longing for the transparent syntax and list comprehensions in Python. It is only a matter of time before there is also an interface between R and SAGE, combining the functionality of R with the power of Python.
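For readers who have not seen a list comprehension before, here is a small sketch of what I mean. The variable names are made up for this example. Both versions build the same list, the comprehension just does it in a single readable line.

```python
# Squares of the even numbers below 10, written as a list comprehension.
even_squares = [n * n for n in range(10) if n % 2 == 0]

# The same result with an explicit loop, for comparison.
even_squares_loop = []
for n in range(10):
    if n % 2 == 0:
        even_squares_loop.append(n * n)

print(even_squares)  # [0, 4, 16, 36, 64]
```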

A smart interface

The ordinary SAGE interface is built around IPython. This is a Python interpreter with a pleasant command line that features, for example, command line completion, which saves you from typing and remembering long function names. Strictly speaking, the term method would be more accurate than function, because the command line completion yields only the methods of the class you are looking at. Typing for instance set.[tab] yields all the methods of the object set, like intersection and union.

This naturally leads to an obvious advantage of using an object oriented language like Python: it prevents the global name space from becoming polluted with thousands of very specific functions. Such pollution makes it almost impossible to find the function you're looking for, and makes it easy to confuse the context in which a certain function operates. Instead, each method is associated with the type of object on which it acts. For example, set.union acts on sets. That being said, some functions can still be called from the global name space, allowing one for instance to write sqrt(a) instead of the rather awkward a.sqrt().
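The idea of methods living on their objects can be sketched in plain Python, without SAGE. The list produced by dir(set) below is roughly what tab completion after set. offers, although the exact output of the completion in IPython or SAGE may differ. The sets a and b are made up for this example.

```python
# Public methods of the built-in set type: roughly what tab completion
# after `set.` offers in an interactive shell.
set_methods = [name for name in dir(set) if not name.startswith("_")]
print(set_methods)

a = {1, 2, 3}
b = {3, 4}

# Methods are attached to the type of object they act on...
print(a.union(b))         # {1, 2, 3, 4}
print(a.intersection(b))  # {3}

# ...while a few general operations remain available as global functions.
print(len(a))  # 3
```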

Now most modern CAS have, besides the command line, some kind of a graphical user interface. This makes it easier for users to get started, enables users to work more conveniently with graphical objects and multi line input, and displays formulae in a more readable form. For a CAS, the usual form of such a GUI is a notebook, which is basically a graphical representation of text cells, input cells, output cells and a menu system. An amazing feature of SAGE is that its notebook is not a stand-alone application that runs directly on your system. Instead, it is a web application that runs in your browser and uses AJAX technology to feel like a stand-alone application. This way anyone with a decent browser and an internet connection can try SAGE by surfing to an internet site that runs SAGE. Alternatively, you can run the notebook on your own computer by typing notebook() in your SAGE shell, and surfing to http://localhost:8000 in your web browser. This notebook, I think, is what will attract most users to try out the program in the first place. The fact that it is free software and can be used just by surfing to a website, gives it an accessibility commercial software can never offer.

Moreover, the notebook uses jsMath to display formulae, and if you have installed TrueType fonts on your computer the result looks as if it came from TeX. Calling the function latex on an object will in most cases give you the code for rendering it in your TeX document.

Figure 2: The SAGE notebook.

Smart documentation

Analogously to the source code, the documentation of a function can be reached by typing a command followed by one question mark, for example plot?. In a way typical of Python (and e.g. Java), such documentation is automatically generated from the source code. There is a rather specific template this documentation should adhere to, containing a one-line description of the function, the input, the output, examples, notes, references and authors. Apart from providing a very useful way to keep track of the documentation, this template allows the examples to be tested automatically. Because SAGE combines different CAS, each of them complex in itself, tools for finding bugs are essential, and this is a rather ingenious way to deal with an ever more complex interaction of pieces of code. At the same time, it ensures that the examples in your documentation are correct.
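The principle of testing the examples in the documentation can be sketched with the doctest module from the Python standard library. This is only an illustration: the function triangular and the exact layout of its docstring are made up for this example, and I am not claiming that this is how SAGE's own test machinery is wired up.

```python
import doctest

def triangular(n):
    """Return the n-th triangular number.

    INPUT:
        n -- a non-negative integer

    OUTPUT:
        the sum of the integers from 0 up to n

    EXAMPLES:
        >>> triangular(4)
        10
        >>> [triangular(k) for k in range(5)]
        [0, 1, 3, 6, 10]
    """
    return n * (n + 1) // 2

# Extract the examples from the docstring and run them.
# (Hypothetical function and docstring layout, for illustration only.)
parser = doctest.DocTestParser()
test = parser.get_doctest(
    triangular.__doc__, {"triangular": triangular}, "triangular", None, 0
)
runner = doctest.DocTestRunner(verbose=False)
results = runner.run(test)

print(results.attempted, "examples run,", results.failed, "failed")
```

If somebody later changes triangular in a way that breaks one of the examples, the runner reports the failing example together with the expected and the actual output, so the documentation and the code cannot silently drift apart.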

Apart from this smart way of dealing with reference documentation, SAGE has a convenient way of presenting other documentation. By clicking on the edit-button in the notebook, you can edit the source code of the notebook. The possibility of inserting HTML outside the input cells gives you a way to present a tutorial, a how-to, or any other type of documentation in an interactive way. A user can then play around with the examples you give in the tutorial, and absorb the information more quickly than would be possible with a static tutorial. Of course such interactive notebook tutorials are by no means new, but the fact that you can reach them from anywhere with a decent browser and an internet connection gives them an accessibility that is unmatched.

Further reading

I collected a couple of links to places where you can find more information about SAGE, some obvious, and some maybe not so obvious. Enjoy!

  1. The main website of SAGE.

  2. An interactive tutorial.

  3. Static online documentation of SAGE.

  4. A recently updated webpage with a list of open source CAS together with a description.

  5. A directory with talks about SAGE.

  6. A recent talk about the social and technical status of the SAGE project.

  7. A Google Groups discussion group about the development of SAGE.
