Tuesday, December 15, 2009

Scientific Python tools use rising in education

It's pretty clear to me that Python is rapidly growing in acceptance as a computational platform at universities everywhere. I recently heard from Josh Bloom at UC Berkeley astronomy that his proposal for a short 'boot camp' course at the beginning of the Fall semester was approved. This is excellent news: last year I taught something similar for neuroscience students and postdocs, and I'm glad to see the campus adopting Python further as a key component of the computational training that science students receive.

A few days ago, John Hunter and I finished teaching another such workshop at the Claremont Colleges, supported by an NSF grant that John Milton (J. Hunter's PhD advisor) has for exposing undergraduates to a number of research-related experiences. This grant supports summer research internships, in which two undergrads together visit a research lab away from their home campus to work independently on a project, as well as our teaching of scientific Python tools (we were also there in 2008 and hopefully will continue next year).

I have to say that I really enjoy teaching this type of workshop, especially now that the core tools are fairly mature, installation isn't the problem it used to be, and we have a chance of presenting interesting examples to the students from the very beginning. By now we've taught a number of these at various labs and universities, and I think we've found a good workflow.  John is a phenomenal lecturer, with a real knack for illustrating interesting concepts on the fly, in a very natural manner that is honestly similar to how exploratory work is actually done in real research, where you run a bit of code, plot some results, print a few things, code some more, etc. In the picture above, he was working through some of the matplotlib image tutorial, which gave us an opportunity to find and fix a small bug in how the luminance histogram was being computed (the current form is correct). Every one of these students probably has a digital camera these days (if nothing else, in a cell phone), so an example like this is a great way to connect something they are used to with mathematical and programming concepts.

Here, John gave a great illustration of random numbers and simple statistics. He interactively built up a simulation of random walks, working up from a single (one-dimensional) walker to a group of them, comparing the simulation results with the analytical predictions for various quantities (like mean and variance), and also explaining to the students how these squiggly lines could be considered a model of price fluctuations over time. In this manner, he connected the somewhat abstract statistical concepts to something the students could relate to, namely the risk of an investment making a profit or a loss after a certain period.
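For readers who want to try something similar, here is a minimal sketch of that kind of experiment. It is not the code from the workshop: it assumes the simplest possible model (each step is +1 or -1 with equal probability), uses only the standard library, and the function name `simulate_walkers` is made up for illustration. Under that assumption the analytical predictions are a mean of 0 and a variance equal to the number of steps.

```python
import random
import statistics


def simulate_walkers(n_walkers, n_steps, seed=0):
    """Return the final position of each walker after n_steps unit steps.

    Assumption for this sketch: every step is +1 or -1 with equal
    probability, and all walkers start at position 0.
    """
    rng = random.Random(seed)  # seeded so that the run is reproducible
    finals = []
    for _ in range(n_walkers):
        position = 0
        for _ in range(n_steps):
            position += rng.choice((-1, 1))
        finals.append(position)
    return finals


n_walkers, n_steps = 2000, 100
finals = simulate_walkers(n_walkers, n_steps)

# Compare the simulation with the analytical predictions for symmetric
# unit steps: mean 0 and variance n_steps.
sample_mean = statistics.mean(finals)
sample_var = statistics.pvariance(finals)
print(f"mean:     simulated {sample_mean:+.2f}, predicted 0")
print(f"variance: simulated {sample_var:.1f}, predicted {n_steps}")
```

Storing the whole path of each walker instead of only its final position would give the 'squiggly lines' that can be plotted and read as a toy model of price fluctuations.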

We also talked about FFTs, dynamical systems, error analysis, data formats, and a few other things. It was very encouraging to have most of the students return on the second day, considering how this was a completely optional activity for them, covering an entire weekend (with morning lectures both days) right before they had to start diving into their finals. But they were a smart and enthusiastic bunch, and I hope that the workshop gave them some useful starting points they can then develop on their own as they get involved in research projects.

These are just two examples of how Python's acceptance in university computing is growing. We have a lot of work still ahead of us, but it's really encouraging to see how far we've come from the days when building scipy was a black art, IPython was little more than a different prompt for the Python shell and matplotlib could only do basic line plots in a GTK window. We now have tools that provide a complete computational environment for teaching and research, and things are only getting better...

Friday, November 6, 2009

Guido van Rossum at UC Berkeley's Py4Science

On November 4, 2009, we had a special session of our informal Py4Science seminar where Guido van Rossum visited for an open discussion regarding the uses of the Python language in scientific research. Guido had expressed his interest in discussing the work that various scientists do with Python, but mentioned that instead of a formal talk, he would prefer a format that allowed for more interaction with the audience. We agreed that a good plan would be for us to present a rapid-fire sequence of very short talks highlighting multiple projects, so that he could get a good "high altitude" view of the scientific Python landscape while leaving ample time for discussions with the audience.

Guido has already posted his impressions of the visit, and so have my colleagues Jarrod Millman and Matthew Brett, so I'll try to provide a complementary view here without too much repetition.

We gathered for lunch first with a small group and had a very interesting discussion on various topics; we had a chance to talk in some detail about the transition to Python 3 for Numpy, something a number of people have started to think about seriously. Numpy is a 'root dependency' for so many scientific projects that until it makes the jump, it will be very difficult for anyone else in science to seriously consider Python 3. Understandably, Guido would like to see some movement from our community in this direction, and he offered useful guidance. In particular, he said that in the core Python-dev team there might be enough interest that if we ask for help there, we might find developers willing to pitch in and provide some assistance. He also expressed some disappointment that PEP 3118, which was accepted with our interests in mind, still hadn't been fully implemented. Limited manpower is the simple reason for this situation, but fortunately Jarrod mentioned that there's a good plan to address this in the near future.

I had a chance to quiz Guido about something I've wondered about for a while: Python has unusually good number types in its core (arbitrary length integers, extended precision decimals and complex numbers), but the integers divide either into the integers (the flooring behavior of Python 2.x) or into the floats (in 3.x). While the 3.x division is an improvement, I would have really liked to see Python go to native rationals; for one thing, this would make the Sage 'language' (by which I mean the extensions Sage makes to pure Python) behave like Python in algorithms involving integers, eliminating a recurring source of confusion between the two. I also happen to think it would be a 'better' behavior, though there are also valid reasons for someone to expect a more 'calculator-like' answer to divisions like 1/2, and to be annoyed if they get 1/2 back instead of 0.5. While obviously such changes will not be on the table for a long while (and I should say here that I am very happy with the planned moratorium on core language changes, as I hope that will allow a more focused effort on the needs of the standard library), it was interesting to hear Guido's approach to this question as one that could be handled via overloadable literals rather than a change of integer semantics. I'd never thought of that, but it's an intriguing idea... Something to think about for when we start looking at Python 4000 :)
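To make the three behaviors concrete, here is a small sketch in Python 3 using only the standard library. It does not show the overloadable-literals idea Guido mentioned (no such feature exists in the language); it only contrasts float division, floor division (what plain `/` did for two integers in Python 2.x) and exact rationals via `fractions.Fraction`.

```python
from fractions import Fraction

# Python 3: "/" on two ints returns a float, "//" returns the floored int.
print(1 / 2)    # 0.5
print(1 // 2)   # 0
print(-1 // 2)  # -1 (floors toward negative infinity, it does not truncate)

# Exact rationals are available, but only through an explicit constructor.
half = Fraction(1, 2)
third = Fraction(1, 3)
print(half + third)  # 5/6

# Rationals stay exact where binary floats pick up rounding error.
float_ok = (0.1 + 0.2 == 0.3)
rational_ok = (Fraction(1, 10) + Fraction(2, 10) == Fraction(3, 10))
print(float_ok, rational_ok)  # False True
```

With native rationals, the literal expression `1/2` would itself produce the equivalent of `Fraction(1, 2)`, which is the behavior Sage provides through its preparser.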

We then headed over to the official presentation, where we managed to cram 14 talks in 50 minutes and then had a full hour of open conversation with Guido, where the audience asked him questions on a number of topics. You can see a complete video of the entire session: after 50 minutes of talks we have a transition, and Guido's section starts at the 54 minute mark. On my website I have posted a page with all the slides for these mini-talks.

I presented an overview introduction and material on behalf of 4 others who were not present locally, but coincidentally, William Stein of Sage fame was on campus to give a talk in the same building almost at the same time, and he could present the Sage slides directly. Ondrej Certik from SymPy was able to make the trip from Reno, completing our out-of-town speakers. The other 7 presentations were from a number of local speakers (from various departments at UC Berkeley and Lawrence Berkeley National Laboratory, just up the hill from us).

I have received very good feedback from several people, and I am really thankful to all the speakers for being so attentive to the time constraints, which let us pack in a lot of material while leaving ample time for the discussion with Guido. My intention was to give Guido a broad understanding of how significant Python's penetration has been in scientific computing, where many different projects from disciplines ranging from computer science to astronomy rely heavily on his creation. I wanted both to thank him for creating and shepherding such a high-quality language for us scientists, and to establish a good line of communication with him (and indirectly the core Python development group) so that he can better understand some of the usage patterns, concerns and questions we may have regarding the language.

I have the impression that in this we were successful, especially as we had time after the open presentations for a more detailed discussion of how we use and develop our tools. Most of us in scientific computing end up spending an enormous amount of time with open interpreter sessions, typically IPython ones (I started the project in the first place because I wanted a very good interactive environment, beyond Python's default one), and in this work mode the key source of understanding for code is good docstrings. This is an area where I've always been unhappy with the standard library, whose docstrings are typically not very good (and are often non-existent). We showed Guido the fabulous Numpy/Scipy docstring editor by Pauli Virtanen and Emmanuelle Gouillart, as well as the fact that Numpy has an actual docstring standard that is easy to read yet fairly complete. I hope that this may lead in the future to an increase in the quality of the Python docstrings, and perhaps even to the adoption of a more detailed docstring standard as part of PEP 8, which I think would be very beneficial to the community at large.
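As an illustration of what such a docstring looks like, here is a short sketch written in the style of the Numpy docstring standard, with its underlined section headers (Parameters, Returns, Raises, Examples). The function `rms` is a made-up example and is not taken from Numpy or from the session.

```python
import math


def rms(values):
    """Compute the root-mean-square of a sequence of numbers.

    Parameters
    ----------
    values : sequence of float
        Non-empty sequence of samples.

    Returns
    -------
    float
        Square root of the mean of the squared samples.

    Raises
    ------
    ValueError
        If `values` is empty.

    Examples
    --------
    >>> rms([1.0, 7.0])
    5.0
    """
    if len(values) == 0:
        raise ValueError("values must be non-empty")
    return math.sqrt(sum(v * v for v in values) / len(values))


# In an interactive session, this text is what help(rms) would display.
print(rms.__doc__)
```

Because the layout is plain text with predictable sections, it reads well directly at the prompt, which is where most of us meet a function for the first time.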

In the end, putting all this together took me a lot more time than I'd originally planned (I think I've had this same problem before...), but I am very pleased with the results. Python has become a central tool for the work many of us do, and I am really happy to establish a good dialogue with Guido (and hopefully other core developers), which I'm sure will have benefits in both directions.

Monday, September 7, 2009

CSE'09 article in SIAM News

This isn't fresh-out-of-the-oven news, but it's still on the main SIAM News page, so I'll mention it as I am trying to do a better job of keeping up with this blog...

As I mentioned in a previous post, our three-part minisymposium on Python at the SIAM CSE09 conference was fairly well received. But even better, SIAM asked us to write up an article for publication in the SIAM News bulletin. I was really pleased with this, as SIAM News reaches a very broad international audience and is actually read by people. I often find very interesting material in it, as the publication strikes a very good balance between quality and not being overly specialized (as we all drown in work, we tend to focus only on very narrow publication lists for everyday reading).

Randy, Hans-Petter and I drafted an article, and we received great feedback from all of the presenters at the minisymposium, including figure contributions by John Hunter and Hank Childs (who was at LLNL at the time, and who I'm delighted to see is now just up the hill from me at LBL).

Our article is now available in HTML online at the SIAM site, and I also have a PDF version that is more suitable for printing in case you are interested (note that my version is missing the very last corrections from the SIAM editor, but the differences should be minor).

Tuesday, June 30, 2009

Scipy advanced tutorials results

We recently conducted a poll on Doodle, soliciting feedback on the preferred topics for the advanced track, which is meant to span 2 days with eight 2-hour sessions, each focusing on one specific topic. The table below shows the complete results, which I've only sorted for convenient viewing and anonymized (the raw Doodle output contains the names given by each person voting). If anyone would like the raw spreadsheet, just drop me a line.

The score was computed as #yes-#no (i.e., yes=+1, neutral=0, no=-1), from a total of 30 responses, and the results are in the table below, ranked from highest to lowest score. In my personal opinion, all the topics offered would have made for very good and interesting tutorials, but the point of asking for feedback is obviously to follow it to some degree, which we will now do.

I think it's worth noting, though it is not particularly surprising, that the ranking roughly follows the generality of the tools: matplotlib and numpy are at the top, with finite elements and graph theory at the bottom. While I personally use NetworkX and love it, it's a specialized tool that many people have no compelling reason to learn, while pretty much every single numerical Python user needs numpy and matplotlib. We are now in the process of contacting possible speakers for the top topics, and will announce a final list of topics on the mailing list once we have confirmed speakers for all of them.

Topic                                                          Yes  Neutral  No  Score  Rank
Advanced topics in matplotlib use                               18       10   2     16     1
Advanced numpy                                                  18       10   2     16     2
Designing scientific interfaces with Traits                     15       11   4     11     3
Mayavi/TVTK                                                     13       11   6      7     4
Cython                                                          14        8   8      6     5
Symbolic computing with sympy                                   15        6   9      6     6
Statistics with Scipy                                            9       15   6      3     7
Using GPUs with PyCUDA                                          13        7  10      3     8
Testing strategies for scientific codes                         11       11   8      3     9
Parallel computing in Python and mpi4py                         12        8  10      2    10
Sparse Linear Algebra with Scipy                                 9       12   9      0    11
Structured and record arrays in numpy                            8       14   8      0    12
Design patterns for efficient iterator-based scientific codes    9        7  14     -5    13
Sage                                                             8        6  16     -8    14
The TimeSeries scikit                                            4       13  13     -9    15
Hermes: high order Finite Element Methods                        6        9  15     -9    16
Graph theory with NetworkX                                       5        9  16    -11    17
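For anyone who wants to redo the arithmetic, the scoring rule described above (yes=+1, neutral=0, no=-1) can be sketched in a few lines. This is only an illustration, not the script used to build the table; the helper name `score` is made up, and the vote counts are four rows copied from the table.

```python
# (topic, yes, neutral, no) for a few rows copied from the poll table.
votes = [
    ("Cython", 14, 8, 8),
    ("Advanced numpy", 18, 10, 2),
    ("Sage", 8, 6, 16),
    ("Statistics with Scipy", 9, 15, 6),
]


def score(yes, neutral, no):
    """yes=+1, neutral=0, no=-1, so the score reduces to yes - no."""
    return yes - no


# Rank from highest to lowest score, as in the table.
ranked = sorted(votes, key=lambda row: score(*row[1:]), reverse=True)
for topic, yes, neutral, no in ranked:
    print(f"{score(yes, neutral, no):>4}  {topic}")
```

Note that neutral votes do not affect the score at all, which is why a topic with many neutral answers (such as Statistics with Scipy) can land in the middle of the ranking despite few 'yes' votes.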

Monday, March 9, 2009

Python at the SIAM CSE'09 meeting

After the success of last year's Python minisymposium at the annual SIAM meeting, this year we had a repeat: Simula's Hans-Petter Langtangen (author of the well-known Python Scripting for Computational Science), U. Washington's Randy LeVeque and I co-organized another minisymposium on Python for Scientific computing.

At the Computational Science and Engineering 2009 meeting, held in downtown Miami March 2-6, we again had 3 sessions with 4 talks each (part I, II and III), with a different mix of speakers and focus than last year. While last year we spent some effort introducing the language and to a certain extent justifying its use in real-world scientific work, we felt that this time, the growth of the many Python projects out there speaks for itself and that we should instead turn our attention to actual tools and projects useful for specific work. Thus, we had no 'why python for science' talk, although obviously most speakers spent some time providing this kind of information in the context of their own projects.

I think that this is the right path to follow: for a number of years, many of us have been developing tools and justifying Python as a viable alternative to tools such as Matlab or IDL, but we need to start moving away from that mode. There is no doubt in my mind now that, while still immature in certain areas, Python is a credible, production alternative to said packages. In many contexts it can actually be far superior to its proprietary and expensive counterparts, as many of us have found in practice. So hopefully as we move forward, we will do less of 'you can use python for scientific work' and more of 'here is a great scientific project/tool that happens to be implemented using Python'. Eventually, I hope we will not have any Python-specific sessions at scientific meetings, just like we don't see any "Fortran for scientific computing" special sessions.

This was a smaller meeting than the annual one, but attendance for our sessions was good. There were interesting discussions and questions from audience members who were obviously new to the tools and curious to learn more. Personally, I had a great time both catching up with some familiar faces and meeting some new ones, all of whom I hope will become more regular contributors to the greater ecosystem of scientific python tools. Ondrej Certik, always full of energy, promptly posted his own report, along with some video from where we were staying.

An aside: when I tried to book the conference hotel, all available rooms at the conference rate were taken, so a plan B was in order. I found a room online across the street that was cheaper than the Miami Hilton's conference rate and rented it. It turned out to be a great condo on the 50th floor of this tower, with free high-quality internet, lots of work space we could use and a gorgeous view. Lesson learned: in this economic crisis, shop around. We got a great place and still saved money over the normal conference rate... The conference hotel is just barely visible, in the lower right-hand corner of this shot.

I am currently in the process of collecting slides from the speakers; check that page later if you are interested in any talk for which I haven't posted them yet, as I'll continue to update it as I receive them. Until the next meeting...

[Image credits: John D. Hunter and F.P., full album available here]

Book review: Expert Python Programming

Update: I've slightly modified the language of this review, which as my wife correctly pointed out to me, was unnecessarily harsh. While I stand by my previous evaluation of the book, I think the same things can be said in a more constructive tone.

While this isn't strictly a SciPy post, I've already had a few questions about this book, so I guess I'll tag it as 'scipy' as well, for those interested. I recently reviewed the book Expert Python Programming by Tarek Ziadé. While not aimed at a scientific audience, the book covers a number of topics that we frequently discuss on the Numpy and Scipy lists (such as documentation and testing, workflows, API decisions, etc). Since I really prefer to write longer text in reST using Emacs than in a blog editor, I've posted the review over at my static site. Feel free to head over there if you are interested in the full review; I've only reproduced the summary here:


Summary

Expert Python Programming covers a list of very interesting topics regarding real-world development using Python. It assumes a reader who already knows the basics of the language and covers a number of important topics, both in the more advanced parts of the language and in terms of developing applications using Python. There is a strong emphasis on agile development practices, testing, documentation, application life cycle management and other aspects of 'real world' work. The list of topics covered is excellent, and the book is well printed and bound.

However, unfortunately it suffers from rather poor editing throughout, with a broken idea flow that makes for choppy reading. Very few ideas are properly developed, as the book relies excessively on code snippets, bullet lists and stand-out info boxes. Ultimately, this gives it more the feel of a set of notes than that of a coherent volume. This should not be read as an indictment of the book: the table of contents alone is a list of 'right things to do' when using Python, and there is a great deal of useful material in all of the chapters.

If you are looking for reference material, links and starting points for further reading, Expert Python Programming can be an excellent resource and well worth your investment. However, if you are searching for a text that develops complex concepts at length, delving into details and subtleties, it might not be the ideal tool for you. I hope this provides a useful picture of the qualities of this book.