Sunday, July 4, 2010

My Journey into Open Source and Cross Platform Independence

When you advance in your career as a scientist, the choices you make to accomplish certain research tasks become crucially important. If you spend a year collecting data and analyzing it in Excel, for instance, creating all sorts of plots and customizations, it will be very hard to switch to other software, say OpenOffice. Data analysis is not the only crucial component of a successful research endeavour, though; your entire digital world is at stake here. The way you manage your email, code, graphics, presentations, etc. has an impact on everything else: sharing data with collaborators, publishing in journals, and accessing your files from anywhere on the planet. The right choices can make all of this truly enjoyable and lasting.

Part of the problem lies in the fact that there are no unified standards for doing things, especially when the tasks become "high level", such as a graphical presentation. Let me give you an example. If you write a piece of code in standard C, you can rest assured that it can be compiled and run on virtually any type of computer. I call this type of approach "low level". In contrast, say that you are preparing a high-quality presentation in PowerPoint or Keynote, full of graphics and animations. It is safe to say that it will only work properly in the software you used to create it. Of course, Keynote can read .pptx files, but you'll spend so much time fixing the presentation that you may as well redo it from scratch. This type of task is high level because of the advanced nature of the software and the proprietary nature of some of its features.

For these reasons, and many others, I have decided to align my choices with three premises: (1) a low-level approach, (2) open source, and (3) platform independence. Here are the choices that I made:

Operating System(s)

I use all of the following: Windows, Mac OS X, and Linux. Because I am moving towards open standards, I have had very few problems handling files across these platforms.

The Cloud: Email, Calendar...

  1. Google goodness: I use almost all Google services. In particular, I use Gmail to handle all my mail accounts and Google Calendar for events. I only use the browser to check and send mail; I have not used an email client since 2009.
  2. Live Mesh and Live Sync: I keep a copy of all my research files online AND across all my computers. For that purpose, I've been quite happy with Windows Live Mesh and Windows Live Sync. When I outgrow their storage limits, I may just move to Dropbox or a similar service.

Manuscript Preparation

  1. Word Processing: I ONLY use LaTeX to prepare my documents (even short letters). The last document that I still have in Word is my CV, which I am now converting to LaTeX. However, many scholars (especially in engineering) prefer to use Word, so I keep a copy of OpenOffice on all my computers in case I need it, or I use Google Docs.
    My policy on this is simple: if I am leading a project, I will exclusively use LaTeX to document our findings. If someone else wants me to help with their project, I will use whatever they prepared their report in.
  2. Reference Management: I use Mendeley. It keeps a BibTeX library constantly updated, so that single file is the only bibliography I reference in all my LaTeX documents.
  3. Graphics: I use Inkscape! That was one of my most valued discoveries this year. It is cross-platform and uses SVG (Scalable Vector Graphics), an open standard graphics format. Because vector graphics scale without losing quality, you can expect really high-quality graphics in your PDFs!
  4. Plotting: Here's one problem I have not resolved yet. I currently use OriginLab and find it to be a very good piece of software, given all its programming capabilities. Unfortunately, it is not platform independent. So far, I have not found a decent replacement for OriginLab, and I may have to stick with it for a while. I have looked at plotting with PSTricks, PGF/TikZ, and gnuplot, but was not satisfied with the process.
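
One low-level way to get platform-independent plots, in the spirit of the three premises above, is to generate SVG directly from a script. SVG is plain-text XML, so the same script produces the same figure on Windows, Mac OS X, or Linux, and the output can be opened and polished in Inkscape. The sketch below is only an illustration, assuming Python 3 with nothing but its standard library; the function name `svg_line_plot` and its parameters are hypothetical. It draws two axes, a polyline, and a title, with no tick marks or labels, so it is not a replacement for OriginLab.

```python
"""Minimal SVG line-plot writer (a hypothetical sketch using only the standard library)."""
from xml.sax.saxutils import escape


def svg_line_plot(xs, ys, width=400, height=300, margin=40, title=""):
    """Return an SVG document (as a string) plotting ys against xs as a polyline.

    Hypothetical helper for illustration: no tick marks, axis labels, or legends.
    """
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need at least two (x, y) pairs of equal length")
    x_min, x_max = min(xs), max(xs)
    y_min, y_max = min(ys), max(ys)
    # Fall back to a span of 1 so constant data does not divide by zero.
    x_span = (x_max - x_min) or 1.0
    y_span = (y_max - y_min) or 1.0
    plot_w = width - 2 * margin
    plot_h = height - 2 * margin

    def to_px(x, y):
        # Map data coordinates to pixel coordinates; SVG's y axis points down,
        # so larger y values must end up closer to the top of the image.
        px = margin + (x - x_min) / x_span * plot_w
        py = height - margin - (y - y_min) / y_span * plot_h
        return f"{px:.2f},{py:.2f}"

    points = " ".join(to_px(x, y) for x, y in zip(xs, ys))
    bottom = height - margin
    right = width - margin
    return "\n".join([
        f'<svg xmlns="http://www.w3.org/2000/svg" width="{width}" height="{height}">',
        # x axis along the bottom edge and y axis along the left edge of the plot area
        f'  <line x1="{margin}" y1="{bottom}" x2="{right}" y2="{bottom}" stroke="black"/>',
        f'  <line x1="{margin}" y1="{margin}" x2="{margin}" y2="{bottom}" stroke="black"/>',
        # The data themselves, as one connected vector path
        f'  <polyline points="{points}" fill="none" stroke="blue"/>',
        # escape() turns &, < and > into entities so the title cannot break the XML
        f'  <text x="{width / 2}" y="{margin / 2}" text-anchor="middle">{escape(title)}</text>',
        "</svg>",
    ])


if __name__ == "__main__":
    # Made-up sample data, for illustration only
    xs = [0, 1, 2, 3, 4]
    ys = [0.0, 0.8, 0.9, 0.1, -0.7]
    print(svg_line_plot(xs, ys, title="Example data"))
```

Running the script prints the SVG to standard output; redirecting it to a file (for example `python plot.py > plot.svg`, where `plot.py` is a hypothetical file name) gives a vector figure that Inkscape can open and export to PDF for a LaTeX document. Because the figure is regenerated from the data each time, it stays reproducible when the data change.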

Presentations

  1. So far, I am quite stuck with PowerPoint. I am very impressed by its capabilities and would find it quite hard to move to open source presentation software. Because this type of work is high level, it may be hard to adopt a simple, open-standard approach.

Scientific Software

  1. C/C++/Java, etc.: I just need an editor! I use different editors on different platforms.
  2. Mathematica: This is one great piece of software that I would not get rid of. I have looked at open source alternatives such as Sage, but found that Sage lacks several features related to symbolic analysis. Mathematica's symbolic capabilities are the main reason I use it.
I may have missed a few points, but will add them later. If you have any suggestions or know of any software that would handle plotting and symbolics, please let me know.

Finally, here's a message from good ol' Uncle Sam:


7 comments:

  1. Regarding plotting you might want to consider pgfSweave (a LaTeX package), which lets you integrate R code into your LaTeX source code.

    Although R is gross overkill for my current purposes (basic 2D plots), the combination of R and LaTeX ensures reproducibility in a way that Origin couldn't (I used Origin for my master's degree project).

    Plus pgfSweave gives results that look aesthetically pleasing (the plots are effectively rendered using LaTeX).

    R is open-source. If you already master LaTeX, learning some basic R should not prove too hard (extensive resources exist online).

  2. Thanks a lot for that. I will look at it, but I don't know how much it will differ from using PSTricks (other than PSTricks being limited to EPS output).

  3. I would add that packages such as lattice and ggplot are responsible for a lot of the better-looking data visualization work I've seen recently.

    You might have a look at GNU Octave for open source math: there are lots of add-ons to be had. Also, check out the R project, and look through Crantastic for the modules that suit your purposes.

    Good luck with your quest!

  4. GiNaC is an interesting C++ library for symbolic computation, but it's more for doing symbolic math manipulation on the fly, and it is definitely not a Mathematica clone.

  5. (although it's actually what Octave uses for its symbolic math package)

  6. Kent, thank you for sharing that. I'd prefer to have a plotting package with a user interface. If my option is to go with the command line/coding, I'd most likely stick with PGF/TikZ, pgfplots, and PSTricks. All of those have a learning curve, and I don't know how steep it is. I'm not sure they are worth all the time and effort.

    I've looked at Octave before, but I had the impression that it is more dedicated to numerical analysis. I liked Sage better because it runs in a browser!

    Charles, I'm glad you mentioned GiNaC as I was looking for something similar for programming purposes. Presently, I am only interested in using a front end for symbolic manipulation.

  7. I would suggest looking for free Beamer templates, especially since you know LaTeX. It isn't as hard as you may think, and it does look professional. Simply use \usepackage{tikz} if you like to suffer a bit, but at least tables and charts look slick :) Microsoft products are just for the masses who still have too much money in their pockets and very little learning potential...
