GFS Version 2

We have now officially completed GFS Version 2. What is GFS Version 2? Based upon the original Genome-based Fingerprint Scanning program, GFS V2 is designed as a general-purpose, open proteomics search engine. Its specialty is matching mass spectrometry data (MS, MS/MS, or both in V2!) to unannotated genome sequences. In other words, if you want to perform a proteomic search against an unannotated genome, or you don't fully trust the computational gene predictions in the database, GFS might be for you. You can try it at our redesigned website http://gfs.unc.edu, or just drop us a line and we'll send you a link for downloading the full set of source and executables from ProteomeCommons (under a liberal Apache-style license). We will publicize the link as soon as our manuscript detailing version 2.0 is out the door.

New Features in Version 2.0
• Runs on multiple platforms! Specifically, we provide pre-compiled binaries for Windows, Linux, and Mac OS X.
• MS/MS data - GFS now has two modes that take advantage of tandem MS data: a "shotgun" mode and a Peptide Mass Fingerprint (PMF) mode.
• Distributed/Cluster usage - GFS has a built-in system for distributing searches over a compute cluster.
• Post-Translational Modifications (PTMs) - GFS V2 now supports PTMs (only in PMF mode at present).
• Many speed and memory improvements, plus bug fixes.
• Gene-scanning mode - use annotations if you have them. This may be particularly useful for comparing matches against annotated genes with matches against the raw genome.
• Batch mode - run many files at once.

For more details on what is in version 2.0, feel free to have a look at our draft manual (which is a work in progress).

Coming soon, Version 2.1!
We are already hard at work on version 2.1, which is planned for a November 2007 release. Version 2.1 will include:
• A new, very high accuracy MS/MS matching mode that does a much better job of matching shotgun results against large genomes.
• A newer cluster/distributed computing implementation, allowing a search to be divided into hundreds or even thousands of jobs to take full advantage of available compute resources.
• Bug fixes for the B+ Tree mode.
• 64-bit operation.
If you are interested in beta testing GFS with the above features, please drop us a line.
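
For readers curious what dividing a search into many jobs can look like, here is a minimal sketch in Python. It is not the GFS implementation: the function name `split_into_jobs`, the `overlap` parameter, and the idea of splitting by genome coordinate are all assumptions made for illustration. The overlap is there so that any stretch of sequence no longer than `overlap` that straddles a chunk boundary still falls entirely inside one job.

```python
def split_into_jobs(genome_length, n_jobs, overlap=0):
    """Split [0, genome_length) into n_jobs half-open (start, end) ranges.

    Hypothetical helper for illustration only. Each range is extended at its
    end by `overlap` positions (clipped to the genome length) so that short
    features crossing a boundary are still contained in one range.
    """
    if genome_length <= 0 or n_jobs <= 0:
        raise ValueError("genome_length and n_jobs must be positive")
    if overlap < 0:
        raise ValueError("overlap must not be negative")

    # Never create more jobs than there are positions to search.
    n_jobs = min(n_jobs, genome_length)

    # Spread the remainder over the first `extra` jobs so sizes differ by at most 1.
    base, extra = divmod(genome_length, n_jobs)

    ranges = []
    start = 0
    for i in range(n_jobs):
        size = base + (1 if i < extra else 0)
        end = start + size
        ranges.append((start, min(end + overlap, genome_length)))
        start = end
    return ranges


if __name__ == "__main__":
    # Example: a 1 Mb sequence divided into 1000 jobs with a 300-position overlap.
    jobs = split_into_jobs(1_000_000, 1000, overlap=300)
    print(len(jobs), jobs[0], jobs[-1])
```

Each range could then be handed to a queue manager as its own job; how GFS actually partitions and submits work may differ.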
|

GFS for Windows

We now have GFS running on Windows XP. It will be available for download from our website as a Windows executable file shortly. It seems to run a bit more slowly on Windows than under Linux or OS X on the same processor. We haven't yet determined the cause, though we suspect that it has to do with memory allocation speed when using the GNUstep kit under Windows. However, we've done a lot of overall speed optimization lately, so hopefully it won't be too slow. We welcome user feedback and input, especially if you have insights on this.

In other news, we are considering moving the GFS project over to SourceForge. We welcome your feedback about this (remove "remove.these.words." from the email address).


|

We're on the cover of BMC Evolutionary Biology

The front page of the journal BMC Evolutionary Biology features our paper on the evolution of the Dscam gene family in its Research Highlights section.

This is gratifying, since this paper has been a long time coming. Among other things, the lead author, Mack Crayton, was displaced from his new home in New Orleans while we were in the revision process.

This paper has now been rated as "highly accessed" by BMC Evolutionary Biology.
|

GFS updates

For those wondering what's going on with GFS these days, here's a rundown of some highlights (more detail coming in a paper soon):

- We have GFS running on Debian Linux now, and it seems to run just fine. It relies on the GNUstep object kit, which is essentially a replica of Cocoa in Mac OS X. We rely on this object kit because it provides extensive functionality similar to what is now present in Java, e.g. lists, arrays, strings, sets, and all sorts of useful stuff. We are now working on testing and packaging GFS for Linux. Next we will work on the Windows port, which we do not expect to be difficult, since GNUstep is also available for Windows and the port to GNUstep is done.

- We have added MS/MS interpretation via sequence tags. At present it works only with combined MS and MS/MS data (i.e. peptide mass fingerprints). However, we are extending the program to operate with MS/MS spectra alone, e.g. for MudPIT-type data.

- GFS is now designed for automatic distribution of large jobs on a compute cluster. At present the functionality is designed for Sun Grid Engine, but it can be easily extended to other queue management software.

- We have now automated the calculation of the Expectation value for each match, using the approach originally developed by Fenyö.

- Many optimizations have been made to speed searches and improve the clarity of the results. And, of course, there's the occasional bug fix as well.
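
To give a flavor of what a sequence tag is, here is a small Python sketch. It illustrates the general idea only and is not the code GFS uses: the function name `read_sequence_tag`, the default tolerance, and the assumption that the input peaks are singly charged fragments from one ion series are all ours. The gaps between neighboring fragment peaks are compared with standard monoisotopic amino acid residue masses, and each gap that matches contributes one letter to the tag. Leucine and isoleucine have identical masses, so both are reported as "L".

```python
# Standard monoisotopic residue masses in daltons (I is omitted because it has
# the same mass as L and cannot be told apart by mass alone).
RESIDUE_MASSES = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "N": 114.04293, "D": 115.02694, "Q": 128.05858, "K": 128.09496,
    "E": 129.04259, "M": 131.04049, "H": 137.05891, "F": 147.06841,
    "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}


def read_sequence_tag(peaks, tolerance=0.01):
    """Turn a ladder of fragment masses into a short sequence tag.

    Hypothetical helper for illustration. `peaks` is a list of m/z values
    assumed to be consecutive, singly charged ions of one series. Gaps that
    match no residue within `tolerance` are written as "?".
    """
    ordered = sorted(peaks)
    tag = []
    for lighter, heavier in zip(ordered, ordered[1:]):
        gap = heavier - lighter
        letter = "?"
        for residue, mass in RESIDUE_MASSES.items():
            if abs(gap - mass) <= tolerance:
                letter = residue
                break
        tag.append(letter)
    return "".join(tag)


if __name__ == "__main__":
    # A made-up ladder whose gaps correspond to G, A, and S.
    ladder = [100.0, 157.02146, 228.05857, 315.09060]
    print(read_sequence_tag(ladder))
```

A real spectrum also contains noise peaks, missing fragments, and more than one ion series, so a practical tag finder has to search over many candidate ladders rather than read one off directly.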
|

About the News/Blog page

Our Wiki has been great, but it is already suffering from major bloat, so we've decided to create a separate, more focused set of pages for outside visitors. Here we'll talk about the latest happenings, and maybe get a little bit philosophical from time to time. You are still welcome to visit the Wiki if you want to find out more about us.

Morgan
|