Data access library: MSFTBX - Mass-Spec File Toolbox
The data access library is at the core of BatMass, but it is a separate standalone project. It can be used in regular Java programs (as a jar library) and in NetBeans Platform applications via the included NetBeans Module wrapper.
Features
- Single API to mzML and mzXML files
- mzML and mzXML parsing
  - Very fast multi-threaded parser
  - Can separately parse LC/MS run information, the index and data
  - Separation of parsing of scan meta-information and spectral data
- Automatic indexing of the data
  - maps from scan numbers to scans
  - maps from retention time to scans
  - the same maps separately at each MS level
  - automatic DIA (data-independent acquisition) detection and automated grouping of DIA MS2 scans according to the corresponding isolation windows
- Memory management
  - can parse the whole structure of the run (all scans with all meta-info) and load spectral data from disk only when it is accessed
  - an object can be used as the ‘owner’ of loaded data; if the ‘owner’ is garbage collected and no other ‘owner’ has claimed the scans, the corresponding resources can be released automatically
- Tolerance to a broken index
  - automatically detects errors in the index, such as all scan offsets being identical (which happens with some versions of ProteoWizard's msconvert when converting large files)
  - if the index is not present, the file is reindexed
- Tolerance to MS2 scan tags being enclosed in the corresponding MS1 scan tag (seen in old data converted with ReAdW)
- PepXML parsing/writing
- ProtXML parsing/writing
- MzIdentML parsing/writing
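
The indexing feature listed above (maps from scan numbers and retention times to scans, kept overall and per MS level) can be pictured with plain Java collections. The sketch below is only an illustration of that idea: `ScanMeta`, `ScanIndexSketch` and `firstAtOrAfter` are hypothetical names, not MSFTBX classes or methods, and it assumes retention times are unique within a run.

```java
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;

// Hypothetical scan record (not an MSFTBX class): meta-information only, no spectrum.
final class ScanMeta {
    final int num;       // scan number
    final double rtMin;  // retention time in minutes
    final int msLevel;   // 1 for MS1, 2 for MS2, ...

    ScanMeta(int num, double rtMin, int msLevel) {
        this.num = num;
        this.rtMin = rtMin;
        this.msLevel = msLevel;
    }
}

// Hypothetical index holding the four kinds of maps described in the feature list.
final class ScanIndexSketch {
    final NavigableMap<Integer, ScanMeta> byNum = new TreeMap<>();
    final NavigableMap<Double, ScanMeta> byRt = new TreeMap<>();
    final Map<Integer, NavigableMap<Integer, ScanMeta>> byNumAtLevel = new TreeMap<>();
    final Map<Integer, NavigableMap<Double, ScanMeta>> byRtAtLevel = new TreeMap<>();

    // Registers a scan in the global maps and in the maps of its MS level.
    // Assumption: retention times are unique, so a later scan never overwrites an earlier one.
    void add(ScanMeta s) {
        byNum.put(s.num, s);
        byRt.put(s.rtMin, s);
        byNumAtLevel.computeIfAbsent(s.msLevel, k -> new TreeMap<>()).put(s.num, s);
        byRtAtLevel.computeIfAbsent(s.msLevel, k -> new TreeMap<>()).put(s.rtMin, s);
    }

    // First scan at the given MS level with retention time >= rtMin, or null if there is none.
    ScanMeta firstAtOrAfter(double rtMin, int msLevel) {
        NavigableMap<Double, ScanMeta> level = byRtAtLevel.get(msLevel);
        if (level == null) {
            return null;
        }
        Map.Entry<Double, ScanMeta> e = level.ceilingEntry(rtMin);
        return e == null ? null : e.getValue();
    }
}
```

Sorted maps make "nearest scan by retention time" queries a single `ceilingEntry` or `floorEntry` call, which is one reason a structure like this suits time-ordered scan data.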
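
The broken-index tolerance listed above names two conditions: an index that is missing, and an index in which all scan offsets are the same. A minimal check for those two conditions might look like the following. `IndexSanitySketch` and `needsReindex` are hypothetical names for illustration, and the real library may apply different or additional checks.

```java
import java.util.HashSet;
import java.util.List;

// Hypothetical helper (not part of MSFTBX): decides whether a file should be reindexed.
final class IndexSanitySketch {

    // offsets: byte offsets of scans as read from the file's index; null or empty if no index exists.
    static boolean needsReindex(List<Long> offsets) {
        if (offsets == null || offsets.isEmpty()) {
            return true; // index not present
        }
        if (offsets.size() == 1) {
            return false; // a single scan cannot exhibit the "all offsets identical" symptom
        }
        // Several scans sharing one offset means the index cannot be trusted.
        return new HashSet<>(offsets).size() == 1;
    }
}
```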
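
The ‘owner’ mechanism in the memory management item above can be approximated with weak references: the data holder remembers its owners without keeping them alive, and drops the data once none remain. This is a sketch of the general technique under that assumption, not the library's actual implementation, and `OwnedSpectraSketch` with its methods is a hypothetical name.

```java
import java.lang.ref.WeakReference;
import java.util.ArrayList;
import java.util.List;

// Hypothetical holder of parsed spectral data (not an MSFTBX class).
final class OwnedSpectraSketch {
    // Weak references let owners be garbage collected while still registered here.
    private final List<WeakReference<Object>> owners = new ArrayList<>();
    private double[] intensities; // stands in for parsed spectral data; null when unloaded

    // Loads data on behalf of an owner.
    void load(Object owner, double[] data) {
        claim(owner);
        intensities = data;
    }

    // Registers an additional owner of the already loaded data.
    void claim(Object owner) {
        owners.add(new WeakReference<>(owner));
    }

    // Explicitly gives up ownership, then frees the data if nobody else owns it.
    void release(Object owner) {
        owners.removeIf(ref -> ref.get() == owner);
        releaseIfUnowned();
    }

    // Forgets owners that were garbage collected; drops the data when no owner is left.
    void releaseIfUnowned() {
        owners.removeIf(ref -> ref.get() == null);
        if (owners.isEmpty()) {
            intensities = null;
        }
    }

    boolean isLoaded() {
        return intensities != null;
    }
}
```

In this sketch `releaseIfUnowned()` has to be called by someone, for example before loading more data. A fuller version could register the weak references with a `java.lang.ref.ReferenceQueue` and react when the garbage collector enqueues them.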
Usage
Take a look at the tutorial for a short introduction, and check the sources on GitHub.