
 PTMPredictor
: Wednesday, 13 February 2008
: Active
: Tuesday, 29 April 2008
: Dr. Türker İnce
: Dr. Jens Allmer
: Engin Kırmacı, 2nd-year software engineering student
: İzmir University of Economics
: Post-translational modifications (PTMs) are changes made to proteins after they have been produced through transcription and translation. Virtually every functional protein in a cell carries several PTMs, which can significantly alter its function.
Mass spectrometry can identify proteins in complex mixtures but may fail to identify a peptide sequence that carries a PTM. To determine protein function, it is vital not only to find all sequences present in a given biological experiment but also to determine the type and location of the PTMs involved.
Currently, this process is in its infancy and at best usable on extremely small datasets. We propose a new approach for finding PTMs and mapping them to amino acid sequences represented as mass spectra.
The proposed method starts by determining all candidate sequences that could explain the measured spectra. This is done by first mapping the mass spectra to protein sequences that are not post-translationally modified. The pool of protein sequences found in the sample provides the basis for the investigation of PTMs. We assume that many of the spectra not identified in this first round derive from sequences that contain PTMs. The algorithm then takes sub-sequences from the generated pool and maps them to their respective positions in the recorded mass spectra. From these positions we can then identify shifts in the mapping that may be explained by PTMs. The size of each gap can then be compared with the mass shifts of all known PTMs, and a potential PTM can be reported together with its position in the sequence.
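The last step, comparing the size of a gap with known PTMs, can be sketched as a tolerance lookup. This is a minimal Python sketch, not the project's implementation: the function name `match_gap`, the 0.02 Da tolerance and the five-entry table of monoisotopic mass shifts are illustrative assumptions.

```python
# Monoisotopic mass shifts (Da) of a few common PTMs; an illustrative subset only.
KNOWN_PTMS = {
    "Phosphorylation": 79.966331,
    "Acetylation": 42.010565,
    "Methylation": 14.015650,
    "Oxidation": 15.994915,
    "Deamidation": 0.984016,
}


def match_gap(gap_mass: float, tolerance: float = 0.02) -> list[tuple[str, float]]:
    """Return (PTM name, mass error) for every known PTM within `tolerance` Da of the gap.

    The closest match comes first; an empty list means the gap is unexplained.
    """
    hits = [
        (name, gap_mass - shift)
        for name, shift in KNOWN_PTMS.items()
        if abs(gap_mass - shift) <= tolerance
    ]
    return sorted(hits, key=lambda hit: abs(hit[1]))
```

For example, a gap of 79.97 Da is reported only as a phosphorylation candidate, while a gap of 50 Da returns an empty list. A real search would use a full PTM database and a tolerance matched to the instrument.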
Mapping sub-sequences onto spectra in this way requires a large number of cross-correlations between many measured mass spectra and theoretical mass spectra derived from sequences. Cross-correlations are time-consuming, however, and here the complete spectrum has to be convolved each time with subsets of the theoretical spectrum to achieve the mapping.
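Cross-correlating a binned experimental spectrum with a theoretical one can be done for all shifts at once with an FFT, which is the usual way to make this step cheaper. A minimal NumPy sketch, assuming both spectra are already binned into arrays of equal length; the names `cross_correlate_fft` and `best_shift` are hypothetical. Cross-correlation is equivalent to convolution with one of the two spectra reversed.

```python
import numpy as np


def cross_correlate_fft(experimental, theoretical):
    """Linear cross-correlation of two equal-length binned spectra via FFT.

    Returns c[k] = sum_j experimental[j + k] * theoretical[j] for lags
    k = -(n-1) ... (n-1), in the same order as np.correlate(..., mode="full").
    """
    experimental = np.asarray(experimental, dtype=float)
    theoretical = np.asarray(theoretical, dtype=float)
    n = len(experimental)
    size = 2 * n  # zero-pad so the circular FFT result has no wrap-around
    product = np.fft.rfft(experimental, size) * np.conj(np.fft.rfft(theoretical, size))
    circular = np.fft.irfft(product, size)
    # Indices n+1 .. 2n-1 hold lags -(n-1) .. -1; indices 0 .. n-1 hold lags 0 .. n-1.
    return np.concatenate((circular[n + 1:], circular[:n]))


def best_shift(experimental, theoretical):
    """Lag (in bins) at which the theoretical spectrum best overlaps the experimental one."""
    n = len(experimental)
    lags = np.arange(-(n - 1), n)
    return int(lags[np.argmax(cross_correlate_fft(experimental, theoretical))])
```

If the experimental peaks sit three bins above the theoretical ones, `best_shift` returns 3; in the PTM setting, such a shift in part of the mapping is the gap that is then compared with known modification masses.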
Other time-critical parts of the system are the general mapping (again by cross-correlation/FFT) used to determine the sequence pool, and the construction of possible sequences with their positioned PTMs from the mapping obtained once all theoretical spectra have been matched against the measured spectra.
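In its simplest form, constructing candidate sequences with positioned PTMs could mean placing a matched modification on every residue that can carry it, optionally restricted to the region where the mapping located the gap. The site table and the `place_ptm` helper below are hypothetical and cover only three common PTMs.

```python
from typing import Optional

# Hypothetical site-specificity table: PTM name -> (residues that can carry it, mass shift in Da).
PTM_SITES = {
    "Phosphorylation": ("STY", 79.966331),
    "Oxidation": ("M", 15.994915),
    "Acetylation": ("K", 42.010565),
}


def place_ptm(peptide: str, ptm: str, start: int = 0,
              end: Optional[int] = None) -> list[tuple[int, str]]:
    """List (position, annotated sequence) for each residue in peptide[start:end] that can carry `ptm`.

    `start`/`end` (0-based, end exclusive) restrict placement to the region where
    the spectrum mapping located the mass gap.
    """
    residues, shift = PTM_SITES[ptm]
    stop = len(peptide) if end is None else min(end, len(peptide))
    candidates = []
    for pos in range(start, stop):
        if peptide[pos] in residues:
            # Annotate the modified residue with its mass shift, e.g. S[+79.97].
            annotated = f"{peptide[:pos + 1]}[+{shift:.2f}]{peptide[pos + 1:]}"
            candidates.append((pos, annotated))
    return candidates
```

For example, `place_ptm("PEPSTIDE", "Phosphorylation")` yields two candidates (on S and on T); if the mapping puts the gap after the fourth residue, `start=4` leaves only the threonine.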
All these calculations would make it infeasible to investigate a sufficient number of spectra on the computers available to us. We therefore aim to implement the complete algorithm outlined above on an FPGA and later as a chip.
This would enable thorough investigation of protein function, which is at the heart of all biological and medical research.
: HW-SPAR3E-SK
ISE Foundation
Embedded Development Kit
: Xilinx HW-SPAR3E-SK,
ISE Foundation software,
Embedded Development Kit (EDK) software,
System Generator for DSP software,
ChipScope Pro software
: jens.allmer@ieu.edu.tr
: [ Hidden ]
: Jens Allmer Izmir University of Economics, Faculty of Computer Science, Sakarya Caddesi 156, Izmir, Turkey
: 232 488 8533

Stage 1 Progress Report

Progress Report
The PTMPredictor project consists of several modules, as outlined in the abstract. From the perspective of the VHDL code, the first stage has been reached.
In this stage we wanted to implement a module that, given an experimental mass spectrum and a short amino acid sequence (peptide), calculates the likelihood that this particular sequence gave rise to the mass spectrum.
The module accepts the mass spectrum as one input signal (opcode), terminated by 0, and the peptide to be compared as another input signal (number), also terminated by 0. Additional information that will be needed in the following stages of the project is provided as signals as well (chargein, precursormass).
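On the PC side or in a testbench, the zero-terminated input signals could be produced and consumed as sketched below. The residue code table (A=1 ... Y=20) and the helper names are assumptions for illustration; the report states only that 0 terminates each signal, which implies that 0 cannot occur as a data word.

```python
# Hypothetical residue code table: A=1, C=2, ..., Y=20 (0 is reserved as the terminator).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def encode_peptide(peptide: str) -> list[int]:
    """Encode a peptide as nonzero residue codes followed by the 0 terminator."""
    return [AMINO_ACIDS.index(aa) + 1 for aa in peptide] + [0]


def read_until_zero(words) -> list[int]:
    """Collect words up to (not including) the first 0, as the receiving module would."""
    record = []
    for word in words:
        if word == 0:
            return record
        record.append(word)
    raise ValueError("stream ended without a terminating 0")
```

Passing the same iterator repeatedly reads consecutive signals from one stream; a spectrum sent as nonzero peak bin indices, such as `[98, 116, 227, 245, 0]`, is read the same way.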
The experimental mass spectrum can easily be converted into an integer array. Calculating a theoretical mass spectrum from the peptide sequence is less trivial, however. For this first, crude assessment we chose to use b- and y-ions; this will later be extended to include c- and z-ions and some neutral losses. The theoretical mass spectrum is converted into an integer array of the same dimensions as the experimental mass spectrum.
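A software reference model of the theoretical spectrum is useful for checking the hardware output. This sketch assumes singly charged b- and y-ions without neutral losses, monoisotopic residue masses and 1 Da bins; `by_ions` and `bin_spectrum` are hypothetical names, and the bin width used on the FPGA is not stated in the report.

```python
# Monoisotopic residue masses in daltons for the 20 standard amino acids.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276, "V": 99.06841,
    "T": 101.04768, "C": 103.00919, "L": 113.08406, "I": 113.08406, "N": 114.04293,
    "D": 115.02694, "Q": 128.05858, "K": 128.09496, "E": 129.04259, "M": 131.04049,
    "H": 137.05891, "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
PROTON = 1.007276  # mass of a proton (Da)
WATER = 18.010565  # monoisotopic mass of H2O (Da)


def by_ions(peptide: str) -> tuple[list[float], list[float]]:
    """Singly charged b- and y-ion m/z values, one pair per backbone cleavage.

    b[i] and y[i] are the N- and C-terminal fragments of the same cleavage
    (after residue i + 1), so b[i] + y[i] equals the peptide mass plus two protons.
    """
    masses = [RESIDUE_MASS[aa] for aa in peptide]
    b = [sum(masses[:i]) + PROTON for i in range(1, len(masses))]
    y = [sum(masses[i:]) + WATER + PROTON for i in range(1, len(masses))]
    return b, y


def bin_spectrum(mz_values: list[float], n_bins: int, bin_width: float = 1.0) -> list[int]:
    """Integer array with 1 in every bin that contains at least one ion, else 0."""
    spectrum = [0] * n_bins
    for mz in mz_values:
        index = int(mz // bin_width)
        if 0 <= index < n_bins:
            spectrum[index] = 1
    return spectrum
```

For "PEP", this gives b-ions at about 98.06 and 227.10 and y-ions at about 245.11 and 116.07, which `bin_spectrum(b + y, 300)` turns into four set bins.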
To calculate the score between the theoretical and experimental spectra, cosine similarity is used at this stage: the assessment acts as a filter and may therefore be somewhat permissive, but it must be extremely fast. Finally, the score is provided as an output signal (similarity).
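Cosine similarity between the two integer arrays can be modelled in software as follows (a sketch, not the VHDL module). `cosine_similarity` is a hypothetical name, and both arrays are assumed to have the same number of bins.

```python
import math


def cosine_similarity(experimental: list[int], theoretical: list[int]) -> float:
    """Cosine of the angle between two equal-length spectrum vectors (0.0 if either is all zeros)."""
    if len(experimental) != len(theoretical):
        raise ValueError("spectra must have the same number of bins")
    dot = sum(e * t for e, t in zip(experimental, theoretical))
    norm_e = math.sqrt(sum(e * e for e in experimental))
    norm_t = math.sqrt(sum(t * t for t in theoretical))
    if norm_e == 0 or norm_t == 0:
        return 0.0
    return dot / (norm_e * norm_t)
```

Because intensities are non-negative, a hardware filter could in principle avoid the square root and division by testing dot² ≥ threshold² · ‖e‖² · ‖t‖² instead; this is a suggestion, not something the report describes.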
In this stage of the project, the mass spectra are compared with a large set of peptide sequences, and those that pass the filter are retained. Each peptide comes with a network of connected peptides, which are then selected as well.
These steps are performed on a PC because we are unsure of the space requirements. If space allows, this information will also be stored on the FPGA in an attempt to increase performance. The steps of stage one have thus been completed; the steps of stage two will be implemented next. The following list outlines the project:

First stage
1. Record tandem mass spectra
2. Send mass spectra and possible peptide sequences to FPGA
a. Assess similarity
b. Return score
3. Discard low scoring sequences
4. Extend high scoring sequences with their neighborhood sequences
Second stage
5. Send MS/MS spectra with low scores and newly identified sequences (from the network) to FPGA
a. Assess similarity iteratively
b. Predict post translational modifications
c. Report similarity, modification, and its location
6. Collect and visualize the results on a PC
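Steps 3 and 4 of the first stage can be sketched as a filter-and-extend pass over the scores returned by the FPGA, which also yields the low-scoring spectra passed on in step 5. The threshold, the dictionary layout and the neighbourhood map `network` are assumptions for illustration.

```python
def stage_one(scores: dict[str, dict[str, float]],
              network: dict[str, set[str]],
              threshold: float) -> tuple[set[str], list[str]]:
    """Filter peptides by score and extend the survivors with their network neighbours.

    scores:  spectrum id -> {peptide: similarity} (as returned by the FPGA, step 2)
    network: peptide -> set of connected peptides (hypothetical neighbourhood map)
    Returns the extended peptide pool (steps 3 and 4) and the ids of spectra that no
    peptide explained, which are the input to the second stage (step 5).
    """
    pool: set[str] = set()
    low_scoring: list[str] = []
    for spectrum_id, peptide_scores in scores.items():
        hits = {p for p, s in peptide_scores.items() if s >= threshold}
        if not hits:
            low_scoring.append(spectrum_id)
        for peptide in hits:
            pool.add(peptide)
            pool |= network.get(peptide, set())
    return pool, low_scoring
```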

Current Problems
A number of problems have been identified.
1. Unfortunately, we were not able to install the software that came with the package. Either the license keys were wrong, we were unable to match a key to the appropriate product, or the installation was faulty. We therefore fell back on the free versions of the necessary software.
2. Writing the VHDL code and simulating it was successful. The next step, however, fails at synthesis, which we did not foresee and which caused long delays in development. These issues have not been resolved yet.

VHDL Code

 



This project is being carried out under the leadership of TÜBİDER Okul Bilişim. / Web design: Hüseyin YİĞİT