Profiling SPM
Speeding up an application starts with profiling it, to find out which parts consume most of the run time and would therefore benefit most from parallelization.
Since SPM is MATLAB-based software, the obvious tool for profiling it was the ‘Matlab Profiler’. The profiler provides a wealth of useful information about a MATLAB application. It works more like an API: a handful of functions that can be ‘plugged in’ to the MATLAB code. It records execution time, number of calls, parent functions, child functions, code line hit count, and code line execution time. It can also save a comprehensive profiling report as an HTML file for future reference. For those who want to dig further, this link from the MathWorks website explains the profiler in detail.
So I profiled the SPM fMRI workflow with the ‘Matlab Profiler’. Because of the way the profiler works, the entire workflow has to be available as a batch file, so that profiling code can be ‘plugged in’ between the calls to the individual workflow stages. The Batch interface in SPM makes it easy to create such a batch file. The dataset I used for profiling SPM fMRI was the famous/non-famous face repetition example at this link.
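As a rough illustration of what ‘plugging in’ the profiler between stages can look like, here is a minimal sketch. It is not the exact script I used: the batch file names and the output folder are hypothetical, and it assumes SPM is on the MATLAB path and that each stage was saved from the Batch interface as its own .mat file.

```matlab
% Minimal sketch: profile each SPM workflow stage separately.
% Assumptions: SPM is on the MATLAB path; the batch file names and the
% output folder below are hypothetical placeholders.
spm('defaults', 'fmri');
spm_jobman('initcfg');

stages = {'realign_job.mat', 'segment_job.mat'};   % hypothetical batch files

for k = 1:numel(stages)
    job = load(stages{k});               % saved batches hold a 'matlabbatch' variable

    profile clear                        % discard statistics from the previous stage
    profile on                           % start recording
    spm_jobman('run', job.matlabbatch);  % run this stage of the workflow
    profile off                          % stop recording

    % Save an HTML report for this stage into its own folder.
    [~, name] = fileparts(stages{k});
    profsave(profile('info'), fullfile('profile_results', name));
end
```

Saving one report per stage keeps the per-stage timings separate, which makes it easier to compare stages against each other afterwards.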
The results obtained from profiling are as follows:

From the above pie chart it can be seen that Segmentation, Model Estimation and Re-alignment are the most expensive operations. Since Model Estimation is very case-specific, we set it aside for now and focus on Segmentation and Re-alignment, which follow a fixed mathematical algorithm and are good candidates for parallelization.
In the course of profiling I also observed that B-spline interpolation accounted for about 18% of the run time of the entire workflow, with contributions from both the Segmentation and Re-alignment stages. B-spline interpolation is implemented within SPM as a MEX-file written in C, so it can be re-written in CUDA and incorporated back into SPM using the ‘nvmex’ utility provided by NVIDIA. So I decided to start speeding up SPM by implementing B-spline interpolation in CUDA.
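To give a feel for why B-spline interpolation is a natural fit for a GPU, here is a small 1-D sketch in MATLAB. It is not SPM's implementation: the coefficients and sample positions are made up, the boundary handling (clamping) is an assumption, and the prefiltering step that turns image data into B-spline coefficients is left out. The point is only that every output sample is computed from a few neighbouring coefficients, independently of all the other samples.

```matlab
% Cubic B-spline weights for a fractional offset t in [0, 1).
% Standard cubic B-spline basis; the four weights always sum to 1.
bspline3_weights = @(t) [ (1 - t).^3, ...
                          3*t.^3 - 6*t.^2 + 4, ...
                          -3*t.^3 + 3*t.^2 + 3*t + 1, ...
                          t.^3 ] / 6;

c = [0 1 4 9 16 25 36];   % hypothetical B-spline coefficients (illustration only)
x = [2.0 2.5 3.25];       % hypothetical sample positions, in 1-based index units
y = zeros(size(x));

for k = 1:numel(x)        % each iteration is independent of the others
    i0  = floor(x(k));                          % index of the coefficient at or left of x(k)
    t   = x(k) - i0;                            % fractional offset in [0, 1)
    idx = min(max(i0 + (-1:2), 1), numel(c));   % four neighbours, clamped at the edges
    y(k) = bspline3_weights(t) * c(idx).';      % weighted sum of four coefficients
end
```

Because the loop iterations do not depend on each other, each one could in principle be handled by its own CUDA thread, which is what makes this stage attractive to port.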
More on this coming soon…
