Sun Grid Engine Command-Dump

Here in the institute we have a Sun Grid Engine available. It is a tool to post computing-jobs on other workspaces (we have I think up to 60 available). There are certain commands and things that I do regularly which I tend to forget after half a year, or which might be useful for orthes.

  • Show all jobs that are running on the grid
    qstat -u \* or alternativly for a single user qstat -u behinger

  • exclude a single computer/host from running a job
    qsub -l mem=5G,h=!computername.domain script.sh
    to exclude multiple hosts: h=!h4&!h5 or h=!(h4|h5) Source
    of course mem=5G is an arbitrary other requirement.

  • Run a gridjob on a single R-File
    add #!/usr/bin/Rscript in the beginning of the file, then you can simply run qsub Rscript_name.R. I had problems using qsub Rscript -e "Rscript_name.R" due to the many quotes that would need escaping (I use to call the grid using system() command in matlab/R).

Matlab winsorized mean over all dimension

This is a function I wrote back in 2014. I think it illustrates an advanced functionality in matlab that I hadn’t found written about before.

The problem:

Calculate the winsorized mean of a multidimensional matrix over an arbitrary dimension.

Winsorized Mean

The benefits of the winsorized mean can be seen here:
Unbenannt-3

We replace the top 10% and bottom 10% by the remaining most extreme value before calculating the mean (left panel). The right panel shows how the mean is influenced by a single outlier, but the winsorized mean is not (ignore the “yuen”-box”)

Current Implementation

I adapted an implementation from the LIMO toolbox based on Original Code from Prof. patrick J Bennett, McMaster University. In this code the dimension is fixed at dim = 3, the third dimension.

They solve it in three steps:

  1. sort the matrix along dimension 3
  2. xsort=sort(x,3);

  3. replace the upper and lower 10% by the remaining extreme value
  4. % number of items to winsorize and trim
    g=floor((percent/100)*n);
    wx(:,:,1:g+1)=repmat(xsort(:,:,g+1),[1 1 g+1]);
    wx(:,:,n-g:end)=repmat(xsort(:,:,n-g),[1 1 g+1]);
    

  5. calculate the mean over the sorted matrix
  6. wvarx=var(wx,0,3);

Generalisation

To generalize this to any dimension I have seen two previous solution that feels unsatisfied:
– Implement it for up to X dimension hardcoded and then use a switch-case to get the solution for the case.
– use permute to reorder the array and then go for the first dimension (which can be slow depending on the array)

Let’s solve it for X = 20 x 10 x 5 x 2 over the third dimension


function [x] = winMean(x,dim,percent)
% x = matrix of arbitrary dimension
% dim = dimension to calculate the winsorized mean over
% percent = default 20, how strong to winsorize

% How long is the matrix in our required dimension
n=size(x,dim);
% number of items to winsorize and trim
g=floor((percent/100)*n);
x=sort(x,dim);


up to here it my and the original version are very similar. The hardest part is to generalize the part, where the entries are overwritten without doing it in a loop.
We are now using the subsasgn command and subsref
We need to generate a structure that mimics the syntax of
x(:,:,1:g+1,:) = y
for arbitrary dimensions and we need to construct y

% Prepare Structs
Srep.type = '()';
S.type = '()';

% replace the left hand side
nDim = length(size(x));

beforeColons = num2cell(repmat(':',dim-1,1));
afterColons  = num2cell(repmat(':',nDim-dim,1));
Srep.subs = {beforeColons{:} [g+1]    afterColons{:}};
S.subs    = {beforeColons{:} [1:g+1]  afterColons{:}};
x = subsasgn(x,S,repmat(subsref(x,Srep),[ones(1,dim-1) g+1 ones(1,nDim-dim)])); % general case

The output of Srep is:

Srep =
type: ‘()’
subs: {‘:’ ‘:’ [2] ‘:’ }

thus subsref(x,Srep) outputs what x(:,:,2,:) would output. And then we need to repmat it, to fit the number of elements we replace by the winsorizing method.

This is put into subsasgn, where the S here is :

Srep =
type: ‘()’
subs: {‘:’ ‘:’ [1 2] ‘:’ }

Thus equivalent to x(:,:,[1 2],:).
The evaluated structure then is:
x(:,:,[1:2]) = repmat(x[:,:,1],[1 1 2 1])

The upper percentile is replaced analogous:

% replace the right hand side
Srep.subs = {beforeColons{:} [n-g]            afterColons{:}};
S.subs    = {beforeColons{:} [n-g:size(x,dim)]  afterColons{:}};

x = subsasgn(x,S,repmat(subsref(x,Srep),[ones(1,dim-1) g+1 ones(1,nDim-dim)])); % general case

And in the end we can take the mean, var, nanmean or whatever we need:

x = squeeze(nanmean(x,dim));

That finishes the implementation.

Timing

But how about speed? I thus generated a random matrix of 200 x 10000 x 5 and measured the timing (n=100 runs) of the original limo implementation and mine:

algorithm timing (95% bootstraped CI of mean)
limo_winmean 185 – 188 ms
my_winmean 202 – 203ms
limo_winmean otherDimension than 3    218 – 228 ms

For the last comparison I permuted the array prior to calculating the winsorized mean, thus the overhead. In my experience, the overhead is greater the larger the arrays are (I’m talking about 5-10GB matrices here).

Conclusion

My generalization seems to work fine. As expected it is slower than the hardcoded version. But it is faster than permuting the whole array.

3 of 3
123