/************************************************************************
 *                                                                       *
 *               ROUTINES IN THIS FILE:                                  *
 *                                                                       *
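 *                      param_init(): set parameter defaults             *
 *                      init_param(): compute derived parameters         *
 *                      check_args(): validate parameters                *
 *                      print_vec(): print a float vector                *
 *                      do_rasta(): run the analysis loop                *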
 *                                                                       *
 ************************************************************************/

/********************************************************************
	Scope of Algorithm


The actual processing is split up into a few pieces:

        1) power spectral analysis
        2) auditory spectrum computation
        3) compression (possibly adaptive)
        4) temporal filtering of each channel trajectory
        5) expansion (also possibly adaptive)
        6) postprocessing (e.g., preemphasis, loudness compression)
        7) autoregressive all-pole modeling (cepstral coefficients)

and I/O can be ASCII, binary (shorts on the input and floats
on the output), or standard ESPS files.

Since the lowest frequency and highest frequency bands extend
into forbidden territory (negative or greater-than-Nyquist
frequencies), they are ignored for most of the analysis. This
is done by computing a variable called first_good (which for
1 bark spacing is 1) and generally ignoring the first and last
``first_good'' channels. Just prior to the all-pole modeling,
the values from the good channels are copied over to the questionable
ones. (Note that ``first_good'' is available to most routines
via a pointer to a general parameter structure that holds such
useful things as the number of filters, the analysis step size
in msec, and the sampling rate. See rasta.h for the definition
of this structure.)


This program (Rasta 2.0) implements the May 1994 version of RASTA-PLP
analysis. It incorporates several primary pieces:

1) PLP Analysis - as described in Hermansky's 1990 JASA paper,
the basic form of spectral analysis is Perceptual Linear Prediction,
or PLP. This computes the cepstral parameters of an all-pole model
of an auditory spectrum, which is a power spectrum that
has been frequency warped to the bark scale, smoothed by
an asymmetric critical-band trapezoid function,
down-sampled into approximately 1 Bark intervals,
cube root compressed for an intensity-loudness transformation, 
and weighted by a fixed equal loudness curve. 

2) RASTA filtering - as described in several Hermansky and Morgan
papers, the basic idea here is to bandpass filter the trajectories
of the spectral parameters. In the case of RASTA-PLP, this filtering
is done on a nonlinear transformation of an auditory-like spectrum, 
prior to the autoregressive modeling in PLP.

3) J-processing - For RASTA, the bandpass filtering is done in the 
log domain. An alternative is to use the J-family
of log-like curves

	y = log(1 + Jx)

where J is a constant that appears to be optimally set when
it is inversely proportional to the noise power (currently, typically
1/3 of the inverse noise power), x is the critical band value, and y
is the nonlinearly transformed critical band value.
Rather than do the true inverse, which would be

	x = (exp(y) - 1)/J

and could get negative values, we use

	x' = (exp(y))/J

This prevents negative values, and in doing so effectively adds a noise
floor, since x' exceeds the true inverse by 1/J.

One way of doing J-processing is to pick one constant J value and enter
this value at the command line. This J value should be dependent on the
SNR of the speech. We also may want to estimate the noise level for 
adaptive settings of the J-parameter during the utterance. 
Both methods of picking J should be handled with care. For the first
case, see the README file for a discussion of the perils of using even
a default J if it is too far from what you really need;
if the application situation is relatively fixed, you are better off making
an off-line noise measurement to get a good J value; in any event some
experimentation will soon show the proper constants involved for a problem.
For the second case, the noise level is estimated during the utterance
and J is set adaptively. This should also be done with care: a
time-varying J introduces a new source of variability that must be
accounted for in both training and recognition. The different J values
required by differing noise conditions generate different spectral
shapes and spectral dynamics, so the training system must contend with
variability caused by a processing strategy that is itself determined
adaptively from the data. One approach to handling this variability is
spectral mapping. In the current version, spectral mapping is
performed whenever J-RASTA processing is used with adaptive J values.

Spectral mapping - transform the spectrum processed with a J-value  
                   corresponding to noisy speech to a spectrum processed
                   with a J value for clean speech. In other words, we
                   find a mapping between log(1+xJ) and log(1+xJref) 
                   where Jref is the reference J, i.e. J value for clean speech.
                   For this approach, we have used a multiple regression 
                   within each critical band. In principle, this solution
                   reduces the variability due to the choice of J, and so
                   minimizes the effect on the training process.
       

How this works is:

1) Training of the recognizer:
   -- Train the recognizer with clean speech processed with J = 1.0e-6, a 
      suitable J value for clean speech.

2) Finding the relationship between the spectra corresponding to different
   Jah values and the spectrum corresponding to J = 1.0e-6
   -- For each of the Jahs in the set {3.16e-7, 1.0e-7, 3.16e-8, 1.0e-8, 
      3.16e-9, 1.0e-9}, find a mapping relationship of the corresponding 
      bandpass filtered spectrum to the spectrum corresponding to J =
      1.0e-6. In other words, find a set of mapping coefficients for each
      Jah to 1.0e-6. The mapping method will be discussed later.

3) Extracting the speech features for the testing speech data
   -- obtain the critical band values as in PLP
   -- estimate the noise energy and thus the Jah value. Call this Jah value 
      J(orig).
   -- Pick a Jah from the set {3.16e-7, 1.0e-7, 3.16e-8, 1.0e-8,3.16e-9, 1.0e-9}
      that is closest to J(orig) and call this J(quant).
   -- perform the non-linear transformation of the spectrum using 
      log (1+J(quant)* X).
   -- bandpass filter the transformed critical band values.
   -- use the set of mapping coefficients for J(quant) to do the spectral 
      mapping or spectral transformation.
   -- preemphasize via the equal-loudness curve and then perform amplitude
      compression using the intensity-loudness power law as in PLP
   -- take the inverse of the non-linear function. 
   -- compute the cepstral parameters for the AR model.   
    
    
 
How regression coefficients are computed in our experiment:

    In order to map critical band values(after bandpass filtering) processed
    with different J values to those processed with J = 1.0e-6, J-Rasta 
    filtered critical band outputs from 10 speakers(excluded from the training
    and testing sets) are used to train 
    multiple regression models.  
    For example, for mapping from J= J(quant) to J = 1.0e-6, the regression
    equation can be written as:

    Yi = B2i* X2 + B3i* X3 + ... + B16i * X16 + B17i       (**)

where Yi = i th bandpass filtered critical band processed with J=1.0e-6
           i = 2, .. 16
      X2, X3, ... X16    2nd to 16th bandpass filtered critical band values
                         processed with J = J(quant), where
                         J(quant) is in the set 
                         {3.16e-7, 1.0e-7, 3.16e-8, 1.0e-8,3.16e-9, 1.0e-9} 
      B2i, B3i ... B17i     are the 16  mapping coefficients 

      
    For equation (**), we have assumed that the sampling frequency
    is 8kHz and the number of critical bands is seventeen. The first and
    the last bands extend into forbidden territory -- negative or greater
    than Nyquist frequencies. Thus these two bands are ignored for most
    of the analysis. Their values are made equal to the adjacent band's just
    before the autoregressive all-pole modeling. This is why we only make
    Yi dependent on bandpass filtered critical bands X2,X3,... X16, altogether
    fifteen critical bands.
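
    Equation (**) for a single output band can be sketched as below, under
    the same 8kHz, 17-band assumption. The name map_band and the array
    layout (15 weights followed by the constant term B17i) are hypothetical
    choices made for this illustration, not the layout used by the program.

```c
#include <assert.h>
#include <stddef.h>

#define N_GOOD_BANDS 15   // bands 2..16 in the 8 kHz, 17-band setup

// Hypothetical helper: evaluate equation (**) for one output band Yi.
// coef holds the 16 coefficients B2i..B17i; the last is the constant term.
// x holds the 15 filtered band values X2..X16 obtained with J = J(quant).
double map_band(const double coef[N_GOOD_BANDS + 1],
                const double x[N_GOOD_BANDS])
{
    int k;
    double y;

    assert(coef != NULL && x != NULL);
    y = coef[N_GOOD_BANDS];              // constant term B17i
    for (k = 0; k < N_GOOD_BANDS; k++)
        y += coef[k] * x[k];             // B(k+2)i * X(k+2)
    return y;
}
```

    A full mapping would call this once per good band (i = 2..16), each
    time with that band's own 16 coefficients.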

    The default set of mapping coefficients is stored in map_weights.dat.
    It is suitable for a sampling frequency of 8kHz and 17 critical bands.
    Users with a different setup may want to find their own set of mapping
    coefficients. This can be done using the command options -R and -f. Command
    -R allows you to get bandpass filtered critical band values as output
    instead of cepstral coefficients. 
    These outputs could be used as regression data. A simple multiple
    regression routine can be used to generate the mapping coefficients
    from these regression data. These mapping coefficients can be stored
    in a file. Command -f allows you to use this file to replace the
    default file map_weights.dat. The format of this file is:

     beginning of file 
    <Total number of Jahs, for the example shown above, it is [7] > 
    <# of critical bands, for the setup for 8kHz, this is [15]>
    <# of mapping coefficients/band, for the setup for 8kHz, this is [16]>
    
    <The J for clean speech, [1.0e-6]>
   
    <mapping coefficients for Y2, [B22, B32, B42,...]>
    <mapping coefficients for Y3, [B23, B33, B43 ...]>
        |
        |
        |
        V
    <mapping coefficients for Y16>
 
    
    <The second largest Jah besides 1e-6, [3.16e-7]>
        
        |
        |      mapping coefficients
        |
        |
        V 
  
    <The third largest Jah besides 1e-6, [1.0e-7]>
       
        |
        |      mapping coefficients
        |
        |
        V 
  
     .
     .
     .
     .    
     end of file



*********************************************************************/
#include <stdio.h>
#include <stdlib.h>   /* exit() */
#include <string.h>   /* strcmp() */
#include "others/mymath.h"
#include "rasta.h"
#include "functions.h"
/******************************************************/

void param_init(struct param* pptr)
{
  pptr->winsize  = TYP_WIN_SIZE; 
  pptr->stepsize = TYP_STEP_SIZE; 
  pptr->sampfreq = PHONE_SAMP_FREQ;
  pptr->polepos = POLE_DEFAULT;
  pptr->order = TYP_MODEL_ORDER;
  pptr->lift = TYP_ENHANCE;
  pptr->winco = HAMMING_COEF;
  pptr->rfrac =  ONE;
  pptr->jah = JAH_DEFAULT;
  pptr->gainflag = TRUE;
  pptr->lrasta = FALSE;
  pptr->jrasta = FALSE;
  pptr->cJah = FALSE;
  pptr->mapcoef_fname = "map_weights.dat";
  pptr->crbout = FALSE;
  pptr->comcrbout = FALSE;
  pptr->infname = "-";          /* used for stdin */
  pptr->outfname = "-";         /* used for stdout */
  pptr->num_fname = NULL;       /* file with RASTA polynomial numerator */
  pptr->denom_fname = NULL;     /* file with RASTA polynomial denominator */
  pptr->ascin = FALSE;
  pptr->ascout = FALSE;
  pptr->debug = FALSE;
  pptr->smallmask = FALSE;
  pptr->espsin = FALSE;
  pptr->espsout = FALSE;
  pptr->matin = FALSE;
  pptr->matout = FALSE;
  pptr->swapbytes = FALSE;
  pptr->nfilts = NOT_SET;
  pptr->nout = NOT_SET;
  pptr->online = FALSE;         /* If set, do frame-by-frame analysis
                                   rather than reading in whole file first */
  pptr->HPfilter = FALSE;
  pptr->history = FALSE;
  pptr->hist_fname = "history.out";

}

void init_param(struct fvec *sptr, struct param *pptr) 
{
  int overlap, usable_length;
  float tmp;
  float step_barks;
  char *funcname;

  funcname = "init_param";

  pptr->winpts = (int)((double)pptr->sampfreq * (double)pptr->winsize
                       / 1000.);

  pptr->steppts = (int)((double)pptr->sampfreq * (double)pptr->stepsize
                        / 1000.);

  overlap = pptr->winpts - pptr->steppts;

  if(pptr->online == TRUE)
    {
      pptr->nframes = 1;        /* Always looking at one frame,
                                   in infinite loop. */
    }
  else
    {
      usable_length = sptr->length - overlap;
      pptr->nframes = (double)usable_length / (double)pptr->steppts;
    }

  /* Nyquist frequency in Barks: 6 * asinh(f_nyq / 600), where
     f_nyq = sampfreq / 2, so the asinh argument is sampfreq / 1200 */
  tmp = pptr->sampfreq / 1200.;
  pptr->nyqbar = 6. * log(((double)pptr->sampfreq /1200.) 
                          + sqrt(tmp * tmp + 1.));

  /* compute number of filters for at most 1 Bark spacing;
     This includes the constant 1 since we have a filter at d.c and
     a filter at the Nyquist (used only for dft purposes) */

  if(pptr->nfilts == NOT_SET)
    {
      pptr->nfilts = ceil(pptr->nyqbar) + 1;
    }
  if((pptr->nfilts < MINFILTS) || (pptr->nfilts > MAXFILTS))
    {
      fprintf(stderr,"Nfilts value of %d not OK\n",
              pptr->nfilts);
      exit(-1);
    }

  /* compute filter step in Barks */
  step_barks = pptr->nyqbar / (float)(pptr->nfilts - 1);
  /* for a given step, must ignore the first and last few filters */
  pptr->first_good = (int)(1.0 / step_barks + 0.5);


  if(pptr->nout == NOT_SET)
    {
      pptr->nout = pptr->order + 1;
    }
  if((pptr->nout < MIN_NFEATS) || (pptr->nout > MAX_NFEATS))
    {
      fprintf(stderr,"Feature vector length of %d not OK\n",
              pptr->nout);
      exit(-1);
    }
}



/* Check numerical parameters to see if in a reasonable range, and the logical
   sense of combinations of flags. For the numerical comparisons,
   see the constant definitions in rasta.h . */
void
check_args( struct param *pptr )
{
        
  char *funcname;
  funcname = "check_args";

#ifndef IO_ESPS
  if(pptr->espsin == TRUE || pptr->espsout == TRUE)
    {
      fprintf(stderr,"Compiled without IO_ESPS flag (no ESPS licence) -> no ESPS file I/O available");
      fprintf(stderr,"\n");
      exit(-1);
    }
#endif
#ifndef IO_MAT
  if(pptr->matin == TRUE || pptr->matout == TRUE)
    {
      fprintf(stderr,"Compiled without IO_MAT flag (no MATLAB licence) -> no MAT file I/O available");
      fprintf(stderr,"\n");
      exit(-1);
    }
#endif
  if((pptr->winsize < MIN_WINSIZE ) || (pptr->winsize > MAX_WINSIZE ))
    {
      fprintf(stderr,"Window size of %f msec not OK\n",
              pptr->winsize);
      exit(-1);
    }
  if((pptr->stepsize < MIN_STEPSIZE )||(pptr->stepsize > MAX_STEPSIZE ))
    {
      fprintf(stderr,"Step size of %f msec not OK\n",
              pptr->stepsize);
      exit(-1);
    }
  if((pptr->sampfreq < MIN_SAMPFREQ ) || (pptr->sampfreq > MAX_SAMPFREQ ))
    {
      fprintf(stderr,"Sampling frequency of %d not OK\n",
              pptr->sampfreq);
      exit(-1);
    }
  if((pptr->polepos < MIN_POLEPOS ) || (pptr->polepos >= MAX_POLEPOS ))
    {
      fprintf(stderr,"Pole position of %f not OK\n",
              pptr->polepos);
      exit(-1);
    }
  if((pptr->order < MIN_ORDER ) || (pptr->order > MAX_ORDER ))
    {
      fprintf(stderr,"LPC model order of %d not OK\n",
              pptr->order);
      exit(-1);
    }
  if((pptr->lift < MIN_LIFT ) || (pptr->lift > MAX_LIFT ))
    {
      fprintf(stderr,"Cepstral exponent of %f not OK\n",
              pptr->lift);
      exit(-1);
    }
  if((pptr->winco < MIN_WINCO ) || (pptr->winco > MAX_WINCO ))
    {
      fprintf(stderr,"Window coefficient of %f not OK\n",
              pptr->winco);
      exit(-1);
    }
  if((pptr->rfrac < MIN_RFRAC ) || (pptr->rfrac > MAX_RFRAC ))
    {
      fprintf(stderr,"Rasta fraction of %f not OK\n",
              pptr->rfrac);
      exit(-1);
    }
  if((pptr->jah < MIN_JAH ) || (pptr->jah > MAX_JAH ))
    {
      fprintf(stderr,"Jah value of %e not OK\n",
              pptr->jah);
      exit(-1);
    }
  if((pptr->lrasta ==FALSE) && (pptr->jrasta == FALSE))
    {
      if(pptr->rfrac != 1.0)
        {
          fprintf(stderr,"Can't mix if no rasta flag\n");
          exit(-1);
        }
    }
  if((pptr->lrasta == TRUE) && (pptr->jrasta == TRUE))
    {
      fprintf(stderr,"Can't do log rasta and jah rasta at the same time\n");
      exit(-1);
    }
  if(pptr->online == TRUE)
    {
      if(pptr->espsin==TRUE)
        {
          fprintf(stderr,"can't run on-line on esps input\n");
          exit(-1);
        }
      if(pptr->matin==TRUE)
        {
          fprintf(stderr,"can't run on-line on MAT input\n");
          exit(-1);
        }
      if(pptr->ascin==TRUE)
        {
          fprintf(stderr,"can't run on-line on ascii input\n");
          exit(-1);
        }
      if(strcmp (pptr->infname, "-") != 0)
        {
          fprintf(stderr,"on-line mode uses stdin only\n");
          exit(-1);
        }
    }
  if((pptr->espsin == TRUE && pptr->matin == TRUE) ||
     (pptr->espsin == TRUE && pptr->ascin == TRUE) ||
     (pptr->ascin == TRUE && pptr->matin == TRUE))
    {
      fprintf(stderr,"can't read different input formats simultaneously\n");
      exit(-1);
    }
  if((pptr->espsout == TRUE && pptr->matout == TRUE) ||
     (pptr->espsout == TRUE && pptr->ascout == TRUE) ||
     (pptr->ascout == TRUE && pptr->matout == TRUE))
    {
      fprintf(stderr,"can't write different output formats simultaneously\n");
      exit(-1);
    }
  if((pptr->swapbytes == TRUE) && (sizeof(short) != 2) &&
     (sizeof(short) != 4))
    {
      fprintf(stderr,"Shorts are %lu bytes.\n", (unsigned long)sizeof(short));
      fprintf(stderr,"Byte-swapping function in rasta.h will\n");
      fprintf(stderr,"not work!\n");
      exit(-1);
    }
}

/* Print out ascii for float vector with specified width (n columns) */
void print_vec(const struct param *pptr, FILE *fp, struct fvec *fptr, int width)
{
  int i;
  char *funcname;
  int lastfilt;

  funcname = "print_vec";
  if ((pptr->crbout == FALSE) && (pptr->comcrbout == FALSE))
    {
        
      for(i=0; i<fptr->length; i++)
        {
          fprintf(fp, "%g ", fptr->values[i]);
          if((i+1)%width == 0)
            {
              fprintf(fp, "\n");
            }
        }
    }
  else
    {
      lastfilt = pptr->nfilts - pptr->first_good;
      for (i= pptr->first_good; i<lastfilt; i++)
        {
          fprintf(fp, "%g ", fptr->values[i]);
        }
      fprintf(fp, "\n");
    }

}


void
do_rasta (struct fvec *s, struct fvec **p, int* nframes, int rate, int rasta_f)
{
	/* function prototypes for local calls */
	void init_param(struct fvec *, struct param *);
	struct fvec *get_data(struct param *), 
	    *rastaplp(struct fhistory *, struct param *, struct fvec *),
	    *fill_frame(struct fvec *, struct param *, int),
	    *get_online_bindata(struct fhistory *, struct param * );

	/*  	Variables and data structures	*/
	struct param runparam;
	struct fhistory history;
	struct fvec *speech, *frame;
	int nframe;
	char *funcname;

	speech = s;
	param_init(&runparam);
	runparam.sampfreq = rate;
	runparam.lrasta = rasta_f? TRUE : FALSE;

	init_param(speech, &runparam); /* Compute necessary parameters 
					  for analysis */
	
	check_args( &runparam ); /* Exits if params out of range */

/* 	main analysis loop */

	*nframes = runparam.nframes;
	for(nframe = 0; nframe < runparam.nframes; )
	{
		frame = fill_frame( speech, &runparam, nframe);
		nframe++;
		if (nframe>=1000) { fprintf(stderr,"Too many frames (>= 1000)!\n"); exit(-1); }
		p[nframe-1] = alloc_fvec (runparam.nout);
		fvec_copy("do_rasta",rastaplp(&history,&runparam,frame),p[nframe-1]);
	}
	
        return;
}
