OFDM modulator. Parallel processing.

Parallel processing is a non-trivial way to speed up our applications. Today's computers usually have two or more cores, and it is also possible to use coprocessors such as a GPU or an FPGA. All of this stuff makes it possible to run complex systems in real time.

In this post I will show you how to simply implement parallel processing in the OFDM modulator. Let’s take a look at the listings below.

void Cworking_Application_Layer::cworking_process_OFDM_modulation( Cworking_Matrix_Data& cworking_input, Cworking_Matrix_Data& cworking_output )
{
  /* For each symbol */
  #pragma omp parallel for
  for ( size_t cworking_symbol = 0; cworking_symbol < cworking_input.cworking_symbols; cworking_symbol++ )
  {
    /* Create temporary symbol vector */
    Cworking_Complex_Vector cworking_symbol_output;

    /* ------------------------ OFDM Modulation ------------------------ */

    /* Calculate IFFT */
    this->cworking_dsp.muged_1D_ifft( cworking_input.cworking_radio_frame[ cworking_symbol ], cworking_symbol_output );

    /* ------------------------ OFDM Modulation ------------------------ */

    /* Store output into matrix */
    for ( size_t cworking_subcarrier = 0; cworking_subcarrier < cworking_symbol_output.length; cworking_subcarrier++ )
    {
      /* Store single sub-carrier */
      cworking_output.cworking_radio_frame[ cworking_symbol ].array[ cworking_subcarrier ] =
      cworking_symbol_output.array[ cworking_subcarrier ];
    }

    /* Clean memory */
    delete [] cworking_symbol_output.array;
  }
}

The most important difference between the sequential and the parallel code appears in the application layer. Here we use OpenMP, which lets us parallelize a loop with the “#pragma omp parallel for” directive. In this case the symbols of a radio frame are divided into groups, which are processed in parallel by different threads, ideally on different cores.

int main()
{
  ...
  /* Declare thread engine configuration */
  Cworking_Thread_Engine_Configuration cworking_thread_conf;
  ...
  /* Create thread engine configuration */
  cworking_infrastructure.cworking_create_thread_configuration( 3, cworking_thread_conf );
  ...
}

OpenMP needs some initialization, which is done by the infrastructure layer. We set the number of threads to 3. This means that the “for” loop from the application layer will be divided into three parts. Quite simple, isn’t it?
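The body of cworking_create_thread_configuration is not shown in this post, so here is only a minimal sketch of what such a setup could look like, assuming it boils down to the standard OpenMP calls omp_set_dynamic and omp_set_num_threads. The names configure_threads and record_thread_ids are hypothetical and are not part of the project. The second function records which thread handled each loop iteration, so you can see how the loop was split.

```cpp
#include <cassert>
#include <vector>
#ifdef _OPENMP
#include <omp.h>
#endif

/* Hypothetical stand-in for cworking_create_thread_configuration() */
void configure_threads( int thread_count )
{
  assert( thread_count > 0 );
#ifdef _OPENMP
  /* Ask the runtime not to adjust the team size on its own */
  omp_set_dynamic( 0 );
  /* Request thread_count threads for the following parallel regions */
  omp_set_num_threads( thread_count );
#else
  /* Built without OpenMP: everything runs in the main thread */
  (void) thread_count;
#endif
}

/* Hypothetical helper: returns, for each symbol index, the id of the
   thread that processed it (always 0 when OpenMP is disabled) */
std::vector<int> record_thread_ids( int symbols )
{
  std::vector<int> ids( symbols, 0 );

  #pragma omp parallel for
  for ( int symbol = 0; symbol < symbols; symbol++ )
  {
#ifdef _OPENMP
    ids[ symbol ] = omp_get_thread_num();
#endif
  }

  return ids;
}
```

Calling configure_threads( 3 ) and then record_thread_ids( 14 ) (14 is just an example count) should give you only the ids 0, 1 and 2. How the iterations are assigned to these threads depends on the schedule chosen by your OpenMP implementation.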

Unfortunately, it is not as simple as it seems. If we compile and run this example, we may notice that the sequential code is faster than its parallel version. This is because OpenMP hides many additional operations from us, such as:

  • thread communication and synchronization
  • memory and cache sharing

Furthermore, many other processes run while our application is executing, often on the same cores. This means that to get a reasonable speed-up we shouldn’t try to parallelize trivial functions: the work done in each iteration has to outweigh the cost of managing the threads. For educational purposes, try adding a “sleep(1)” call inside the loop in the application layer. You should then notice that the parallel code is much faster than the sequential one, and that it really works :) In the next post I will describe OpenMP in more detail.
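If you want to try this experiment outside the modulator, here is a small sketch. The function time_loop is a hypothetical helper, not part of the project, and it uses std::this_thread::sleep_for instead of “sleep(1)” so that the delay can be shorter and the code stays portable. The “if” clause of the directive switches between sequential and parallel execution of the same loop.

```cpp
#include <cassert>
#include <chrono>
#include <thread>

/* Hypothetical helper: runs `iterations` loop passes, each sleeping for
   work_ms milliseconds, and returns the elapsed wall-clock time in seconds.
   With parallel == false the loop is executed by a single thread. */
double time_loop( int iterations, int work_ms, bool parallel )
{
  assert( iterations >= 0 && work_ms >= 0 );

  const auto start = std::chrono::steady_clock::now();

  #pragma omp parallel for if( parallel )
  for ( int i = 0; i < iterations; i++ )
  {
    /* Simulated heavy work, standing in for sleep(1) */
    std::this_thread::sleep_for( std::chrono::milliseconds( work_ms ) );
  }

  const auto stop = std::chrono::steady_clock::now();
  return std::chrono::duration<double>( stop - start ).count();
}
```

Compare time_loop( 4, 10, false ) with time_loop( 4, 10, true ), and then repeat with work_ms set to 0. With the delay the parallel run should usually finish sooner, and without it the thread overhead tends to dominate. The exact numbers depend on your machine, so I do not quote any here.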

P.S.
If you use the Eclipse IDE, don’t forget to add the -fopenmp flag to both the compiler and the linker miscellaneous options. Otherwise the #pragma will be ignored and the code will always be executed in the main thread!
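A quick way to check whether the flag really reached the compiler is the standard _OPENMP macro, which is defined only when OpenMP support is switched on. The two function names below are hypothetical helpers made up for this sketch.

```cpp
#include <cassert>
#ifdef _OPENMP
#include <omp.h>
#endif

/* Hypothetical helper: true only if the code was built with OpenMP enabled */
bool openmp_enabled()
{
#ifdef _OPENMP
  return true;
#else
  return false;
#endif
}

/* Hypothetical helper: upper bound on the number of threads a parallel
   region may use; 1 when the pragmas are ignored */
int available_threads()
{
#ifdef _OPENMP
  const int threads = omp_get_max_threads();
#else
  const int threads = 1;
#endif
  assert( threads >= 1 );
  return threads;
}
```

If openmp_enabled() returns false in your build, the flag is missing and the loop from the application layer runs sequentially.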
