[Original] Reconfigurable Module with an FSL Interface

2011-7-12 21:09 Category: EDA / IP / Design and Manufacturing

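Background:

Our goal is to integrate the different test cases of the Benchmark, in the form of IP cores, into a "service-oriented heterogeneous multi-core platform", and we require every one of these IP cores to be partially reconfigurable.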

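Block diagram:

The block diagram consists of three main functional modules:
1) MicroBlaze: sends computation tasks and receives the computation results.
2) mycore: performs the computation. This module is reconfigurable.
3) MyFSL: exchanges data between MicroBlaze and the computation module. This module is static, i.e. it is exactly the same for every IP core.

Fig. 1. Block diagram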

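MyFSL module

Fig. 2. State machine of the MyFSL module

The state transition diagram of MyFSL is shown in the figure above.

Idle: entered whenever the FSL_Rst signal is asserted. If FSL_S_Exists is asserted (input data is available on the FSL bus), the module moves to the Read_Inputs state.

Read_Inputs: reads eight 32-bit words in succession from the FSL bus and stores them in the input buffer. It then moves automatically to the Send_Inputs state and asserts begin_of_send to tell the mycore module to receive the input data.

Send_Inputs: sends the eight 32-bit words in succession to the mycore module, then moves automatically to the Receive_Outputs state.

Receive_Outputs: waits for the computation result from mycore; how long it waits is determined by mycore. When mycore finishes computing, it asserts end_of_computation. Once MyFSL detects this signal, it receives eight 32-bit words in succession through the result port and stores them in the output buffer. It then moves to the Delay state.

Delay: the duration of this state is configurable; it is intended for experiments and debugging. When the Delay state ends, the module moves to the Write_Outputs state.

Write_Outputs: writes the eight 32-bit results in succession to the FSL bus for MicroBlaze to use. It then returns to the Idle state, which marks the normal completion of one computation task.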

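mycore module

mycore may be implemented in any way, as long as it matches the timing of MyFSL. Below is one example of a mycore module:

[Figure: state machine of the example mycore module]

Idle: entered whenever the FSL_Rst signal is asserted. As soon as begin_of_send is detected, the module moves to the Receive_Inputs state.

Receive_Inputs: receives eight 32-bit words in succession and starts the computation. It then moves to the Computation state.

Computation: some computations cannot finish within a single clock cycle, so the Computation state is provided to wait for the computation to complete. The waiting time has to be configured to suit the actual computation. The module then moves to the Output_Results state.

Output_Results: outputs eight 32-bit words in succession on the result port. It then returns to the Idle state, which marks the normal completion of one hardware computation.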
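Since any core that honours the MyFSL timing can be swapped in, a second, much simpler core may help show what that contract amounts to. The following is a minimal sketch, not part of the original design: the module name `mycore_increment`, the counter `count`, the array `buffer` and the parameter `NUMBER_OF_WORDS` are all hypothetical. It assumes, based on a reading of the MyFSL reference code in this article, that the first input word is visible one clock cycle after begin_of_send is sampled high, that MyFSL captures result on every clock edge at which end_of_computation is high, and that the flag must stay high for one further cycle after the eighth word. The core returns every input word incremented by one.

```verilog
// Hypothetical alternative core with the same port list as mycore.
// It returns each of the eight input words plus one.
module mycore_increment(
        FSL_Clk,
        FSL_Rst,
        input_data,
        result,
        begin_of_send,
        end_of_computation
        );
input FSL_Clk;
input FSL_Rst;
input [0:31] input_data;
output [0:31] result;
input begin_of_send;
output end_of_computation;

localparam NUMBER_OF_WORDS = 8;

localparam Idle           = 2'b00;
localparam Receive_Inputs = 2'b01;
localparam Output_Results = 2'b10;

reg [0:1]  state;
reg [3:0]  count;            // word counter, 0..8
reg [0:31] buffer [0:7];     // incremented input words
reg [0:31] result;
reg        end_of_computation;

always @(posedge FSL_Clk)
  begin
    if (FSL_Rst)
      begin
        state <= Idle;
        count <= 0;
        result <= 0;
        end_of_computation <= 0;
      end
    else
      case (state)
        Idle:
          if (begin_of_send == 1)
            begin
              state <= Receive_Inputs;
              count <= 0;
              // drop the flag left over from the previous run
              end_of_computation <= 0;
            end
        Receive_Inputs:
          begin
            buffer[count[2:0]] <= input_data + 1;
            if (count == NUMBER_OF_WORDS - 1)
              begin
                // last word stored: present word 0 together with the flag
                state <= Output_Results;
                result <= buffer[0];
                end_of_computation <= 1;
                count <= 1;
              end
            else
              count <= count + 1;
          end
        Output_Results:
          if (count == NUMBER_OF_WORDS)
            state <= Idle;   // flag stays high so MyFSL can leave Receive_Outputs
          else
            begin
              result <= buffer[count[2:0]];
              count <= count + 1;
            end
        default:
          state <= Idle;
      endcase
  end
endmodule
```

To try it, the instance `mycore c1(...)` inside MyFSL would be replaced by `mycore_increment c1(...)` with the same connections. The sketch has no Computation state because its work takes a single cycle per word; a slower core would insert a waiting state before raising end_of_computation.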

Reference code:

MyFSL.v

module MyFSL
    (
        FSL_Clk,
        FSL_Rst,
        FSL_S_Clk,
        FSL_S_Read,
        FSL_S_Data,
        FSL_S_Control,
        FSL_S_Exists,
        FSL_M_Clk,
        FSL_M_Write,
        FSL_M_Data,
        FSL_M_Control,
        FSL_M_Full
    );

input                                     FSL_Clk;
input                                     FSL_Rst;
output                                    FSL_S_Clk;
output                                    FSL_S_Read;
input      [0 : 31]                       FSL_S_Data;
input                                     FSL_S_Control;
input                                     FSL_S_Exists;
output                                    FSL_M_Clk;
output                                    FSL_M_Write;
output     [0 : 31]                       FSL_M_Data;
output                                    FSL_M_Control;
input                                     FSL_M_Full;

   // Total number of input data.
   localparam NUMBER_OF_INPUT_WORDS  = 8;

   // Total number of output data
   localparam NUMBER_OF_OUTPUT_WORDS = 8;

   // Length of the Delay state, in clock cycles
   localparam NUMBER_OF_DELAY = 10;

   // Define the states of state machine
   localparam Idle            = 3'b000;
   localparam Read_Inputs     = 3'b001;
   localparam Send_Inputs     = 3'b010;
   localparam Receive_Outputs = 3'b011;
   localparam Delay           = 3'b100;
   localparam Write_Outputs   = 3'b101;

   reg [0:2] state;
   reg [0:31] input_data;
   wire [0:31] to_input_data, result;

   // Output register that drives FSL_M_Data
   reg [0:31] o_data;

   // Counters for the number of words read, received, sent and written
   reg [0:NUMBER_OF_INPUT_WORDS - 1] nr_of_reads;
   reg [0:NUMBER_OF_OUTPUT_WORDS - 1] nr_of_receives;
   reg [0:NUMBER_OF_OUTPUT_WORDS - 1] nr_of_sends;
   reg [0:NUMBER_OF_OUTPUT_WORDS - 1] nr_of_writes;
   // Cycle counter for the Delay state
   reg [0:31] nr_of_delay;
   // I/O Buffers
   reg [0:31] o_buf [0:7];
   reg [0:31] i_buf [0:7];
   // Control Signals
   reg begin_of_send;
   wire end_of_computation;

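   // Acknowledge a word only while words are still expected; otherwise a ninth
   // word would be consumed in the last cycle of Read_Inputs and dropped
   assign FSL_S_Read  = (state == Read_Inputs && nr_of_reads != 0) ? FSL_S_Exists : 0;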
   assign FSL_M_Write = (state == Write_Outputs) ? ~FSL_M_Full : 0;

   assign FSL_M_Data = o_data;

   always @(posedge FSL_Clk)
   begin  // process The_SW_accelerator
      if (FSL_Rst)               // Synchronous reset (active high)
        begin
           // CAUTION: make sure your reset polarity is consistent with the
           // system reset polarity
           state        <= Idle;
           nr_of_reads  <= 0;
           nr_of_writes <= 0;
           o_data        <= 0;
           begin_of_send <= 0;
        end
      else
        case (state)
          Idle:
            if (FSL_S_Exists == 1)
            begin
                    state <= Read_Inputs;
                    nr_of_reads <= NUMBER_OF_INPUT_WORDS;
                    o_data <= 0;
            end
          Read_Inputs:     //3'b001
          begin
            if (FSL_S_Exists == 1)
            begin
              case(nr_of_reads)
                8: i_buf[0] <= FSL_S_Data;
                7: i_buf[1] <= FSL_S_Data;
                6: i_buf[2] <= FSL_S_Data;
                5: i_buf[3] <= FSL_S_Data;
                4: i_buf[4] <= FSL_S_Data;
                3: i_buf[5] <= FSL_S_Data;
                2: i_buf[6] <= FSL_S_Data;
                1: i_buf[7] <= FSL_S_Data;
              endcase
              if (nr_of_reads != 0)
                nr_of_reads <= nr_of_reads - 1;
            end

            if(nr_of_reads == 0)
              begin
                state <= Send_Inputs;
                nr_of_sends <= NUMBER_OF_INPUT_WORDS;
                begin_of_send <= 1;
              end
          end
          Send_Inputs:    //3'b010
            begin
              case(nr_of_sends)
                8: input_data <= i_buf[0];
                7: input_data <= i_buf[1];
                6: input_data <= i_buf[2];
                5: input_data <= i_buf[3];
                4: input_data <= i_buf[4];
                3: input_data <= i_buf[5];
                2: input_data <= i_buf[6];
                1: input_data <= i_buf[7];
              endcase
              if(nr_of_sends == 0)
                begin
                    state <= Receive_Outputs;
                    nr_of_receives <= NUMBER_OF_OUTPUT_WORDS;
                    begin_of_send <= 0;
                end
              else
                nr_of_sends <= nr_of_sends -1;
            end
            Receive_Outputs:    //3'b011
                if(end_of_computation == 1)
                 begin
                    if(nr_of_receives == 0)
                        begin
                            state <= Delay;
                            nr_of_delay <= NUMBER_OF_DELAY - 1;
                        end
                    else
                        begin
                            nr_of_receives <= nr_of_receives - 1;
                        end
                     case (nr_of_receives)
                        8: o_buf[0] <= result;
                        7: o_buf[1] <= result;
                        6: o_buf[2] <= result;
                        5: o_buf[3] <= result;
                        4: o_buf[4] <= result;
                        3: o_buf[5] <= result;
                        2: o_buf[6] <= result;
                        1: o_buf[7] <= result;
                     endcase
                 end

          Delay:
            begin
              if (nr_of_delay == 0)
                begin
                  state <= Write_Outputs;
                  // Preload the first word so that FSL_M_Data is valid
                  // as soon as FSL_M_Write is asserted
                  o_data <= o_buf[0];
                  nr_of_writes <= NUMBER_OF_OUTPUT_WORDS - 1;
                end
              else
                nr_of_delay <= nr_of_delay - 1;
            end
          Write_Outputs:
            begin
              // Advance only when the FIFO accepts the word on FSL_M_Data
              if (FSL_M_Full == 0)
                begin
                  case (nr_of_writes)
                    7: o_data <= o_buf[1];
                    6: o_data <= o_buf[2];
                    5: o_data <= o_buf[3];
                    4: o_data <= o_buf[4];
                    3: o_data <= o_buf[5];
                    2: o_data <= o_buf[6];
                    1: o_data <= o_buf[7];
                  endcase
                  if (nr_of_writes == 0)
                    state <= Idle;
                  else
                    nr_of_writes <= nr_of_writes - 1;
                end
            end
        endcase
   end
   assign to_input_data = input_data;
   mycore c1(
        //Global Signals
           FSL_Clk,
        FSL_Rst,
        //user added ports
        to_input_data,
        result,
        begin_of_send,
        end_of_computation
        );

endmodule

 

mycore.v

module mycore(
        FSL_Clk,
        FSL_Rst,
        //user added ports
        input_data,
        result,
        begin_of_send,
        end_of_computation
        );
input FSL_Clk;
input FSL_Rst;
input [0:31] input_data;    //input data
output [0:31] result;        //output data
input begin_of_send;         //input flag
output end_of_computation;    //output flag

localparam delay_of_computation = 63;
localparam NUMBER_OF_INPUT_WORDS  = 8;
localparam NUMBER_OF_OUTPUT_WORDS  = 8;

localparam Idle  = 2'b00;
localparam Receive_Inputs =2'b01;
localparam Computation = 2'b10;
localparam Output_Results = 2'b11;

reg [0:1] state;
reg [0:NUMBER_OF_INPUT_WORDS - 1] nr_of_reads, nr_of_trans;
reg [0:31] counter;
//reg begin_of_computation;
reg   end_of_computation;
reg [0:31] temp_in [0:7], temp_out[0:7];

reg [0:31] result;

reg [0:31] C [0:63];
wire [0:31] out_data [0:7];

   initial
   begin

         C[0] =11585;C[1] =16069; C[2] =15137; C[3] =13623; C[4] =11585; C[5] =9102;  C[6] =6270;  C[7] =3196;
         C[8] =11585;C[9] =13623; C[10]=6270;  C[11]=-3196; C[12]=-11585;C[13]=-16069;C[14]=-15137;C[15]=-9102;
         C[16]=11585;C[17]=9102;  C[18]=-6270; C[19]=-16069;C[20]=-11585;C[21]=3196;  C[22]=15137; C[23]=13623;
         C[24]=11585;C[25]=3196;  C[26]=-15137;C[27]=-9102; C[28]=11585; C[29]=13623; C[30]=-6270; C[31]=-16069;
         C[32]=11585;C[33]=-3196; C[34]=-15137;C[35]=9102;  C[36]=11585; C[37]=-13623;C[38]=-6270; C[39]=16069;
         C[40]=11585;C[41]=-9102; C[42]=-6270; C[43]=16069; C[44]=-11585;C[45]=-3196; C[46]=15137; C[47]=-13623;
         C[48]=11585;C[49]=-13623;C[50]=6270;  C[51]=3196;  C[52]=-11585;C[53]=16069; C[54]=-15137;C[55]=9102;
         C[56]=11585;C[57]=-16069;C[58]=15137; C[59]=-13623;C[60]=11585; C[61]=-9102; C[62]=6270;  C[63]=-3196;
   end
always @(posedge FSL_Clk)
  begin
    if (FSL_Rst)
      begin
        state <= Idle;
        //begin_of_computation <= 0;
        end_of_computation <= 0;
      end
    else
        case(state)
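          Idle:
            if (begin_of_send == 1)
              begin
                state <= Receive_Inputs;
                result <= 0;
                nr_of_reads <= NUMBER_OF_INPUT_WORDS;
                //begin_of_computation <= 0;
                // Clear the flag left over from the previous run, so MyFSL
                // does not see it when it next enters Receive_Outputs
                end_of_computation <= 0;
              end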
          Receive_Inputs:
            //if (begin_of_send == 1)
              begin
                case(nr_of_reads)
                    8:temp_in[0] <= input_data;
                    7:temp_in[1] <= input_data;
                    6:temp_in[2] <= input_data;
                    5:temp_in[3] <= input_data;
                    4:temp_in[4] <= input_data;
                    3:temp_in[5] <= input_data;
                    2:temp_in[6] <= input_data;
                    1:temp_in[7] <= input_data;
                endcase       
                if (nr_of_reads == 0)
                  begin
                    state        <= Computation;
                    //begin_of_computation <= 1;
                    end_of_computation <= 0;
                    counter <= delay_of_computation;

    // Each output is the dot product of one row of C with the eight inputs
    temp_out[0] <= C[0]*temp_in[0] + C[1]*temp_in[1] + C[2]*temp_in[2] + C[3]*temp_in[3] +
                C[4]*temp_in[4] + C[5]*temp_in[5] + C[6]*temp_in[6] + C[7]*temp_in[7] ;
    temp_out[1] <= C[8]*temp_in[0] + C[9]*temp_in[1] + C[10]*temp_in[2] + C[11]*temp_in[3] +
                C[12]*temp_in[4] + C[13]*temp_in[5] + C[14]*temp_in[6] + C[15]*temp_in[7] ;
    temp_out[2] <= C[16]*temp_in[0] + C[17]*temp_in[1] + C[18]*temp_in[2] + C[19]*temp_in[3] +
                C[20]*temp_in[4] + C[21]*temp_in[5] + C[22]*temp_in[6] + C[23]*temp_in[7] ;
    temp_out[3] <= C[24]*temp_in[0] + C[25]*temp_in[1] + C[26]*temp_in[2] + C[27]*temp_in[3] +
                C[28]*temp_in[4] + C[29]*temp_in[5] + C[30]*temp_in[6] + C[31]*temp_in[7] ;
    temp_out[4] <= C[32]*temp_in[0] + C[33]*temp_in[1] + C[34]*temp_in[2] + C[35]*temp_in[3] +
                C[36]*temp_in[4] + C[37]*temp_in[5] + C[38]*temp_in[6] + C[39]*temp_in[7] ;
    temp_out[5] <= C[40]*temp_in[0] + C[41]*temp_in[1] + C[42]*temp_in[2] + C[43]*temp_in[3] +
                C[44]*temp_in[4] + C[45]*temp_in[5] + C[46]*temp_in[6] + C[47]*temp_in[7] ;
    temp_out[6] <= C[48]*temp_in[0] + C[49]*temp_in[1] + C[50]*temp_in[2] + C[51]*temp_in[3] +
                C[52]*temp_in[4] + C[53]*temp_in[5] + C[54]*temp_in[6] + C[55]*temp_in[7] ;
    temp_out[7] <= C[56]*temp_in[0] + C[57]*temp_in[1] + C[58]*temp_in[2] + C[59]*temp_in[3] +
                C[60]*temp_in[4] + C[61]*temp_in[5] + C[62]*temp_in[6] + C[63]*temp_in[7] ;

                  end
                else
                    nr_of_reads <= nr_of_reads - 1;
              end
          Computation:
            if(counter == 0)
              begin
                state <= Output_Results;
                end_of_computation <= 1;
                // Present the first word together with end_of_computation,
                // so that MyFSL samples valid data on its first receive cycle
                result <= out_data[0];
                nr_of_trans <= NUMBER_OF_OUTPUT_WORDS - 1;
              end
            else
                counter <= counter - 1;
          Output_Results:
            begin
              case(nr_of_trans)
                7: result <= out_data[1];
                6: result <= out_data[2];
                5: result <= out_data[3];
                4: result <= out_data[4];
                3: result <= out_data[5];
                2: result <= out_data[6];
                1: result <= out_data[7];
              endcase
              if(nr_of_trans == 0)
                state <= Idle;
              else
                nr_of_trans <= nr_of_trans - 1;
            end
        endcase
  end
   // Divide by 2^16, rounding toward zero (bit 0 is the sign bit)
   assign out_data[0] = (temp_out[0][0]==0)?(temp_out[0]>>16):(-((-temp_out[0])>>16));
   assign out_data[1] = (temp_out[1][0]==0)?(temp_out[1]>>16):(-((-temp_out[1])>>16));
   assign out_data[2] = (temp_out[2][0]==0)?(temp_out[2]>>16):(-((-temp_out[2])>>16));
   assign out_data[3] = (temp_out[3][0]==0)?(temp_out[3]>>16):(-((-temp_out[3])>>16));
   assign out_data[4] = (temp_out[4][0]==0)?(temp_out[4]>>16):(-((-temp_out[4])>>16));
   assign out_data[5] = (temp_out[5][0]==0)?(temp_out[5]>>16):(-((-temp_out[5])>>16));
   assign out_data[6] = (temp_out[6][0]==0)?(temp_out[6]>>16):(-((-temp_out[6])>>16));
   assign out_data[7] = (temp_out[7][0]==0)?(temp_out[7]>>16):(-((-temp_out[7])>>16));
endmodule
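
To watch one complete task pass through the two modules, a small simulation testbench can stand in for MicroBlaze and the FSL FIFOs. The following is only a sketch under assumptions: the testbench name `MyFSL_tb` and all of its internal signal names are hypothetical, the FIFO on the slave side is modelled as always offering the words 1 to 8, and the master side is assumed never to be full. It expects MyFSL.v and mycore.v from this article to be compiled alongside it, and it only prints what MyFSL writes without checking the values.

```verilog
`timescale 1ns / 1ps

// Hypothetical testbench: feeds eight words to MyFSL and prints
// every word that MyFSL writes back to the FSL bus.
module MyFSL_tb;
  localparam NUMBER_OF_WORDS = 8;

  reg         clk;
  reg         rst;
  integer     sent;       // words accepted by MyFSL so far
  integer     received;   // words written by MyFSL so far

  wire        s_clk, m_clk, s_read, m_write, m_control;
  wire [0:31] m_data;

  // Slave-side FIFO model: offers the words 1, 2, ..., 8 in order
  wire        s_exists = !rst && (sent < NUMBER_OF_WORDS);
  wire [0:31] s_data   = sent + 1;

  MyFSL dut (
    .FSL_Clk(clk),
    .FSL_Rst(rst),
    .FSL_S_Clk(s_clk),
    .FSL_S_Read(s_read),
    .FSL_S_Data(s_data),
    .FSL_S_Control(1'b0),
    .FSL_S_Exists(s_exists),
    .FSL_M_Clk(m_clk),
    .FSL_M_Write(m_write),
    .FSL_M_Data(m_data),
    .FSL_M_Control(m_control),
    .FSL_M_Full(1'b0)          // assumption: output FIFO never fills up
  );

  always #5 clk = ~clk;

  always @(posedge clk)
    if (!rst)
      begin
        // a word leaves the input FIFO when MyFSL acknowledges it
        if (s_exists && s_read)
          sent <= sent + 1;
        // a word enters the output FIFO when MyFSL asserts write
        if (m_write)
          begin
            $display("output word %0d: %0d", received, $signed(m_data));
            received <= received + 1;
          end
      end

  initial
    begin
      clk = 0;
      rst = 1;
      sent = 0;
      received = 0;
      repeat (4) @(posedge clk);
      rst <= 0;
      // long enough for read, send, computation, delay and write phases
      repeat (300) @(posedge clk);
      $display("words sent: %0d, words received: %0d", sent, received);
      $finish;
    end
endmodule
```

With a simulator such as Icarus Verilog this could be run as `iverilog -o sim MyFSL_tb.v MyFSL.v mycore.v` followed by `vvp sim`. Driving FSL_M_Full high for a few cycles, instead of tying it to zero, would be a natural next experiment for checking back-pressure on the write side.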
