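Background:
Our goal is to integrate the individual test cases of the benchmark, in the form of IP cores, into the "service-oriented heterogeneous multi-core platform", with the requirement that every one of these IP cores be partially reconfigurable.
Block diagram:
The block diagram consists of three main functional modules:
1) MicroBlaze: sends computation tasks and receives the computation results.
2) mycore: performs the computation. This module is reconfigurable.
3) MyFSL: exchanges data between MicroBlaze and the computation module. This is a static module; that is, it is exactly the same for every IP core.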
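MyFSL module
The state transition diagram of MyFSL is shown in the figure above.
Idle: entered whenever FSL_Rst is asserted. If FSL_S_Exists is asserted (input data is available on the FSL bus), the module moves to Read_Inputs.
Read_Inputs: reads eight consecutive 32-bit words from the FSL bus and stores them in the input buffer. It then moves automatically to Send_Inputs and asserts begin_of_send to tell the mycore module to accept the input data.
Send_Inputs: sends the eight 32-bit words to the mycore module one after another, then moves automatically to Receive_Outputs.
Receive_Outputs: waits for the computation results from mycore; how long it waits is determined by mycore. When mycore has finished, it asserts end_of_computation. Once MyFSL detects this signal, it receives eight consecutive 32-bit words through the result port and stores them in the output buffer. It then moves to Delay.
Delay: the duration of this state is configurable and is intended for experiments and debugging. When the delay has elapsed, the module moves to Write_Outputs.
Write_Outputs: writes the eight 32-bit results to the FSL bus one after another for MicroBlaze to consume. It then returns to Idle, which marks the normal completion of one computation task.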
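The configurable length of the Delay state can be illustrated in isolation. The following is only a sketch under stated assumptions: the module delay_counter and its start/done ports are hypothetical and do not appear in the reference code. It uses an overridable parameter so that each instance can choose its own delay.

```verilog
// Hypothetical stand-alone model of the Delay state.
// NUMBER_OF_DELAY is a parameter, so it can be overridden per instance.
module delay_counter #(parameter NUMBER_OF_DELAY = 10)
(
    input  wire clk,
    input  wire rst,     // synchronous reset, active high
    input  wire start,   // one-cycle pulse: begin counting
    output reg  done     // one-cycle pulse: the delay has elapsed
);
    reg [0:31] nr_of_delay;
    reg        busy;

    always @(posedge clk)
        if (rst) begin
            busy        <= 0;
            done        <= 0;
            nr_of_delay <= 0;
        end else begin
            done <= 0;
            if (!busy) begin
                if (start) begin
                    busy        <= 1;
                    nr_of_delay <= NUMBER_OF_DELAY - 1;
                end
            end else if (nr_of_delay == 0) begin
                busy <= 0;
                done <= 1;
            end else
                nr_of_delay <= nr_of_delay - 1;
        end
endmodule
```

An instance with a longer delay for one experiment could then be written as delay_counter #(.NUMBER_OF_DELAY(100)) d1 (...), leaving the module source untouched.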
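mycore module
mycore may be implemented in any way, as long as it matches the timing of MyFSL. The following is one example of a mycore module:
Idle: entered whenever FSL_Rst is asserted. As soon as begin_of_send is detected, the module moves to Receive_Inputs.
Receive_Inputs: receives eight consecutive 32-bit words and starts the computation. It then moves to Computation.
Computation: some computation tasks cannot finish within one clock cycle, so this state exists to wait for the computation to complete. The waiting time must be configured to suit the actual task. The module then moves to Output_Results.
Output_Results: drives eight consecutive 32-bit words onto the result port. It then returns to Idle, which marks the normal completion of one hardware computation.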
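Because any core is acceptable as long as it honors the MyFSL timing, a much smaller core can serve as a first bring-up test. The following is a minimal sketch and not part of the original design: the module name mycore_increment, the counter count, and the array buffer are hypothetical, and the "add one to every word" operation is chosen purely for illustration. It assumes that MyFSL presents the first input word one cycle after begin_of_send becomes visible, and that it samples result on the cycles in which end_of_computation is high.

```verilog
// Hypothetical minimal core: returns every input word incremented by one.
module mycore_increment
(
    input  wire        FSL_Clk,
    input  wire        FSL_Rst,
    input  wire [0:31] input_data,
    output reg  [0:31] result,
    input  wire        begin_of_send,
    output reg         end_of_computation
);
    localparam WORDS          = 8;
    localparam Idle           = 2'b00;
    localparam Receive_Inputs = 2'b01;
    localparam Output_Results = 2'b10;

    reg [0:1]  state;
    reg [0:3]  count;                  // word index, 0 .. WORDS-1
    reg [0:31] buffer [0:WORDS-1];

    always @(posedge FSL_Clk)
        if (FSL_Rst) begin
            state              <= Idle;
            count              <= 0;
            result             <= 0;
            end_of_computation <= 0;
        end else
            case (state)
                Idle:
                    if (begin_of_send) begin
                        end_of_computation <= 0;  // clear the flag of the previous task
                        count              <= 0;
                        state              <= Receive_Inputs;
                    end
                Receive_Inputs: begin
                    buffer[count] <= input_data + 1;
                    if (count == WORDS - 1) begin
                        count <= 0;
                        state <= Output_Results;
                    end else
                        count <= count + 1;
                end
                Output_Results: begin
                    // The flag rises in the same cycle as the first result word
                    result             <= buffer[count];
                    end_of_computation <= 1;
                    if (count == WORDS - 1)
                        state <= Idle;
                    else
                        count <= count + 1;
                end
                default: state <= Idle;
            endcase
endmodule
```

This sketch has no Computation state because its result is available immediately; a core with a longer latency would insert a wait between Receive_Inputs and Output_Results.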
Reference code:
MyFSL.v
module MyFSL
(
    FSL_Clk,
    FSL_Rst,
    FSL_S_Clk,
    FSL_S_Read,
    FSL_S_Data,
    FSL_S_Control,
    FSL_S_Exists,
    FSL_M_Clk,
    FSL_M_Write,
    FSL_M_Data,
    FSL_M_Control,
    FSL_M_Full
);

input          FSL_Clk;
input          FSL_Rst;
output         FSL_S_Clk;
output         FSL_S_Read;
input  [0:31]  FSL_S_Data;
input          FSL_S_Control;
input          FSL_S_Exists;
output         FSL_M_Clk;
output         FSL_M_Write;
output [0:31]  FSL_M_Data;
output         FSL_M_Control;
input          FSL_M_Full;

// Number of 32-bit words read from and written to the FSL bus per task
localparam NUMBER_OF_INPUT_WORDS  = 8;
localparam NUMBER_OF_OUTPUT_WORDS = 8;
// Length of the Delay state in clock cycles
localparam NUMBER_OF_DELAY        = 10;

// State encoding
localparam Idle            = 3'b000;
localparam Read_Inputs     = 3'b001;
localparam Send_Inputs     = 3'b010;
localparam Receive_Outputs = 3'b011;
localparam Delay           = 3'b100;
localparam Write_Outputs   = 3'b101;

reg  [0:2]  state;
reg  [0:31] input_data;               // word currently presented to mycore
wire [0:31] to_input_data, result;
reg  [0:31] o_data;                   // word currently presented on FSL_M_Data

// Down counters for the words still to be handled in each state
reg [0:NUMBER_OF_INPUT_WORDS - 1]  nr_of_reads;
reg [0:NUMBER_OF_OUTPUT_WORDS - 1] nr_of_receives;
reg [0:NUMBER_OF_OUTPUT_WORDS - 1] nr_of_sends;
reg [0:NUMBER_OF_OUTPUT_WORDS - 1] nr_of_writes;
reg [0:31]                         nr_of_delay;

// I/O buffers
reg [0:31] o_buf [0:7];
reg [0:31] i_buf [0:7];

// Control signals
reg  begin_of_send;
wire end_of_computation;

assign FSL_S_Read  = (state == Read_Inputs)   ? FSL_S_Exists : 1'b0;
assign FSL_M_Write = (state == Write_Outputs) ? ~FSL_M_Full  : 1'b0;
assign FSL_M_Data  = o_data;

always @(posedge FSL_Clk)
begin
    if (FSL_Rst)    // Synchronous reset (active high)
    begin
        // CAUTION: make sure your reset polarity is consistent with the
        // system reset polarity
        state          <= Idle;
        nr_of_reads    <= 0;
        nr_of_sends    <= 0;
        nr_of_receives <= 0;
        nr_of_writes   <= 0;
        nr_of_delay    <= 0;
        input_data     <= 0;
        o_data         <= 0;
        begin_of_send  <= 0;
    end
    else
        case (state)
            Idle:
                if (FSL_S_Exists == 1)
                begin
                    state       <= Read_Inputs;
                    nr_of_reads <= NUMBER_OF_INPUT_WORDS;
                    o_data      <= 0;
                end

            Read_Inputs:    // 3'b001
                // FSL_S_Read follows FSL_S_Exists in this state, so exactly
                // one word is consumed per cycle in which data is available.
                if (FSL_S_Exists == 1)
                begin
                    i_buf[NUMBER_OF_INPUT_WORDS - nr_of_reads] <= FSL_S_Data;
                    if (nr_of_reads == 1)
                    begin
                        state         <= Send_Inputs;
                        nr_of_sends   <= NUMBER_OF_INPUT_WORDS;
                        begin_of_send <= 1;
                    end
                    else
                        nr_of_reads <= nr_of_reads - 1;
                end

            Send_Inputs:    // 3'b010
                if (nr_of_sends == 0)
                begin
                    state          <= Receive_Outputs;
                    nr_of_receives <= NUMBER_OF_OUTPUT_WORDS;
                    begin_of_send  <= 0;
                end
                else
                begin
                    input_data  <= i_buf[NUMBER_OF_INPUT_WORDS - nr_of_sends];
                    nr_of_sends <= nr_of_sends - 1;
                end

            Receive_Outputs:    // 3'b011
                // One result word is captured in every cycle in which
                // end_of_computation is high, until eight words are stored.
                if (end_of_computation == 1)
                begin
                    o_buf[NUMBER_OF_OUTPUT_WORDS - nr_of_receives] <= result;
                    if (nr_of_receives == 1)
                    begin
                        state       <= Delay;
                        nr_of_delay <= NUMBER_OF_DELAY - 1;
                    end
                    else
                        nr_of_receives <= nr_of_receives - 1;
                end

            Delay:
                if (nr_of_delay == 0)
                begin
                    state        <= Write_Outputs;
                    nr_of_writes <= NUMBER_OF_OUTPUT_WORDS;
                    o_data       <= o_buf[0];    // first word is ready when FSL_M_Write rises
                end
                else
                    nr_of_delay <= nr_of_delay - 1;

            Write_Outputs:
                // A word is accepted by the bus only while FSL_M_Full is low,
                // so the counter advances only in those cycles.
                if (FSL_M_Full == 0)
                begin
                    if (nr_of_writes == 1)
                        state <= Idle;
                    else
                    begin
                        o_data       <= o_buf[NUMBER_OF_OUTPUT_WORDS - nr_of_writes + 1];
                        nr_of_writes <= nr_of_writes - 1;
                    end
                end

            default:
                state <= Idle;
        endcase
end

assign to_input_data = input_data;

mycore c1
(
    // Global signals
    .FSL_Clk            (FSL_Clk),
    .FSL_Rst            (FSL_Rst),
    // User-added ports
    .input_data         (to_input_data),
    .result             (result),
    .begin_of_send      (begin_of_send),
    .end_of_computation (end_of_computation)
);

endmodule
mycore.v
module mycore
(
    FSL_Clk,
    FSL_Rst,
    // User-added ports
    input_data,
    result,
    begin_of_send,
    end_of_computation
);

input          FSL_Clk;
input          FSL_Rst;
input  [0:31]  input_data;            // input data
output [0:31]  result;                // output data
input          begin_of_send;         // input flag
output         end_of_computation;    // output flag

localparam delay_of_computation   = 63;
localparam NUMBER_OF_INPUT_WORDS  = 8;
localparam NUMBER_OF_OUTPUT_WORDS = 8;

// State encoding
localparam Idle           = 2'b00;
localparam Receive_Inputs = 2'b01;
localparam Computation    = 2'b10;
localparam Output_Results = 2'b11;

reg  [0:1]  state;
reg  [0:NUMBER_OF_INPUT_WORDS - 1] nr_of_reads, nr_of_trans;
reg  [0:31] counter;
reg         end_of_computation;
reg  [0:31] temp_in [0:7], temp_out [0:7];
reg  [0:31] result;
reg  [0:31] C [0:63];                 // 8 x 8 fixed-point coefficient matrix, row-major
wire [0:31] out_data [0:7];
integer     i;

initial
begin
    C[0] =11585; C[1] =16069;  C[2] =15137;  C[3] =13623;  C[4] =11585;  C[5] =9102;   C[6] =6270;   C[7] =3196;
    C[8] =11585; C[9] =13623;  C[10]=6270;   C[11]=-3196;  C[12]=-11585; C[13]=-16069; C[14]=-15137; C[15]=-9102;
    C[16]=11585; C[17]=9102;   C[18]=-6270;  C[19]=-16069; C[20]=-11585; C[21]=3196;   C[22]=15137;  C[23]=13623;
    C[24]=11585; C[25]=3196;   C[26]=-15137; C[27]=-9102;  C[28]=11585;  C[29]=13623;  C[30]=-6270;  C[31]=-16069;
    C[32]=11585; C[33]=-3196;  C[34]=-15137; C[35]=9102;   C[36]=11585;  C[37]=-13623; C[38]=-6270;  C[39]=16069;
    C[40]=11585; C[41]=-9102;  C[42]=-6270;  C[43]=16069;  C[44]=-11585; C[45]=-3196;  C[46]=15137;  C[47]=-13623;
    C[48]=11585; C[49]=-13623; C[50]=6270;   C[51]=3196;   C[52]=-11585; C[53]=16069;  C[54]=-15137; C[55]=9102;
    C[56]=11585; C[57]=-16069; C[58]=15137;  C[59]=-13623; C[60]=11585;  C[61]=-9102;  C[62]=6270;   C[63]=-3196;
end

always @(posedge FSL_Clk)
begin
    if (FSL_Rst)
    begin
        state              <= Idle;
        nr_of_reads        <= 0;
        nr_of_trans        <= 0;
        counter            <= 0;
        result             <= 0;
        end_of_computation <= 0;
    end
    else
        case (state)
            Idle:
                if (begin_of_send == 1)
                begin
                    state              <= Receive_Inputs;
                    result             <= 0;
                    nr_of_reads        <= NUMBER_OF_INPUT_WORDS;
                    end_of_computation <= 0;    // clear the flag of the previous task
                end

            Receive_Inputs:
                if (nr_of_reads == 0)
                begin
                    state   <= Computation;
                    counter <= delay_of_computation;
                    // Fixed-point multiply-accumulate: row i of C times the input
                    // vector. The products are assumed to be multiplications,
                    // since C holds scaled coefficients and the result is shifted
                    // right by 16 bits afterwards.
                    for (i = 0; i < NUMBER_OF_OUTPUT_WORDS; i = i + 1)
                        temp_out[i] <= C[8*i + 0] * temp_in[0] + C[8*i + 1] * temp_in[1]
                                     + C[8*i + 2] * temp_in[2] + C[8*i + 3] * temp_in[3]
                                     + C[8*i + 4] * temp_in[4] + C[8*i + 5] * temp_in[5]
                                     + C[8*i + 6] * temp_in[6] + C[8*i + 7] * temp_in[7];
                end
                else
                begin
                    temp_in[NUMBER_OF_INPUT_WORDS - nr_of_reads] <= input_data;
                    nr_of_reads <= nr_of_reads - 1;
                end

            Computation:
                if (counter == 0)
                begin
                    state       <= Output_Results;
                    nr_of_trans <= NUMBER_OF_OUTPUT_WORDS;
                end
                else
                    counter <= counter - 1;

            Output_Results:
                // end_of_computation rises in the same cycle as the first result
                // word and stays high until the next begin_of_send.
                if (nr_of_trans == 0)
                    state <= Idle;
                else
                begin
                    result             <= out_data[NUMBER_OF_OUTPUT_WORDS - nr_of_trans];
                    end_of_computation <= 1;
                    nr_of_trans        <= nr_of_trans - 1;
                end
        endcase
end

// Divide by 2^16, rounding toward zero. Bit 0 is the sign bit because the
// vectors are declared [0:31].
assign out_data[0] = (temp_out[0][0] == 0) ? (temp_out[0] >> 16) : -((-temp_out[0]) >> 16);
assign out_data[1] = (temp_out[1][0] == 0) ? (temp_out[1] >> 16) : -((-temp_out[1]) >> 16);
assign out_data[2] = (temp_out[2][0] == 0) ? (temp_out[2] >> 16) : -((-temp_out[2]) >> 16);
assign out_data[3] = (temp_out[3][0] == 0) ? (temp_out[3] >> 16) : -((-temp_out[3]) >> 16);
assign out_data[4] = (temp_out[4][0] == 0) ? (temp_out[4] >> 16) : -((-temp_out[4]) >> 16);
assign out_data[5] = (temp_out[5][0] == 0) ? (temp_out[5] >> 16) : -((-temp_out[5]) >> 16);
assign out_data[6] = (temp_out[6][0] == 0) ? (temp_out[6] >> 16) : -((-temp_out[6]) >> 16);
assign out_data[7] = (temp_out[7][0] == 0) ? (temp_out[7] >> 16) : -((-temp_out[7]) >> 16);

endmodule
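To exercise the two modules together before moving to hardware, a small simulation testbench may help. The following is only a sketch under stated assumptions: the module name tb_MyFSL, the stimulus words 1 to 8, and the simplified FIFO model (the master side is never full, the slave side presents a new word after each read) are all hypothetical and are not part of the original design. It prints every word that MyFSL writes to the bus and makes no claim about the values.

```verilog
`timescale 1ns / 1ps

// Hypothetical testbench: names and stimulus values are illustrative only.
module tb_MyFSL;
    reg         clk      = 0;
    reg         rst      = 1;
    reg  [0:31] s_data   = 0;
    reg         s_exists = 0;
    wire        s_read;
    wire        m_write;
    wire [0:31] m_data;
    integer     sent     = 0;

    MyFSL dut
    (
        .FSL_Clk       (clk),
        .FSL_Rst       (rst),
        .FSL_S_Clk     (),
        .FSL_S_Read    (s_read),
        .FSL_S_Data    (s_data),
        .FSL_S_Control (1'b0),
        .FSL_S_Exists  (s_exists),
        .FSL_M_Clk     (),
        .FSL_M_Write   (m_write),
        .FSL_M_Data    (m_data),
        .FSL_M_Control (),
        .FSL_M_Full    (1'b0)      // the receiving FIFO is never full here
    );

    always #5 clk = ~clk;          // 100 MHz clock

    // Simplified slave-side FIFO: after each accepted read, present the
    // next word, and withdraw FSL_S_Exists after the eighth word.
    always @(posedge clk)
        if (!rst && s_read) begin
            sent <= sent + 1;
            if (sent == 7)
                s_exists <= 0;
            else
                s_data <= s_data + 1;
        end

    // Log every word that MyFSL writes to the master-side FIFO.
    always @(posedge clk)
        if (!rst && m_write)
            $display("t=%0t  FSL_M_Data = %0d", $time, $signed(m_data));

    initial begin
        repeat (4) @(posedge clk);
        rst      <= 0;
        s_data   <= 32'd1;
        s_exists <= 1;
        repeat (200) @(posedge clk);   // comfortably longer than one task
        $finish;
    end
endmodule
```

The run length of 200 cycles is chosen to exceed one complete task (read, send, the 64-cycle computation wait, receive, delay, write); a longer delay_of_computation or NUMBER_OF_DELAY would need a correspondingly longer run.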