The model is slow with the Analog Output and Analog Input blocks because the Analog Output block, which was designed to output a chunk of data at a time, is being given one sample at a time. This happens because Simulink is fundamentally sample-based. The equivalent in MATLAB code would be a for-loop that, in each iteration, reads one value from the 'output' variable, queues that scalar value, and calls 'startForeground'. Every transfer to the DAQ board carries overhead, so paying that cost once per sample slows the Simulink model down significantly.
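The loop described above can be sketched in MATLAB as follows. This is only an illustration, assuming the legacy session-based Data Acquisition Toolbox interface (the one that provides 'startForeground') and an NI device. The device ID 'Dev1', the channel 'ao0', the rate, and the test signal are placeholders, not values taken from your script. Whether a board accepts a single queued sample may also depend on the hardware.

```matlab
% Assumed setup: legacy session interface, NI board, placeholder IDs.
s = daq.createSession('ni');
addAnalogOutputChannel(s, 'Dev1', 'ao0', 'Voltage');
s.Rate = 1000;                       % samples per second (assumed)

% Placeholder signal standing in for the 'output' variable.
t = (0:999)' / s.Rate;
output = sin(2*pi*10*t);

% Sample-based, like the unbuffered Simulink model:
% one transfer to the board for every single sample.
for k = 1:numel(output)
    queueOutputData(s, output(k));   % queue one scalar value
    startForeground(s);              % blocks until that sample is written
end

% Chunk-based, for comparison: the whole vector in one transfer.
queueOutputData(s, output);
startForeground(s);
```

The first form pays the transfer overhead 1000 times, the second form pays it once.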
To get behavior in Simulink similar to what your script is doing, add a Buffer block ahead of the Analog Output block so that it receives a chunk of buffered data instead of single samples. This speeds up the model because data is sent to the DAQ board less frequently.
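As a rough MATLAB analogy for what buffering does, the sketch below splits a signal into fixed-size chunks with 'reshape' and counts how many transfers would be needed. It needs no hardware. The signal, the sample rate, and the chunk size of 100 are illustrative assumptions, and 'reshape' is only a stand-in for the Buffer block, not how the block is implemented.

```matlab
% Illustrative signal: 1 s of a 10 Hz sine sampled at 1 kHz (assumed values).
fs = 1000;
t = (0:fs-1)' / fs;
output = sin(2*pi*10*t);

% Chunk size, playing the role of the Buffer block's output buffer size.
frameSize = 100;
numFrames = floor(numel(output) / frameSize);

% One column per chunk; each column would go to the board in one transfer.
frames = reshape(output(1:numFrames*frameSize), frameSize, numFrames);

fprintf('%d samples -> %d transfers of %d samples each\n', ...
        numel(output), numFrames, frameSize);
```

With these values the per-sample approach would need 1000 transfers, while the chunked approach needs 10.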
There is also some delay and incorrect data, which come from the fact that the Analog Input and Analog Output blocks are in the same model. These blocks cannot execute at the same time when they are in the same model, so the Analog Output block sends some data, then the Analog Input block reads some data, and the two continue to alternate. In this case you may notice that the data being read in is not actually the same as the data that was sent out. To solve this, put the input into a separate model, as shown below:
Output Model:
Input Model:
The model now simulates much faster, and the data matches what you expect. With a buffer size of 100, for instance, here is what the Time Scope looks like:
The image shows the sinusoid, but with noise between each period. In this example the frequency was set to 10 Hz with a sample time of 0.001 s, so one period lasts 0.1 s, which is 100 samples. This is why each buffer of 100 samples is equivalent to one period. Each chunk of 100 samples that is sent to the DAQ board is guaranteed to be continuous, but because of the overhead of sending a chunk of data (as mentioned above), consecutive chunks are not continuous with each other. This is the reason for the noisy sections between chunks.
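The relationship between the buffer size and the signal period can be checked with a few lines of MATLAB. The numbers are the ones quoted above; the variable names are made up for this sketch.

```matlab
signalFreq = 10;      % Hz, from the example above
sampleTime = 0.001;   % s, from the example above
bufferSize = 100;     % samples per chunk, from the example above

% Samples needed to cover one period of the sinusoid.
samplesPerPeriod = round(1 / (signalFreq * sampleTime));

% Time covered by one buffer, and how many periods fit into it.
bufferDuration   = bufferSize * sampleTime;
periodsPerBuffer = bufferSize / samplesPerPeriod;

fprintf('Samples per period: %d\n', samplesPerPeriod);
fprintf('Buffer duration:    %g s\n', bufferDuration);
fprintf('Periods per buffer: %g\n', periodsPerBuffer);
```

Because one buffer holds exactly one period here, the discontinuities fall at period boundaries, which matches where the noise appears on the scope.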
To resolve this issue, set the buffer size large enough that only a few chunks need to be sent. This eliminates much of the overhead and leaves fewer boundaries between chunks where discontinuities can appear. Below is a screenshot with the buffer size set to 10000: