
mithril-ntu.github.io
Daniel Liu
This is Daniel's Blog.

Thu, Jun 9, 2016. Sequence to Sequence Video to Text. This paper mainly proposes an end-to-end method to translate videos into text descriptions. It uses a CNN for feature extraction and an LSTM for encoding and decoding the features and word representations. The main framework of the S2VT system is shown below. Video and text representation. The hyper-parameter alpha is tuned on the validation set.

Wed, May 25, 2016. Deep Neural Networks for Acoustic Modelling in Speech Recognition.
http://mithril-ntu.github.io/