Tencent AI Lab Official Website

Papers

Target Foresight Based Attention for Neural Machine Translation

In neural machine translation, an attention model is used to identify the source words aligned to a target word (the target foresight word) in order to select the translation context, but it does not make use of any information about this target foresight word at all. Previous work proposed an approach to improve the attention model by explicitly accessing this target foresight word and demonstrated substantial gains on the alignment task. However, this approach cannot be applied to the translation task, in which the target foresight word is unavailable. In this paper, we propose a new attention model enhanced by the implicit information of the target foresight word, oriented to both alignment and translation tasks. Empirical experiments on Chinese-to-English and Japanese-to-English datasets show that the proposed attention model delivers significant improvements in terms of both alignment error rate and BLEU. ...

NAACL 2018 · 2018

Microblog Conversation Recommendation via Joint Modeling of Topics and Discourse

Millions of conversations are generated every day on social media platforms. With limited attention, it is challenging for users to select which discussions they would like to participate in. Here we propose a new method for microblog conversation recommendation. While much prior work has focused on post-level recommendation, we exploit both the conversational context and user content and behavior preferences. We propose a statistical model that jointly captures: (1) topics for representing user interests and conversation content, and (2) discourse modes for describing user replying behavior and conversation dynamics. Experimental results on two Twitter datasets demonstrate that our system outperforms methods that only model content without considering discourse. ...

NAACL 2018 · 2018

Encoding Conversation Context for Neural Keyphrase Extraction from Microblog Posts

Existing keyphrase extraction methods suffer from the data sparsity problem when applied to short and informal texts, especially microblog messages. Enriching context is one way to alleviate this problem. Considering that conversations are formed by reposting and replying to messages, they provide useful clues for recognizing essential content in target posts and are therefore helpful for keyphrase identification. In this paper, we present a neural keyphrase extraction framework for microblog posts that takes their conversation context into account, where four types of neural encoders, namely averaged embedding, RNN, attention, and memory networks, are proposed to represent the conversation context. Experimental results on Twitter and Weibo datasets show that our framework with such encoders outperforms state-of-the-art approaches. ...

NAACL 2018 · 2018

Directional Skip-Gram: Explicitly Distinguishing Left and Right Context for Word Embeddings

In this paper, we present directional skip-gram (DSG), a simple but effective enhancement of the skip-gram model. By introducing an additional vector to explicitly distinguish left and right context in word prediction, each word's embedding is learned not only from word co-occurrence patterns in its context, but also from the directions of its contextual words. Compared to other extensions of the skip-gram model, our model has lower complexity and therefore can be trained efficiently. Experimental results show that our model outperforms other models on different datasets in semantic and syntactic evaluations. ...
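As a rough illustration of the idea (not the paper's exact formulation; the variable names and the way the direction score gates the co-occurrence score are assumptions), a direction-aware context score might look like:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def dsg_score(v_w, out_c, dir_c, on_left):
    """Directional skip-gram scoring sketch: the usual skip-gram dot
    product v_w . out_c is modulated by a direction score
    sigmoid(+/- v_w . dir_c), with the sign flipped depending on
    whether context word c appears to the left or right of word w.
    (Names are illustrative, not the paper's notation.)
    """
    sign = -1.0 if on_left else 1.0
    f = v_w @ out_c                    # standard co-occurrence score
    g = sigmoid(sign * (v_w @ dir_c))  # direction-aware modulation
    return f * g
```

The same word pair thus scores differently depending on which side the context word appears, which is exactly the signal the plain skip-gram model discards.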

NAACL 2018 · 2018

APIReal: An API Recognition and Linking Approach for Online Developer Forums

When discussing programming issues on social platforms (e.g., Stack Overflow, Twitter), developers often mention APIs in natural language texts. Extracting API mentions from natural language texts serves as the prerequisite to effective indexing and searching for API-related information in software engineering social content. The task of extracting API mentions from natural language texts involves two steps: 1) distinguishing API mentions from other English words (i.e., API recognition), and 2) disambiguating a recognized API mention to its unique fully qualified name (i.e., API linking). Software engineering social content lacks consistent API mentions and sentence writing formats. ...

Empirical Software Engineering 2018 · 2018

NEURAL NETWORK LANGUAGE MODELING WITH LETTER-BASED FEATURES AND IMPORTANCE SAMPLING

In this paper we describe an extension of the Kaldi software toolkit to support neural-based language modeling, intended for use in automatic speech recognition (ASR) and related tasks. We combine the use of subword features (letter n-grams) and one-hot encoding of frequent words so that the models can handle large vocabularies containing infrequent words. We propose a new objective function that allows for training of unnormalized probabilities. An importance sampling based method is supported to speed up training when the vocabulary is large. ...
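To illustrate the importance-sampling idea in general terms (this is a generic sampled-softmax sketch, not Kaldi's actual implementation; the function name and proposal handling are illustrative), the partition function over a large vocabulary can be estimated from a handful of words drawn from a proposal distribution such as the unigram counts:

```python
import numpy as np

rng = np.random.default_rng(0)

def sampled_softmax_loss(logits, target, proposal, n_samples=64):
    """Approximate the cross-entropy over a large vocabulary by
    estimating the softmax normalizer with importance sampling:
    draw words from a proposal distribution and weight each sampled
    term by 1 / (n_samples * proposal[j]).
    """
    samples = rng.choice(len(logits), size=n_samples, p=proposal)
    weights = 1.0 / (n_samples * proposal[samples])
    z_hat = np.sum(weights * np.exp(logits[samples]))  # normalizer estimate
    return -(logits[target] - np.log(z_hat))
```

Only `n_samples` rows of the output layer are touched per example, which is what makes training tractable when the vocabulary is large.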

ICASSP 2018 · 2018

FEATURE BASED ADAPTATION FOR SPEAKING STYLE SYNTHESIS

Speaking style plays an important role in the expressivity of speech for communication, so speaking style is very important for synthetic speech as well. Speaking style adaptation faces the difficulty that the data of specific styles may be limited and difficult to obtain in large amounts. A possible solution is to leverage data from speaking styles that are more readily available to train the speech synthesizer, and then adapt it to the target style for which data is scarce. Conventional DNN adaptation approaches directly update the top layers of a well-trained, style-dependent model towards the target style. The detailed local context-level mismatch between the original and target styles is not considered. ...

ICASSP 2018 · 2018

ADAPTIVE PERMUTATION INVARIANT TRAINING WITH AUXILIARY INFORMATION FOR MONAURAL MULTI-TALKER SPEECH RECOGNITION

In this paper, we extend our previous work on direct recognition of single-channel multi-talker mixed speech using permutation invariant training (PIT). We propose to adapt the PIT models with auxiliary features such as pitch and i-vector, and to exploit the gender information with multi-task learning which jointly optimizes for the speech recognition and speaker-pair prediction. We also compare CNN-BLSTMs against BLSTM-RNNs used in our previous PIT-ASR model. The experimental results on the artificially mixed two-talker AMI data indicate that our proposed model improvements can reduce word error rate (WER) by 10.0% relative to our previous work for both speakers in the mixed speech. Our results also confirm that PIT can be easily combined with advanced techniques to improve the performance on multi-talker speech recognition. ...
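The core of PIT is the minimum-over-permutations loss: every assignment of model output streams to reference streams is scored, and only the best one contributes to training. A minimal sketch, assuming mean-squared error as the per-stream criterion (the actual system trains on senone-level criteria; names here are illustrative):

```python
import numpy as np
from itertools import permutations

def pit_loss(outputs, targets):
    """Permutation-invariant loss: for S output streams and S
    reference streams, score every output-to-reference assignment
    and keep the minimum total error.

    outputs, targets: arrays of shape (S, T, D) -- S streams,
    T frames, D features per frame.
    """
    S = outputs.shape[0]
    best_loss, best_perm = np.inf, None
    for perm in permutations(range(S)):
        loss = np.mean((outputs[list(perm)] - targets) ** 2)
        if loss < best_loss:
            best_loss, best_perm = loss, perm
    return best_loss, best_perm
```

For the two-talker case there are only 2 permutations to evaluate, so the overhead is negligible.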

ICASSP 2018 · 2018

KNOWLEDGE TRANSFER IN PERMUTATION INVARIANT TRAINING FOR SINGLE-CHANNEL MULTI-TALKER SPEECH RECOGNITION

This paper proposes a framework that combines teacher-student training and permutation invariant training (PIT) for single-channel multi-talker speech recognition. In contrast to most conventional teacher-student training methods, which aim at compressing the model, the proposed method distills knowledge from the single-talker model to improve the multi-talker model in the PIT framework. The inputs to the teacher and student networks are the single-talker clean speech and the multi-talker mixed speech, respectively. ...
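The distillation step can be sketched as frame-level cross-entropy against the teacher's soft posteriors. A minimal illustration (the interpolation with hard labels and the PIT assignment are omitted; names are illustrative):

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def distill_loss(student_logits, teacher_logits):
    """Teacher-student sketch: the student, fed the multi-talker
    mixture, is trained to match the frame-level posteriors the
    teacher produces from the corresponding clean single-talker
    speech (soft labels), via cross-entropy.
    """
    p_t = softmax(teacher_logits)        # teacher soft labels
    log_p_s = np.log(softmax(student_logits))
    return -np.mean(np.sum(p_t * log_p_s, axis=-1))
```

By Gibbs' inequality this loss is minimized exactly when the student posteriors match the teacher's, which is what lets the clean-speech model guide the mixed-speech model.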

ICASSP 2018 · 2018

Learning to Guide Decoding for Image Captioning

Recently, much progress has been made in image captioning, and the encoder-decoder framework has achieved outstanding performance on this task. In this paper, we propose an extension of the encoder-decoder framework by adding a component called the guiding network. The guiding network models the attribute properties of input images, and its output is leveraged to compose the input of the decoder at each time step. The guiding network can be plugged into the current...

2018 AAAI · Feb 2018

Discovering and Distinguishing Multiple Visual Senses for Polysemous Words

To reduce the dependence on labeled data, there have been increasing research efforts on learning visual classifiers by exploiting web images. One issue that limits their performance ...

2018 AAAI · Feb 2018

Reduced-Rank Linear Dynamical Systems

Linear Dynamical Systems (LDS) are widely used to study the underlying patterns of multivariate time series. A basic assumption of these models is that high-dimensional time series can be characterized by some underlying, low-dimensional and time-varying latent states. However, existing approaches to LDS modeling mostly learn the latent space with a prescribed dimensionality. When dealing with short-length high-dimensional time series data, such models are easily overfitted. We propose Reduced-Rank Linear Dynamical Systems (RRLDS) to automatically retrieve the intrinsic dimensionality of the latent space during model learning. ...

2018 AAAI · Feb 2018

Stochastic Non-convex Ordinal Embedding with Stabilized Barzilai-Borwein Step Size

Learning representations from relative similarity comparisons, often called ordinal embedding, has gained rising attention in recent years. Most existing methods are batch methods designed mainly based on convex optimization, e.g., the projected gradient descent method. ...
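The plain Barzilai-Borwein step size is computed from successive iterates and gradients. A minimal sketch (the absolute value in the denominator is one simple way to stabilize the step for non-convex objectives, and is an assumption here rather than necessarily the paper's exact stabilization):

```python
import numpy as np

def bb_step_size(x_prev, x_curr, g_prev, g_curr, eps=1e-12):
    """Barzilai-Borwein step-size sketch: alpha = s^T s / |s^T y|
    with s = x_k - x_{k-1} and y = g_k - g_{k-1}.  For a quadratic
    objective this recovers the inverse curvature along s.
    """
    s = x_curr - x_prev
    y = g_curr - g_prev
    return float(s @ s / (abs(s @ y) + eps))
```

Unlike a hand-tuned learning-rate schedule, this step size adapts automatically to the local curvature seen between consecutive iterates.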

2018 AAAI · Feb 2018

HodgeRank with Information Maximization for Crowdsourced Pairwise Ranking Aggregation

Recently, crowdsourcing has emerged as an effective paradigm for human-powered large-scale problem solving in various domains. However, a task requester usually has a limited budget, so it is desirable to have a policy that wisely allocates the budget to achieve better quality. In this paper, we study the principle of information maximization for active sampling strategies in the framework of HodgeRank, an approach based on Hodge decomposition of pairwise ranking data with multiple workers. ...

2018 AAAI · Feb 2018

Adaptive Graph Convolutional Neural Networks

Graph Convolutional Neural Networks (Graph CNNs) are generalizations of classical CNNs that handle graph data such as molecules, point clouds, and social networks. Current filters in graph CNNs are built for a fixed and shared graph structure. However, for most real data, graph structures vary in both size and connectivity. This paper proposes a generalized and flexible graph CNN that takes data of arbitrary graph structure as input. In this way, a task-driven adaptive graph is learned for each graph sample during training. To learn the graph efficiently, a distance metric learning approach is proposed. Extensive experiments on nine graph-structured datasets demonstrate superior performance in both convergence speed and predictive accuracy. ...
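The adaptive-graph idea of learning a distance metric and inducing an adjacency matrix from it can be sketched as follows (the Mahalanobis-style parameterization and Gaussian kernel are common choices for metric-based graph construction and are assumptions here, not the paper's exact construction):

```python
import numpy as np

def adaptive_adjacency(X, W, sigma=1.0):
    """Adaptive-graph sketch: a learnable matrix W induces the
    distance d(x_i, x_j) = ||W^T x_i - W^T x_j||^2; a Gaussian
    kernel over that distance yields a dense, task-driven adjacency
    matrix for the input point set X of shape (n, d).
    """
    Z = X @ W                             # project into the learned metric space
    d2 = ((Z[:, None, :] - Z[None, :, :]) ** 2).sum(-1)  # pairwise squared distances
    A = np.exp(-d2 / (2.0 * sigma ** 2))  # Gaussian-kernel affinities
    np.fill_diagonal(A, 0.0)              # no self-loops
    return A
```

Because `W` is trained jointly with the task loss, each input sample gets its own graph rather than a fixed, shared one.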

2018 AAAI · Feb 2018

Latent Sparse Modeling of Longitudinal Multi-dimensional Data

We propose a tensor-based model to analyze multidimensional data describing sample subjects. It simultaneously discovers patterns in features and reveals past temporal points having impact on current outcomes. The model coefficient, a k-mode tensor, is decomposed into a summation of k tensors of the same dimension. To accomplish feature selection, we introduce the tensor 'latent F-1 norm' as a grouped penalty in our formulation. ...

2018 AAAI · Feb 2018

Translating Pro-Drop Languages with Reconstruction Models

Pronouns are frequently omitted in pro-drop languages, such as Chinese, generally leading to significant challenges with respect to the production of complete translations. To date, very little attention has been paid to the dropped pronoun (DP) problem within neural machine translation (NMT). In this work, we propose a novel reconstruction-based approach to alleviating DP translation problems for NMT models. Firstly, DPs within all source sentences are automatically annotated with parallel information extracted from the bilingual training corpus. Next, the annotated source sentence is reconstructed from hidden representations in the NMT model. ...

2018 AAAI · Feb 2018

Improving Sequence-to-Sequence Constituency Parsing

Sequence-to-sequence constituency parsing casts the tree-structured prediction problem as a general sequential problem via top-down tree linearization, and thus it is very easy to train in parallel with distributed facilities. Despite its success, it relies on a general-purpose probabilistic attention mechanism, which cannot guarantee that the selected context is informative in the specific parsing scenario. ...

2018 AAAI · Feb 2018

Collaborative Filtering with User-Item Co-Autoregressive Models

Deep neural networks have shown promise in collaborative filtering (CF). However, existing neural approaches are either user-based or item-based, which cannot leverage all the underlying information explicitly. We propose CF-UIcA, a neural co-autoregressive model for CF tasks, which exploits the structural correlation in the domains of both users and items. The co-autoregression allows extra desired properties to be incorporated for different tasks. Furthermore, we develop an efficient stochastic learning algorithm to handle large scale datasets. We evaluate CF-UIcA on two popular benchmarks: MovieLens 1M and Netflix, and achieve state-of-the-art performance in both rating prediction and top-N recommendation tasks, which demonstrates the effectiveness of CF-UIcA....

2018 AAAI · Feb 2018

EAD: Elastic-Net Attacks to Deep Neural Networks via Adversarial Examples

Recent studies have highlighted the vulnerability of deep neural networks (DNNs) to adversarial examples - a visually indistinguishable adversarial image can easily be crafted to cause a well-trained model to misclassify. Existing methods for crafting adversarial examples are based on L2 and L∞ distortion metrics. However, despite the fact that L1 distortion accounts for the total variation and encourages sparsity in the perturbation, little has been developed for crafting L1-based adversarial examples. ...
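The standard way to handle an L1 term in an objective of the form c·f(x) + β‖x − x₀‖₁ + ‖x − x₀‖₂² is iterative shrinkage-thresholding (ISTA). A minimal sketch of one update, assuming the gradient of the smooth part is supplied by the caller (names are illustrative, not the paper's implementation):

```python
import numpy as np

def ista_step(x_adv, x_orig, grad, beta, lr=0.01):
    """One ISTA-style update for an elastic-net attack objective:
    take a gradient step on the smooth terms, then apply the
    element-wise soft-thresholding (shrinkage) operator to handle
    the L1 penalty, which drives small perturbations exactly to zero.
    """
    z = x_adv - lr * grad                # gradient step on the smooth part
    diff = z - x_orig
    shrunk = np.sign(diff) * np.maximum(np.abs(diff) - lr * beta, 0.0)
    return x_orig + shrunk               # soft-threshold toward the original image
```

The shrinkage operator is what produces sparse perturbations: any pixel whose perturbation falls below the threshold `lr * beta` is snapped back to its original value.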

2018 AAAI · Feb 2018