Journal of Jinggangshan University (Natural Science)

Article Abstract

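Tang Pengjie, Tan Yunlan, Xu Kaisheng, Li Jinzhong. Image description based on multi-stage joint optimization of GoogLeNet [J]. Journal of Jinggangshan University (Natural Science), 2016, (5): 47-57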

Image Description Based on Multi-Stage Joint Optimization of GoogLeNet

MULTI-STAGE JOINTLY MODELING BASED ON GOOGLENET FOR IMAGE DESCRIPTION

Received: 2016-06-13  Revised: 2016-06-26

DOI: 10.3969/j.issn.1674-8085.2016.05.010

Keywords: image description; GoogLeNet; LSTM; multi-stage joint modeling

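Funding: 2015 Jiangxi Provincial Art Science Planning Project (YG2015081); 2015 Science and Technology Research Project of the Jiangxi Provincial Department of Education (GJJ150788); Open Fund of the Key Laboratory of Watershed Ecology and Geographical Environment Monitoring, National Administration of Surveying, Mapping and Geoinformation (WE2016015); Jinggangshan University Research Fund (JZ14012)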

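Author	Affiliation	E-mail
Tang Pengjie	School of Mathematics and Physics, Jinggangshan University, Ji'an, Jiangxi 343009; Key Laboratory of Watershed Ecology and Geographical Environment Monitoring (National Administration of Surveying, Mapping and Geoinformation), Jinggangshan University, Ji'an, Jiangxi 343009; Department of Computer Science and Technology, Tongji University, Shanghai 201804
Tan Yunlan	Key Laboratory of Watershed Ecology and Geographical Environment Monitoring (National Administration of Surveying, Mapping and Geoinformation), Jinggangshan University, Ji'an, Jiangxi 343009; School of Electronics and Information Engineering, Jinggangshan University, Ji'an, Jiangxi 343009; Department of Computer Science and Technology, Tongji University, Shanghai 201804	tanyunlan@163.com
Xu Kaisheng	Department of Computer Science and Technology, Tongji University, Shanghai 201804
Li Jinzhong	Key Laboratory of Watershed Ecology and Geographical Environment Monitoring (National Administration of Surveying, Mapping and Geoinformation), Jinggangshan University, Ji'an, Jiangxi 343009; School of Electronics and Information Engineering, Jinggangshan University, Ji'an, Jiangxi 343009; Department of Computer Science and Technology, Tongji University, Shanghai 201804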

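Abstract (translated from the Chinese):

Image description is the task of having a computer re-express the content of an image in natural language, and it is one of the most challenging problems in image understanding. Encoder-decoder frameworks that use a deep CNN to encode the image and an RNN to decode it have become a research hotspot and have achieved breakthroughs on several datasets. However, these works do not sufficiently optimize the CNN parameters, and they often train the system in separate stages, which leaves the overall system prone to local optima. To address these problems, this work builds on the GoogLeNet model and uses its intermediate features to add, from the bottom up, two auxiliary LSTM branches together with their supervision functions. Joint training optimizes the entire model, keeps the lower-layer CNN parameters effective for the task, and prevents the system from falling into a local optimum. In addition, the perturbation introduced by the lower-layer supervision functions gives the model extra regularization and improves its generalization ability. Experiments on the Flickr8K and Flickr30K datasets show that the proposed method has a clear advantage and surpasses other existing methods on several evaluation metrics.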

English abstract:

The goal of image description is to translate the content of an image into natural language with correct grammar and structure. The task is difficult and challenging, but promising. The most popular current approach combines a deep CNN model for encoding with an RNN for decoding, and it has achieved major breakthroughs on several datasets. However, these works tend to focus on modeling the natural language while neglecting further optimization of the CNN model. Moreover, the model is usually trained in separate stages, which makes it prone to getting stuck in a local optimum. In this work, the GoogLeNet model is used as the basis, and two auxiliary LSTM branches with their supervision functions are added on top of its intermediate features. Through joint training of the whole model, the parameters in the lower layers are sufficiently optimized and the local-optimum problem is overcome. In addition, the perturbation from the lower supervision functions provides extra regularization, which leads to stronger generalization ability. Experimental results on the public Flickr8K and Flickr30K datasets demonstrate that the proposed model is effective, exceeding most current popular approaches on BLEU, METEOR, and other metrics.
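As an illustration of the joint training idea, the sketch below combines the loss of a main caption branch with the losses of auxiliary branches into a single objective, so that one gradient step would update all branches and the shared CNN together. The abstract does not state the form of the supervision functions or how the branches are weighted; the per-word negative log-likelihood, the function names `caption_nll` and `joint_loss`, and the weight `aux_weight=0.3` are all assumptions made for this example, not the authors' implementation.

```python
import math


def caption_nll(step_probs, target_ids):
    """Mean negative log-likelihood of the target words for one branch.

    step_probs[t] is that branch's probability distribution over the
    vocabulary at time step t; target_ids[t] is the reference word id.
    """
    total = 0.0
    for probs, word_id in zip(step_probs, target_ids):
        total += -math.log(probs[word_id])
    return total / len(target_ids)


def joint_loss(main_probs, aux_probs_list, target_ids, aux_weight=0.3):
    """Main-branch loss plus down-weighted auxiliary-branch losses.

    aux_weight is a hypothetical hyperparameter (assumed, not from the source).
    """
    loss = caption_nll(main_probs, target_ids)
    for aux_probs in aux_probs_list:
        loss += aux_weight * caption_nll(aux_probs, target_ids)
    return loss


# Toy example: a 2-word vocabulary, a 2-word caption, two auxiliary branches.
target = [0, 1]
main = [[0.5, 0.5], [0.25, 0.75]]
aux_low = [[0.5, 0.5], [0.5, 0.5]]
aux_mid = [[0.5, 0.5], [0.5, 0.5]]
print(joint_loss(main, [aux_low, aux_mid], target))
```

Because the auxiliary terms are attached to intermediate features, their gradients reach the lower CNN layers directly, which is the mechanism the abstract credits for better low-layer optimization and extra regularization.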
