Assistant Professor, Department of Computer Engineering and Information Technology, Faculty of Engineering, Shahed University, Tehran, Iran
Abstract
Conventional methods for text localization in natural images face a large search space when they aim for detection that is accurate, correct, fast, and efficient, and the problem becomes intractable when big image data must be supported. The computational load can be reduced, with the goal of controlling the search space, resource consumption, and cost, by applying software techniques on multi-core hardware, grid structures, and cloud computing. Localizing and optically reading text at the paragraph, text-line, word, and character levels in big data greatly increases the complexity of the task. This paper reviews general non-learning methods for text localization in small-volume images as well as common methods for large-volume natural image data. It then presents a suitable model, equipped with a strategic image analyzer that uses intelligent agents and smart bots, for processing big image data so that robots can localize text, and it describes the model alongside several datasets and evaluation criteria.