Computer vision is a multidisciplinary field of study that aims to help computers read and comprehend digital images much as the human visual system does. Broadly speaking, it draws on artificial intelligence and machine learning. It is concerned with understanding visual content itself, not just written or catalogued information about an image or video (such as a man-made text description embedded in the file to help locate it in a computer system). Computer vision has been discussed in scientific communities since the 1960s, but for decades progress was slow, mainly because image analysis and context are very complex and the human visual system far outpaces any computational ability.
Recently, deep learning has allowed computer systems to analyze images far more effectively by training on large collections of example pictures. Over time, the system learns which details in the training images are useful for recognizing the same details in new images (image recognition). The overall goal of computer vision is for a computer to understand the details of an image and interpret or explain them to humans. Deep learning makes that goal more realistic, but computer vision is still far from where researchers would like it to be.
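The learn-from-examples loop described above can be illustrated with a deliberately tiny sketch: a linear classifier that learns to distinguish bright 8×8 "images" from dark ones. Everything here (the synthetic data, the learning rate, the number of training passes) is invented for illustration; real systems use deep convolutional networks trained on millions of labeled photographs, but the core idea of repeatedly showing examples, measuring error, and adjusting weights is the same.

```python
import math
import random

# Toy sketch, not a real vision system: a logistic-regression classifier
# learns to tell bright 8x8 "images" from dark ones. All data is synthetic.

random.seed(0)

def make_image(bright):
    """Return 64 pixel intensities clustered around 0.8 (bright) or 0.2 (dark)."""
    base = 0.8 if bright else 0.2
    return [min(1.0, max(0.0, base + random.gauss(0, 0.1))) for _ in range(64)]

# Labeled training set: label 1 = bright, 0 = dark.
data = [(make_image(True), 1) for _ in range(50)] + \
       [(make_image(False), 0) for _ in range(50)]

w = [0.0] * 64   # one weight per pixel -- the "details" the model learns
b = 0.0
lr = 0.5         # learning rate (chosen arbitrarily for this sketch)

def predict(x):
    score = sum(wi * xi for wi, xi in zip(w, x)) + b
    return 1.0 / (1.0 + math.exp(-score))   # probability the image is bright

# Training loop: show labeled examples, measure error, nudge the weights.
for _ in range(200):
    grad_w = [0.0] * 64
    grad_b = 0.0
    for x, y in data:
        err = predict(x) - y                 # cross-entropy gradient term
        for i, xi in enumerate(x):
            grad_w[i] += err * xi
        grad_b += err
    n = len(data)
    w = [wi - lr * gi / n for wi, gi in zip(w, grad_w)]
    b -= lr * grad_b / n

# Fraction of training images the learned weights now classify correctly.
accuracy = sum((predict(x) > 0.5) == (y == 1) for x, y in data) / len(data)
```

Because the two classes here are cleanly separated by overall brightness, even this one-layer model classifies them reliably; the difficulty of real computer vision lies in categories that no simple pixel-weighting can separate.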
Two main problems make computer vision challenging to implement. First, the visual world is inherently varied and complex: while the human brain quickly and involuntarily analyzes the smallest details of an image or other piece of visual media, computers do not. Second, computer vision is generally modeled on the human visual system, which even scientists do not yet understand well enough to recreate adequately.