MVPro interviews Jeff Bier, Founder of the Embedded Vision Alliance, about the possibilities of Deep Learning, the influence of this technology on computer vision, and the first Deep Learning training event in Germany, based on Google's open source framework, TensorFlow.
MVPro: Deep Learning seems to be the latest magic word in the computer vision industry. How would you briefly describe this technology?
Jeff Bier: Classical visual perception algorithms are hand-crafted by engineers for very specific tasks. For example, to identify certain types of objects, algorithm designers typically specify small features like edges or corners for the algorithm to detect. Then the algorithm designer specifies how groups of these small features may be used to identify larger features, and so on. Such approaches can work very well when the objects of interest are uniform and the imaging conditions are favorable, for example when inspecting bottles on an assembly line to ensure the correct labels are properly affixed.
But these approaches often struggle when conditions are more challenging, such as when the objects of interest are deformable, when there can be significant variation in appearance from individual to individual, and when illumination is poor. With major recent improvements in processors and sensors, a case can be made that good algorithms are now the bottleneck in creating effective “machines that see.”
Deep neural networks are a very different approach to visual perception (and not only to visual perception; they are used in many other fields as well). In essence, instead of “telling” our machines how to recognize objects (“first look for edges, then look for edges that might make circles”, etc.), with artificial neural networks it is possible to “train” algorithms by showing them large numbers of examples and using a feedback procedure that automatically adapts the functionality of the algorithm based on the examples.
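The training-by-examples idea can be sketched in a few lines of code. This is a minimal, hypothetical illustration (a single sigmoid node trained by gradient descent on toy data, written in NumPy rather than TensorFlow; real perception networks have many layers and vastly more parameters):

```python
import numpy as np

# Minimal sketch of the training "feedback procedure": a single sigmoid
# "neuron" whose weights are adapted from labeled examples by gradient
# descent. The data and the node itself are hypothetical toys.

rng = np.random.default_rng(0)

# Toy labeled examples: 2-feature inputs, label 1 when the features sum > 1.
X = rng.random((200, 2))
y = (X.sum(axis=1) > 1.0).astype(float)

w = np.zeros(2)   # the "weights" that training will determine
b = 0.0
lr = 0.5          # learning rate

for _ in range(2000):
    p = 1.0 / (1.0 + np.exp(-(X @ w + b)))  # sigmoid prediction
    grad = (p - y) / len(y)                 # cross-entropy gradient w.r.t. logits
    w -= lr * (X.T @ grad)                  # feedback: adapt the weights...
    b -= lr * grad.sum()                    # ...based on the examples

# Evaluate: the node has "learned" the rule without being told it explicitly.
p = 1.0 / (1.0 + np.exp(-(X @ w + b)))
accuracy = ((p > 0.5) == (y == 1.0)).mean()
```

No rule was ever programmed; the weights were adapted purely from the examples, which is the key contrast with hand-crafted algorithms.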
More specifically, convolutional neural networks are massively parallel algorithms made up of layers of simple computation nodes, or “neurons”. Such networks do not execute programs. Instead, their behavior is governed by their structure (what is connected to what), the choice of simple computations that each node performs, and coefficients or “weights” determined via a training procedure.
So rather than trying to distinguish dogs from cats via a recipe-like series of steps, for example, a convolutional neural network is taught how to categorize images by being shown a large set of example images. Three things make this approach very exciting right now:
1) For many visual perception tasks, deep neural networks are outperforming the accuracy of the best previously known techniques by significant margins.
2) The rate of improvement of accuracy of deep neural network algorithms for visual perception tasks is significantly faster than the rate of improvement we had previously seen with classical techniques.
3) With deep neural networks, we are able to use a common set of techniques to solve a wide range of visual perception problems. This is a big breakthrough compared to classical techniques, where very different types of algorithms are typically used for different tasks.
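As a conceptual sketch of the “simple computation nodes” and “weights” mentioned above, a single convolutional node is just a weighted sum over each small image patch. The code below is illustrative only, not TensorFlow API; the edge-detecting weights are hand-picked here, whereas a trained network would learn such filters automatically from example images:

```python
import numpy as np

def conv2d(image, kernel):
    """Valid-mode 2D convolution (cross-correlation, as CNNs use it):
    each output value is a weighted sum over one image patch."""
    kh, kw = kernel.shape
    ih, iw = image.shape
    out = np.zeros((ih - kh + 1, iw - kw + 1))
    for row in range(out.shape[0]):
        for col in range(out.shape[1]):
            out[row, col] = np.sum(image[row:row + kh, col:col + kw] * kernel)
    return out

# A tiny 5x5 "image": dark left half, bright right half (a vertical edge).
image = np.array([
    [0, 0, 1, 1, 1],
    [0, 0, 1, 1, 1],
    [0, 0, 1, 1, 1],
    [0, 0, 1, 1, 1],
    [0, 0, 1, 1, 1],
], dtype=float)

# Hand-picked weights of one edge-detecting node; in a real CNN these
# coefficients would be determined by the training procedure.
vertical_edge_kernel = np.array([
    [-1, 0, 1],
    [-1, 0, 1],
    [-1, 0, 1],
], dtype=float)

# The response is strongest where the kernel straddles the edge,
# and zero in the uniform region.
response = conv2d(image, vertical_edge_kernel)
```

Stacking many such nodes in layers, with weights set by training rather than by hand, is what turns this simple operation into the deep networks discussed here.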
MVPro: How can computer vision developers benefit from this technology?
Jeff Bier: Deep neural network techniques are showing excellent results for a wide range of visual perception tasks, from face and object recognition to optical flow. Even very difficult problems like lip reading are yielding to these algorithms. So, developers who are trying to solve challenging visual perception tasks will want to carefully consider deep neural network techniques.
MVPro: What are the applications or systems where the use of Deep Learning technologies opens up new markets for computer vision?
Jeff Bier: Previously, computer vision has mainly been successful in applications such as the inspection of manufactured items, where imaging conditions can be controlled and pass/fail criteria can easily be quantified. But there are numerous opportunities for machine perception where imaging conditions can’t be controlled, and where there’s big variation in the objects of interest.
Deep neural network techniques are particularly helpful in these cases. For example, it’s quite simple for a human to distinguish a strawberry from other kinds of fruit, but not so simple for an algorithm, considering the variations in size and shape of strawberries, which can be exacerbated by variations in camera angle, lighting, surrounding objects, etc. Similarly, for an automotive safety system, detecting pedestrians is very challenging because people come in different sizes, wear different clothing, have infinitely variable poses, etc.
MVPro: Google's open source framework TensorFlow™ is based on Deep Learning. According to the latest survey of the Embedded Vision Alliance, it is currently the most popular deep learning framework for computer vision, having left behind Caffe, OpenCV and others in popularity. What do you think are the reasons for this success?
Jeff Bier: I think that one reason for TensorFlow’s popularity is that Google is a leading technology company, and Google uses TensorFlow extensively itself. Engineers in other companies are eager to use the same technology that one of the industry leaders is using. The fact that TensorFlow is open source is also a big factor – there’s no cost to use it. In addition, TensorFlow is the first deep learning framework to emphasize efficient deployment of deep neural networks not only in data centers, but also in embedded and mobile devices.
MVPro: The Embedded Vision Alliance is offering the first TensorFlow training event in Germany in Hamburg on the 7th of September 2017. Who should attend this training and what is on the agenda?
Jeff Bier: This training is ideal for engineers working on all types of vision applications, creating algorithms and software for visual machine perception, e.g. in the industrial, medical, consumer, retail, public safety or automotive sectors, who want to come up to speed quickly on using TensorFlow for these applications. It’s also appropriate for managers who want to get a flavor of deep neural networks and TensorFlow. More generally, the training will be applicable to people working on all forms of “machines that see”, whether they are implementing visual perception in the cloud, in a PC, on mobile devices or in an embedded system.
The course will provide a hands-on introduction to the TensorFlow framework, with particular emphasis on using TensorFlow to create, train, evaluate and deploy deep neural networks for visual perception tasks. For more details about the agenda, I recommend visiting https://tensorflow.embedded-vision.com .
MVPro: Who will be the trainer at the Hamburg event?
Jeff Bier: The training will be presented by Douglas Perry, who is uniquely qualified for this role. He has presented dozens of professional training classes to engineers in the electronics industry over the past five years, and he has hands-on experience with TensorFlow deep neural networks.
In preparing the training content and hands-on exercises, Douglas is assisted by my colleagues at BDTI, who contributed to the creation of an earlier deep learning training class that was very well received by attendees.
MVPro: How will attendees benefit from the Hamburg event?
Jeff Bier: Attendees will benefit from accelerated learning of practical techniques using TensorFlow for visual perception applications. After the training, attendees will be ready to begin using TensorFlow productively in their work.
MVPro: How can people interested in attending register for the training?
Jeff Bier: We have prepared a web page with all information about the Hamburg and other training events at https://tensorflow.embedded-vision.com.
MVPro: Are attendees expected to already have an understanding of deep neural networks before attending the training?
Jeff Bier: Attendees will get the most out of the training if they are familiar with the basic concepts and terminology of deep neural networks. For attendees who require an introduction to deep neural network algorithms, the Embedded Vision Alliance will make available a two-hour video tutorial presentation online prior to the TensorFlow class at no additional cost.
MVPro: In 2011 you founded the Embedded Vision Alliance. What are the main tasks of that organization and why is it so actively driving Deep Learning technologies and the TensorFlow framework?
Jeff Bier: The Embedded Vision Alliance exists to facilitate the practical use of vision technology in all kinds of applications. We do this primarily through providing training and other educational resources for engineers and companies who are incorporating or want to incorporate visual perception into their devices, systems and applications. The Alliance also helps technology supplier companies, for example, suppliers of processors and sensors, to get the information and insights they need in order to succeed in vision markets.
Jeff Bier is founder of the Embedded Vision Alliance, an industry partnership formed to enable the market for embedded vision technology by inspiring and empowering design engineers to create more capable and responsive products through the integration of vision capabilities. Jeff is also co-founder and president of Berkeley Design Technology, Inc. (BDTI), a trusted resource for independent analysis and specialized engineering services in the realm of embedded digital signal processing technology. Jeff oversees BDTI’s benchmarking and analysis of chips, tools, and other technology. He is also a key contributor to BDTI’s consulting services, which focus on product development, marketing, and strategic advice for companies using and developing embedded digital signal processing technologies.