TG Daily Special: The bleeding edge of 3D face recognition

Posted by Rick C. Hodgin

Houston (TX) - Scientists at the University of Houston are researching an advanced form of 3D facial recognition.  They're hoping to prove this technology powerful enough to serve as a foundational security component, one possibly working alongside other biometrics.  More secure than PINs, faster and easier than swiping a card, and less invasive than being thumbprinted, 3D facial recognition systems could be the security breakthrough which keeps us all from being chipped.

TG Daily recently had the opportunity to visit the University of Houston to see state-of-the-art research in this field.  Dr. Ioannis Kakadiaris leads the team and was on hand with Ann Holdsworth from the office of communication to give me a guided tour through his research facility.  Dr. Kakadiaris' work is considered to be the bleeding edge of this technology discipline.  And although it is a young science, it could well solve the problem of identity theft and provide enhanced personal security.

Data points

Most facial recognition systems today rely on data obtained from 2D images.  Emerging 3D prototype recognition systems should arrive at a faster pace now that greater parallel compute ability exists.  In fact, in 2004 the U.S. government recognized this trend and issued a set of guidelines for the minimum requirements of facial recognition software.  These establish a baseline which all future offerings must equal or exceed.

The technology we observed is called "URxD", which stands for Ultimate Recognition in X Dimensions.  The name was chosen because Dr. Kakadiaris' technology can be extended to include additional data whenever new input abilities become available.  This does not limit it to 2D or 3D images; it can include even future, currently unknown technologies.  According to Dr. Kakadiaris, the basic rule behind this extensible idea is simple: The more input data is provided, the more accurate the system is.

Dr. Kakadiaris' prototype system considers many factors when determining if you are really you.  Today, these include a reconstructed data set of more than 120 computed factors from 3D landscapes and images.  The factors describe many facets relating to the front half of your head as observed by their image capture system.

Facial scanning

The idea behind facial recognition is quite interesting.  Basically, the biggest advantage is that no matter where you are, your face is also there.  Dr. Kakadiaris said no two people look exactly alike, and that's especially true when observed through the infrared spectrum.  Even identical twins have some differences.  His team is working on twin research right now, but he was unable to share his findings with us at the time of our visit because the tests were incomplete.

When his technology is combined with another very simple form of non-invasive biometrics, such as voice recognition, the ability to show your mug and talk could become a very powerful way of identifying yourself.  It would be a security mechanism which could not be stolen, and one that's always with you without being invasive.


2D, 3D and xD

Many existing facial recognition systems rely on 2D images.  They look for various data points visible through a 2D photograph.  The choice of 2D is often a matter of keeping costs down.  2D cameras and software algorithms are readily available at low cost, making them very attractive.  But they can be fooled.  For some of them, even holding up a large, quality photograph of someone can fool the system.

Recognition technologies include search criteria such as the distance between the eyes; eye socket height and width; eye angle; nose length and multi-point nose width; cheek position, size and shape; jawline size and shape; temple shape and depth; overall face shape; relative distances across and up/down the face; lip thickness, height, width, contour and texture; and about 60 more attributes and various combinations thereof.  When assembled and pieced together, these comprise a set of data points which is stored in a database and can be compared as necessary.
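
For the programmers in the audience, here is a rough sketch of what that database comparison boils down to: each face is reduced to a vector of numbers, and a probe is matched to the enrolled record it lies closest to.  The names, values and threshold below are all made up for illustration; the team's actual matching algorithms are proprietary.

    import numpy as np

    def match(probe, gallery, threshold=0.5):
        """Return the enrolled name whose feature vector lies closest
        to the probe, or None if nothing is within the threshold."""
        best_name, best_dist = None, float("inf")
        for name, template in gallery.items():
            dist = np.linalg.norm(probe - template)  # Euclidean distance
            if dist < best_dist:
                best_name, best_dist = name, dist
        return best_name if best_dist <= threshold else None

    # Each person is reduced to 120-plus numbers; only four shown here.
    gallery = {"alice": np.array([0.61, 0.32, 0.85, 0.44]),
               "bob":   np.array([0.58, 0.40, 0.71, 0.52])}
    print(match(np.array([0.60, 0.33, 0.84, 0.45]), gallery))  # -> alice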

Dr. Kakadiaris' research effort takes this process a few steps further by moving from just 2D data into xD data.  He describes each new form of data as providing "additional dimensions". When infrared is added, it increases the dimensions to 5D (3D + 2D for the infrared image).  When the full shape is considered, and not just 3D rendered texture data, he calls it 7D (2D texture + 2D infrared + 3D shape).  It is 8D if time is added.

Data points in 3D are no longer limited to just those generally observable from two dimensions.  With 3D, the entire surface structure of the face becomes a mine for data points.  That mine includes both textures and shapes.  In fact, when you see the full contour of your face shown on-screen in a movable manner, it's almost freaky how real it looks.  A technology that previously used only a flat image, with certain attributes known and understood from a single vantage point, becomes much more alive in 3D.  A raised image holds not only the same set of original attributes, but also a whole new set of attributes derived from the 3D facial landscape.

As additional data points are included, such as infrared and motion, an "average sample" can be constructed.  The database search then becomes much more representative of the real face as it is based on more data, all of which makes errors less pronounced.
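
To make the "average sample" idea concrete, here is a minimal sketch, assuming each capture has already been reduced to a fixed-length vector of attributes.  The numbers are invented; the point is only that averaging several captures dilutes the noise of any single one.

    import numpy as np

    samples = np.array([
        [0.61, 0.32, 0.85, 0.44],   # capture 1
        [0.63, 0.31, 0.83, 0.46],   # capture 2 (slightly different angle)
        [0.60, 0.33, 0.86, 0.43],   # capture 3 (different lighting)
    ])
    template = samples.mean(axis=0)  # per-attribute average across captures
    print(template)                  # the record that would be stored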

3D cameras

The imaging process begins with the subject moving into a field of view where the camera system captures the image.  The 3D camera system is a set of 2D cameras arranged at a series of specifically spaced positions.  When their multi-view 2D images are run through special software algorithms, they reconstruct an amazingly accurate 3D model of the whole front half of your head from a relatively small camera box (about the size of a large box of breakfast cereal).
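
The team's reconstruction algorithms are proprietary, but the classic principle behind multi-camera 3D capture is easy to sketch: two cameras a known distance apart see the same facial feature at slightly different positions, and that disparity yields depth.  All of the numbers below are invented for illustration.

    def depth_from_disparity(focal_px, baseline_m, x_left_px, x_right_px):
        """Depth Z = f * B / d for an idealized, rectified stereo pair."""
        disparity = x_left_px - x_right_px   # horizontal pixel shift
        return focal_px * baseline_m / disparity

    # A feature at pixel column 512 in the left image and 379 in the
    # right, with a 10 cm camera baseline and an 800-pixel focal length:
    z = depth_from_disparity(800.0, 0.10, 512.0, 379.0)
    print(f"{z:.2f} m")  # ~0.60 m: the face is about arm's length away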


Dr. Kakadiaris also has a type of stereoscopic 3D camera, a system that includes a pair of 3D cameras pointing toward the subject's face.  The two cameras are connected by a rod, providing a greater field of view.  Because that field of view is larger, the system can capture not only the front of your head, but much of the sides as well.

All of this collected data is used in the prototype.  It includes high-definition textures, a full-frame wire mesh showing the shape, as well as a complete landscaping or topography view visible via shading.  The images it produces of the captured face, when rendered and moved around in 3D, are quite stunning.

The capture system is not limited to just 3D, however.  The X-Dimension system also allows this data to be merged with other sources.  The most obvious is infrared which reads the thermal data from your face.  On the day we were there the infrared cameras had been loaned out to another department so that information was not available to us.  However, from Dr. Kakadiaris' stock material we can see what it would look like.  By rendering the captured data into the visible spectrum, a person's face takes on a whole new appearance.

The thermal data provides not only additional data points; it's really like a true facial fingerprint.  When combined with the 3D image, even if two people look almost identical on the outside, the aspects and qualities of their facial thermal terrain can be used to differentiate and identify them very easily.  This data set makes the process far more accurate than the 3D image data alone.  But it's not even limited to infrared.

Laser scans are also available as an alternate source of 3D input.  And, as mentioned, motion capture can also add additional data points which allow the recognition system to become more accurate.  Again, the more points sampled the more accurate.

Scan time

The process acquires the digital image data from the multiple cameras in 2 ms.  Anyone moving in front of the camera is automatically captured once the prototype realizes there is a human face in front of the system.  This actually caused a few test snapshots to occur before I was ready.  The 2 ms capture time includes both the 3D visible-light cameras and the infrared camera.
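
The auto-capture trigger is easy to picture in code.  Below is a minimal sketch using OpenCV's stock face detector; the team's own detection pipeline is proprietary, so this only illustrates the "snap when a face appears" pattern.

    import cv2

    # OpenCV's stock Haar-cascade face detector stands in here for the
    # prototype's proprietary detection step.
    detector = cv2.CascadeClassifier(
        cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

    cam = cv2.VideoCapture(0)      # first attached camera
    while True:
        ok, frame = cam.read()
        if not ok:
            break
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        faces = detector.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
        if len(faces) > 0:                        # a face entered the frame
            cv2.imwrite("capture.png", frame)     # snapshot, ready or not
            break
    cam.release()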

I was told the source for image data is really of no consequence.  Provided it adheres to a type of open standard they're currently developing, it could come from any source.  This could even include in-store video cameras or still-frame captures which are only 2D in nature.  The xD design of their system does not require 3D data, but can receive any 2D or greater data.

Once the data is received, the rendering from multi-point capture to 3D wireframe plus textures (and infrared, I'm told) takes about ten seconds on their prototype, a standard dual-core desktop PC running Windows XP Professional.  A real-time progress indicator shows the patterns as they're found, which actually makes the decode process quite interesting to watch.  The prototype displays this information, but future production systems would not even have to have a screen.  It could be as simple as a type of "red light, green light" camera box for all practical purposes.  As long as the light stays green, the ID of the person is known.

Scanning involves several filters.  Two of these are passes run across the data as the algorithms attempt to determine what's face, and what's not face.  The two passes are not literal reverse images of one another; there is actual logic involved in making the determination.  While those algorithms are proprietary, they do serve as the foundation for generating the 3D point data.  There are also several more algorithms used in the actual process.

Operations

The prototype they were using had the ability to do two things.  First, it could enroll someone in their database.  Second, it could look up a face scan against their database to see who it was.  For the prototype, enrollment involved clicking the "Enrollment" option and having your face scanned.  Your name and other personally identifying data could be input (though for our test we did not do this).  Lookup is even easier: clicking the "Identify" option and stepping in front of the camera performs a search.  Attempts at identification can be done again and again, each taking about 10 seconds on their prototype.
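
In code, the two operations reduce to something like the sketch below.  The features themselves stand in for the whole capture-and-render pipeline described earlier, and the nearest-neighbor lookup is my own simplification, not the team's algorithm.

    import numpy as np

    database = {}   # name -> enrolled feature vector

    def enroll(name, features):
        """The "Enrollment" option: store a scanned template under a name."""
        database[name] = features

    def identify(features, threshold=0.5):
        """The "Identify" option: search the database for the closest match."""
        if not database:
            return "unknown"
        best = min(database, key=lambda n: np.linalg.norm(database[n] - features))
        if np.linalg.norm(database[best] - features) <= threshold:
            return best
        return "unknown"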

Accuracy

The prototype process they have developed is quite accurate, even on my mug.  I had allowed my beard to grow to almost unseemly lengths in the hopes of fooling their system.  But I was unable to do so, except when I wore my glasses.  My beard geometry was simply introduced into their system as part of the shape of my face.  Even when I would contort my face it would still recognize me.  I tried puffing out my cheeks, raising one eyebrow, leaving my mouth open, etc.  In each case it found me.  Their 120-point data set is obviously sufficient to accommodate variable geometry.

The only problems it had were with my glasses.  If I enrolled in their database with my glasses on, and then took my glasses off, it would not recognize me.  If, however, I kept my glasses on, it would recognize me.

While their infrared camera wasn't available when I visited their facility, I was told that its data helps identify "live" and "dead" aspects of an image.  Items like my beard and glasses, which do not convey a lot of heat information relative to my warm skin, would be isolated and removed from the data stream due to their infrared signature.  This would allow a more accurate picture of my real face without regard to beard or glasses, thereby correcting the geometry.
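
A hedged sketch of that "live vs. dead" idea appears below: pixels whose thermal reading falls well below skin temperature (a glass lens, insulating beard hair) are masked out before the geometry is computed.  The threshold and readings are invented; the real pipeline is certainly more involved.

    import numpy as np

    def live_skin_mask(thermal_c, min_skin_c=30.0):
        """True where a pixel is warm enough to be living skin."""
        return thermal_c >= min_skin_c

    thermal = np.array([[34.1, 33.8, 21.5],    # 21.5 C: cold eyeglass lens
                        [33.9, 26.0, 34.2]])   # 26.0 C: insulating beard hair
    print(live_skin_mask(thermal))
    # [[ True  True False]
    #  [ True False  True]]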


Capture software and rendering

The system Dr. Kakadiaris and his team created allows any kind of capture front-end.  For the prototype they were using a system originally designed for the plastic surgery industry.  It was the tool responsible for taking the subject's image and converting it to 3D.  This allows a plastic surgery patient to see the effects of their surgery, before and after, from any angle.  The technology was quite robust, and while they didn't show what I would look like with an eye-tuck, they did show the motion of moving the captured data around in three dimensions.

Dr. Kakadiaris was bound by an agreement with the camera company not to give me any information about the cameras in use: their resolution, angles or really anything about them.  He did indicate, however, that any input devices which could convey digital camera info quickly could be used in a similar setup.  He did show us the actual camera boxes and allowed us to photograph them.


It is the research team's hope to reduce the size of the multi-camera capture device from its current large-cereal-box size down to something the size of an average set of desktop computer speakers.  Once miniaturized like that, it could be employed anywhere, and very discreetly.

Extensibility

One of the research effort's goals has been to make everything as extensible as possible by design.  This relates also to some of Dr. Kakadiaris' other research efforts in biomedical imaging.  Whereas traditionally these kinds of "more generic" data abilities often result in slower computation, the reality is that the acquisition and format conversions represent only a very small portion of compute time.

The real heft of the algorithm comes from the 3D interpolation of data, and it's in those areas that the extensible nature of their designs really weighs in.  And of course the advantage is that as new scanning techniques become available, they can be used almost instantly by such a system.

The captured data can be channeled after capture and rendering to external programs for additional manipulation.  The team demonstrating the software was able to get me a 3D Studio Max data set of one of the scans I made.  I can now look at myself in 3D on my computer.
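
I don't know which export path the team used, but handing a mesh to an external package is straightforward in principle.  Below is a minimal sketch writing vertices and triangles to Wavefront OBJ, a plain-text format most 3D tools (3D Studio Max included) can read; it is one plausible route, not necessarily theirs.

    def write_obj(path, vertices, faces):
        """Write a triangle mesh as a Wavefront OBJ file."""
        with open(path, "w") as f:
            for x, y, z in vertices:
                f.write(f"v {x} {y} {z}\n")
            for a, b, c in faces:                # OBJ indices start at 1
                f.write(f"f {a + 1} {b + 1} {c + 1}\n")

    # One triangle standing in for the thousands of points in a face scan:
    write_obj("face.obj",
              [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.5)],
              [(0, 1, 2)])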

Recognition success

In 2004, the FRGC (Face Recognition Grand Challenge) database was released with 4,007 facial maps from 466 different subjects in various forms, including 2D and 3D.  The subjects had both neutral and non-neutral expressions, with 1 to 22 images per subject.  These images came with a minimum benchmark for software recognition percentages.  Basically, any new software hoping to enter the facial recognition field must operate at least at these levels on that data set.

In 2007, Dr. Kakadiaris' URxD algorithms achieved 100% recognition in "face neutral" positions.  When people contorted their faces in some way, such as raising an eyebrow, puffing out their cheeks, or opening their mouth, the number dropped to 97%.
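
For context, a score like that is typically a rank-1 identification rate: the fraction of probe scans whose nearest enrolled template belongs to the correct person.  The toy computation below shows the bookkeeping; it is not FRGC's official scoring code.

    import numpy as np

    def rank1_rate(probes, labels, gallery):
        """probes: feature vectors; labels: true identities;
        gallery: dict of identity -> enrolled template."""
        hits = 0
        for vec, truth in zip(probes, labels):
            guess = min(gallery, key=lambda n: np.linalg.norm(gallery[n] - vec))
            hits += (guess == truth)
        return hits / len(probes)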

Those scores were notably better than any other contestant's in the challenge, and something the research team was very proud of.  There were actually about 10 different data sets computed against the baselines based on real results.  In every slide, the URxD algorithm exceeded the baseline by significant margins, with no recognition score below 95%, and that lowest score came from recognizing faces captured across an entire semester.

Passive biometric security

Unlike invasive methods, such as blood samples, or active methods like placing your thumb on a pad, the completely passive facial recognition system would allow many more people to use it without feeling "invaded".  This is a real component of the human psyche as we are quite resistant to people "getting in our business."  And since this technology uses something we already have with us, and something that is uniquely tied to us, and because it can act passively, it can serve very adequately as a form of ID without causing similar concerns.

The only time this kind of system would become more than passive is if people were wearing accessories which might prohibit a passive scan from obtaining their true facial geometry.  In such a case, the person to be identified might be required to remove the accessory to obtain a more accurate picture.  Accessories like these remain among the more common difficulties in facial recognition right now, though some have already been solved by Dr. Kakadiaris' algorithms.

Faking the system

One thing the team was quick to point out is that 3D facial recognition systems, especially those with infrared input data, are nearly impossible to fake.  Faking is one of the biggest problems with 2D facial recognition systems; some of them can be fooled into thinking a high-res photo held in front of the camera is actually a person.  With URxD, this limitation is overcome in several ways.

First, the infrared capture has a data resolution of less than 40 mK per pixel.  Such high-res data would be extremely difficult to fake: it would require making a full 3D mold or model with appropriate color, hair, eyelashes, etc., and then creating an appropriate heat landscape underneath.  It would be all but impossible to pull off on a live person (such as by wearing a mask).  And it would be rather obvious if you were holding up a fake head trying to gain access to something.

Second, and even more powerful than that, is the possibility of adding another dimension to the captured data: that of motion.  When motion is added, a 4D data set is captured.  By using pattern analyzing techniques, the software could determine if you were moving the way you normally move as compared to a previous sample.

Third, when coupled to a sample of your voice, the system would be complete.  Each person's voice is unique.  And whereas a computer can make mistakes due to algorithm limitations on small samples, as each new dimension of data is added and the sample set grows, the possibility of mistakes being made decreases greatly.
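
One plausible way to combine the two is simple score fusion, sketched below: each modality reports a match score between 0 and 1, and a weighted sum has to clear a threshold.  The weights and threshold are invented for illustration, not taken from URxD.

    def fused_decision(face_score, voice_score,
                       w_face=0.7, w_voice=0.3, threshold=0.8):
        """Accept the identity claim only if the combined evidence is strong."""
        return w_face * face_score + w_voice * voice_score >= threshold

    print(fused_decision(0.95, 0.60))  # 0.845 -> True: face and voice agree
    print(fused_decision(0.95, 0.20))  # 0.725 -> False: the voice disagrees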

Practical use

One of the most interesting aspects I found was that the system does not require a pose to acquire accurate data.  In fact, that's one of the design goals for the technology.  Dr. Kakadiaris referred to this as "cooperative and uncooperative" capture abilities.  The uncooperative label could apply to people who do not want to be captured, but more often it refers simply to the absence of a required pose.

Because the capture step occurs in 2 ms, the system could act passively during the normal course of activity in something like a checkout line, for example.  It would record the voice pattern and scan the face without having a specific, required pose.

In such a case, and provided there was no problem, the transaction would complete without any prompting for ID or anything else.  But if there was a problem, then something along the lines of a more straightforward pose and voiceprint would be requested, with other forms of ID possibly brought out if everything failed completely.  Still, if you think about this system, the maximum amount of difficulty or effort we would ever have using this technology would really be no more than what we all go through today:  "Can I see your ID?"  "Please scan here."  "Please sign here."  When it worked, it would be much easier.

How to store?

When I asked Dr. Kakadiaris how this technology could be employed by companies or governments, he gave me several possibilities.  But the truth is they are still researching which solution would be best for a given need.  Would it be better to have a centralized database with full image data?  That would be subject to theft.  Or what about a card system, with the data stored on the card and only part of the cipher stored locally?  Without all three pieces (the card, the merchant's portion, and the third part of the cipher), any single piece would be useless.  This option seems the most promising, though there are still details to be worked out.
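
That three-part cipher idea resembles classic secret splitting, where a key is XORed with random shares so that every piece is required to recover it.  The sketch below is my reading of the concept, not the scheme the team is actually evaluating.

    import secrets

    def split3(key):
        """Split a key into three shares; all three are needed to rebuild it."""
        a = secrets.token_bytes(len(key))                    # on the card
        b = secrets.token_bytes(len(key))                    # at the merchant
        c = bytes(k ^ x ^ y for k, x, y in zip(key, a, b))   # third party
        return a, b, c

    def recombine(a, b, c):
        return bytes(x ^ y ^ z for x, y, z in zip(a, b, c))

    key = secrets.token_bytes(16)
    a, b, c = split3(key)
    assert recombine(a, b, c) == key   # all three shares recover the key
    # Any share on its own (or any pair missing one) is statistically useless.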

What's really needed is reliability, accuracy and speed.  And right now this technology is too young to know what will work best on the commercial end.  While the technology itself works today, and will be made practical for commercial applications in the near future, it will likely be some time before it is used for something like a national ID card replacement system.  Still, there are so many advantages to this form of security that such a system is believed to be coming.

Thanks and other research efforts

I would like to thank the team at the University of Houston for their time and dedication, as well as Ms. Holdsworth from the office of communication for her excellent assistance.  All of it made for a most enjoyable visit.  The campus itself has a lot of history and I wish I could've spent more time taking it in.  There are some beautiful fountains and buildings and the back-to-school atmosphere was really contagious.


As I mentioned briefly above, Dr. Kakadiaris is also heading up additional research efforts in the field of biomedicine.  His imaging technology research used for facial recognition is actually quite multi-disciplinary, hence its extensibility.  What I saw of his early research in these other areas indicates some truly amazing things.  And while you won't find the University of Houston on any top 50 list of science academia, the research I had the opportunity to see is at the bleeding edge, the best in the field.

And, as is now becoming a very common theme in my interviews and visits, Dr. Kakadiaris wanted to make sure I acknowledged his team.  He made it very clear that it takes a village to raise a research effort.

Additional research

If you'd like to read more about the specific technologies involved in this research effort, visit some of these sites.

Dr. Kakadiaris' lab: www.cbl.uh.edu
URxD: www.cbl.uh.edu/URxD
FRGC: www.frvt.org/FRGC/
FRVT: www.frvt.org

Conclusion

If we consider the unique qualities of our face and voice, those things which are garnered from the outside just by us being us, then this type of recognition software technology will eventually have to come to light and become widespread.  It's just a matter of research, funding and the time it takes until the details are fleshed out, making this form of non-invasive technology a desirable alternative to being chipped.