Fraud Analysis – Deep Learning ‘n’ Offline Signature Verification

Deep learning – An Introduction

In machine learning, you typically start with a set of features extracted from the data and then build models on top of them. In supervised learning, the data also comes with labels. In a real-world application, extracting those features can become very challenging. In autonomous driving, for instance, differences in people's driving patterns, weather, road conditions, illumination, and local roads versus highways make feature extraction extremely complicated.

In this day and age, it is hard to imagine living without web search engines. These search engines are becoming increasingly sophisticated and involve machine translation and document comprehension. Machine translation shows up in many other places as well.

Deep learning addresses the problem of learning higher-order abstractions. Because of the higher-level abstractions they are able to build, these models are becoming pervasive in our day-to-day lives. Deep learning builds complex, higher-level constructs out of simpler ones: it starts with the raw data, moves to simple constructs, and then to a range of higher-order constructs or abstractions. Deep learning is powered by deep neural networks (DNNs). It embodies many constructs that are inherently built into our biology, particularly the brain: deep neural networks loosely mimic the several layers found in the brain.

If you look at a cross section of a brain in images available on the internet, you can see different layers within the brain. Similarly, deep neural nets have multiple layers. Each layer learns a higher abstraction of the input it receives from the layer before it. Many practical deep neural nets have a large number of parameters.
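The layering idea can be illustrated with a minimal NumPy sketch. The layer sizes (784, 128, 64, 10), the random weights, and the use of ReLU at every layer are assumptions chosen only for illustration; they do not come from any particular network discussed here.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical layer sizes: a 784-value input passed through three layers.
layer_sizes = [784, 128, 64, 10]

# One (weights, bias) pair per layer, randomly initialized for the sketch.
params = [
    (rng.normal(0.0, 0.01, (n_in, n_out)), np.zeros(n_out))
    for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:])
]

def forward(x, params):
    """Feed x through every layer; each layer transforms the previous output."""
    activations = [x]
    for weights, bias in params:
        x = np.maximum(x @ weights + bias, 0.0)  # linear map followed by ReLU
        activations.append(x)
    return activations

x = rng.random(784)           # stand-in for a flattened input image
acts = forward(x, params)     # one activation vector per layer

# Total number of learnable parameters in this small network.
n_params = sum(weights.size + bias.size for weights, bias in params)
print([a.shape[0] for a in acts], n_params)
```

Even this toy network has more than a hundred thousand parameters, which hints at why practical deep networks need large amounts of data and computing power.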

These types of deep networks, which have a huge number of parameters, were previously difficult to build, largely because of the lack of large amounts of data to train them and of the computing capability needed to process that data and build models from it. With recent advances in computer science, such computing power and the handling of large data have become feasible.

With an increasing number of devices generating data, data is also becoming abundant, helping us build very complex models that mimic natural behaviors. As a result, autonomous driving, document comprehension, and speech recognition are becoming more and more common.

The application domains for deep learning are thus image and video processing, speech processing, and text. Combining these three is becoming increasingly common, driven by growing multi-modality and data coming from the Internet of Things.


A convolutional neural network (CNN, or ConvNet) is a type of feed-forward artificial neural network made up of neurons that have learnable weights and biases. CNNs take advantage of the spatial nature of the data. In nature, we perceive different objects by their shapes, sizes and colors. For example, objects in a natural scene are typically made up of edges, corners/vertices (defined by two or more edges), color patches, etc. These primitives are often identified using different detectors (e.g., edge detectors, color detectors) or combinations of detectors interacting to facilitate image interpretation (object classification, region-of-interest detection, scene description, etc.) in real-world vision tasks. These detectors are also known as filters. Convolution is a mathematical operator that takes an image and a filter as input and produces a filtered output (representing, say, the edges, corners, or colors in the input image). Historically, these filters were sets of weights that were often hand-crafted or modeled with mathematical functions (e.g., Gaussian / Laplacian / Canny filters). The filter outputs are mapped through non-linear activation functions, mimicking the brain cells called neurons.
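As a rough illustration of filtering followed by a non-linear activation, here is a minimal NumPy sketch. It applies a hand-crafted Sobel filter to a toy image and passes the result through ReLU. The toy image, the choice of the Sobel filter, and the function names are assumptions made for this example. The sketch uses the sliding-window form without flipping the kernel (cross-correlation), which is what most deep learning libraries compute under the name "convolution".

```python
import numpy as np

def filter2d(image, kernel):
    """Slide the kernel over the image (no padding) and sum the products."""
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

def relu(x):
    """Non-linear activation: keep positive responses, zero out the rest."""
    return np.maximum(x, 0.0)

# Hand-crafted Sobel filter that responds to dark-to-bright vertical edges.
sobel_x = np.array([[-1.0, 0.0, 1.0],
                    [-2.0, 0.0, 2.0],
                    [-1.0, 0.0, 1.0]])

# Toy 6x6 image: dark left half (0), bright right half (1).
image = np.zeros((6, 6))
image[:, 3:] = 1.0

edges = filter2d(image, sobel_x)   # strong response only around the edge
activated = relu(edges)
print(edges[0])                    # [0. 4. 4. 0.]
```

The filter output is zero in the flat regions and large where the window straddles the vertical edge, which is the behavior an edge detector is expected to have.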

Convolutional networks provide the machinery to learn these filters directly from the data instead of from explicit mathematical models, and they have been found to be superior (in real-world tasks) to historically hand-crafted filters. With convolutional networks, the focus is on learning the filter weights instead of learning individual, fully connected pair-wise weights between inputs and outputs. In this way, the number of weights to learn is reduced compared with traditional multi-layer perceptron (MLP) networks. In a convolutional network, one learns several filters, ranging from a few (single digits) to a few thousand, depending on the network complexity.
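The reduction in the number of weights can be made concrete with a small back-of-the-envelope calculation. The sizes below (a 28 by 28 single-channel image, 100 hidden units, 8 filters of 5 by 5) are illustrative assumptions, not figures from any specific network.

```python
def dense_params(n_inputs, n_outputs):
    """Fully connected layer: one weight per input/output pair plus one bias per output."""
    return n_inputs * n_outputs + n_outputs

def conv_params(kernel_h, kernel_w, in_channels, n_filters):
    """Convolutional layer: each filter has kernel_h * kernel_w * in_channels weights plus one bias."""
    return n_filters * (kernel_h * kernel_w * in_channels + 1)

# Hypothetical sizes, for illustration only.
dense = dense_params(28 * 28, 100)   # 78500 parameters
conv = conv_params(5, 5, 1, 8)       # 208 parameters
print(dense, conv)
```

Because the same small filter is reused at every image position, the convolutional layer needs far fewer parameters than a fully connected layer operating on the same input.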

Simple CNN image

Some of the popular convolutional networks (CNNs) are:

LeNet – This was the first successful convolutional neural net, developed by Yann LeCun in the 1990s, so convolutional neural nets have been around for a long time. It used a series of convolutional operations combined with sub-sampling, which reduces the size of the image by keeping only every n-th pixel; to down-sample by a factor of two, for example, you pick every other pixel. The result is then connected to a set of fully connected layers.

AlexNet – This was by far the network that revolutionized deep learning, made it popular, and brought it to the forefront. It was developed by Alex Krizhevsky, Ilya Sutskever and Geoffrey Hinton. The ImageNet challenge is a prediction challenge held every year on a large set of labeled images. In the 2012 challenge, AlexNet outperformed the state of the art by reducing the error from 26% to 16%. In the computer vision literature, this is a huge reduction. It was the first to introduce the use of deeper, bigger, stacked convolutional layers.

GoogLeNet – This was a later winner of the ILSVRC challenge, the same challenge that AlexNet had won two years before. GoogLeNet, by Szegedy et al. from Google, won it in 2014. It introduced the inception module. The key contribution of this module was a dramatic reduction in the number of parameters, from 60 million (in AlexNet) to 4 million. It used average pooling instead of fully connected layers.

VGGNet – This network, by Simonyan and Zisserman, was also very popular. It was the runner-up in the same year. It showed that the depth of the network is key to performance. It had 16 convolutional/fully connected layers and an extremely homogeneous architecture, with a repeating set of 3 by 3 convolutions and 2 by 2 pooling. It is more expensive to evaluate and requires a large amount of memory: it has roughly 140 million parameters, compared to 60 million for AlexNet. Most of the parameters, however, are in the fully connected layers. It was later found that removing these layers does not cause a significant drop in performance, which reduces the number of parameters in VGGNet.

ResNet – This was the winner of ILSVRC 2015, by Kaiming He et al. from Microsoft. It was the state of the art as of May 2016 and a default choice. The original implementation had 152 layers, and it introduced the concept of residual learning.
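The sub-sampling described for LeNet and the 2 by 2 pooling mentioned for VGGNet can both be sketched in a few lines of NumPy. The toy 4 by 4 input and the function names are assumptions for illustration; VGGNet uses max pooling, which is what the second function shows.

```python
import numpy as np

def subsample(image, factor=2):
    """Keep every factor-th pixel along each axis (plain sub-sampling)."""
    return image[::factor, ::factor]

def max_pool_2x2(image):
    """Take the maximum of each non-overlapping 2x2 block (even sizes assumed)."""
    h, w = image.shape
    return image.reshape(h // 2, 2, w // 2, 2).max(axis=(1, 3))

image = np.arange(16).reshape(4, 4)   # toy 4x4 "image" with values 0..15

print(subsample(image))      # [[ 0  2] [ 8 10]]
print(max_pool_2x2(image))   # [[ 5  7] [13 15]]
```

Both operations halve the height and width of the input; sub-sampling simply discards pixels, while max pooling keeps the strongest response in each block.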

In conclusion, CNNs are widely used in computer vision. Convolutions allow for deeper architectures, which improves performance considerably.

Learning Deep Learning and CNN

The massive open online course (MOOC) Microsoft: DAT236x Deep Learning Explained on edX is an excellent course in which deep learning concepts are explained in detail with CNTK, using Python in IPython notebooks. A typical hello-world problem covered throughout the course, based on OCR and the MNIST data, is described below:


Optical Character Recognition (OCR) is a hot area of research, and there is great demand for automation. The MNIST data comprises handwritten digits with little background noise, making it a nice dataset for creating, experimenting with, and learning deep learning models with reasonably small computing resources.

Sample CNN

Explained in detail:

Typical use case which was replicated:

Signature verification is used extensively in fraud analysis and is a common biometric technique that aims to detect whether a given signature is forged or genuine. It is integral to preventing the falsification of documents in numerous financial, legal, and other commercial settings. Here, signature verification using convolutional neural networks (CNNs) is explored and found to be well suited to the task.

Problem statement:

Biometric authentication is the process of verifying the identity of individuals based on their unique biological characteristics. It has become a ubiquitous standard for access to high security systems. Current methods in machine learning and statistics have allowed for the reliable automation of many of these tasks (face verification, fingerprinting and iris recognition). Among the numerous tasks used for biometric authentication is signature verification, which aims to detect whether a given signature is genuine or forged. Signature verification is essential in preventing falsification of documents in numerous financial, legal, and other commercial settings.

The task presents several unique difficulties: high intra-class variability (an individual’s signature may vary greatly day-to-day), large temporal variation (signature may change completely over time), and high inter-class similarity (forgeries, by nature, attempt to be as indistinguishable from genuine signatures as possible).

There exist two types of signature verification: online and offline. Online verification requires an electronic signing system which provides data such as the pen’s position, azimuth/altitude angle, and pressure at each time-step of the signing. By contrast, offline verification uses solely 2D visual (pixel) data acquired from scanning signed documents. While online systems provide more information to verify identity, they are less versatile and can only be used in certain contexts (e.g. transaction authorization) because they require specific input systems.
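The difference between the two kinds of input can be sketched as data structures. The class and field names below are hypothetical and chosen only to mirror the quantities listed above; real signing systems and datasets will use their own formats.

```python
from dataclasses import dataclass
from typing import List

import numpy as np

@dataclass
class PenSample:
    """One time-step of an online signature (hypothetical field names)."""
    t: float          # time stamp
    x: float          # pen position
    y: float
    azimuth: float    # pen angles
    altitude: float
    pressure: float

@dataclass
class OnlineSignature:
    """Online verification: a time series recorded by a signing device."""
    samples: List[PenSample]

@dataclass
class OfflineSignature:
    """Offline verification: only the scanned 2D pixel data is available."""
    pixels: np.ndarray   # 2D grayscale array, shape (height, width)

online = OnlineSignature(samples=[
    PenSample(t=0.00, x=10.0, y=20.0, azimuth=1.2, altitude=0.9, pressure=0.4),
    PenSample(t=0.01, x=10.5, y=20.3, azimuth=1.2, altitude=0.9, pressure=0.6),
])
offline = OfflineSignature(pixels=np.full((150, 250), 255, dtype=np.uint8))
print(len(online.samples), offline.pixels.shape)
```

The online record carries dynamics (timing, pressure, pen angles) that a scanned image cannot provide, which is why offline verification has to rely on visual features alone.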


In the CNN, the features fed into the final linear classifier are all learned from the dataset. The CNN consists of a number of layers, starting at the raw image pixels, each of which performs a simple computation and feeds the result to the next layer, with the final result being fed to a linear classifier. The layers' computations are based on a number of parameters that are learned through backpropagation: for each parameter, the gradient of the classification loss with respect to that parameter is computed, and the parameter is updated with the goal of minimizing the loss function.
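The gradient-based update can be illustrated on the simplest possible case. The sketch below trains only a linear classifier (logistic regression) on synthetic two-dimensional data; it is not the CNN described here, and the data, learning rate, and number of steps are assumptions. In a full CNN, backpropagation applies the same compute-gradient-then-update step to the parameters of every layer.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def loss_and_grads(w, b, X, y):
    """Cross-entropy loss of a linear classifier and its gradients w.r.t. w and b."""
    p = sigmoid(X @ w + b)
    eps = 1e-12                      # avoids log(0)
    loss = -np.mean(y * np.log(p + eps) + (1 - y) * np.log(1 - p + eps))
    err = p - y                      # derivative of the loss w.r.t. the logits
    grad_w = X.T @ err / len(y)
    grad_b = np.mean(err)
    return loss, grad_w, grad_b

# Synthetic stand-in data: two well-separated clusters labeled 0 and 1.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(-2.0, 0.5, (20, 2)), rng.normal(2.0, 0.5, (20, 2))])
y = np.concatenate([np.zeros(20), np.ones(20)])

w = np.zeros(2)
b = 0.0
learning_rate = 0.1                  # assumed value for the sketch
losses = []
for step in range(200):
    loss, grad_w, grad_b = loss_and_grads(w, b, X, y)
    losses.append(loss)
    w -= learning_rate * grad_w      # move each parameter against its gradient
    b -= learning_rate * grad_b

accuracy = np.mean((sigmoid(X @ w + b) > 0.5) == y)
print(losses[0], losses[-1], accuracy)
```

The loss recorded at each step should shrink as the parameters are repeatedly nudged in the direction that reduces it.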

Challenges & Learnings

When classifying whether a given signature was genuine or forged, we achieved an accuracy of 96%. Similar work has already been done, and existing models can be reused, as in Ref4 and Ref2 (see the references).

1. Storage – The deep neural network architecture, with 70 visible units and two layers of 70 hidden units each, was able to extract high-level representations of the images layer by layer. The images were cropped to a reasonable size (150 by 250).

2. Big data – Large numbers of signature specimens are needed as input.

3. Cost – The cost of training was very high and increased with the number of epochs and the number of hidden units. It was therefore only possible to test with 7 folders of signatures, although there is room for improvement.

4. GPU – Despite the great promise of deep learning technologies, future work will involve an extensive study of how to cope with the millions of parameters that need to be adjusted, in particular with the use of Graphics Processing Units (GPUs).
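The cropping step mentioned under Storage could look roughly like the following NumPy sketch. It assumes that "150 by 250" means 150 pixels high by 250 wide, that images are grayscale arrays, and that smaller scans are padded with a white background; the function name and these choices are illustrative and not taken from the original experiment.

```python
import numpy as np

def center_crop_or_pad(image, target_h=150, target_w=250, fill=255):
    """Return a (target_h, target_w) image: center-crop if larger, pad with `fill` if smaller."""
    out = np.full((target_h, target_w), fill, dtype=image.dtype)
    h, w = image.shape

    # Size of the region that is actually copied.
    copy_h = min(h, target_h)
    copy_w = min(w, target_w)

    # Top-left corner of that region in the source and in the output.
    src_top = max((h - target_h) // 2, 0)
    src_left = max((w - target_w) // 2, 0)
    dst_top = max((target_h - h) // 2, 0)
    dst_left = max((target_w - w) // 2, 0)

    out[dst_top:dst_top + copy_h, dst_left:dst_left + copy_w] = \
        image[src_top:src_top + copy_h, src_left:src_left + copy_w]
    return out

large_scan = np.zeros((200, 300), dtype=np.uint8)   # larger than the target
small_scan = np.zeros((100, 100), dtype=np.uint8)   # smaller than the target

print(center_crop_or_pad(large_scan).shape)   # (150, 250)
print(center_crop_or_pad(small_scan).shape)   # (150, 250)
```

Bringing every specimen to the same fixed size keeps the input dimension, and therefore the storage and training cost, constant across the dataset.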


  • Ref1: Microsoft: DAT236x Deep Learning Explained
  • Ref2:
  • Ref3: Offline Signature Verification with Convolutional Neural Networks, by Gabe Alvarez, Blue Sheffer, and Morgan Bryant
  • Ref4: Deep Learning Networks for Off-Line Handwritten Signature Recognition, by Bernardete Ribeiro, Ivo Gonçalves, Sérgio Santos, and Alexander Kovacec