Launchpad.ai
Posted on May 7, 2017

Deep Learning Approach to Fraud

Data scientists rely on domain knowledge and intuition to create feature spaces for adversarial use cases like fraud and infosec. The hope is that feature spaces made up of business attributes and statistical summaries can be used for anomaly / outlier detection by delineating normal and extraordinary user behaviors (Hooi).

Hooi offers the following feature space for "online commerce, in which fraudulent sellers write or purchase fake reviews to manipulate perception of their products and services."

Once the feature space is defined, a variety of statistical and unsupervised machine learning approaches are available for anomaly detection including:

DBSCAN
MeanShift
Gaussian Mixture Model (GMM)
Principal components analysis (PCA)
One-class SVM
Z-Score and Median absolute deviation (MAD)
Hierarchical clustering
Hidden Markov Model (HMM)
Self-Organizing Maps (SOM)
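As a concrete illustration of the simplest entry in the list above, the following is a minimal sketch of outlier detection with a MAD-based (modified) z-score. The feature (`reviews_per_day`) and its values are hypothetical, and the 0.6745 scaling constant and 3.5 cutoff are a commonly used convention rather than anything prescribed by the cited papers.

```python
import statistics

def modified_z_scores(values):
    """Robust z-scores computed from the median and the median absolute deviation."""
    med = statistics.median(values)
    mad = statistics.median([abs(v - med) for v in values])
    if mad == 0:
        raise ValueError("MAD is zero; modified z-scores are undefined")
    # 0.6745 rescales the MAD so it is comparable to a standard deviation
    # when the data are roughly normal.
    return [0.6745 * (v - med) / mad for v in values]

def flag_outliers(values, threshold=3.5):
    """Return one boolean per value; True marks a suspected outlier."""
    return [abs(score) > threshold for score in modified_z_scores(values)]

# Hypothetical feature: reviews posted per day by ten accounts.
reviews_per_day = [2, 3, 1, 2, 4, 3, 2, 40, 3, 2]
flags = flag_outliers(reviews_per_day)
print(flags)
```

Because the median and MAD are barely affected by the single extreme account, only that account is flagged; a mean and standard deviation would be pulled toward it.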

However, when the anomaly scores of these methods are compared against fully labeled datasets, their detection rate turns out to be rather poor: measured as an F1 score, it is often less than 0.5 (Domingues).
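To make the evaluation concrete, here is a small sketch of how an F1 score can be computed once anomaly scores are thresholded and compared with ground-truth labels. The labels, scores, and the 0.5 threshold are made-up values for illustration only and are not taken from Domingues.

```python
def f1_score(y_true, y_pred):
    """F1 = harmonic mean of precision and recall for the positive (fraud) class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t and p)
    fp = sum(1 for t, p in zip(y_true, y_pred) if not t and p)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t and not p)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

# Hypothetical ground truth (1 = fraud) and anomaly scores from some detector.
labels = [0, 0, 1, 0, 1, 0, 0, 1, 0, 0]
scores = [0.1, 0.7, 0.9, 0.2, 0.3, 0.8, 0.1, 0.6, 0.2, 0.4]

# Assumed decision threshold: anything scoring 0.5 or more is flagged.
predictions = [s >= 0.5 for s in scores]
f1 = f1_score(labels, predictions)
print(round(f1, 3))
```

Because fraud is rare, F1 is more informative than accuracy here: a detector that flags nothing would be highly accurate but would score an F1 of zero.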

Sölch points out that one of the biggest drawbacks of such anomaly detection approaches is the common assumption "that data streams are i.i.d. in time and/or space." Hand-crafting a feature space that captures dependence across time or space, and so breaks this assumption, may be beyond the ability of even the most skilled practitioners.

In many other domains, such as computer vision, speech recognition, and language translation, feature learning has been observed to produce significantly better results than manually engineered inputs. However, these domains have the benefit of labeled datasets.

In fraud, labeled datasets are often hard or impossible to come by. Instead, autoencoders and other generative approaches are being studied as a possible mechanism for anomaly detection. An autoencoder is a neural network trained to reconstruct its original inputs after passing them through a smaller hidden space. Anomaly detection can be accomplished with an autoencoder by using the reconstruction probability or reconstruction error (An): inputs that the trained network reconstructs poorly are flagged as anomalous.
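The reconstruction-error idea can be sketched with a very small autoencoder written directly in NumPy. This is an illustrative toy under stated assumptions, not the architecture used by An and Cho: the data are synthetic (four features driven by two latent factors), the network is a single tanh hidden layer of two units, and the learning rate and step count are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic "normal" behavior: 4 features generated from 2 latent factors.
mix = np.array([[1.0, 0.0, 1.0, 1.0],
                [0.0, 1.0, 1.0, -1.0]])
z = rng.uniform(-1, 1, size=(300, 2))
X_normal = z @ mix + rng.normal(0, 0.02, size=(300, 4))

# Synthetic anomalies: plausible first two features, but the usual
# relationship to the last two features is reversed.
z_anom = np.array([[0.8, 0.8], [0.8, -0.8], [-0.8, 0.8], [-0.8, -0.8]])
X_anom = z_anom @ mix
X_anom[:, 2:] *= -1

# Autoencoder 4 -> 2 -> 4 with a tanh bottleneck and a linear decoder.
W1 = rng.normal(0, 0.1, size=(4, 2)); b1 = np.zeros(2)
W2 = rng.normal(0, 0.1, size=(2, 4)); b2 = np.zeros(4)

def reconstruct(X):
    H = np.tanh(X @ W1 + b1)
    return H, H @ W2 + b2

# Full-batch gradient descent on mean squared reconstruction error,
# trained on normal data only (no labels needed).
lr = 0.2
for _ in range(5000):
    H, X_hat = reconstruct(X_normal)
    grad_out = 2 * (X_hat - X_normal) / X_normal.size
    grad_pre = (grad_out @ W2.T) * (1 - H ** 2)
    W2 -= lr * H.T @ grad_out
    b2 -= lr * grad_out.sum(axis=0)
    W1 -= lr * X_normal.T @ grad_pre
    b1 -= lr * grad_pre.sum(axis=0)

def anomaly_score(X):
    """Per-sample mean squared reconstruction error."""
    _, X_hat = reconstruct(X)
    return np.mean((X - X_hat) ** 2, axis=1)

normal_scores = anomaly_score(X_normal)
anomaly_scores = anomaly_score(X_anom)
print(normal_scores.mean(), anomaly_scores.mean())
```

In practice a threshold on the score (for example, a high percentile of the scores on normal data) would turn it into a decision; that choice is an assumption that has to be tuned per application.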

Illustration of a stacked auto-encoder trained on high dimensional data (Berniker)

To capture the spatial and temporal aspects of the data within the autoencoder, more sophisticated neural network architectures can be used including convnets and recurrent neural networks (Xie).

In limited testing, An and Cho show a significant improvement in F1 score for a variational autoencoder over PCA. Thus, the hidden units that these autoencoders learn may be a promising way to model the ingenuity of an active adversary.
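For reference, the PCA baseline mentioned above can score anomalies in the same way, by projecting onto the top principal components and measuring what is lost. The sketch below assumes synthetic data and keeps two components; it illustrates the general technique and is not a reproduction of An and Cho's experimental setup.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical normal data: 4 features generated from 2 latent factors.
mix = np.array([[1.0, 0.0, 1.0, 1.0],
                [0.0, 1.0, 1.0, -1.0]])
z = rng.uniform(-1, 1, size=(300, 2))
X_train = z @ mix + rng.normal(0, 0.02, size=(300, 4))

# Fit PCA via SVD of the centered data and keep the top 2 components.
mean = X_train.mean(axis=0)
_, _, Vt = np.linalg.svd(X_train - mean, full_matrices=False)
components = Vt[:2]  # shape (2, 4)

def pca_reconstruction_error(X):
    """Per-sample mean squared error after projecting onto the kept components."""
    centered = X - mean
    projected = centered @ components.T @ components
    return np.mean((centered - projected) ** 2, axis=1)

X_test = np.array([
    [0.5, -0.2, 0.3, 0.7],    # follows the pattern seen in training
    [0.5, -0.2, -0.3, -0.7],  # same first two features, pattern broken
])
pca_scores = pca_reconstruction_error(X_test)
print(pca_scores)
```

PCA can only capture linear structure, which is one plausible reason a nonlinear autoencoder could do better on more complex data.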

References

An, Jinwon, and Sungzoon Cho. "Variational Autoencoder based Anomaly Detection using Reconstruction Probability." (2015).

Berniker, Max, and Konrad P. Kording. "Deep networks for motor control functions." Frontiers in computational neuroscience 9 (2015).

Domingues, Rémi. "Machine Learning for Unsupervised Fraud Detection." (2015).

Hooi, Bryan, et al. "BIRDNEST: Bayesian Inference for Ratings-Fraud Detection." arXiv preprint arXiv:1511.06030 (2015).

Sölch, Maximilian, et al. "Variational Inference for On-line Anomaly Detection in High-Dimensional Time Series." arXiv preprint arXiv:1602.07109 (2016).

Xie, Jianwen, et al. "A theory of generative convnet." arXiv preprint arXiv:1602.03264 (2016).

Tagged: Deep Learning, Fraud, CNN
Arshak Navruzyan
CEO
