Bulk Extractor:
Bulk Extractor is used to locate potentially sensitive information such as email addresses and credit card numbers, as well as other types of information such as GPS coordinates and image file types. It ignores the file system and scans the media linearly. This, in combination with parallel processing, makes the tool very fast. Fragmented files can cause problems for this approach, but files are typically not fragmented.
bulk_extractor can be used on Windows, Linux, and Macintosh OS X platforms.
This page contains instructions for downloading, building and installing bulk_extractor on Linux and OS X, and for downloading and installing the bulk_extractor binary on Windows. If you would like to build your own Windows binary
bulk_extractor is a C++ program that scans a disk image, a file, or a directory of files and extracts useful information without parsing the file system or file system structures. The results are stored in feature files that can be easily inspected, parsed, or processed with automated tools. bulk_extractor also creates histograms of features that it finds, as features that are more common tend to be more important.
We have made the following tools available for processing feature files generated by bulk_extractor:
- A small number of Python programs that perform automated processing on feature files.
- A Bulk Extractor Viewer User Interface (BEViewer) for browsing features stored in feature files and for launching bulk_extractor scans. Please see page BEViewer.
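As a rough illustration of the kind of automated processing mentioned above, here is a minimal Python sketch that reads feature-file lines and builds a histogram of features. It assumes each non-comment line is tab-separated as offset, feature, context, and that comment lines start with `#`. The function names and the sample lines are hypothetical and are not part of the Python programs shipped with bulk_extractor.

```python
from collections import Counter


def parse_feature_lines(lines):
    """Yield (offset, feature, context) tuples from feature-file lines.

    Assumption: lines are tab-separated and comment lines begin with '#'.
    """
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment/banner lines
        parts = line.split("\t")
        if len(parts) < 2:
            continue  # not a well-formed feature line
        offset, feature = parts[0], parts[1]
        context = parts[2] if len(parts) > 2 else ""
        yield offset, feature, context


def feature_histogram(lines):
    """Count how often each feature value occurs."""
    return Counter(feature for _, feature, _ in parse_feature_lines(lines))


# Hypothetical sample lines, made up for illustration only.
sample = [
    "# hypothetical banner line",
    "1024\talice@example.com\tTo: alice@example.com",
    "2048\tbob@example.com\tFrom: bob@example.com",
    "4096\talice@example.com\tCc: alice@example.com",
]

hist = feature_histogram(sample)
for feature, count in hist.most_common():
    print(count, feature)
```

To run this against a real file such as email.txt, pass an open file object instead of the sample list. Opening the file with `encoding="utf-8-sig"` is a reasonable precaution in case it starts with a byte order mark.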
Installation Steps for Windows / Linux :
Output Feature Files
bulk_extractor now creates an output directory that has the following layout:
alerts.txt | Processing errors. |
ccn.txt | Credit card numbers. |
ccn_track2.txt | Credit card “track 2” information, which has previously been found in some bank card fraud cases. |
domain.txt | Internet domains found on the drive, including dotted-quad addresses found in text. |
email.txt | Email addresses. |
ether.txt | Ethernet MAC addresses found through IP packet carving of swap files and compressed system hibernation files and file fragments. |
exif.txt | EXIFs from JPEGs and video segments. This feature file contains all of the EXIF fields, expanded as XML records. |
find.txt | The results of specific regular expression search requests. |
identified_blocks.txt | Block hash values that match hash values in a hash database that the scan was run against. |
ip.txt | IP addresses found through IP packet carving. |
rfc822.txt | Email message headers including Date:, Subject: and Message-ID: fields. |
tcp.txt | TCP flow information found through IP packet carving. |
telephone.txt | US and international telephone numbers. |
url.txt | URLs, typically found in browser caches, email messages, and pre-compiled into executables. |
url_searches.txt | A histogram of terms used in Internet searches from services such as Google, Bing, Yahoo, and others. |
url_services.txt | A histogram of the domain name portion of all the URLs found on the media. |
wordlist.txt | A list of all “words” extracted from the disk, useful for password cracking. |
wordlist_*.txt | The wordlist with duplicates removed, formatted in a form that can be easily imported into a popular password-cracking program. |
zip.txt | A file containing information regarding every ZIP file component found on the media. This is exceptionally useful as ZIP files contain internal structure and ZIP is increasingly the compound file format of choice for a variety of products such as Microsoft Office. |
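A quick way to see which of the feature files listed above actually contain results is to count the non-comment lines in each one. The sketch below is a hypothetical helper, not part of bulk_extractor. It assumes the output directory holds plain `.txt` feature files whose comment lines start with `#`, and it uses a temporary directory with made-up contents in place of a real scan.

```python
import tempfile
from pathlib import Path


def summarize_output_dir(out_dir):
    """Map each .txt file name in out_dir to its number of non-comment lines."""
    summary = {}
    for path in sorted(Path(out_dir).glob("*.txt")):
        # utf-8-sig drops a leading byte order mark if one is present
        with path.open(encoding="utf-8-sig", errors="replace") as fh:
            summary[path.name] = sum(
                1 for line in fh if line.strip() and not line.startswith("#")
            )
    return summary


# Demo on a temporary directory with hypothetical contents.
with tempfile.TemporaryDirectory() as tmp:
    Path(tmp, "email.txt").write_text(
        "# comment\n10\ta@example.com\tctx\n20\tb@example.com\tctx\n",
        encoding="utf-8",
    )
    Path(tmp, "ccn.txt").write_text("# comment only\n", encoding="utf-8")
    summary = summarize_output_dir(tmp)

for name, count in summary.items():
    print(f"{name}: {count}")
```

Files with a count of zero can usually be skipped during review. Histogram files such as url_services.txt are counted the same way, one line per entry.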
Download Link :
http://downloads.digitalcorpora.org/downloads/bulk_extractor/
https://www.kazamiya.net/en/bulk_extractor-rec
https://github.com/simsong/bulk_extractor