I have an idea for a small research paper; I wonder if you or one of your research members would be interested in helping develop and support this idea at any research center that you collaborate with.
The idea is basically about hash functions and file indexing for search engines, AI, media recognition, etc.
Hash functions such as SHA and MD5 compute the hash of a file's raw bytes. For example, if you have two files with identical visible contents that were created at different times, you can end up with two different hashes from the existing hash functions, because many formats (DOCX, PDF, JPEG, etc.) embed metadata such as creation timestamps inside the file bytes.
The idea is to parse each file format, read only its content bytes, and hash those. In case we have two different files with the same contents, we will end up with the same hash.
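The idea can be sketched with a toy example. The 16-byte timestamp header below is an invented stand-in for real format metadata (a real implementation would need a parser per format, e.g. for DOCX or PDF); it only illustrates how a raw-byte hash diverges while a content-only hash stays stable.

```python
import hashlib

# Toy file format (an assumption for illustration only): a 16-byte header
# holding a creation timestamp, followed by the actual content bytes.
def make_toy_file(timestamp: int, content: bytes) -> bytes:
    return timestamp.to_bytes(16, "big") + content

def raw_hash(data: bytes) -> str:
    # What SHA-256 gives you today: every byte is hashed, metadata included.
    return hashlib.sha256(data).hexdigest()

def content_hash(data: bytes) -> str:
    # Proposed scheme: skip the format metadata, hash only the content.
    return hashlib.sha256(data[16:]).hexdigest()

# Same paragraph, saved at two different times.
file_a = make_toy_file(1700000000, b"identical paragraph")
file_b = make_toy_file(1800000000, b"identical paragraph")

print(raw_hash(file_a) == raw_hash(file_b))          # False: timestamps differ
print(content_hash(file_a) == content_hash(file_b))  # True: same content
```

The only hard part in practice is the per-format content extraction; the hashing step itself is unchanged.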
This would make it easy to compare and contrast files during searches. It is basically like adding a special case to the search to enhance accuracy, similar to how quicksort implementations switch strategies: if the collection has fewer items than a threshold, a non-recursive algorithm such as insertion, selection, or bubble sort is used instead. Even though newer implementations use dual-pivot quicksort and merge sort as improvements for parallelism, the simple algorithm is still preferable for collections with few items.
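The quicksort analogy above can be sketched as follows; the threshold value of 16 is an assumption for illustration (real libraries tune this constant empirically):

```python
INSERTION_SORT_THRESHOLD = 16  # assumed cutoff, not a library-mandated value

def insertion_sort(a, lo, hi):
    # Simple non-recursive sort: fast for small ranges.
    for i in range(lo + 1, hi + 1):
        key = a[i]
        j = i - 1
        while j >= lo and a[j] > key:
            a[j + 1] = a[j]
            j -= 1
        a[j + 1] = key

def quicksort(a, lo=0, hi=None):
    if hi is None:
        hi = len(a) - 1
    # The "special case": small sub-ranges bypass recursion entirely.
    if hi - lo + 1 <= INSERTION_SORT_THRESHOLD:
        insertion_sort(a, lo, hi)
        return
    # Classic single-pivot partition for larger ranges.
    pivot = a[(lo + hi) // 2]
    i, j = lo, hi
    while i <= j:
        while a[i] < pivot:
            i += 1
        while a[j] > pivot:
            j -= 1
        if i <= j:
            a[i], a[j] = a[j], a[i]
            i += 1
            j -= 1
    quicksort(a, lo, j)
    quicksort(a, i, hi)
```

In the same spirit, a content hash would be a cheap special-case check run before the more expensive general search machinery.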
I want this paragraph to be findable on the web no matter which application it was written in or which file format it is stored in (PDF, DOCX, JPG, etc.).
If we hash the files with existing hash functions, we get a different hash for each format, even though the paragraph is the same.
To get an idea of what I mean, check this photo ME or this ME.
One of them is the same as the one in my LinkedIn profile: http://linkedin.com/in/mohammad-alkahtani-856973185
I uploaded the photo, but it was reformatted to conform to LinkedIn's style. When searching for the same photo with reverse image search, you will get many images; but if we used the content hash for both images, we would get the exact image, and the search engine could then apply AI (artificial intelligence) and ML (machine learning) to search for other images that have similarities.
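For images, the same idea means hashing the decoded pixels rather than the encoded file bytes. A minimal sketch, assuming we already have a decoder per image format and that re-encoding is lossless (for lossy re-encodings such as JPEG recompression, the pixels themselves change, and perceptual hashing is the established technique instead):

```python
import hashlib

def pixel_hash(pixels, width, height):
    # Hash the decoded pixel data (RGB triples) plus dimensions,
    # ignoring container metadata entirely.
    h = hashlib.sha256()
    h.update(f"{width}x{height}".encode())
    for r, g, b in pixels:
        h.update(bytes((r, g, b)))
    return h.hexdigest()

# Two hypothetical encodings of the same 2x1 image: same pixels, different
# container metadata, so their raw file bytes (and raw SHA-256) would differ.
pixels = [(255, 0, 0), (0, 0, 255)]
file_a_pixels = list(pixels)   # e.g. as decoded from the original upload
file_b_pixels = list(pixels)   # e.g. as decoded from the LinkedIn copy

print(pixel_hash(file_a_pixels, 2, 1) == pixel_hash(file_b_pixels, 2, 1))  # True
```

An exact pixel hash would find the identical image instantly; the AI/ML similarity search would then only need to run for near-matches.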