Friday, January 4, 2013

Mess with reCAPTCHA - All you need to know about these time-wasters.

The weakness of reCAPTCHA lies in the fact that it does two jobs at once: it stops spam, and it digitizes non-digital texts (and maybe helps OCR the web). This means every captcha contains two words: one that the computer already knows and checks your answer against, and one that it hopes to digitize with your help. In other words, reCAPTCHA needs only one word of your captcha to be correct for the captcha to be accepted.
You will be given two words: [Real, Fake] or [Fake, Real]. Here "real" means the word the computer already knows, and "fake" means the scanned word it is trying to digitize. The fake word is unknown to the computer, so you can type anything in its place or leave it out entirely.
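To make the idea concrete, here is a toy model of the check as this post describes it. It is only a sketch and not reCAPTCHA's actual code: the function name `is_accepted` and the rule "accept if the known word appears anywhere in what was typed" are assumptions made for illustration.

```python
def is_accepted(control_word: str, typed: str) -> bool:
    """Toy model: accept when the known (control) word appears in the typed text.

    Assumption: the unknown word is never checked, so it may be anything
    or missing entirely, as described in the post.
    """
    tokens = typed.lower().split()
    return control_word.lower() in tokens


# The second word is ignored, whatever it is.
print(is_accepted("morning", "morning blah"))   # control word typed first
print(is_accepted("morning", "blah morning"))   # control word typed second
print(is_accepted("morning", "morning"))        # unknown word left out
print(is_accepted("morning", "evening blah"))   # control word wrong
```

In this model only a wrong control word leads to rejection, which is why spotting the fake word saves typing.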

1. The fake word is usually the one that is blurrier and harder to read, even if only by a little. Sometimes, however, it is the one that is unusually clean and easy to read, because the quality of scanned words varies greatly.
2. Real words usually use the same type of font throughout as they are computer generated, while the appearance and font for fake words can vary greatly as they are scanned from multiple sources.
3. The fake word is usually thicker, bolder, and blacker, but sometimes it is thin and long instead.
4. The letters of a fake word are more likely to sit on a straight line or a smooth curve, since the word is scanned from printed material. The letters of a real word are more likely to be wavy and a bit jumbled, since a computer has distorted them.
5. You'll sometimes get words with lots of noticeable dots around them. They are obviously scanned from books and are therefore fake.
6. The fake word came out of a book, so it usually looks like an actual word. Nonsense strings like "chiteHa" or "eriATV" are computer generated, so they are obviously real.
7. Practice! When you start out, you'll have difficulty identifying which word is real, but after doing a few dozen you'll be proficient at picking out fakes, and this will be a great time saver.
Important info from reCAPTCHA Science:
CAPTCHA = Completely Automated Public Turing test to tell Computers and Humans Apart
OCR = Optical Character Recognition software
To account for human error in the digitization process, reCAPTCHA sends every suspicious word to multiple users, each time with a different random distortion. At first, it is displayed as an unknown word. If a user enters the correct answer to the associated control word, the user's other answer is recorded as a plausible guess for the unknown word. If the first three human guesses match each other, but differ from both of the OCRs' guesses, then (and only then) the word becomes a control word in other challenges. In case of discrepancies among human answers, reCAPTCHA sends the word to more humans as an "unknown word" and picks the answer with the highest number of "votes," where each human answer counts as one vote and each OCR guess counts as one half of a vote (recall that these words all have been previously processed by OCR).
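
The voting rules quoted above can be sketched in a few lines of Python. This is only an illustration of the quoted description, not reCAPTCHA's implementation: the function names, the example words, and the tie-breaking behaviour (first answer to reach the top score wins) are assumptions.

```python
from collections import defaultdict

HUMAN_VOTE = 1.0  # each human answer counts as one vote
OCR_VOTE = 0.5    # each OCR guess counts as one half of a vote


def promote_to_control(human_guesses, ocr_guesses):
    """Return the word if the first three human guesses match each other
    and differ from every OCR guess; otherwise return None."""
    if len(human_guesses) < 3:
        return None
    candidate = human_guesses[0]
    first_three_agree = all(g == candidate for g in human_guesses[:3])
    if first_three_agree and candidate not in ocr_guesses:
        return candidate
    return None


def best_by_votes(human_guesses, ocr_guesses):
    """Tally weighted votes and return (word, score) for the top answer.

    Assumption: ties go to whichever answer was counted first.
    """
    votes = defaultdict(float)
    for guess in human_guesses:
        votes[guess] += HUMAN_VOTE
    for guess in ocr_guesses:
        votes[guess] += OCR_VOTE
    if not votes:
        return None
    return max(votes.items(), key=lambda item: item[1])


# Three humans agree and both OCR programs disagree: the word is promoted.
print(promote_to_control(["upon", "upon", "upon"], ["npon", "upan"]))
# Humans disagree: fall back to weighted voting.
print(best_by_votes(["upon", "upon", "npon"], ["npon", "upan"]))
```

In the second call "upon" collects 2.0 votes against 1.5 for "npon" (one human vote plus one OCR half vote), so two humans outweigh one human backed by one OCR guess.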

Important info from reCAPTCHA Security:

Our service also includes IP address filtering and detection. If we determine that a given IP address is successfully solving too many CAPTCHAs in a certain period of time, the address is immediately flagged for review. In addition, by providing CAPTCHA services to many customers we obtain a global view of spamming attacks, allowing us to react quickly to security threats.
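
The quote does not say how "too many CAPTCHAs in a certain period of time" is measured. One common way to do this kind of check is a sliding time window per IP address, sketched below. Everything here is hypothetical: the class name `SolveRateMonitor`, the threshold, and the window length are made up for illustration and say nothing about what reCAPTCHA really does.

```python
from collections import defaultdict, deque


class SolveRateMonitor:
    """Hypothetical sliding-window counter of successful solves per IP."""

    def __init__(self, max_solves: int, window_seconds: float):
        self.max_solves = max_solves
        self.window_seconds = window_seconds
        self._solves = defaultdict(deque)  # ip -> timestamps of recent solves

    def record_solve(self, ip: str, timestamp: float) -> bool:
        """Record one successful solve; return True if the IP should be flagged."""
        recent = self._solves[ip]
        recent.append(timestamp)
        # Drop solves that have fallen out of the time window.
        while recent and timestamp - recent[0] > self.window_seconds:
            recent.popleft()
        return len(recent) > self.max_solves


monitor = SolveRateMonitor(max_solves=3, window_seconds=60)
for second in range(5):
    flagged = monitor.record_solve("203.0.113.7", float(second))
    print(second, flagged)
```

With these made-up settings the fourth solve inside one minute trips the flag, while an address that solves slowly never does.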
