Archive for the ‘Unicode’ Category

X5S V2.0… it’s coming!

January 3rd, 2011 by

It’s been a while since we’ve made any public updates to X5S. Over the last year, I’ve improved the algorithm and process significantly. Be on the lookout: it should be released within the next couple of weeks (sometime in Jan. 11).

Some of the improvements include:
* Better algorithms for doing checks
* Better output format: it now uses a tree view, and better support for reporting is planned
* Cleaner UI (easier to use)
* Refactored the code to be cleaner, more sensible, and easier to maintain. It’s much easier to understand and work with; before, it was mostly prototype/alpha code.
* Changed how test cases are defined, for more control over the types of injections
* Added a fuzzing mode that takes data from a file and injects it where canaries would normally be injected (this can be slow with lots of injections)
* Added replay from a Fiddler capture (replays the capture while fuzzing/injecting on the requests)
* Many more minor and significant changes =)

Check back soon for a release date!

List of characters for testing Unicode transformations and best-fit mapping to dangerous ASCII

December 20th, 2010 by

I’m attaching two CSV files for use in test cases and tools. The uni2asc.csv file contains all of the Unicode characters that map to an ASCII character (< 0x80). The bestfit.csv file contains all of the known best-fit mappings to dangerous ASCII between legacy charsets and Unicode.

uni2asc.csv – for straight Unicode to Unicode mappings
bestfit.csv – for legacy charset to Unicode mappings

I gave these to Gareth so they may wind up in HackVertor.

The Unicode database contains metadata about every character, including compatibility mappings, normalization mappings, case mappings, and other decomposition data. For testing, it’s useful to know which special Unicode characters may transform to dangerous ASCII. For example:

  • U+2134 SCRIPT SMALL O will transform to U+006F LATIN SMALL LETTER O in certain cases
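As a minimal sketch of "in certain cases", the snippet below runs U+2134 through the four standard normalization forms using Python's `unicodedata` module. The helper name `normalization_results` is only illustrative. Only the compatibility forms (NFKC and NFKD) fold the character to a plain `o`; the canonical forms leave it alone.

```python
import unicodedata

SCRIPT_SMALL_O = "\u2134"  # U+2134 SCRIPT SMALL O

def normalization_results(ch):
    """Return the output of each standard normalization form for ch."""
    return {form: unicodedata.normalize(form, ch)
            for form in ("NFC", "NFD", "NFKC", "NFKD")}

results = normalization_results(SCRIPT_SMALL_O)
for form, out in results.items():
    # Show code points rather than glyphs so lookalikes are unambiguous.
    print(form, [f"U+{ord(c):04X}" for c in out])
```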

Of course, if you’re testing for SQL injection or XSS you probably want to know what transforms to dangerous characters like ' and <. We attempted to automate some of this in our x5s tool, which has done a good job so far, and we have a big update for that coming soon.
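One rough way to build such a list yourself is to scan every code point and keep the ones that normalize to the character you care about. This is only a sketch: it covers normalization (NFKC by default), not case mapping or legacy best-fit mappings, and the function name `chars_normalizing_to` is made up for this example. The results depend on the Unicode version bundled with your Python.

```python
import sys
import unicodedata

def chars_normalizing_to(target, form="NFKC"):
    """List non-ASCII characters whose normalization equals target."""
    hits = []
    for cp in range(0x80, sys.maxunicode + 1):
        ch = chr(cp)
        if unicodedata.normalize(form, ch) == target:
            hits.append(ch)
    return hits

for dangerous in ("'", "<"):
    for ch in chars_normalizing_to(dangerous):
        print(f"U+{ord(ch):04X} {unicodedata.name(ch, '?')} -> {dangerous!r}")
```

The scan walks the full code point range, so it takes a moment to run; caching the output in a file is the obvious next step.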

In the bestfit.csv file you’ll find all of the best-fit mappings from Unicode to dangerous ASCII < 0x80 (and vice versa) in many of the legacy charsets from http://unicode.org/Public/MAPPINGS/. There’s some wild legacy stuff in here. For example:

  • In APL-ISO-IR-68, 0x27 maps to 0x5D in Unicode, and vice versa.

If you put these to use anywhere please let me know so I can pass the word along.


Microsoft wins legal dispute over Bing.com IDN lookalike

November 3rd, 2010 by

A couple of years ago I tried registering IDNs (Internationalized Domain Names) that were visually identical or similar to popular sites like mozilla.org, bing.com, and google.com. What I found was that I wasn’t the only one doing this. For me, it was just to demonstrate the possibilities for visual spoofing in modern user-agents, similar to what we saw in 2005 with the paypal.com spoof.

I don’t think this recent legal decision made the news anywhere, but Microsoft filed a complaint that a registered domain name www.bıng.com was confusingly similar to its www.bing.com brand. In case it’s hard to see, the issue here is with the dotless ‘i’ in the lookalike domain. In that domain, the registrant used Unicode character U+0131 LATIN SMALL LETTER DOTLESS I in place of the usual U+0069 LATIN SMALL LETTER I in bing.com.
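To make the difference visible, the sketch below lists the code points in both names and converts each to its ASCII (Punycode) form with Python's built-in `idna` codec, which implements the older IDNA2003 rules. The variable and function names are just for illustration.

```python
import unicodedata

lookalike = "b\u0131ng.com"  # uses U+0131 LATIN SMALL LETTER DOTLESS I
genuine = "bing.com"

def describe(name):
    """Return a code point and character name for each character."""
    return [f"U+{ord(c):04X} {unicodedata.name(c)}" for c in name]

# The two strings render almost identically but encode differently.
lookalike_ascii = lookalike.encode("idna")
genuine_ascii = genuine.encode("idna")

print(describe(lookalike))
print(lookalike_ascii)
print(genuine_ascii)
```

The `xn--` prefix on the first result is what a registrar or resolver actually sees, even though a browser may display the Unicode form to the user.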

Microsoft won the case on valid merits, and as far as we know there was no harm done. That is, I haven’t heard any news of a phishing attack that utilized this domain name. It’s easy to imagine the extent of harm possible through a phishing/luring/schmoozing/whatever attack that utilizes confusing IDNs across the context of email clients, web browsers, and other user-agents. A well-thought-out attack could be surprisingly effective.

Detecting malicious URL obfuscation techniques in spam

October 12th, 2010 by

URLs offer loads of fun for pranks, hacks, and spam.  The reasons are numerous and inherent in their structural and visual complexity.  Add IDNs to the mix and the fun-factor just doubled.  But this isn’t about IDNs.  It’s recently been noted by Symantec that spammers are using the soft hyphen character to obfuscate URLs and bypass anti-spam filters.

It’s a neat trick that plays into the widely divergent implementation details of this specific character. In Unicode the soft hyphen is U+00AD, but its problematic handling in browsers and email clients stems from confusion over how it is specified in other character sets such as ISO-8859-1, as well as in HTML 4.
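Here is a small sketch of why the trick works against a naive substring filter, and one possible countermeasure: compare against a copy of the text with format characters (general category Cf, which includes U+00AD) removed. The blocklist entry and function names are hypothetical, and stripping every Cf character is a blunt assumption that is only reasonable for the comparison copy, not for the text you store or display.

```python
import unicodedata

BLOCKED = "spam.example"  # hypothetical blocklist entry

def strip_format_chars(text):
    """Drop characters in general category Cf (format), such as U+00AD."""
    return "".join(c for c in text if unicodedata.category(c) != "Cf")

def naive_filter(text):
    """Flag text that contains the blocked host verbatim."""
    return BLOCKED in text

def hardened_filter(text):
    """Flag text after removing invisible format characters."""
    return BLOCKED in strip_format_chars(text)

message = "Visit http://sp\u00adam.example/offer"  # soft hyphen inside the host
print(naive_filter(message), hardened_filter(message))
```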

The fun shouldn’t stop with soft hyphens though.  There seem to be many interesting ways content inspection filters could be bypassed using characters with special meanings and others with special transformative properties.  I haven’t taken the time to do any thorough testing here, but my IDN and IRI spoofing test page has some examples of what I’m talking about.  If you think of the test cases as plain string content instead of IDNs you can imagine some of the other ways which content filters might be confused.

Looking at the Normalization tests on that page one can see that valid Unicode characters like the Ⓞ get normalized (as hyperlinks) to a Latin small letter ‘o’ by Web browsers through a standard process defined by IDNA2003, namely stringprep with a nameprep profile applied.  That’s just the tip of the iceberg, and still more possibilities for abuse exist.
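Python's built-in `idna` codec follows the IDNA2003 rules (nameprep, then Punycode if needed), so it can serve as a quick stand-in for what a browser of that era does to a hostname. The hostnames below are made up for illustration.

```python
# "f" followed by two U+24C4 CIRCLED LATIN CAPITAL LETTER O characters
circled = "f\u24c4\u24c4.example"
circled_ascii = circled.encode("idna")

# Nameprep also maps the soft hyphen (U+00AD) to nothing.
hyphenated = "exam\u00adple.test"
hyphenated_ascii = hyphenated.encode("idna")

print(circled_ascii)
print(hyphenated_ascii)
```

A filter that compares the raw strings would treat both inputs as unknown hosts, while the resolver ends up looking up plain ASCII names.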

These issues are why we created the UCAPI library for detecting string confusability.  I wonder how many content inspection products are looking at strings in this way?

IDNA2008 hits the standards track – visually confusing strings remain a threat

August 31st, 2010 by

After many years of engineering effort, the Internationalizing Domain Names in Applications (IDNA) protocol received a major update to its original 2003 standard. Although named IDNA2008, it hit the standards track in August 2010. Section “4.4 Visually Confusable Characters” of RFC 5890 states:

It is worth noting that there are no comprehensive technical solutions to the problems of confusable characters. One can reduce the extent of the problems in various ways, but probably never eliminate it.

Taken out of context this may sound hopeless, but the RFC goes on to reference Unicode TR36 as providing a set of suggestions for mitigating string confusability. It’s in this vein that Casaba has built UCAPI which provides an implementation of the Unicode Consortium’s suggestions as well as defensive techniques from our own learnings.
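One of the simpler mitigations in this family is flagging labels that mix scripts. The sketch below is not UCAPI and not a real implementation of the Unicode script property: Python's standard library does not expose scripts, so it guesses from the first word of each character's name, which is a loud assumption. It is only meant to show the shape of the check.

```python
import unicodedata

def rough_scripts(label):
    """Guess the scripts in label from the first word of each character name."""
    scripts = set()
    for ch in label:
        if ch.isalpha():
            scripts.add(unicodedata.name(ch, "UNKNOWN").split()[0])
    return scripts

def looks_mixed_script(label):
    """Return True when letters from more than one guessed script appear."""
    return len(rough_scripts(label)) > 1

print(looks_mixed_script("p\u0430ypal"))  # second letter is CYRILLIC SMALL LETTER A
print(looks_mixed_script("b\u0131ng"))    # dotless i is still Latin
```

Note that the second label is not flagged: a same-script lookalike such as the dotless i slips past a mixed-script check, which is why confusable-character data is needed as well.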

I can imagine that we will one day see a wide-spread attack that leverages string confusability – or maybe – we won’t see it because it’ll blend in so well as to be undetectable.

New registrations of Internationalized Domain Names are expected to increase radically over time as ICANN has opened up ccTLD support for Unicode and IDN, as well as gTLD. As more TLDs become provisioned in native scripts, it’s expected that they will support the expansion of many more internationalized domain names.

What are registrars doing now to protect customers from lookalike attacks on their brand? Is it their responsibility? Whose is it? Many organizations including ICANN are making suggestions, but is anyone listening?

Unicode security vulnerabilities – presentation from Internationalization and Unicode Conference 33

October 20th, 2009 by

I'm attaching my slides from the Unicode conference last week in San Jose, California. I'm getting much feedback for code-level action items. Providing details for code review and static analysis is in the works, with a focus on major frameworks such as ICU, .NET, and Java.

You can download the presentation here.

Unibomber tool for specialized XSS testing

July 28th, 2009 by

John Hernandez has been working hard at Casaba to build a specialized testing tool that automates some of the unique techniques we use to find cross-site scripting (XSS) bugs. At Black Hat I'm planning to demo what we have so far. It automates the testing process greatly by auto-injecting a canary and ID into each input, be it query string, HTTP header, or POST parameter. By combining injection with 'output encoding' detection, you get automation that assists pen-testers in finding vulnerability hotspots.

Because it basically bombs a Web-app with a slew of Unicode characters to find XSS bugs we named it the Unibomber.

Appended to the canary is a special character – special because it can transform into a 'dangerous' character through normalization, casing, or best-fit mapping operations. So we end up injecting these special characters all over the place and then detecting where they get transformed and displayed as output.
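The idea can be sketched in a few lines. This is not the tool's actual code: the canary string, the probe table, and `simulated_app` (a stand-in for a web application that NFKC-normalizes its input before echoing it) are all assumptions made for illustration.

```python
import unicodedata

CANARY = "canary"  # hypothetical marker string

# Special character -> the dangerous ASCII character it may turn into.
PROBES = {"\uff1c": "<", "\uff07": "'", "\uff02": '"'}

def build_payloads(canary_id):
    """Return one payload per probe: canary + ID + special character."""
    return [f"{CANARY}{canary_id}{special}" for special in PROBES]

def find_transformations(canary_id, response_body):
    """Report probes whose dangerous ASCII form follows the canary in the output."""
    return [(special, ascii_char)
            for special, ascii_char in PROBES.items()
            if f"{CANARY}{canary_id}{ascii_char}" in response_body]

def simulated_app(value):
    """Stand-in for an application that normalizes input, then reflects it."""
    return "<p>You searched for " + unicodedata.normalize("NFKC", value) + "</p>"

response = simulated_app(f"{CANARY}0007\uff1c")
print(find_transformations("0007", response))
```

Tying each injection to an ID is what lets the same check work for persistent XSS: the canary can be recognized when it shows up later on a different page.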

The beauty is that we can find both reflected and persistent XSS bugs this way. It's not a one-click tool though; it is intended for use by an experienced person who knows how to find and exploit a clever XSS bug.

Anyone who looks for XSS will likely find some good bugs with the Unibomber. We sure have!

32nd Internationalization and Unicode Conference presentation on Exploiting Unicode-enabled Software

September 11th, 2008 by

I'm glad to have had the chance to present at the Unicode conference yesterday, and meet all the wonderful people there.
You can download the presentation slides here for Exploiting Unicode-enabled software.

 

Generating test cases for Unicode-enabled software

September 10th, 2008 by

When it comes to Unicode implementations, there’s a rich set of test cases to perform. Realizing it is the start. Automating it is the next step.

At a high-level Unicode-related security bugs can be categorized into the following root-causes:

Canonicalization

  • Interpreting non-shortest form (e.g. UTF-8 encoding trickery)
  • Other decoding issues
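As a concrete test case, the two bytes C0 BC are an overlong (non-shortest form) encoding of `<`. The sketch below contrasts a deliberately naive decoder, written only for this illustration, with Python's strict UTF-8 decoder, which rejects the sequence.

```python
OVERLONG_LT = b"\xc0\xbc"  # non-shortest-form encoding of U+003C

def naive_two_byte_decode(data):
    """Decode a 2-byte UTF-8 sequence with no shortest-form check (unsafe)."""
    lead, trail = data
    return chr(((lead & 0x1F) << 6) | (trail & 0x3F))

def strict_decode(data):
    """Return the decoded text, or None if the bytes are not valid UTF-8."""
    try:
        return data.decode("utf-8")
    except UnicodeDecodeError:
        return None

print(repr(naive_two_byte_decode(OVERLONG_LT)))
print(strict_decode(OVERLONG_LT))
```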

Absorption (over-consumption)

  • Over-consuming invalid byte sequences or correcting rather than failing
  • When <41 C2 C3 B1 42>  becomes <41 42>
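The byte sequence above makes a handy test vector. In this sketch, Python's decoder is used as a reference point: in strict mode it fails, and with `errors="replace"` it substitutes only the stray C2 byte and keeps the valid C3 B1 pair that follows, rather than swallowing it.

```python
data = bytes([0x41, 0xC2, 0xC3, 0xB1, 0x42])

def decode_or_reject(raw):
    """Return the decoded text, or None when the input is not valid UTF-8."""
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        return None

# Replacement mode marks the bad byte with U+FFFD instead of consuming neighbors.
replaced = data.decode("utf-8", errors="replace")

print(decode_or_reject(data))
print([f"U+{ord(c):04X}" for c in replaced])
```

A decoder under test that returns just `AB` for this input has consumed bytes it should not have.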

Character deletion and swallowing

  • “deletion of noncharacters” (UTR-36)
  • <scr[U+FEFF]ipt> becomes <script>
  • Use replacement characters instead!
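A short sketch of the `<scr[U+FEFF]ipt>` case: a filter runs first, then a later stage either deletes or replaces the character. The function names and the simplistic filter are assumptions for illustration only.

```python
def contains_script_tag(text):
    """Very simple filter: look for an opening script tag."""
    return "<script" in text.lower()

def delete_bom(text):
    """Unsafe cleanup: silently delete U+FEFF."""
    return text.replace("\ufeff", "")

def replace_bom(text):
    """Safer cleanup: substitute U+FFFD REPLACEMENT CHARACTER."""
    return text.replace("\ufeff", "\ufffd")

payload = "<scr\ufeffipt>alert(1)</scr\ufeffipt>"

print(contains_script_tag(payload))               # filter sees no script tag
print(contains_script_tag(delete_bom(payload)))   # deletion creates one
print(contains_script_tag(replace_bom(payload)))  # replacement does not
```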

Interpreting Syntax replacements

  • white space and line feeds
  • E.g. when U+180E acts like U+0020
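Whether a given character "acts like" a space depends on the component and on the Unicode version it was built against. U+180E in particular was reclassified from a space separator (Zs) to a format character (Cf) in Unicode 6.3, so different layers may disagree about it. The sketch below simply reports how the running Python treats a few candidates; the candidate list is an arbitrary selection.

```python
import re
import unicodedata

CANDIDATES = ["\u0020", "\u00a0", "\u180e", "\u2028", "\u3000"]

def whitespace_report(chars):
    """Report category, str.isspace(), and regex \\s behavior per character."""
    report = {}
    for ch in chars:
        report[f"U+{ord(ch):04X}"] = {
            "category": unicodedata.category(ch),
            "isspace": ch.isspace(),
            "regex_s": re.fullmatch(r"\s", ch) is not None,
        }
    return report

report = whitespace_report(CANDIDATES)
print("Unicode data version:", unicodedata.unidata_version)
for code_point, facts in report.items():
    print(code_point, facts)
```

Running the same report in each layer of a system (parser, filter, database, browser) is one way to spot where interpretations diverge.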

Best-fit mappings

  • When σ becomes s
  • When ′ becomes '
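Python's own codecs do not perform best-fit mapping, so the sketch below fakes a lossy converter with a two-entry table taken from the examples above. The table, the function names, and the sample input are hypothetical; a real test would load the vendor's best-fit data for the charset in question.

```python
# Hypothetical excerpt of a best-fit table (only the two examples above).
BEST_FIT = {"\u03c3": "s", "\u2032": "'"}

def best_fit_encode(text):
    """Mimic a lossy converter: keep ASCII, map known characters, else '?'."""
    return "".join(c if ord(c) < 0x80 else BEST_FIT.get(c, "?") for c in text)

def has_quote(text):
    """Stand-in for a filter that blocks single quotes."""
    return "'" in text

user_input = "x\u2032 OR 1=1--"  # U+2032 PRIME instead of an apostrophe

print(has_quote(user_input))                   # filter passes the input
print(has_quote(best_fit_encode(user_input)))  # conversion introduces the quote
```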

Buffer overruns

  • Incorrect assumptions about string sizes (chars vs. bytes)
  • Improper width calculations
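A few quick measurements show how the "size" of a string depends on what is being counted, and how normalization or case mapping can make a string longer than the buffer sized for the original. The helper name `sizes` is illustrative.

```python
import unicodedata

def sizes(text):
    """Count code points and encoded bytes for the same string."""
    return {
        "code_points": len(text),
        "utf8_bytes": len(text.encode("utf-8")),
        "utf16_bytes": len(text.encode("utf-16-le")),
    }

print(sizes("A"))
print(sizes("\u00f1"))      # LATIN SMALL LETTER N WITH TILDE
print(sizes("\U0001d4aa"))  # MATHEMATICAL SCRIPT CAPITAL O (outside the BMP)

# Transformations can also expand a string.
ligature = "\ufb03"  # LATIN SMALL LIGATURE FFI
expanded = unicodedata.normalize("NFKC", ligature)
print(len(ligature), len(expanded))
print(len("\u00df"), len("\u00df".upper()))  # sharp s upper-cases to two letters
```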

Timing issues

  • handling Unicode after security gates
  • Sometimes handling Unicode before a gate can be a problem too! E.g. BOM handling
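The ordering problem can be sketched with a toy gate and NFKC normalization. The gate below is a deliberately simplistic assumption; the point is only that the same payload passes or fails depending on whether normalization happens before or after the check.

```python
import unicodedata

def is_safe(text):
    """Toy security gate: reject angle brackets and single quotes."""
    return not any(c in text for c in "<>'")

def gate_then_normalize(text):
    """Risky order: check first, transform afterwards."""
    if not is_safe(text):
        raise ValueError("rejected")
    return unicodedata.normalize("NFKC", text)

def normalize_then_gate(text):
    """Safer order: transform first, then check the final form."""
    normalized = unicodedata.normalize("NFKC", text)
    if not is_safe(normalized):
        raise ValueError("rejected")
    return normalized

payload = "\uff1cscript\uff1e"  # fullwidth less-than and greater-than signs

print(gate_then_normalize(payload))  # slips through and becomes a real tag
try:
    normalize_then_gate(payload)
except ValueError as exc:
    print("blocked:", exc)
```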

Unicode formatter characters lead to cross-site scripting in popular browsers

September 5th, 2008 by

I'll be discussing some of the issues recently reported to Opera, Apple, and Mozilla at the 32nd Unicode Conference in San Jose next week. We discovered some issues with the way certain Unicode characters could be leveraged to enable cross-site scripting attacks in popular web browsers (aka User-Agents). These issues involve utilizing Unicode characters in ways which might bypass most filters, IPS, and IDS systems.