<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Casaba Security &#187; Unicode</title>
	<atom:link href="http://www.casaba.com/blog/category/unicode/feed/" rel="self" type="application/rss+xml" />
	<link>http://www.casaba.com/blog</link>
	<description>Building and breaking software and robots</description>
	<lastBuildDate>Wed, 11 Jan 2012 18:08:45 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.3.1</generator>
		<item>
		<title>X5S V2.0&#8230;. its coming!</title>
		<link>http://www.casaba.com/blog/2011/01/x5s-v2-0-its-coming/</link>
		<comments>http://www.casaba.com/blog/2011/01/x5s-v2-0-its-coming/#comments</comments>
		<pubDate>Mon, 03 Jan 2011 16:59:22 +0000</pubDate>
		<dc:creator>John Hernandez</dc:creator>
				<category><![CDATA[Code Review]]></category>
		<category><![CDATA[Development]]></category>
		<category><![CDATA[Nebulous]]></category>
		<category><![CDATA[Security Testing]]></category>
		<category><![CDATA[Tools]]></category>
		<category><![CDATA[Unicode]]></category>

		<guid isPermaLink="false">http://www.casaba.com/blog/?p=233</guid>
		<description><![CDATA[So, it&#8217;s been a while since we&#8217;ve done any public updates to X5S. Over the last year, I&#8217;ve improved the algorithm and process significantly. Be on the lookout: it should be released within the next couple of weeks (sometime in January 2011). Some of the improvements include: * Better algorithms for doing checks * Better [...]]]></description>
			<content:encoded><![CDATA[<p>So, it&#8217;s been a while since we&#8217;ve done any public updates to X5S. Over the last year, I&#8217;ve improved the algorithm and process significantly. Be on the lookout: it should be released within the next couple of weeks (sometime in January 2011).</p>
<p>Some of the improvements include:<br />
* Better algorithms for doing checks<br />
* Better output format. It now uses a tree view, and better support for reporting is planned too.<br />
* Cleaner UI (easier to use)<br />
* Refactored the code to be cleaner, make more sense, and be easier to maintain. Before, it was mostly prototype/alpha code; now it&#8217;s much easier to understand and work with.<br />
* Changed how test cases are defined, for more control over the types of injections<br />
* Added a fuzzing mode that takes data from a file and injects it where canaries would normally be injected (this can be slow with lots of injections)<br />
* Added replay from a Fiddler capture (replays the capture while fuzzing/injecting into the requests)</p>
<p>* Many, many more minor and significant changes. =)</p>
<p>Check back soon for a release date!</p>
]]></content:encoded>
			<wfw:commentRss>http://www.casaba.com/blog/2011/01/x5s-v2-0-its-coming/feed/</wfw:commentRss>
		<slash:comments>5</slash:comments>
		</item>
		<item>
		<title>List of characters for testing Unicode transformations and best-fit mapping to dangerous ASCII</title>
		<link>http://www.casaba.com/blog/2010/12/list-of-characters-for-testing-unicode-transformations-and-best-fit-mapping-to-dangerous-ascii/</link>
		<comments>http://www.casaba.com/blog/2010/12/list-of-characters-for-testing-unicode-transformations-and-best-fit-mapping-to-dangerous-ascii/#comments</comments>
		<pubDate>Mon, 20 Dec 2010 18:54:58 +0000</pubDate>
		<dc:creator>Chris Weber</dc:creator>
				<category><![CDATA[Security Testing]]></category>
		<category><![CDATA[Unicode]]></category>
		<category><![CDATA[bestfit]]></category>
		<category><![CDATA[SQL injection]]></category>
		<category><![CDATA[XSS]]></category>

		<guid isPermaLink="false">http://www.casaba.com/blog/?p=239</guid>
		<description><![CDATA[I&#8217;m attaching two CSV files for use in test cases and tools. The uni2asc.csv file contains all of the Unicode characters that map to something in ASCII &#60; 0x80. The bestfit.csv file contains all of the known best-fit mappings to dangerous ASCII between legacy charsets and Unicode. uni2asc.csv &#8211; for straight Unicode to Unicode mappings bestfit.csv &#8211; for [...]]]></description>
			<content:encoded><![CDATA[<div>
<p>I&#8217;m attaching two CSV files for use in test cases and tools. The uni2asc.csv file contains all of the Unicode characters that map to something in ASCII &lt; 0x80. The bestfit.csv file contains all of the known best-fit mappings to dangerous ASCII between legacy charsets and Unicode.</p>
<p><a href="http://www.lookout.net/wp-content/uploads/2010/12/uni2asc.csv">uni2asc.csv</a> &#8211; for straight Unicode to Unicode mappings<br />
<a href="http://www.lookout.net/wp-content/uploads/2010/12/bestfit.csv">bestfit.csv</a> &#8211; for legacy charset to Unicode mappings</p>
<p>I gave these to Gareth so they may wind up in <a href="http://hackvertor.co.uk/public">HackVertor</a>.</p>
<p>The Unicode database contains meta data about every character, including compatibility mappings, normalization mappings, case mappings, and other decomposition data.  It&#8217;s useful for testing to know what special Unicode characters may transform to dangerous ASCII.  For example:</p>
<ul>
<li>U+2134 SCRIPT SMALL O will transform to U+006F LATIN SMALL LETTER O in certain cases (for example, under NFKC normalization)</li>
</ul>
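<p>As a rough way to reproduce the Unicode-to-ASCII side of this list, the Python sketch below walks every code point and keeps the non-ASCII ones that become a single ASCII character under NFKC normalization or simple case mapping. It is an approximation, not how uni2asc.csv was actually generated: it relies only on Python&#8217;s built-in <code>unicodedata</code> and <code>str.lower</code>/<code>str.upper</code>, so results depend on the Unicode version your Python ships with.</p>

```python
import sys
import unicodedata

# Transformations that can turn a special character into plain ASCII.
TRANSFORMS = {
    "NFKC": lambda s: unicodedata.normalize("NFKC", s),
    "lower": str.lower,
    "upper": str.upper,
}


def unicode_to_ascii_candidates():
    """Map each non-ASCII code point to the (transform, ASCII result) pairs it produces."""
    results = {}
    for cp in range(0x80, sys.maxunicode + 1):
        if 0xDFFF >= cp >= 0xD800:
            continue  # skip surrogate code points; they are not characters
        ch = chr(cp)
        for name, transform in TRANSFORMS.items():
            out = transform(ch)
            # Keep only results that are exactly one ASCII character.
            if len(out) == 1 and 0x80 > ord(out):
                results.setdefault(ch, []).append((name, out))
    return results
```

<p>Looking up <code>"\u2134"</code> in the returned dictionary should show the NFKC mapping to &#8220;o&#8221; from the example above; entries such as U+212A KELVIN SIGN (lowercases to &#8220;k&#8221;) show why case mapping belongs in the search too. The full scan takes a moment, so cache the result if you use it in a tool.</p>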
<p>Of course, if you&#8217;re testing for SQL injection or XSS you probably want to know what transforms to dangerous characters like the apostrophe (&#39;) and &lt;. We attempted to automate some of this in our <a href="http://xss.codeplex.com/">x5s tool</a>, which has done a good job so far, and we have a big update for that coming soon.</p>
<p>In the bestfit.csv file you&#8217;ll find all of the best-fit mappings from Unicode to dangerous ASCII &lt; 0x80 (and vice versa) in many of the legacy charsets from <a href="http://unicode.org/Public/MAPPINGS/">http://unicode.org/Public/MAPPINGS/</a>. There&#8217;s some wild legacy stuff in here. For example:</p>
<ul>
<li>
<div id="_mcePaste">In APL-ISO-IR-68, 0&#215;27 maps to 0x5D in Unicode, and vice versa.</div>
</li>
</ul>
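<p>The mapping files under unicode.org/Public/MAPPINGS are plain text, so entries like this one can be found mechanically. The Python sketch below assumes the common layout of one mapping per line (a hexadecimal legacy value, whitespace, a hexadecimal Unicode value, and an optional <code>#</code> comment) and flags entries where a dangerous ASCII character on either side maps to something different. The <code>DANGEROUS</code> set and the <code>SAMPLE</code> text are illustrative assumptions; the first sample line encodes the APL-ISO-IR-68 example above.</p>

```python
# Illustrative set of ASCII characters that matter for SQL injection and XSS.
DANGEROUS = set("'\"\x3c>&;")


def parse_mapping(text):
    """Parse lines like '0x27  0x005D  # NAME' into (legacy_value, code_point) pairs."""
    pairs = []
    for line in text.splitlines():
        fields = line.split("#", 1)[0].split()  # drop the comment, split on whitespace
        if len(fields) >= 2:
            try:
                pairs.append((int(fields[0], 16), int(fields[1], 16)))
            except ValueError:
                continue  # skip notations this sketch does not understand
    return pairs


def dangerous_mismatches(pairs):
    """Return pairs where a dangerous ASCII character on one side maps to something else."""
    hits = []
    for legacy, cp in pairs:
        if legacy == cp:
            continue  # identity mapping, nothing surprising
        legacy_bad = 0x80 > legacy and chr(legacy) in DANGEROUS
        unicode_bad = 0x80 > cp and chr(cp) in DANGEROUS
        if legacy_bad or unicode_bad:
            hits.append((legacy, cp))
    return hits


SAMPLE = "0x27\t0x005D\t# assumed APL-ISO-IR-68 entry\n0x41\t0x0041\t# LATIN CAPITAL LETTER A\n"
print([(hex(a), hex(b)) for a, b in dangerous_mismatches(parse_mapping(SAMPLE))])
```

<p>Lines with no Unicode value (undefined bytes) are skipped because they have fewer than two fields.</p>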
<p>If you put these to use anywhere please let me know so I can pass the word along.</p>
</div>
]]></content:encoded>
			<wfw:commentRss>http://www.casaba.com/blog/2010/12/list-of-characters-for-testing-unicode-transformations-and-best-fit-mapping-to-dangerous-ascii/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Microsoft wins legal dispute over Bing.com IDN lookalike</title>
		<link>http://www.casaba.com/blog/2010/11/microsoft-wins-legal-dispute-over-bing-com-idn-lookalike/</link>
		<comments>http://www.casaba.com/blog/2010/11/microsoft-wins-legal-dispute-over-bing-com-idn-lookalike/#comments</comments>
		<pubDate>Wed, 03 Nov 2010 19:52:25 +0000</pubDate>
		<dc:creator>Chris Weber</dc:creator>
				<category><![CDATA[Unicode]]></category>
		<category><![CDATA[confusables]]></category>
		<category><![CDATA[IDN]]></category>

		<guid isPermaLink="false">http://www.casaba.com/blog/?p=228</guid>
		<description><![CDATA[A couple years ago I tried registering IDNs (Internationalized Domain Names) that were visually identical or similar to popular sites like mozilla.org, bing.com, and google.com. What I found was that I wasn&#8217;t the only one doing this. For me, it was just to demonstrate the possibilities for visual spoofing in modern user-agents, similar to what [...]]]></description>
			<content:encoded><![CDATA[<p>A couple of years ago I tried registering IDNs (Internationalized Domain Names) that were visually identical or similar to popular sites like mozilla.org, bing.com, and google.com.  What I found was that I wasn&#8217;t the only one doing this.  For me, it was just to demonstrate the possibilities for visual spoofing in modern user agents, similar to what we saw in 2005 with the paypal.com spoof.</p>
<p>I don&#8217;t think this recent legal decision made the news anywhere, but Microsoft filed a complaint that a registered domain name, <a href="http://www.bıng.com">www.bıng.com</a>, was <a href="http://domains.adrforum.com/domains/decisions/1305319.htm">confusingly similar</a> to its <a href="http://www.bing.com">www.bing.com</a> brand.  In case it&#8217;s hard to see, the issue here is with the dotless &#8216;i&#8217; in the lookalike domain.  In that domain, the registrant used Unicode character U+0131 LATIN SMALL LETTER DOTLESS I in place of the usual U+0069 LATIN SMALL LETTER I in bing.com.  </p>
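<p>A quick way to surface this kind of lookalike is to list every non-ASCII code point in a host name with its character name, and to look at the punycode (xn--) form that actually goes over the wire. The Python sketch below does both with the standard library; note that Python&#8217;s built-in <code>idna</code> codec implements IDNA2003, not IDNA2008.</p>

```python
import unicodedata


def non_ascii_report(hostname):
    """List each non-ASCII code point in a hostname with its Unicode character name."""
    return [
        (f"U+{ord(ch):04X}", unicodedata.name(ch, "UNKNOWN"))
        for ch in hostname
        if ord(ch) > 0x7F
    ]


def to_ace(hostname):
    """ASCII-compatible (xn--) form via Python's built-in IDNA 2003 codec."""
    return hostname.encode("idna")


host = "www.b\u0131ng.com"
print(non_ascii_report(host))
print(to_ace(host))
```

<p>The report names the dotless i explicitly, which is much easier to spot than the rendered glyph.</p>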
<p>Microsoft won the case on valid merits, and as far as we know there was no harm done.  That is, I haven&#8217;t heard any news of a phishing attack that utilized this domain name.  It&#8217;s easy to imagine the extent of harm possible through a phishing/luring/schmoozing/whatever attack that uses confusing IDNs across email clients, web browsers, and other user agents.  A well-planned attack could be surprisingly effective.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.casaba.com/blog/2010/11/microsoft-wins-legal-dispute-over-bing-com-idn-lookalike/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Detecting malicious URL obfuscation techniques in spam</title>
		<link>http://www.casaba.com/blog/2010/10/detecting-malicious-url-obfuscation-techniques-in-spam/</link>
		<comments>http://www.casaba.com/blog/2010/10/detecting-malicious-url-obfuscation-techniques-in-spam/#comments</comments>
		<pubDate>Tue, 12 Oct 2010 20:12:19 +0000</pubDate>
		<dc:creator>Chris Weber</dc:creator>
				<category><![CDATA[Malware]]></category>
		<category><![CDATA[Unicode]]></category>
		<category><![CDATA[confusables]]></category>
		<category><![CDATA[spam]]></category>
		<category><![CDATA[UCAPI]]></category>

		<guid isPermaLink="false">https://www.casabasecurity.com/blog/2010/10/detecting-malicious-url-obfuscation-techniques-in-spam/</guid>
		<description><![CDATA[URLs offer loads of fun for pranks, hacks, and spam.&#160; The reasons are numerous and inherent in their structural and visual complexity.&#160; Add IDNs to the mix and the fun-factor just doubled.&#160; But this isn’t about IDNs.&#160; It’s recently been noted by Symantec that spammers are using the soft hyphen character to obfuscate URLs and [...]]]></description>
			<content:encoded><![CDATA[<p>URLs offer loads of fun for pranks, hacks, and spam.&#160; The reasons are numerous and inherent in their structural and visual complexity.&#160; Add IDNs to the mix and the fun-factor just doubled.&#160; But this isn’t about IDNs.&#160; It’s recently been noted by Symantec that spammers are using the <a href="http://www.symantec.com/connect/blogs/soft-hyphen-new-url-obfuscation-technique">soft hyphen character to obfuscate URLs</a> and bypass anti-spam filters.</p>
<p>It’s a neat trick that plays into the widely divergent <a href="http://www.cs.tut.fi/~jkorpela/shy.html">implementation details of this specific character</a>.&#160; In Unicode the soft hyphen is U+00AD, but the problems with how browsers and email clients handle it stem partly from confusion over how it is specified elsewhere, such as in the ISO-8859-1 character set and in HTML 4.&#160; </p>
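<p>To see why a soft hyphen can slip past a filter, consider a naive blocklist that compares host names as plain strings. The Python sketch below (the blocklist and host names are hypothetical) shows the naive check missing a name with an embedded U+00AD, and one possible mitigation: remove characters in the Unicode format category (Cf), which includes the soft hyphen, before matching. This is a sketch of one step only; a real filter would need to consider other transformations as well.</p>

```python
import unicodedata

BLOCKLIST = {"example.com"}  # hypothetical blocked host


def naive_is_blocked(host):
    """Plain string comparison, as a simple content filter might do."""
    return host.lower() in BLOCKLIST


def strip_format_chars(s):
    """Remove format (Cf) characters such as U+00AD SOFT HYPHEN."""
    return "".join(ch for ch in s if unicodedata.category(ch) != "Cf")


def is_blocked(host):
    """Strip invisible format characters before comparing."""
    return naive_is_blocked(strip_format_chars(host))


spam_host = "exam\u00adple.com"
print(naive_is_blocked(spam_host))  # False: the soft hyphen hides the match
print(is_blocked(spam_host))        # True
```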
<p>The fun shouldn’t stop with soft hyphens though.&#160; There seem to be many interesting ways content inspection filters could be bypassed using characters with special meanings and others with special transformative properties.&#160; I haven’t taken the time to do any thorough testing here, but my <a href="http://www.lookout.net/test-cases/idn-and-iri-spoofing-tests/">IDN and IRI spoofing test page</a> has some examples of what I’m talking about.&#160; If you think of the test cases as plain string content instead of IDNs, you can imagine some of the other ways in which content filters might be confused.</p>
<p>Looking at the Normalization tests on that page, one can see that valid Unicode characters like Ⓞ (U+24C4 CIRCLED LATIN CAPITAL LETTER O) get normalized in hyperlinks to a Latin small letter ‘o’ by Web browsers, through a standard process defined by IDNA2003: stringprep with the nameprep profile applied.&#160; That’s just the tip of the iceberg, and still more possibilities for abuse exist.</p>
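<p>Python&#8217;s standard library still carries an IDNA2003 implementation, so this nameprep behavior can be observed directly. The sketch below calls <code>nameprep</code> from the <code>encodings.idna</code> module (the codec behind <code>str.encode("idna")</code>). It is shown only for experimentation; it is not what any particular browser uses internally.</p>

```python
from encodings.idna import nameprep  # IDNA2003 nameprep step from Python's idna codec

samples = {
    "circled capital O": "\u24c4",
    "soft hyphen inside a label": "exam\u00adple",
}
for description, label in samples.items():
    # nameprep maps characters (case folding, removing U+00AD) and then applies NFKC
    print(description, repr(label), "->", repr(nameprep(label)))
```

<p>Both inputs come out as plain ASCII, which is exactly why a filter that inspects the raw string and a browser that applies nameprep can disagree about what a link points to.</p>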
<p>These issues are why we created the UCAPI library for <a href="http://www.casabasecurity.com/products/UCAPI/">detecting string confusability</a>.&#160; I wonder how many content inspection products are looking at strings in this way?</p>
]]></content:encoded>
			<wfw:commentRss>http://www.casaba.com/blog/2010/10/detecting-malicious-url-obfuscation-techniques-in-spam/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>IDNA2008 hits the standards track &#8211; visually confusing strings remain a threat</title>
		<link>http://www.casaba.com/blog/2010/08/idna2008-hits-the-standards-track-visually-confusing-strings-remain-a-threat/</link>
		<comments>http://www.casaba.com/blog/2010/08/idna2008-hits-the-standards-track-visually-confusing-strings-remain-a-threat/#comments</comments>
		<pubDate>Tue, 31 Aug 2010 18:27:09 +0000</pubDate>
		<dc:creator>Chris Weber</dc:creator>
				<category><![CDATA[Tools]]></category>
		<category><![CDATA[Unicode]]></category>
		<category><![CDATA[confusables]]></category>
		<category><![CDATA[IDN]]></category>

		<guid isPermaLink="false">http://www.casabasecurity.com/blog/?p=216</guid>
		<description><![CDATA[After many years of engineering efforts, the Internationalizing Domain Names in Applications (IDNA) protocol had a major update released from its original 2003 standard. Although named IDNA2008, it hit the standards track in August 2010. It&#8217;s worth noting in section &#8220;4.4 Visually Confusable Characters&#8221; of RFC 5890: It is worth noting that there are no [...]]]></description>
			<content:encoded><![CDATA[<p>After many years of engineering efforts, the Internationalizing Domain Names in Applications (IDNA) protocol had a major update released from its original 2003 standard.  Although named IDNA2008, it hit the standards track in August 2010.  It&#8217;s worth noting in section &#8220;<a href="http://tools.ietf.org/html/rfc5890#section-4.4">4.4 Visually Confusable Characters</a>&#8221; of <a href="http://tools.ietf.org/html/rfc5890">RFC 5890</a>:</p>
<blockquote><p>It is worth noting that there are no comprehensive technical solutions to the problems of confusable characters.  One can reduce the extent of the problems in various ways, but probably never eliminate it.</p></blockquote>
<p>Taken out of context this may sound hopeless, but the RFC goes on to reference Unicode TR36 as providing a set of suggestions for mitigating <a href="http://www.casabasecurity.com/products/UCAPI/">string confusability</a>.  It&#8217;s in this vein that Casaba has built <a href="http://www.casabasecurity.com/products/UCAPI">UCAPI </a>which provides an implementation of the Unicode Consortium&#8217;s suggestions as well as defensive techniques from our own experience.</p>
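<p>The Unicode guidance here (UTR #36, with the data and algorithms in UTS #39) centers on comparing &#8220;skeletons&#8221;: two strings are treated as confusable if they produce the same skeleton after NFD normalization and confusable-character substitution. The Python sketch below follows that shape with a tiny hand-picked table. The table entries are illustrative assumptions, and this is not UCAPI&#8217;s implementation; a real check would load the full confusables data.</p>

```python
import unicodedata

# Tiny illustrative subset of confusable mappings (character -> prototype).
# A real implementation would load the full confusables.txt data from UTS #39.
CONFUSABLES = {
    "\u0131": "i",  # LATIN SMALL LETTER DOTLESS I (assumed prototype)
    "\u0430": "a",  # CYRILLIC SMALL LETTER A
    "\u043e": "o",  # CYRILLIC SMALL LETTER O
    "\u0440": "p",  # CYRILLIC SMALL LETTER ER
}


def skeleton(s):
    """NFD, replace each character with its prototype, then NFD again."""
    nfd = unicodedata.normalize("NFD", s)
    mapped = "".join(CONFUSABLES.get(ch, ch) for ch in nfd)
    return unicodedata.normalize("NFD", mapped)


def is_confusable(a, b):
    """True when two strings share a skeleton, i.e. they may look alike."""
    return skeleton(a) == skeleton(b)


print(is_confusable("b\u0131ng.com", "bing.com"))
print(is_confusable("\u0440\u0430y\u0440\u0430l.com", "paypal.com"))
```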
<p>I can imagine that we will one day see a widespread attack that leverages string confusability &#8211; or maybe &#8211; we won&#8217;t see it because it&#8217;ll blend in so well as to be undetectable.</p>
<p>New registrations of Internationalized Domain Names are expected to increase radically over time as ICANN has opened up Unicode and IDN support for ccTLDs as well as gTLDs.  As more TLDs become provisioned in native scripts, it&#8217;s expected that they will support the expansion of many more internationalized domain names.</p>
<p>What are registrars doing now to protect customers from lookalike attacks on their brand?  Is it their responsibility?  Whose is it?  Many organizations including ICANN are making suggestions, but is anyone listening?</p>
]]></content:encoded>
			<wfw:commentRss>http://www.casaba.com/blog/2010/08/idna2008-hits-the-standards-track-visually-confusing-strings-remain-a-threat/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Unicode security vulnerabilities &#8211; presentation from Internationalization and Unicode Conference 33</title>
		<link>http://www.casaba.com/blog/2009/10/unicode-security-vulnerabilities-presentation-from-internationalization-and-unicode-conference-33/</link>
		<comments>http://www.casaba.com/blog/2009/10/unicode-security-vulnerabilities-presentation-from-internationalization-and-unicode-conference-33/#comments</comments>
		<pubDate>Tue, 20 Oct 2009 19:24:44 +0000</pubDate>
		<dc:creator>Chris Weber</dc:creator>
				<category><![CDATA[Unicode]]></category>
		<category><![CDATA[presentation]]></category>

		<guid isPermaLink="false"></guid>
		<description><![CDATA[I&#039;m attaching my slides from the Unicode conference last week in San Jose, California. I&#039;m getting much feedback for code-level action items. Providing details for code review and static analysis is in the works, with a focus on major frameworks such as ICU, .NET, and Java. You can download the presentation here.]]></description>
			<content:encoded><![CDATA[<p>I&#039;m attaching my slides from the Unicode conference last week in San Jose, California.  I&#039;m getting a lot of feedback asking for code-level action items.  Providing details for code review and static analysis is in the works, with a focus on major frameworks such as ICU, .NET, and Java.</p>
<p>You can <a href="http://www.casabasecurity.com/files/Chris_Weber_Character%20Transformations%20v1.7_IUC33.pdf"> download the presentation here</a>.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.casaba.com/blog/2009/10/unicode-security-vulnerabilities-presentation-from-internationalization-and-unicode-conference-33/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Unibomber tool for specialized XSS testing</title>
		<link>http://www.casaba.com/blog/2009/07/unibomber-tool-for-specialized-xss-testing/</link>
		<comments>http://www.casaba.com/blog/2009/07/unibomber-tool-for-specialized-xss-testing/#comments</comments>
		<pubDate>Tue, 28 Jul 2009 01:04:31 +0000</pubDate>
		<dc:creator>Chris Weber</dc:creator>
				<category><![CDATA[Security Testing]]></category>
		<category><![CDATA[Tools]]></category>
		<category><![CDATA[Unicode]]></category>
		<category><![CDATA[XSS]]></category>

		<guid isPermaLink="false"></guid>
		<description><![CDATA[John Hernandez has been working hard at Casaba to build a specialized testing tool that automates some of the unique techniques we use to find cross-site scripting bugs (XSS). At Black Hat I&#039;m planning to demo what we have so far. It greatly automates the testing process by auto-injecting a canary and ID into each [...]]]></description>
			<content:encoded><![CDATA[<p>John Hernandez has been working hard at Casaba to build a specialized testing tool that automates some of the unique techniques we use to find cross-site scripting bugs (XSS).  At Black Hat I&#039;m planning to demo what we have so far.  It greatly automates the testing process by auto-injecting a canary and ID into each input, be it a query string, HTTP header, or POST parameter.  By combining injection with &#039;output encoding&#039; detection, you get automation that assists pen-testers in finding vulnerability hotspots.</p>
<p>Because it basically bombs a Web-app with a slew of Unicode characters to find XSS bugs we named it the <strong>Unibomber</strong>.</p>
<p>Appended to the canary is a special character &#8211; special because it can transform into a &#039;dangerous&#039; character through normalization, casing, or best-fit mapping operations.  So we end up injecting these special characters all over the place and then detecting where they get transformed and displayed as output.</p>
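<p>The canary idea can be sketched in a few lines. The Python below is a hypothetical illustration, not the Unibomber&#8217;s code: the canary string, the test ID, and the special characters (fullwidth forms that NFKC normalization turns into &lt; and &quot;) are all assumptions. It builds a payload, then classifies how each echoed occurrence of canary plus ID was handled in a response.</p>

```python
import unicodedata

# Hypothetical special characters and the dangerous ASCII they can become.
SPECIAL = {
    "\uff1c": "\x3c",  # FULLWIDTH LESS-THAN SIGN -> '<' under NFKC
    "\uff02": '"',     # FULLWIDTH QUOTATION MARK -> '"' under NFKC
}


def make_payload(canary, test_id, special):
    """Canary + unique ID + special character, injected into one input."""
    return canary + test_id + special


def classify(response, canary, test_id, special):
    """For each echoed canary+ID, report what happened to the special character."""
    marker = canary + test_id
    results = []
    pos = response.find(marker)
    while pos != -1:
        nxt = pos + len(marker)
        echoed = response[nxt:nxt + 1]  # character right after the marker, if any
        if echoed == special:
            results.append("unchanged")
        elif echoed == SPECIAL.get(special):
            results.append("transformed")  # hotspot: now a dangerous ASCII char
        else:
            results.append("encoded-or-stripped")
        pos = response.find(marker, nxt)
    return results


payload = make_payload("zqx", "0042", "\uff1c")
# Simulate a server that NFKC-normalizes input before echoing it.
page = "Hello " + unicodedata.normalize("NFKC", payload) + " there"
print(classify(page, "zqx", "0042", "\uff1c"))
```

<p>Because the ID is unique per input, a &#8220;transformed&#8221; hit points back to the exact parameter, whether it shows up in the immediate response (reflected) or on a later page (persistent).</p>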
<p>The beauty is that we can find both reflected and persistent XSS bugs this way.  It&#039;s not a one-click tool though, this is intended for use by an experienced person who knows how to find and exploit a clever XSS bug.  </p>
<p>Anyone who looks for XSS will likely find some good bugs with the Unibomber.  We sure have!</p>
]]></content:encoded>
			<wfw:commentRss>http://www.casaba.com/blog/2009/07/unibomber-tool-for-specialized-xss-testing/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>32nd Internationalization and Unicode Conference presentation on Exploiting Unicode-enabled Software</title>
		<link>http://www.casaba.com/blog/2008/09/32nd-internationalization-and-unicode-conference-presentation-on-exploiting-unicode-enabled-software/</link>
		<comments>http://www.casaba.com/blog/2008/09/32nd-internationalization-and-unicode-conference-presentation-on-exploiting-unicode-enabled-software/#comments</comments>
		<pubDate>Thu, 11 Sep 2008 18:37:18 +0000</pubDate>
		<dc:creator>Chris Weber</dc:creator>
				<category><![CDATA[Unicode]]></category>
		<category><![CDATA[presentation]]></category>

		<guid isPermaLink="false"></guid>
		<description><![CDATA[I&#39;m glad to have had the chance to present at the Unicode conference yesterday, and meet all the wonderful people there. You can download the presentation slides here for Exploiting Unicode-enabled software. &#160;]]></description>
			<content:encoded><![CDATA[<p>
I&#39;m glad to have had the chance to present at the Unicode conference yesterday, and meet all the wonderful people there.<br />
You can download the presentation slides here for <a href="http://www.casabasecurity.com/files/Exploiting%20Unicode-enabled%20Software.pdf">Exploiting Unicode-enabled software</a>.</p>
<p><img src="/images/exploit-unicode.jpg" alt="" title="exploiting-unicode" width="500" height="375" />
</p>
]]></content:encoded>
			<wfw:commentRss>http://www.casaba.com/blog/2008/09/32nd-internationalization-and-unicode-conference-presentation-on-exploiting-unicode-enabled-software/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Generating test cases for Unicode-enabled software</title>
		<link>http://www.casaba.com/blog/2008/09/generating-test-cases-for-unicode-enabled-software/</link>
		<comments>http://www.casaba.com/blog/2008/09/generating-test-cases-for-unicode-enabled-software/#comments</comments>
		<pubDate>Wed, 10 Sep 2008 07:00:00 +0000</pubDate>
		<dc:creator>Chris Weber</dc:creator>
				<category><![CDATA[Security Testing]]></category>
		<category><![CDATA[Unicode]]></category>
		<category><![CDATA[test cases]]></category>

		<guid isPermaLink="false"></guid>
		<description><![CDATA[When it comes to Unicode implementations, there’s a rich set of test cases to perform. Recognizing them is the start. Automating them is the next step. At a high level, Unicode-related security bugs can be categorized into the following root causes: Canonicalization Interpreting non-shortest form (e.g. UTF-8 encoding trickery) Other decoding issues Absorption (over-consumption) Over-consuming invalid byte [...]]]></description>
			<content:encoded><![CDATA[<p>When it comes to Unicode implementations, there’s a rich set of test cases to perform. Recognizing them is the start. Automating them is the next step.</p>
<p>At a high level, Unicode-related security bugs can be categorized into the following root causes:</p>
<p>Canonicalization</p>
<ul>
<li>Interpreting non-shortest form (e.g. UTF-8 encoding trickery)</li>
<li>Other decoding issues</li>
</ul>
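<p>Non-shortest form is easy to demonstrate: the byte pair C0 AF is an overlong two-byte encoding of U+002F SOLIDUS (/). The Python sketch below contrasts a hypothetical naive decoder that only does the bit arithmetic with Python&#8217;s built-in UTF-8 decoder, which rejects overlong forms.</p>

```python
def naive_decode_two_bytes(b0, b1):
    """Hypothetical naive decoder: combines the payload bits with no shortest-form check."""
    return chr(((b0 & 0x1F) * 64) + (b1 & 0x3F))


def strict_decode(data):
    """Python's UTF-8 codec rejects overlong (non-shortest) sequences."""
    try:
        return data.decode("utf-8")
    except UnicodeDecodeError:
        return None


overlong = b"\xc0\xaf"
print(repr(naive_decode_two_bytes(*overlong)))  # the naive path yields '/'
print(strict_decode(overlong))                   # the strict path refuses it (None)
```

<p>A path filter that looks for &#8220;/&#8221; in the raw bytes sees nothing suspicious, yet the naive decoder hands a slash to the code behind it.</p>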
<p>Absorption (over-consumption)</p>
<ul>
<li>Over-consuming invalid byte sequences or correcting rather than failing</li>
<li>When &lt;41 C2 C3 B1 42&gt;  becomes &lt;41 42&gt;</li>
</ul>
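<p>The byte sequence above makes a good regression test. In the Python sketch below, a hypothetical buggy decoder (handling only one- and two-byte sequences) swallows the ill-formed C2 together with the first byte of the valid C3 B1 sequence (U+00F1), then drops the orphaned B1, leaving just &lt;41 42&gt;. Python&#8217;s own decoder with <code>errors="replace"</code> instead replaces only the C2 with U+FFFD and keeps the &#241;.</p>

```python
def overconsuming_decode(data):
    """Hypothetical buggy decoder (1- and 2-byte sequences only): on a lead byte it
    always skips two bytes, silently eating whatever follows an ill-formed lead."""
    out = []
    i = 0
    while len(data) > i:
        b = data[i]
        if 0x80 > b:
            out.append(chr(b))
            i += 1
        elif 0xE0 > b >= 0xC2:
            nxt = data[i + 1] if len(data) > i + 1 else 0
            if (nxt & 0xC0) == 0x80:  # well-formed continuation byte
                out.append(chr((b & 0x1F) * 64 + (nxt & 0x3F)))
            i += 2  # bug: skips the next byte even when it was not a continuation
        else:
            i += 1  # drop anything else, including stray continuation bytes
    return "".join(out)


data = bytes([0x41, 0xC2, 0xC3, 0xB1, 0x42])
print(repr(overconsuming_decode(data)))
print(repr(data.decode("utf-8", errors="replace")))
```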
<p>Character deletion and swallowing</p>
<ul>
<li>“deletion of noncharacters” (UTR-36)</li>
<li>&lt;scr[U+FEFF]ipt&gt; becomes &lt;script&gt;</li>
<li>Use replacement characters instead!</li>
</ul>
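<p>The difference between deleting and replacing matters when a check runs before the cleanup step. In this Python sketch (the filter is a hypothetical blacklist check, and the angle brackets are left out for brevity), a string containing U+FEFF passes the check; deleting U+FEFF afterwards re-forms the word <em>script</em>, while replacing it with U+FFFD REPLACEMENT CHARACTER keeps it broken.</p>

```python
def naive_filter_passes(s):
    """Hypothetical blacklist check run before any cleanup."""
    return "script" not in s.lower()


def delete_bom(s):
    return s.replace("\ufeff", "")  # deletion: the dangerous pattern re-forms


def replace_bom(s):
    return s.replace("\ufeff", "\ufffd")  # replacement: the pattern stays broken


payload = "scr\ufeffipt"
print(naive_filter_passes(payload))              # passes the check
print(naive_filter_passes(delete_bom(payload)))  # would have been caught
print(naive_filter_passes(replace_bom(payload)))
```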
<p>Interpreting syntax replacements</p>
<ul>
<li>White space and line feeds</li>
<li>E.g. when U+180E MONGOLIAN VOWEL SEPARATOR acts like U+0020 SPACE</li>
</ul>
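<p>Here is a small illustration of the same class of problem in Python. To keep the example stable across Unicode versions it uses U+3000 IDEOGRAPHIC SPACE rather than U+180E: a hypothetical filter that looks for &#8220; OR &#8221; surrounded by ASCII spaces misses the input, but a later tokenizer built on <code>str.split()</code>, which splits on any Unicode whitespace, sees the same tokens as if ASCII spaces had been used.</p>

```python
def naive_filter_passes(s):
    """Hypothetical check for an SQL keyword surrounded by ASCII spaces."""
    return " OR " not in s.upper()


def tokenize(s):
    """str.split() with no argument splits on any Unicode whitespace, including U+3000."""
    return s.split()


attack = "1\u3000OR\u30001=1"
print(naive_filter_passes(attack))  # True: no ASCII-space " OR " found
print(tokenize(attack))
```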
<p>Best-fit mappings</p>
<ul>
<li>When σ becomes s</li>
<li>When ′ becomes &#39;</li>
</ul>
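<p>Best-fit mappings are table-driven and vendor-specific, so checks built on Unicode normalization will not predict them. The Python sketch below shows that NFKC leaves both example characters alone, while a best-fit lookup changes them. The two-entry <code>BEST_FIT</code> table is an illustrative assumption (σ to s, and U+2032 PRIME to the ASCII apostrophe); real tables, such as Microsoft&#8217;s WindowsBestFit files on unicode.org, differ by code page.</p>

```python
import unicodedata

# Illustrative best-fit entries (assumed); real tables are per code page.
BEST_FIT = {"\u03c3": "s", "\u2032": "'"}


def best_fit(s):
    """Apply the illustrative best-fit table, leaving other characters unchanged."""
    return "".join(BEST_FIT.get(ch, ch) for ch in s)


for ch in BEST_FIT:
    nfkc = unicodedata.normalize("NFKC", ch)
    print(f"U+{ord(ch):04X}: NFKC unchanged={nfkc == ch}, best-fit={best_fit(ch)!r}")
```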
<p>Buffer overruns</p>
<ul>
<li>Incorrect assumptions about string sizes (chars vs. bytes)</li>
<li>Improper width calculations</li>
</ul>
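<p>Buffer sizing bugs usually come from mixing up these units. The Python sketch below reports, for one string, the number of code points, UTF-16 code units (what .NET&#8217;s <code>string.Length</code> counts), and UTF-8 bytes, plus a rough display width that counts East Asian Wide and Fullwidth characters as two columns. The width rule is a simplification (it ignores combining marks, for instance), not a complete terminal-width algorithm.</p>

```python
import unicodedata


def size_report(s):
    """Different 'lengths' of the same string."""
    return {
        "code_points": len(s),
        "utf16_units": len(s.encode("utf-16-le")) // 2,
        "utf8_bytes": len(s.encode("utf-8")),
    }


def display_width(s):
    """Simplified column count: East Asian Wide (W) and Fullwidth (F) take two columns."""
    return sum(2 if unicodedata.east_asian_width(ch) in ("W", "F") else 1 for ch in s)


sample = "A\u00e9\U0001F600"
print(size_report(sample))
print(display_width("\u65e5\u672ca"))
```

<p>Any buffer sized from one of these numbers and filled according to another is a candidate for an overrun.</p>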
<p>Timing issues</p>
<ul>
<li>Handling Unicode after security gates</li>
<li>Sometimes handling Unicode before a gate can be a problem too!  E.g. BOM handling</li>
</ul>
]]></content:encoded>
			<wfw:commentRss>http://www.casaba.com/blog/2008/09/generating-test-cases-for-unicode-enabled-software/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Unicode formatter characters lead to cross-site scripting in popular browsers</title>
		<link>http://www.casaba.com/blog/2008/09/unicode-formatter-characters-lead-to-cross-site-scripting-in-popular-browsers/</link>
		<comments>http://www.casaba.com/blog/2008/09/unicode-formatter-characters-lead-to-cross-site-scripting-in-popular-browsers/#comments</comments>
		<pubDate>Fri, 05 Sep 2008 21:25:41 +0000</pubDate>
		<dc:creator>Chris Weber</dc:creator>
				<category><![CDATA[Security Testing]]></category>
		<category><![CDATA[Unicode]]></category>
		<category><![CDATA[bug]]></category>
		<category><![CDATA[test cases]]></category>
		<category><![CDATA[vulnerability]]></category>

		<guid isPermaLink="false"></guid>
		<description><![CDATA[I&#039;ll be discussing some of the issues recently reported to Opera, Apple, and Mozilla at the 32nd Unicode Conference in San Jose next week. We discovered some issues with the way certain Unicode characters could be leveraged to enable cross-site scripting attacks in popular web browsers (aka User-Agents). These issues involve utilizing Unicode characters in [...]]]></description>
			<content:encoded><![CDATA[<p>I&#039;ll be discussing some of the issues recently reported to Opera, Apple, and Mozilla at the 32nd Unicode Conference in San Jose next week.  We discovered some issues with the way certain Unicode characters could be leveraged to enable cross-site scripting attacks in popular web browsers (user agents).  These issues involve using Unicode characters in ways that might bypass most filters and intrusion prevention and detection systems (IPS/IDS).</p>
]]></content:encoded>
			<wfw:commentRss>http://www.casaba.com/blog/2008/09/unicode-formatter-characters-lead-to-cross-site-scripting-in-popular-browsers/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Handling Unicode when marshalling from .Net to a platform invoke</title>
		<link>http://www.casaba.com/blog/2008/04/handling-unicode-when-marshalling-from-net-to-a-platform-invoke/</link>
		<comments>http://www.casaba.com/blog/2008/04/handling-unicode-when-marshalling-from-net-to-a-platform-invoke/#comments</comments>
		<pubDate>Tue, 22 Apr 2008 05:09:56 +0000</pubDate>
		<dc:creator>Chris Weber</dc:creator>
				<category><![CDATA[Code Review]]></category>
		<category><![CDATA[Security Testing]]></category>
		<category><![CDATA[Unicode]]></category>
		<category><![CDATA[.NET]]></category>

		<guid isPermaLink="false"></guid>
		<description><![CDATA[By default, the .Net runtime will marshal a string (and fields in a value type) as an LPStr to a platform invoke (p/invoke) function. Internally, the .Net framework and runtime handle strings as UTF-16. That&#39;s two bytes per UTF-16 code unit, which usually represents a single Unicode &#39;code point&#39; and, more familiarly, a single character. An LPStr on the other [...]]]></description>
			<content:encoded><![CDATA[<p>By default, the .Net runtime will marshal a string (and fields in a value type) as an LPStr to a platform invoke (p/invoke) function. Internally, the .Net framework and runtime handle strings as UTF-16.  That&#39;s two bytes per UTF-16 code unit, which usually represents a single Unicode &#39;code point&#39; and, more familiarly, a single character (characters outside the Basic Multilingual Plane take two code units). An LPStr, on the other hand, is a pointer to a null-terminated string of ANSI characters, so in order to convert, the runtime will perform a <strong>best-fit conversion</strong> to the system&#39;s ANSI code page, which is the classic windows-1252 on US-English Windows.  This conversion is well-documented here:</p>
<p><a href="http://unicode.org/Public/MAPPINGS/VENDORS/MICSFT/WindowsBestFit/bestfit1252.txt">http://unicode.org/Public/MAPPINGS/VENDORS/MICSFT/WindowsBestFit/bestfit1252.txt</a></p>
<p>This might not be surprising to people in tune with Unicode, but it can lead to serious security problems when security filters are involved. For example, if you&#39;re performing HTML filtering or file canonicalization, you need to do so <strong>after the conversion </strong>to LPStr.</p>
<p>This default marshalling behavior is documented at:  <a href="http://msdn2.microsoft.com/en-us/library/system.runtime.interopservices.marshalasattribute(VS.71).aspx">http://msdn2.microsoft.com/en-us/library/system.runtime.interopservices.marshalasattribute(VS.71).aspx</a></p>
<p>To <strong>deal with this</strong> properly and more safely, you can use the MarshalAsAttribute class to specify an <strong>LPWStr </strong>type instead of an LPStr.  For example:</p>
<p>	[MarshalAs(UnmanagedType.LPWStr)]</p>
<p>Because LPWStr is a pointer to a null-terminated array of Unicode characters, this ensures the Unicode code points are preserved across the marshalling.</p>
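<p>The ordering problem can be shown without any interop code. In the Python sketch below, a hypothetical HTML filter inspects the string before a simulated ANSI conversion; the fullwidth less-than sign (U+FF1C) passes the filter and then becomes a real &lt; after best-fit. The one-entry best-fit table is an assumption for illustration, not a quote from bestfit1252.txt. For contrast, Python&#8217;s own cp1252 codec is strict and raises an error instead of best-fitting.</p>

```python
# Assumed best-fit entry for illustration: U+FF1C FULLWIDTH LESS-THAN SIGN -> 0x3C.
# Consult bestfit1252.txt for the real windows-1252 table.
ASSUMED_BEST_FIT = {"\uff1c": "\x3c"}


def html_filter_passes(s):
    """Hypothetical filter that rejects a literal less-than sign."""
    return "\x3c" not in s


def simulate_lpstr(s):
    """Rough model of the ANSI conversion: best-fit first, then encode as cp1252."""
    return "".join(ASSUMED_BEST_FIT.get(ch, ch) for ch in s).encode("cp1252")


payload = "\uff1cscript>"
print(html_filter_passes(payload))                                   # filter on UTF-16 data
print(html_filter_passes(simulate_lpstr(payload).decode("cp1252")))  # filter after conversion

try:
    payload.encode("cp1252")  # Python's codec is strict: no best-fit, it raises
except UnicodeEncodeError as exc:
    print("strict cp1252 encode failed:", exc.reason)
```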
]]></content:encoded>
			<wfw:commentRss>http://www.casaba.com/blog/2008/04/handling-unicode-when-marshalling-from-net-to-a-platform-invoke/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>I18N input validation whitelist filter with System.Globalization and GetUnicodeCategory</title>
		<link>http://www.casaba.com/blog/2007/04/i18n-input-validation-whitelist-filter-with-system-globalization-and-getunicodecategory/</link>
		<comments>http://www.casaba.com/blog/2007/04/i18n-input-validation-whitelist-filter-with-system-globalization-and-getunicodecategory/#comments</comments>
		<pubDate>Tue, 24 Apr 2007 05:33:20 +0000</pubDate>
		<dc:creator>Chris Weber</dc:creator>
				<category><![CDATA[Development]]></category>
		<category><![CDATA[Unicode]]></category>
		<category><![CDATA[whitelist]]></category>

		<guid isPermaLink="false"></guid>
		<description><![CDATA[Maybe you’re building internationalized code and wondering how to build a whitelist filter that will support all the different character sets you’re planning to support. If you support more than ten, especially some of the larger East Asian sets, this might seem like an unwieldy or tricky process. Luckily, it’s easier than most people [...]]]></description>
			<content:encoded><![CDATA[<p>Maybe you’re building internationalized code and wondering how to build a whitelist filter that will support all the different character sets you’re planning to support. If you support more than ten, especially some of the larger East Asian sets, this might seem like an unwieldy or tricky process.<br />
Luckily, it’s easier than most people would think. Building a good input validation filter can be simplified with .Net’s <a linkindex="84" href="http://msdn2.microsoft.com/en-us/library/system.globalization.charunicodeinfo.getunicodecategory.aspx">GetUnicodeCategory</a>. Use the method on CharUnicodeInfo in the <strong>System.Globalization</strong> namespace rather than the one on System.Char, which looks like it may become the secondary of the two. </p>
<p>With <strong>GetUnicodeCategory </strong>you can simply build a <strong>whitelist </strong>supporting the character <em><strong>categories </strong></em>you want to allow. There’s no need to write a regex filter that lists out every character range you want to allow in each character set; it’s much simpler than that! </p>
<p>The Unicode standard assigns every character to one of <strong>30 general categories</strong>. They make sense, too: for example Control (Cc), Lowercase Letter (Ll), Uppercase Letter (Lu), and Math Symbol (Sm). So, for example, you might want to allow only letters, decimal digits, non-spacing marks, and a few kinds of punctuation (dashes and connectors) in your whitelist. This could be achieved with the following snippet: </p>
<p><code><br />
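// requires: using System.Globalization;<br />
char cUntrustedInput = '\u00E9'; // the untrusted user input (a sample value here)<br />
// a char is one UTF-16 code unit; for supplementary characters use GetUnicodeCategory(string, int)<br />
UnicodeCategory cTestCategory = CharUnicodeInfo.GetUnicodeCategory(cUntrustedInput);<br />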
if (cTestCategory == UnicodeCategory.LowercaseLetter ||<br />
cTestCategory == UnicodeCategory.UppercaseLetter ||<br />
cTestCategory == UnicodeCategory.DecimalDigitNumber ||<br />
cTestCategory == UnicodeCategory.TitlecaseLetter ||<br />
cTestCategory == UnicodeCategory.OtherLetter ||<br />
cTestCategory == UnicodeCategory.NonSpacingMark ||<br />
cTestCategory == UnicodeCategory.DashPunctuation ||<br />
cTestCategory == UnicodeCategory.ConnectorPunctuation)<br />
{<br />
// character looks safe, continue<br />
}<br />
else<br />
{<br />
// character is not allowed, fail<br />
}<br />
</code></p>
<p>Not too bad eh.</p>
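<p>For comparison, the same category-based whitelist can be sketched in Python with <code>unicodedata.category</code>, which returns the two-letter general category codes mentioned above. Iterating a Python string yields whole code points, so characters outside the Basic Multilingual Plane are checked as single characters rather than as two UTF-16 surrogate halves. The allowed set simply mirrors the C# snippet.</p>

```python
import unicodedata

# Same categories as the C# example: letters, decimal digits, non-spacing marks,
# dash punctuation, and connector punctuation.
ALLOWED_CATEGORIES = {"Ll", "Lu", "Lt", "Lo", "Nd", "Mn", "Pd", "Pc"}


def is_allowed(text):
    """True if every character's general category is on the whitelist."""
    return all(unicodedata.category(ch) in ALLOWED_CATEGORIES for ch in text)


print(is_allowed("user_name-42"))
print(is_allowed("\u65e5\u672c\u8a9e"))  # CJK ideographs are Lo
print(is_allowed("a\x3cb"))              # the less-than sign is Sm, so it fails
```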
]]></content:encoded>
			<wfw:commentRss>http://www.casaba.com/blog/2007/04/i18n-input-validation-whitelist-filter-with-system-globalization-and-getunicodecategory/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>

