com.waveset.util
Class Encoding

java.lang.Object
  extended by com.waveset.util.Encoding

public class Encoding
extends java.lang.Object

Utilities related to character encodings.


Field Summary
static java.lang.String ASCII
           
static java.lang.String code_id
           
static java.lang.String LATIN1
           
static java.lang.String LATIN2
           
static java.lang.String UCS2
           
static java.lang.String UCS4
           
static java.lang.String UTF16
           
static java.lang.String UTF8
           
 
Constructor Summary
Encoding()
           
 
Method Summary
static java.lang.String decode(java.lang.String psz, java.lang.String encoding)
          Convert a string in the specified encoding to the default encoding.
static java.lang.String decodePseudoUTF8(java.lang.String pseudoUTF8)
           
static java.lang.String encode(java.lang.String psz, java.lang.String encoding)
          Convert a string in the default encoding to the specified encoding.
static java.lang.String encodePseudoUTF8(java.lang.String psz)
           
static java.lang.String getDefaultEncoding()
           
static boolean isValidASCII(byte[] bytes)
           
static boolean isValidASCII(java.lang.String s)
           
static boolean isValidUTF8(byte[] bytes, boolean beStrict)
           
static int lengthInUTF8(java.lang.String psz)
           
static void main(java.lang.String[] args)
           
static java.lang.String toUnicodeEscapedAscii(java.lang.String str)
          Throughput is about 6MB/sec for ASCII input and almost 1MB/sec for non-ASCII input.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

code_id

public static final java.lang.String code_id
See Also:
Constant Field Values

UTF8

public static final java.lang.String UTF8
See Also:
Constant Field Values

UTF16

public static final java.lang.String UTF16
See Also:
Constant Field Values

UCS2

public static final java.lang.String UCS2
See Also:
Constant Field Values

UCS4

public static final java.lang.String UCS4
See Also:
Constant Field Values

LATIN1

public static final java.lang.String LATIN1
See Also:
Constant Field Values

LATIN2

public static final java.lang.String LATIN2
See Also:
Constant Field Values

ASCII

public static final java.lang.String ASCII
See Also:
Constant Field Values
Constructor Detail

Encoding

public Encoding()
Method Detail

lengthInUTF8

public static int lengthInUTF8(java.lang.String psz)
                        throws InternalError
Throws:
InternalError

getDefaultEncoding

public static java.lang.String getDefaultEncoding()
Returns:
the default character encoding for this JVM.
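
As a point of comparison, the JDK exposes the JVM default charset directly. The sketch below is not the Waveset implementation; DefaultEncodingSketch and defaultEncodingName are hypothetical names, and whether getDefaultEncoding() returns the same canonical name is an assumption.

```java
import java.nio.charset.Charset;

final class DefaultEncodingSketch {
    // Hypothetical helper: canonical name of the JVM default charset.
    static String defaultEncodingName() {
        return Charset.defaultCharset().name();
    }

    public static void main(String[] args) {
        // The printed value depends on the platform and JVM settings.
        System.out.println(defaultEncodingName());
    }
}
```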

decodePseudoUTF8

public static java.lang.String decodePseudoUTF8(java.lang.String pseudoUTF8)
Returns:
a properly encoded String that contains the characters that are encoded as pseudo-UTF8 in the specified String.

This method assumes that the lower-order eight bits of each character in the input string represent a properly encoded UTF-8 byte.
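
The low-order-eight-bits rule described above can be sketched with plain JDK calls. This is an illustration only, not the Waveset source; PseudoUtf8DecodeSketch and decodePseudoUtf8 are hypothetical names, and the handling of malformed input (here, the JDK's default replacement behaviour) is an assumption.

```java
import java.nio.charset.StandardCharsets;

final class PseudoUtf8DecodeSketch {
    // Hypothetical helper: treat the low 8 bits of each char as one UTF-8 byte.
    static String decodePseudoUtf8(String pseudoUtf8) {
        byte[] bytes = new byte[pseudoUtf8.length()];
        for (int i = 0; i < pseudoUtf8.length(); i++) {
            bytes[i] = (byte) (pseudoUtf8.charAt(i) & 0xFF);
        }
        // Decode the recovered byte sequence as real UTF-8.
        return new String(bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // U+00C3 U+00B1 are the two UTF-8 bytes of U+00F1, held one per char.
        String decoded = decodePseudoUtf8("pe\u00C3\u00B1a");
        System.out.println(decoded.length()); // prints 4
    }
}
```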


encodePseudoUTF8

public static java.lang.String encodePseudoUTF8(java.lang.String psz)
Returns:
a "pseudo-UTF8" String, of which each character represents one UTF8 byte of (the UTF8 encoding of) the specified String.
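
A minimal sketch of the one-character-per-UTF8-byte form described above, using only the JDK. PseudoUtf8EncodeSketch and encodePseudoUtf8 are hypothetical names chosen for illustration; the real method may differ in details such as null handling.

```java
import java.nio.charset.StandardCharsets;

final class PseudoUtf8EncodeSketch {
    // Hypothetical helper: emit one char (0x00..0xFF) per UTF-8 byte of the input.
    static String encodePseudoUtf8(String s) {
        byte[] bytes = s.getBytes(StandardCharsets.UTF_8);
        StringBuilder sb = new StringBuilder(bytes.length);
        for (byte b : bytes) {
            sb.append((char) (b & 0xFF)); // mask to avoid sign extension
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // U+00F1 takes two bytes in UTF-8, so four characters become five.
        System.out.println(encodePseudoUtf8("pe\u00F1a").length()); // prints 5
    }
}
```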

encode

public static java.lang.String encode(java.lang.String psz,
                                      java.lang.String encoding)
                               throws InvalidArgument
Convert a string in the default encoding to the specified encoding.

Parameters:
psz - a String in the default encoding.
encoding - the name of a character encoding scheme.
Returns:
a string with the contents of psz converted to the specified encoding.

For example, a call to encode("peña", Encoding.UTF8) would return "peÃ±a".

Inverse of decode(java.lang.String, java.lang.String).

Throws:
InvalidArgument
See Also:
getDefaultEncoding()

decode

public static java.lang.String decode(java.lang.String psz,
                                      java.lang.String encoding)
                               throws InvalidArgument
Convert a string in the specified encoding to the default encoding.

Parameters:
psz - a String containing characters constructed in the default encoding scheme from bytes in the specified encoding.
encoding - the name of a character encoding scheme.
Returns:
a string with the contents of psz converted back to the default encoding.

For example, a call to decode("peÃ±a", Encoding.UTF8) would return "peña".

Throws:
InvalidArgument
See Also:
getDefaultEncoding()
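
The encode and decode round trip can be approximated with JDK charset conversions. The sketch below is not the Waveset implementation: ReencodeSketch and its methods are hypothetical, and it pins the "default encoding" to ISO-8859-1 (Latin-1) so the result does not depend on the JVM it runs on. The real methods use the JVM default and report failures with InvalidArgument; here an unknown encoding name surfaces as the JDK's unchecked charset exceptions.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

final class ReencodeSketch {
    // Assumption: stand-in for the JVM default encoding, fixed for determinism.
    static final Charset ASSUMED_DEFAULT = StandardCharsets.ISO_8859_1;

    // Bytes of s in the target encoding, viewed as characters of the default encoding.
    static String encode(String s, String encoding) {
        byte[] bytes = s.getBytes(Charset.forName(encoding));
        return new String(bytes, ASSUMED_DEFAULT);
    }

    // Inverse: recover the bytes via the default encoding, then decode them.
    static String decode(String s, String encoding) {
        byte[] bytes = s.getBytes(ASSUMED_DEFAULT);
        return new String(bytes, Charset.forName(encoding));
    }

    public static void main(String[] args) {
        String encoded = encode("pe\u00F1a", "UTF-8");
        System.out.println(encoded.length());                              // prints 5
        System.out.println(decode(encoded, "UTF-8").equals("pe\u00F1a"));  // prints true
    }
}
```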

isValidASCII

public static boolean isValidASCII(byte[] bytes)

isValidASCII

public static boolean isValidASCII(java.lang.String s)
Returns:
true if the specified string contains only characters in the ASCII range; otherwise false.
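
An ASCII-range check of this kind is short enough to sketch. AsciiCheckSketch is a hypothetical class, not the Waveset code; treating the ASCII range as 0x00 through 0x7F, and the byte[] overload as the same test applied to each byte, are assumptions.

```java
final class AsciiCheckSketch {
    // True if every char is in the 7-bit range 0x00..0x7F.
    static boolean isAscii(String s) {
        for (int i = 0; i < s.length(); i++) {
            if (s.charAt(i) > 0x7F) {
                return false;
            }
        }
        return true;
    }

    // Assumed byte[] counterpart: a negative Java byte has its high bit set.
    static boolean isAscii(byte[] bytes) {
        for (byte b : bytes) {
            if (b < 0) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(isAscii("pena"));      // prints true
        System.out.println(isAscii("pe\u00F1a")); // prints false
    }
}
```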

isValidUTF8

public static boolean isValidUTF8(byte[] bytes,
                                  boolean beStrict)
Parameters:
bytes - byte array to check
beStrict - if true, specifies rigorous validation. Specifically, disallows "pseudo-UTF8".
Returns:
true if the given sequence of bytes is valid UTF-8; otherwise false.
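
For comparison, strict validation can be sketched with the JDK's CharsetDecoder configured to report errors instead of replacing them. This covers only a rigorous check; the lenient behaviour selected by beStrict = false is not specified in this page and is not modelled. Utf8ValidationSketch and isStrictlyValidUtf8 are hypothetical names.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

final class Utf8ValidationSketch {
    // Hypothetical helper: true only if the whole array decodes as well-formed UTF-8.
    static boolean isStrictlyValidUtf8(byte[] bytes) {
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        try {
            decoder.decode(ByteBuffer.wrap(bytes));
            return true;
        } catch (CharacterCodingException e) {
            // Truncated, overlong, or otherwise malformed sequences end up here.
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(isStrictlyValidUtf8(new byte[] {0x70, (byte) 0xC3, (byte) 0xB1})); // prints true
        System.out.println(isStrictlyValidUtf8(new byte[] {(byte) 0xC3}));                    // prints false
    }
}
```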


toUnicodeEscapedAscii

public static java.lang.String toUnicodeEscapedAscii(java.lang.String str)
Throughput is about 6MB/sec for ASCII input and almost 1MB/sec for non-ASCII input.

Returns:
an ASCII String with all non-ASCII characters represented as "\uXXXX". A '\' is also represented as a Unicode escape if it is followed by a u character.
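
The escaping rule in the Returns text can be sketched as follows. This is an illustration under assumptions, not the Waveset implementation: UnicodeEscapeSketch and toEscapedAscii are hypothetical names, "Unicode characters" is read as characters above 0x7F, and uppercase hex digits are an arbitrary choice.

```java
final class UnicodeEscapeSketch {
    // Hypothetical helper: escape non-ASCII chars, and any backslash directly
    // followed by 'u', as a backslash-u sequence with four hex digits.
    static String toEscapedAscii(String str) {
        StringBuilder sb = new StringBuilder(str.length());
        for (int i = 0; i < str.length(); i++) {
            char c = str.charAt(i);
            boolean backslashBeforeU =
                    c == '\\' && i + 1 < str.length() && str.charAt(i + 1) == 'u';
            if (c > 0x7F || backslashBeforeU) {
                sb.append(String.format("\\u%04X", (int) c));
            } else {
                sb.append(c);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // The 4-character input becomes a 9-character all-ASCII string.
        System.out.println(toEscapedAscii("pe\u00F1a").length()); // prints 9
    }
}
```

Escaping a backslash that precedes a u keeps the output unambiguous: a literal backslash-u pair already present in the input cannot be mistaken for an escape produced by the method.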

main

public static void main(java.lang.String[] args)
                 throws java.lang.Exception
Throws:
java.lang.Exception