VB Forum / Win API / June 2008




Is IsTextUnicode reliable?

RB Smissaert - 20 Jun 2008 19:49 GMT
It looks like the IsTextUnicode API is not reliable, even when it is run
repeatedly on the same string: it can return either False or True for the
same string. Am I doing something wrong, or is this API indeed not reliable?

Private Declare Function IsTextUnicode Lib "advapi32" _
                                      (lpBuffer As Any, _
                                       ByVal cb As Long, _
                                       lpi As Long) As Long

Public Function IsUnicodeStr(sBuffer As String) As Boolean

 Const IS_TEXT_UNICODE_UNICODE_MASK = &HF

 'Returns True if sBuffer evaluates to a Unicode string
  Dim dwRtnFlags As Long

  'note we need a variable dwRtnFlags here as dwRtnFlags is [in] [out]
  '-------------------------------------------------------------------
  dwRtnFlags = IS_TEXT_UNICODE_UNICODE_MASK
  IsUnicodeStr = IsTextUnicode(StrPtr(sBuffer), Len(sBuffer), dwRtnFlags)

End Function

RBS
MikeD - 20 Jun 2008 20:08 GMT
> It looks the API IsTextUnicode is not reliable even when it repeatedly runs on the same string.
> So, it can give False or True on the same string. Am I doing something wrong or is this API indeed
[quoted text clipped - 18 lines]
>
> End Function

Just a guess (having never used this particular function before and not bothering to look at its docs), try changing the declaration
to this:

Private Declare Function IsTextUnicode Lib "advapi32" _
                                      (ByVal lpBuffer As Long, _
                                       ByVal cb As Long, _
                                       lpi As Long) As Long

However, I'd think that would ALWAYS return 1 because internally all strings in VB are unicode and you're passing a string pointer.
This is just a guess, though.

Did you try googling on that function name to see if anybody's already posted an example?


Mike
Microsoft Visual Basic MVP

RB Smissaert - 20 Jun 2008 20:51 GMT
I think the trouble was somewhere else as I can't reproduce this when I call
it plain and simple:

 Dim str As String

 str = "test"

 MsgBox IsUnicodeStr(str), , "IsTextUnicode"

This is always giving False.

RBS

MikeD - 20 Jun 2008 23:11 GMT
>I think the trouble was somewhere else as I can't reproduce this when I
>call it plain and simple:
[quoted text clipped - 6 lines]
>
> This is always giving False.

You didn't mention if you changed your declaration to what I suggested
(which Scott's post also claims would be correct). It's the ByVal in the
declaration that's important. I just made the data type Long because you're
passing a pointer, which is a Long. It should "work" whether you use Any or
Long as the data type in the declaration provided you're passing it ByVal.

Scott and I are also both in agreement that this is a pointless call to make
because strings in VB are unicode. Why you're getting a return value of
False, to me, only indicates you're not calling the API function correctly
OR this particular API function simply won't work properly in VB due to the
way VB handles strings.

The only other thing I can suggest is to use the StrConv function to explicitly
convert the string to Unicode and compare the number of bytes to the
original string. If the number of bytes is double, the original string is
ANSI. Now, I don't know how reliable this would be for certain code pages,
if reliable at all, but that's the best alternative I can think of since I
just don't believe that this API function is going to be reliable in VB (if
called properly, my expectation would be that it should *always* indicate
the string is unicode). But again, I've never used this function in VB, let
alone read the docs on it. So I still suggest that you do a google search,
as I'd bet dollars to doughnuts that this function has been discussed (and
even after I suggested it once, you STILL never said if you bothered to
search and what you found, assuming you did search).


Mike
Microsoft MVP Visual Basic

Scott Seligman - 20 Jun 2008 21:59 GMT
>It looks the API IsTextUnicode is not reliable even when it repeatedly
>runs on the same string. So, it can give False or True on the same
>string. Am I doing something wrong or is this API indeed not reliable?

You need to pass the pointer ByVal and double the count of characters
since the function needs a count of bytes.

That said, it's a pointless call, since VB strings are Unicode.


--------- Scott Seligman <scott at <firstname> and michelle dot net> ---------
  There are fewer great satisfactions than that of self.
  -- Calhoun in Star Trek: New Frontier: Being Human by Peter David

RB Smissaert - 20 Jun 2008 23:06 GMT
This sure is confusing. I was just playing with the code posted here:
http://vbnet.mvps.org/index.html?code/shell/undocshelldlgs.htm

You say VB strings are Unicode, but I take it that applies to plain and
simple strings; maybe it is a different matter if code like that on the
above site is involved.

Your suggestions seem to make it work fine now:

Option Explicit
Private Declare Function IsTextUnicode Lib "advapi32" _
                                      (lpBuffer As Long, _
                                       ByVal cb As Long, _
                                       lpi As Long) As Long

Function ShowByteArray(ByteArray() As Byte) As String

 Dim i As Long
 Dim LB As Long
 Dim UB As Long

 LB = LBound(ByteArray)
 UB = UBound(ByteArray)

 ShowByteArray = ByteArray(LB)

 If UBound(ByteArray) > LB Then
   For i = LB + 1 To UB
     ShowByteArray = ShowByteArray & vbCrLf & ByteArray(i)
   Next i
 End If

End Function

Sub testIsUnicodeStr()

 Dim str As String

 str = "test"

 MsgBox ShowStringAsBytes(str), , "original"

 MsgBox IsUnicodeStr(str), , "IsTextUnicode"

 str = StrConv(str, vbFromUnicode)

 MsgBox ShowStringAsBytes(str), , "ANSI"

 MsgBox IsUnicodeStr(str), , "IsTextUnicode"

 str = StrConv(str, vbUnicode)

 MsgBox ShowStringAsBytes(str), , "Unicode"

 MsgBox IsUnicodeStr(str), , "IsTextUnicode"

End Sub

Public Function IsUnicodeStr(sBuffer As String) As Boolean

 Const IS_TEXT_UNICODE_UNICODE_MASK = &HF

 'Returns True if sBuffer evaluates to a Unicode string
  Dim dwRtnFlags As Long

  'note we need a variable dwRtnFlags here as dwRtnFlags is [in] [out]
  '-------------------------------------------------------------------
  dwRtnFlags = IS_TEXT_UNICODE_UNICODE_MASK
  IsUnicodeStr = IsTextUnicode(ByVal StrPtr(sBuffer), Len(sBuffer) * 2, _
                               dwRtnFlags)

End Function

RBS

Thorsten Albers - 20 Jun 2008 22:49 GMT
RB Smissaert <bartsmissaert@blueyonder.co.uk> wrote in message
<OkcdjZw0IHA.6096@TK2MSFTNGP06.phx.gbl>...
>    IsUnicodeStr = IsTextUnicode(StrPtr(sBuffer), Len(sBuffer), dwRtnFlags)

- 'lpBuffer' has to be passed 'ByVal' here, since what is passed is a pointer
to the first character of the string ('ByRef' would pass a pointer to a
pointer to the string).
- 'cb' has to be passed 'LenB(sBuffer)', i.e. the count of bytes, not the
count of characters.
- If used correctly with a VB string, this procedure should always return
TRUE, since VB strings are always Unicode. So it doesn't make much sense to
call this procedure with a VB string as the first argument...

In general you shouldn't rely on IsTextUnicode(). Instead you should use
the 'byte order mark' (FEFFh / FFFEh) to check for a Unicode text, and/or
you should let the user select ANSI or Unicode character processing.
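
Putting the three points above together, a corrected wrapper might look like the sketch below. The name IsUnicodeBuffer is made up for illustration and does not appear in the thread; the Declare assumes the documented IsTextUnicode parameters (buffer pointer, size in bytes, in/out flags). As already noted, the answer tells you very little when the buffer is an ordinary VB string.

```vb
Option Explicit

' Pointer passed ByVal, size in bytes, flags ByRef (in/out).
Private Declare Function IsTextUnicode Lib "advapi32" _
    (ByVal lpBuffer As Long, _
     ByVal cb As Long, _
     ByRef lpi As Long) As Long

Private Const IS_TEXT_UNICODE_UNICODE_MASK As Long = &HF

' Hypothetical helper name, not from the thread.
Public Function IsUnicodeBuffer(ByRef sBuffer As String) As Boolean

    Dim lFlags As Long

    ' Nothing to test in an empty string
    If LenB(sBuffer) = 0 Then Exit Function

    ' On input: which tests to run. On output: which tests passed.
    lFlags = IS_TEXT_UNICODE_UNICODE_MASK

    ' StrPtr gives the address of the first character,
    ' LenB gives the length in bytes (2 per character in a VB string).
    IsUnicodeBuffer = (IsTextUnicode(StrPtr(sBuffer), LenB(sBuffer), lFlags) <> 0)

End Function
```

Using LenB rather than Len * 2 keeps the byte count right even after StrConv(..., vbFromUnicode) has packed ANSI bytes into the string.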


----------------------------------------------------------------------
Thorsten Albers                               albers(a)uni-freiburg.de
----------------------------------------------------------------------

RB Smissaert - 20 Jun 2008 23:29 GMT
Thanks for the tips.

> Instead you should use the 'byte order mark' (FEFFh / FFFEh) to check for
> a Unicode text

How would that work?

RBS

Jim Mack - 21 Jun 2008 03:07 GMT
> Thanks for the tips.
>
>> Instead you should use the 'byte order mark' (FEFFh / FFFEh) to
>> check for a Unicode text
>
> How would that work?

"Compatible" Unicode text files begin with a BOM, which is a 16-bit
value not otherwise used as a Unicode character (reserved). If the
first character in the file is FEFF or FFFE, you have a Unicode text
file... further, which of those you see tells you if the characters
are in big- or little-endian order.

--
   Jim Mack
   MicroDexterity Inc
   www.microdexterity.com

RB Smissaert - 21 Jun 2008 07:25 GMT
OK, so would a function like this pick it up?

Function FileUnicode(strFile As String) As String

 'Bytes         Encoding Form
 '----------------------------------
 '00 00 FE FF   UTF-32, big-endian
 'FF FE 00 00   UTF-32, little-endian
 'FE FF         UTF-16, big-endian
 'FF FE         UTF-16, little-endian
 'EF BB BF      UTF-8

 Dim hFile As Long
 Dim A As Byte
 Dim B As Byte
 Dim C As Byte
 Dim D As Byte

 On Error GoTo ERROROUT

 hFile = FreeFile

 Open strFile For Binary As #hFile

 Get #hFile, 1, A

 Select Case A
   Case 0
     Get #hFile, 2, B
     If B = 0 Then
       Get #hFile, 3, C
       If C = 254 Then
         Get #hFile, 4, D
         If D = 255 Then
           '00 00 FE FF
           FileUnicode = "UTF-32, big-endian"
         End If
       End If
     End If
   Case 239  'EF
     Get #hFile, 2, B
     If B = 187 Then
       Get #hFile, 3, C
       If C = 191 Then
         'EF BB BF
         FileUnicode = "UTF-8"
       End If
     End If
   Case 254  'FE
     Get #hFile, 2, B
     If B = 255 Then
       'FE FF
       FileUnicode = "UTF-16, big-endian"
     End If
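   Case 255  'FF
     Get #hFile, 2, B
     If B = 254 Then
       'FF FE so far: UTF-16 little-endian unless 00 00 follows
       FileUnicode = "UTF-16, little-endian"
       'Only look at bytes 3 and 4 if the file really has them
       If LOF(hFile) >= 4 Then
         Get #hFile, 3, C
         Get #hFile, 4, D
         If C = 0 And D = 0 Then
           'FF FE 00 00
           FileUnicode = "UTF-32, little-endian"
         End If
       End If
     End If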
 End Select

ERROROUT:
 Close #hFile

End Function

I don't really need this; I was just playing with it to get a feel for
ANSI <> Unicode.
I did Google, but couldn't find VB code to pick up this BOM, so I put
together the above.

RBS

Jim Mack - 21 Jun 2008 13:28 GMT
> OK, so would  a function like this pick it up:

I didn't parse your code, but it's the right idea. Of the ones you
list, UTF-16 is the only one I ever see 'in the wild', so a simpler
test is just to read the first integer to see if it's -2 or -257.

Note that it's an affirmative test only: you can be sure it's UTF-16
if you see the BOM, but absence of a BOM means nothing.
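
The "read the first integer" test could be sketched as follows. The function name Utf16Bom is hypothetical. It relies on a VB Integer being a signed 16-bit value read in little-endian order, so the bytes FF FE on disk read back as &HFEFF (-257) and FE FF as &HFFFE (-2). A file that starts FF FE 00 00 (UTF-32, little-endian) would also be reported as UTF-16 here, and an empty result proves nothing either way.

```vb
Option Explicit

' Hypothetical helper: returns a description of the UTF-16 BOM found at
' the start of the file, or "" if there is none.
Public Function Utf16Bom(ByVal strFile As String) As String

    Dim hFile As Integer
    Dim iFirst As Integer   ' 16-bit signed, stored little-endian

    hFile = FreeFile
    Open strFile For Binary Access Read As #hFile

    ' Need at least two bytes for a UTF-16 BOM
    If LOF(hFile) >= 2 Then
        Get #hFile, 1, iFirst
        Select Case iFirst
            Case -257   ' &HFEFF: bytes FF FE on disk
                Utf16Bom = "UTF-16, little-endian"
            Case -2     ' &HFFFE: bytes FE FF on disk
                Utf16Bom = "UTF-16, big-endian"
        End Select
    End If

    Close #hFile

End Function
```

A call such as MsgBox Utf16Bom("C:\some\file.txt") (path made up) would show which BOM, if any, was found.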

--
       Jim

RB Smissaert - 21 Jun 2008 18:01 GMT
Thanks again for the tips; got this all now.

RBS

Dean Earley - 23 Jun 2008 08:46 GMT
> It looks the API IsTextUnicode is not reliable even when it repeatedly
> runs on the same string.
> So, it can give False or True on the same string. Am I doing something
> wrong or is this API indeed
> not reliable?

It is not:
http://blogs.msdn.com/michkap/archive/2005/01/30/363308.aspx
http://blogs.msdn.com/oldnewthing/archive/2007/04/17/2158334.aspx
http://blogs.msdn.com/oldnewthing/archive/2004/03/24/95235.aspx

The function has to guess to decide one way or the other.


Dean Earley (dean.earley@icode.co.uk)
i-Catcher Development Team

iCode Systems

Tony Proctor - 23 Jun 2008 11:23 GMT
As just about everyone else has said <grin>, VB Strings are always held in
Unicode, so the call isn't very helpful.

However, I would like to add that if you've imported some textual data and
you're trying to determine whether the encoding is Unicode or some other
SBCS/DBCS then it should not be imported into String variables. Putting
non-Unicode data into String variables breaks several rules and could cause
run-time errors. Such data should be imported into Byte arrays, and then
calling IsTextUnicode could be useful.
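
A minimal sketch of that approach, assuming the file is small enough to read in one go: the raw bytes go into a Byte array and the first element is passed ByRef, which hands the API the address of the data. The name FileLooksUnicode is made up for illustration, and the result is still only the API's statistical guess.

```vb
Option Explicit

' lpBuffer As Any, ByRef: passing abData(0) gives the address of the bytes.
Private Declare Function IsTextUnicode Lib "advapi32" _
    (ByRef lpBuffer As Any, _
     ByVal cb As Long, _
     ByRef lpi As Long) As Long

Private Const IS_TEXT_UNICODE_UNICODE_MASK As Long = &HF

' Hypothetical helper: reads the whole file into a Byte array and asks
' IsTextUnicode for its opinion on the raw bytes.
Public Function FileLooksUnicode(ByVal strFile As String) As Boolean

    Dim hFile As Integer
    Dim abData() As Byte
    Dim lFlags As Long

    hFile = FreeFile
    Open strFile For Binary Access Read As #hFile

    If LOF(hFile) > 0 Then
        ReDim abData(0 To LOF(hFile) - 1)
        Get #hFile, 1, abData

        ' On input: which tests to run. On output: which tests passed.
        lFlags = IS_TEXT_UNICODE_UNICODE_MASK

        ' Size is in bytes, which for a Byte array is the element count
        FileLooksUnicode = (IsTextUnicode(abData(0), UBound(abData) + 1, lFlags) <> 0)
    End If

    Close #hFile

End Function
```

Because the data never passes through a String variable, no ANSI/Unicode conversion is applied to it before the test.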

   Tony Proctor

 