Document Subject: PCDATA converter suitable for xml and rss text
Hint Short Cut: Add this to your code & documentation to help you find this page.
http://#XMLTextEncode or http://A555F9/nn.nsf/ByAlias/XMLTextEncode

XML and RSS parsers are fussy about which characters may appear in item or title values. Here is a LotusScript function that converts a string's characters to numeric character references.

How to convert a string's dodgy characters to HTML 4 numeric character references (the &#nnn; form).

If you do not do this you get errors like:

An invalid character was found in text content. Error processing resource

You could handle each character individually, like this:

 If Instr(phrase$,"&") Then phrase$=replacesubstring(phrase$,"&","&amp;") '38 (must come first)
 If Instr(phrase$,"ä") Then phrase$=replacesubstring(phrase$,"ä","&#228;") '228  &auml;
 If Instr(phrase$,"ö") Then phrase$=replacesubstring(phrase$,"ö","&#246;") '246  &ouml;
 If Instr(phrase$,"<") Then phrase$=replacesubstring(phrase$,"<","&lt;") '60
 If Instr(phrase$,">") Then phrase$=replacesubstring(phrase$,">","&gt;") '62

but I found that named entities such as &auml; were rejected (XML itself predefines only &amp;, &lt;, &gt;, &quot; and &apos;), which is why the lines above use the numeric form. I also found there are quite a few characters you have to deal with this way.

See: http://www.w3.org/TR/html401/sgml/entities.html
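
The snippet above calls replacesubstring, which is not a LotusScript built-in and is not shown on this page. Below is a minimal sketch of what such a helper might look like. The name, the parameter names and the behaviour are all assumptions made for illustration, not the author's actual routine.

```lotusscript
Function ReplaceSubstring(Byval src$, Byval lookFor$, Byval replaceWith$) As String
 ' Hypothetical helper (an assumption, not from the original page):
 ' returns src$ with every occurrence of lookFor$ replaced by replaceWith$.
 Dim pos As Long
 Dim result$

 ' Nothing to search for, so hand the input back unchanged
 If lookFor$ = "" Then
  ReplaceSubstring = src$
  Exit Function
 End If

 pos = Instr(1, src$, lookFor$)
 Do While pos > 0
  ' Copy the text before the match, then the replacement
  result$ = result$ & Left$(src$, pos - 1) & replaceWith$
  ' Keep only the unscanned remainder, so replaced text is never rescanned
  src$ = Mid$(src$, pos + Len(lookFor$))
  pos = Instr(1, src$, lookFor$)
 Loop

 ' Append whatever is left after the last match
 ReplaceSubstring = result$ & src$
End Function
```

Because only the remainder of the string is searched after each match, replacing "&" with "&amp;" does not loop forever on the "&" it has just written.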

Here is my function:

Function PCDATA(Byval phrase$) As String
 
' This function scans the string passed in phrase and replaces every
' character other than a space, 0-9, A-Z or a-z (so umlauts, ampersands,
' angle brackets and so on) with its &#nnn; numeric character reference.
' If an error occurs, the original value is returned.
' This is useful for xml and rss feeds <item> ->  <title> tags
' esp when working for a Finnish company : )

' Written by Adam Foster

'Updated Nov 2010 to use Uni() instead of Asc() to cater for ellipsis ... etc

' Bug Fixed Jan2011 use uni(ch)=32 instead of ch=" " as " " can match uni(32) or uni(160) !!!!!

' Keep this link in to find out about code updates and fixes

' http://www.NotesNinjas.com/#XMLTextEncode
 
On Error Goto Errors
 Dim oldPhrase$
 Dim a As Long
 Dim ch As String
 Dim code As Long
 
'---- Make sure that we have a phrase to change:
 If phrase$="" Then Goto TheEnd
 
'---- Save the original string
 oldPhrase = phrase
 
'---- Rebuild the string one character at a time
 phrase$=""
 For a=1 To Len(oldPhrase)
  ch=Mid$(oldPhrase,a,1)
  code=Uni(ch)
  If code=32 Or _                 'space (32 only, not 160)
  (code>47 And code<58) Or _      '0-9
  (code>64 And code<91) Or _      'A-Z
  (code>96 And code<123) Then     'a-z
   phrase$=phrase$ & ch
  Else
   phrase$=phrase$ & "&#" & Cstr(code) & ";"
  End If
 Next
 
PCDATA = phrase
 
TheEnd:
 Exit Function
 
Errors:
'---- Returning the original value of phrase
 PCDATA = oldPhrase
 Resume TheEnd
End Function
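
As a quick illustration of how the function might be called when building a feed, here is a small sketch. The routine name DemoPCDATA and the sample title are made up for this example, and it assumes PCDATA is available in the same script.

```lotusscript
Sub DemoPCDATA
 ' Hypothetical demo routine; assumes PCDATA (above) is in the same script
 Dim rawTitle As String
 Dim xmlLine As String

 rawTitle = "Fish & Chips <hot>"

 ' Every character other than space, 0-9, A-Z and a-z becomes &#nnn;
 xmlLine = "<title>" & PCDATA(rawTitle) & "</title>"

 ' Should print: <title>Fish &#38; Chips &#60;hot&#62;</title>
 Print xmlLine
End Sub
```

Note that the function also encodes harmless punctuation such as full stops and hyphens. That makes the output longer than strictly necessary, but it keeps the test for "safe" characters very simple.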

 

And here is the corresponding function that converts strings back:

Function unPCDATA(Byval phrase$) As String
 
' This function searches the string passed in phrase for all
' occurrences of &#nnn; and replaces each one
' with the character it stands for. If an error occurs,
' the original value is returned
' This is useful for xml and rss feeds <item> ->  <title> tags
' esp when working for a Finnish company : )
 ' See http://www.notesninjas.com/A555F9/nn.nsf/ByAlias/XMLTextEncode
 

On Error Goto Errors
 Dim oldPhrase$
 Dim a As Long, semi As Long
 Dim ch As String
 Dim num As Long
 
'---- Make sure that we have a phrase to change:
 If phrase$="" Then Goto TheEnd
 
'---- Save the original string
 oldPhrase = phrase
 
 phrase$=""
 For a=1 To Len(oldPhrase)
  ch=Mid$(oldPhrase,a,2)
  If ch="&#" Then
   '---- Find the ";" that closes the reference
   semi=Instr(a+2,oldPhrase,";")
   ch=Mid$(oldPhrase,a+2,semi-(a+2))
   num=Clng(ch)
   '---- Uchr$ is the inverse of Uni(), so codes above 255 survive
   phrase$=phrase$ & Uchr$(num)
   '---- Skip past the reference; Next moves on to the character after ";"
   a=semi
  Else
   phrase$=phrase$ & Left$(ch,1)
  End If
 Next

unPCDATA = phrase
 
TheEnd:
 Exit Function
 
Errors:
'---- Returning the original value of phrase
 unPCDATA = oldPhrase
 Resume TheEnd
End Function
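
One way to check the two functions against each other is to encode a string, decode it again, and compare the result with the original. The sketch below does that. The routine name DemoRoundTrip and the sample text are invented for this example, and it assumes both PCDATA and unPCDATA are in the same script.

```lotusscript
Sub DemoRoundTrip
 ' Hypothetical check; assumes PCDATA and unPCDATA are in the same script
 Dim original As String
 Dim encoded As String
 Dim decoded As String

 original = "Hyvää päivää & <tervetuloa>"
 encoded = PCDATA(original)
 decoded = unPCDATA(encoded)

 ' Report whether decoding gave back exactly what we started with
 If decoded = original Then
  Print "Round trip OK: " & encoded
 Else
  Print "Round trip mismatch: " & decoded
 End If
End Sub
```

A mismatch usually means unPCDATA hit an error and fell back to returning its input unchanged, so the printed value is a useful first clue.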