xml parser

phaelax:
I originally made this with DarkBasic and decided to port it over. I was able to make a few changes since GLB lets me use arrays in Types. There is a bug or two. It's adding the closing tag of a root node to the array when it shouldn't. And the function which returns a tag's inner content doesn't work properly when asked to include the inner content of all its children. I don't have these issues in the DB version, so I'm guessing it's something to do with GLB indices being zero-based while DB starts at 1. Or maybe I just copied something over wrong. There's a test XML file here: http://zimnox.com/quiz.xml

Any time you call xmlReadFile() you should call xmlClear() first.

xmlReadFile(string)
xmlGetElementCount()
xmlGetTagName$(int)
xmlGetAttributeValue$(int, string)
xmlAttributeExists(int, string)
xmlGetAttributeKey$(int, int)
xmlGetAttributeCount(int)
xmlGetTagContent$(int, bool)
xmlClear()
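
As a rough usage sketch of the calls listed above: clear first, read the file, then look up one attribute on every element with a given tag name. The file path, the tag name "QUESTION" and the attribute key "ID" are placeholders for illustration, not taken from the real quiz.xml. The parser stores tag names and attribute keys upper-cased, so the comparison strings are upper-case too.

--- Code: (glbasic) ---
// Hypothetical usage; "quiz.xml", "QUESTION" and "ID" are assumed names
xmlClear()
xmlReadFile("quiz.xml")

LOCAL i
LOCAL y = 0
FOR i = 0 TO xmlGetElementCount()-1
    // tag names and attribute keys are upper-cased while parsing
    IF xmlGetTagName$(i) = "QUESTION" AND xmlAttributeExists(i, "ID")
        PRINT xmlGetAttributeValue$(i, "ID"), 0, y
        INC y, 10
    ENDIF
NEXT

SHOWSCREEN
KEYWAIT
--- End code ---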


--- Code: (glbasic) ---
// --------------------------------- //
// Project: XMLParser
// Author: Phaelax
// Start: Wednesday, January 26, 2011
// IDE Version: 8.078



TYPE AttributeSet
key$
value$
ENDTYPE

TYPE ElementObject
tagName$
parentElementId
content$
pos
parentPos
attributes[0] AS AttributeSet
ENDTYPE

GLOBAL escapes$[]
DIM escapes$[5][2]
escapes$[0][0] = "&lt;"  ; escapes$[0][1] = "<"
escapes$[1][0] = "&gt;"  ; escapes$[1][1] = ">"
escapes$[2][0] = "&amp;" ; escapes$[2][1] = "&"
escapes$[3][0] = "&apos;"; escapes$[3][1] = "'"
escapes$[4][0] = "&quot;"; escapes$[4][1] = CHR$(34)


GLOBAL xmlTags[] AS ElementObject
GLOBAL parseStack[]
DIM xmlTags[0]


//xmlReadFile("c:/quiz.xml")
xmlReadFile("C:/Documents and Settings/Phaelax.NEWTON64/My Documents/GLBasic/zelda/zelda.gbap")
LOCAL key$
LOCAL y = 0
FOR i = 0 TO xmlGetElementCount()-1
PRINT i+": "+xmlTags[i].tagName$+" -> "+xmlGetTagContent$(i, FALSE), 50, y;INC y, 10
FOR j = 0 TO xmlGetAttributeCount(i)-1
key$ = xmlGetAttributeKey$(i, j)
PRINT key$ + " -> " + xmlGetAttributeValue$(i, key$), 100, y;INC y, 10
NEXT
NEXT


SHOWSCREEN
KEYWAIT
END









FUNCTION xmlReadFile:filename$
LOCAL xmlFileNo = 1
LOCAL L$, tagName$, c$, oldChar$, temp$, unparsedAttributes$
LOCAL matchOpenBracket, tagType, strLength, currentTag
OPENFILE(xmlFileNo, filename$, TRUE)

WHILE ENDOFFILE(xmlFileNo) = FALSE
READLINE xmlFileNo, L$
tagName$ = ""
matchOpenBracket = -1
tagType = 0
strLength = LEN(L$)

FOR i = 0 TO strLength-1
c$ = MID$(L$, i, 1)

//////////////////////////////////////////////
// open bracket found for new tag
//////////////////////////////////////////////
IF c$ = "<"
matchOpenBracket = i
tagType = 0
ENDIF

//////////////////////////////////////////////
// forward slash can either be part of a
// closing container tag, or closing an empty
//////////////////////////////////////////////
IF c$ = "/"
//////////////////////////////////////////////
// If part of a closing tag, the slash will be
// prefixed by the bracket (less-than sign)
//////////////////////////////////////////////
IF oldChar$ = "<" THEN tagType = 1
ENDIF

//////////////////////////////////////////////
// Closing bracket for a tag
//////////////////////////////////////////////
IF c$ = ">"

//////////////////////////////////////////////
// if character before closing bracket was
// a slash, then this bracket closed off an
// empty tag
//////////////////////////////////////////////
IF oldChar$ = "/"
tagType = 2
ELSE
//////////////////////////////////////////////
// "<? ?>" is part of the XML declaration
//////////////////////////////////////////////
IF oldChar$ = "?"
tagType = 2
ELSE
//////////////////////////////////////////////
// Normal close bracket, standard container
//////////////////////////////////////////////
ENDIF
ENDIF
//////////////////////////////////////////////
// If we closed off (completed) the opening
// tag's bracket, then it's open as the current
// container. Add this tag to the container stack
// for tracking the hierarchy and store a new
// tag element in the array
//////////////////////////////////////////////
IF tagType = 0

LOCAL e AS ElementObject
e.pos = matchOpenBracket
temp$ = MID$(L$, matchOpenBracket+1, i-matchOpenBracket-1)
e.tagName$ = TRIM$(UCASE$(LEFT$(temp$, pFindTagNameEndIndex(temp$))))
pParseXmlAttributes(e, TRIM$(RIGHT$(temp$, LEN(temp$)-LEN(e.tagName$))))
e.content$ = ""
//////////////////////////////////////////////
// A parent ID of -1 means it is the root node
//////////////////////////////////////////////
IF LEN(parseStack[]) <= 0
e.parentElementId = -1
ELSE
// The parent is the element index stored on top of the stack, not the stack depth
e.parentElementId = parseStack[LEN(parseStack[])-1]
//////////////////////////////////////////////
// The position within the parent tag's content
// where this tag's data is present
//////////////////////////////////////////////
e.parentPos = LEN(xmlTags[e.parentElementId].content$)
ENDIF
DIMPUSH xmlTags[], e

//////////////////////////////////////////////
// Add the index of the last tag element added
// to the xmlTags array to the stack. This keeps
// track of what container we're in
//////////////////////////////////////////////
DIMPUSH parseStack[], LEN(xmlTags[])-1
ENDIF
//////////////////////////////////////////////
// Closing tag was found, remove last container
// from stack
//////////////////////////////////////////////
IF tagType = 1
DIMDEL parseStack[], -1
ENDIF

//////////////////////////////////////////////
// This was an empty tag element. As they are
// not containers, nothing is added to the stack
// and nothing needs removed. Create a new
// element and add it to the xmlTags array.
//////////////////////////////////////////////
IF tagType = 2
LOCAL e AS ElementObject

//////////////////////////////////////////////
// Checks for special case with XML declaration
//////////////////////////////////////////////
IF oldChar$ <> "?"
temp$ = MID$(L$, matchOpenBracket+1, i-matchOpenBracket-2)
ELSE
temp$ = MID$(L$, matchOpenBracket+2, i-matchOpenBracket-3)
ENDIF

e.tagName$ = TRIM$(UCASE$(LEFT$(temp$, pFindTagNameEndIndex(temp$))))
pParseXmlAttributes(e, TRIM$(RIGHT$(temp$, LEN(temp$)-LEN(e.tagName$))))
e.content$ = ""
IF LEN(parseStack[]) <= 0
e.parentElementId = -1
ELSE
// The parent is the element index stored on top of the stack, not the stack depth
e.parentElementId = parseStack[LEN(parseStack[])-1]
//////////////////////////////////////////////
// The position within the parent tag's content
// where this tag's data begins
//////////////////////////////////////////////
e.parentPos = LEN(xmlTags[e.parentElementId].content$)
ENDIF
DIMPUSH xmlTags[], e
ENDIF

//////////////////////////////////////////////
// Start the whole process over again, the
// container has been closed.
//////////////////////////////////////////////
matchOpenBracket = -1

ELSE
IF matchOpenBracket = -1
LOCAL j = LEN(parseStack[])-1
// -1 means "not inside any container"; index 0 is a valid element
currentTag = -1
IF j >= 0 THEN currentTag = parseStack[j]
IF currentTag >= 0 AND currentTag < LEN(xmlTags[])
IF LEN(xmlTags[currentTag].content$) > 0
xmlTags[currentTag].content$ = xmlTags[currentTag].content$ + c$
ELSE
IF ASC(c$) <> 32 AND ASC(c$) <> 9 THEN xmlTags[currentTag].content$ = xmlTags[currentTag].content$ + c$
ENDIF
ENDIF

ENDIF
ENDIF
//////////////////////////////////////////////
// Helps keep track of previous characters when
// checking for forward slashes, which are used
// to determine the type of tag
//////////////////////////////////////////////
oldChar$ = c$
NEXT
WEND

CLOSEFILE xmlFileNo
ENDFUNCTION



FUNCTION xmlClear:
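REDIM xmlTags[0]
// Also empty the container stack so leftovers from a previous file can't affect the next parse
REDIM parseStack[0]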
ENDFUNCTION



FUNCTION xmlGetElementCount:
RETURN LEN(xmlTags[])
ENDFUNCTION



FUNCTION xmlGetTagName$:elementId
RETURN xmlTags[elementId].tagName$
ENDFUNCTION



FUNCTION xmlGetAttributeValue$:elementId, key$
FOR j = 0 TO xmlGetAttributeCount(elementId)-1
IF xmlTags[elementId].attributes[j].key$ = key$ THEN RETURN xmlTags[elementId].attributes[j].value$
NEXT
// No attribute with this key: return an empty string
RETURN ""
ENDFUNCTION



FUNCTION xmlAttributeExists:elementId, key$
FOR j = 0 TO LEN(xmlTags[elementId].attributes[])-1
IF xmlTags[elementId].attributes[j].key$ = key$ THEN RETURN TRUE
NEXT
RETURN FALSE
ENDFUNCTION



FUNCTION xmlGetAttributeKey$:elementId, index
RETURN xmlTags[elementId].attributes[index].key$
ENDFUNCTION



FUNCTION xmlGetAttributeCount:elementId
RETURN LEN(xmlTags[elementId].attributes[])
ENDFUNCTION



FUNCTION xmlGetTagContent$:elementId, includeChildren
LOCAL content$ = xmlTags[elementId].content$
IF includeChildren = TRUE
LOCAL extendedLength = 0
FOR i = 0 TO LEN(xmlTags[])-1
IF xmlTags[i].parentElementId = elementId
content$ = pInsertString$(content$, xmlTags[i].content$, xmlTags[i].parentPos + extendedLength)
extendedLength = extendedLength + LEN(xmlTags[i].content$)
ENDIF
NEXT
ENDIF
RETURN content$
ENDFUNCTION


FUNCTION pParseXmlAttributes:element AS ElementObject, txt$
LOCAL s=0, x=0, s1=0, quote=34
LOCAL key$, value$

FOR j = 0 TO LEN(txt$)-1
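x = INSTR(txt$, "=", s)
// No further "=" means no further attributes; stop instead of storing a bogus entry
IF x < 0 THEN BREAK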
key$ = UCASE$(TRIM$(MID$(txt$, s, x-s)))
s = INSTR(txt$, CHR$(34), x)+1
s1 = INSTR(txt$, CHR$(39), x)+1

quote = 34
IF s1 > 0
IF s1 < s OR s < 1
s = s1
quote = 39
ENDIF
ENDIF
x = INSTR(txt$, CHR$(quote), s)

value$ = MID$(txt$, s, x-s)
FOR k = 0 TO BOUNDS(escapes$[], 0)-1
value$ = REPLACE$(value$, escapes$[k][0], escapes$[k][1])
NEXT

LOCAL a AS AttributeSet
a.key$ = key$
a.value$ = value$
DIMPUSH element.attributes[], a

s = x+1
j = x
NEXT
ENDFUNCTION



FUNCTION pFindTagNameEndIndex:tagLine$
LOCAL L = LEN(tagLine$)
FOR i = 0 TO L-1
IF MID$(tagLine$, i, 1) = " " THEN RETURN i
NEXT
RETURN L
ENDFUNCTION



FUNCTION pInsertString$:source$, seg$, pos
LOCAL t$ = LEFT$(source$, pos)
source$ = t$ + seg$ + RIGHT$(source$, LEN(source$)-LEN(t$))
RETURN source$
ENDFUNCTION

--- End code ---
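
For the inner-content problem mentioned in the first post, one possible direction is to build the content recursively, so that each child contributes its own children's content as well. This is only a minimal, untested sketch. It assumes every element's parentElementId holds the xmlTags index of its parent (-1 for the root) and that parentPos is the offset into the parent's own content. xmlGetTagContentDeep$ is a hypothetical name, not part of the API listed above.

--- Code: (glbasic) ---
// Hypothetical helper: content of an element including all descendants
FUNCTION xmlGetTagContentDeep$: elementId
    LOCAL i
    LOCAL child$
    LOCAL extendedLength = 0
    LOCAL content$ = xmlTags[elementId].content$

    FOR i = 0 TO LEN(xmlTags[])-1
        IF xmlTags[i].parentElementId = elementId
            // full content of the child, grandchildren included
            child$ = xmlGetTagContentDeep$(i)
            // parentPos is relative to the parent's own text, so shift it
            // by everything already inserted for earlier children
            content$ = pInsertString$(content$, child$, xmlTags[i].parentPos + extendedLength)
            INC extendedLength, LEN(child$)
        ENDIF
    NEXT

    RETURN content$
ENDFUNCTION
--- End code ---

Children are visited in array order, which matches document order because elements are pushed as they are parsed.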

Moru:
Lots of comments, nice! My xml-parser is not this complete so I will use yours instead :-)

Kitty Hello:
Can you parse the gbap files (GLBasic project files) with this? That would be... like awesome.

phaelax:
Theoretically it should parse the gbap files since they're XML. I just tested it, but it seems I have a bug parsing the attributes for closed tags. I'll work on it some more.

Wampus:
Oh! Keep debugging.  :good:

This is rather awesome. To be able to parse xml in GLBasic would open up some interesting possibilities.
