xml parser
phaelax:
I originally made this with DarkBasic and decided to port it over. I was able to make a few changes since GLB lets me use arrays in Types. There is a bug or two: it adds the closing tag of a root node to the array when it shouldn't, and the function that returns a tag's inner content doesn't work properly when asked to include the inner content of all its children. I don't have these issues in the DB version, so I'm guessing it has something to do with GLB indices being zero-based while DB starts at 1. Or maybe I just copied something over wrong. There's a test XML file here: http://zimnox.com/quiz.xml
Any time you call xmlReadFile() you should call xmlClear() first.
xmlReadFile(string)
xmlGetElementCount()
xmlGetTagName$(int)
xmlGetAttributeValue$(int, string)
xmlAttributeExists(int, string)
xmlGetAttributeKey$(int, int)
xmlGetAttributeCount(int)
xmlGetTagContent$(int, bool)
xmlClear()
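As a rough usage sketch of the calls listed above (not tested against every GLBasic version): it assumes the parser source from this post is in the same project and that this code takes the place of the demo loop at the top of the listing. The file names "first.xml" and "second.xml" and the "ID" attribute are placeholders made up for illustration.

--- Code: (glbasic) ---
// Hypothetical usage sketch. "first.xml", "second.xml" and the "ID"
// attribute are placeholders, not files or names from the original post.
LOCAL i

xmlClear()
xmlReadFile("first.xml")
PRINT "elements in first file: " + xmlGetElementCount(), 0, 0

// Clear the previous results before parsing another file
xmlClear()
xmlReadFile("second.xml")

FOR i = 0 TO xmlGetElementCount()-1
    // The parser stores tag names and attribute keys in upper case
    IF xmlAttributeExists(i, "ID")
        PRINT xmlGetTagName$(i) + " id=" + xmlGetAttributeValue$(i, "ID"), 0, 10 + i*10
    ENDIF
NEXT
SHOWSCREEN
KEYWAIT
--- End code ---

Because keys are upper-cased when they are parsed, passing "ID" rather than "id" is the safe choice for lookups.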
--- Code: (glbasic) ---
// --------------------------------- //
// Project: XMLParser
// Author: Phaelax
// Start: Wednesday, January 26, 2011
// IDE Version: 8.078
TYPE AttributeSet
    key$
    value$
ENDTYPE
TYPE ElementObject
    tagName$
    parentElementId
    content$
    pos
    parentPos
    attributes[0] AS AttributeSet
ENDTYPE
GLOBAL escapes$[]
DIM escapes$[5][2]
escapes$[0][0] = "&lt;"   ; escapes$[0][1] = "<"
escapes$[1][0] = "&gt;"   ; escapes$[1][1] = ">"
escapes$[2][0] = "&amp;"  ; escapes$[2][1] = "&"
escapes$[3][0] = "&apos;" ; escapes$[3][1] = "'"
escapes$[4][0] = "&quot;" ; escapes$[4][1] = CHR$(34)
GLOBAL xmlTags[] AS ElementObject
GLOBAL parseStack[]
DIM xmlTags[0]
//xmlReadFile("c:/quiz.xml")
xmlReadFile("C:/Documents and Settings/Phaelax.NEWTON64/My Documents/GLBasic/zelda/zelda.gbap")
LOCAL key$
LOCAL y = 0
FOR i = 0 TO xmlGetElementCount()-1
    PRINT i+": "+xmlTags[i].tagName$+" -> "+xmlGetTagContent$(i, FALSE), 50, y;INC y, 10
    FOR j = 0 TO xmlGetAttributeCount(i)-1
        key$ = xmlGetAttributeKey$(i, j)
        PRINT key$ + " -> " + xmlGetAttributeValue$(i, key$), 100, y;INC y, 10
    NEXT
NEXT
SHOWSCREEN
KEYWAIT
END
FUNCTION xmlReadFile:filename$
    LOCAL xmlFileNo = 1
    LOCAL L$, tagName$, c$, oldChar$, temp$, unparsedAttributes$
    LOCAL matchOpenBracket, tagType, strLength, currentTag
    OPENFILE(xmlFileNo, filename$, TRUE)
    WHILE ENDOFFILE(xmlFileNo) = FALSE
        READLINE xmlFileNo, L$
        tagName$ = ""
        matchOpenBracket = -1
        tagType = 0
        strLength = LEN(L$)
        FOR i = 0 TO strLength-1
            c$ = MID$(L$, i, 1)
            //////////////////////////////////////////////
            // open bracket found for new tag
            //////////////////////////////////////////////
            IF c$ = "<"
                matchOpenBracket = i
                tagType = 0
            ENDIF
            //////////////////////////////////////////////
            // forward slash can either be part of a
            // closing container tag, or closing an empty
            //////////////////////////////////////////////
            IF c$ = "/"
                //////////////////////////////////////////////
                // If part of a closing tag, the slash will be
                // prefixed by the bracket (less-than sign)
                //////////////////////////////////////////////
                IF oldChar$ = "<" THEN tagType = 1
            ENDIF
            //////////////////////////////////////////////
            // Closing bracket for a tag
            //////////////////////////////////////////////
            IF c$ = ">"
                //////////////////////////////////////////////
                // if character before closing bracket was
                // a slash, then this bracket closed off an
                // empty tag
                //////////////////////////////////////////////
                IF oldChar$ = "/"
                    tagType = 2
                ELSE
                    //////////////////////////////////////////////
                    // "<? ?>" is part of the XML declaration
                    //////////////////////////////////////////////
                    IF oldChar$ = "?"
                        tagType = 2
                    ELSE
                        //////////////////////////////////////////////
                        // Normal close bracket, standard container
                        //////////////////////////////////////////////
                    ENDIF
                ENDIF
                //////////////////////////////////////////////
                // If we closed off (completed) the opening
                // tag's bracket, then it's open as the current
                // container. Add this tag to the container stack
                // for tracking the hierarchy and store a new
                // tag element in the array
                //////////////////////////////////////////////
                IF tagType = 0
                    LOCAL e AS ElementObject
                    e.pos = matchOpenBracket
                    temp$ = MID$(L$, matchOpenBracket+1, i-matchOpenBracket-1)
                    e.tagName$ = TRIM$(UCASE$(LEFT$(temp$, pFindTagNameEndIndex(temp$))))
                    pParseXmlAttributes(e, TRIM$(RIGHT$(temp$, LEN(temp$)-LEN(e.tagName$))))
                    e.content$ = ""
                    //////////////////////////////////////////////
                    // A parent ID of -1 means it is the root node
                    //////////////////////////////////////////////
                    IF LEN(parseStack[]) <= 0
                        e.parentElementId = -1
                    ELSE
                        //////////////////////////////////////////////
                        // The parent is the element index stored on
                        // top of the stack, not the stack depth
                        //////////////////////////////////////////////
                        e.parentElementId = parseStack[LEN(parseStack[])-1]
                        //////////////////////////////////////////////
                        // The position within the parent tag's content
                        // where this tag's data is present
                        //////////////////////////////////////////////
                        e.parentPos = LEN(xmlTags[e.parentElementId].content$)
                    ENDIF
                    DIMPUSH xmlTags[], e
                    //////////////////////////////////////////////
                    // Add the index of the last tag element added
                    // to the xmlTags array to the stack. This keeps
                    // track of what container we're in
                    //////////////////////////////////////////////
                    DIMPUSH parseStack[], LEN(xmlTags[])-1
                ENDIF
                //////////////////////////////////////////////
                // Closing tag was found, remove last container
                // from stack
                //////////////////////////////////////////////
                IF tagType = 1
                    DIMDEL parseStack[], -1
                ENDIF
                //////////////////////////////////////////////
                // This was an empty tag element. As they are
                // not containers, nothing is added to the stack
                // and nothing needs removed. Create a new
                // element and add it to the xmlTags array.
                //////////////////////////////////////////////
                IF tagType = 2
                    LOCAL e AS ElementObject
                    //////////////////////////////////////////////
                    // Checks for special case with XML declaration
                    //////////////////////////////////////////////
                    IF oldChar$ <> "?"
                        temp$ = MID$(L$, matchOpenBracket+1, i-matchOpenBracket-2)
                    ELSE
                        temp$ = MID$(L$, matchOpenBracket+2, i-matchOpenBracket-3)
                    ENDIF
                    e.tagName$ = TRIM$(UCASE$(LEFT$(temp$, pFindTagNameEndIndex(temp$))))
                    pParseXmlAttributes(e, TRIM$(RIGHT$(temp$, LEN(temp$)-LEN(e.tagName$))))
                    e.content$ = ""
                    IF LEN(parseStack[]) <= 0
                        e.parentElementId = -1
                    ELSE
                        //////////////////////////////////////////////
                        // The parent is the element index stored on
                        // top of the stack, not the stack depth
                        //////////////////////////////////////////////
                        e.parentElementId = parseStack[LEN(parseStack[])-1]
                        //////////////////////////////////////////////
                        // The position within the parent tag's content
                        // where this tag's data begins
                        //////////////////////////////////////////////
                        e.parentPos = LEN(xmlTags[e.parentElementId].content$)
                    ENDIF
                    DIMPUSH xmlTags[], e
                ENDIF
                //////////////////////////////////////////////
                // Start the whole process over again, the
                // container has been closed.
                //////////////////////////////////////////////
                matchOpenBracket = -1
            ELSE
                IF matchOpenBracket = -1
                    LOCAL j = LEN(parseStack[])-1
                    // -1 means no container is open; index 0 is a valid element
                    currentTag = -1
                    IF j >= 0 THEN currentTag = parseStack[j]
                    IF currentTag >= 0 AND currentTag < LEN(xmlTags[])
                        IF LEN(xmlTags[currentTag].content$) > 0
                            xmlTags[currentTag].content$ = xmlTags[currentTag].content$ + c$
                        ELSE
                            // Skip leading spaces and tabs
                            IF ASC(c$) <> 32 AND ASC(c$) <> 9 THEN xmlTags[currentTag].content$ = xmlTags[currentTag].content$ + c$
                        ENDIF
                    ENDIF
                ENDIF
            ENDIF
            //////////////////////////////////////////////
            // Helps keep track of previous characters when
            // checking for forward slashes, which are used
            // to determine the type of tag
            //////////////////////////////////////////////
            oldChar$ = c$
        NEXT
    WEND
    CLOSEFILE xmlFileNo
ENDFUNCTION
FUNCTION xmlClear:
    REDIM xmlTags[0]
ENDFUNCTION
FUNCTION xmlGetElementCount:
    RETURN LEN(xmlTags[])
ENDFUNCTION
FUNCTION xmlGetTagName$:elementId
    RETURN xmlTags[elementId].tagName$
ENDFUNCTION
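FUNCTION xmlGetAttributeValue$:elementId, key$
    // Keys are stored in upper case, so normalise the lookup key
    key$ = UCASE$(key$)
    FOR j = 0 TO xmlGetAttributeCount(elementId)-1
        IF xmlTags[elementId].attributes[j].key$ = key$ THEN RETURN xmlTags[elementId].attributes[j].value$
    NEXT
    // No attribute with that key
    RETURN ""
ENDFUNCTION
FUNCTION xmlAttributeExists:elementId, key$
    // Keys are stored in upper case, so normalise the lookup key
    key$ = UCASE$(key$)
    FOR j = 0 TO LEN(xmlTags[elementId].attributes[])-1
        IF xmlTags[elementId].attributes[j].key$ = key$ THEN RETURN TRUE
    NEXT
    RETURN FALSE
ENDFUNCTION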
FUNCTION xmlGetAttributeKey$:elementId, index
    RETURN xmlTags[elementId].attributes[index].key$
ENDFUNCTION
FUNCTION xmlGetAttributeCount:elementId
    RETURN LEN(xmlTags[elementId].attributes[])
ENDFUNCTION
FUNCTION xmlGetTagContent$:elementId, includeChildren
    LOCAL content$ = xmlTags[elementId].content$
    IF includeChildren = TRUE
        LOCAL extendedLength = 0
        FOR i = 0 TO LEN(xmlTags[])-1
            IF xmlTags[i].parentElementId = elementId
                content$ = pInsertString$(content$, xmlTags[i].content$, xmlTags[i].parentPos + extendedLength)
                extendedLength = extendedLength + LEN(xmlTags[i].content$)
            ENDIF
        NEXT
    ENDIF
    RETURN content$
ENDFUNCTION
FUNCTION pParseXmlAttributes:element AS ElementObject, txt$
    LOCAL s=0, x=0, s1=0, quote=34
    LOCAL key$, value$
    FOR j = 0 TO LEN(txt$)-1
        x = INSTR(txt$, "=", s)
        key$ = UCASE$(TRIM$(MID$(txt$, s, x-s)))
        s = INSTR(txt$, CHR$(34), x)+1
        s1 = INSTR(txt$, CHR$(39), x)+1
        quote = 34
        IF s1 > 0
            IF s1 < s OR s < 1
                s = s1
                quote = 39
            ENDIF
        ENDIF
        x = INSTR(txt$, CHR$(quote), s)
        value$ = MID$(txt$, s, x-s)
        FOR k = 0 TO BOUNDS(escapes$[], 0)-1
            value$ = REPLACE$(value$, escapes$[k][0], escapes$[k][1])
        NEXT
        LOCAL a AS AttributeSet
        a.key$ = key$
        a.value$ = value$
        DIMPUSH element.attributes[], a
        s = x+1
        j = x
    NEXT
ENDFUNCTION
FUNCTION pFindTagNameEndIndex:tagLine$
    LOCAL L = LEN(tagLine$)
    FOR i = 0 TO L-1
        IF MID$(tagLine$, i, 1) = " " THEN RETURN i
    NEXT
    RETURN L
ENDFUNCTION
FUNCTION pInsertString$:source$, seg$, pos
    LOCAL t$ = LEFT$(source$, pos)
    source$ = t$ + seg$ + RIGHT$(source$, LEN(source$)-LEN(t$))
    RETURN source$
ENDFUNCTION
--- End code ---
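The parser tracks nesting with a stack of element indices (parseStack[] in the listing). The standalone sketch below shows the difference between the value on top of such a stack, which identifies the enclosing element, and the stack depth, which does not. The array name stack[] and the indices 1 and 4 are invented for the illustration.

--- Code: (glbasic) ---
// Illustration only: stack[] and the indices pushed here are made up.
GLOBAL stack[]
DIM stack[0]

LOCAL parentId, depth

// Pretend elements 1 and 4 are the currently open containers
DIMPUSH stack[], 1
DIMPUSH stack[], 4

// The enclosing element is the VALUE on top of the stack
parentId = stack[LEN(stack[])-1]   // 4
// The top position itself is only the nesting depth
depth = LEN(stack[])-1             // 1
PRINT "parent id: " + parentId + ", depth: " + depth, 0, 0

// A closing tag pops the current container
DIMDEL stack[], -1
parentId = stack[LEN(stack[])-1]   // 1
PRINT "parent id after close: " + parentId, 0, 10
SHOWSCREEN
KEYWAIT
--- End code ---

With zero-based arrays the two numbers only coincide by accident, which is one place a port from a one-based language could go wrong.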
Moru:
Lots of comments, nice! My xml-parser is not this complete so I will use yours instead :-)
Kitty Hello:
Can you parse the gbap files (GLBasic project files) with this? That would be... like awesome.
phaelax:
Theoretically it should parse the gbap files since they're XML. I just tested it, but it seems I have a bug parsing the attributes for closed tags. I'll work on it some more.
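One possible direction for the attribute parsing, offered only as a sketch and not as a confirmed fix for the bug mentioned here: a loop that stops as soon as no further "=" or quote character is found, so a tag without attributes or with trailing text never produces a bogus entry. The function name pPrintAttributes and the sample attribute string are hypothetical; it prints the pairs instead of storing them so that it can run on its own.

--- Code: (glbasic) ---
// Hypothetical helper, not part of the original listing.
// Sample input: name="zelda" version='1.0'
pPrintAttributes("name=" + CHR$(34) + "zelda" + CHR$(34) + " version='1.0'")
SHOWSCREEN
KEYWAIT
END

FUNCTION pPrintAttributes: txt$
    LOCAL s = 0, x, q1, q2, quote, y = 0
    LOCAL key$, value$
    WHILE s < LEN(txt$)
        x = INSTR(txt$, "=", s)
        IF x < 0 THEN BREAK          // no more attributes
        key$ = UCASE$(TRIM$(MID$(txt$, s, x-s)))
        // Find whichever quote character opens the value
        q1 = INSTR(txt$, CHR$(34), x)
        q2 = INSTR(txt$, CHR$(39), x)
        quote = 34
        s = q1
        IF q2 >= 0 AND (q1 < 0 OR q2 < q1)
            quote = 39
            s = q2
        ENDIF
        IF s < 0 THEN BREAK          // value is not quoted, give up
        s = s + 1
        x = INSTR(txt$, CHR$(quote), s)
        IF x < 0 THEN BREAK          // unterminated value
        value$ = MID$(txt$, s, x-s)
        PRINT key$ + " -> " + value$, 0, y
        INC y, 10
        // Continue after the closing quote
        s = x + 1
    WEND
ENDFUNCTION
--- End code ---

For the sample string this should print NAME -> zelda and VERSION -> 1.0. Swapping the PRINT for a DIMPUSH into the element's attributes[] array would bring it in line with the listing.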
Wampus:
Oh! Keep debugging. :good:
This is rather awesome. To be able to parse xml in GLBasic would open up some interesting possibilities.