Find duplicate images

fuzzy70

I have just updated and cleaned up my Amiga font converter so that it is now a single extended type and can load Amiga colourfonts. The way the Amiga stores its characters in the font means there is a generic placeholder graphic for any character that is not used in the font (see attached).

I could iterate through every character & do a pixel-by-pixel check, but that seems a long-winded and not very efficient way of doing it. There must be a better way, but for some strange reason I can't think of one at the moment :D

Any ideas would be welcomed.

Lee
"Why don't you just make ten louder and make ten be the top number and make that a little louder?"
- "These go to eleven."

This Is Spinal Tap (1984)

Marmor

Hm, pixel by pixel is too slow, I guess.

Variant 1: grab the part from the image and save it,
           grab the next part and save it,
           compare the files.

Variant 2: grab the part and store it in memory with sprite2mem,
           grab the second part and do the same,
           compare the memory.

Just my first ideas.
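
A minimal sketch of variant 2, written in Python purely for illustration rather than in GLBasic. It assumes the sheet is available as a flat, row-major list of integer pixel values (roughly what a sprite-to-memory call would hand back); `cut_cell` is a hypothetical helper, not a GLBasic command.

```python
# Illustrative sketch only (Python, not GLBasic).
# Assumption: the sheet is a flat, row-major list of integer pixel values.

def cut_cell(sheet, sheet_width, x, y, w, h):
    """Copy a w-by-h rectangle out of a flat, row-major pixel list."""
    cell = []
    for row in range(y, y + h):
        start = row * sheet_width + x
        cell.extend(sheet[start:start + w])
    return cell

# A 6x2 sheet holding three 2x2 cells; cells 0 and 2 are identical.
sheet = [1, 1, 0, 5, 1, 1,
         1, 0, 5, 0, 1, 0]
cells = [cut_cell(sheet, 6, x, 0, 2, 2) for x in (0, 2, 4)]

print(cells[0] == cells[2])  # True: same pixels, so a duplicate
print(cells[0] == cells[1])  # False
```

Comparing two buffers like this is still a pixel-by-pixel test underneath, but it runs once per pair of cells on data already in memory instead of reading the screen pixel by pixel.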

fuzzy70

Thanks Marmor, your 2nd idea sounds viable. You mean create a checksum of the arrays that sprite2mem generates & compare those?

Lee

Sent from my HTC Wildfire using Tapatalk 2


Marmor

Yep, create a checksum using inline C and maybe the pointer to the sprite memory.
I don't know if GLB will give you the pointer, but if not, ask kungphoo.


fuzzy70

Thanks again Marmor.

Think I saw something about arrays & pointers using inline somewhere on the forum.

I will do a search & if no luck then kungphoo will be consulted :D

Lee


Ian Price

I wrote a program in Div (IIRC) to do this many years ago. What it did was read the pixel values at the four corners of each sprite (sprite size was declared) in a map and store the resulting data in an array. Any arrays that appeared duplicated had more pixels tested in each image. It worked surprisingly well and was very fast too. I used it for ripping game maps (I did a lot of remakes back in the day) - it would place each new sprite onto a tilesheet AND spit out the resulting map data so I could use the data & sprites/tiles directly to recreate the map.
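
The corner test above can be sketched roughly as follows. This is Python for illustration only, not the original Div program; it assumes every cell has the same width and height and is held as a flat, row-major list of pixel values. The names `corners` and `find_duplicates` are made up for the example.

```python
# Illustrative sketch only (Python). Cells are assumed to be flat, row-major
# lists of pixel values that all share the same width and height.

def corners(cell, w, h):
    """Return the four corner pixels as a cheap signature."""
    return (cell[0], cell[w - 1], cell[(h - 1) * w], cell[h * w - 1])

def find_duplicates(cells, w, h):
    """Map each duplicate cell index to the index of its first occurrence."""
    seen = {}          # corner signature -> indices of distinct cells so far
    duplicate_of = {}
    for i, cell in enumerate(cells):
        candidates = seen.setdefault(corners(cell, w, h), [])
        for j in candidates:
            if cells[j] == cell:   # corners matched, so test every pixel
                duplicate_of[i] = j
                break
        else:
            candidates.append(i)   # no full match: a genuinely new cell
    return duplicate_of

a = [1, 0, 1,  0, 7, 0,  1, 0, 1]
b = list(a)                          # exact copy of a
c = [1, 0, 1,  0, 9, 0,  1, 0, 1]    # same corners, different centre
d = [2, 0, 1,  0, 7, 0,  1, 0, 1]    # different corner

print(find_duplicates([a, b, c, d], 3, 3))  # {1: 0}
```

Only cells whose corners already match get the full comparison, which is where the speed comes from.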
I came. I saw. I played.

fivesprites

I was going to suggest the same as Ian, but then I noticed that the unused cell size does *not* match the size of each character. For example, the letter M takes up more than one "unused" cell width. So checking the four corners wouldn't work in this case - loads of false positives.

//Andy

fuzzy70

Thanks Ian, I know the size of each character as it is stored in the original file & that data is what's used to rip each character from the font file.

I suppose I could sort the characters by their width & compare only those of the same width (all characters share a common Y size).
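
A rough sketch of that idea, in Python for illustration only. The `(code, width, pixels)` tuples are a made-up layout for the example, not how the converter actually stores its data.

```python
from collections import defaultdict

# Illustrative sketch only (Python). Each glyph is a hypothetical
# (character code, width, flat pixel list) tuple; all glyphs share one height.

def duplicates_by_width(glyphs):
    """Return lists of character codes whose pixel data is identical,
    comparing only glyphs that have the same width."""
    by_width = defaultdict(list)
    for code, width, pixels in glyphs:
        by_width[width].append((code, pixels))

    groups = []
    for bucket in by_width.values():
        remaining = bucket
        while remaining:
            _, pixels = remaining[0]
            same = [c for c, p in remaining if p == pixels]
            if len(same) > 1:
                groups.append(same)
            remaining = [(c, p) for c, p in remaining if p != pixels]
    return groups

glyphs = [
    (65, 2, [1, 1, 1, 1]),
    (97, 3, [1, 0, 1, 1, 0, 1]),
    (98, 3, [1, 0, 1, 1, 0, 1]),
    (66, 2, [1, 0, 0, 1]),
    (99, 3, [1, 0, 1, 1, 0, 1]),
]
print(duplicates_by_width(glyphs))  # [[97, 98, 99]]
```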

It's just been one of those dumb moments where the more I stare at the code, the less the solution presents itself lol.

Lee


fivesprites

Actually, I was just thinking that to reduce complexity, why not use a simple text file that describes the original format?

Any unused characters you would mark with a non-ASCII element - if you see that in your converter, skip over *unused cell* width pixels.

Probably much easier than programmatically looking at the pixels.

//Andy


fuzzy70

@Andy. Sorry, your reply was posted as I was writing mine. Like I said in my last post, it may be spanked as I know each character's width & can compare only those of the same width.

Lee


fivesprites

Well, I guess if you know the exact position of each cell then just look at the four corners as Ian suggested.  I doubt many fonts would consume all four of those or they'd look crap ;)

//Andy

fuzzy70

Quote from: fivesprites on 2013-May-02
Actually, I was just thinking that to reduce complexity, why not use a simple text file that describes the original format?

Any unused characters you would mark with a non-ASCII element - if you see that in your converter, skip over *unused cell* width pixels.

Probably much easier than programmatically looking at the pixels.

//Andy

The problem with that is that I do not know which characters are missing in the 1st place, hence why I need to check for dupes.

Once converted the program will output a sprite sheet with a text file listing the available characters along with their position & size of each one.

Lee


Ian Price

Quote: The problem with that is that I do not know which characters are missing in the 1st place, hence why I need to check for dupes.
Sorry - I'm being a bit dim here. What do you mean you don't know what characters are missing? Do you mean that characters within a certain piece of text may be missing in BMP form (e.g. there's no "a" character in the BMP font but there is in the text)?


[EDIT] I've added a FONT .PNG that I always use to create a new font - it has all the standard chars that I will probably need, numbered/positioned correctly for my own bitmap font routine. I keep it in my GLB folder so I can find it easily (hence all the plus symbols).

fuzzy70

I probably explained that bit wrong Ian sorry.

The Amiga font file tells you the lowest character & the highest, i.e. ASCII 32 to ASCII 255. As you can see from the pic, not all characters have a defined graphic (there are no lowercase ones in the example). For those not defined it stores a default image, which in the example is a rectangle.

Basically it is a major waste of storage as they could have easily just stored an empty 1 pixel width graphic instead.

Lee


fuzzy70

I finally went with the checksum method for sorting the dupes. I did try Ian's suggestion initially & it went very well on fonts where each character had the same dimensions (like 8x8 & 9x10 etc.) but became a bit unwieldy with fonts that had characters of various sizes.

While all the characters in a font share a common height, most fonts have characters of various widths (12+ different widths in some cases), so it required sorting them by size, which overcomplicated things thanks to how the data is stored in the file. It is a very good method though Ian & thanks for suggesting it, I will definitely keep it in mind for the future if I need to compare images of the same dimensions, like sprites/tiles for example :good:

The checksum method I used could hardly be described as bulletproof: all it does is multiply the current pixel value by its position and add the result to a running total for that character as it is read in by the function, i.e.
Code (glbasic):
INC Checksm%,Count%*Pix%[Count%]
There is a chance that two different characters produce the same value. That chance is higher with small monochrome fonts and lower with colourfonts & larger sizes.
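
To make the collision risk concrete, here is a small Python sketch of the same position-times-value sum (Python only for illustration). It assumes the position counter starts at 0, which the post does not state; if it starts at 1 the first case below goes away but the second remains. Real GLBasic integers may also wrap around, which this sketch ignores.

```python
# Illustrative sketch only (Python). Assumption: positions are counted from 0.

def positional_checksum(pixels):
    """Sum of position * pixel value over a flat pixel list."""
    return sum(count * pix for count, pix in enumerate(pixels))

a = [1, 0, 0, 0]   # only the first pixel set
b = [0, 0, 0, 0]   # blank
c = [0, 0, 1, 0]   # value 1 at position 2
d = [0, 2, 0, 0]   # value 2 at position 1

print(positional_checksum(a), positional_checksum(b))  # 0 0
print(positional_checksum(c), positional_checksum(d))  # 2 2
```

With a zero-based counter the first pixel never contributes, and any two images whose position-value products sum to the same total collide.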

As of yet I have not noticed a font that gives me false duplicates, but there are a lot of fonts around so I am not relying on the current implementation being 100% effective. The good thing about using a checksum is that the method for creating it can be changed easily by just replacing the one line of code above with another method or a call to a dedicated function without affecting the rest of the code.
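
As one example of swapping in a different method, the sketch below uses CRC-32 from Python's standard `zlib` module in place of the position-times-value sum. This is a substitute technique shown in Python for illustration, not what the converter does; mixing the glyph width into the data and packing each pixel as an unsigned 32-bit value are both assumptions.

```python
import struct
import zlib

# Illustrative sketch only (Python). Assumptions: pixels are unsigned 32-bit
# values, and the glyph width is mixed in so different widths differ too.

def crc_checksum(pixels, width):
    """CRC-32 over the glyph width followed by the pixel values."""
    data = struct.pack("<I", width) + struct.pack("<%dI" % len(pixels), *pixels)
    return zlib.crc32(data)

a = [1, 0, 0, 0]
b = [0, 0, 0, 0]
c = [0, 0, 1, 0]
d = [0, 2, 0, 0]

print(crc_checksum(a, 2) == crc_checksum(b, 2))  # False
print(crc_checksum(c, 2) == crc_checksum(d, 2))  # False
print(crc_checksum(b, 2) == crc_checksum(b, 4))  # False
print(crc_checksum(a, 2) == crc_checksum(list(a), 2))  # True
```

A CRC can still collide in principle, so it narrows the candidates rather than proving two images are equal.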

The function to check the dupes just compares the checksums as follows:
Code (glbasic):
FUNCTION check_for_dupes: Base%

    LOCAL FirstChar%,CheckChar%,Chksum1%,Chksum2%

    // One flag per possible character code
    DIM self.IsDupe%[256]

    FOR FirstChar% = self.LoChar TO self.HiChar-1

        Chksum1%=self.Checksum[FirstChar%]
        // Skip characters already flagged as a copy of an earlier one
        IF self.IsDupe%[FirstChar%] = FALSE

            // Flag every later character that shares this checksum
            FOR CheckChar% = FirstChar%+1 TO self.HiChar
                IF self.IsDupe%[CheckChar%] = FALSE
                    Chksum2%=self.Checksum[CheckChar%]
                    IF Chksum1%=Chksum2% THEN self.IsDupe%[CheckChar%] = TRUE
                ENDIF
            NEXT

        ENDIF

    NEXT

ENDFUNCTION
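
For comparison, the same grouping can be done in a single pass with a dictionary. This is a Python sketch for illustration only; the checksum values are invented, and treating the largest group as the placeholder glyph is an assumption rather than something the font format guarantees.

```python
from collections import defaultdict

# Illustrative sketch only (Python). The checksum values below are invented.

def group_by_checksum(checksums, lo_char, hi_char):
    """Return {checksum: [codes]} for every checksum shared by 2+ characters."""
    groups = defaultdict(list)
    for code in range(lo_char, hi_char + 1):
        groups[checksums[code]].append(code)
    return {k: v for k, v in groups.items() if len(v) > 1}

checksums = {32: 0, 33: 40, 34: 77, 35: 40, 36: 40, 37: 77}
shared = group_by_checksum(checksums, 32, 37)
print(shared)  # {40: [33, 35, 36], 77: [34, 37]}

# Assumption: the most repeated image is the "unused character" placeholder.
placeholder = max(shared.values(), key=len)
print(placeholder)  # [33, 35, 36]
```

Note that the nested loop above flags 35 and 36 in this example but leaves 33, the first occurrence, unflagged, so the first copy of the placeholder needs handling separately if it should be dropped too.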


Thanks all for their suggestions to this problem, it is appreciated  :booze:

Lee