Author Topic: Find duplicate images  (Read 4374 times)

Offline fuzzy70

  • Community Developer
  • Prof. Inline
  • ******
  • Posts: 828
  • Look left, Look right, LOOK OUT!!
    • View Profile
Find duplicate images
« on: 2013-May-02 »
I have just updated/cleaned up my Amiga font converter so it is now just 1 extended type & can now load Amiga colourfonts. The way that the Amiga stores its characters in the font means that there is a generic character graphic for characters that are not used in the font (see attached).

I could iterate through every character & do a pixel-to-pixel check but that seems a long-winded way of doing it, along with not being very efficient. There must be a better way but for some strange reason I can't think of one at the moment   :D.

Any ideas would be welcomed.

Lee
"Why don't you just make ten louder and make ten be the top number and make that a little louder?"
- "These go to eleven."

This Is Spinal Tap (1984)

Offline Marmor

  • Community Developer
  • Prof. Inline
  • ******
  • Posts: 929
  • 96A285CC
    • View Profile
    • my youtube channel
Re: Find duplicate images
« Reply #1 on: 2013-May-02 »
Hm, pixel by pixel is too slow, I guess.

Variant 1: grab the part from the image and save it,
           grab the next part and save it,
           compare the files.

Variant 2: grab the part and store it in memory with sprite2mem,
           grab the second part and do the same as with the first,
           compare the memory.

My first ideas.
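
Variant 2 could be sketched roughly like this. It is Python rather than GLBasic, and the flat row-major pixel list is only a stand-in for whatever array sprite2mem fills; `grab_cell` and `same_cell` are hypothetical helper names, not GLBasic commands.

```python
def grab_cell(pixels, sheet_width, x, y, w, h):
    """Copy a w*h cell out of a flat, row-major pixel list."""
    cell = []
    for row in range(y, y + h):
        start = row * sheet_width + x
        cell.extend(pixels[start:start + w])
    return cell


def same_cell(a, b):
    """Two cells match only if size and every pixel value agree."""
    return a == b


# A 4x2 sheet holding the same 2x2 cell twice, side by side.
sheet = [1, 2, 1, 2,
         3, 4, 3, 4]
left = grab_cell(sheet, 4, 0, 0, 2, 2)
right = grab_cell(sheet, 4, 2, 0, 2, 2)
shifted = grab_cell(sheet, 4, 1, 0, 2, 2)
print(same_cell(left, right), same_cell(left, shifted))
```

The comparison is still pixel by pixel underneath, but it runs over memory in one go instead of through drawing or file commands.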

Offline fuzzy70

Re: Find duplicate images
« Reply #2 on: 2013-May-02 »
Thanks Marmor, your 2nd idea sounds viable. You mean create a checksum of the arrays that sprite2mem generates & compare those?

Lee

Offline Marmor

Re: Find duplicate images
« Reply #3 on: 2013-May-02 »
Yep, create a checksum using inline C and maybe the pointer to the sprite memory.
I don't know if GLB will give you the pointer, but if not, ask kungphoo.


Offline fuzzy70

Re: Find duplicate images
« Reply #4 on: 2013-May-02 »
Thanks again Marmor.

Think I saw something about arrays & pointers using inline somewhere on the forum.

I will do a search & if no luck then kungphoo will be consulted :D

Lee

Offline Ian Price

  • Administrator
  • Prof. Inline
  • *******
  • Posts: 4159
  • On the shoulders of giants.
    • View Profile
    • My Apps
Re: Find duplicate images
« Reply #5 on: 2013-May-02 »
I wrote a program in Div (IIRC) to do this many years ago. What it did was read the pixel values at the four corners of each sprite (sprite size was declared) in a map and store the resulting data in an array. For any arrays that appeared duplicated, it tested more pixels in each image. It worked surprisingly well and was very fast too. I used it for ripping game maps (I did a lot of remakes back in the day) - it would place each new sprite onto a tilesheet AND spit out the resulting map data so I could use the data & sprites/tiles directly to recreate the map.
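
A minimal Python sketch of the corner idea, not the original Div program: the four corner pixels act as a cheap signature, and the full pixel comparison only runs for cells whose signatures match. It assumes every cell has the same width and height and is stored as a flat row-major list; `corners` and `find_duplicates` are made-up names.

```python
def corners(cell, w, h):
    """Four corner pixels of a flat, row-major w*h cell."""
    return (cell[0], cell[w - 1], cell[(h - 1) * w], cell[h * w - 1])


def find_duplicates(cells, w, h):
    """Map each duplicate cell index to the first identical cell's index."""
    seen = {}   # corner signature -> indices of unique cells sharing it
    dupes = {}
    for i, cell in enumerate(cells):
        candidates = seen.setdefault(corners(cell, w, h), [])
        for j in candidates:
            if cells[j] == cell:      # corners matched, so test every pixel
                dupes[i] = j
                break
        else:
            candidates.append(i)      # new unique cell for this signature
    return dupes


a = [1, 0, 1,
     0, 5, 0,
     1, 0, 1]
b = [1, 0, 1,
     0, 9, 0,   # same corners as a, different centre
     1, 0, 1]
c = list(a)     # exact copy of a
print(find_duplicates([a, b, c], 3, 3))
```

Cell `b` shares its corners with `a` but fails the full test, so only `c` is reported as a duplicate.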
I came. I saw. I played.

Offline fivesprites

  • Mr. Polyvector
  • ***
  • Posts: 101
    • View Profile
    • FiveSprites
Re: Find duplicate images
« Reply #6 on: 2013-May-02 »
I was going to suggest the same as Ian, but then I noticed that the unused cell size does *not* match the size of each character.  For example, the letter M is wider than one "unused" cell.  So checking the four corners wouldn't work in this case - loads of false positives.

//Andy

Offline fuzzy70

Re: Find duplicate images
« Reply #7 on: 2013-May-02 »
Thanks Ian, I know the size of each character as it is stored in the original file & that data is what's used to rip each character from the font file.

I suppose I could sort the characters by their width & compare those (all characters share a common Y size).

It's just been one of those dumb moments where the more I stare at the code the less the solution presents itself lol.
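
Grouping by width before comparing might look something like this in Python. It is only a sketch: each glyph is assumed to be a `(width, pixels)` pair with a shared height, and `dupes_by_width` is a hypothetical name.

```python
from collections import defaultdict


def dupes_by_width(glyphs):
    """Return indices of glyphs that repeat an earlier glyph of equal width."""
    buckets = defaultdict(list)            # width -> glyph indices
    for index, (width, _pixels) in enumerate(glyphs):
        buckets[width].append(index)

    dupes = set()
    for indices in buckets.values():
        for pos, first in enumerate(indices):
            if first in dupes:
                continue                   # already known to be a copy
            for other in indices[pos + 1:]:
                if other not in dupes and glyphs[other][1] == glyphs[first][1]:
                    dupes.add(other)
    return dupes


glyphs = [
    (2, [1, 1, 1, 1]),          # 2 wide, 2 high
    (3, [1, 1, 1, 1, 1, 1]),    # 3 wide: never compared with the 2-wide ones
    (2, [1, 1, 1, 1]),          # copy of glyph 0
    (2, [0, 1, 1, 0]),
]
print(dupes_by_width(glyphs))
```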

Lee

Offline fivesprites

Re: Find duplicate images
« Reply #8 on: 2013-May-02 »
Actually, I was just thinking that to reduce complexity, why not use a simple text file that describes the original format?

Any unused characters you would mark with a non-ASCII element - if you see that in your converter, skip over *unused cell* width pixels.

Probably much easier than programmatically looking at the pixels.

//Andy


Offline fuzzy70

Re: Find duplicate images
« Reply #9 on: 2013-May-02 »
@Andy. Sorry, your reply was posted as I was writing mine. Like I said in my last post, it may be spanked as I know each character's width & can compare only those of the same width.

Lee

Offline fivesprites

Re: Find duplicate images
« Reply #10 on: 2013-May-02 »
Well, I guess if you know the exact position of each cell then just look at the four corners as Ian suggested.  I doubt many fonts would consume all four of those or they'd look crap ;)

//Andy

Offline fuzzy70

Re: Find duplicate images
« Reply #11 on: 2013-May-02 »
Quote
Actually, I was just thinking that to reduce complexity, why not use a simple text file that describes the original format?

Any unused characters you would mark with a non-ASCII element - if you see that in your converter, skip over *unused cell* width pixels.

Probably much easier than programmatically looking at the pixels.

//Andy

The problem with that is that I do not know which characters are missing in the 1st place, hence why I need to check for dupes.

Once converted the program will output a sprite sheet with a text file listing the available characters along with their position & size of each one.

Lee

Offline Ian Price

Re: Find duplicate images
« Reply #12 on: 2013-May-02 »
Quote
The problem with that is that I do not know which characters are missing in the 1st place, hence why I need to check for dupes.
Sorry - I'm being a bit dim here. What do you mean you don't know what characters are missing? Do you mean that characters within a certain piece of text may be missing in BMP form (e.g. there's no "a" character in the BMP font but there is in the text etc.)?


[EDIT] I've added a FONT .PNG that I always use to create a new font - it has all the standard chars that I will probably need, and my own bitmap font routine numbers/positions them correctly. I keep it in my GLB folder so I can find it easily (hence all the plus symbols).
« Last Edit: 2013-May-02 by Ian Price »

Offline fuzzy70

Re: Find duplicate images
« Reply #13 on: 2013-May-02 »
I probably explained that bit wrong, Ian, sorry.

The Amiga font file tells you the lowest character & the highest, e.g. ASCII 32 to ASCII 255. As you can see from the pic, not all characters have a defined graphic (there are no lowercase ones in the example). For those not defined it stores a default image, which in the example is a rectangle.

Basically it is a major waste of storage as they could have easily just stored an empty 1 pixel width graphic instead.

Lee

Offline fuzzy70

Re: Find duplicate images
« Reply #14 on: 2013-May-04 »
I finally went with the checksum method for sorting the dupes. I did try Ian's suggestion initially & it went very well on fonts where each character was the same dimensions (like 8x8 & 9x10 etc) but became a bit unwieldy with fonts that had various size characters.

While all the fonts share a common height, most have characters of various widths (12+ different widths in some cases), so it required sorting them into their sizes, which overcomplicated things thanks to how the data is stored in the file. It is a very good method though Ian & thanks for suggesting it, I will definitely keep it in mind for the future if I need to compare images of the same dimensions like sprites/tiles for example :good:  .

The checksum method I used could hardly be described as bullet proof, as all it does is multiply the current pixel value by its position then add it to a running total for that character as it is being read in by the function, i.e.
Code: (glbasic)
INC Checksm%,Count%*Pix%[Count%]

There is a chance that it could calculate the same value for two different characters. That chance increases with small monochrome fonts but decreases with colourfonts & larger sizes.
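
To make the collision risk concrete, here is a small Python sketch of the same position-times-value sum. It assumes positions are counted from 1 (the post does not say where the count starts), and `weighted_checksum` and `is_duplicate` are illustrative names only.

```python
def weighted_checksum(pixels, first_position=1):
    """Sum of position * pixel value over a flat pixel list."""
    total = 0
    for position, value in enumerate(pixels, start=first_position):
        total += position * value
    return total


def is_duplicate(a, b):
    """Use the checksum as a quick filter, then confirm pixel by pixel."""
    return weighted_checksum(a) == weighted_checksum(b) and a == b


a = [2, 0]   # 1*2 + 2*0 = 2
b = [0, 1]   # 1*0 + 2*1 = 2, a different image with the same checksum
print(weighted_checksum(a), weighted_checksum(b), is_duplicate(a, b))
```

If the count started at 0 instead, the first pixel would be multiplied by zero and would not affect the checksum at all. Confirming a checksum match with a full comparison, as `is_duplicate` does, removes the false duplicates at the cost of a pixel check on matches only.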

As of yet I have not noticed a font that gives me false duplicates, but there are a lot of fonts around so I am not relying on the current implementation being 100% effective. The good thing about using a checksum is that the method for creating it can be changed easily by just replacing the one line of code above with another method or a call to a dedicated function without affecting the rest of the code.
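
As one possible replacement for that single line, a standard CRC-32 over the pixel data (with the character width mixed in) is far less likely to collide than a plain weighted sum. This is a Python sketch using `zlib.crc32`, not GLBasic; `glyph_crc` is a hypothetical name, and pixel values are assumed to fit in 32 bits.

```python
import zlib


def glyph_crc(width, pixels):
    """CRC-32 of the width followed by each pixel as 4 little-endian bytes."""
    data = bytearray(width.to_bytes(4, "little"))
    for value in pixels:
        # Mask so negative (signed 32-bit) colour values also pack cleanly.
        data += (value & 0xFFFFFFFF).to_bytes(4, "little")
    return zlib.crc32(bytes(data))


# The pair that collides under the position-weighted sum no longer does.
print(glyph_crc(2, [2, 0]) == glyph_crc(2, [0, 1]))
# Same pixels but a different width also give a different value.
print(glyph_crc(2, [1, 1, 1, 1]) == glyph_crc(4, [1, 1, 1, 1]))
```

A CRC can still collide in principle, so a final pixel comparison on matching values remains the only way to be 100% sure.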

The function to check for dupes just compares the checksums as follows
Code: (glbasic)
FUNCTION check_for_dupes: Base%

	LOCAL FirstChar%, CheckChar%, Chksum1%, Chksum2%

	// Clear the flags; one entry per possible character code
	DIM self.IsDupe%[256]

	FOR FirstChar% = self.LoChar TO self.HiChar-1

		Chksum1% = self.Checksum[FirstChar%]
		// Characters already flagged as dupes are not used as a reference
		IF self.IsDupe%[FirstChar%] = FALSE

			// Compare against every later character not yet flagged
			FOR CheckChar% = FirstChar%+1 TO self.HiChar
				IF self.IsDupe%[CheckChar%] = FALSE
					Chksum2% = self.Checksum[CheckChar%]
					// Equal checksums are treated as a duplicate (no pixel check)
					IF Chksum1% = Chksum2% THEN self.IsDupe%[CheckChar%] = TRUE
				ENDIF
			NEXT

		ENDIF

	NEXT

ENDFUNCTION

Thanks to all for your suggestions on this problem, it is appreciated  :booze:

Lee