
HSH or PTM - How to choose the best fitter


Jason,

 

Thanks for your question. It is not as straightforward as it might seem, but I'll throw out a few bits of info and hopefully others can chime in. I'll start by saying that you can easily build an RTI using both and compare the results for yourself. RTIBuilder allows you to open an existing project, so you can reprocess a data set without having to detect the spheres and highlights again if you have already done that. (There are instructions for this in the user guide.)

 

In terms of practical matters, PTM was the original format, developed by Tom Malzbender and Dan Gelb at HP Labs and presented at SIGGRAPH in 2001. The HSH fitter was developed in 2007-2008 by a team at UC Santa Cruz under Professor James Davis; Tom Malzbender also consulted. There are several viewers for PTMs and RTIs out there, and a couple of them only support the .ptm format (the original PTMviewer available from the HP website, and a Java applet viewer developed by Cliff Lyons). The RTIViewer (available from the CHI website) supports both the .ptm and .rti file formats (the HSH fitter creates a .rti file). However, for HSH .rti files the RTIViewer's rendering modes are limited to Specular Enhancement. This is due to limited time and budget on the project that developed both the HSH fitter and the RTIViewer, not to any inherent difficulty in implementing the other modes.

 

In terms of the results themselves, we have found that the HSH approach is superior for rendering more sculptural objects that have self-shadowing: where a PTM in Specular Enhancement mode might give you just a black area wherever there is a lot of shadowing, an HSH will have better data. HSH also does a better job of rendering shiny material as shiny; PTMs are always matte. Sometimes a matte surface makes it easier to see what you are looking for, so like most things it is a trade-off. Finally, I will note that HSH files are generally bigger than LRGB PTMs. PTMs use 6 coefficients of a biquadratic polynomial to describe the reflectance at each pixel; Tom Malzbender wanted a compact format. In HSH, the number of coefficients is determined by the "order" that you choose when you create the file: a first-order HSH uses 4, a second-order 9, and a third-order 16. This affects the file size of the result, but it is also why HSH can render shiny surfaces more accurately. There are papers available about both PTM and HSH. The best place to find the PTM papers is the HP Labs PTM page:

http://www.hpl.hp.com/research/ptm/?jumpid=reg_r1002_usen_c-001_title_r0001

 

for HSH - check out this paper:

http://users.soe.ucsc.edu/~prabath/wango_brdfseg.pdf
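As a quick sanity check on the sizes mentioned above, the coefficient counts per pixel can be sketched in a few lines. (The (order + 1)^2 formula is inferred from the 4/9/16 counts for orders 1, 2, 3; the function names are mine.)

```python
def ptm_terms() -> int:
    """LRGB PTM: a fixed 6 biquadratic coefficients per pixel."""
    return 6

def hsh_terms(order: int) -> int:
    """HSH of a given order: (order + 1)^2 coefficients per pixel,
    matching the 4 / 9 / 16 counts for orders 1, 2, 3."""
    return (order + 1) ** 2

print([hsh_terms(n) for n in (1, 2, 3)])  # [4, 9, 16]
```

So even a first-order HSH is in the same ballpark as PTM-6, while third order stores more than two and a half times as many coefficients per pixel.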

 

We hope to support more rendering modes for HSH-based RTI files in a future version of RTIViewer.

 

Carla


I think Carla's comments are accurate. I can't really comment on the practicalities of using them, but in terms of the mathematics, the number of coefficients controls the "possible" quality of the result. Using too few will lose detail of the reflectance, but using too many will overfit and produce incorrect results. In this sense you can think of PTM as roughly equivalent to an HSH of order 1.5. I think for practical purposes they are all about the same, and you can just try them and see what you like.


Thanks for chiming in, James! From my perspective, having shot hundreds of RTIs, I think HSH definitely has some advantages; its main disadvantage is that we don't have all the rendering modes implemented for it in the viewer. For shiny or sculptural material there is a demonstrable advantage to HSH. I'll still agree with James though: try both and see what you think!

 

Carla


Regarding Professor Davis' comment #3, can you expand on or provide further references to discussion of the problem of overfitting higher-order RTIs, and how one can tell which order-of-fit to use when processing RTIs using the HSH fitter? Are there particular circumstances when you would or wouldn't want to use a higher-order fit (assuming the time needed for fitting or file size constraints are not the primary concern)? If you process a set of images with a higher-order fit, would it be visually apparent when viewing the RTI files that the calculation of normals with too many coefficients is incorrect, or is it a technical error that might not be visually detectable?


It's an interesting point about overfitting. When this was being developed, they were also making fourth-order HSH RTIs, which use 25 coefficients per pixel. We decided not to include that option in RTIBuilder because the file sizes are enormous (and in many cases too large to load), and because of this potential for "overfitting." Maybe we can get James to chime in again on the overfitting issue.

 

Carla


To continue with this question...

 

In the paper entitled "Efficient Robust Image Interpolation and Surface Properties using Polynomial Texture Mapping" by Mingjing Zhang and Mark S. Drew, we can read:

Another observation we show here is that in contrast to current thinking, using the original idea of polynomial terms in the lighting direction outperforms the use of hemispherical harmonics (HSH) for matte appearance modelling. 

http://jivp.eurasipjournals.com/content/pdf/1687-5281-2014-25.pdf

 

which is illustrated in this figure using the most widely used objective quality metric, the Peak Signal-to-Noise Ratio (PSNR):

[Figure: PSNR comparison of PTM and HSH reconstructions, from the Zhang and Drew paper]

 

The higher the PSNR, the better the quality of the reconstructed image. However, I think (as Carla said) there is a demonstrable advantage to HSH: it visualises fine details such as scratches better than PTM. So the PSNR criterion doesn't approximate human perception well.
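For reference, PSNR reduces an image pair to a single number; a minimal NumPy sketch (the 8-bit peak value of 255 is an assumption, as is the toy data):

```python
import numpy as np

def psnr(reference: np.ndarray, reconstructed: np.ndarray, peak: float = 255.0) -> float:
    """Peak Signal-to-Noise Ratio in dB: 20*log10(peak) - 10*log10(MSE)."""
    mse = np.mean((reference.astype(float) - reconstructed.astype(float)) ** 2)
    if mse == 0:
        return float("inf")
    return 20 * np.log10(peak) - 10 * np.log10(mse)

# A reconstruction closer to the reference scores a higher PSNR:
ref = np.array([[100, 150], [200, 250]], dtype=float)
print(psnr(ref, ref + 1.0) > psnr(ref, ref + 20.0))  # True
```

Note that PSNR averages the error over every pixel equally, which is exactly why a fine scratch that matters to a human observer contributes almost nothing to the score.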

 

Moreover, in this comparison I suppose the original images were used in the calculation of the per-pixel surface reflectance. I think it would be more meaningful to compare the reconstructed image with an original image that was not used in the PTM or HSH calculation.

 

 

Do you have an idea for measuring the quality of RTI reconstruction in terms of human perception? Do you know of a metric that would show that HSH allows us to visualise fine details of the microgeometry?


You can think of this as a sparse data interpolation problem. We measure N lighting directions, and we have to fit M polynomial coefficients. The data is noisy, not properly bandpass filtered, has outliers, etc. If you think back to your freshman calculus or numerical methods class, in an ideal world you need at least M knowns to estimate M unknowns, so we need N > M. If this isn't true, then we have an overfitting problem. Of course the data isn't perfect, so as a rule of thumb, let's say we need (N/2) > M.

 

It was asked above what overfitting is. See the image below for a visual/mathematical way to think about the problem. In practice, it means that you will see "noise" appear when you put the lighting direction anywhere other than the directions you sampled. This is the extra wiggling in the plot on the right that isn't real data. It happens when you have too many polynomial terms.

 

[Figure: two fits through the same sample points; the overfit curve on the right wiggles between the samples]
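The wiggle is easy to reproduce on toy 1-D data. A sketch (the sine curve, sample count, and noise level are arbitrary stand-ins for intensities measured at N lighting directions):

```python
import numpy as np

rng = np.random.default_rng(0)

# N noisy samples of a smooth 1-D "reflectance" curve.
N = 12
x = np.linspace(0, 1, N)
y = np.sin(2 * np.pi * x) + rng.normal(0, 0.1, N)

# Evaluate each fit at held-out positions between the samples.
x_between = (x[:-1] + x[1:]) / 2
y_true = np.sin(2 * np.pi * x_between)

def heldout_error(degree: int) -> float:
    """RMS error of a degree-`degree` least-squares fit at unseen positions."""
    coeffs = np.polyfit(x, y, degree)
    return float(np.sqrt(np.mean((np.polyval(coeffs, x_between) - y_true) ** 2)))

# Rule of thumb (N/2) > M: with N = 12 samples, stay under 6 coefficients.
# Degree 4 (5 coefficients) obeys it; degree 11 (12 coefficients) interpolates
# the noise exactly and wiggles between the samples.
print(heldout_error(4), heldout_error(11))
```

The degree-11 fit passes through every noisy sample, yet its error between the samples is far larger: that gap between "fits the measurements" and "fits the object" is the overfitting in question.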

 

Now on the question of PTM vs HSH: what are we changing? We are changing the choice of polynomial. They are both polynomials, but maybe one has a term for xy and the other has a term for x^2. The original PTM paper defined 6 terms. The plot given above from the Zhang et al. paper uses a definition that allows a variable number of terms. This is the first time I've seen PTM defined with a variable number of terms, and I think all existing fitters and viewers use the 6-term definition. HSH are hemispherical harmonics. They are also polynomials, but have a historical mathematical definition which comes in sets of 4 terms, 9 terms, 16 terms, etc. The question is which polynomial terms are best? Well, that depends on your data. We did some experiments around 2007 of just trying random polynomial terms to see if we could do better than PTM or HSH, and indeed we could, but the winner depended on the images we tested, and we abandoned that research before finding a new set of terms which was always better. The plot above says that for whichever set of images was tested, the extended definition of PTM was better than HSH.
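For concreteness, the 6-term PTM definition from the original Malzbender et al. paper models per-pixel luminance as a biquadratic in the projected light direction (lu, lv). A minimal sketch (function names are mine):

```python
import numpy as np

def ptm_basis(lu: float, lv: float) -> np.ndarray:
    """The six biquadratic PTM terms:
    L(lu, lv) = a0*lu^2 + a1*lv^2 + a2*lu*lv + a3*lu + a4*lv + a5."""
    return np.array([lu * lu, lv * lv, lu * lv, lu, lv, 1.0])

def ptm_eval(coeffs: np.ndarray, lu: float, lv: float) -> float:
    """Reconstructed luminance at light direction (lu, lv) for one pixel."""
    return float(coeffs @ ptm_basis(lu, lv))

# With only the constant term a5 set, luminance ignores the light direction:
flat = np.array([0, 0, 0, 0, 0, 0.5])
print(ptm_eval(flat, 0.3, -0.2))  # 0.5
```

An HSH fitter swaps this basis for 4, 9, or 16 hemispherical-harmonic terms; everything else (one coefficient vector per pixel, evaluated per light direction) works the same way.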

 

In terms of the real choices of tools you can use, you have PTM-6, HSH-4, HSH-9, or HSH-16; there aren't widely available tools for anything else. Since we don't really have compression built into any of the tools, you can expect the file sizes to scale roughly with the number of terms. You can also expect to fit the data better with more terms. Since matte surfaces are flatter and specular surfaces have a bump at the highlight, then thinking about the plot above, we can see that more terms let us represent the bump of the specular highlight better.
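That rough scaling with term count can be sketched as follows. This assumes, as a deliberate simplification, one stored byte per coefficient per colour channel and ignores headers; real layouts differ (an LRGB PTM, for instance, stores RGB separately from the 6 luminance coefficients).

```python
def approx_file_mb(width: int, height: int, terms: int, bytes_per_coeff: int = 1) -> float:
    """Rough uncompressed size in MB, assuming one byte per coefficient
    per pixel per colour channel (a simplification; real formats vary)."""
    channels = 3
    return width * height * channels * terms * bytes_per_coeff / 1e6

# The same 3000x2000 capture fit three ways:
for terms in (6, 9, 16):  # PTM-6, HSH-9, HSH-16
    print(terms, approx_file_mb(3000, 2000, terms))
```

Under these assumptions PTM-6 comes to 108 MB and HSH-16 to 288 MB for the same capture, which matches the observation that HSH files are generally the bigger ones.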

 

The last point is about how to evaluate error. In papers we like our nice plots. We generally use some metric that comes down to a number. It might be RMSE or some perceptually driven metric, but it always comes down to a "quality number". This is a gross simplification. In practice none of these methods know what the object *really* looks like between light directions, because we didn't capture an image there. So we are making up what it looks like. It's the space between the samples in the plot above: is it straight? or curved? or does it have a wiggle? We just don't know. It might be true that the "quality number" thinks the wiggle has the lowest error, but viewer A likes the straight fit and viewer B likes the curved fit. This is why I said earlier in this thread that you would have to look and see what you like. In the image processing papers, people often use Structural Similarity (SSIM) when they want a human perceptual number, but it's still just a "quality number" which still grossly simplifies the situation. In my experience it's primarily a feel-good for researchers to claim they are doing the right thing, but it's not substantially different from RMSE.

 

http://en.wikipedia.org/wiki/Structural_similarity

James wrote: "You can think of this as a sparse data interpolation problem. We measure N lighting directions, and we have to fit M polynomial coefficients. The data is noisy, not properly bandpass filtered, has outliers, etc. If you think back to your freshman calculus or other numerical class, in an ideal world you need at least M knowns, to estimate M unknowns, so we have N>M. If this isn't true, then we have an overfitting problem. Of course the data isn't perfect, so as a rule of thumb, lets say we need (N/2)>M."

 

Yes, I remember: it's the Nyquist-Shannon sampling theorem.

Thus, if we have a set of N observed intensities (i.e. luminance values) at a pixel p, we have to prevent overfitting of the surface reflectance by using at most N/2 terms. Whichever fitter we use, some terms don't make a significant contribution; for example the PTM term lu*lv (a2), whose shape is a horse saddle.
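A small helper makes the rule of thumb concrete (the function names are mine; the (order + 1)^2 term count per HSH order is the one quoted earlier in the thread):

```python
import math

def max_safe_terms(n_images: int) -> int:
    """Largest M satisfying the thread's rule of thumb (N/2) > M."""
    return max(0, math.ceil(n_images / 2) - 1)

def max_safe_hsh_order(n_images: int) -> int:
    """Highest HSH order whose (order + 1)^2 terms stay within the rule.
    Returns 0 if even a first-order fit (4 terms) would break it."""
    order = 0
    while (order + 2) ** 2 <= max_safe_terms(n_images):
        order += 1
    return order

# A typical 48-image capture comfortably supports third-order HSH (16 terms):
print(max_safe_hsh_order(48))  # 3
```

By this reasoning a 24-image capture should stop at second order (9 terms against a budget of 11), which is one way to decide the order-of-fit question asked earlier.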

 

I think hemispherical harmonics let us approximate the surface reflectance with more shape complexity than PTM-6. Polynomial texture mapping tends to smooth the surface reflectance; in the viewers, the result is only small visible variations in the microgeometry between adjacent light directions because of that smoothing. Without a specular peak (or a shadow), it's difficult for my brain to imagine the real shape of the surface, or to resolve microscopic surface detail, if I haven't seen the real object before.

 

James wrote: "In practice none of these methods know what the object *really* looks like between light directions, because we didnt capture an image there. So we are making up what it looks like. Its the space between the samples in the plot above, is it straight? or curved? or has a wiggle? We just dont know."

 

 

We can take a set of N images of the object, then set aside one photo chosen arbitrarily. We make up what the object looks like by fitting the data with only the remaining N-1 images. Finally, we can compare the photo that was not used in the calculation with the reconstructed image at the same light direction. This protocol lets us know what the object *really* looks like (in terms of photorealism) between the light directions used for fitting. We can repeat it for every light direction in the complete set to get a significant comparison.
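That leave-one-out protocol is easy to prototype. A toy sketch for a single pixel, using a hypothetical least-squares fit of the 6 biquadratic PTM terms in place of a real fitter, with synthetic captures standing in for the photos:

```python
import numpy as np

BASIS = lambda lu, lv: np.array([lu * lu, lv * lv, lu * lv, lu, lv, 1.0])

def fit(samples):
    """Least-squares PTM-style fit to (lu, lv, intensity) rows."""
    A = np.array([BASIS(lu, lv) for lu, lv, _ in samples])
    b = np.array([i for _, _, i in samples])
    coeffs, *_ = np.linalg.lstsq(A, b, rcond=None)
    return coeffs

def reconstruct(coeffs, lu, lv):
    return float(coeffs @ BASIS(lu, lv))

# Synthetic "captures" of one pixel: N light directions plus a little noise.
rng = np.random.default_rng(1)
samples = [(lu, lv, 0.2 * lu + 0.3 * lv + 0.5 + rng.normal(0, 0.01))
           for lu, lv in rng.uniform(-0.7, 0.7, size=(20, 2))]

# Hold each capture out, fit on the other N-1, compare at the held-out direction.
errors = []
for k in range(len(samples)):
    train = samples[:k] + samples[k + 1:]
    lu, lv, truth = samples[k]
    errors.append(abs(reconstruct(fit(train), lu, lv) - truth))
print(f"mean leave-one-out error: {np.mean(errors):.4f}")
```

Running the same loop with real captures and real fitters (PTM-6 vs the HSH orders) would give exactly the comparison proposed above, with the held-out photo as ground truth.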

 

But I agree with you: some perceptually driven metrics provide a gross simplification. About the Structural Similarity (SSIM) index, we can read on Wikipedia:

 

Some research papers such as "A comprehensive assessment of the structural similarity index" by Richard Dosselmann and Xue Dong Yang show that SSIM is actually not very precise (not as precise as it claims to be) and that SSIM provides quality scores which are no more correlated to human judgment than MSE (Mean Square Error) values.

 

Thanks

(Sorry for my English)

