The sanitizer makes user-supplied text safe to store and re-display on the site.
Here are some guidelines about what is and is not allowed in text on the site:
- HTML is allowed, except for the following tags: <script>, <style>, <iframe>, <link>
- CSS is allowed inline, but as mentioned before, style tags are removed (this prevents users from tampering with the menus to mislead other users).
- JavaScript is not allowed.
- Spam is not allowed. We decide what is spam, but in general, feel free to promote yourself to fellow users in a socially-acceptable manner. If you use a spam-bot to send out the same message to a ton of users (especially those whom you do not know) we will most likely consider it spam.
- Hanging HTML tags (open tags without a closing tag) will be closed so that you do not break the page.
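
As a rough illustration of the first rule, the sketch below strips the four disallowed elements from a string while leaving inline CSS alone. It is a minimal sketch and not the site's actual sanitizer: the function name example_strip_disallowed() and its $tags parameter are hypothetical, and regular expressions only approximate real HTML parsing.

```php
<?php
// Hypothetical helper for illustration only; not part of the original sanitizer.
function example_strip_disallowed($code, $tags = array("script", "style", "iframe", "link")){
    foreach($tags as $tag){
        $quoted = preg_quote($tag, "/");
        // Self-closing form, e.g. <link ... />
        $code = preg_replace("/<\s*{$quoted}\b[^>]*\/>/si", "", $code);
        // Opening tag plus everything up to its closing tag; an unterminated
        // element is removed through the end of the input.
        $code = preg_replace("/<\s*{$quoted}\b[^>]*>.*?(<\s*\/\s*{$quoted}\s*>|$)/si", "", $code);
        // Any closing tags left over.
        $code = preg_replace("/<\s*\/\s*{$quoted}\s*>/si", "", $code);
    }
    return $code;
}

// Inline CSS survives, the script element does not.
echo example_strip_disallowed('<p style="color:red">Hi</p><script>alert(1)</script>');
// prints: <p style="color:red">Hi</p>
```

Removing an unterminated element through the end of the input is a deliberately conservative choice in this sketch: it may discard more text than strictly necessary, but it never lets a half-open script element through.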
]*\/>/si", "", $code);
$code = preg_replace("/<$tag"."[^>]*>.*?(<\s*\/\s*$tag\s*>|$)/si", "", $code);
do{
$oldCode = $code;
$code = preg_replace("/<\/$tag>/si", "", $code); // strip any extra closing tags
} while($oldCode != $code);
}
return $code;
} // end sanitize_tag()
////
// Closes all hanging tags. Additionally self-closes void tags such as hr and br so that the markup validates.
//
// Counts all open tags of each type and closes the extras, finds all hanging close tags and makes complementary open
// tags at the beginning.
////
function sanitize_hanging($code){
if(false !== strpos($code, "<")){
// SECURITY NOTES: (known vulnerability).
// JS: doing \bonload="alert('hi')", or \onload or just src="/img/pedlr.png"onload (with no spaces between the " and the onload) allows JS-injection.
//
// Beyond the comments, there is still a vulnerability if an attacker puts tags out of order, in which case many browsers would auto-close the first tag, and the extra closing tag would then let them break out of the page structure.
// For example, with mis-nested div tags the first div would be auto-closed, so the closing div may often end the div one level above where intended. This could let the user get outside of their
// little sandbox.
// Another way to exploit this (other than poorly nested tags) is just poorly ordered tags in the sense that they could use closing tags to close out the previous divs, then later add some new opening tags (allowing any closing tags that appear naturally in the code later to be used as the real closing tags).
// This would allow them to break out of the existing format as well as style (or even remove) the divs below their content.
// Self-close tags that must be self-closing: (br, hr, img, input)
$code = preg_replace("/<\s*\/\s*(hr|br|input|img)\s*>/si", "", $code);
$code = preg_replace("/<(hr|br|input|img)( [^>]*?[^\/\s])?\s*>/si", "<$1$2/>", $code);
// Finds all open tags.
$matches = $openTags = $closeTags = array();
if(0 < preg_match_all("/<([a-z0-9-_]+)([^>]*?[^\/\s])?\s*>/si", $code, $matches)){
$tags = $matches[1];
foreach($tags as $currTag){
if(isset($openTags[$currTag])){
$openTags[$currTag]++;
} else {
$openTags[$currTag] = 1;
}
}
}
// Finds all closing tags.
$matches = array();
if(0 < preg_match_all("/<\s*\/\s*([a-z0-9-_]+)\s*>/si", $code, $matches)){
$tags = $matches[1];
foreach($tags as $currTag){
if(isset($closeTags[$currTag])){
$closeTags[$currTag]++;
} else {
$closeTags[$currTag] = 1;
}
}
}
// Finds imbalanced tags and matches them accordingly.
foreach($openTags as $tag=>$count){
$other = getVal($closeTags, $tag, 0);
while($other < $count){
$code .= "</$tag>\n";
$other++;
}
}
foreach($closeTags as $tag=>$count){
$other = getVal($openTags, $tag, 0);
while($other < $count){
$code = "<$tag>$code";
$other++;
}
}
}
return $code;
} // end sanitize_hanging()
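////
// ILLUSTRATIVE SKETCH, not part of the original sanitizer: an order-aware alternative to the count-based balancing
// in sanitize_hanging(), aimed at the out-of-order tag problem described in the security notes. The name
// example_balance_in_order() and its behavior (dropping stray closing tags instead of prepending open tags) are
// assumptions made for this example.
//
// Keeps a stack of open tags. A closing tag that matches something on the stack first closes any tags opened after
// it; a closing tag that matches nothing is dropped, so it cannot close markup outside the user's content.
////
function example_balance_in_order($code){
    $void = array("br", "hr", "img", "input");
    $stack = array();
    $out = "";
    $offset = 0;
    $matches = array();
    preg_match_all("/<\s*(\/?)\s*([a-z0-9_-]+)[^>]*>/si", $code, $matches, PREG_SET_ORDER | PREG_OFFSET_CAPTURE);
    foreach($matches as $m){
        $whole = $m[0][0];
        $pos = $m[0][1];
        $isClose = ("/" === $m[1][0]);
        $name = strtolower($m[2][0]);
        // Copy the text that precedes this tag.
        $out .= substr($code, $offset, $pos - $offset);
        $offset = $pos + strlen($whole);
        if(in_array($name, $void) || "/>" === substr($whole, -2)){
            // Void or self-closed tags never go on the stack; a closing form such as </br> is dropped.
            if(!$isClose){
                $out .= $whole;
            }
        } else if(!$isClose){
            $stack[] = $name;
            $out .= $whole;
        } else if(in_array($name, $stack)){
            // Close everything opened after the matching tag, then the tag itself.
            do{
                $top = array_pop($stack);
                $out .= "</$top>";
            } while($top !== $name);
        }
        // Otherwise this is a stray closing tag with no matching open tag, so it is dropped.
    }
    $out .= substr($code, $offset);
    // Close whatever is still open, innermost first.
    while(count($stack) > 0){
        $out .= "</" . array_pop($stack) . ">";
    }
    return $out;
} // end example_balance_in_order()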
////
// Removes INLINE JavaScript. Does not remove the script tags (since that should be done elsewhere to remove ALL script tags).
////
function sanitize_inlineJS($code){
$code = preg_replace("/(<[^>]*)([\s]on[^>]*)(>)/si", "$1$3", $code);
$code = preg_replace("/(<[^>]*[\"'\s])(javascript:[^>]*?)([\"'\s]?>)/si", "$1$3", $code);
return $code;
} // end sanitize_inlineJS()
////
// Removes content deemed to be spam.
////
function sanitize_spam($code){
// NOTE: We don't know of any spam yet. Can something like Akismet be plugged in here?
return $code;
} // end sanitize_spam()
?>