http_rewrite — Module for rewriting HTTP content and headers
The primary purpose of this module is to rewrite links (URLs) for proxying. The configuration is divided into two sections, request and response, which deal with the HTTP request and the HTTP response respectively.
Each section consists of rule and content elements. Each rule must be given a name (attribute "name"), and content elements refer to rules by that name. The content elements define which rules are invoked.
Each rule consists of one or more rewrite elements. A rewrite specifies a regular expression for matching content in the attribute "from", and the corresponding attribute "to" specifies the result. The "to" result may refer to named groups in any "from" pattern already executed. For example, a rule in the response section may refer both to groups from patterns already executed in the response section and to groups from all rules executed in the request section.
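As a minimal sketch of the rule syntax described above, the fragment below defines a single rule with one rewrite. The rule name "to-https" and the pattern are hypothetical and chosen only for illustration. The named-group and ${name} notation follows the configuration example at the end of this page. The "<" that opens a named group is written as &lt; because an XML attribute value cannot contain a literal "<".

```xml
<!-- hypothetical rule: the name and the pattern are assumptions -->
<rule name="to-https">
  <!-- "from" captures the host in a named group; "to" reuses it as ${host} -->
  <rewrite from='http://(?&lt;host>[^ /]+)' to='https://${host}'/>
</rule>
```

A rule does nothing by itself. It only takes effect once a content element refers to it by name.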
Each content section takes exactly one "type" attribute, which specifies what area is inspected for rewriting. The type may be one of "html" (for HTML content), "headers" (for HTTP headers) or "quoted-literal" (for JavaScript type of content). The content section takes one or more "within" elements, which specify where inside the content each rule is executed. Every within element must have a "rule" attribute that names the rule to be invoked (rule@name, as mentioned earlier).
For "html" content, the within element also takes the attributes "tag" and "attr", which specify the tag and the attributes to be inspected. The "attr" attribute takes one or more attribute names (comma-separated). If no "tag" is given, the rule is performed on all attributes with the given names.
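The following fragment sketches the two forms of within for "html" content. It assumes that a rule named "url" is defined in the same section, and the tag value "a" is an illustrative choice rather than something prescribed by the module.

```xml
<content type="html">
  <!-- tag given: inspect only the href attribute of "a" tags -->
  <within tag="a" attr="href" rule="url"/>
  <!-- no tag given: inspect every href and src attribute, whatever the tag -->
  <within attr="href,src" rule="url"/>
</content>
```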
For "headers" content, the within element takes either a "header" or a "reqline" attribute in addition to the "rule" attribute. With "header", the rule is performed on all HTTP headers with the given name. With "reqline", the HTTP request line is rewritten.
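A short sketch of both variants for "headers" content is shown here. It assumes a rule named "url" is defined in each section. The value reqline="1" follows the configuration example at the end of this page, while the header name "location" is only an illustrative choice.

```xml
<!-- inside the request section: rewrite the HTTP request line -->
<content type="headers">
  <within reqline="1" rule="url"/>
</content>

<!-- inside the response section: rewrite every header named "location" -->
<content type="headers">
  <within header="location" rule="url"/>
</content>
```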
For "quoted-literal" content, the within element takes only a "rule" attribute and the rule is performed on all content.
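For completeness, a "quoted-literal" content section might look like the sketch below. It again assumes a rule named "url" defined in the same section.

```xml
<content type="quoted-literal">
  <!-- only the rule attribute is given: the rule runs over all content -->
  <within rule="url"/>
</content>
```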
# Metaproxy XML config file schemas
#
# Copyright (C) Index Data
# See the LICENSE file for details.
namespace mp = "http://indexdata.com/metaproxy"

rewrite = element mp:rewrite {
  attribute from { xsd:string },
  attribute to { xsd:string }
}

rule = element mp:rule {
  attribute name { xsd:string },
  rewrite*
}

within = element mp:within {
  attribute tag { xsd:string }?,
  attribute attr { xsd:string }?,
  attribute type { xsd:string }?,
  attribute header { xsd:string }?,
  attribute reqline { xsd:string }?,
  attribute rule { xsd:string }
}

content = element mp:content {
  attribute type { xsd:string },
  attribute mime { xsd:string }?,
  within*
}

section = (rule | content)*

filter_http_rewrite =
  attribute type { "http_rewrite" },
  attribute id { xsd:NCName }?,
  attribute name { xsd:NCName }?,
  element mp:request {
    attribute verbose { xsd:string },
    section
  }?,
  element mp:response {
    attribute verbose { xsd:string },
    section
  }?
Configuration:
<filter type="http_rewrite">
  <request verbose="1">
    <!-- save pxhost and pxpath for later -->
    <rule name="url">
      <rewrite
        from='(?&lt;proto>https?://)(?&lt;pxhost>[^ /?#]+)/(?&lt;pxpath>[^ /]+)/(?&lt;host>[^ /]+)(?&lt;path>[^ ]*)'
        to='${proto}${host}${path}' />
      <rewrite
        from='(?:Host: )(.*)'
        to='Host: ${host}' />
    </rule>
    <content type="headers">
      <within reqline="1" rule="url"/>
    </content>
  </request>
  <response verbose="1">
    <!-- rewrite "back" - using pxhost and pxpath -->
    <rule name="url">
      <rewrite
        from='(?&lt;proto>https?://)(?&lt;host>[^/?# "&apos;>]+)/(?&lt;path>[^ "&apos;>]+)'
        to='${proto}${pxhost}/${pxpath}/${host}/${path}' />
    </rule>
    <content type="headers">
      <within header="link" rule="url"/>
    </content>
    <content type="html">
      <within tag="script" attr="#text" type="quoted-literal" rule="url"/>
      <within attr="href,src" rule="url"/>
      <within attr="onclick" type="quoted-literal" rule="url"/>
    </content>
  </response>
</filter>