web analytics

New R Package For HTTP Headers Hashing – Source: securityboulevard.com

Rate this post

Source: securityboulevard.com – Author: hrbrmstr

HTTP Headers Hashing (HHHash) is a technique developed by Alexandre Dulaunoy to generate a fingerprint of an HTTP server based on the headers it returns. It employs one-way hashing to generate a hash value from the list of header keys returned by the server. The HHHash value is calculated by concatenating the list of headers returned, ordered by sequence, with each header value separated by a colon. The SHA256 of this concatenated list is then taken to generate the HHHash value. HHHash incorporates a version identifier to enable updates to new hashing functions.

While effective, HHHash’s performance relies heavily on the characteristics of the HTTP requests, so correlations are typically only established using the same crawler parameters. Locality-sensitive hashing (LSH) could be used to calculate distances between sets of headers for more efficient comparisons. There are some limitations with some LSH algorithms (such as the need to pad content to a minimum byte length) that make the initial use of SHA256 hashes a bit more straightforward.

Alexandre made a Python library for it, and I cranked out an R package for it as well.

There are three functions exposed by {hhhash}:

  • build_hash_from_response: Build a hash from headers in a curl


    response object
  • build_hash_from_url: Build a hash from headers retrieved from a URL
  • hash_headers: Build a hash from a vector of HTTP header keys

The build_hash_from_url function relies on {curl} vs {httr} since {httr} uses curl::parse_headers() which (rightfully so) lowercases the header keys. We need to preserve both order and case for the hash to be useful.

Here is some sample usage:

remotes::install_github("hrbrmstr/hhhash")

library(hhhash)

build_hash_from_url("https://www.circl.lu/")
## [1] "hhh:1:78f7ef0651bac1a5ea42ed9d22242ed8725f07815091032a34ab4e30d3c3cefc"

res <- curl::curl_fetch_memory("https://www.circl.lu/", curl::new_handle())

build_hash_from_response(res)
## [1] "hhh:1:78f7ef0651bac1a5ea42ed9d22242ed8725f07815091032a34ab4e30d3c3cefc"

c(
  "Date", "Server", "Strict-Transport-Security",
  "Last-Modified", "ETag", "Accept-Ranges",
  "Content-Length", "Content-Security-Policy",
  "X-Content-Type-Options", "X-Frame-Options",
  "X-XSS-Protection", "Content-Type"
) -> keys

hash_headers(keys)
## [1] "hhh:1:78f7ef0651bac1a5ea42ed9d22242ed8725f07815091032a34ab4e30d3c3cefc"

*** This is a Security Bloggers Network syndicated blog from rud.is authored by hrbrmstr. Read the original post at: https://rud.is/b/2023/07/09/new-r-package-hhhash/

Original Post URL: https://securityboulevard.com/2023/07/new-r-package-for-http-headers-hashing/

Category & Tags: Security Bloggers Network,R – Security Bloggers Network,R

LinkedIn
Twitter
Facebook
WhatsApp
Email

advisor pick´S post

More Latest Published Posts