nokogirivshtml5-php
Nokogiri is a Ruby gem that provides a simple and powerful way to parse and search XML and HTML documents. It is built on top of the underlying C library libxml2, which is known for its speed and reliability.
Nokogiri provides a simple and intuitive API for parsing and searching XML and HTML documents, and it is widely used in the Ruby ecosystem for web scraping and data extraction.
One of the main features of Nokogiri is its ability to search and navigate through XML and HTML documents using a CSS or XPath selectors.
Nokogiri also provides a variety of other features that can simplify the process of working with XML and HTML documents. It can automatically handle character encodings and normalize documents, it can parse and search large documents with low memory usage, and it can validate documents against a DTD or schema.
HTML5 is a standards-compliant HTML5 parser and writer written entirely in PHP. It is stable and used in many production websites, and has well over five million downloads.
HTML5 provides the following features:
- An HTML5 serializer
- Support for PHP namespaces
- Composer support
- Event-based (SAX-like) parser
- A DOM tree builder
- Interoperability with QueryPath
- Runs on PHP 5.3.0 or newer
Note that html5-php is a low-level HTML parser and does not feature any query features like CSS selectors.
Highlights
Example Use
require 'nokogiri'
html_string = '<html><head><title>Page Title</title></head><body><h1 class="header-class">Hello World!</h1><p>This is a sample webpage.</p></body></html>'
# Parse the HTML string
doc = Nokogiri::HTML(html_string)
# Extract the class attribute of h1 tag using CSS selector
h1_class = doc.css("h1")[0]['class']
# or XPath
h1_class = doc.xpath("//h1")[0]['class']
puts "H1 class: #{h1_class}"
<?php
// Assuming you installed from Composer:
require "vendor/autoload.php";
use Masterminds\HTML5;
// An example HTML document:
$html = <<< 'HERE'
  <html>
  <head>
    <title>TEST</title>
  </head>
  <body id='foo'>
    <h1>Hello World</h1>
    <p>This is a test of the HTML5 parser.</p>
  </body>
  </html>
HERE;
// Parse the document. $dom is a DOMDocument.
$html5 = new HTML5();
$dom = $html5->loadHTML($html);
// Render it as HTML5:
print $html5->saveHTML($dom);
// Or save it to a file:
$html5->save($dom, 'out.html');