Dangl.TextConverter
Compatibility
This project targets netstandard1.3, netstandard2.0, net45 and net40. Because .NET 4.5.2 is currently the oldest version supported
by both Microsoft and the xUnit test suite, no tests are run for net45 and net451. No tests are run for .NET Core below the 2.0 release.
The .NET 4.0 target exists for compatibility reasons; it is not tested and requires .NET compilers for version 4.5 or newer to function properly.
Project Configuration
If this project is consumed in a project using the full .NET Framework with a newer version of
Antlr4.Runtime, the necessary AssemblyBindingRedirects are not automatically generated with the current
dotnet CLI tooling. This is scheduled to be fixed with the 2.0 release. In the meantime, the following should
be added to the consumer's csproj:
<PropertyGroup Condition=" '$(TargetFramework)' == 'net461' ">
  <AutoGenerateBindingRedirects>true</AutoGenerateBindingRedirects>
  <GenerateBindingRedirectsOutputType>true</GenerateBindingRedirectsOutputType>
</PropertyGroup>
The Condition=" '$(TargetFramework)' == 'net461' " attribute may be changed as necessary or removed.
Split Html by Class
The Dangl.TextConverter.Html.HtmlToText.ConvertHtmlToPlaintextAndSplitByClassname(string html, string[] classNamesToSplit) method transforms HTML to plain text and additionally splits it by class names. It returns a list of objects shaped like this:
{
    public string Text { get; }
    public HtmlNode HtmlNode { get; }
}
Each segment exposes the complete HtmlNode it was split on, so you can read its classes, attributes and any other data you need. HtmlNode is null for segments that were not produced by a split on one of the class names.
Example
The following Html:
<p>
  Intro
  <div id="077b8e46-31e6-45f5-b1b0-a8210e48259b" class="text-addition text-addition-owner" text-addition-label="12">
    <div class="text-addition-body">
      <span>Body</span>
    </div>
  </div>
  Outro
</p>
split on text-addition returns the following:
[
  {
    "Text": "Intro",
    "HtmlNode": null
  },
  {
    "Text": "Body",
    "HtmlNode": { "ClassNames": [ "text-addition", "text-addition-owner" ] }
  },
  {
    "Text": "Outro",
    "HtmlNode": null
  }
]
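The call behind the example above could look roughly like the following sketch. It assumes the HtmlAgilityPack package is referenced and that the returned list can be enumerated with foreach; the SplitExample class and the printed format are hypothetical and only serve as illustration.

```csharp
using System;
using Dangl.TextConverter.Html;

// Hypothetical console program, for illustration only.
public static class SplitExample
{
    public static void Main()
    {
        var html = "<p>Intro<div class=\"text-addition text-addition-owner\">"
            + "<div class=\"text-addition-body\"><span>Body</span></div></div>Outro</p>";

        // Split the plain text on every element carrying the 'text-addition' class.
        var segments = HtmlToText.ConvertHtmlToPlaintextAndSplitByClassname(
            html, new[] { "text-addition" });

        foreach (var segment in segments)
        {
            // HtmlNode is null when the segment was not produced by a class-name split.
            var classes = segment.HtmlNode?.GetAttributeValue("class", string.Empty)
                ?? "(no split node)";
            Console.WriteLine($"{segment.Text} -> {classes}");
        }
    }
}
```

Checking HtmlNode for null before reading attributes keeps the loop safe for the plain segments between splits.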
Transform Rtf Text to Segmented PlainText
By using the Dangl.TextConverter.Rtf.RtfToText.ConvertRtfToSegmentedText(string rtfInput) method, Rtf text is converted to plain text and segmented
by Rtf bookmarks. This will return text segments that contain plain text representations of the texts as well as tags to indicate the opening and closing of bookmarks.
This is used, for example, in the GAEB & AVA .Net Libraries by DanglIT to work with text additions in GAEB 2000 files.
The following Rtf text:
{\rtf1The value is {\bkmkstart TA31}to be entered{\bkmkend TA31}}
will return the following segments (simplified example for demonstration):
[
  {
    "ClassName": "RtfTextSegment",
    "Text": "The value is "
  },
  {
    "ClassName": "RtfBookmarkStartSegment",
    "Identifier": "TA31"
  },
  {
    "ClassName": "RtfTextSegment",
    "Text": "to be entered"
  },
  {
    "ClassName": "RtfBookmarkEndSegment",
    "Identifier": "TA31"
  }
]
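A minimal sketch of calling the conversion on the Rtf input above. It assumes that ConvertRtfToSegmentedText is a static method and that its result can be enumerated; the RtfSegmentExample class is hypothetical. Only the runtime type name of each segment is printed, so nothing is assumed about the members of the segment classes.

```csharp
using System;
using Dangl.TextConverter.Rtf;

// Hypothetical console program, for illustration only.
public static class RtfSegmentExample
{
    public static void Main()
    {
        // Verbatim string, so the Rtf backslashes need no escaping.
        var rtf = @"{\rtf1The value is {\bkmkstart TA31}to be entered{\bkmkend TA31}}";

        // Assumption: the method is static and returns an enumerable of segments.
        var segments = RtfToText.ConvertRtfToSegmentedText(rtf);

        foreach (var segment in segments)
        {
            // Writes the concrete segment type of each entry.
            Console.WriteLine(segment.GetType().Name);
        }
    }
}
```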
Rtf Line Start Normalization
The extension method public static string StringLineStartNormalizationExtensions.NormalizeLineStarts(this string source) can be used to fix strings whose lines are indented from the second line onward.
For example, the German GAEB 2000 standard uses data formats similar to this:
#begin[Field]This string starts here
             But on the second line it's indented!
             That should be normalized!
#end[Field]
If you extract the string between the #begin[Field] and the #end[Field] tags, you get something like this:
This string starts here
             But on the second line it's indented!
             That should be normalized!
All lines except the first are indented. To fix such strings, the extension method can be used.
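To make the idea concrete, here is a standalone sketch of one way such a normalization could work. It is not the library's implementation: the LineStartSketch class and RemoveCommonIndent method are hypothetical, and the rule used (strip the smallest indentation shared by all non-blank lines after the first) is an assumption about the intended behavior.

```csharp
using System;
using System.Linq;

// Hypothetical helper, for illustration only; not part of Dangl.TextConverter.
public static class LineStartSketch
{
    public static string RemoveCommonIndent(string source)
    {
        var lines = source.Replace("\r\n", "\n").Split('\n');

        // Consider only non-blank lines after the first one.
        var followUp = lines.Skip(1).Where(l => l.Trim().Length > 0).ToList();
        if (followUp.Count == 0)
        {
            return source;
        }

        // Smallest number of leading spaces or tabs among those lines.
        var indent = followUp.Min(l => l.Length - l.TrimStart(' ', '\t').Length);

        for (var i = 1; i < lines.Length; i++)
        {
            // Blank lines shorter than the indent are simply trimmed.
            lines[i] = lines[i].Length >= indent
                ? lines[i].Substring(indent)
                : lines[i].TrimStart(' ', '\t');
        }

        return string.Join("\n", lines);
    }

    public static void Main()
    {
        var field = "This string starts here\n"
            + "             But on the second line it's indented!\n"
            + "             That should be normalized!";
        Console.WriteLine(RemoveCommonIndent(field));
    }
}
```

Stripping only the shared indentation keeps any deliberate extra indentation on individual lines intact.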
HtmlAgilityPack Compatibility
With version 1.10.0, some breaking API changes were introduced to the HtmlAgilityPack: the handling of self-closing or unclosed <p> tags was changed. This behavior is controlled by the
static HtmlDocument.DisableBehavaiorTagP property, which defaults to true. Internally, the HtmlToText class sets this property to false in its static method calls to maintain
compatibility with downstream packages by DanglIT. If you independently rely on the new behavior, please make sure to wrap calls to Dangl.TextConverter's HtmlToText class
and restore the option to its previous state afterwards.
private static void EnsureLegacyBehavior()
{
    if (HtmlDocument.DisableBehavaiorTagP)
    {
        HtmlDocument.DisableBehavaiorTagP = false;
    }
}
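Consumers who want to restore the option after calling into HtmlToText could wrap their calls along the lines of the following sketch. The HtmlAgilityPackSettingScope class and its RunAndRestoreTagPBehavior method are hypothetical names introduced here; the property name is spelled as in the text above.

```csharp
using System;
using HtmlAgilityPack;

// Hypothetical wrapper, for illustration only.
public static class HtmlAgilityPackSettingScope
{
    public static T RunAndRestoreTagPBehavior<T>(Func<T> conversion)
    {
        // Remember the value the consuming application had configured.
        var previous = HtmlDocument.DisableBehavaiorTagP;
        try
        {
            // The conversion may change the static option internally.
            return conversion();
        }
        finally
        {
            // Put the option back, even if the conversion throws.
            HtmlDocument.DisableBehavaiorTagP = previous;
        }
    }
}

// Possible usage:
// var segments = HtmlAgilityPackSettingScope.RunAndRestoreTagPBehavior(
//     () => HtmlToText.ConvertHtmlToPlaintextAndSplitByClassname(html, new[] { "text-addition" }));
```

Because the option is a static, process-wide setting, this pattern is not safe when other threads parse HTML at the same time.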
With v1.11.27 of HtmlAgilityPack (see https://github.com/zzzprojects/html-agility-pack/releases/tag/v1.11.27), a feature was introduced that recognizes an opened and closed br tag like <br></br> as two instances of the br tag, since the br tag itself is always self-closing. This didn't introduce any problems in our downstream projects, so we're not correcting it. However, the code below documents how the previous behavior can be turned back on:
if (HtmlNode.ElementsFlags.ContainsKey("br"))
{
    HtmlNode.ElementsFlags.Remove("br");
    // The default adds two flags: HtmlElementFlag.Empty | HtmlElementFlag.Closed
    // Where 'Closed' means it's always treated as self-closed
    HtmlNode.ElementsFlags.Add("br", HtmlElementFlag.Empty);
}
IRtfImageConverter
With v2.0.0, the interface IRtfImageConverter was introduced. When converting between RTF and other formats, it's sometimes required to convert some image types to other formats, e.g. GIF is not supported in RTF, and DIB (Device Independent Bitmap) is hardly supported anywhere outside of RTF. The new package Dangl.TextConverter.RtfImageConversion provides implementations for those features.
Assembly Strong Naming & Usage in Signed Applications
This module produces strongly named assemblies when compiled. When consumers of this package require strongly named assemblies, for example when they
themselves are signed, the outputs should work as-is.
The key file to create the strong name is adjacent to the csproj file in the root of the source project. Please note that this does not increase
security or provide tamper-proof binaries, as the key is available in the source code, per Microsoft guidelines.