Dangl.TextConverter

    Built with Nuke

    Online Documentation
    Changelog

    Compatibility

    This project targets netstandard1.3, netstandard2.0 and net45. Because .NET Framework 4.5.2 is the oldest version still supported by both Microsoft and the xUnit test suite, no tests are run for net45 and net451. No tests are run for .NET Core versions below 2.0 either.

    Project Configuration

    If this project is consumed by a project that uses the full .NET Framework together with a newer version of Antlr4.Runtime, the current dotnet CLI tooling does not automatically generate the necessary AssemblyBindingRedirects. This is scheduled to be fixed with the 2.0 release. In the meantime, the following should be added to the consumer's csproj:

    <PropertyGroup Condition=" '$(TargetFramework)' == 'net461' ">
      <AutoGenerateBindingRedirects>true</AutoGenerateBindingRedirects>
      <GenerateBindingRedirectsOutputType>true</GenerateBindingRedirectsOutputType>
    </PropertyGroup>
    

    The Condition=" '$(TargetFramework)' == 'net461' " attribute may be changed as necessary or removed.

    Split Html by Class

    The Dangl.TextConverter.Html.HtmlToText.ConvertHtmlToPlaintextAndSplitByClassname(string html, string[] classNamesToSplit) method converts Html to plain text and additionally splits it at elements that carry one of the given class names. It returns a list of objects with the following shape:

    {
        public string Text { get; }
        public HtmlNode HtmlNode { get; }
    }
    

    For every segment that was produced by a split, HtmlNode holds the complete node the text was split on, so you can read its classes, attributes and any other data you might need. HtmlNode is null for segments that were not split on a class name.

    Example

    The following Html:

    <p>
      Intro
      <div id="077b8e46-31e6-45f5-b1b0-a8210e48259b" class="text-addition text-addition-owner" text-addition-label="12">
        <div class="text-addition-body">
          <span>Body</span>
        </div>
      </div>
      Outro
    </p>
    

    split on text-addition returns the following:

    [
      {
        "Text": "Intro",
        "HtmlNode": null
      },
      {
        "Text": "Body",
        "HtmlNode": { "ClassNames": [ "text-addition", "text-addition-owner" ] }
      },
      {
        "Text": "Outro",
        "HtmlNode": null
      }
    ]
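
    As a rough usage sketch, the call for the example above could look like the following. It assumes that the method is static (as its fully qualified name suggests), that the returned list can be enumerated with foreach, and that HtmlNode is the HtmlAgilityPack type, whose GetAttributeValue method is used here. The class name SplitByClassExample is made up for illustration.

```csharp
using System;
using Dangl.TextConverter.Html;

public static class SplitByClassExample
{
    public static void Main()
    {
        // Compact form of the Html shown in the example above
        var html = "<p>Intro"
            + "<div class=\"text-addition text-addition-owner\" text-addition-label=\"12\">"
            + "<div class=\"text-addition-body\"><span>Body</span></div>"
            + "</div>Outro</p>";

        // Assumption: the method is static and returns an enumerable list of segments
        var segments = HtmlToText.ConvertHtmlToPlaintextAndSplitByClassname(html, new[] { "text-addition" });

        foreach (var segment in segments)
        {
            // HtmlNode is null for text that was not split on a class name
            if (segment.HtmlNode == null)
            {
                Console.WriteLine($"Plain text: {segment.Text}");
                continue;
            }

            // Assumption: HtmlNode is HtmlAgilityPack's HtmlNode, which offers GetAttributeValue
            var label = segment.HtmlNode.GetAttributeValue("text-addition-label", string.Empty);
            Console.WriteLine($"Text addition (label {label}): {segment.Text}");
        }
    }
}
```

    Checking HtmlNode for null is the simplest way to tell split segments apart from the surrounding plain text.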
    

    Transform Rtf Text to Segmented PlainText

    By using the Dangl.TextConverter.Rtf.RtfToText.ConvertRtfToSegmentedText(string rtfInput) method, Rtf text is converted to plain text and segmented by Rtf bookmarks. This will return text segments that contain plain text representations of the texts as well as tags to indicate the opening and closing of bookmarks. This is used, for example, in the GAEB & AVA .Net Libraries by DanglIT to work with text additions in GAEB 2000 files.

    The following Rtf text:

    {\rtf1The value is {\bkmkstart TA31}to be entered{\bkmkend TA31}}
    

    will return the following segments (simplified example for demonstration):

    [
      {
        "ClassName": "RtfTextSegment",
        "Text": "The value is "
      },
      {
        "ClassName": "RtfBookmarkStartSegment",
        "Identifier": "TA31"
      },
      {
        "ClassName": "RtfTextSegment",
        "Text": "to be entered"
      },
      {
        "ClassName": "RtfBookmarkEndSegment",
        "Identifier": "TA31"
      }
    ]
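
    A minimal sketch of consuming these segments is shown below. It assumes that the method is static, that the segment classes named in the output above live in the Dangl.TextConverter.Rtf namespace, and that they expose the Text and Identifier properties as shown. The class name RtfSegmentsExample is made up for illustration, and the type patterns require C# 7 or later.

```csharp
using System;
using Dangl.TextConverter.Rtf;

public static class RtfSegmentsExample
{
    public static void Main()
    {
        // The Rtf input from the example above, as a verbatim string so backslashes stay literal
        var rtf = @"{\rtf1The value is {\bkmkstart TA31}to be entered{\bkmkend TA31}}";

        // Assumption: the method is static and returns an enumerable list of segments
        var segments = RtfToText.ConvertRtfToSegmentedText(rtf);

        foreach (var segment in segments)
        {
            // Assumption: the segment types are distinguishable by their runtime class
            switch (segment)
            {
                case RtfTextSegment textSegment:
                    Console.WriteLine($"Text: {textSegment.Text}");
                    break;
                case RtfBookmarkStartSegment bookmarkStart:
                    Console.WriteLine($"Bookmark opened: {bookmarkStart.Identifier}");
                    break;
                case RtfBookmarkEndSegment bookmarkEnd:
                    Console.WriteLine($"Bookmark closed: {bookmarkEnd.Identifier}");
                    break;
            }
        }
    }
}
```

    Text that appears between a bookmark start and the matching bookmark end, identified by the same Identifier, belongs to that bookmark.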
    
    © Dangl IT - Georg Dangl