Excel

Contar el número de palabras en una cadena

¿Alguna vez has necesitado contar el número de palabras en una cadena?

Exploremos algunas formas en las que podemos lograrlo.

Uso de VBA simple (más o menos, pero no del todo)

A continuación se muestran 2 funciones que se pueden emplear para obtener el recuento de palabras de una cadena. Sin embargo, esto solo funciona correctamente si se sigue la estructura de oración adecuada. Además, ambos enfoques se basan en el uso de mi función TrimMultipleSpaces() que, a su vez, se basa en RegEx de todos modos. Por eso, le recomendaría que simplemente use uno de los 2 enfoques RegEx presentados en la siguiente sección, ya que son más precisos/confiables y solo realizan una llamada RegEx singular.

'---------------------------------------------------------------------------------------
' Procedure : CountWords
' Author    : Daniel Pineault, CARDA Consultants Inc.
' Website   : 
' Purpose   : Counts the number of word in a string.
'               Good for basic usage, but RegEx/MSHTML approaches are more reliable
' Copyright : The following is release as Attribution-ShareAlike 4.0 International
'             (CC BY-SA 4.0) - 
' Req'd Refs: None required
' Dependencies: TrimMultipleSpaces()
'               
'
' Input Variables:
' ~~~~~~~~~~~~~~~~
' sInput    : String to count the words of
'
' Usage:
' ~~~~~~
' ? CountWords("  Just a quick test     !?")
'   Returns -> 4
'
' ? CountWords("  Just a quick test     !?Another sentence would go here.")
'   Returns -> 9
'
' ? CountWords("  Just a quick test     !?Another sentence would go here .")
'   Returns -> 10  '********incorrect
'
' Revision History:
' Rev       Date(yyyy-mm-dd)        Description
' **************************************************************************************
' 1         2024-03-23
'---------------------------------------------------------------------------------------
Public Function CountWords(ByVal vInput As Variant) As Long
On Error GoTo Error_Handler

    If Len(Trim(vInput & vbNullString)) = 0 Then GoTo Error_Handler_Exit
    vInput = TrimMultipleSpaces(vInput)    'Correct for multiple spaces
    vInput = Trim(vInput)
    CountWords = UBound(Split(vInput, " ")) + 1
    
Error_Handler_Exit:
    On Error Resume Next
    Exit Function
 
Error_Handler:
    MsgBox "The following error has occurred" & vbCrLf & vbCrLf & _
           "Error Source: CountWords" & vbCrLf & _
           "Error Number: " & Err.Number & vbCrLf & _
           "Error Description: " & Err.Description & _
           Switch(Erl = 0, "", Erl  0, vbCrLf & "Line No: " & Erl) _
           , vbOKOnly + vbCritical, "An Error has Occurred!"
    Resume Error_Handler_Exit
End Function
'---------------------------------------------------------------------------------------
' Procedure : CountWords
' Author    : Daniel Pineault, CARDA Consultants Inc.
' Website   : 
' Purpose   : Word count that ignores punctuation
'               Good for basic usage, but RegEx/MSHTML approaches are more reliable
' Copyright : The following is release as Attribution-ShareAlike 4.0 International
'             (CC BY-SA 4.0) - 
' Req'd Refs: None required
' Dependencies: TrimMultipleSpaces()
'               
'
' Input Variables:
' ~~~~~~~~~~~~~~~~
' sInput    : String to count the words of
'
' Usage:
' ~~~~~~
' ? CountWords("  Just a quick test     !?")
'   Returns -> 4
'
' ? CountWords("  Just a quick test     !?Another sentence would go here.")
'   Returns -> 9
'
' ? CountWords("  Just a quick test     !?Another sentence would go here .")
'   Returns -> 10  '********incorrect
'
' Revision History:
' Rev       Date(yyyy-mm-dd)        Description
' **************************************************************************************
' 1         2024-03-23
'---------------------------------------------------------------------------------------
Public Function CountWords(ByVal vInput As Variant) As Long
On Error GoTo Error_Handler
    Dim lCharsLeft            As Long

    If Len(Trim(vInput & vbNullString)) = 0 Then GoTo Error_Handler_Exit
    vInput = TrimMultipleSpaces(vInput)    'Correct for multiple spaces
    vInput = Trim(vInput)
    lCharsLeft = Len(Replace(vInput, " ", ""))
    If lCharsLeft = 0 Then
        CountWords = 0
    Else
        CountWords = Len(vInput) - Len(Replace(vInput, " ", "")) + 1
    End If
 
Error_Handler_Exit:
    On Error Resume Next
    Exit Function
 
Error_Handler:
    MsgBox "The following error has occurred" & vbCrLf & vbCrLf & _
           "Error Source: CountWords" & vbCrLf & _
           "Error Number: " & Err.Number & vbCrLf & _
           "Error Description: " & Err.Description & _
           Switch(Erl = 0, "", Erl  0, vbCrLf & "Line No: " & Erl) _
           , vbOKOnly + vbCritical, "An Error has Occurred!"
    Resume Error_Handler_Exit
End Function

¡Usando el poder de las expresiones regulares!

Dado que las dos funciones anteriores requerían expresiones regulares y no eran precisas en todos los casos, exploré la posibilidad de usar expresiones regulares puras para lograrlo. ¡En expresiones regulares, es increíblemente fácil obtener un recuento de palabras!

Solo necesitamos usar el patrón simple ‘\w+(-\w+)*’ y luego contar la cantidad de coincidencias encontradas. Es muy sencillo. RegEx hace todo el trabajo pesado y se ocupa de todo lo demás por nosotros.

Uso de la biblioteca RegEx de VBScript

'---------------------------------------------------------------------------------------
' Procedure : RegEx_CountWords
' Author    : Daniel Pineault, CARDA Consultants Inc.
' Website   : 
' Purpose   : Count the number of word in a string. (ignores punctuation)
' Copyright : The following is release as Attribution-ShareAlike 4.0 International
'             (CC BY-SA 4.0) - 
' Req'd Refs: Late Binding  -> None required
'             Early Binding -> Microsoft VBScript Regular Expressions 5.5
'
' Input Variables:
' ~~~~~~~~~~~~~~~~
' sInput    : String to count the words of
'
' Usage:
' ~~~~~~
' ? RegEx_CountWords("  Just a quick test     !?")
'   Returns -> 4
'
' ? RegEx_CountWords("  Just a quick test     !?Another sentence would go here.")
'   Returns -> 9
'
' Revision History:
' Rev       Date(yyyy-mm-dd)        Description
' **************************************************************************************
' 1         2024-03-23
'---------------------------------------------------------------------------------------
Public Function RegEx_CountWords(ByVal vInput As Variant) As Long
    On Error GoTo Error_Handler

    #Const RegEx_EarlyBind = False   'True => Early Binding / False => Late Binding
    #If RegEx_EarlyBind = True Then
        Dim oRegEx            As VBScript_RegExp_55.RegExp
        Dim oMatches          As VBScript_RegExp_55.MatchCollection

        Set oRegEx = New VBScript_RegExp_55.RegExp
    #Else
        Dim oRegEx            As Object
        Dim oMatches          As Object

        Set oRegEx = CreateObject("VBScript.RegExp")
    #End If

    If Not IsNull(vInput) Then
        With oRegEx
            .Pattern = "\w+(-\w+)*"    'Remove multi-space and leading/trailing spaces
            .Global = True
            .IgnoreCase = True
            .MultiLine = True

            Set oMatches = .Execute(vInput)
        End With

        RegEx_CountWords = oMatches.Count
    Else
        RegEx_CountWords = 0
    End If

Error_Handler_Exit:
    On Error Resume Next
    Set oMatches = Nothing
    Set oRegEx = Nothing
    Exit Function

Error_Handler:
    MsgBox "The following error has occured" & vbCrLf & vbCrLf & _
           "Error Number: " & Err.Number & vbCrLf & _
           "Error Source: RegEx_CountWords" & vbCrLf & _
           "Error Description: " & Err.Description & _
           Switch(Erl = 0, "", Erl  0, vbCrLf & "Line No: " & Erl) _
           , vbOKOnly + vbCritical, "An Error has Occured!"
    Resume Error_Handler_Exit
End Function

Uso de la biblioteca MSHTML RegEx

Planificando con un poco de anticipación cuándo se retirarían las expresiones regulares de VBScript, también desarrollé una versión alternativa utilizando la biblioteca de objetos HTML de Microsoft.

'---------------------------------------------------------------------------------------
' Procedure : MSHTML_CountWords
' Author    : Daniel Pineault, CARDA Consultants Inc.
' Website   : 
' Purpose   : Count the number of word in a string. (ignores punctuation)
'               Version of RegEx using MSHTML to get around VBScript deprecation
' Copyright : The following is release as Attribution-ShareAlike 4.0 International
'             (CC BY-SA 4.0) - 
' Req'd Refs: Late Binding  -> None required
'             Early Binding -> Microsoft HTML Object Library
'
' Input Variables:
' ~~~~~~~~~~~~~~~~
' sInput    : String to count the words of
'
' Usage:
' ~~~~~~
' ? MSHTML_CountWords("  Just a quick test     !?")
'   Returns -> 4
'
' ? MSHTML_CountWords("  Just a quick test     !?Another sentence would go here.")
'   Returns -> 9
'
' Revision History:
' Rev       Date(yyyy-mm-dd)        Description
' **************************************************************************************
' 1         2024-03-23
'---------------------------------------------------------------------------------------
Public Function MSHTML_CountWords(ByVal vInput As Variant) As Long
    On Error GoTo Error_Handler
    #Const HTMLFile_EarlyBind = False     'True => Early Binding / False => Late Binding
    #If HTMLFile_EarlyBind = True Then
        Dim oHTMLFile         As MSHTML.HTMLDocument
        Dim oElem             As MSHTML.HTMLGenericElement

        Set oHTMLFile = New MSHTML.HTMLDocument
    #Else
        Dim oHTMLFile         As Object
        Dim oElem             As Object

        Set oHTMLFile = CreateObject("HTMLFile")
    #End If

    If Len(Trim(vInput & vbNullString)) = 0 Then GoTo Error_Handler_Exit

    oHTMLFile.body.innerHTML = ""    'Clear the default 

line Set oElem = oHTMLFile.createElement("p") Call oElem.setAttribute("id", "result") Call oHTMLFile.body.appendChild(oElem) Set oElem = oHTMLFile.createElement("script") Call oElem.setAttribute("type", "text/javascript") 'Not strictly necessary as this is the default value anyways oElem.innerText = "function regEx_WordCount() {" & _ " var regEx_Pattern = /\w+(-\w+)*/igm;" & _ " var myInput="" & vInput & "";" & _ " var regEx_Matches = myInput.match(regEx_Pattern);" & _ " document.getElementById('result').innerText = regEx_Matches.length;" & _ " return regEx_Matches.length;" & _ "}" Call oHTMLFile.body.appendChild(oElem) Call oHTMLFile.parentWindow.execScript("regEx_WordCount();") MSHTML_CountWords = oHTMLFile.getElementById("result").innerText Error_Handler_Exit: On Error Resume Next Set oElem = Nothing Set oHTMLFile = Nothing Exit Function Error_Handler: MsgBox "The following error has occurred" & vbCrLf & vbCrLf & _ "Error Source: MSHTML_CountWords" & vbCrLf & _ "Error Number: " & Err.Number & vbCrLf & _ "Error Description: " & Err.Description & _ Switch(Erl = 0, "", Erl 0, vbCrLf & "Line No: " & Erl) _ , vbOKOnly + vbCritical, "An Error has Occurred!" Resume Error_Handler_Exit End Function

Uso de la automatización de palabras

'---------------------------------------------------------------------------------------
' Procedure : Word_CountWords
' Author    : Daniel Pineault, CARDA Consultants Inc.
' Website   : 
' Purpose   : Count the number of word in a string via Word automation
' Copyright : The following is release as Attribution-ShareAlike 4.0 International
'             (CC BY-SA 4.0) - 
' Req'd Refs: Late Binding  -> None required
'             Early Binding -> Microsoft Word XX.X Object Library
'
' Input Variables:
' ~~~~~~~~~~~~~~~~
' sInput    : String to count the words of
'
' Usage:
' ~~~~~~
' ? Word_CountWords("  Just a quick test     !?")
'   Returns -> 5   !!!WRONG!!!  Come one Microsoft.
'
' ? Word_CountWords("  Just a quick test     !?Another sentence would go here.")
'   Returns -> 9
'
' Revision History:
' Rev       Date(yyyy-mm-dd)        Description
' **************************************************************************************
' 1         2024-03-23
'---------------------------------------------------------------------------------------
Public Function Word_CountWords(ByVal sInput As String) As Long
On Error GoTo Error_Handler
    #Const Word_EarlyBind = False    'True => Early Binding / False => Late Binding
                                     'Microsoft Word XX.X Object Library
    #If Word_EarlyBind = True Then
        Dim oWord             As Word.Application
        Dim oDoc              As Word.Document
        
        Set oWord = New Word.Application
    #Else
        Dim oWord             As Object
        Dim oDoc              As Object
        Const wdStatisticWords = 0
        
        Set oWord = CreateObject("Word.Application")
    #End If

    oWord.Visible = False
    Set oDoc = oWord.Documents.Add
    oDoc.Activate
    oWord.Selection.TypeText sInput
    'Word_CountWords = oDoc.Words.Count - 1 'Incorrect value that take into account irrelevant stuff!  KB 212705
    Word_CountWords = oDoc.ComputeStatistics(wdStatisticWords)
 
Error_Handler_Exit:
    On Error Resume Next
    oWord.Quit False
    Set oWord = Nothing
    Exit Function
 
Error_Handler:
    oWord.Visible = True ' Don't leave a hidden instance!
    MsgBox "The following error has occurred" & vbCrLf & vbCrLf & _
           "Error Source: Word_CountWords" & vbCrLf & _
           "Error Number: " & Err.Number & vbCrLf & _
           "Error Description: " & Err.Description & _
           Switch(Erl = 0, "", Erl  0, vbCrLf & "Line No: " & Erl) _
           , vbOKOnly + vbCritical, "An Error has Occurred!"
    Resume Error_Handler_Exit
End Function

Lamentablemente, el recuento que devuelve Word puede ser incorrecto. Consulte el siguiente artículo para obtener más detalles:

Así que ahí lo tienes. Al final del día, tienes dos funciones RegEx confiables que puedes usar para obtener el recuento de palabras en cualquier cadena. Más allá de eso, te recomendaría no usar VBA simple o automatización de Word.

Publicaciones relacionadas

Deja una respuesta

Tu dirección de correo electrónico no será publicada. Los campos obligatorios están marcados con *

Botón volver arriba