I am trying to parse the child in a tree identify by an ID. This is how my XML looks like:
<Reporte xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
<nombre>PML</nombre>
<proceso>MDA</proceso>
<sistema>BCS</sistema>
<area>PÚBLICA</area>
<Resultados>
<Nodo>
<clv_nodo>07CAB-115</clv_nodo>
<Valores>
<Valor>
<fecha>2017-03-15</fecha>
<hora>1</hora>
<pml>1688.02</pml>
<pml_ene>1638.38</pml_ene>
<pml_per>49.64</pml_per>
<pml_cng>0</pml_cng>
</Valor>
<Valor>
<fecha>2017-03-15</fecha>
<hora>2</hora>
<pml>1446.18</pml>
<pml_ene>1405.81</pml_ene>
<pml_per>40.36</pml_per>
<pml_cng>0</pml_cng>
</Valor>
<Valor>
<fecha>2017-03-15</fecha>
<hora>3</hora>
<pml>1389.31</pml>
<pml_ene>1351.85</pml_ene>
<pml_per>37.46</pml_per>
<pml_cng>0</pml_cng>
</Valor>
<Valor>
<fecha>2017-03-15</fecha>
<hora>4</hora>
<pml>1337.1</pml>
<pml_ene>1301.93</pml_ene>
<pml_per>35.17</pml_per>
<pml_cng>0</pml_cng>
</Valor>
<Valor>
<fecha>2017-03-15</fecha>
<hora>5</hora>
<pml>1532.75</pml>
<pml_ene>1492.39</pml_ene>
<pml_per>40.36</pml_per>
<pml_cng>0</pml_cng>
</Valor>
<Valor>
<fecha>2017-03-15</fecha>
<hora>6</hora>
<pml>1729.85</pml>
<pml_ene>1683.15</pml_ene>
<pml_per>46.71</pml_per>
<pml_cng>0</pml_cng>
</Valor>
<Valor>
<fecha>2017-03-15</fecha>
<hora>7</hora>
<pml>1698.2</pml>
<pml_ene>1650.29</pml_ene>
<pml_per>47.92</pml_per>
<pml_cng>0</pml_cng>
</Valor>
<Valor>
<fecha>2017-03-15</fecha>
<hora>8</hora>
<pml>1700.84</pml>
<pml_ene>1649.62</pml_ene>
<pml_per>51.23</pml_per>
<pml_cng>0</pml_cng>
</Valor>
<Valor>
<fecha>2017-03-15</fecha>
<hora>9</hora>
<pml>1708.53</pml>
<pml_ene>1652.13</pml_ene>
<pml_per>56.4</pml_per>
<pml_cng>0</pml_cng>
</Valor>
<Valor>
<fecha>2017-03-15</fecha>
<hora>10</hora>
<pml>1798.48</pml>
<pml_ene>1735.19</pml_ene>
<pml_per>63.29</pml_per>
<pml_cng>0</pml_cng>
</Valor>
<Valor>
<fecha>2017-03-15</fecha>
<hora>11</hora>
<pml>1656.64</pml>
<pml_ene>1595.8</pml_ene>
<pml_per>60.84</pml_per>
<pml_cng>0</pml_cng>
</Valor>
<Valor>
<fecha>2017-03-15</fecha>
<hora>12</hora>
<pml>1712.9</pml>
<pml_ene>1648.41</pml_ene>
<pml_per>64.48</pml_per>
<pml_cng>0</pml_cng>
</Valor>
<Valor>
<fecha>2017-03-15</fecha>
<hora>13</hora>
<pml>1787.72</pml>
<pml_ene>1719.13</pml_ene>
<pml_per>68.59</pml_per>
<pml_cng>0</pml_cng>
</Valor>
<Valor>
<fecha>2017-03-15</fecha>
<hora>14</hora>
<pml>1851.01</pml>
<pml_ene>1779.59</pml_ene>
<pml_per>71.43</pml_per>
<pml_cng>0</pml_cng>
</Valor>
<Valor>
<fecha>2017-03-15</fecha>
<hora>15</hora>
<pml>1950.51</pml>
<pml_ene>1873.83</pml_ene>
<pml_per>76.67</pml_per>
<pml_cng>0</pml_cng>
</Valor>
<Valor>
<fecha>2017-03-15</fecha>
<hora>16</hora>
<pml>1661.94</pml>
<pml_ene>1595.87</pml_ene>
<pml_per>66.07</pml_per>
<pml_cng>0</pml_cng>
</Valor>
<Valor>
<fecha>2017-03-15</fecha>
<hora>17</hora>
<pml>1740.8</pml>
<pml_ene>1671.24</pml_ene>
<pml_per>69.56</pml_per>
<pml_cng>0</pml_cng>
</Valor>
<Valor>
<fecha>2017-03-15</fecha>
<hora>18</hora>
<pml>1895.51</pml>
<pml_ene>1820.19</pml_ene>
<pml_per>75.32</pml_per>
<pml_cng>0</pml_cng>
</Valor>
<Valor>
<fecha>2017-03-15</fecha>
<hora>19</hora>
<pml>2074.18</pml>
<pml_ene>1990.16</pml_ene>
<pml_per>84.02</pml_per>
<pml_cng>0</pml_cng>
</Valor>
<Valor>
<fecha>2017-03-15</fecha>
<hora>20</hora>
<pml>1959.91</pml>
<pml_ene>1878.22</pml_ene>
<pml_per>81.7</pml_per>
<pml_cng>0</pml_cng>
</Valor>
<Valor>
<fecha>2017-03-15</fecha>
<hora>21</hora>
<pml>1791.66</pml>
<pml_ene>1719.24</pml_ene>
<pml_per>72.43</pml_per>
<pml_cng>0</pml_cng>
</Valor>
<Valor>
<fecha>2017-03-15</fecha>
<hora>22</hora>
<pml>1986.59</pml>
<pml_ene>1909.79</pml_ene>
<pml_per>76.8</pml_per>
<pml_cng>0</pml_cng>
</Valor>
<Valor>
<fecha>2017-03-15</fecha>
<hora>23</hora>
<pml>1709.51</pml>
<pml_ene>1648.21</pml_ene>
<pml_per>61.29</pml_per>
<pml_cng>0</pml_cng>
</Valor>
<Valor>
<fecha>2017-03-15</fecha>
<hora>24</hora>
<pml>1539.04</pml>
<pml_ene>1488.58</pml_ene>
<pml_per>50.47</pml_per>
<pml_cng>0</pml_cng>
</Valor>
</Valores>
</Nodo>
<Nodo>
<clv_nodo>07BLE-115</clv_nodo>
<Valores>
<Valor>
<fecha>2017-03-15</fecha>
<hora>1</hora>
<pml>1646.19</pml>
<pml_ene>1638.38</pml_ene>
<pml_per>7.81</pml_per>
<pml_cng>0</pml_cng>
</Valor>
<Valor>
<fecha>2017-03-15</fecha>
<hora>2</hora>
<pml>1413.36</pml>
<pml_ene>1405.81</pml_ene>
<pml_per>7.55</pml_per>
<pml_cng>0</pml_cng>
</Valor>
<Valor>
<fecha>2017-03-15</fecha>
<hora>3</hora>
<pml>1358.96</pml>
<pml_ene>1351.85</pml_ene>
<pml_per>7.11</pml_per>
<pml_cng>0</pml_cng>
</Valor>
<Valor>
<fecha>2017-03-15</fecha>
<hora>4</hora>
<pml>1308.71</pml>
<pml_ene>1301.93</pml_ene>
<pml_per>6.77</pml_per>
<pml_cng>0</pml_cng>
</Valor>
<Valor>
<fecha>2017-03-15</fecha>
<hora>5</hora>
<pml>1499.65</pml>
<pml_ene>1492.39</pml_ene>
<pml_per>7.27</pml_per>
<pml_cng>0</pml_cng>
</Valor>
<Valor>
<fecha>2017-03-15</fecha>
<hora>6</hora>
<pml>1690.93</pml>
<pml_ene>1683.15</pml_ene>
<pml_per>7.78</pml_per>
<pml_cng>0</pml_cng>
</Valor>
<Valor>
<fecha>2017-03-15</fecha>
<hora>7</hora>
<pml>1659.78</pml>
<pml_ene>1650.29</pml_ene>
<pml_per>9.49</pml_per>
<pml_cng>0</pml_cng>
</Valor>
<Valor>
<fecha>2017-03-15</fecha>
<hora>8</hora>
<pml>1661.41</pml>
<pml_ene>1649.62</pml_ene>
<pml_per>11.79</pml_per>
<pml_cng>0</pml_cng>
</Valor>
<Valor>
<fecha>2017-03-15</fecha>
<hora>9</hora>
<pml>1662.14</pml>
<pml_ene>1652.13</pml_ene>
<pml_per>10</pml_per>
<pml_cng>0</pml_cng>
</Valor>
<Valor>
<fecha>2017-03-15</fecha>
<hora>10</hora>
<pml>1744.71</pml>
<pml_ene>1735.19</pml_ene>
<pml_per>9.52</pml_per>
<pml_cng>0</pml_cng>
</Valor>
<Valor>
<fecha>2017-03-15</fecha>
<hora>11</hora>
<pml>1604.34</pml>
<pml_ene>1595.8</pml_ene>
<pml_per>8.54</pml_per>
<pml_cng>0</pml_cng>
</Valor>
<Valor>
<fecha>2017-03-15</fecha>
<hora>12</hora>
<pml>1657.17</pml>
<pml_ene>1648.41</pml_ene>
<pml_per>8.76</pml_per>
<pml_cng>0</pml_cng>
</Valor>
<Valor>
<fecha>2017-03-15</fecha>
<hora>13</hora>
<pml>1728.18</pml>
<pml_ene>1719.13</pml_ene>
<pml_per>9.05</pml_per>
<pml_cng>0</pml_cng>
</Valor>
<Valor>
<fecha>2017-03-15</fecha>
<hora>14</hora>
<pml>1789.15</pml>
<pml_ene>1779.59</pml_ene>
<pml_per>9.56</pml_per>
<pml_cng>0</pml_cng>
</Valor>
<Valor>
<fecha>2017-03-15</fecha>
<hora>15</hora>
<pml>1883.75</pml>
<pml_ene>1873.83</pml_ene>
<pml_per>9.92</pml_per>
<pml_cng>0</pml_cng>
</Valor>
<Valor>
<fecha>2017-03-15</fecha>
<hora>16</hora>
<pml>1599.54</pml>
<pml_ene>1595.87</pml_ene>
<pml_per>3.67</pml_per>
<pml_cng>0</pml_cng>
</Valor>
<Valor>
<fecha>2017-03-15</fecha>
<hora>17</hora>
<pml>1674.73</pml>
<pml_ene>1671.24</pml_ene>
<pml_per>3.49</pml_per>
<pml_cng>0</pml_cng>
</Valor>
<Valor>
<fecha>2017-03-15</fecha>
<hora>18</hora>
<pml>1823.54</pml>
<pml_ene>1820.19</pml_ene>
<pml_per>3.35</pml_per>
<pml_cng>0</pml_cng>
</Valor>
<Valor>
<fecha>2017-03-15</fecha>
<hora>19</hora>
<pml>1992.57</pml>
<pml_ene>1990.16</pml_ene>
<pml_per>2.41</pml_per>
<pml_cng>0</pml_cng>
</Valor>
<Valor>
<fecha>2017-03-15</fecha>
<hora>20</hora>
<pml>1885.48</pml>
<pml_ene>1878.22</pml_ene>
<pml_per>7.26</pml_per>
<pml_cng>0</pml_cng>
</Valor>
<Valor>
<fecha>2017-03-15</fecha>
<hora>21</hora>
<pml>1726.7</pml>
<pml_ene>1719.24</pml_ene>
<pml_per>7.46</pml_per>
<pml_cng>0</pml_cng>
</Valor>
<Valor>
<fecha>2017-03-15</fecha>
<hora>22</hora>
<pml>1914.07</pml>
<pml_ene>1909.79</pml_ene>
<pml_per>4.28</pml_per>
<pml_cng>0</pml_cng>
</Valor>
<Valor>
<fecha>2017-03-15</fecha>
<hora>23</hora>
<pml>1653.54</pml>
<pml_ene>1648.21</pml_ene>
<pml_per>5.32</pml_per>
<pml_cng>0</pml_cng>
</Valor>
<Valor>
<fecha>2017-03-15</fecha>
<hora>24</hora>
<pml>1495.33</pml>
<pml_ene>1488.58</pml_ene>
<pml_per>6.75</pml_per>
<pml_cng>0</pml_cng>
</Valor>
</Valores>
</Nodo>
</Resultados>
<status>OK</status>
</Reporte>
I am trying to parse "Nodo" data by node index and expand it to the number of its child "Valor" elements, then column bind with "Valor" data.
I am literally using the description from this post:
[R XML - combining parent and child nodes into data frame
So far my code looks like this:
library(XML)
library(plyr)
library(purrr)
api <- function(path) {
url1 <- modify_url("https://ws01.cenace.gob.mx", port = "8082", path = path)
GET(url1)
}
resp <- api("/SWPML/SIM/BCS/MDA/07CAB-115,07BLE-115/2017/03/15/2017/03/15/XML")
url1 <- xmlParse(resp)
mtg_num <- length(xpathSApply(url1, "//Nodo"))
#I am using same names as example
#meeting_list is a 0X0 list
meeting_list <- lapply(seq(mtg_num), function(i) {
races_num <- length(xpathSApply(url1, sprintf("//Resultados[%s]/Valores", i)))
data.frame(
meeting_id = rep(xpathSApply(url1, sprintf("//clv_nodo", i)), races_num)
)
})
final_df <- cbind(do.call(rbind, meeting_list),
xmlToDataFrame(nodes = getNodeSet(url1, "//Valores/Valor")),
XML:::xmlAttrsToDataFrame(getNodeSet(url1, "//Valores/Valor")))
Since meeting_list is 0X0 when I try to bind it to "Valor" it gives me an error:
Error in data.frame(..., check.names = FALSE) :
arguments imply differing number of rows: 0, 48
In addition: Warning message:
In combineNamedVectors(lapply(doc, xmlAttrs), attrs, omit, ...) :
no elements to combine across records
How can I possible parse correctly "clv_nodo" and expand it to the 48 values in "Valores".
My desire data frame looks like this:
clv_nodo fecha hora pml pml_ene pml_per pml_cng
1 07CAB-115 2017-03-15 1 1688.02 1638.38 49.64 0
2 07CAB-115 2017-03-15 2 1446.18 1405.81 40.36 0
3 07CAB-115 2017-03-15 3 1389.31 1351.85 37.46 0
4 07CAB-115 2017-03-15 4 1337.1 1301.93 35.17 0
5 07CAB-115 2017-03-15 5 1532.75 1492.39 40.36 0
6 07CAB-115 2017-03-15 6 1729.85 1683.15 46.71 0
7 07CAB-115 2017-03-15 7 1698.2 1650.29 47.92 0
8 07CAB-115 2017-03-15 8 1700.84 1649.62 51.23 0
9 07CAB-115 2017-03-15 9 1708.53 1652.13 56.4 0
10 07CAB-115 2017-03-15 10 1798.48 1735.19 63.29 0
11 07CAB-115 2017-03-15 11 1656.64 1595.8 60.84 0
12 07CAB-115 2017-03-15 12 1712.9 1648.41 64.48 0
13 07CAB-115 2017-03-15 13 1787.72 1719.13 68.59 0
14 07CAB-115 2017-03-15 14 1851.01 1779.59 71.43 0
15 07CAB-115 2017-03-15 15 1950.51 1873.83 76.67 0
16 07CAB-115 2017-03-15 16 1661.94 1595.87 66.07 0
17 07CAB-115 2017-03-15 17 1740.8 1671.24 69.56 0
18 07CAB-115 2017-03-15 18 1895.51 1820.19 75.32 0
19 07CAB-115 2017-03-15 19 2074.18 1990.16 84.02 0
20 07CAB-115 2017-03-15 20 1959.91 1878.22 81.7 0
21 07CAB-115 2017-03-15 21 1791.66 1719.24 72.43 0
22 07CAB-115 2017-03-15 22 1986.59 1909.79 76.8 0
23 07CAB-115 2017-03-15 23 1709.51 1648.21 61.29 0
24 07CAB-115 2017-03-15 24 1539.04 1488.58 50.47 0
25 07BLE-115 2017-03-15 1 1646.19 1638.38 7.81 0
26 07BLE-115 2017-03-15 2 1413.36 1405.81 7.55 0
27 07BLE-115 2017-03-15 3 1358.96 1351.85 7.11 0
28 07BLE-115 2017-03-15 4 1308.71 1301.93 6.77 0
29 07BLE-115 2017-03-15 5 1499.65 1492.39 7.27 0
30 07BLE-115 2017-03-15 6 1690.93 1683.15 7.78 0
31 07BLE-115 2017-03-15 7 1659.78 1650.29 9.49 0
32 07BLE-115 2017-03-15 8 1661.41 1649.62 11.79 0
33 07BLE-115 2017-03-15 9 1662.14 1652.13 10 0
34 07BLE-115 2017-03-15 10 1744.71 1735.19 9.52 0
35 07BLE-115 2017-03-15 11 1604.34 1595.8 8.54 0
36 07BLE-115 2017-03-15 12 1657.17 1648.41 8.76 0
37 07BLE-115 2017-03-15 13 1728.18 1719.13 9.05 0
38 07BLE-115 2017-03-15 14 1789.15 1779.59 9.56 0
39 07BLE-115 2017-03-15 15 1883.75 1873.83 9.92 0
40 07BLE-115 2017-03-15 16 1599.54 1595.87 3.67 0
41 07BLE-115 2017-03-15 17 1674.73 1671.24 3.49 0
42 07BLE-115 2017-03-15 18 1823.54 1820.19 3.35 0
43 07BLE-115 2017-03-15 19 1992.57 1990.16 2.41 0
44 07BLE-115 2017-03-15 20 1885.48 1878.22 7.26 0
45 07BLE-115 2017-03-15 21 1726.7 1719.24 7.46 0
46 07BLE-115 2017-03-15 22 1914.07 1909.79 4.28 0
47 07BLE-115 2017-03-15 23 1653.54 1648.21 5.32 0
48 07BLE-115 2017-03-15 24 1495.33 1488.58 6.75 0
UPDATE Final answer
Dave2e's code worked perfectly for the two IDS that I am parsing in this example (07CAB-115, 07BLE-115) but I need to parse 2500 different IDS. Some of them have empty nodes at a "Valores" node. For example for this one:
<Nodo>
<clv_nodo>07ASJ-115</clv_nodo>
<Valores>
so when I ran Dave2e code I got:
Error in data.frame(..., check.names = FALSE) :
arguments imply differing number of rows: 1, 0
This is because in the last part I get a nested list with the last one being 0x0 df<- bind_rows(valornodes) then I was binding the ID with this empty list. The solution, as Dave2e suggested, was filtering clvs by the empty nodes in Valores. In this way the match between the nested list and the list of ID's is correct. This is the final code:
api <- function(path) {
url1 <- modify_url("https://ws01.cenace.gob.mx", port = "8082", path = path)
GET(url1)
}
resp <- api("/SWPML/SIM/BCS/MDA/07CAB-115,07BLE-115,07CAD-115,07ASJ-115/2017/03/26/2017/04/01/XML")
z <- read_xml(resp)
parents <-xml_find_all(z, ".//Nodo")
dfs<-lapply(parents, function(node){
#find clvs name
clvs <-xml_find_all(node, ".//clv_nodo") %>% xml_text()
#Find all children
valors <- node %>% xml_find_all(".//Valor")
#Find all children in Valores
val <- node %>% xml_find_all(".//Valores")
#Filter clvs by empty nodes in Valores
clvs <- clvs[xml_length(val)>0]
#remove cases where the valors nodes have no children nodes
valors <- valors[xml_length(valors)>0]
valornodes <- lapply(valors, function(node){
#get values and names
values <- xml_children(node) %>% xml_text()
names <- xml_children(node) %>% xml_name()
#make data.frame and name the columns
tempdf<- data.frame(t(values), stringsAsFactors = FALSE)
names(tempdf) <- names
tempdf
})
#made data frame with all of results
df<- bind_rows(valornodes)
df<- cbind(clvs,df)
df
})
q <- do.call(rbind.data.frame, dfs)