[00:02] *** simon816 has quit IRC (ZNC 1.6.5 - http://znc.in)
[00:04] *** simon8162 has quit IRC (ZNC 1.7.5 - https://znc.in)
[00:05] *** simon816 has joined #archiveteam-ot
[00:13] *** simon816 has quit IRC (ZNC 1.7.5 - https://znc.in)
[00:13] *** simon816 has joined #archiveteam-ot
[00:27] *** simon816 has quit IRC (ZNC 1.7.5 - https://znc.in)
[00:28] *** simon816 has joined #archiveteam-ot
[03:10] *** DogsRNice has quit IRC (Read error: Connection reset by peer)
[03:12] *** DigiDigi has quit IRC (Read error: Operation timed out)
[03:12] *** Larsenv has quit IRC (Read error: Operation timed out)
[03:13] *** jrwr has quit IRC (Ping timeout: 264 seconds)
[03:16] *** Larsenv has joined #archiveteam-ot
[03:18] *** nyany has quit IRC (Read error: Operation timed out)
[03:19] *** MrRadar has quit IRC (Remote host closed the connection)
[03:21] *** voltagex has quit IRC (Ping timeout: 360 seconds)
[03:23] *** Igloo has quit IRC (Read error: Operation timed out)
[03:23] *** nyany has joined #archiveteam-ot
[03:23] *** swebb_ has joined #archiveteam-ot
[03:24] *** Igloo has joined #archiveteam-ot
[03:24] *** voltagex has joined #archiveteam-ot
[03:24] *** DigiDigi has joined #archiveteam-ot
[03:24] *** swebb has quit IRC (Ping timeout: 360 seconds)
[03:24] *** swebb_ is now known as swebb
[03:26] *** MrRadar has joined #archiveteam-ot
[03:27] *** jrwr has joined #archiveteam-ot
[03:30] *** OrIdow6 has quit IRC (Read error: Connection reset by peer)
[03:34] *** OrIdow6 has joined #archiveteam-ot
[03:35] *** ShellyRol has quit IRC (Read error: Connection reset by peer)
[03:36] *** ShellyRol has joined #archiveteam-ot
[03:40] *** Fusl_ has joined #archiveteam-ot
[04:02] *** Fusl_ has quit IRC (Ping timeout: 864 seconds)
[04:39] *** Stiletto has quit IRC (Ping timeout: 276 seconds)
[04:41] *** Stiletto has joined #archiveteam-ot
[04:45] *** qw3rty_ has joined #archiveteam-ot
[04:50] *** qw3rty__ has quit IRC (Ping timeout: 276 seconds)
[05:24] *** thuban2 has joined #archiveteam-ot
[05:27] *** thuban1 has quit IRC (Read error: Operation timed out)
[06:43] *** nataraj_ has joined #archiveteam-ot
[06:46] *** wp494 has quit IRC (LOUD UNNECESSARY QUIT MESSAGES)
[07:00] *** wp494 has joined #archiveteam-ot
[08:51] *** revi has quit IRC (Quit: Updating details, brb)
[08:51] *** revi has joined #archiveteam-ot
[09:01] *** Craigle has quit IRC (Quit: The Lounge - https://thelounge.chat)
[10:56] *** BlueMax has quit IRC (Read error: Connection reset by peer)
[11:03] is there a way to get the list of captured domains for a given TLD in IA?
[11:07] If it's an extremely small TLD, you can use the CDX server, e.g. curl "https://web.archive.org/cdx/search?url=*.com"
[11:09] Will only get you 1.5 million captures (that's captures, not URLs or domains), which remain static regardless of pagination, filtering, etc. parameters, last time I checked, so, again, probably not too useful unless it's something extremely obscure
[11:14] that's a start, thanks
[11:17] (Interestingly enough, it's apparently possible to query the CDX server with a query consisting of nothing but "*", which gets the expected bad URLs - the lexically first URL in the WBM is apparently "http://%E5", in case anyone wanted to know)
[11:17] hehe
[11:17] I tried ie *a.com, no cigar
[11:23] the CDX files are not available for download?
[11:25] Not for the whole WBM
[11:26] so one potential route to go down might be to scrape IA collections for .cdx files, then scrape those for URLs?
[11:27] You could - most IA items in the WBM (as I recall, and have seen) have their CDX files hidden, but a minority - including most (I think) of what AT puts in - has public CDX files
[11:28] that's consistent with what I observed browsing random collections just now
[11:29] OrIdow6: since you seem to be in the know about IA: I've got ~400 gb WARC archives, is there a good place to contribute them so they eventually end up in IA?
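A minimal Python sketch of the CDX route discussed between 11:03 and 11:28, under stated assumptions: it targets the Wayback CDX server at https://web.archive.org/cdx/search/cdx (the path given in the CDX server documentation; the curl example above uses /cdx/search), uses only the standard library, and the helper names cdx_query_url and domains_from_cdx are hypothetical. It builds a query for every capture under *.<tld> and reduces either the API's plain-text output or a downloaded .cdx index file to a sorted list of unique hostnames. It does nothing about the capture cap mentioned at 11:09, so for anything but a very small TLD the result will likely be incomplete.

```python
from typing import Iterable, List
from urllib.parse import urlencode, urlsplit

# Assumed endpoint: the documented path of the Wayback CDX server API.
CDX_ENDPOINT = "https://web.archive.org/cdx/search/cdx"


def cdx_query_url(tld: str, limit: int = 10000) -> str:
    """Build a CDX query for every capture under *.<tld> (hypothetical helper)."""
    params = {
        "url": f"*.{tld}",     # leading "*." requests the domain plus all subdomains
        "fl": "original",      # return only the original-URL column
        "collapse": "urlkey",  # drop adjacent repeat captures of the same URL
        "limit": str(limit),
    }
    return f"{CDX_ENDPOINT}?{urlencode(params)}"


def domains_from_cdx(lines: Iterable[str]) -> List[str]:
    """Return sorted unique hostnames from CDX text (hypothetical helper).

    Accepts either CDX API output requested with fl=original (one URL per
    line) or a .cdx index file whose header names its columns, e.g.
    ' CDX N b a m s k r M S V g', where 'a' is the original URL.
    """
    url_col = 0  # API output with fl=original has the URL in column 0
    hosts = set()
    for raw in lines:
        fields = raw.split()
        if not fields:
            continue  # skip blank lines
        if fields[0] == "CDX":
            # Header line: locate the 'a' (original URL) column.
            # Raises ValueError if the header has no 'a' column.
            url_col = fields[1:].index("a")
            continue
        if len(fields) <= url_col:
            continue  # malformed row, too few columns
        url = fields[url_col]
        if "://" not in url:
            url = "http://" + url  # urlsplit needs a scheme to find the host
        try:
            host = urlsplit(url).hostname  # lowercased, port stripped
        except ValueError:
            continue  # unparseable URL, e.g. a broken IPv6 literal
        if host:
            hosts.add(host)
    return sorted(hosts)
```

Fetching is left out on purpose: one could pass the decoded lines of urllib.request.urlopen(cdx_query_url("museum")) to domains_from_cdx, and public .cdx files found in IA items could be read the same way, assuming they carry the usual " CDX N b a m s k r M S V g" style header.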
[11:29] (or at least somewhere where other people can use them)
[11:37] My knowledge of this is very limited, and other people will have more direct experience with it (asking in #archiveteam-bs might be a good idea, if you have more questions on this), but I think that unless they're warcs of e.g. nothing but a site that was under your control anyway, they won't be put into the WBM if you're some random untrusted person, lest the IA risk falsification of history
[11:39] That being said, you CAN upload them as regular IA items with mediatype "web", in which case they'll be put (automatically, I think) into the "Outsider WARCs" collection at https://archive.org/details/warczone - I don't think these do anything but sit around at the moment, but they'll be around (hopefully) for people in the future
[11:45] *** davis1 has quit IRC (Ping timeout: 258 seconds)
[11:47] *** davis1 has joined #archiveteam-ot
[11:49] apache2: https://archive.fart.website/bin/irclogger_log/archiveteam-bs?date=2019-11-13,Wed&sel=299#l295 https://archive.fart.website/bin/irclogger_log/archiveteam-bs?date=2019-01-29,Tue&sel=142#l138
[11:50] thank you
[11:53] You're welcome
[12:34] *** VADemon_ has joined #archiveteam-ot
[12:34] *** VADemon has quit IRC (Read error: Connection reset by peer)
[13:09] *** MaximeleG has joined #archiveteam-ot
[13:31] *** Craigle has joined #archiveteam-ot
[13:32] *** Mateon1 has quit IRC (Remote host closed the connection)
[13:33] *** Mateon1 has joined #archiveteam-ot
[15:14] *** VADemon_ has quit IRC (left4dead)
[16:11] *** nataraj_ has quit IRC (Read error: Operation timed out)
[16:18] *** Panasonic has joined #archiveteam-ot
[16:19] *** Ravenloft has quit IRC (Read error: Connection reset by peer)
[16:21] *** DogsRNice has joined #archiveteam-ot
[18:11] *** igloo259 has joined #archiveteam-ot
[18:14] *** igloo25 has quit IRC (Ping timeout: 745 seconds)
[18:32] *** DogsRNice has quit IRC (Ping timeout: 276 seconds)
[18:42] *** MaximeleG has quit IRC (Quit: MaximeleG)
[19:54] *** schbirid has joined #archiveteam-ot
[20:09] any suggestions for a simple tool to visually track postgres table count(*)s over time?
[20:09] with almost zero setup
[20:28] *** Flashfire has quit IRC (Remote host closed the connection)
[20:28] *** kiska has quit IRC (Remote host closed the connection)
[20:29] *** kiska has joined #archiveteam-ot
[20:29] *** Flashfire has joined #archiveteam-ot
[20:30] *** svchfoo3 sets mode: +o kiska
[20:30] *** svchfoo1 sets mode: +o kiska
[20:43] *** dashcloud has quit IRC (Quit: http://quassel-irc.org - Chat comfortably. Anywhere.)
[20:51] *** dashcloud has joined #archiveteam-ot
[21:02] *** nataraj_ has joined #archiveteam-ot
[21:40] *** schbirid has quit IRC (Quit: Leaving)
[21:40] *** godane has joined #archiveteam-ot
[22:42] *** Jens has quit IRC (Remote host closed the connection)
[22:43] *** Jens has joined #archiveteam-ot
[22:50] *** mal has quit IRC (Quit: mal)
[22:59] *** thuban3 has joined #archiveteam-ot
[23:02] *** thuban2 has quit IRC (Read error: Operation timed out)
[23:02] *** mal has joined #archiveteam-ot
[23:18] *** BlueMax has joined #archiveteam-ot
[23:34] *** DigiDigi has quit IRC (Remote host closed the connection)
[23:38] *** DigiDigi has joined #archiveteam-ot
[23:45] *** icedice has quit IRC (Read error: Operation timed out)