#archiveteam-ot 2020-02-17,Mon


Time Nickname Message
00:02 🔗 simon816 has quit IRC (ZNC 1.6.5 - http://znc.in)
00:04 🔗 simon8162 has quit IRC (ZNC 1.7.5 - https://znc.in)
00:05 🔗 simon816 has joined #archiveteam-ot
00:13 🔗 simon816 has quit IRC (ZNC 1.7.5 - https://znc.in)
00:13 🔗 simon816 has joined #archiveteam-ot
00:27 🔗 simon816 has quit IRC (ZNC 1.7.5 - https://znc.in)
00:28 🔗 simon816 has joined #archiveteam-ot
03:10 🔗 DogsRNice has quit IRC (Read error: Connection reset by peer)
03:12 🔗 DigiDigi has quit IRC (Read error: Operation timed out)
03:12 🔗 Larsenv has quit IRC (Read error: Operation timed out)
03:13 🔗 jrwr has quit IRC (Ping timeout: 264 seconds)
03:16 🔗 Larsenv has joined #archiveteam-ot
03:18 🔗 nyany has quit IRC (Read error: Operation timed out)
03:19 🔗 MrRadar has quit IRC (Remote host closed the connection)
03:21 🔗 voltagex has quit IRC (Ping timeout: 360 seconds)
03:23 🔗 Igloo has quit IRC (Read error: Operation timed out)
03:23 🔗 nyany has joined #archiveteam-ot
03:23 🔗 swebb_ has joined #archiveteam-ot
03:24 🔗 Igloo has joined #archiveteam-ot
03:24 🔗 voltagex has joined #archiveteam-ot
03:24 🔗 DigiDigi has joined #archiveteam-ot
03:24 🔗 swebb has quit IRC (Ping timeout: 360 seconds)
03:24 🔗 swebb_ is now known as swebb
03:26 🔗 MrRadar has joined #archiveteam-ot
03:27 🔗 jrwr has joined #archiveteam-ot
03:30 🔗 OrIdow6 has quit IRC (Read error: Connection reset by peer)
03:34 🔗 OrIdow6 has joined #archiveteam-ot
03:35 🔗 ShellyRol has quit IRC (Read error: Connection reset by peer)
03:36 🔗 ShellyRol has joined #archiveteam-ot
03:40 🔗 Fusl_ has joined #archiveteam-ot
04:02 🔗 Fusl_ has quit IRC (Ping timeout: 864 seconds)
04:39 🔗 Stiletto has quit IRC (Ping timeout: 276 seconds)
04:41 🔗 Stiletto has joined #archiveteam-ot
04:45 🔗 qw3rty_ has joined #archiveteam-ot
04:50 🔗 qw3rty__ has quit IRC (Ping timeout: 276 seconds)
05:24 🔗 thuban2 has joined #archiveteam-ot
05:27 🔗 thuban1 has quit IRC (Read error: Operation timed out)
06:43 🔗 nataraj_ has joined #archiveteam-ot
06:46 🔗 wp494 has quit IRC (LOUD UNNECESSARY QUIT MESSAGES)
07:00 🔗 wp494 has joined #archiveteam-ot
08:51 🔗 revi has quit IRC (Quit: Updating details, brb)
08:51 🔗 revi has joined #archiveteam-ot
09:01 🔗 Craigle has quit IRC (Quit: The Lounge - https://thelounge.chat)
10:56 🔗 BlueMax has quit IRC (Read error: Connection reset by peer)
11:03 🔗 apache2 is there a way to get the list of captured domains for a given TLD in IA?
11:07 🔗 OrIdow6 If it's an extremely small TLD, you can use the CDX server, e.g. curl "https://web.archive.org/cdx/search?url=*.com"
11:09 🔗 OrIdow6 Will only get you 1.5 million captures (that's captures, not URLs or domains), and that set stays the same regardless of pagination, filtering, etc. parameters, last time I checked, so, again, probably not too useful unless it's something extremely obscure
11:14 🔗 apache2 that's a start, thanks
11:17 🔗 OrIdow6 (Interestingly enough, it's apparently possible to query the CDX server with a query consisting of nothing but "*", which gets the expected bad URLs - the lexically first URL in the WBM is apparently "http://%E5", in case anyone wanted to know)
11:17 🔗 apache2 hehe
11:17 🔗 apache2 I tried e.g. *a.com, no cigar
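
A minimal sketch of the CDX approach discussed above, assuming the publicly documented endpoint path `/cdx/search/cdx` and its `fl`, `collapse`, and `limit` parameters. The function names and the example TLD are hypothetical, and the code only builds the query URL and reduces returned `original` URLs to hostnames; it makes no network request itself. The capture cap mentioned above still applies, so this is only likely to help for very small TLDs.

```python
from urllib.parse import urlencode, urlsplit

# Assumption: the documented Wayback CDX server endpoint.
CDX_ENDPOINT = "https://web.archive.org/cdx/search/cdx"


def build_cdx_url(tld, limit=1000):
    """Build a CDX query URL for every host under the given TLD."""
    params = {
        "url": f"*.{tld}",      # leading "*." asks for a domain-wide match
        "fl": "original",       # return only the original URL field
        "collapse": "urlkey",   # drop adjacent captures of the same URL
        "limit": limit,
    }
    return f"{CDX_ENDPOINT}?{urlencode(params)}"


def hosts_from_originals(lines):
    """Reduce an iterable of captured URLs to a sorted list of unique hostnames."""
    hosts = set()
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            host = urlsplit(line).hostname
        except ValueError:
            # Malformed URLs do occur in the index; skip them.
            continue
        if host:
            hosts.add(host)
    return sorted(hosts)
```

The response body of the built URL (one URL per line when `fl=original`) can be passed line by line to `hosts_from_originals`.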
11:23 🔗 apache2 the CDX files are not available for download?
11:25 🔗 OrIdow6 Not for the whole WBM
11:26 🔗 apache2 so one potential route to go down might be to scrape IA collections for .cdx files, then scrape those for URLs?
11:27 🔗 OrIdow6 You could - most IA items in the WBM (as I recall, and have seen) have their CDX files hidden, but a minority - including most (I think) of what AT puts in - has public CDX files
11:28 🔗 apache2 that's consistent with what I observed browsing random collections just now
11:29 🔗 apache2 OrIdow6: since you seem to be in the know about IA: I've got ~400 GB of WARC archives, is there a good place to contribute them so they eventually end up in IA?
11:29 🔗 apache2 (or at least somewhere where other people can use them)
11:37 🔗 OrIdow6 My knowledge of this is very limited, and other people will have more direct experience with it (asking in #archiveteam-bs might be a good idea, if you have more questions on this), but I think that unless they're warcs of e.g. nothing but a site that was under your control anyway, they won't be put into the WBM if you're some random untrusted person, lest the IA risk falsification of history
11:39 🔗 OrIdow6 That being said, you CAN upload them as regular IA items with mediatype "web", in which case they'll be put (automatically, I think) into the "Outsider WARCs" collection at https://archive.org/details/warczone - I don't think these do anything but sit around at the moment, but they'll be around (hopefully) for people in the future
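
A rough sketch of the upload described above, assuming the `ia` command-line tool from the `internetarchive` Python package is installed and already configured with credentials. The helper name, the item identifier, and the file names are hypothetical; the function only assembles the argument list, so nothing is uploaded until it is passed to something like `subprocess.run`.

```python
def build_ia_upload_command(identifier, warc_paths, title=None):
    """Assemble an `ia upload` argument list that sets mediatype to "web"."""
    cmd = ["ia", "upload", identifier, *warc_paths, "--metadata=mediatype:web"]
    if title:
        # Optional extra metadata; the key:value form is what `ia` expects.
        cmd.append(f"--metadata=title:{title}")
    return cmd


# Hypothetical usage (identifier and file name are placeholders):
# import subprocess
# subprocess.run(build_ia_upload_command("my-warcs-2020-02", ["crawl-00000.warc.gz"]), check=True)
```

Whether the resulting item lands in the "Outsider WARCs" collection automatically is as stated in the message above and is not something this sketch verifies.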
11:45 🔗 davis1 has quit IRC (Ping timeout: 258 seconds)
11:47 🔗 davis1 has joined #archiveteam-ot
11:49 🔗 OrIdow6 apache2: https://archive.fart.website/bin/irclogger_log/archiveteam-bs?date=2019-11-13,Wed&sel=299#l295 https://archive.fart.website/bin/irclogger_log/archiveteam-bs?date=2019-01-29,Tue&sel=142#l138
11:50 🔗 apache2 thank you
11:53 🔗 OrIdow6 You're welcome
12:34 🔗 VADemon_ has joined #archiveteam-ot
12:34 🔗 VADemon has quit IRC (Read error: Connection reset by peer)
13:09 🔗 MaximeleG has joined #archiveteam-ot
13:31 🔗 Craigle has joined #archiveteam-ot
13:32 🔗 Mateon1 has quit IRC (Remote host closed the connection)
13:33 🔗 Mateon1 has joined #archiveteam-ot
15:14 🔗 VADemon_ has quit IRC (left4dead)
16:11 🔗 nataraj_ has quit IRC (Read error: Operation timed out)
16:18 🔗 Panasonic has joined #archiveteam-ot
16:19 🔗 Ravenloft has quit IRC (Read error: Connection reset by peer)
16:21 🔗 DogsRNice has joined #archiveteam-ot
18:11 🔗 igloo259 has joined #archiveteam-ot
18:14 🔗 igloo25 has quit IRC (Ping timeout: 745 seconds)
18:32 🔗 DogsRNice has quit IRC (Ping timeout: 276 seconds)
18:42 🔗 MaximeleG has quit IRC (Quit: MaximeleG)
19:54 🔗 schbirid has joined #archiveteam-ot
20:09 🔗 schbirid any suggestions for a simple tool to visually track postgres table count(*)s over time?
20:09 🔗 schbirid with almost zero setup
20:28 🔗 Flashfire has quit IRC (Remote host closed the connection)
20:28 🔗 kiska has quit IRC (Remote host closed the connection)
20:29 🔗 kiska has joined #archiveteam-ot
20:29 🔗 Flashfire has joined #archiveteam-ot
20:30 🔗 svchfoo3 sets mode: +o kiska
20:30 🔗 svchfoo1 sets mode: +o kiska
20:43 🔗 dashcloud has quit IRC (Quit: http://quassel-irc.org - Chat comfortably. Anywhere.)
20:51 🔗 dashcloud has joined #archiveteam-ot
21:02 🔗 nataraj_ has joined #archiveteam-ot
21:40 🔗 schbirid has quit IRC (Quit: Leaving)
21:40 🔗 godane has joined #archiveteam-ot
22:42 🔗 Jens has quit IRC (Remote host closed the connection)
22:43 🔗 Jens has joined #archiveteam-ot
22:50 🔗 mal has quit IRC (Quit: mal)
22:59 🔗 thuban3 has joined #archiveteam-ot
23:02 🔗 thuban2 has quit IRC (Read error: Operation timed out)
23:02 🔗 mal has joined #archiveteam-ot
23:18 🔗 BlueMax has joined #archiveteam-ot
23:34 🔗 DigiDigi has quit IRC (Remote host closed the connection)
23:38 🔗 DigiDigi has joined #archiveteam-ot
23:45 🔗 icedice has quit IRC (Read error: Operation timed out)
