Time |
Nickname |
Message |
00:02
|
|
simon816 has quit IRC (ZNC 1.6.5 - http://znc.in) |
00:04
|
|
simon8162 has quit IRC (ZNC 1.7.5 - https://znc.in) |
00:05
|
|
simon816 has joined #archiveteam-ot |
00:13
|
|
simon816 has quit IRC (ZNC 1.7.5 - https://znc.in) |
00:13
|
|
simon816 has joined #archiveteam-ot |
00:27
|
|
simon816 has quit IRC (ZNC 1.7.5 - https://znc.in) |
00:28
|
|
simon816 has joined #archiveteam-ot |
03:10
|
|
DogsRNice has quit IRC (Read error: Connection reset by peer) |
03:12
|
|
DigiDigi has quit IRC (Read error: Operation timed out) |
03:12
|
|
Larsenv has quit IRC (Read error: Operation timed out) |
03:13
|
|
jrwr has quit IRC (Ping timeout: 264 seconds) |
03:16
|
|
Larsenv has joined #archiveteam-ot |
03:18
|
|
nyany has quit IRC (Read error: Operation timed out) |
03:19
|
|
MrRadar has quit IRC (Remote host closed the connection) |
03:21
|
|
voltagex has quit IRC (Ping timeout: 360 seconds) |
03:23
|
|
Igloo has quit IRC (Read error: Operation timed out) |
03:23
|
|
nyany has joined #archiveteam-ot |
03:23
|
|
swebb_ has joined #archiveteam-ot |
03:24
|
|
Igloo has joined #archiveteam-ot |
03:24
|
|
voltagex has joined #archiveteam-ot |
03:24
|
|
DigiDigi has joined #archiveteam-ot |
03:24
|
|
swebb has quit IRC (Ping timeout: 360 seconds) |
03:24
|
|
swebb_ is now known as swebb |
03:26
|
|
MrRadar has joined #archiveteam-ot |
03:27
|
|
jrwr has joined #archiveteam-ot |
03:30
|
|
OrIdow6 has quit IRC (Read error: Connection reset by peer) |
03:34
|
|
OrIdow6 has joined #archiveteam-ot |
03:35
|
|
ShellyRol has quit IRC (Read error: Connection reset by peer) |
03:36
|
|
ShellyRol has joined #archiveteam-ot |
03:40
|
|
Fusl_ has joined #archiveteam-ot |
04:02
|
|
Fusl_ has quit IRC (Ping timeout: 864 seconds) |
04:39
|
|
Stiletto has quit IRC (Ping timeout: 276 seconds) |
04:41
|
|
Stiletto has joined #archiveteam-ot |
04:45
|
|
qw3rty_ has joined #archiveteam-ot |
04:50
|
|
qw3rty__ has quit IRC (Ping timeout: 276 seconds) |
05:24
|
|
thuban2 has joined #archiveteam-ot |
05:27
|
|
thuban1 has quit IRC (Read error: Operation timed out) |
06:43
|
|
nataraj_ has joined #archiveteam-ot |
06:46
|
|
wp494 has quit IRC (LOUD UNNECESSARY QUIT MESSAGES) |
07:00
|
|
wp494 has joined #archiveteam-ot |
08:51
|
|
revi has quit IRC (Quit: Updating details, brb) |
08:51
|
|
revi has joined #archiveteam-ot |
09:01
|
|
Craigle has quit IRC (Quit: The Lounge - https://thelounge.chat) |
10:56
|
|
BlueMax has quit IRC (Read error: Connection reset by peer) |
11:03
|
apache2 |
is there a way to get the list of captured domains for a given TLD in IA? |
11:07
|
OrIdow6 |
If it's an extremely small TLD, you can use the CDX server, e.g. curl "https://web.archive.org/cdx/search?url=*.com" |
11:09
|
OrIdow6 |
Will only get you 1.5 million captures (that's captures, not URLs or domains), which remain static regardless of pagination, filtering, etc. parameters, last time I checked, so, again, probably not too useful unless it's something extremely obscure |
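A minimal Python sketch of the CDX-server approach, using only the standard library. It assumes the endpoint path given in the Wayback CDX server documentation (`/cdx/search/cdx`) and its `fl`, `collapse` and `limit` parameters; the helper names are hypothetical. Because of the capture cap mentioned above, the domain list it produces is only a sample for anything but a tiny TLD. As I understand the CDX documentation, `*` is only special as a leading `*.` (domain match) or a trailing `*` (prefix match), which would explain why a pattern like `*a.com` returns nothing.

```python
from urllib.parse import urlencode, urlsplit
from urllib.request import urlopen

# Endpoint path as given in the CDX server docs (assumption; the chat uses /cdx/search).
CDX_ENDPOINT = "https://web.archive.org/cdx/search/cdx"


def build_cdx_query(tld, limit=None):
    """Build a CDX query URL for captures under *.<tld>."""
    params = {
        "url": f"*.{tld}",     # leading "*." asks for a whole-domain match
        "fl": "original",      # return only the original-URL column
        "collapse": "urlkey",  # fold adjacent captures of the same URL
    }
    if limit is not None:
        params["limit"] = str(limit)
    return f"{CDX_ENDPOINT}?{urlencode(params)}"


def domains_from_cdx_lines(lines):
    """Reduce lines of original URLs to a sorted list of unique hostnames."""
    hosts = set()
    for line in lines:
        url = line.strip()
        if not url:
            continue
        if "://" not in url:
            url = "http://" + url
        try:
            host = urlsplit(url).hostname  # lowercased, port removed
        except ValueError:
            continue  # skip malformed URLs instead of aborting
        if host:
            hosts.add(host)
    return sorted(hosts)


def fetch_domains(tld, limit=None):
    """Hypothetical helper: run the query and return unique hostnames."""
    with urlopen(build_cdx_query(tld, limit)) as resp:
        text = resp.read().decode("utf-8", errors="replace")
    return domains_from_cdx_lines(text.splitlines())
```

For example, `fetch_domains("va", limit=10000)` would ask for captures under the small `.va` TLD; what comes back depends entirely on the server at the time.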
11:14
|
apache2 |
that's a start, thanks |
11:17
|
OrIdow6 |
(Interestingly enough, it's apparently possible to query the CDX server with a query consisting of nothing but "*", which gets the expected bad URLs - the lexically first URL in the WBM is apparently "http://%E5", in case anyone wanted to know) |
11:17
|
apache2 |
hehe |
11:17
|
apache2 |
I tried ie *a.com, no cigar |
11:23
|
apache2 |
the CDX files are not available for download? |
11:25
|
OrIdow6 |
Not for the whole WBM |
11:26
|
apache2 |
so one potential route to go down might be to scrape IA collections for .cdx files, then scrape those for URLs? |
11:27
|
OrIdow6 |
You could - most IA items in the WBM (as I recall, and have seen) have their CDX files hidden, but a minority - including most (I think) of what AT puts in - has public CDX files |
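For the route of collecting public CDX files from IA items, a rough Python sketch could look like this. It assumes the public item metadata API (`https://archive.org/metadata/<identifier>`, whose JSON carries a `files` list), the `https://archive.org/download/<identifier>/<file>` download path, and that each CDX file starts with a ` CDX ...` legend line in which the field letter `a` marks the original URL. The function names are hypothetical, and items whose CDX files are hidden will simply yield no URLs.

```python
import gzip
import json
from urllib.parse import quote, urlsplit
from urllib.request import urlopen


def cdx_file_urls(identifier):
    """Return download URLs for any public .cdx / .cdx.gz files in one item."""
    item = quote(identifier)
    with urlopen(f"https://archive.org/metadata/{item}") as resp:
        meta = json.load(resp)
    return [
        f"https://archive.org/download/{item}/{quote(f['name'])}"
        for f in meta.get("files", [])
        if f.get("name", "").endswith((".cdx", ".cdx.gz"))
    ]


def hosts_from_cdx(lines, tld=None):
    """Collect hostnames from CDX lines, optionally keeping only one TLD."""
    url_col = 2  # fallback if there is no legend: "N b a ..." puts the URL third
    hosts = set()
    for line in lines:
        if line.startswith(" CDX"):
            legend = line.split()[1:]  # e.g. ["N", "b", "a", "m", ...]
            if "a" in legend:
                url_col = legend.index("a")
            continue
        fields = line.split()
        if len(fields) <= url_col:
            continue
        url = fields[url_col]
        if "://" not in url:
            url = "http://" + url
        try:
            host = urlsplit(url).hostname
        except ValueError:
            continue
        if host and (tld is None or host.endswith("." + tld)):
            hosts.add(host)
    return hosts


def hosts_from_cdx_url(url, tld=None):
    """Download one CDX file (gzipped or plain) and extract its hostnames."""
    with urlopen(url) as resp:
        data = resp.read()
    if url.endswith(".gz"):
        data = gzip.decompress(data)
    return hosts_from_cdx(data.decode("utf-8", errors="replace").splitlines(), tld)
```

Reading the legend instead of hard-coding a column keeps the parser working if a CDX file uses a different field order.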
11:28
|
apache2 |
that's consistent with what I observed browsing random collections just now |
11:29
|
apache2 |
OrIdow6: since you seem to be in the know about IA: I've got ~400 GB of WARC archives, is there a good place to contribute them so they eventually end up in IA? |
11:29
|
apache2 |
(or at least somewhere where other people can use them) |
11:37
|
OrIdow6 |
My knowledge of this is very limited, and other people will have more direct experience with it (asking in #archiveteam-bs might be a good idea, if you have more questions on this), but I think that unless they're warcs of e.g. nothing but a site that was under your control anyway, they won't be put into the WBM if you're some random untrusted person, lest the IA risk falsification of history |
11:39
|
OrIdow6 |
That being said, you CAN upload them as regular IA items with mediatype "web", in which case they'll be put (automatically, I think) into the "Outsider WARCs" collection at https://archive.org/details/warczone - I don't think these do anything but sit around at the moment, but they'll be around (hopefully) for people in the future |
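For the upload route described above, one option is the third-party `internetarchive` Python package (`pip install internetarchive`, then `ia configure` to store IA credentials). The sketch below, with hypothetical helper names and a placeholder identifier, picks the WARC files out of a directory listing and uploads them as one item with `mediatype` set to `web`. It deliberately sets no collection, on the assumption from the chat that such items are filed into the Outsider WARCs collection automatically.

```python
import os


def select_warcs(names):
    """Keep only WARC files (plain or gzipped), sorted for a stable upload order."""
    return sorted(n for n in names if n.endswith((".warc", ".warc.gz")))


def upload_warcs(identifier, warc_dir, title=None):
    """Upload every WARC in warc_dir as a single IA item with mediatype "web".

    Needs the third-party `internetarchive` package and credentials from
    `ia configure`; it is imported lazily so select_warcs works without it.
    """
    from internetarchive import upload

    files = [os.path.join(warc_dir, n) for n in select_warcs(os.listdir(warc_dir))]
    if not files:
        raise ValueError(f"no .warc or .warc.gz files in {warc_dir!r}")
    metadata = {"mediatype": "web"}
    if title:
        metadata["title"] = title
    # Returns one HTTP response per uploaded file.
    return upload(identifier, files=files, metadata=metadata)
```

A call would look like `upload_warcs("my-site-grab-2020", "/data/warcs", title="My site grab")`, where the identifier and path are placeholders to replace with your own.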
11:45
|
|
davis1 has quit IRC (Ping timeout: 258 seconds) |
11:47
|
|
davis1 has joined #archiveteam-ot |
11:49
|
OrIdow6 |
apache2: https://archive.fart.website/bin/irclogger_log/archiveteam-bs?date=2019-11-13,Wed&sel=299#l295 https://archive.fart.website/bin/irclogger_log/archiveteam-bs?date=2019-01-29,Tue&sel=142#l138 |
11:50
|
apache2 |
thank you |
11:53
|
OrIdow6 |
You're welcome |
12:34
|
|
VADemon_ has joined #archiveteam-ot |
12:34
|
|
VADemon has quit IRC (Read error: Connection reset by peer) |
13:09
|
|
MaximeleG has joined #archiveteam-ot |
13:31
|
|
Craigle has joined #archiveteam-ot |
13:32
|
|
Mateon1 has quit IRC (Remote host closed the connection) |
13:33
|
|
Mateon1 has joined #archiveteam-ot |
15:14
|
|
VADemon_ has quit IRC (left4dead) |
16:11
|
|
nataraj_ has quit IRC (Read error: Operation timed out) |
16:18
|
|
Panasonic has joined #archiveteam-ot |
16:19
|
|
Ravenloft has quit IRC (Read error: Connection reset by peer) |
16:21
|
|
DogsRNice has joined #archiveteam-ot |
18:11
|
|
igloo259 has joined #archiveteam-ot |
18:14
|
|
igloo25 has quit IRC (Ping timeout: 745 seconds) |
18:32
|
|
DogsRNice has quit IRC (Ping timeout: 276 seconds) |
18:42
|
|
MaximeleG has quit IRC (Quit: MaximeleG) |
19:54
|
|
schbirid has joined #archiveteam-ot |
20:09
|
schbirid |
any suggestions for a simple tool to visually track postgres table count(*)s over time? |
20:09
|
schbirid |
with almost zero setup |
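Nobody answered this in the channel. As one possible low-setup approach, here is a Python sketch that shells out to `psql` (`-A` unaligned output, `-t` tuples only, `-d` database or connection string, `-c` query), appends each count to a CSV file, and prints a text sparkline per table. The CSV layout, the `counts.csv` file name, and all function names are hypothetical; `record_counts` would be run from cron or a loop and `show_history` whenever you want to look.

```python
import csv
import subprocess
from collections import defaultdict
from datetime import datetime, timezone

BARS = "▁▂▃▄▅▆▇█"


def count_rows(dsn, table):
    """Run SELECT count(*) through psql; the table name must be trusted input."""
    result = subprocess.run(
        ["psql", "-At", "-d", dsn, "-c", f"SELECT count(*) FROM {table}"],
        check=True, capture_output=True, text=True,
    )
    return int(result.stdout.strip())


def record_counts(dsn, tables, log_path="counts.csv"):
    """Append one 'timestamp,table,count' row per table to the CSV log."""
    now = datetime.now(timezone.utc).isoformat(timespec="seconds")
    with open(log_path, "a", newline="") as f:
        writer = csv.writer(f)
        for table in tables:
            writer.writerow([now, table, count_rows(dsn, table)])


def sparkline(values):
    """Map a series of numbers onto eight block characters, low to high."""
    if not values:
        return ""
    lo, hi = min(values), max(values)
    if hi == lo:
        return BARS[0] * len(values)
    scale = (len(BARS) - 1) / (hi - lo)
    return "".join(BARS[round((v - lo) * scale)] for v in values)


def show_history(log_path="counts.csv"):
    """Print each table's latest count next to a sparkline of its history."""
    series = defaultdict(list)
    with open(log_path, newline="") as f:
        for _timestamp, table, count in csv.reader(f):
            series[table].append(int(count))
    for table, counts in sorted(series.items()):
        print(f"{table:30} {counts[-1]:>12,} {sparkline(counts)}")
```

Keeping the history in a plain CSV means it can later be opened in a spreadsheet or plotting tool if a real chart is wanted.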
20:28
|
|
Flashfire has quit IRC (Remote host closed the connection) |
20:28
|
|
kiska has quit IRC (Remote host closed the connection) |
20:29
|
|
kiska has joined #archiveteam-ot |
20:29
|
|
Flashfire has joined #archiveteam-ot |
20:30
|
|
svchfoo3 sets mode: +o kiska |
20:30
|
|
svchfoo1 sets mode: +o kiska |
20:43
|
|
dashcloud has quit IRC (Quit: http://quassel-irc.org - Chat comfortably. Anywhere.) |
20:51
|
|
dashcloud has joined #archiveteam-ot |
21:02
|
|
nataraj_ has joined #archiveteam-ot |
21:40
|
|
schbirid has quit IRC (Quit: Leaving) |
21:40
|
|
godane has joined #archiveteam-ot |
22:42
|
|
Jens has quit IRC (Remote host closed the connection) |
22:43
|
|
Jens has joined #archiveteam-ot |
22:50
|
|
mal has quit IRC (Quit: mal) |
22:59
|
|
thuban3 has joined #archiveteam-ot |
23:02
|
|
thuban2 has quit IRC (Read error: Operation timed out) |
23:02
|
|
mal has joined #archiveteam-ot |
23:18
|
|
BlueMax has joined #archiveteam-ot |
23:34
|
|
DigiDigi has quit IRC (Remote host closed the connection) |
23:38
|
|
DigiDigi has joined #archiveteam-ot |
23:45
|
|
icedice has quit IRC (Read error: Operation timed out) |