Time |
Nickname |
Message |
02:42
🔗
|
|
BlueMax has joined #archiveteam-bs |
03:18
🔗
|
|
BlueMax has quit IRC (Quit: Leaving) |
03:43
🔗
|
|
odemg has quit IRC (Ping timeout: 265 seconds) |
03:47
🔗
|
|
qw3rty118 has joined #archiveteam-bs |
03:52
🔗
|
|
qw3rty117 has quit IRC (Read error: Operation timed out) |
03:55
🔗
|
|
odemg has joined #archiveteam-bs |
04:00
🔗
|
godane |
so i found this : http://www.shasej.org/gakkaishi/archive/archive.asp?Yers=7 |
04:24
🔗
|
|
Sokar has quit IRC (Ping timeout: 615 seconds) |
04:26
🔗
|
|
BlueMax has joined #archiveteam-bs |
04:37
🔗
|
|
Sokar has joined #archiveteam-bs |
05:01
🔗
|
|
godane has quit IRC (Read error: Operation timed out) |
05:20
🔗
|
|
godane has joined #archiveteam-bs |
05:49
🔗
|
|
m007a83 has quit IRC (Quit: Fuck you Comcast) |
05:52
🔗
|
|
Mateon1 has quit IRC (Read error: Operation timed out) |
05:54
🔗
|
|
Mateon1 has joined #archiveteam-bs |
05:59
🔗
|
|
wyatt8740 has quit IRC (Read error: Operation timed out) |
06:08
🔗
|
godane |
SketchCow: something you may like : https://archive.org/details/Virus_Bulletin-1989-07 |
06:09
🔗
|
godane |
i couldn't find it on archive.org so i'm uploading |
06:51
🔗
|
|
eientei95 has quit IRC (Quit: ZNC 1.7.2+deb2 - https://znc.in) |
07:02
🔗
|
|
susudo has joined #archiveteam-bs |
07:19
🔗
|
|
susudo has quit IRC (Quit: Page closed) |
09:02
🔗
|
|
deevious has quit IRC (Read error: Connection reset by peer) |
09:02
🔗
|
|
deevious has joined #archiveteam-bs |
09:06
🔗
|
JAA |
Sanqui: What's your goal exactly? (from #archiveteam) |
09:21
🔗
|
|
martinlig has joined #archiveteam-bs |
09:28
🔗
|
Sanqui |
JAA: Take a website archived with ArchiveBot, and get a list of all *.wz.cz domains from it. For example. |
09:32
🔗
|
JAA |
Sanqui: As long as the job was run without --no-offsite-links, i.e. those domains were retrieved, I'd use either the meta WARC or IA's CDX. Both are fairly easy to parse with grep and/or awk. If the URLs were ignored, only the meta WARC will work. |
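A minimal sketch of the CDX route JAA describes, in Python rather than grep/awk. It assumes the common space-separated CDX layout in which the third field is the original URL (the `CDX N b a m s k r M S V g` header); the function name `hosts_under`, the `url_field` parameter, and the sample lines are hypothetical and only for illustration.

```python
from urllib.parse import urlsplit


def hosts_under(cdx_lines, suffix="wz.cz", url_field=2):
    """Return the sorted, de-duplicated hostnames at or below `suffix`.

    Assumes space-separated CDX lines whose field `url_field` holds the
    original URL; adjust the index if the CDX header says otherwise.
    """
    hosts = set()
    for line in cdx_lines:
        # Skip the format header line (" CDX N b a m s k r M S V g").
        if line.lstrip().startswith("CDX"):
            continue
        fields = line.split()
        if len(fields) <= url_field:
            continue
        try:
            host = urlsplit(fields[url_field]).hostname
        except ValueError:
            # Malformed URL (e.g. a broken IPv6 literal); ignore the record.
            continue
        # Match the suffix itself or any subdomain, but not "notwz.cz".
        if host and (host == suffix or host.endswith("." + suffix)):
            hosts.add(host)
    return sorted(hosts)


# Made-up sample records, not taken from any real item.
sample = [
    " CDX N b a m s k r M S V g",
    "cz,wz,foo)/ 20190621093200 http://foo.wz.cz/ text/html 200 AAAA - - 1234 0 x.warc.gz",
    "cz,wz,bar)/img.png 20190621093201 http://Bar.wz.cz/img.png image/png 200 BBBB - - 99 1234 x.warc.gz",
    "com,example)/ 20190621093202 http://example.com/?u=http://baz.wz.cz/ text/html 200 CCCC - - 5 1333 x.warc.gz",
    "cz,notwz)/ 20190621093203 http://notwz.cz/ text/html 200 DDDD - - 7 1338 x.warc.gz",
]
print(hosts_under(sample))  # ['bar.wz.cz', 'foo.wz.cz']
```

This only sees URLs that were actually retrieved; for URLs that were ignored, the meta WARC log would have to be parsed instead, as noted in the message above.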
09:33
🔗
|
Igloo |
What is the difference between the CDX and meta? |
09:33
🔗
|
JAA |
If the URLs were hard-ignored by --no-offsite-links, --no-parent, or some other wpull option, then the only way would be to parse the data WARC. Have fun with that... |
09:33
🔗
|
Igloo |
I have a problem with newsgrabber megawarcs, sometimes they don't get a CDX |
09:33
🔗
|
JAA |
Igloo: The CDX is an index of all response records in the WARC. The meta WARC contains the wpull log. |
09:34
🔗
|
JAA |
The CDX files are generated by the IA derive after upload. Each WARC in an item with mediatype:web gets one CDX, plus there's one item-wide index. |
09:35
🔗
|
Igloo |
ok, I wonder why some don't get that |
09:35
🔗
|
Igloo |
I need to re-write the dedupe anyway |
09:36
🔗
|
JAA |
Have an example? |
09:38
🔗
|
Igloo |
I do but not right now :-) |
09:38
🔗
|
Igloo |
Poolside baby |
09:40
🔗
|
JAA |
Ah, right, enjoy it! |
09:48
🔗
|
|
BlueMax has quit IRC (Read error: Connection reset by peer) |
10:54
🔗
|
|
eientei95 has joined #archiveteam-bs |
10:54
🔗
|
|
eientei95 has quit IRC (Handshake flooding) |
10:56
🔗
|
|
eientei95 has joined #archiveteam-bs |
10:56
🔗
|
|
eientei95 has quit IRC (Handshake flooding) |
10:58
🔗
|
|
eientei95 has joined #archiveteam-bs |
11:11
🔗
|
dxrt |
SketchCow: Those ground zero photos of yours on IA/a non-flickr download? |
11:13
🔗
|
Flashfire |
Another Ban Wave happened on Reddit |
11:14
🔗
|
dxrt |
Also someone uploaded https://archive.org/details/WTC-ISO - back story https://www.reddit.com/r/DataHoarder/comments/c2vi3b/2389_never_before_seen_photos_of_ground_zero_in/ern2es1/ |
11:21
🔗
|
|
wyatt8740 has joined #archiveteam-bs |
12:02
🔗
|
|
deevious has quit IRC (Quit: deevious) |
12:11
🔗
|
|
martinlig has quit IRC (Quit: Connection closed for inactivity) |
12:42
🔗
|
|
ColdIce has quit IRC (Quit: The Lounge - https://thelounge.chat) |
13:45
🔗
|
|
Tenebrae has quit IRC (Remote host closed the connection) |
13:47
🔗
|
|
Tenebrae has joined #archiveteam-bs |
14:25
🔗
|
|
DogsRNice has joined #archiveteam-bs |
15:11
🔗
|
|
zhongfu_ has joined #archiveteam-bs |
15:17
🔗
|
|
zhongfu has quit IRC (Ping timeout: 615 seconds) |
15:24
🔗
|
Fusl |
https://www.mendeley.com/campaign/about-climate-change someone here wanna go ahead and write a script for grabbing those? |
16:18
🔗
|
|
atbk has quit IRC (Quit: ZNC - https://znc.in) |
16:19
🔗
|
|
atbk has joined #archiveteam-bs |
16:27
🔗
|
|
odemgi_ has quit IRC (Remote host closed the connection) |
16:50
🔗
|
|
Dj-Wawa has quit IRC (Quit: Connection closed for inactivity) |
17:15
🔗
|
|
zhongfu_ has quit IRC (Read error: Connection reset by peer) |
17:18
🔗
|
|
zhongfu has joined #archiveteam-bs |
17:37
🔗
|
|
Mateon1 has quit IRC (Remote host closed the connection) |
17:37
🔗
|
|
Mateon1 has joined #archiveteam-bs |
18:00
🔗
|
|
VADemon has joined #archiveteam-bs |
18:20
🔗
|
|
Mateon1 has quit IRC (Read error: Operation timed out) |
18:20
🔗
|
|
Mateon1 has joined #archiveteam-bs |
19:02
🔗
|
SketchCow |
Anyone want to take a shot at grabbing the video file? |
19:02
🔗
|
SketchCow |
https://commerce.veritone.com/search/asset/18501328 |
19:02
🔗
|
Fusl |
commerce.veritone.com’s server IP address could not be found. |
19:03
🔗
|
Kaz |
https://cdnt3mt-a.akamaihd.net/2B1/0DA/FF2/2B10DAFF2_001_xp-wmte.f4v?__gda__=1561158164_f675578419cb54dbe4a01ccaf9ec14de |
19:04
🔗
|
Kaz |
Fusl: https://www.irccloud.com/pastebin/lV0VJguR/ |
19:05
🔗
|
Fusl |
Kaz: http://xor.meo.ws/cZqg5z86rfZWsxnGvuWhvOj-w47n5nJY.txt |
19:06
🔗
|
Kaz |
either CF is wrong, or everyone else is |
19:06
🔗
|
Kaz |
hell I actually don't remember who i'm forwarding to atm |
19:09
🔗
|
SketchCow |
Kaz: Good one, got it |
19:10
🔗
|
Fusl |
Kaz: veritone.com. requires DNSSEC signing but commerce.veritone.com. points with a CNAME to commerce.pd.dmh.wzplatform.com. which is not DNSSEC signed, cloudflare is correct and everyone else is wrong here |
19:11
🔗
|
Fusl |
https://dnssec-analyzer.verisignlabs.com/commerce.veritone.com |
19:15
🔗
|
Kaz |
Fusl: guess I'm forwarding to google then |
19:38
🔗
|
|
CoolCanuk has joined #archiveteam-bs |
20:27
🔗
|
|
DogsRNice has quit IRC (Read error: Connection reset by peer) |
20:31
🔗
|
|
killsushi has joined #archiveteam-bs |
20:34
🔗
|
|
qw3rty119 has joined #archiveteam-bs |
20:39
🔗
|
|
qw3rty118 has quit IRC (Ping timeout: 600 seconds) |
20:50
🔗
|
|
qw3rty119 has quit IRC (Nettalk6 - www.ntalk.de) |
20:59
🔗
|
|
fredgido has quit IRC (Read error: Connection reset by peer) |
20:59
🔗
|
|
fredgido has joined #archiveteam-bs |
21:00
🔗
|
h3ndr1k |
Fusl: I am curious what is up with your /bd/0x5000c... paths. Which filesystem generates such a structure? |
21:06
🔗
|
Kaz |
mergerfs iirc |
21:31
🔗
|
CoolCanuk |
would be nice if archivebot automatically knew "oh, it's a twitter/fb url, let me assign an ignoreset" |
21:32
🔗
|
Igloo |
Sometimes we don't need the ignoreset though |
21:32
🔗
|
Igloo |
s/need/want |
21:33
🔗
|
CoolCanuk |
oh ok |
22:06
🔗
|
godane |
SketchCow : 90s rave & jungle cassette tapes : http://artmeetsscience.co.uk/tapes/ |
22:13
🔗
|
|
Atom__ has joined #archiveteam-bs |
22:19
🔗
|
|
Atom-- has quit IRC (Read error: Operation timed out) |
22:26
🔗
|
|
fredgido has quit IRC (Remote host closed the connection) |
22:26
🔗
|
|
Fionera has quit IRC (Read error: Connection reset by peer) |
22:26
🔗
|
Fusl_ |
h3ndr1k: /bd/ means backing device. it's just an ext4 mounted with the wwn-id of each disk and partition from /dev/disk/by-id/ and mergerfs uses that to stripe it into /data |
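A rough sketch of the layout described in the message above, assuming each `wwn-*` entry under /dev/disk/by-id/ is mounted at `/bd/<id without the wwn- prefix>` and that the resulting directories are handed to mergerfs as a colon-separated branch list. The function name `plan_backing_mounts`, the prefix-stripping rule, and the WWN values are hypothetical; the actual setup may differ.

```python
def plan_backing_mounts(by_id_names, root="/bd"):
    """Map /dev/disk/by-id wwn entries to mount points and a branch list.

    Returns (mounts, branches): `mounts` maps device path -> mount point,
    `branches` is the colon-separated string mergerfs accepts as its
    source argument. The naming scheme is an assumption, not a known fact.
    """
    mounts = {}
    for name in sorted(by_id_names):
        # Only WWN-based names are considered; ata-*, scsi-* etc. are skipped.
        if not name.startswith("wwn-"):
            continue
        mounts["/dev/disk/by-id/" + name] = root + "/" + name[len("wwn-"):]
    branches = ":".join(mounts.values())
    return mounts, branches


# Made-up identifiers for illustration only.
names = [
    "wwn-0x5000c500bbbbbbbb-part1",
    "ata-EXAMPLE_DISK",
    "wwn-0x5000c500aaaaaaaa-part1",
]
mounts, branches = plan_backing_mounts(names)
for device, mountpoint in mounts.items():
    print(device, "->", mountpoint)
print(branches)
```

With a branch string like this, the pool would be assembled by something along the lines of `mergerfs <branches> /data`; mergerfs is a union filesystem, so whole files land on individual branches.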
22:26
🔗
|
|
yano has quit IRC (Read error: Connection reset by peer) |
22:27
🔗
|
|
Fionera has joined #archiveteam-bs |
22:28
🔗
|
|
fredgido has joined #archiveteam-bs |
22:28
🔗
|
|
yano has joined #archiveteam-bs |
22:28
🔗
|
|
TigerbotH has quit IRC (Read error: Connection reset by peer) |
22:29
🔗
|
|
PotcFdk has quit IRC (Ping timeout: 600 seconds) |
22:30
🔗
|
|
chungone_ has joined #archiveteam-bs |
22:31
🔗
|
|
jspiros_ has quit IRC (Read error: Operation timed out) |
22:31
🔗
|
|
paul2520 has quit IRC (Write error: Broken pipe) |
22:31
🔗
|
|
nightpoo- has quit IRC (Write error: Broken pipe) |
22:31
🔗
|
|
ndiddy has quit IRC (Write error: Broken pipe) |
22:31
🔗
|
|
nightpool has joined #archiveteam-bs |
22:31
🔗
|
|
sep332 has quit IRC (Read error: Operation timed out) |
22:39
🔗
|
|
logchfoo3 starts logging #archiveteam-bs at Fri Jun 21 22:39:21 2019 |
22:39
🔗
|
|
logchfoo3 has joined #archiveteam-bs |
22:39
🔗
|
|
abstract has joined #archiveteam-bs |
22:39
🔗
|
|
jodizzle has quit IRC (Ping timeout: 246 seconds) |
22:39
🔗
|
|
Coderjo_ has quit IRC (Ping timeout: 600 seconds) |
22:39
🔗
|
|
nothere has quit IRC (Ping timeout: 600 seconds) |
22:39
🔗
|
|
betamax_ has joined #archiveteam-bs |
22:39
🔗
|
|
anarcat has quit IRC (Ping timeout: 600 seconds) |
22:39
🔗
|
|
ivan has quit IRC (Read error: Operation timed out) |
22:41
🔗
|
|
squires has quit IRC (Ping timeout: 600 seconds) |
22:41
🔗
|
|
asdf0101 has quit IRC (Read error: Operation timed out) |
22:41
🔗
|
|
fredgido has quit IRC (Read error: Operation timed out) |
22:41
🔗
|
|
step has quit IRC (Ping timeout: 600 seconds) |
22:43
🔗
|
|
betamax has quit IRC (Read error: Operation timed out) |
22:44
🔗
|
|
arkiver has joined #archiveteam-bs |
22:45
🔗
|
|
mistym has joined #archiveteam-bs |
22:45
🔗
|
|
dxrt_ has quit IRC (Ping timeout: 600 seconds) |
22:47
🔗
|
|
Fusl has joined #archiveteam-bs |
22:47
🔗
|
|
Fusl_ sets mode: +o Fusl |
22:48
🔗
|
|
LordNigh2 has joined #archiveteam-bs |
22:48
🔗
|
|
joepie91 has joined #archiveteam-bs |
22:50
🔗
|
|
Lord_Nigh has quit IRC (Read error: Operation timed out) |
22:50
🔗
|
|
LordNigh2 is now known as Lord_Nigh |
22:50
🔗
|
|
anarcat has joined #archiveteam-bs |
22:51
🔗
|
|
jodizzle has joined #archiveteam-bs |
22:54
🔗
|
|
Zebranky has joined #archiveteam-bs |
22:55
🔗
|
h3ndr1k |
Fusl: Thanks, interesting. I would have guessed some cluster file system, because it looks quite complicated. But then I'm working with ceph and it never looks that complicated, it's just a single mount point for one cephfs. But maybe some crazy file system :) |
22:56
🔗
|
h3ndr1k |
Never worked with mergerfs though |
23:01
🔗
|
|
kode54 has quit IRC (Quit: The Lounge - https://thelounge.chat) |
23:03
🔗
|
|
kiska1 has joined #archiveteam-bs |
23:03
🔗
|
|
kode54 has joined #archiveteam-bs |
23:03
🔗
|
|
svchfoo3 sets mode: +o kiska1 |
23:04
🔗
|
|
paul2520 has joined #archiveteam-bs |
23:06
🔗
|
|
mr_archiv has joined #archiveteam-bs |
23:08
🔗
|
|
GDorn__ has joined #archiveteam-bs |
23:16
🔗
|
|
BlueMax has joined #archiveteam-bs |
23:36
🔗
|
|
c4rc4s has joined #archiveteam-bs |
23:36
🔗
|
|
svchfoo1 has joined #archiveteam-bs |
23:36
🔗
|
|
Fusl sets mode: +o svchfoo1 |
23:37
🔗
|
|
ivan has joined #archiveteam-bs |
23:37
🔗
|
|
simon816 has joined #archiveteam-bs |
23:37
🔗
|
|
Fusl sets mode: +o ivan |
23:37
🔗
|
|
asdf0101 has joined #archiveteam-bs |
23:37
🔗
|
|
markedL has joined #archiveteam-bs |
23:37
🔗
|
|
cfarquhar has joined #archiveteam-bs |
23:37
🔗
|
|
JAA has joined #archiveteam-bs |
23:37
🔗
|
|
Fusl sets mode: +o JAA |
23:37
🔗
|
|
AlsoJAA sets mode: +o JAA |
23:58
🔗
|
|
CoolCanuk has quit IRC (Quit: Connection closed for inactivity) |