#archiveteam-bs 2019-06-21,Fri


Time Nickname Message
02:42 🔗 BlueMax has joined #archiveteam-bs
03:18 🔗 BlueMax has quit IRC (Quit: Leaving)
03:43 🔗 odemg has quit IRC (Ping timeout: 265 seconds)
03:47 🔗 qw3rty118 has joined #archiveteam-bs
03:52 🔗 qw3rty117 has quit IRC (Read error: Operation timed out)
03:55 🔗 odemg has joined #archiveteam-bs
04:00 🔗 godane so i found this : http://www.shasej.org/gakkaishi/archive/archive.asp?Yers=7
04:24 🔗 Sokar has quit IRC (Ping timeout: 615 seconds)
04:26 🔗 BlueMax has joined #archiveteam-bs
04:37 🔗 Sokar has joined #archiveteam-bs
05:01 🔗 godane has quit IRC (Read error: Operation timed out)
05:20 🔗 godane has joined #archiveteam-bs
05:49 🔗 m007a83 has quit IRC (Quit: Fuck you Comcast)
05:52 🔗 Mateon1 has quit IRC (Read error: Operation timed out)
05:54 🔗 Mateon1 has joined #archiveteam-bs
05:59 🔗 wyatt8740 has quit IRC (Read error: Operation timed out)
06:08 🔗 godane SketchCow: something you may like : https://archive.org/details/Virus_Bulletin-1989-07
06:09 🔗 godane i couldn't find it on archive.org so i'm uploading
06:51 🔗 eientei95 has quit IRC (Quit: ZNC 1.7.2+deb2 - https://znc.in)
07:02 🔗 susudo has joined #archiveteam-bs
07:19 🔗 susudo has quit IRC (Quit: Page closed)
09:02 🔗 deevious has quit IRC (Read error: Connection reset by peer)
09:02 🔗 deevious has joined #archiveteam-bs
09:06 🔗 JAA Sanqui: What's your goal exactly? (from #archiveteam)
09:21 🔗 martinlig has joined #archiveteam-bs
09:28 🔗 Sanqui JAA: Take a website archived with ArchiveBot, and get a list of all *.wz.cz domains from it. For example.
09:32 🔗 JAA Sanqui: As long as the job was run without --no-offsite-links, i.e. those domains were retrieved, I'd use either the meta WARC or IA's CDX. Both are fairly easy to parse with grep and/or awk. If the URLs were ignored, only the meta WARC will work.
09:33 🔗 Igloo What is the difference between the CDX and meta?
09:33 🔗 JAA If the URLs were hard-ignored by --no-offsite-links, --no-parent, or some other wpull option, then the only way would be to parse the data WARC. Have fun with that...
09:33 🔗 Igloo I have a problem with newsgrabber megawarcs, sometimes they don't get a CDX
09:33 🔗 JAA Igloo: The CDX is an index of all response records in the WARC. The meta WARC contains the wpull log.
09:34 🔗 JAA The CDX files are generated by the IA derive after upload. Each WARC in an item with mediatype:web gets one CDX, plus there's one item-wide index.
09:35 🔗 Igloo ok, I wonder why some don't get that
09:35 🔗 Igloo I need to re-write the dedupe anyway
09:36 🔗 JAA Have an example?
09:38 🔗 Igloo I do but not right now :-)
09:38 🔗 Igloo Poolside baby
09:40 🔗 JAA Ah, right, enjoy it!
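[Editor's note] JAA's suggestion above (pull the offsite domains out of IA's CDX) could be sketched roughly as follows in Python instead of grep/awk. This is only an illustration under stated assumptions: it assumes the space-separated CDX layout with a header line such as " CDX N b a m s k r M S V g", where the "a" column holds the original URL, and it falls back to the third field when no header is present. The function name `hosts_from_cdx` and the sample records are hypothetical, not something from the log.

```python
from urllib.parse import urlsplit


def hosts_from_cdx(lines, suffix=".wz.cz"):
    """Collect hostnames ending in `suffix` from space-separated CDX lines."""
    url_col = 2  # assumption: original URL is the third field ("a" column)
    hosts = set()
    for line in lines:
        fields = line.strip().split(" ")
        if not fields or fields == [""]:
            continue  # skip blank lines
        if fields[0] == "CDX":
            # Header line: find the "a" column (offset by the leading "CDX" token).
            if "a" in fields:
                url_col = fields.index("a") - 1
            continue
        if len(fields) <= url_col:
            continue  # malformed or truncated record
        try:
            host = urlsplit(fields[url_col]).hostname
        except ValueError:
            continue  # unparseable URL
        # hostname is already lowercased by urlsplit
        if host and (host.endswith(suffix) or host == suffix.lstrip(".")):
            hosts.add(host)
    return sorted(hosts)


# Hypothetical sample records, made up for illustration only.
sample = [
    " CDX N b a m s k r M S V g",
    "cz,wz,foo)/ 20190621000000 http://foo.wz.cz/ text/html 200 AAAA - - 123 456 x.warc.gz",
    "com,example)/ 20190621000000 http://example.com/ text/html 200 BBBB - - 123 789 x.warc.gz",
    "cz,wz,bar)/a 20190621000000 http://Bar.wz.cz/a text/html 200 CCCC - - 123 999 x.warc.gz",
]
print(hosts_from_cdx(sample))
```

As JAA notes, this only sees URLs that were actually retrieved; URLs that were ignored would have to come from the meta WARC's wpull log instead.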
09:48 🔗 BlueMax has quit IRC (Read error: Connection reset by peer)
10:54 🔗 eientei95 has joined #archiveteam-bs
10:54 🔗 eientei95 has quit IRC (Handshake flooding)
10:56 🔗 eientei95 has joined #archiveteam-bs
10:56 🔗 eientei95 has quit IRC (Handshake flooding)
10:58 🔗 eientei95 has joined #archiveteam-bs
11:11 🔗 dxrt SketchCow: Those ground zero photos of yours on IA/a non-flickr download?
11:13 🔗 Flashfire Another Ban Wave happened on Reddit
11:14 🔗 dxrt Also someone uploaded https://archive.org/details/WTC-ISO - back story https://www.reddit.com/r/DataHoarder/comments/c2vi3b/2389_never_before_seen_photos_of_ground_zero_in/ern2es1/
11:21 🔗 wyatt8740 has joined #archiveteam-bs
12:02 🔗 deevious has quit IRC (Quit: deevious)
12:11 🔗 martinlig has quit IRC (Quit: Connection closed for inactivity)
12:42 🔗 ColdIce has quit IRC (Quit: The Lounge - https://thelounge.chat)
13:45 🔗 Tenebrae has quit IRC (Remote host closed the connection)
13:47 🔗 Tenebrae has joined #archiveteam-bs
14:25 🔗 DogsRNice has joined #archiveteam-bs
15:11 🔗 zhongfu_ has joined #archiveteam-bs
15:17 🔗 zhongfu has quit IRC (Ping timeout: 615 seconds)
15:24 🔗 Fusl https://www.mendeley.com/campaign/about-climate-change someone here wanna go ahead and write a script for grabbing those?
16:18 🔗 atbk has quit IRC (Quit: ZNC - https://znc.in)
16:19 🔗 atbk has joined #archiveteam-bs
16:27 🔗 odemgi_ has quit IRC (Remote host closed the connection)
16:50 🔗 Dj-Wawa has quit IRC (Quit: Connection closed for inactivity)
17:15 🔗 zhongfu_ has quit IRC (Read error: Connection reset by peer)
17:18 🔗 zhongfu has joined #archiveteam-bs
17:37 🔗 Mateon1 has quit IRC (Remote host closed the connection)
17:37 🔗 Mateon1 has joined #archiveteam-bs
18:00 🔗 VADemon has joined #archiveteam-bs
18:20 🔗 Mateon1 has quit IRC (Read error: Operation timed out)
18:20 🔗 Mateon1 has joined #archiveteam-bs
19:02 🔗 SketchCow Anyone want to take a shot at grabbing the video file?
19:02 🔗 SketchCow https://commerce.veritone.com/search/asset/18501328
19:02 🔗 Fusl commerce.veritone.com’s server IP address could not be found.
19:03 🔗 Kaz https://cdnt3mt-a.akamaihd.net/2B1/0DA/FF2/2B10DAFF2_001_xp-wmte.f4v?__gda__=1561158164_f675578419cb54dbe4a01ccaf9ec14de
19:04 🔗 Kaz Fusl: https://www.irccloud.com/pastebin/lV0VJguR/
19:05 🔗 Fusl Kaz: http://xor.meo.ws/cZqg5z86rfZWsxnGvuWhvOj-w47n5nJY.txt
19:06 🔗 Kaz either CF is wrong, or everyone else is
19:06 🔗 Kaz hell I actually don't remember who i'm forwarding to atm
19:09 🔗 SketchCow Kaz: Good one, got it
19:10 🔗 Fusl Kaz: veritone.com. requires DNSSEC signing but commerce.veritone.com. points with a CNAME to commerce.pd.dmh.wzplatform.com. which is not DNSSEC signed, cloudflare is correct and everyone else is wrong here
19:11 🔗 Fusl https://dnssec-analyzer.verisignlabs.com/commerce.veritone.com
19:15 🔗 Kaz Fusl: guess I'm forwarding to google then
19:38 🔗 CoolCanuk has joined #archiveteam-bs
20:27 🔗 DogsRNice has quit IRC (Read error: Connection reset by peer)
20:31 🔗 killsushi has joined #archiveteam-bs
20:34 🔗 qw3rty119 has joined #archiveteam-bs
20:39 🔗 qw3rty118 has quit IRC (Ping timeout: 600 seconds)
20:50 🔗 qw3rty119 has quit IRC (Nettalk6 - www.ntalk.de)
20:59 🔗 fredgido has quit IRC (Read error: Connection reset by peer)
20:59 🔗 fredgido has joined #archiveteam-bs
21:00 🔗 h3ndr1k Fusl: I am curious what is up with your /bd/0x5000c... paths. Which filesystem generates such a structure?
21:06 🔗 Kaz mergerfs iirc
21:31 🔗 CoolCanuk would be nice if archivebot automatically knew "oh, it's a twitter/fb url, let me assign an ignoreset"
21:32 🔗 Igloo Sometimes we don't need the ignoreset though
21:32 🔗 Igloo s/need/want
21:33 🔗 CoolCanuk oh ok
22:06 🔗 godane SketchCow : 90s rave & jungle cassette tapes : http://artmeetsscience.co.uk/tapes/
22:13 🔗 Atom__ has joined #archiveteam-bs
22:19 🔗 Atom-- has quit IRC (Read error: Operation timed out)
22:26 🔗 fredgido has quit IRC (Remote host closed the connection)
22:26 🔗 Fionera has quit IRC (Read error: Connection reset by peer)
22:26 🔗 Fusl_ h3ndr1k: /bd/ means backing device. it's just an ext4 mounted with the wwn-id of each disk and partition from /dev/disk/by-id/ and mergerfs uses that to stripe it into /data
22:26 🔗 yano has quit IRC (Read error: Connection reset by peer)
22:27 🔗 Fionera has joined #archiveteam-bs
22:28 🔗 fredgido has joined #archiveteam-bs
22:28 🔗 yano has joined #archiveteam-bs
22:28 🔗 TigerbotH has quit IRC (Read error: Connection reset by peer)
22:29 🔗 PotcFdk has quit IRC (Ping timeout: 600 seconds)
22:30 🔗 chungone_ has joined #archiveteam-bs
22:31 🔗 jspiros_ has quit IRC (Read error: Operation timed out)
22:31 🔗 paul2520 has quit IRC (Write error: Broken pipe)
22:31 🔗 nightpoo- has quit IRC (Write error: Broken pipe)
22:31 🔗 ndiddy has quit IRC (Write error: Broken pipe)
22:31 🔗 nightpool has joined #archiveteam-bs
22:31 🔗 sep332 has quit IRC (Read error: Operation timed out)
22:39 🔗 logchfoo3 starts logging #archiveteam-bs at Fri Jun 21 22:39:21 2019
22:39 🔗 logchfoo3 has joined #archiveteam-bs
22:39 🔗 abstract has joined #archiveteam-bs
22:39 🔗 jodizzle has quit IRC (Ping timeout: 246 seconds)
22:39 🔗 Coderjo_ has quit IRC (Ping timeout: 600 seconds)
22:39 🔗 nothere has quit IRC (Ping timeout: 600 seconds)
22:39 🔗 betamax_ has joined #archiveteam-bs
22:39 🔗 anarcat has quit IRC (Ping timeout: 600 seconds)
22:39 🔗 ivan has quit IRC (Read error: Operation timed out)
22:41 🔗 squires has quit IRC (Ping timeout: 600 seconds)
22:41 🔗 asdf0101 has quit IRC (Read error: Operation timed out)
22:41 🔗 fredgido has quit IRC (Read error: Operation timed out)
22:41 🔗 step has quit IRC (Ping timeout: 600 seconds)
22:43 🔗 betamax has quit IRC (Read error: Operation timed out)
22:44 🔗 arkiver has joined #archiveteam-bs
22:45 🔗 mistym has joined #archiveteam-bs
22:45 🔗 dxrt_ has quit IRC (Ping timeout: 600 seconds)
22:47 🔗 Fusl has joined #archiveteam-bs
22:47 🔗 Fusl_ sets mode: +o Fusl
22:48 🔗 LordNigh2 has joined #archiveteam-bs
22:48 🔗 joepie91 has joined #archiveteam-bs
22:50 🔗 Lord_Nigh has quit IRC (Read error: Operation timed out)
22:50 🔗 LordNigh2 is now known as Lord_Nigh
22:50 🔗 anarcat has joined #archiveteam-bs
22:51 🔗 jodizzle has joined #archiveteam-bs
22:54 🔗 Zebranky has joined #archiveteam-bs
22:55 🔗 h3ndr1k Fusl: Thanks, interesting. I would have guessed some cluster file system, because it looks quite complicated. But then I'm working with ceph and it never looks that complicated, it's just a single mount point for one cephfs. But maybe some crazy file system :)
22:56 🔗 h3ndr1k Never worked with mergerfs though
23:01 🔗 kode54 has quit IRC (Quit: The Lounge - https://thelounge.chat)
23:03 🔗 kiska1 has joined #archiveteam-bs
23:03 🔗 kode54 has joined #archiveteam-bs
23:03 🔗 svchfoo3 sets mode: +o kiska1
23:04 🔗 paul2520 has joined #archiveteam-bs
23:06 🔗 mr_archiv has joined #archiveteam-bs
23:08 🔗 GDorn__ has joined #archiveteam-bs
23:16 🔗 BlueMax has joined #archiveteam-bs
23:36 🔗 c4rc4s has joined #archiveteam-bs
23:36 🔗 svchfoo1 has joined #archiveteam-bs
23:36 🔗 Fusl sets mode: +o svchfoo1
23:37 🔗 ivan has joined #archiveteam-bs
23:37 🔗 simon816 has joined #archiveteam-bs
23:37 🔗 Fusl sets mode: +o ivan
23:37 🔗 asdf0101 has joined #archiveteam-bs
23:37 🔗 markedL has joined #archiveteam-bs
23:37 🔗 cfarquhar has joined #archiveteam-bs
23:37 🔗 JAA has joined #archiveteam-bs
23:37 🔗 Fusl sets mode: +o JAA
23:37 🔗 AlsoJAA sets mode: +o JAA
23:58 🔗 CoolCanuk has quit IRC (Quit: Connection closed for inactivity)
