#archiveteam-bs 2018-09-26,Wed


Time Nickname Message
00:00 🔗 closure has joined #archiveteam-bs
00:47 🔗 godane SketchCow: any news?
01:01 🔗 closure has quit IRC (Read error: Connection reset by peer)
01:04 🔗 closure has joined #archiveteam-bs
01:26 🔗 m007a83 has joined #archiveteam-bs
01:44 🔗 godane so i'm looking through the news archives of ina.fr
01:46 🔗 godane i think it's a mix of different news sources in their archive, or at least that's what i can tell from the journal du page here: http://www.ina.fr/journal-anniversaire
01:48 🔗 godane if it's a mix then i will have to do it based on what i get from them and use ids like ina-journal-du-${y}-${m}-${d}, with the original file name most likely put into a text file
01:48 🔗 godane i only say that because french chars may cause some problems when uploading the item
01:58 🔗 closure has quit IRC (Read error: Operation timed out)
02:01 🔗 closure has joined #archiveteam-bs
02:28 🔗 godane the video file name will be put into a text file, or maybe the id page will be put into a text file
02:29 🔗 godane i say that because i passed --add-metadata to youtube-dl for all of these videos, so the full original name is in each video's metadata
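The naming plan godane describes above could be sketched roughly as follows. This is only an illustration under stated assumptions, not godane's actual script: the helper names `make_item_id` and `write_original_name` are hypothetical, and zero-padding the month and day is an assumption about how `${m}` and `${d}` would be filled in. The idea is to keep the item identifier ASCII-only and to store the original, possibly accented, file name in a UTF-8 text file.

```python
from datetime import date
from pathlib import Path


def make_item_id(broadcast_date: date) -> str:
    """Build an ASCII-only id in the ina-journal-du-${y}-${m}-${d} pattern.

    Assumption: month and day are zero-padded to two digits.
    """
    return (
        f"ina-journal-du-{broadcast_date.year:04d}"
        f"-{broadcast_date.month:02d}-{broadcast_date.day:02d}"
    )


def write_original_name(directory: Path, item_id: str, original_name: str) -> Path:
    """Save the original file name (accents included) as UTF-8 text.

    The file is named after the item id (a hypothetical convention),
    so the accented name never has to appear in the identifier itself.
    """
    target = directory / f"{item_id}.txt"
    target.write_text(original_name + "\n", encoding="utf-8")
    return target
```

For example, `make_item_id(date(1968, 5, 13))` yields `ina-journal-du-1968-05-13`, and `write_original_name` can then record a made-up title such as `Journal télévisé.mp4` alongside it.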
03:00 🔗 closure has quit IRC (Ping timeout: 252 seconds)
03:01 🔗 closure has joined #archiveteam-bs
03:30 🔗 bitBaron has quit IRC (Quit: Bye.)
03:36 🔗 kyounko why do websites force you to interact with javascript to load site data?
03:42 🔗 archodg_ has joined #archiveteam-bs
03:43 🔗 adinbied moufu, are you still around? I'm unsure what you meant by printing the url in the download_child_p callback
03:44 🔗 odemg has quit IRC (Ping timeout: 246 seconds)
03:44 🔗 archodg__ has quit IRC (Ping timeout: 252 seconds)
03:48 🔗 ivan kyounko: reducing initial page render time (whether intentionally or because of bad architecture), javascript-first developers, getting better information on what people are actually reading
03:50 🔗 ivan the javascript developer rationale goes something like "we'll need some of these dynamic loading features anyway, might as well require javascript to avoid duplicating this rendering logic on the server"
03:58 🔗 odemg has joined #archiveteam-bs
03:59 🔗 closure has quit IRC (Read error: Operation timed out)
04:04 🔗 closure has joined #archiveteam-bs
05:00 🔗 closure has quit IRC (Read error: Connection reset by peer)
05:00 🔗 closure_ has joined #archiveteam-bs
05:28 🔗 Odd0002_ has joined #archiveteam-bs
05:33 🔗 Odd0002 has quit IRC (Read error: Operation timed out)
05:33 🔗 Odd0002_ is now known as Odd0002
06:00 🔗 closure_ has quit IRC (Read error: Connection reset by peer)
08:04 🔗 kyounko ivan: would it be insane to do things like the 1995 web? that still works
08:05 🔗 kyounko i haven't been to craigslist in years, but in 2014 it seemed "ancient"
09:32 🔗 BlueMax has quit IRC (Quit: Leaving)
11:00 🔗 REiN^ has joined #archiveteam-bs
11:05 🔗 plue has quit IRC (Quit: leaving)
11:06 🔗 plue has joined #archiveteam-bs
12:15 🔗 Mateon1 has quit IRC (Ping timeout: 633 seconds)
12:18 🔗 Mateon1 has joined #archiveteam-bs
12:36 🔗 moufu adinbied: https://github.com/alard/wget-lua/wiki/Wget-with-Lua-hooks#download_child_p
12:37 🔗 moufu adinbied: something like if not verdict then io.stdout:write(urlpos["url"]["url"].." rejected ("..reason..")\n"); io.stdout:flush() end might work, haven't tested though
13:25 🔗 closure has joined #archiveteam-bs
14:00 🔗 closure has quit IRC (Read error: Connection reset by peer)
14:05 🔗 closure has joined #archiveteam-bs
14:25 🔗 joepie91_ kyounko: there's a whole new generation of 'web developers' who genuinely do not understand what a browser can do out of the box, and who believe that writing 'frontend JS' is the only way to build a website today
14:25 🔗 joepie91_ and that is not an exaggeration
14:25 🔗 joepie91_ I have to talk people off this ledge on an almost-daily basis
14:25 🔗 joepie91_ it's fucking infuriating
14:26 🔗 kiska It makes archiving... _difficult_
14:26 🔗 joepie91_ not just that
14:26 🔗 joepie91_ it makes everything worse
14:26 🔗 joepie91_ contrary to popular belief, such JS-heavy sites are actually *considerably* slower than plain HTML and CSS with forms and links
14:26 🔗 joepie91_ because you're breaking half the browser's optimizations
14:27 🔗 joepie91_ it also typically breaks the browser's standard behaviour and controls (and why wouldn't it? new generation of web devs doesn't even know they *exist*)
14:27 🔗 joepie91_ effectively reimplementing half a browser client-side, poorly
15:00 🔗 closure has quit IRC (Ping timeout: 252 seconds)
15:01 🔗 closure has joined #archiveteam-bs
15:06 🔗 plue has quit IRC (Quit: leaving)
15:07 🔗 plue has joined #archiveteam-bs
15:42 🔗 Sanky is now known as Sanqui
16:00 🔗 closure has quit IRC (Ping timeout: 268 seconds)
16:18 🔗 closure has joined #archiveteam-bs
16:32 🔗 adinbied I'm still quite unfamiliar with Lua - I'm still wanting to learn more at some point, but for the moment, moufu, would you be able to create some sort of implementation?
17:05 🔗 closure has quit IRC (Read error: Connection reset by peer)
17:08 🔗 closure_ has joined #archiveteam-bs
17:48 🔗 moufu adinbied: https://0x0.st/s36_.diff
17:53 🔗 plue has quit IRC (Quit: leaving)
17:59 🔗 closure_ has quit IRC (Read error: Connection reset by peer)
18:23 🔗 wp494 has quit IRC (Read error: Operation timed out)
18:24 🔗 wp494 has joined #archiveteam-bs
18:45 🔗 adinbied moufu, that didn't do it either - it's still not grabbing images/site resources
18:49 🔗 closure has joined #archiveteam-bs
18:53 🔗 schbirid has joined #archiveteam-bs
18:59 🔗 closure has quit IRC (Ping timeout: 246 seconds)
19:01 🔗 moufu what site are you testing it on?
19:03 🔗 moufu it seems to work when I run wget-lua manually with the arguments from pipeline.py (not sure how to test the pipeline itself since it returns a tracker error) on some random sites found on google
19:06 🔗 plue has joined #archiveteam-bs
19:06 🔗 adinbied So I've got a tracker dev-env VM running with the items as defined here: https://github.com/adinbied/angelfire-items
19:08 🔗 adinbied Maybe it's something to do with the way the items and queue are being handled in my particular case?
19:15 🔗 tuluu_ has quit IRC (Read error: Connection refused)
19:16 🔗 tuluu has joined #archiveteam-bs
19:17 🔗 moufu okay I can reproduce it when grabbing from sitemap.xml
19:19 🔗 moufu oh I might know what the problem is
19:20 🔗 moufu yeah works now, forgot to pass link_expect_html
19:20 🔗 moufu https://0x0.st/s3IO.diff
19:21 🔗 moufu adinbied: ↑
19:22 🔗 adinbied Aha!
19:22 🔗 adinbied Thank you so much! I'm still learning as I go, and was getting hung up on why it just wouldn't work
19:23 🔗 adinbied Thanks again for the help moufu! It's greatly appreciated!
19:25 🔗 moufu np, I should've remembered html pages don't get parsed without that option
19:46 🔗 closure_ has joined #archiveteam-bs
20:01 🔗 closure_ has quit IRC (Read error: Connection reset by peer)
20:03 🔗 schbirid has quit IRC (Remote host closed the connection)
21:56 🔗 closure has joined #archiveteam-bs
22:01 🔗 closure has quit IRC (Read error: Connection reset by peer)
22:01 🔗 closure_ has joined #archiveteam-bs
22:04 🔗 zhongfu has quit IRC (Quit: No Ping reply in 180 seconds.)
22:05 🔗 zhongfu has joined #archiveteam-bs
22:18 🔗 BlueMax has joined #archiveteam-bs
22:32 🔗 closure_ has quit IRC (Read error: Connection reset by peer)
22:34 🔗 closure has joined #archiveteam-bs
22:37 🔗 Stiletto has quit IRC (Read error: Operation timed out)
23:00 🔗 closure_ has joined #archiveteam-bs
23:00 🔗 closure has quit IRC (Read error: Connection reset by peer)
23:33 🔗 closure_ has quit IRC (Read error: Connection reset by peer)
23:38 🔗 closure has joined #archiveteam-bs
