Time |
Nickname |
Message |
00:00
🔗
|
|
closure has joined #archiveteam-bs |
00:47
🔗
|
godane |
SketchCow: any news? |
01:01
🔗
|
|
closure has quit IRC (Read error: Connection reset by peer) |
01:04
🔗
|
|
closure has joined #archiveteam-bs |
01:26
🔗
|
|
m007a83 has joined #archiveteam-bs |
01:44
🔗
|
godane |
so i'm looking through the news archives of ina.fr |
01:46
🔗
|
godane |
i think it's a mix of different news sources in their archive, or at least that's what i can tell from the journal du page here: http://www.ina.fr/journal-anniversaire |
01:48
🔗
|
godane |
if it's a mix then i will have to do it based on what i get from them and put ids as ina-journal-du-${y}-${m}-${d}, with the original file name most likely put into a text file |
01:48
🔗
|
godane |
i only say that because french chars may cause some problems with uploading the item |
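Editor's note: a minimal sketch of the identifier scheme godane describes above, written in Python. The names `make_identifier` and `is_safe`, the zero-padded month and day, and the ASCII-only character rule are assumptions for illustration; none of them were stated in the channel.

```python
import datetime
import re

# Assumption: an upload identifier should contain only ASCII letters,
# digits, '.', '_' and '-', which is why accented French titles are avoided.
SAFE_IDENTIFIER = re.compile(r"^[A-Za-z0-9][A-Za-z0-9._-]*$")


def is_safe(name: str) -> bool:
    """Return True if the name uses only the assumed safe ASCII characters."""
    return bool(SAFE_IDENTIFIER.match(name))


def make_identifier(day: datetime.date) -> str:
    """Build an id of the form ina-journal-du-YYYY-MM-DD from a broadcast date."""
    identifier = f"ina-journal-du-{day:%Y-%m-%d}"
    if not is_safe(identifier):
        raise ValueError(f"unsafe identifier: {identifier!r}")
    return identifier
```

Because the id is derived from the date alone, it stays ASCII no matter what the original French title looks like; the title itself has to be preserved somewhere else.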
01:58
🔗
|
|
closure has quit IRC (Read error: Operation timed out) |
02:01
🔗
|
|
closure has joined #archiveteam-bs |
02:28
🔗
|
godane |
the name of the video file will be put into a text file, or maybe the id page will be put into a text file |
02:29
🔗
|
godane |
i say that because i did pass --add-metadata to youtube-dl for all of these videos, so the full original name is in the metadata of the video |
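Editor's note: a rough sketch of the two steps godane mentions, assuming Python. It builds a youtube-dl command line that uses `--add-metadata`, and it writes the original name to a text file next to the video. The function names, the `<identifier>.txt` file name, and the example identifier are hypothetical; only `--add-metadata`, the `-o` output template option, and the ina.fr URL are taken from youtube-dl itself or from the log.

```python
from pathlib import Path
from typing import List


def build_command(url: str, identifier: str) -> List[str]:
    """Return a youtube-dl argument list; nothing is executed here."""
    return [
        "youtube-dl",
        "--add-metadata",                # write metadata such as the title into the video file
        "-o", f"{identifier}.%(ext)s",   # output template: ASCII id plus the real extension
        url,
    ]


def write_original_name(directory: Path, identifier: str, original_name: str) -> Path:
    """Store the original (possibly accented) name in a UTF-8 text file."""
    sidecar = directory / f"{identifier}.txt"   # hypothetical naming convention
    sidecar.write_text(original_name + "\n", encoding="utf-8")
    return sidecar
```

The returned list could be handed to `subprocess.run(...)` once the identifier is settled. Keeping the original name in both the embedded metadata and a text file means it survives even if one of the two is lost on upload.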
03:00
🔗
|
|
closure has quit IRC (Ping timeout: 252 seconds) |
03:01
🔗
|
|
closure has joined #archiveteam-bs |
03:30
🔗
|
|
bitBaron has quit IRC (Quit: Bye.) |
03:36
🔗
|
kyounko |
why do websites force you to interact with javascript to load site data? |
03:42
🔗
|
|
archodg_ has joined #archiveteam-bs |
03:43
🔗
|
adinbied |
moufu, are you still around? I'm unsure what you meant by printing the url in the download_child_p callback |
03:44
🔗
|
|
odemg has quit IRC (Ping timeout: 246 seconds) |
03:44
🔗
|
|
archodg__ has quit IRC (Ping timeout: 252 seconds) |
03:48
🔗
|
ivan |
kyounko: reducing initial page render time (whether intentionally or because of bad architecture), javascript-first developers, getting better information on what people are actually reading |
03:50
🔗
|
ivan |
the javascript developer rationale goes something like "we'll need some of these dynamic loading features anyway, might as well require javascript to avoid duplicating this rendering logic on the server" |
03:58
🔗
|
|
odemg has joined #archiveteam-bs |
03:59
🔗
|
|
closure has quit IRC (Read error: Operation timed out) |
04:04
🔗
|
|
closure has joined #archiveteam-bs |
05:00
🔗
|
|
closure has quit IRC (Read error: Connection reset by peer) |
05:00
🔗
|
|
closure_ has joined #archiveteam-bs |
05:28
🔗
|
|
Odd0002_ has joined #archiveteam-bs |
05:33
🔗
|
|
Odd0002 has quit IRC (Read error: Operation timed out) |
05:33
🔗
|
|
Odd0002_ is now known as Odd0002 |
06:00
🔗
|
|
closure_ has quit IRC (Read error: Connection reset by peer) |
08:04
🔗
|
kyounko |
ivan: would it be insane to do things like the 1995 web? that still works |
08:05
🔗
|
kyounko |
i haven't been to craigslist in years, but in 2014 it seemed "ancient" |
09:32
🔗
|
|
BlueMax has quit IRC (Quit: Leaving) |
11:00
🔗
|
|
REiN^ has joined #archiveteam-bs |
11:05
🔗
|
|
plue has quit IRC (Quit: leaving) |
11:06
🔗
|
|
plue has joined #archiveteam-bs |
12:15
🔗
|
|
Mateon1 has quit IRC (Ping timeout: 633 seconds) |
12:18
🔗
|
|
Mateon1 has joined #archiveteam-bs |
12:36
🔗
|
moufu |
adinbied: https://github.com/alard/wget-lua/wiki/Wget-with-Lua-hooks#download_child_p |
12:37
🔗
|
moufu |
adinbied: something like if not verdict then io.stdout:write(urlpos["url"]["url"].." rejected ("..reason..")\n"); io.stdout:flush() end might work, haven't tested though |
13:25
🔗
|
|
closure has joined #archiveteam-bs |
14:00
🔗
|
|
closure has quit IRC (Read error: Connection reset by peer) |
14:05
🔗
|
|
closure has joined #archiveteam-bs |
14:25
🔗
|
joepie91_ |
kyounko: there's a whole new generation of 'web developers' who genuinely do not understand what a browser can do out of the box, and who believe that writing 'frontend JS' is the only way to build a website today |
14:25
🔗
|
joepie91_ |
and that is not an exaggeration |
14:25
🔗
|
joepie91_ |
I have to talk people off this ledge on an almost-daily basis |
14:25
🔗
|
joepie91_ |
it's fucking infuriating |
14:26
🔗
|
kiska |
It makes archiving... _difficult_ |
14:26
🔗
|
joepie91_ |
not just that |
14:26
🔗
|
joepie91_ |
it makes everything worse |
14:26
🔗
|
joepie91_ |
contrary to popular belief, such JS-heavy sites are actually *considerably* slower than plain HTML and CSS with forms and links |
14:26
🔗
|
joepie91_ |
because you're breaking half the browser's optimizations |
14:27
🔗
|
joepie91_ |
it also typically breaks the browser's standard behaviour and controls (and why wouldn't it? new generation of web devs doesn't even know they *exist*) |
14:27
🔗
|
joepie91_ |
effectively reimplementing half a browser client-side, poorly |
15:00
🔗
|
|
closure has quit IRC (Ping timeout: 252 seconds) |
15:01
🔗
|
|
closure has joined #archiveteam-bs |
15:06
🔗
|
|
plue has quit IRC (Quit: leaving) |
15:07
🔗
|
|
plue has joined #archiveteam-bs |
15:42
🔗
|
|
Sanky is now known as Sanqui |
16:00
🔗
|
|
closure has quit IRC (Ping timeout: 268 seconds) |
16:18
🔗
|
|
closure has joined #archiveteam-bs |
16:32
🔗
|
adinbied |
I'm still quite unfamiliar with Lua - I'm still wanting to learn more at some point, but for the moment, moufu, would you be able to create some sort of implementation? |
17:05
🔗
|
|
closure has quit IRC (Read error: Connection reset by peer) |
17:08
🔗
|
|
closure_ has joined #archiveteam-bs |
17:48
🔗
|
moufu |
adinbied: https://0x0.st/s36_.diff |
17:53
🔗
|
|
plue has quit IRC (Quit: leaving) |
17:59
🔗
|
|
closure_ has quit IRC (Read error: Connection reset by peer) |
18:23
🔗
|
|
wp494 has quit IRC (Read error: Operation timed out) |
18:24
🔗
|
|
wp494 has joined #archiveteam-bs |
18:45
🔗
|
adinbied |
moufu, that didn't do it either - it's still not grabbing images/site resources |
18:49
🔗
|
|
closure has joined #archiveteam-bs |
18:53
🔗
|
|
schbirid has joined #archiveteam-bs |
18:59
🔗
|
|
closure has quit IRC (Ping timeout: 246 seconds) |
19:01
🔗
|
moufu |
what site are you testing it on? |
19:03
🔗
|
moufu |
it seems to work when I run wget-lua manually with the arguments from pipeline.py (not sure how to test the pipeline itself since it returns a tracker error) on some random sites found on google |
19:06
🔗
|
|
plue has joined #archiveteam-bs |
19:06
🔗
|
adinbied |
So I've got a tracker dev-env VM running with the items as defined here: https://github.com/adinbied/angelfire-items |
19:08
🔗
|
adinbied |
Maybe it's something to do with the way the items and queue are being handled in my particular case? |
19:15
🔗
|
|
tuluu_ has quit IRC (Read error: Connection refused) |
19:16
🔗
|
|
tuluu has joined #archiveteam-bs |
19:17
🔗
|
moufu |
okay I can reproduce it when grabbing from sitemap.xml |
19:19
🔗
|
moufu |
oh I might know what the problem is |
19:20
🔗
|
moufu |
yeah works now, forgot to pass link_expect_html |
19:20
🔗
|
moufu |
https://0x0.st/s3IO.diff |
19:21
🔗
|
moufu |
adinbied: ↑ |
19:22
🔗
|
adinbied |
Aha! |
19:22
🔗
|
adinbied |
Thank you so much! I'm still learning as I go, and was getting hung up on why it just wouldn't work |
19:23
🔗
|
adinbied |
Thanks again for the help moufu! It's greatly appreciated! |
19:25
🔗
|
moufu |
np, I should've remembered html pages don't get parsed without that option |
19:46
🔗
|
|
closure_ has joined #archiveteam-bs |
20:01
🔗
|
|
closure_ has quit IRC (Read error: Connection reset by peer) |
20:03
🔗
|
|
schbirid has quit IRC (Remote host closed the connection) |
21:56
🔗
|
|
closure has joined #archiveteam-bs |
22:01
🔗
|
|
closure has quit IRC (Read error: Connection reset by peer) |
22:01
🔗
|
|
closure_ has joined #archiveteam-bs |
22:04
🔗
|
|
zhongfu has quit IRC (Quit: No Ping reply in 180 seconds.) |
22:05
🔗
|
|
zhongfu has joined #archiveteam-bs |
22:18
🔗
|
|
BlueMax has joined #archiveteam-bs |
22:32
🔗
|
|
closure_ has quit IRC (Read error: Connection reset by peer) |
22:34
🔗
|
|
closure has joined #archiveteam-bs |
22:37
🔗
|
|
Stiletto has quit IRC (Read error: Operation timed out) |
23:00
🔗
|
|
closure_ has joined #archiveteam-bs |
23:00
🔗
|
|
closure has quit IRC (Read error: Connection reset by peer) |
23:33
🔗
|
|
closure_ has quit IRC (Read error: Connection reset by peer) |
23:38
🔗
|
|
closure has joined #archiveteam-bs |